In this Python tutorial, you will see how to use the Python programming language to extract all links from a web page. You can use this script to collect both the absolute and relative links from a website.
In this tutorial, we use the beautifulsoup and requests modules. You can also do web scraping in Python using only the requests-html library, as sketched below.
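For comparison, here is a minimal sketch of that approach (assuming requests-html is installed via pip install requests-html; the example.com URL is just a placeholder). Its r.html.links and r.html.absolute_links properties return the relative and absolute links found on the page:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://example.com")   # placeholder URL
print(r.html.links)            # links exactly as written in the page (may be relative)
print(r.html.absolute_links)   # the same links resolved to absolute URLs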
Extract all Links from a Website
To extract all links from a website, use the requests and Beautiful Soup libraries in Python. requests is used to make the request to the website, and Beautiful Soup helps us parse the HTML of the web page.
Install Requests
pip install requests
If you have a problem installing the requests library, you can check out this article: how-to-install-requests-library-in-python
Install Beautiful Soup
pip install beautifulsoup4
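To verify that both installs worked, you can print each library's version from the command line (the exact version numbers you see will differ):

python -c "import requests, bs4; print(requests.__version__, bs4.__version__)"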
Python code to extract all links from a website
import requests as rq
from bs4 import BeautifulSoup

url = input("Enter Link: ")
# Prepend a scheme if the user did not include one
if url.startswith(("http://", "https://")):
    data = rq.get(url)
else:
    data = rq.get("https://" + url)

soup = BeautifulSoup(data.text, "html.parser")
links = []
for link in soup.find_all("a"):
    href = link.get("href")
    if href is not None:  # skip <a> tags that have no href attribute
        links.append(href)

# Writing the output to a file (myLinks.txt) instead of to stdout
# You can change 'a' to 'w' to overwrite the file each time
with open("myLinks.txt", 'a') as saved:
    print(*links, sep="\n", file=saved)
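Note that the href values collected above can be relative (for example, /about instead of a full URL). If you want every link in absolute form, one option is urllib.parse.urljoin from the standard library. This sketch continues from the script above and assumes the data and links variables are still in scope:

from urllib.parse import urljoin

# Resolve each href against the final page URL; hrefs that are
# already absolute pass through unchanged.
base = data.url  # the URL of the fetched page, after any redirects
absolute_links = [urljoin(base, href) for href in links]
print(absolute_links[:10])  # preview the first ten resolved links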
Summary and Conclusion
In this article, we have seen how to use the requests and Beautiful Soup libraries to extract all the links from a web page. If you have any questions, leave them in the comment section.