Requests-HTML is a Python library designed to make HTML parsing as easy as possible. When scraping a web page, we often need to parse its HTML to extract the data we require. In these scenarios, Requests-HTML is a good choice for the task.
How to install Requests-HTML?
In order to use Requests-HTML, we first have to install it, which we can do with pip. The following command installs Requests-HTML. Note: Requests-HTML requires Python 3.6 or newer.
pip install requests-html
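To confirm that the installation worked and that the interpreter meets the Python 3.6 requirement, a small check like the following can help. It is only a sketch: requests_html_ready() is a hypothetical helper name, and it uses importlib.util.find_spec to look for the module without importing it.

```python
import importlib.util
import sys


def requests_html_ready() -> bool:
    """Return True if Python is new enough and requests_html can be found."""
    if sys.version_info < (3, 6):
        print("Requests-HTML needs Python 3.6 or newer")
        return False
    # find_spec returns None when the module is not installed
    if importlib.util.find_spec("requests_html") is None:
        print("requests_html not found; run: pip install requests-html")
        return False
    return True


print(requests_html_ready())
```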
How to make a GET request with Requests-HTML?
The following code makes a GET request to a website; in our case, the blogger.com website. The get() method returns a response object that carries the HTTP status code of the request. If we print it, we get <Response [200]>, which means that the GET request succeeded.
# Import the HTMLSession class
from requests_html import HTMLSession

# Create an HTMLSession object
session = HTMLSession()

# Call the get() method of the session to send a GET request
request = session.get('https://blogger.com')

# Printing the response object shows its status code
print(request)
How to get the content of the response?
The content can be extracted through the content attribute of the response object returned by the get() method. It is returned as bytes (a binary string), so printing it shows a b'...' literal; to get a decoded string instead, use the text attribute. The code below prints the content of the response object.
from requests_html import HTMLSession

session = HTMLSession()
request = session.get('https://blogger.com')
data = request.content
print(data)
How to find the request method from the response?
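Sometimes we need to know which HTTP method (GET, POST, and so on) was used for the request that produced a response. The request attribute of the response holds the request that was sent; printing it shows the method, for example <PreparedRequest [GET]>, and request.request.method returns the method as a plain string. We can do it as follows.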
data = request.request
print(data)
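The object stored in response.request is a PreparedRequest from the requests library, which Requests-HTML builds on. To see what it carries without touching the network, this sketch builds two such objects directly with requests.Request(...).prepare(); with a real response we would simply read request.request.method. The form data {"q": "test"} is a made-up example.

```python
import requests

# Build (but do not send) two requests to inspect what response.request holds
get_req = requests.Request("GET", "https://blogger.com").prepare()
post_req = requests.Request("POST", "https://blogger.com", data={"q": "test"}).prepare()

print(get_req)          # <PreparedRequest [GET]>
print(get_req.method)   # GET
print(post_req.method)  # POST
print(post_req.body)    # q=test
```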
How to find the status code of the response with Requests-HTML?
The status code tells us whether the request succeeded. A status code of 200 (OK) means the request was successful. We can get the code from the status_code attribute of the response object.
data = request.status_code
print(data)
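Status codes other than 200 are grouped by their first digit (2xx success, 3xx redirect, 4xx client error, 5xx server error). The sketch below turns a numeric code into a readable description using the standard library's http.HTTPStatus; describe_status() is a hypothetical helper, not part of Requests-HTML.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Turn a numeric HTTP status code into e.g. '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. "OK"
    except ValueError:
        phrase = "Unknown"  # code not defined in http.HTTPStatus
    if 100 <= code < 200:
        kind = "informational"
    elif 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "invalid"
    return f"{code} {phrase} ({kind})"


for code in (200, 301, 404, 500):
    print(describe_status(code))
```

With a real response we would call describe_status(request.status_code). Note that the response's ok attribute is True for any code below 400, so it treats redirects as successful too.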
How to get the header information with Requests-HTML?
The response headers carry metadata about the response, for example its content type, length, and caching rules. We can get all of them through the headers attribute of the response object.
data = request.headers
print(data)
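The headers attribute behaves like a dictionary whose keys are case-insensitive: it is a CaseInsensitiveDict from the requests library, so headers['content-type'] and headers['Content-Type'] return the same value. The sketch below builds such a dictionary by hand with made-up header values, standing in for request.headers:

```python
from requests.structures import CaseInsensitiveDict

# Stand-in for request.headers (hypothetical values, not from a real site)
headers = CaseInsensitiveDict({
    "Content-Type": "text/html; charset=UTF-8",
    "Cache-Control": "private, max-age=0",
})

print(headers["content-type"])          # text/html; charset=UTF-8
print(headers.get("CACHE-CONTROL"))     # private, max-age=0
print(headers.get("X-Missing", "n/a"))  # n/a

# Iterating yields the header names with their original capitalization
for name, value in headers.items():
    print(f"{name}: {value}")
```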
How to get all links of a web page with Requests-HTML?
To get all the links in the page's anchor (a) tags, we can use the html.links attribute. It returns a set of all the links found in the response that we got from the request to the site, in the same form as they appear in the href attributes.
from requests_html import HTMLSession

session = HTMLSession()
request = session.get('https://google.com/')
data = request.html.links
print(type(data))  # <class 'set'>
print(data)
How to parse HTML from a local file with Requests-HTML?
We can also read HTML from a local file and parse it by passing the source code to the HTML class of Requests-HTML. We can do it as follows.
from requests_html import HTML

with open("htmlfile.html") as htmlfile:
    sourcecode = htmlfile.read()
parsedHtml = HTML(html=sourcecode)
print(parsedHtml)
How to find the title tag in an HTML page with Requests-HTML?
We can find any tag with the find() method, which takes a CSS selector. For example, in case we want to find the title tag, we can do it as follows.
from requests_html import HTML

with open("htmlfile.html") as htmlfile:
    sourcecode = htmlfile.read()
parsedHtml = HTML(html=sourcecode)
print(parsedHtml.find("title"))
This prints a list of all the title tags in the HTML, and each item in the list is an Element object. We can therefore select any element by its index in the list, or pass first=True to find() to get only the first match.
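Requests-HTML can also execute the JavaScript in a page with the render() method. It loads the page in a headless Chromium browser, which is downloaded automatically the first time render() is called, and replaces the parsed HTML with the rendered result.
from requests_html import HTML

with open("htmlfile.html") as htmlfile:
    sourcecode = htmlfile.read()
parsedHtml = HTML(html=sourcecode)

# Run the page's JavaScript in headless Chromium
parsedHtml.render()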