How to Use the Requests-HTML Library in Python

Requests-HTML is a Python library designed to make HTML parsing as easy as possible. When we scrape a webpage, we often need to parse its HTML to extract the data we want, and Requests-HTML is a good candidate for that task.

How to install Requests-HTML?

Before we can use Requests-HTML, we have to install it, and pip is the easiest way to do that. The following command installs Requests-HTML.
Note: Python 3.6 or later is needed to install this library.

pip install requests-html

Making a GET request with Requests-HTML

The following code makes a GET request to a website.
In our case, we make the request to blogger.com. The call returns a response object that carries the status code. Here it is <Response [200]>, which means the GET request succeeded.

# Import the HTMLSession class
from requests_html import HTMLSession

# Create an HTMLSession object
session = HTMLSession()

# Call the get method of the session; it returns a response object
request = session.get('https://blogger.com')
print(request)  # e.g. <Response [200]>
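A request can fail or hang, so it can help to set a timeout and check the status before using the response. The following is a minimal sketch rather than part of the tutorial's own code: the helper name fetch and its 10-second default are assumptions made for illustration, while timeout and raise_for_status come from the Requests API that HTMLSession builds on.

```python
def fetch(session, url, timeout=10):
    """Hypothetical helper: GET `url`, failing fast on timeouts and HTTP errors."""
    # `timeout` (in seconds) stops the call from waiting forever on a slow server.
    response = session.get(url, timeout=timeout)
    # Raises an exception for 4xx/5xx status codes; does nothing on success.
    response.raise_for_status()
    return response
```

With the session created above, fetch(session, 'https://blogger.com') would return the same kind of response object as session.get.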

Getting the contents of the response from the get method

The body of the response can be read from the content attribute of the response object returned by the get method.
It is printed as a bytes object (a binary string).
The code below prints the content of the response object.

from requests_html import HTMLSession
session = HTMLSession()
request = session.get('https://blogger.com')
data = request.content
print(data)
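Because the content is a bytes object, it usually has to be decoded before it can be handled as text. Here is a minimal sketch of that step. It uses a made-up byte string in place of a real response body and assumes the page is encoded as UTF-8.

```python
# Stand-in for the bytes that request.content would hold (hypothetical sample data).
raw = b"<h1>Caf\xc3\xa9</h1>"

# Decode the bytes into a str; UTF-8 is an assumption about the page's encoding.
text = raw.decode("utf-8")

print(type(raw).__name__, "->", type(text).__name__)
print(text)
```

The response object also has a text attribute that holds the already decoded body, which is often more convenient than decoding by hand.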

Find the method of the request from the response

Sometimes we need to know which HTTP method (GET, POST, and so on) produced the response we received. The request attribute of the response object holds the request that was sent, and printing it shows the method, for example <PreparedRequest [GET]>. We can do it as follows.

data = request.request
print(data)

Find the status code of the response by using Requests-HTML

The status code tells us how the server handled the request. A code of 200 means the request succeeded. We can read it from the status_code attribute of the response object.

data = request.status_code
print(data)
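A bare number is easier to act on once it is turned into a category. The sketch below uses only the standard library. The helper name describe_status and its wording are assumptions for illustration, not part of Requests-HTML.

```python
from http import HTTPStatus


def describe_status(code):
    """Hypothetical helper: turn a numeric status code into a readable summary."""
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. "OK"
    except ValueError:
        phrase = "Unknown"  # code is not in the standard library's table

    if 200 <= code < 300:
        outcome = "success"
    elif 300 <= code < 400:
        outcome = "redirect"
    elif 400 <= code < 500:
        outcome = "client error"
    elif 500 <= code < 600:
        outcome = "server error"
    else:
        outcome = "other"
    return f"{code} {phrase} ({outcome})"


print(describe_status(200))
print(describe_status(404))
```

With the response from the earlier snippets, the call would be describe_status(request.status_code).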

Find the header information by using Requests-HTML

The headers of the response carry metadata about it, such as the content type and caching rules. We can get all of them from the headers attribute of the response object.

data = request.headers
print(data)

The output is a dictionary-like object that maps header names to their values. The exact headers will be different in your case.
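One header that is often worth reading is Content-Type, which names the media type and sometimes the character set. Below is a minimal sketch of pulling those two pieces apart with the standard library. The helper name parse_content_type and the sample header value are assumptions for illustration.

```python
from email.message import Message


def parse_content_type(value):
    """Hypothetical helper: split a Content-Type value into (media type, charset)."""
    message = Message()
    message["Content-Type"] = value
    # get_content_type() returns the lowercased media type, e.g. "text/html";
    # get_param() returns None when the parameter is absent.
    return message.get_content_type(), message.get_param("charset")


print(parse_content_type("text/html; charset=utf-8"))
```

With a real response, the value to pass in would come from request.headers.get('Content-Type').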

Getting all links of a webpage by using Requests-HTML

To get all the links (anchor tags) on a page, we can use the html.links attribute. It returns a set of all the links found in the response that we got from the site.

from requests_html import HTMLSession
session = HTMLSession()
request = session.get('https://google.com/')
data = request.html.links
print(type(data))
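The set returned by html.links can mix relative paths, links to other sites, and non-web links such as mailto addresses. The following sketch shows one way to keep only the links that stay on the same site, using the standard library. The helper name same_site_links and the sample links are made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def same_site_links(links, base_url):
    """Hypothetical helper: resolve links against base_url and keep same-host ones."""
    base_host = urlparse(base_url).netloc
    # urljoin turns relative paths like "/about" into absolute URLs.
    absolute = {urljoin(base_url, link) for link in links}
    # Sort so the result is stable; sets have no defined order.
    return sorted(url for url in absolute if urlparse(url).netloc == base_host)


# Made-up sample standing in for request.html.links.
sample = {
    "/about",
    "https://example.com/contact",
    "https://other.example.org/page",
    "mailto:someone@example.com",
}
print(same_site_links(sample, "https://example.com/"))
```

Requests-HTML also offers an html.absolute_links attribute that returns the links already resolved to absolute URLs, which would make the urljoin step unnecessary.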

How to Parse HTML from a Local File with Requests-HTML?

We can read HTML from a file and then parse it with Requests-HTML by passing the text to the HTML class. We can do it as follows.

from requests_html import HTML
with open("htmlfile.html") as htmlfile:
    sourcecode = htmlfile.read()
    parsedHtml = HTML(html=sourcecode)
print(parsedHtml)

Find the title tag in an HTML page by using Requests-HTML

We can find any tag by using the find method, which takes a CSS selector. For example, if we want to find the title tag, we can do it as follows.

from requests_html import HTML
with open("htmlfile.html") as htmlfile:
    sourcecode = htmlfile.read()
    parsedHtml = HTML(html=sourcecode)
print(parsedHtml.find("title"))

This prints a list of all the title elements in the HTML. Each item in the list is an Element object, so we can select any of them by its index in the list.
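When only one element is expected, indexing into the list fails if the page has no match. The find method also accepts first=True, which returns a single element or None, and each element has a text attribute. The sketch below wraps that in a small helper. The name page_title is an assumption for illustration.

```python
def page_title(doc):
    """Hypothetical helper: return the text of the first <title>, or None if absent."""
    # first=True makes find return one Element (or None) instead of a list.
    title = doc.find("title", first=True)
    return title.text if title is not None else None
```

With the parsedHtml object from the snippet above, page_title(parsedHtml) would give the title text as a plain string.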

Get the JavaScript-generated content with Requests-HTML

To get JavaScript-generated data, the page has to be loaded and its scripts have to run before we read it. Requests-HTML gives us the ability to do this: the render method loads the page in a headless Chromium browser and runs its JavaScript. The first call also downloads Chromium, so it takes some time, but it does the job.

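from requests_html import HTML
with open("htmlfile.html") as htmlfile:
    sourcecode = htmlfile.read()
    parsedHtml = HTML(html=sourcecode)
    # reload=False renders the HTML we passed in instead of loading it from a URL
    parsedHtml.render(reload=False)
# Print the HTML as it looks after the JavaScript has run
print(parsedHtml.html)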
