Python urllib module – A complete Guide to URL handling with Python

Handling URLs by hand is tedious and error-prone. Python's standard library ships the urllib package to make that job easier: it can open URLs, parse and build them, and report the errors that occur along the way.

The urllib package provides 5 submodules that handle closely related jobs. We will go through the main ones with the help of examples.

  1. urllib.request
  2. urllib.response
  3. urllib.parse
  4. urllib.robotparser
  5. urllib.error

Each module covered below comes with a Python example and enough detail to show how it is used for handling URLs.

urllib.request module in Python With Examples

The urllib.request module is used for opening URLs. It is a built-in Python module for making a request and getting a response. urllib.request is lower level than third-party libraries such as requests and requests-html, which provide a high-level interface.

The urllib.request module contains functions and classes that help us open URLs and deal with authentication, redirections, cookies, and much more. There is more to discover in urllib.request than fits here, so the focus is on what we can do with it in practice. I will answer some of the popular questions related to the module.

Functions of urllib.request and their examples:

The best way to learn about a function is to see it working and check what the code produces. We will go through each of them in detail.

install_opener(opener):

This function of the urllib.request module installs an opener as the default global opener. It is only necessary if you want urlopen() to use that opener. You first call build_opener() to create the opener.

Why use the install_opener(opener) method in Python urllib?

You might be wondering why you should use an opener if you already have the urlopen() function for opening a URL. An opener is useful when you want an extra handler for the URL or when you want to open a URL through a proxy. Below is an example of install_opener() and build_opener(). It passes only a plain HTTPSHandler, so it behaves much like the default opener.

See the following code Example

# using the install_opener function of urllib.request
import urllib.request as req
from urllib.request import HTTPSHandler, build_opener, install_opener
opener = build_opener(HTTPSHandler())
install_opener(opener)
res = req.urlopen("https://www.alixaprodev.com")
print(res)
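
As a complement to the example above, here is a minimal sketch of an opener that actually changes behavior: it routes HTTP traffic through a proxy and sends a custom User-Agent header. The proxy address and the User-Agent string are hypothetical values chosen for illustration, and no request is sent.

```python
import urllib.request

# Hypothetical local proxy address, used only for illustration.
proxy_handler = urllib.request.ProxyHandler({"http": "http://127.0.0.1:3128"})

# build_opener() adds the default handlers around the ones we pass in.
opener = urllib.request.build_opener(proxy_handler)

# Headers listed here are sent with every request made through this opener.
opener.addheaders = [("User-Agent", "example-client/1.0")]

# From now on, urllib.request.urlopen() uses this opener.
urllib.request.install_opener(opener)

print(type(opener).__name__)  # OpenerDirector
```

If you only need the opener in one place, you can skip install_opener() and call opener.open(url) directly.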

urllib.request.urlopen() method Python

It opens the URL, which can be a string or a Request object. To send additional data to the server, urllib.request.urlopen() takes another argument, “data”, which must be a bytes object. When data is given, the request becomes a POST request.

urllib.request.urlopen(url)

Open the url and return a response.

How to Open a link with the Python urllib.request module?

The simplest way to open a link with Python's standard library is the “urllib.request.urlopen()” function. It takes several parameters, but the only required one is the URL.

See the following code Example

import urllib.request

def open_link(url):
    return urllib.request.urlopen(url)

r = open_link("https://www.alixaprodev.com")
print(r)

Output of the above Python code:

<http.client.HTTPResponse object at 0x0000028E08A0ABE0>

How to download an Image with urllib.request module in Python?

To download an image with Python, use “urllib.request.urlopen()”. In the following code example, we download the image, crop it to the given relative coordinates, resize it, and then save it to disk. The image processing uses the third-party Pillow (PIL) library.

from urllib.request import urlopen
import os
from PIL import Image
from io import BytesIO
def download_image(image_id, url, x1, y1, x2, y2, output_dir):
    output_filename = os.path.join(output_dir, image_id + '.png')
    if os.path.exists(output_filename):
        # Don't download image if it's already there
        return True
    try:
        # Download image
        url_file = urlopen(url)
        if url_file.getcode() != 200:
            return False
        image_buffer = url_file.read()
        # Crop, resize and save image
        image = Image.open(BytesIO(image_buffer)).convert('RGB')
        w = image.size[0]
        h = image.size[1]
        image = image.crop((int(x1 * w), int(y1 * h), int(x2 * w),
                            int(y2 * h)))
        # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
        image = image.resize((299, 299), resample=Image.LANCZOS)
        image.save(output_filename)
    except IOError:
        return False
    return True 
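
If you only need the raw file and no cropping, the standard library is enough. The sketch below streams a response body to disk with shutil.copyfileobj(). The function name save_url_to_file and the demo values are assumptions made for illustration, and the demo uses a data: URL so that it runs without network access. An http(s) image URL is used the same way.

```python
import os
import shutil
import tempfile
from urllib.request import urlopen

def save_url_to_file(url, output_path, timeout=10):
    """Stream the body of `url` into `output_path` and return the path."""
    with urlopen(url, timeout=timeout) as response, open(output_path, "wb") as out:
        # Copy in chunks so large files are never held fully in memory.
        shutil.copyfileobj(response, out)
    return output_path

# Demo with a data: URL (the base64 payload decodes to b"hello").
demo_url = "data:application/octet-stream;base64,aGVsbG8="
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
save_url_to_file(demo_url, demo_path)
print(os.path.getsize(demo_path))  # 5
```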

How to check the status of a URL with Python?

The status code tells us the outcome of the request we made. To check the status of a URL we can use the urlopen() function of the urllib.request module and call getcode() on the response. Note that urlopen() raises urllib.error.HTTPError for 4xx and 5xx responses, so getcode() is only reached for successful requests.

See the following code Example

from urllib.request import urlopen
from urllib.request import Request
def checkStatus(url):
    print("Attempting to crawl URL: {}".format(url))
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    response = urlopen(req)
    return response.getcode(), url
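
Because failed requests raise exceptions instead of returning a response, a status check usually needs the urllib.error module as well. The following is a minimal sketch under that assumption. The function name check_status and the choice to return None for unreachable URLs are illustrative decisions, not part of urllib.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def check_status(url, timeout=5):
    """Return the HTTP status code for `url`, or None if it cannot be reached."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(req, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except URLError:
        # No usable answer at all: DNS failure, refused connection, missing file.
        return None

# A file: URL that does not exist triggers the URLError branch without any network.
print(check_status("file:///no/such/file/for-this-demo"))  # None
```

HTTPError is caught first because it is a subclass of URLError.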

How to read an Online File with Python?

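You can read an online file with the urlopen() function of the “urllib.request” module by making a GET request to the file's URL. The method below reads the file over HTTP when the path starts with 'http://' and otherwise falls back to a local file. It is excerpted from a larger class and relies on names defined elsewhere (re, os, codecs, sublime, CompileKspThread), so it is not runnable on its own.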

See the following code Example

def read_file_function(self, filepath):
        if filepath.startswith('http://'):
            from urllib.request import urlopen
            s = urlopen(filepath, timeout=5).read().decode('utf-8')
            return re.sub('\r+\n*', '\n', s)

        if self.base_path:
            filepath = os.path.join(self.base_path, filepath)
        filepath = os.path.abspath(filepath)
        view = CompileKspThread.find_view_by_filename(filepath, self.base_path)
        if view is None:
            s = codecs.open(filepath, 'r', 'utf-8').read()
            return re.sub('\r+\n*', '\n', s)
        else:
            return view.substr(sublime.Region(0, view.size())) 
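
For a version that runs on its own, here is a minimal sketch of just the download branch. The function name read_text_from_url and its default arguments are assumptions for illustration. It normalizes line endings with the same regular expression as the excerpt above. The demo uses a data: URL so it works offline, and an http(s) URL is read the same way.

```python
import re
from urllib.request import urlopen

def read_text_from_url(url, encoding="utf-8", timeout=5):
    """Fetch `url`, decode the body as text and normalize line endings to \\n."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode(encoding)
    return re.sub('\r+\n*', '\n', text)

# The data: URL below carries the bytes b"line1\r\nline2".
demo_text = read_text_from_url("data:text/plain;charset=utf-8,line1%0D%0Aline2")
print(demo_text.splitlines())  # ['line1', 'line2']
```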

How to Make a Post request with Python

To make a POST request, use the urllib.request module: pass both the URL and the data to urlopen(). The data must be URL-encoded and converted to bytes first.

See the following code Example

import urllib.parse  # needed explicitly for urlencode()
import urllib.request
data = urllib.parse.urlencode({"a_key": "a_value"})
data = data.encode('ascii')

url = "http://httpbin.org/post"
response = urllib.request.urlopen(url, data)

print(response.info())
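
When you need control over headers or the HTTP method, you can build a Request object first and pass it to urlopen(). The sketch below prepares a JSON POST request. The JSON payload format is an assumption chosen for illustration (the example above sends form-encoded data), and the request is only built here, not sent.

```python
import json
import urllib.request

payload = json.dumps({"a_key": "a_value"}).encode("utf-8")

req = urllib.request.Request(
    "http://httpbin.org/post",
    data=payload,                                  # body must be bytes
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(req.get_method())                 # POST
# Request stores header names capitalized, hence "Content-type" here.
print(req.get_header("Content-type"))   # application/json

# To actually send it: response = urllib.request.urlopen(req)
```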

Converting a file path to URL with Python

To convert a file path to a URL path we use the urllib.request.pathname2url(path) function. It returns the path as a quoted string suitable for the path component of a URL. We can go the other way with the url2pathname(url) function of the urllib.request module.
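
A short round trip shows both functions at work. The path used here is a hypothetical relative path, chosen because the result for absolute paths depends on the platform and Python version.

```python
import os
from urllib.request import pathname2url, url2pathname

# Hypothetical relative path containing a space.
local_path = os.path.join("reports", "my file.txt")

url_path = pathname2url(local_path)
print(url_path)  # reports/my%20file.txt

# url2pathname() reverses the conversion for the current platform.
print(url2pathname(url_path) == local_path)  # True
```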

urllib.response module in Python urllib

The urllib.response module defines functions and classes which define a minimal file-like interface, including read() and readline(). Functions defined by this module are used internally by the urllib.request module.

Reading data from the URL

The read() method of the response object is used to read the data out of a URL response. The response object is returned by urllib.request.urlopen(). The following code shows what we can get from the response object: its body, status, and headers.

See the following code Example

from urllib import request
with request.urlopen('https://www.alixaprodev.com') as response:
    print(response.read())
    print(response.status)
    print(response.headers)

The Python urllib.parse module

This module defines a standard interface to break Uniform Resource Locator (URL) strings up into components (addressing scheme, network location, path, etc.), to combine the components back into a URL string, and to convert a “relative URL” to an absolute URL given a “base URL.”

See the following code Example

from urllib.parse import urlparse
urlparse("scheme://netloc/path;parameters?query#fragment")

o = urlparse("http://docs.python.org:80/")
print(o)
print(o.scheme)
print(o.netloc)
print(o.hostname)
print(o.port)
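
The other two jobs mentioned above, combining components back into a URL and resolving a relative URL against a base URL, can be sketched as follows. The URLs are example values chosen for illustration.

```python
from urllib.parse import parse_qs, urljoin, urlparse, urlunparse

url = "http://docs.python.org:80/3/library/urllib.parse.html?highlight=params#url-parsing"
parts = urlparse(url)

# Turn the query string into a dict of lists.
print(parse_qs(parts.query))  # {'highlight': ['params']}

# Combine the components back into a URL string.
rebuilt = urlunparse(parts)
print(rebuilt == url)  # True

# The result is a named tuple, so single components can be swapped out.
secure = parts._replace(scheme="https", netloc="docs.python.org").geturl()
print(secure)

# Resolve a relative URL against a base URL.
absolute = urljoin("http://docs.python.org/3/library/", "../tutorial/index.html")
print(absolute)  # http://docs.python.org/3/tutorial/index.html
```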

Differences between the urllib, urllib2, urllib3 and requests module

urllib module was the original Python HTTP client, added to the standard library in Python 1.2. Earlier documentation for urllib can be found in Python 1.4.

urllib2 module was a more capable HTTP client, added in Python 1.6, intended as a replacement for urllib.

urllib3 module is a third-party package (i.e., not in CPython’s standard library). Despite the name, it is unrelated to the standard library packages, and there is no intention to include it in the standard library in the future.

The Python 3 standard library has a new urllib which is a merged/refactored/rewritten version of the older modules.
