How to Use Python Glob Module

Python Glob Module

The Python glob module matches file names against patterns. It can search a directory for every file whose name fits a pattern written with wildcard characters such as * and ?.

The glob module makes pattern-based file searches easy. You could get the same result with regular expressions, but glob patterns are shorter and simpler to write, and the module provides ready-made functions to work with.

In this Python tutorial, you will learn how to use the glob module in Python.


What is glob in Programming?

In Python, glob is a module for searching for files in a directory with a pattern. The name is short for global, and in programming generally, globbing means finding file names that match a wildcard pattern.

Glob patterns are not specific to Python. Most programming languages support them, and so do UNIX shells and other terminals, even though the name glob is rarely mentioned there. The picture below uses a glob pattern with the dir command in PowerShell to list only the Python files in a directory.

using dir command to display only py files
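
To make the wildcard idea concrete, here is a minimal sketch that uses the standard-library fnmatch module, which the glob module relies on for matching names. The file names are made up for illustration, and nothing is read from disk.

```python
from fnmatch import fnmatch

# Hypothetical file names, used only for illustration.
names = ["main.py", "test1.py", "test2.py", "notes.txt", "index.html"]

# "*" matches any run of characters, "?" matches exactly one character,
# and "[12]" matches one character from the set inside the brackets.
py_files = [n for n in names if fnmatch(n, "*.py")]
numbered_tests = [n for n in names if fnmatch(n, "test?.py")]
first_two = [n for n in names if fnmatch(n, "test[12].py")]

print(py_files)
print(numbered_tests)
print(first_two)
```

The same three wildcards are what glob.glob() understands when it searches real directories.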

What does Python Glob Module do?

The glob module can search for files whose names match a specific pattern. It is useful when you are reading from several files that have similar names. Once the files are found, you can concatenate them, for example into a single data frame, for further analysis.
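
The idea of combining similarly named files can be sketched as follows. This example is an assumption-laden illustration: the sales_*.csv file names and their columns are invented, the files are created in a temporary directory, and the standard-library csv module stands in for a data frame library such as pandas.

```python
import csv
import glob
import os
import tempfile

# Build a throwaway directory with two similarly named CSV files.
# The file names and columns are made up for this example.
workdir = tempfile.mkdtemp()
for month, amount in [("jan", 100), ("feb", 250)]:
    with open(os.path.join(workdir, f"sales_{month}.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["month", "amount"])
        writer.writerow([month, amount])

# Collect every file that matches the pattern; sort for a stable order.
matches = sorted(glob.glob(os.path.join(workdir, "sales_*.csv")))

# Read all matching files into one combined list of rows.
rows = []
for path in matches:
    with open(path, newline="") as f:
        rows.extend(csv.DictReader(f))

print(len(matches), "files,", len(rows), "rows")
```

With pandas, the loop would typically read each path with read_csv and combine the results with concat instead.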

Install glob module in Python

glob is part of the Python standard library, so it ships with every Python interpreter and you do not need to install it. A separate third-party package named glob2 is available on PyPI, and you can install it with the pip install glob2 command in your terminal if you specifically want that package.

There is no glob3 module in Python. The standard-library module is glob, and glob2 is a different, third-party package.

install python glob module on windows

On the Windows operating system, the built-in glob module needs no installation. To install the optional third-party glob2 package for Python 3, use the pip install glob2 command.

pip install glob2

This installs glob2, which is imported with import glob2. It does not replace the built-in module: import glob always loads the standard-library glob, whether or not glob2 is installed.

import glob
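
A quick way to confirm that the built-in module is available without any installation is to import it and check for the functions used in this tutorial. This is only a sanity check, not an installation step.

```python
import glob

# The three functions covered in this tutorial all live in the
# standard-library glob module.
available = {name: hasattr(glob, name) for name in ("glob", "iglob", "escape")}
print(available)
```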

install python glob module in MacOS

To install the third-party glob2 package on macOS, use the pip3 install glob2 command in your terminal. The built-in glob module is already available and works the same way regardless of the operating system you are using.

Make sure you have pip installed before running the following command.

pip3 install glob2

install python glob module in Linux

Linux works the same way as Windows and macOS: the built-in glob module is already included with Python. The third-party glob2 package is distributed on PyPI, so install it with the pip3 install glob2 command in your Linux terminal.

pip3 install glob2

Search for a Specific File with Python Glob Module

If we want to find all HTML files in a directory, we can use regular expressions (the re module) or we can use glob. The regular expression version takes more code, while glob does it in a single line.

The following examples find all HTML files in the current directory.

Code Using Only Regex To find HTML Files



import re
import os

# list every entry in the current directory
currentdir = os.getcwd()
files = os.listdir(currentdir)

# any name that ends with .html (the dot is escaped so it matches literally)
pattern = r".*\.html$"
prog = re.compile(pattern)

htmlfiles = []
for file in files:
    if prog.match(file):
        htmlfiles.append(file)

for htmlfile in htmlfiles:
    print(htmlfile)


Code Using Glob Module to find Html Files


import glob
# match every .html file in the current directory
curfiles = glob.glob('*.html')
for file in curfiles:
    print(file)

Both snippets list the same HTML files, except that glob skips hidden files whose names start with a dot.

using python glob module

How to recursively search all directories with Glob

You can search all the subdirectories with a single call to the glob() function. Pass a second parameter, recursive, which takes a Boolean value, either True or False. When recursive is True, the ** pattern matches any number of nested directories. The example below searches the current directory and all of its subdirectories for HTML files.



import glob

# path to search file
path = './'
# pass recursive=True to the glob() function
for file in glob.glob(path+"**/*.html", recursive=True):
    print(file)
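
To see what the recursive flag changes, the following sketch runs the same pattern with and without it. The directory layout is invented and created in a temporary folder so the example does not depend on your own files.

```python
import glob
import os
import tempfile

# Create a small, hypothetical directory tree with three HTML files.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "docs", "api"))
for rel in ["index.html", "docs/guide.html", "docs/api/ref.html"]:
    open(os.path.join(root, *rel.split("/")), "w").close()

pattern = os.path.join(root, "**", "*.html")

# Without recursive=True, "**" behaves like a plain "*" and matches
# exactly one directory level.
flat = glob.glob(pattern)

# With recursive=True, "**" matches zero or more nested directories.
deep = glob.glob(pattern, recursive=True)

print(len(flat), "match without recursive")
print(len(deep), "matches with recursive=True")
```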




glob.escape() method in Python Glob Module 

This function escapes the characters that have a special meaning in glob patterns, which are *, ? and [, so that they are matched literally. It is useful when a file name you want to match contains one of those characters. Characters such as _, # and $ are not special to glob, so escape() returns them unchanged. glob() and escape() can be used together, as in the example below.



import glob

files = glob.glob("D:\\**\\*.jpg",recursive=True)
# All jpg files
print(files)

# JPEG files with the characters _, $ or # in their name
# (escape() returns these unchanged because they are not glob wildcards)
char_seq = "_$#"
for char in char_seq:
    results = "*" + glob.escape(char) + "*" + ".jpg"
    for file in (glob.glob(results)):
        print(file)
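
The effect of glob.escape() is easier to see with a file name that contains a real glob metacharacter. The sketch below uses two invented file names in a temporary directory; the bracketed name is an assumption chosen to trigger the difference.

```python
import glob
import os
import tempfile

# Two hypothetical files: one with brackets in its name, one without.
workdir = tempfile.mkdtemp()
for name in ["report[1].jpg", "report1.jpg"]:
    open(os.path.join(workdir, name), "w").close()

literal_name = "report[1].jpg"

# Unescaped, "[1]" is a character class that matches the single character "1".
unescaped = glob.glob(os.path.join(workdir, literal_name))

# glob.escape() wraps "*", "?" and "[" in brackets so they match literally.
escaped = glob.glob(os.path.join(workdir, glob.escape(literal_name)))

print([os.path.basename(p) for p in unescaped])
print([os.path.basename(p) for p in escaped])
```

Only the escaped pattern finds the file whose name really contains the brackets.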





Print all file names in a Drive with Python

To print the names of all the files on a drive, we can use the Python glob module with wildcards. The code below prints every file and folder on the D: drive.


import glob

print('Inside drive D:')
files = glob.glob("D:\\**",recursive=True)
for item in files:
    print(item)
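
A recursive ** pattern returns folders as well as files. If only files are wanted, one option is to filter the results with os.path.isfile(), as in this sketch. It uses a small temporary directory with invented names rather than a real drive.

```python
import glob
import os
import tempfile

# Hypothetical layout: one file at the top level and one inside a folder.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "photos"))
open(os.path.join(root, "a.txt"), "w").close()
open(os.path.join(root, "photos", "b.jpg"), "w").close()

# "**" with recursive=True yields directories as well as files.
everything = glob.glob(os.path.join(root, "**"), recursive=True)

# Keep only the entries that are regular files.
files_only = sorted(p for p in everything if os.path.isfile(p))

for item in files_only:
    print(os.path.relpath(item, root))
```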
    


Find a String in a list of Text files in Python

Searching many text files for a string by hand is tedious. The glob module makes it easy to collect the files, and then each one can be checked for the string.

Follow these steps to find a string inside text files.

Step No 1: Get the List of all text files in a directory.

Step No 2: Open and read each file using the read() function in Python.

Step No 3: Check whether the file contains the required string using the in operator.

Python code to search for a string inside all text files in the current directory and its subdirectories.

import glob
# path of the current directory
path = './'
text_files = glob.glob(path + '**/*.txt', recursive=True)
print(text_files)

# the string to look for
required_str = 'Python'

for text_file in text_files:
    try:
        with open(text_file) as f:
            # read the file as a string
            text_data = f.read()
    except (OSError, UnicodeDecodeError):
        # skip files that cannot be opened or decoded as text
        continue
    # print the file name if the string is found
    if required_str in text_data:
        print(text_file)
     

Difference between glob.iglob() and glob.glob() functions in Python

Both functions perform the same task, except that iglob() returns an iterator that yields the same values as glob() without storing them all in memory at once.

Check the difference between the iglob() and glob() functions in Python


import glob
# path of the current directory
path = './Rough'
htmlfilesusing_glob = glob.glob(path + '/*.html')
print(htmlfilesusing_glob)


htmlfilesusing_iglob = glob.iglob(path + '/*.html')
print(htmlfilesusing_iglob)

# output
# ['./Rough\\html.html', './Rough\\index.html']
# <generator object _iglob at 0x0000018A0F54A030>
    

Look at the output of the code. The glob() function returns a list, while the iglob() function returns a generator.

['./Rough\\html.html', './Rough\\index.html']
<generator object _iglob at 0x0000018A0F54A030>
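
Because iglob() produces its results lazily, you can take just the first few matches without building the whole list. The sketch below shows this with itertools.islice on five invented HTML files in a temporary directory.

```python
import glob
import itertools
import os
import tempfile

# Five hypothetical HTML files, created only for this example.
workdir = tempfile.mkdtemp()
for i in range(5):
    open(os.path.join(workdir, f"page{i}.html"), "w").close()

pattern = os.path.join(workdir, "*.html")

# glob() builds the complete list up front.
as_list = glob.glob(pattern)

# iglob() yields one match at a time, so we can stop early.
as_iter = glob.iglob(pattern)
first_two = list(itertools.islice(as_iter, 2))

# The iterator remembers its position; the rest can be consumed later.
remaining = list(as_iter)

print(len(as_list), len(first_two), len(remaining))
```

An iterator can be consumed only once, so call iglob() again if you need to loop over the matches a second time.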

Glob VS Regular Expression

Under the hood, glob translates its patterns into regular expressions to make searching easy for us. The trade-off is control: glob supports only a few wildcards, while with a regular expression we can specify exactly how a file name must look. For basic searches, glob is the easier choice.
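
The relationship between the two can be sketched with fnmatch.translate(), which converts a glob-style pattern into a regular expression string. The exact text it returns differs between Python versions, so the example only uses the result rather than showing it. The file names are invented for illustration.

```python
import fnmatch
import re

# Convert a glob-style pattern into a regular expression and compile it.
regex = fnmatch.translate("*.html")
compiled = re.compile(regex)

names = ["index.html", "style.css", "about.html"]
via_regex = [n for n in names if compiled.match(n)]
print(via_regex)

# A hand-written regex can express rules that glob wildcards cannot,
# such as "page" followed by exactly three digits.
strict = re.compile(r"page\d{3}\.html")
candidates = ["page001.html", "page1.html", "pageabc.html"]
exact = [n for n in candidates if strict.fullmatch(n)]
print(exact)
```

A common compromise is to collect candidates with glob first and then narrow them down with a regular expression.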

Difference between glob2 and glob in Python

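They are not the same thing. glob is the module in the Python standard library and needs no installation. glob2 is a separate third-party package from PyPI that is imported as glob2. It offers recursive ** matching and can also return the text that each wildcard matched.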

Summary and Conclusion

We have learned how to traverse a whole drive and match certain files in a directory. There is a lot more we can do with the Python glob module. If you are interested in other Python tutorials, please visit my YouTube channel, Code with Ali.
