The Python glob module is used for filename matching. It finds files whose names follow a specific pattern and, most importantly, it can search directories for such files using wildcard characters.
The glob module makes pattern-based file searches easy where they would otherwise take real effort. You could use regular expressions instead, but glob provides ready-made functions that do the same job in far less code.
In this Python tutorial, you will learn how to use the glob module.
What is glob in Programming?
glob is a Python module for searching for files with a pattern, but the word itself is older: it is short for "global". In programming, globbing means finding filenames that match a wildcard pattern.
Glob patterns are not specific to Python. Many programming languages support them, and so do Unix shells and PowerShell, even where the word "glob" never appears. In the following picture, a glob pattern is used in PowerShell to list the Python files in a directory.
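The wildcards follow Unix shell rules: * matches any run of characters, ? matches exactly one character, [seq] matches one character from seq, and [!seq] matches one character not in seq. Python's glob uses the fnmatch module to match names (the difference is that glob skips names starting with a dot unless the pattern starts with one too), so fnmatch.fnmatchcase() is a handy way to try a pattern without touching the disk. The file names below are made up for illustration.

```python
import fnmatch

# Hypothetical file names used only to exercise the wildcards
names = ["data1.csv", "data2.csv", "data10.csv", "notes.txt", "Data3.csv"]

patterns = {
    "*.csv": "any name ending in .csv",
    "data?.csv": "'data', exactly one character, then .csv",
    "data[12].csv": "'data1.csv' or 'data2.csv'",
    "data[!1].csv": "'data', one character other than '1', then .csv",
}

results = {}
for pattern, meaning in patterns.items():
    # fnmatchcase() is always case-sensitive, so "Data3.csv" never matches "data..."
    results[pattern] = [n for n in names if fnmatch.fnmatchcase(n, pattern)]
    print(f"{pattern:<14} {meaning:<48} -> {results[pattern]}")
```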
What does the Python glob module do?
The glob module can search for files that have a specific name or naming pattern. It is especially useful when you need to read several files with similar names: once glob has found them, you can read them and concatenate them into a single data frame for further analysis.
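Here is a minimal sketch of that workflow using only the standard library. It creates three hypothetical monthly files (sales_01.csv, sales_02.csv, sales_03.csv) in a temporary directory, collects them with one glob pattern, and concatenates their rows. With pandas installed, the last step would usually be pd.concat([pd.read_csv(f) for f in files], ignore_index=True); the csv module is used here only to keep the example dependency-free.

```python
import csv
import glob
import os
import tempfile

all_rows = []
with tempfile.TemporaryDirectory() as folder:
    # Create hypothetical input files with similar names
    for month in (1, 2, 3):
        with open(os.path.join(folder, f"sales_{month:02d}.csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["month", "amount"])
            writer.writerow([month, month * 100])

    # One pattern finds every file in the series; sorting gives a stable order
    pattern = os.path.join(glob.escape(folder), "sales_*.csv")
    files = sorted(glob.glob(pattern))

    # Read each file and append its rows to one combined table
    for path in files:
        with open(path, newline="") as f:
            all_rows.extend(csv.DictReader(f))

print(len(files), "files combined")
for row in all_rows:
    print(row)
```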
Install glob module in Python
glob is a built-in module that is part of Python's standard library, so you do not need to install it: import glob works in every standard Python installation.
The command pip install glob2 installs something different: glob2, a separate third-party package with a glob-like interface. It is not a newer version of glob (and there is no glob3 module either), and it is imported under its own name, glob2. If you want to try it, install it as shown below.
Install the glob2 package on Windows
On Windows, run the pip install glob2 command to install glob2 for Python 3.
pip install glob2
Now glob2 is installed. Import it with import glob2; the statement import glob still loads the standard-library module, not glob2.
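If you are not sure what is available on your machine, a quick check like the one below can help. It is a sketch that relies only on the standard library: importing glob succeeds on any standard installation, while importlib.util.find_spec() reports whether the separate glob2 package is installed without importing it.

```python
import glob
import importlib.util

# glob ships with Python; __file__ points into the standard library directory
print("stdlib glob:", glob.__file__)

# glob2 is a separate third-party package; find_spec() returns None when it is missing
has_glob2 = importlib.util.find_spec("glob2") is not None
print("glob2 installed:", has_glob2)
```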
Install the glob2 package on macOS
To install glob2 on macOS, run pip3 install glob2 in your terminal. The standard glob module itself needs no installation and works on every operating system.
Make sure pip is installed before running the following command.
pip3 install glob2
Install the glob2 package on Linux
Installing glob2 on Linux works the same way as on Windows and macOS. Packages from PyPI are installed with pip, not apt-get, so run the following command in your terminal (inside a virtual environment if your distribution blocks system-wide pip installs).
pip3 install glob2
Search for a Specific File with Python Glob Module
To find all HTML files in a directory, we can use a regular expression (the re module) or glob. The regular-expression version needs several lines of setup, while glob does the job in a single line of code.
The following examples find all HTML files in the current directory, first with a regular expression and then with glob.
Code Using Only Regex To find HTML Files
import re
import os

# list every entry in the current working directory
currentdir = os.getcwd()
files = os.listdir(currentdir)

# ".*" matches any characters and "\." matches a literal dot
pattern = r"^.*\.html$"
prog = re.compile(pattern)

htmlfiles = []
for file in files:
    if prog.match(file):
        htmlfiles.append(file)

for htmlfile in htmlfiles:
    print(htmlfile)
Code Using Glob Module to find Html Files
import glob

# path of the current directory
path = './'
# '*.html' matches the HTML files directly inside this directory
curfiles = glob.glob(path + '*.html')
for file in curfiles:
    print(file)
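To see that the two approaches agree, the sketch below runs both on a throwaway directory with hypothetical files. It also shows one real difference: glob's * does not match names that start with a dot, so a hidden file such as .draft.html is skipped by glob but caught by the regular expression.

```python
import glob
import os
import re
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical files: two visible HTML pages, one hidden page, one text file
    for name in ["index.html", "about.html", ".draft.html", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()

    # Regex approach: list the directory and keep names ending in ".html"
    prog = re.compile(r"^.*\.html$")
    regex_matches = sorted(n for n in os.listdir(folder) if prog.match(n))

    # Glob approach: one call; the folder path is escaped in case it contains [ * ?
    glob_matches = sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(glob.escape(folder), "*.html"))
    )

print("regex:", regex_matches)
print("glob: ", glob_matches)
```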
How to recursively search all directories with Glob
You can search all subdirectories with a single call to glob(). Pass the keyword argument recursive, which takes a Boolean value (True or False; the default is False), and use ** in the pattern to stand for any number of subdirectories. The example below searches the current directory and all its subdirectories for HTML files.
import glob

# path to search
path = './'
# recursive=True lets ** match any number of nested directories
for file in glob.glob(path + "**/*.html", recursive=True):
    print(file)
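The recursive flag changes what ** means, and a small experiment makes this visible. The sketch builds a hypothetical tree in a temporary directory and runs the same pattern twice: without recursive=True, ** behaves like a plain * and matches exactly one directory level; with it, ** matches zero or more levels.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree: index.html, docs/intro.html, docs/guide/page.html
    os.makedirs(os.path.join(root, "docs", "guide"))
    for rel in ["index.html",
                os.path.join("docs", "intro.html"),
                os.path.join("docs", "guide", "page.html")]:
        open(os.path.join(root, rel), "w").close()

    pattern = os.path.join(glob.escape(root), "**", "*.html")

    # Without recursive=True, ** acts like * and matches exactly one directory level
    flat = sorted(os.path.relpath(p, root) for p in glob.glob(pattern))
    # With recursive=True, ** matches zero or more directory levels
    deep = sorted(os.path.relpath(p, root) for p in glob.glob(pattern, recursive=True))

print("recursive=False:", flat)
print("recursive=True: ", deep)
```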
glob.escape() method in Python Glob Module
This method escapes the characters that glob treats as special, namely *, ? and [, so that they are matched literally. Use it together with glob() when a filename you are searching for contains one of these characters. Characters such as _, # and $ have no special meaning in glob patterns, so they match themselves with or without escape(). The best way to explain this method is with an example.
import glob

# all .jpg files anywhere on drive D: (Windows path)
files = glob.glob("D:\\**\\*.jpg", recursive=True)
print(files)

# .jpg files in the current directory whose names contain one of these characters.
# Only "[" is special in glob; escape() makes it match literally, the others are unchanged.
char_seq = "[_$#"
for char in char_seq:
    pattern = "*" + glob.escape(char) + "*.jpg"
    for file in glob.glob(pattern):
        print(file)
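Why escaping matters is easiest to see with a name that contains brackets. In this sketch (the file name report[1].txt is hypothetical), the unescaped pattern treats [1] as a character class matching the single character 1, so it looks for report1.txt and finds nothing; the escaped pattern matches the real file.

```python
import glob
import os
import tempfile

# escape() wraps each special character in brackets so it matches literally
print(glob.escape("report[1].txt"))  # report[[]1].txt
print(glob.escape("what?*.txt"))     # what[?][*].txt

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file whose name contains a literal "["
    name = "report[1].txt"
    open(os.path.join(folder, name), "w").close()
    base = glob.escape(folder)

    # Unescaped: "[1]" is a character class, so the pattern means "report1.txt"
    raw = glob.glob(os.path.join(base, name))
    # Escaped: the brackets are matched literally
    escaped = glob.glob(os.path.join(base, glob.escape(name)))

print("unescaped:", raw)
print("escaped:  ", [os.path.basename(p) for p in escaped])
```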
Print all file names on a drive with Python
To print the names of all the files on a drive, use the glob module with the ** wildcard and recursive=True. The code below prints the names of all files and folders on drive D: (names starting with a dot are skipped by default).
import glob

print('Inside drive D:')
# ** with recursive=True matches every file and folder on the drive
files = glob.glob("D:\\**", recursive=True)
for item in files:
    print(item)
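Because ** matches directories as well as files, the list printed above mixes the two. The sketch below, assuming you only want regular files, filters the results with os.path.isfile(). It uses iglob() so paths are checked one at a time instead of being collected into a list first, and a temporary directory with hypothetical files stands in for the drive.

```python
import glob
import os
import tempfile

def list_files(root):
    """Yield every regular file under root (glob skips names starting with a dot)."""
    pattern = os.path.join(glob.escape(root), "**")
    for path in glob.iglob(pattern, recursive=True):
        if os.path.isfile(path):
            yield path

with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree with one sub-directory and two files
    os.makedirs(os.path.join(root, "photos"))
    for rel in ["readme.txt", os.path.join("photos", "cat.jpg")]:
        open(os.path.join(root, rel), "w").close()

    found = sorted(os.path.relpath(p, root) for p in list_files(root))

for item in found:
    print(item)
```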
Find a String in a list of Text files in Python
Writing a regular expression by hand to search a list of text files is unnecessary. A simpler approach is to let the glob module collect the text files and then check each file's contents with Python's in operator.
Follow these steps to find a string inside text files:
Step No 1: Get the list of all text files in a directory.
Step No 2: Open and read each file with the read() method.
Step No 3: Check whether the file contains the required string using the in operator.
Python code to search for a string in all text files in the current directory and its subdirectories:
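import glob

# path of the current directory
path = './'
required_str = 'Python'

text_files = glob.glob(path + '**/*.txt', recursive=True)
print(text_files)

for text_file in text_files:
    try:
        # assumes the text files are UTF-8 encoded
        with open(text_file, encoding='utf-8') as f:
            # read the whole file as a single string
            text_data = f.read()
    except (OSError, UnicodeDecodeError):
        # skip files that cannot be opened or decoded
        continue
    # print the file name if it contains the string
    if required_str in text_data:
        print(text_file)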
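The same three steps can be wrapped in a reusable function that also reports line numbers. This is a sketch under a few assumptions: the function name find_string is hypothetical, the files are taken to be UTF-8 text, and unreadable files are skipped. The demo creates its own hypothetical files in a temporary directory.

```python
import glob
import os
import tempfile

def find_string(root, required_str, encoding="utf-8"):
    """Return (path, line_number) pairs for every line under root containing required_str."""
    hits = []
    pattern = os.path.join(glob.escape(root), "**", "*.txt")
    for path in sorted(glob.glob(pattern, recursive=True)):
        try:
            with open(path, encoding=encoding) as f:
                lines = f.read().splitlines()
        except (OSError, UnicodeDecodeError):
            continue  # skip files that cannot be opened or decoded
        # count from 1 so the numbers match what a text editor shows
        for number, line in enumerate(lines, start=1):
            if required_str in line:
                hits.append((path, number))
    return hits

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "notes"))
    contents = {
        "a.txt": "hello\nI like Python\n",
        os.path.join("notes", "b.txt"): "Python first\nthen more Python\n",
        "c.txt": "nothing here\n",
    }
    for rel, text in contents.items():
        with open(os.path.join(root, rel), "w", encoding="utf-8") as f:
            f.write(text)

    results = [(os.path.relpath(p, root), n) for p, n in find_string(root, "Python")]

for path, number in results:
    print(f"{path}:{number}")
```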
Use glob() function to find files recursively
To find files recursively, set the recursive parameter to True. The pattern ** then matches any files and zero or more directories and subdirectories. If the pattern is followed by os.sep (for example **/), only directories and subdirectories match.
import glob

# '/path' is a placeholder for the directory you want to search
for f in glob.glob('/path/**/*.c', recursive=True):
    print(f)
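Both rules can be checked on a hypothetical tree in a temporary directory. The first pattern shows that ** also matches zero directories, so a .c file at the top level is found; the second pattern ends with a separator (os.path.join(..., "**", "") produces **/), so only directories are returned, starting with the root itself.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree: main.c and src/lib/util.c
    os.makedirs(os.path.join(root, "src", "lib"))
    for rel in ["main.c", os.path.join("src", "lib", "util.c")]:
        open(os.path.join(root, rel), "w").close()

    base = glob.escape(root)

    # "**/*.c": ** matches zero or more directories, so top-level main.c is found too
    c_files = sorted(
        os.path.relpath(p, root)
        for p in glob.glob(os.path.join(base, "**", "*.c"), recursive=True)
    )

    # "**/" (trailing separator): only directories match
    dirs = sorted(
        os.path.relpath(p, root)
        for p in glob.glob(os.path.join(base, "**", ""), recursive=True)
    )

print("C files:", c_files)
print("directories:", dirs)
```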
Difference between glob.iglob() and glob.glob() functions in Python
Both functions perform the same task, except that iglob() returns an iterator that yields the same values as glob() without actually storing them all simultaneously.
Check the difference between the iglob() and glob() functions in Python:
import glob

# path of the folder to search
path = './Rough'

htmlfilesusing_glob = glob.glob(path + '/*.html')
print(htmlfilesusing_glob)

htmlfilesusing_iglob = glob.iglob(path + '/*.html')
print(htmlfilesusing_iglob)

# output
# ['./Rough\\html.html', './Rough\\index.html']
# <generator object _iglob at 0x0000018A0F54A030>
Look at the output of the code: glob() returns a list, while iglob() returns a generator. To get the paths out of the generator, loop over it or pass it to list().
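Where the iterator pays off is when you only need some of the matches. The sketch below (with 1,000 hypothetical log files in a temporary directory) uses next() to take the first match, or None if there is none, and itertools.islice() to take five; the remaining paths are never yielded. Directory order is arbitrary, so which file comes first is not guaranteed.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical folder with many log files
    for i in range(1000):
        open(os.path.join(folder, f"log_{i:04d}.txt"), "w").close()

    pattern = os.path.join(glob.escape(folder), "log_*.txt")

    # glob() builds the complete list before returning
    all_logs = glob.glob(pattern)

    # iglob() returns an iterator; next() pulls one path (or None if nothing matches)
    first = next(glob.iglob(pattern), None)

    # islice() stops after five paths; the rest are never yielded
    some = list(itertools.islice(glob.iglob(pattern), 5))

print(len(all_logs), "paths in the list")
print("first match:", os.path.basename(first))
print(len(some), "paths taken from the iterator")
```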
Glob Module VS Regular Expressions in Python
Under the hood, glob does rely on regular expressions: its patterns are translated into regexes by the fnmatch module. The trade-off is control. Glob patterns only offer *, ? and [...], while a regular expression lets you specify exactly what a filename must look like. For basic searches, though, glob is much easier.
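The two tools also combine well. This sketch first prints the regular expression that fnmatch.translate() produces for a glob pattern (the exact text varies between Python versions), then uses glob to collect candidates and a regex to keep only names in a format that a single glob pattern cannot express: "report_", four digits, an optional "-" plus two digits, then ".csv". The file names are hypothetical.

```python
import fnmatch
import glob
import os
import re
import tempfile

# The regular expression that a glob-style pattern is translated into
print(fnmatch.translate("*.html"))

# "report_" + four digits + optional "-" and two digits + ".csv"
report_re = re.compile(r"report_\d{4}(-\d{2})?\.csv")

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file names, two valid and two invalid
    for name in ["report_2023.csv", "report_2023-07.csv", "report_23.csv", "report_2023-7.csv"]:
        open(os.path.join(folder, name), "w").close()

    # Step 1: glob narrows the candidates with a simple wildcard
    candidates = glob.glob(os.path.join(glob.escape(folder), "report_*.csv"))
    # Step 2: the regex keeps only names that fit the exact format
    matches = sorted(
        os.path.basename(p) for p in candidates
        if report_re.fullmatch(os.path.basename(p))
    )

print(matches)
```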
Difference between glob2 and glob in Python
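They are not the same thing. glob is the module that ships with Python's standard library, so you can use it without installing anything. glob2 is a separate third-party package that you install with pip install glob2 and import as glob2. It extends the standard module with features such as a recursive ** syntax (which the standard glob module has also supported since Python 3.5 through recursive=True) and the ability to return the text matched by the wildcards.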
Summary and Conclusion
We have learned how to traverse a whole drive and match files by name with the glob module, and there is much more you can do with it. If you are interested in other Python tutorials, please visit my YouTube channel, Code with Ali.
I am a software engineer with 4+ years of experience building full-stack applications.