Removing columns from a pandas dataframe is one of the most common tasks in a data science project. The pandas library is widely used for data analysis in Python.
In this Python tutorial, I will introduce you to the methods that can help you remove a column from a dataframe.
There is usually more than one way to achieve a result in Python, so we will discuss 5 different ways to remove a column from a pandas dataframe. Use the one that suits your case best.
Possible ways of deleting a column from a dataframe in Python are:
- using drop() function to remove a column from dataframe
- using del keyword to delete a column from pandas dataframe
- using df.pop() function to remove a column from pandas dataframe
- creating a new dataframe from the existing one with required columns
- using df.iloc[] indexer to remove a column from dataframe
In this tutorial, we will discuss each method along with an example to remove or delete one or more columns from pandas dataframe in python.
Method No 1: Delete a column from dataframe using drop() function
In pandas, the most commonly recommended way to delete a column from a dataframe is the df.drop() function. Pass the name of the column that you want to delete; df.drop() removes that column and returns a new dataframe, leaving the original unchanged. Because drop() removes rows by default, you also have to say that you mean a column, either with axis=1 or with the columns keyword, as in df.drop(columns='Name'). Pass axis by keyword: the positional form df.drop('Name', 1) was deprecated in pandas 1.3 and no longer works in pandas 2.0 and later.
Example of using df.drop() function to remove a column from dataframe
Let’s say we have a CSV file, with the following data. We will convert that CSV file to pandas dataframe. Once we have the dataframe we will then remove the column from that dataframe.
import pandas as pd
df = pd.read_csv('Sample.csv')
print('_______original__________')
print(df)
# remove the 'Name' column; drop() returns a new dataframe
new_df = df.drop('Name', axis=1)
print('________new with Name column removed')
print(new_df)
Output of the code:
_______original__________
Name "Height(inches)" "Weight(lbs)" "Age"
0 Adam Donachie 74 180 22.99
1 Paul Bako 74 215 34.69
2 Ramon Hernandez 72 210 30.78
3 Kevin Millar 72 210 35.43
4 Chris Gomez 73 188 35.71
5 Brian Roberts 69 176 29.39
6 Miguel Tejada 69 209 30.77
7 Melvin Mora 71 200 35.07
8 Aubrey Huff 76 231 30.19
________new with Name column removed
"Height(inches)" "Weight(lbs)" "Age"
0 74 180 22.99
1 74 215 34.69
2 72 210 30.78
3 72 210 35.43
4 73 188 35.71
5 69 176 29.39
6 69 209 30.77
7 71 200 35.07
8 76 231 30.19
You can pass inplace=True to modify the original dataframe instead of creating a new one.
See the following code
import pandas as pd
df = pd.read_csv('Sample.csv')
print('_______original__________')
print(df)
# remove the 'Name' column from df itself
df.drop('Name', axis=1, inplace=True)
print('________new with Name column removed')
print(df)
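To make the difference between the two calls concrete, here is a minimal sketch. It uses a small in-memory dataframe instead of Sample.csv, and the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for Sample.csv (names and values are illustrative)
df = pd.DataFrame({
    "Name": ["Adam", "Paul"],
    "Height": [74, 74],
    "Age": [22.99, 34.69],
})

# Default behaviour: drop() returns a new dataframe and df is unchanged
new_df = df.drop("Name", axis=1)
print(list(df.columns))      # df still has all three columns
print(list(new_df.columns))  # new_df has no 'Name' column

# With inplace=True the original is modified and the call returns None
result = df.drop("Name", axis=1, inplace=True)
print(result)
print(list(df.columns))
```

Because the in-place call returns None, an assignment such as df = df.drop('Name', axis=1, inplace=True) would replace your dataframe with None. Use either the assignment or inplace=True, not both.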
Remove a column by its name from a dataframe
To remove a column by its name, pass the column name to df.drop(). For example, to remove the ‘Name’ column, use df.drop('Name', axis=1).
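As a small sketch of removing by name, the example below shows that the columns keyword gives the same result as axis=1, and what happens when the name does not exist. The dataframe is a made-up stand-in for Sample.csv, and 'Weight' is deliberately a column that is not in it.

```python
import pandas as pd

# Hypothetical stand-in for Sample.csv (names and values are illustrative)
df = pd.DataFrame({
    "Name": ["Adam", "Paul"],
    "Height": [74, 74],
    "Age": [22.99, 34.69],
})

# columns= is equivalent to passing the label together with axis=1
by_axis = df.drop("Name", axis=1)
by_keyword = df.drop(columns="Name")
print(by_axis.equals(by_keyword))

# A label that is not in the dataframe raises KeyError...
try:
    df.drop(columns="Weight")
except KeyError as exc:
    print("KeyError:", exc)

# ...unless errors="ignore" is passed, in which case nothing is removed
unchanged = df.drop(columns="Weight", errors="ignore")
print(list(unchanged.columns))
```

The errors="ignore" option can be handy in scripts that run on files where a column may or may not be present.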
Remove a column from dataframe by column number
You can also remove a column by its position. df.drop() itself works with labels, so look up the labels first with df.columns[...] and pass the result to df.drop(). Remember, column positions start from 0, not 1. Use df.drop(df.columns[[0, 1]], axis=1) to remove the first two columns from the dataframe.
See the following code
import pandas as pd
df = pd.read_csv('Sample.csv')
print('_______original__________')
print(df)
# remove the columns at positions 0 and 1
df.drop(df.columns[[0, 1]], axis=1, inplace=True)
print('________new with first two columns removed')
print(df)
Output of the code
________new with first two columns removed
"Weight(lbs)" "Age"
0 180 22.99
1 215 34.69
2 210 30.78
3 210 35.43
4 188 35.71
5 176 29.39
6 209 30.77
7 200 35.07
8 231 30.19
Delete multiple columns from a dataframe
To delete multiple columns from a dataframe, you can use either column names or column positions. For positions, pass a list of positions to df.columns[] and pass the result to df.drop(); for names, pass a list of names, which must match the labels in df.columns exactly. Either way, all the listed columns are deleted.
See the following code
# delete columns using positions
df.drop(df.columns[[0, 1]], axis=1, inplace=True)
Delete columns using column names
# delete columns using column names
df.drop(['Age', 'Name'], axis=1, inplace=True)
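The two variants above end up doing the same thing, because df.columns[[0, 1]] simply looks up the labels stored at those positions. Here is a minimal sketch of that, again with a made-up dataframe in place of Sample.csv:

```python
import pandas as pd

# Hypothetical stand-in for Sample.csv (names and values are illustrative)
df = pd.DataFrame({
    "Name": ["Adam", "Paul"],
    "Height": [74, 74],
    "Weight": [180, 215],
    "Age": [22.99, 34.69],
})

# df.columns[[0, 1]] is an Index holding the labels at positions 0 and 1
labels = df.columns[[0, 1]]
print(list(labels))

# Dropping by looked-up labels and dropping by explicit names match
by_position = df.drop(columns=labels)
by_name = df.drop(columns=["Name", "Height"])
print(by_position.equals(by_name))
print(list(by_position.columns))
```

One consequence to keep in mind: since the position is translated into a label, a dataframe with duplicate column names would lose every column that carries that label, not only the one at the given position.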
Method No 2: Delete a column from dataframe using ‘del’ keyword
Another way to delete a column from a pandas dataframe is the ‘del’ keyword. It removes a single column from the dataframe in place.
See the following code
import pandas as pd
df = pd.read_csv('Sample.csv')
print('_______original__________')
print(df)
# delete columns using del
del df['Name']
print('________new with Name column removed')
print(df)
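A short sketch of how del behaves, using a made-up dataframe rather than Sample.csv. It shows that the dataframe is changed in place and that deleting a column that is not there raises a KeyError.

```python
import pandas as pd

# Hypothetical stand-in for Sample.csv (names and values are illustrative)
df = pd.DataFrame({
    "Name": ["Adam", "Paul"],
    "Height": [74, 74],
    "Age": [22.99, 34.69],
})

# del uses bracket syntax and modifies df in place; nothing is returned
del df["Name"]
print(list(df.columns))

# Deleting a column that no longer exists raises KeyError
already_gone = False
try:
    del df["Name"]
except KeyError:
    already_gone = True
    print("'Name' is already gone")
```

Note that del takes one column at a time and needs the bracket form; the attribute form del df.Name does not work. For several columns at once, df.drop() is the better fit.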
Method No 3: Delete a column from a dataframe using df.pop() function
The df.pop() function removes a column from the dataframe in place and returns the removed column as a Series. For example, to remove the ‘Name’ column, use df.pop('Name').
See the following code
import pandas as pd
df = pd.read_csv('Sample.csv')
print('_______original__________')
print(df)
# delete a column using the pop() function; the removed column is returned
df.pop('Name')
print('________new with Name column removed')
print(df)
Output of the code
________new with Name column removed
"Height(inches)" "Weight(lbs)" "Age"
0 74 180 22.99
1 74 215 34.69
2 72 210 30.78
3 72 210 35.43
4 73 188 35.71
5 69 176 29.39
6 69 209 30.77
7 71 200 35.07
8 76 231 30.19
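What sets pop() apart from the other methods is its return value. The sketch below, with a made-up dataframe in place of Sample.csv, keeps the removed column in a variable, which is useful when you want to separate one column (for example a target column) from the rest.

```python
import pandas as pd

# Hypothetical stand-in for Sample.csv (names and values are illustrative)
df = pd.DataFrame({
    "Name": ["Adam", "Paul"],
    "Height": [74, 74],
    "Age": [22.99, 34.69],
})

# pop() removes 'Age' from df and hands the column back as a Series
ages = df.pop("Age")

print(type(ages).__name__)
print(ages.tolist())
print(list(df.columns))
```

Like del, pop() works on one column per call and changes the dataframe in place.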
Method No 4: Create new dataframe from existing dataframe with the required columns
This is the classic way of deleting one or more columns from a dataframe: instead of removing columns, you select the columns that you want to keep and create a new dataframe from them.
If your original dataframe df is not too big, you have no memory constraints, and you only need to keep a few columns, then you might as well create a new dataframe with only the columns you need:
See the following code
import pandas as pd
df = pd.read_csv('Sample.csv')
print('_______original__________')
print(df)
# keep only the 'Name' column; every other column is left out
new_df = df[['Name']]
print('________new with only the Name column')
print(new_df)
Output of the code
________new with only the Name column
Name
0 Adam Donachie
1 Paul Bako
2 Ramon Hernandez
3 Kevin Millar
4 Chris Gomez
5 Brian Roberts
6 Miguel Tejada
7 Melvin Mora
8 Aubrey Huff
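When the dataframe has many columns, typing out everything you want to keep gets tedious. A possible variation, sketched below with a made-up dataframe and a hypothetical list called unwanted, is to build the list of columns to keep from the ones you want to leave out.

```python
import pandas as pd

# Hypothetical stand-in for Sample.csv (names and values are illustrative)
df = pd.DataFrame({
    "Name": ["Adam", "Paul"],
    "Height": [74, 74],
    "Weight": [180, 215],
    "Age": [22.99, 34.69],
})

unwanted = ["Height", "Weight"]  # illustrative list of columns to leave out

# Keep every column that is not in unwanted, preserving the original order
keep = [col for col in df.columns if col not in unwanted]
new_df = df[keep].copy()  # copy() makes new_df independent of df

# The same selection written as a boolean mask over the column labels
masked_df = df.loc[:, ~df.columns.isin(unwanted)]

print(list(new_df.columns))
print(new_df.equals(masked_df))
```

The original dataframe df keeps all of its columns in both cases, so this approach suits situations where you still need the full data later.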
Method No 5: Delete a Column from dataframe using df.iloc[]
df.iloc[] selects rows and columns by integer position, so slicing the columns lets you keep only the positions you want. This is handy when the column names are awkward to type or contain unwanted characters:
import pandas as pd
df = pd.read_csv('Sample.csv')
print('_______original__________')
print(df)
# keep the columns from position 2 onward, which leaves out the first two
new_df = df.iloc[:,2:] # Removing two columns
print('________new with first two columns removed')
print(new_df)
Here the first : selects all rows and 2: selects the columns from position 2 onward, so :,2: leaves out the columns at positions 0 and 1.
Output of the code
_______original__________
Name "Height(inches)" "Weight(lbs)" "Age"
0 Adam Donachie 74 180 22.99
1 Paul Bako 74 215 34.69
2 Ramon Hernandez 72 210 30.78
3 Kevin Millar 72 210 35.43
4 Chris Gomez 73 188 35.71
5 Brian Roberts 69 176 29.39
6 Miguel Tejada 69 209 30.77
7 Melvin Mora 71 200 35.07
8 Aubrey Huff 76 231 30.19
________new with first two columns removed
"Weight(lbs)" "Age"
0 180 22.99
1 215 34.69
2 210 30.78
3 210 35.43
4 188 35.71
5 176 29.39
6 209 30.77
7 200 35.07
8 231 30.19
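A slice only helps when the columns to remove sit next to each other at the start or the end. For columns scattered across the dataframe, one option is to build the list of positions to keep and pass that list to df.iloc[]. This is a minimal sketch with a made-up dataframe; drop_positions is a hypothetical name chosen for the example.

```python
import pandas as pd

# Hypothetical stand-in for Sample.csv (names and values are illustrative)
df = pd.DataFrame({
    "Name": ["Adam", "Paul"],
    "Height": [74, 74],
    "Weight": [180, 215],
    "Age": [22.99, 34.69],
})

drop_positions = {1, 3}  # illustrative: remove the 2nd and 4th columns

# Keep every position that is not marked for removal
keep_positions = [i for i in range(df.shape[1]) if i not in drop_positions]
new_df = df.iloc[:, keep_positions]
print(list(new_df.columns))

# Slices also accept negative bounds: everything except the last column
without_last = df.iloc[:, :-1]
print(list(without_last.columns))
```

Since df.iloc[] works purely on positions, this also behaves predictably when two columns share the same name.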
Summary and Conclusion
That is it for this tutorial. I hope it has helped you solve the problem of removing or deleting columns from a dataframe in Python, just like adding a new column to a pandas dataframe. If you have anything to say, please let me know in the comment section.
I am a software Engineer having 4+ Years of Experience in Building full-stack applications.