Removing pandas rows with duplicate indices is easiest with the Index.duplicated method combined with boolean indexing. Note that DataFrame.drop_duplicates only compares column values and ignores the index, so on its own it cannot detect repeated index labels. Here is how you can remove rows with duplicate index labels, keeping the first occurrence of each:
import pandas as pd
# create a sample DataFrame with duplicate indices
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'b', 'c'])
# remove duplicate rows based on index
df = df[~df.index.duplicated(keep='first')]
print(df)
The output of this code will be:
   A  B
a  1  5
b  2  6
c  4  8
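The keep parameter of Index.duplicated controls which occurrence of a repeated label survives. The minimal sketch below reuses the same sample data and compares the three accepted values: 'first' (the default), 'last', and False, which flags every occurrence so that negating the mask drops all rows whose label is not unique.

```python
import pandas as pd

# Same sample DataFrame as above: label 'b' appears twice
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                  index=['a', 'b', 'b', 'c'])

# keep='first' (the default): keep the first row for each label
first = df[~df.index.duplicated(keep='first')]

# keep='last': keep the last row for each label instead
last = df[~df.index.duplicated(keep='last')]

# keep=False: every occurrence of a repeated label is flagged,
# so negating the mask keeps only rows with a unique label
unique_only = df[~df.index.duplicated(keep=False)]

print(first, last, unique_only, sep='\n\n')
```

Which value to pick depends on whether the earliest row, the latest row, or no row at all should represent an ambiguous label.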
Using the duplicated method
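You can also call the duplicated method on the pandas Index itself to check for duplicate index labels. It returns a NumPy boolean array in which True marks every occurrence of a label after the first (the default is keep='first'). This is the same mask that the first example negates with ~ to select the rows to keep.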
See the example below:
import pandas as pd
# create a sample DataFrame with duplicate indices
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'b', 'c'])
# check for duplicate indices
is_duplicated = df.index.duplicated()
print(is_duplicated)
# Output:
# [False False  True False]
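Beyond the raw boolean mask, a few Index attributes make it easy to check for and inspect duplicates before deciding what to drop. A minimal sketch using the same sample data and only standard pandas Index attributes (is_unique, has_duplicates, unique):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                  index=['a', 'b', 'b', 'c'])

# Quick yes/no checks on the whole index
print(df.index.is_unique)       # False, because 'b' repeats
print(df.index.has_duplicates)  # True; the inverse of is_unique

# Labels flagged by duplicated() (occurrences after the first), deduplicated
repeated_labels = df.index[df.index.duplicated()].unique()
print(list(repeated_labels))

# Every row whose label occurs more than once, useful for manual review
print(df[df.index.duplicated(keep=False)])
```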
Using reset_index, drop_duplicates, and set_index
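Another way to remove duplicate rows based on the index in a pandas DataFrame is to use the reset_index, drop_duplicates, and set_index methods together: reset_index turns the index into an ordinary column (named index when the index has no name), drop_duplicates removes repeated values in that column, and set_index restores it as the index. Here’s how it works: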
import pandas as pd
# create a sample DataFrame with duplicate indices
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'b', 'c'])
# remove duplicate rows based on index
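df = (df.reset_index()
      .drop_duplicates(subset='index', keep='last')
      .set_index('index')
      .sort_index()
      .rename_axis(None))  # drop the leftover 'index' name from the index
print(df)
The output is below. Because keep='last' is used here, the second 'b' row is the one that survives:
   A  B
a  1  5
b  3  7
c  4  8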
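The example above relies on reset_index naming the new column index, which only happens when the index has no name. If your index is named, the subset and set_index arguments must use that name instead. A minimal sketch, assuming a hypothetical index name key chosen purely for illustration, that looks up the column name rather than hard-coding it:

```python
import pandas as pd

# Sample data; the index name 'key' is hypothetical, for illustration only
df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
    index=pd.Index(['a', 'b', 'b', 'c'], name='key'),
)

# reset_index() names the new column after the index,
# or 'index' when the index is unnamed
key = df.index.name if df.index.name is not None else 'index'

deduped = (df.reset_index()
           .drop_duplicates(subset=key, keep='last')
           .set_index(key)
           .sort_index())
print(deduped)
```

Here the index name is preserved on the result, so no rename_axis call is needed.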
Summary and Conclusion
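In this article, we explained how to remove pandas rows with duplicate indices, either by filtering with Index.duplicated and boolean indexing or by combining reset_index, drop_duplicates, and set_index. I hope this was helpful. If you have any questions, please leave them in the comment section.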
Nazia is a certified Computer System Engineer working as a Freelancer.