Remove pandas rows with duplicate indices

You can remove pandas rows with duplicate indices by building a boolean mask with the duplicated method of the DataFrame's index and using that mask to filter the rows. Note that drop_duplicates on its own compares column values, not index labels, so it only helps here once the index has been moved into a column (covered later in this article). Here is how to keep only the first row for each index label:

import pandas as pd

# create a sample DataFrame with duplicate indices
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'b', 'c'])

# remove duplicate rows based on index
df = df[~df.index.duplicated(keep='first')]

print(df)

The output of this code will be:

   A  B
a  1  5
b  2  6
c  4  8
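The keep argument controls which of the repeated rows survives. As a minimal sketch on the same sample data (the variable names last_kept and none_kept are only illustrative), keep='last' retains the final row for each label, while keep=False drops every row whose label is repeated:

```python
import pandas as pd

# same sample DataFrame with a repeated index label 'b'
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'b', 'c'])

# keep='last' retains the final row for each repeated label
last_kept = df[~df.index.duplicated(keep='last')]

# keep=False marks all rows of a repeated label, so none of them survive
none_kept = df[~df.index.duplicated(keep=False)]

print(last_kept)
print(none_kept)
```

With this data, last_kept holds the second 'b' row (A=3, B=7), and none_kept contains only the rows labelled 'a' and 'c'.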

Using the duplicated method

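The example above relies on the duplicated method of the pandas Index. You can also call it on its own to check for duplicate indices before removing anything. It returns a boolean NumPy array in which True marks each repeat of a label that has already appeared (the default is keep='first', so the first occurrence is marked False).

See the example below: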

import pandas as pd

# create a sample DataFrame with duplicate indices
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'b', 'c'])

# check for duplicate indices
is_duplicated = df.index.duplicated()

print(is_duplicated)
# Output:
# [False False  True False]
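Before dropping rows, it can help to see which labels are affected. The following is a small sketch, not part of the original example, that uses the same mask to report whether the index has repeats, which labels repeat, and which rows are involved. The variable names are illustrative:

```python
import pandas as pd

data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'b', 'c'])

# True when at least one index label appears more than once
has_duplicates = not df.index.is_unique

# the labels that occur more than once, each listed a single time
repeated_labels = df.index[df.index.duplicated()].unique().tolist()

# every row that shares its label with another row
conflicting_rows = df[df.index.duplicated(keep=False)]

print(has_duplicates)
print(repeated_labels)
print(conflicting_rows)
```

Here repeated_labels contains only 'b', and conflicting_rows shows both 'b' rows so you can decide which one to keep.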

Using reset_index, drop_duplicates, and set_index

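Another way to remove duplicate rows based on the index in a pandas DataFrame is to use the reset_index, drop_duplicates, and set_index methods together. reset_index moves the index into a regular column (named 'index' when the index has no name), drop_duplicates removes the repeats in that column, and set_index turns the column back into the index. Here’s how it works: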

import pandas as pd

# create a sample DataFrame with duplicate indices
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'b', 'c'])

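# remove duplicate rows based on index, keeping the first occurrence
df = (df.reset_index()
        .drop_duplicates(subset='index', keep='first')
        .set_index('index')
        .rename_axis(None)   # drop the 'index' name left behind by reset_index
        .sort_index())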

print(df)

The output is below:

   A  B
a  1  5
b  2  6
c  4  8
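A third option, offered here only as a hedged alternative and not as one of the methods the article covers, is to group by the index level and take the first row of each group. Two caveats apply: groupby sorts the labels by default, and first() picks the first non-null value per column, so on data with missing values it can combine values from different rows.

```python
import pandas as pd

data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data, index=['a', 'b', 'b', 'c'])

# group rows that share an index label and keep the first value per column
deduped = df.groupby(level=0).first()

print(deduped)
```

For this sample data, which has no missing values, the result matches the output of the two approaches above.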

Summary and Conclusion

In this article, we have explained how to remove pandas rows with duplicate indices, either by filtering with the index's duplicated method or by chaining reset_index, drop_duplicates, and set_index.
