Pandas - Remove Duplicates from a DataFrame

Introduction

In this tutorial, we want to drop duplicates from a Pandas DataFrame. In order to do this, we use the the drop_duplicates() method of Pandas.

Import Libraries

First, we import the following python modules:

import numpy as np
import pandas as pd

Create Pandas DataFrame

Next, we create a Pandas DataFrame with some example data from a dictionary:

mydict = {
    "column1": [3, 7, 7, 8,  np.nan],
    "column2": [11.3, 12.5, 12.5, 0.77, 9.4],
    "column3": ["AI", "Python", "Python", np.nan, "AI"],
}
df = pd.DataFrame(mydict)
df

Remove duplicate Rows

Keep First Occurences

Now, we would like to remove duplicate rows from the DataFrame based on all columns. The first occurrences should be kept.

To do this, we use the drop_duplicates() method of Pandas and set the parameter "keep" to "first":

df_cleaned = df.drop_duplicates(keep='first')
df_cleaned

Keep Last Occurences

If we want to keep the last occurrences, we have to set the parameter "keep" to "last":

df_cleaned = df.drop_duplicates(keep='last')
df_cleaned

Remove duplicate Rows based on a certain Column

Next, we would like to remove duplicate rows from the DataFrame based on the column "language". The first occurrences should be kept.

To do this, we use the drop_duplicates() method of Pandas with the parameters "keep" and "subset":

df_cleaned = df.drop_duplicates(keep='first', subset=['column3'])
df_cleaned

Conclusion

Congratulations! Now you are one step closer to become an AI Expert. You have seen that it is very easy to drop duplicates from a Pandas DataFrame. We can simply use the drop_duplicates() method of Pandas. Try it yourself!

Instagram

Also check out our Instagram page. We appreciate your like or comment. Feel free to share this post with your friends.

Sieh dir diesen Beitrag auf Instagram an

Ein Beitrag geteilt von Deep Learning Nerds | AI, Data Science & Machine Learning (@deeplearningnerds)

Pandas - Remove Duplicates from a DataFrame

Data Scientist

How to install and use Packages in dbt

How to connect Claude with Confluence using an MCP Server

How to Install Claude Desktop on Mac: A Step-by-Step Guide

Introduction

Import Libraries

Create Pandas DataFrame

Remove duplicate Rows

Keep First Occurences

Keep Last Occurences

Remove duplicate Rows based on a certain Column

Conclusion

Instagram

Build your first Python model in dbt: A Step-by-Step Tutorial

Connect FastAPI to PostgreSQL with SQLModel and Pydantic Settings

Manage Settings and Environment Variables in FastAPI

Build a Chatbot with Gradio and the Hugging Face Inference API

Build a LLM Application with FastAPI and Hugging Face Inference API