Introduction
In this tutorial, we want to concatenate multiple Pandas DataFrames. To do this, we use the concat() function of Pandas.
Import Libraries
First, we import the following Python module:
import pandas as pd
Create Pandas DataFrames
We create two Pandas DataFrames with some example data from dictionaries.
First, we create the Pandas DataFrame "df1":
data = {
"language": ["Python", "JavaScript"],
"framework": ["FastAPI", "ReactJS"],
"users": [9000, 7000]
}
df1 = pd.DataFrame(data)
df1
Next, we create the Pandas DataFrame "df2". It has exactly the same columns as the DataFrame "df1":
data = {
"language": ["Python", "Python", "Java"],
"framework": ["FastAPI", "Django", "Spring"],
"users": [9000, 20000, 12000]
}
df2 = pd.DataFrame(data)
df2
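Before concatenating, it can be useful to confirm that both DataFrames really share the same columns and data types. The following is a minimal sketch of such a check, not a required step; the variable names same_columns and same_dtypes are chosen here only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    "language": ["Python", "JavaScript"],
    "framework": ["FastAPI", "ReactJS"],
    "users": [9000, 7000],
})
df2 = pd.DataFrame({
    "language": ["Python", "Python", "Java"],
    "framework": ["FastAPI", "Django", "Spring"],
    "users": [9000, 20000, 12000],
})

# Compare column labels (including their order) and the dtype of each column
same_columns = df1.columns.equals(df2.columns)
same_dtypes = df1.dtypes.equals(df2.dtypes)

print(same_columns, same_dtypes)
```

If the columns differed, concat() would still run, but it would fill the missing values with NaN.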
Concatenate DataFrames
Now, we would like to concatenate the DataFrames "df1" and "df2".
To do this, we use the concat() function of Pandas:
df_merged = pd.concat([df1, df2], ignore_index=True)
df_merged
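To illustrate what the ignore_index=True argument does, the sketch below compares the index of the result with and without it. The names kept_index and new_index are only used for this illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    "language": ["Python", "JavaScript"],
    "framework": ["FastAPI", "ReactJS"],
    "users": [9000, 7000],
})
df2 = pd.DataFrame({
    "language": ["Python", "Python", "Java"],
    "framework": ["FastAPI", "Django", "Spring"],
    "users": [9000, 20000, 12000],
})

# Without ignore_index, each row keeps the index label from its source DataFrame
kept_index = pd.concat([df1, df2])

# With ignore_index=True, the result gets a fresh index from 0 to n-1
new_index = pd.concat([df1, df2], ignore_index=True)

print(kept_index.index.tolist())  # [0, 1, 0, 1, 2]
print(new_index.index.tolist())   # [0, 1, 2, 3, 4]
```

Repeated index labels are usually unwanted when the original row positions carry no meaning, which is why the tutorial passes ignore_index=True.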
Concatenate DataFrames without Duplicates
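Next, we would like to concatenate the DataFrames "df1" and "df2" and drop duplicate rows. The row ("Python", "FastAPI", 9000) appears in both DataFrames, so it should be kept only once.
To do this, we use the concat() function in combination with the drop_duplicates() method of Pandas, and then reset the index with reset_index(drop=True):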
df_merged = pd.concat([df1, df2]).drop_duplicates().reset_index(drop=True)
df_merged
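By default, drop_duplicates() only removes rows that are identical in every column. As a hedged sketch, the example below checks the row counts and also shows the subset parameter, which restricts the comparison to selected columns. Deduplicating by "language" is an assumption made here purely for illustration and is not part of the tutorial's data model:

```python
import pandas as pd

df1 = pd.DataFrame({
    "language": ["Python", "JavaScript"],
    "framework": ["FastAPI", "ReactJS"],
    "users": [9000, 7000],
})
df2 = pd.DataFrame({
    "language": ["Python", "Python", "Java"],
    "framework": ["FastAPI", "Django", "Spring"],
    "users": [9000, 20000, 12000],
})

combined = pd.concat([df1, df2], ignore_index=True)

# Remove rows that match in all columns; the first occurrence is kept
deduped = combined.drop_duplicates().reset_index(drop=True)
print(len(combined), len(deduped))  # 5 4

# Illustrative only: keep the first row seen for each language
by_language = (
    combined.drop_duplicates(subset=["language"], keep="first")
    .reset_index(drop=True)
)
print(by_language["framework"].tolist())  # ['FastAPI', 'ReactJS', 'Spring']
```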
Conclusion
Congratulations! Now you are one step closer to becoming an AI Expert. You have seen that it is easy to concatenate multiple Pandas DataFrames with the concat() function, and to remove duplicate rows with the drop_duplicates() method. Try it yourself!