Introduction
In this tutorial, we join Pandas DataFrames. To do this, we use the merge() method of Pandas.
Import Libraries
First, we import the following Python module:
import pandas as pd
Create Pandas DataFrames
Next, we create two Pandas DataFrames with some example data from dictionaries:
First, we create the Pandas DataFrame "df_languages":
data = {
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"]
}
df_languages = pd.DataFrame(data)
df_languages
Next, we create the Pandas DataFrame "df_frameworks":
data = {
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2]
}
df_frameworks = pd.DataFrame(data)
df_frameworks
Inner Join
Now, we would like to join the two DataFrames using an inner join. The DataFrame "df_languages" has the primary key "id", and the DataFrame "df_frameworks" refers to it through the foreign key "language_id".
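Before joining, it can help to check how the two key columns relate to each other. The following is only a minimal sketch using the example data from above; the variable names (primary_key_is_unique, unmatched_frameworks, unreferenced_languages) are made up for illustration and are not part of Pandas.

```python
import pandas as pd

df_languages = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"],
})
df_frameworks = pd.DataFrame({
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2],
})

# A primary key should not contain duplicates.
primary_key_is_unique = df_languages["id"].is_unique

# Frameworks whose "language_id" has no matching "id" in df_languages.
has_match = df_frameworks["language_id"].isin(df_languages["id"])
unmatched_frameworks = df_frameworks.loc[~has_match, "framework"].tolist()

# Languages that no framework refers to.
is_referenced = df_languages["id"].isin(df_frameworks["language_id"])
unreferenced_languages = df_languages.loc[~is_referenced, "language"].tolist()

print(primary_key_is_unique)
print(unmatched_frameworks)
print(unreferenced_languages)
```

The rows found here are exactly the ones on which the join types below differ: an inner join drops them, while outer, left, and right joins keep some or all of them.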
To join the DataFrames, we use the merge() method of Pandas. We have to specify the join type and the key columns of both DataFrames. For an inner join, we set the parameter "how" to "inner", which keeps only the rows whose key appears in both DataFrames:
df_joined = df_languages.merge(
    df_frameworks,
    how="inner",
    left_on="id",
    right_on="language_id"
)
df_joined = df_joined[["id", "language", "framework_id", "framework"]]
df_joined
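To make the effect of the inner join easier to see, here is a self-contained sketch that repeats the example data and inspects the result. The validate="one_to_many" argument is an optional addition that is not used in the tutorial code; it asks merge() to raise an error if the key in the left DataFrame is not unique. The helper names joined_languages and joined_frameworks are chosen only for this example.

```python
import pandas as pd

df_languages = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"],
})
df_frameworks = pd.DataFrame({
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2],
})

df_joined = df_languages.merge(
    df_frameworks,
    how="inner",
    left_on="id",
    right_on="language_id",
    validate="one_to_many",  # optional check: "id" must be unique on the left side
)

# Only languages and frameworks with a matching key remain.
joined_languages = sorted(df_joined["language"].unique())
joined_frameworks = sorted(df_joined["framework"])

print(len(df_joined))
print(joined_languages)
print(joined_frameworks)
```

"C++" and "Visual Basic" have no framework in the example data, and "Spring" refers to a language id that does not exist, so none of them appear in the inner join result.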
Full Outer Join
To join the two DataFrames using a full outer join, we set the parameter "how" to "outer". All rows of both DataFrames are kept, and columns without a match are filled with NaN:
df_joined = df_languages.merge(
    df_frameworks,
    how="outer",
    left_on="id",
    right_on="language_id"
)
df_joined = df_joined[["id", "language", "framework_id", "framework"]]
df_joined
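With a full outer join it is not always obvious which side a row came from. As a sketch, the indicator=True argument of merge() (not used in the tutorial code) adds a "_merge" column with the values "both", "left_only", or "right_only". The names origin_counts and spring_row are chosen only for this example.

```python
import pandas as pd

df_languages = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"],
})
df_frameworks = pd.DataFrame({
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2],
})

df_joined = df_languages.merge(
    df_frameworks,
    how="outer",
    left_on="id",
    right_on="language_id",
    indicator=True,  # adds the "_merge" column described above
)

# Count how many rows came from both sides, only the left, or only the right.
origin_counts = df_joined["_merge"].astype(str).value_counts().to_dict()

# "Spring" has no matching language: "id" is missing, "language_id" is still 5.
spring_row = df_joined.loc[df_joined["framework"] == "Spring"].iloc[0]

print(origin_counts)
print(spring_row["id"], spring_row["language_id"])
```

Note that selecting only the columns "id", "language", "framework_id", and "framework" drops "language_id", so in that view the "Spring" row no longer shows which language id it referred to.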
Left Join
To join the two DataFrames using a left join, we set the parameter "how" to "left". All rows of the left DataFrame "df_languages" are kept, even if no framework refers to them:
df_joined = df_languages.merge(
    df_frameworks,
    how="left",
    left_on="id",
    right_on="language_id"
)
df_joined = df_joined[["id", "language", "framework_id", "framework"]]
df_joined
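The following sketch shows one way to find the languages that have no framework after the left join. It also converts "framework_id" to the nullable integer type "Int64", because the missing values otherwise turn the column into floats. The names without_framework and framework_ids are chosen only for this example.

```python
import pandas as pd

df_languages = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"],
})
df_frameworks = pd.DataFrame({
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2],
})

df_joined = df_languages.merge(
    df_frameworks,
    how="left",
    left_on="id",
    right_on="language_id",
)

# Languages without a match have missing values in the framework columns.
without_framework = df_joined.loc[df_joined["framework"].isna(), "language"].tolist()

# Optional: keep whole-number ids by using the nullable integer type.
framework_ids = df_joined["framework_id"].astype("Int64")

print(without_framework)
print(framework_ids.dtype)
```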
Right Join
To join the two DataFrames using a right join, we set the parameter "how" to "right". All rows of the right DataFrame "df_frameworks" are kept, even if their "language_id" has no matching language:
df_joined = df_languages.merge(
    df_frameworks,
    how="right",
    left_on="id",
    right_on="language_id"
)
df_joined = df_joined[["id", "language", "framework_id", "framework"]]
df_joined
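A right join can also be read as a left join with the two DataFrames swapped. The sketch below checks this on the example data and lists the frameworks without a matching language. The names df_right, df_swapped, without_language, and same_rows are chosen only for this example.

```python
import pandas as pd

df_languages = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"],
})
df_frameworks = pd.DataFrame({
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2],
})

# Right join as in the tutorial.
df_right = df_languages.merge(
    df_frameworks, how="right", left_on="id", right_on="language_id"
)

# Left join with the roles of the two DataFrames swapped.
df_swapped = df_frameworks.merge(
    df_languages, how="left", left_on="language_id", right_on="id"
)

# Frameworks without a match have missing values in the language columns.
without_language = df_right.loc[df_right["language"].isna(), "framework"].tolist()

# Compare both results after aligning the column and row order.
a = df_right.sort_values("framework_id").reset_index(drop=True)
b = df_swapped[df_right.columns].sort_values("framework_id").reset_index(drop=True)
same_rows = a.equals(b)

print(without_language)
print(same_rows)
```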
Conclusion
Congratulations! Now you are one step closer to becoming an AI Expert. You have seen that it is easy to join Pandas DataFrames with the merge() method: the parameter "how" selects the join type, and "left_on" and "right_on" name the key columns. Try it yourself!