Introduction
In this tutorial, we rename a PySpark DataFrame column. To do this, we use the withColumnRenamed() method of PySpark.
Import Libraries
First, we import the following Python module:
from pyspark.sql import SparkSession
Create SparkSession
Before we can work with PySpark, we need to create a SparkSession. A SparkSession is the entry point to all Spark functionality.
To create a basic SparkSession programmatically, we use the following command:
spark = SparkSession \
    .builder \
    .appName("Python PySpark Example") \
    .getOrCreate()
Create PySpark DataFrame
Next, we create a PySpark DataFrame with some example data from a list. To do this, we use the method createDataFrame() and pass the data and the column names as arguments:
column_names = ["language", "framework", "users"]
data = [
    ("Python", "Django", 20000),
    ("Python", "FastAPI", 9000),
    ("Java", "Spring", 7000),
    ("JavaScript", "ReactJS", 5000)
]
df = spark.createDataFrame(data, column_names)
df.show()
Rename a Single Column
Now, we would like to rename the column "language" to "column 1".
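To do this, we use the withColumnRenamed() method of PySpark and pass the existing column name and the new column name as arguments. The method returns a new DataFrame rather than changing the existing one, so we assign the result back to df: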
df = df.withColumnRenamed("language", "column 1")
df.show()
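One detail worth knowing: withColumnRenamed() does nothing if the DataFrame has no column with the given name, so a typo in the old name goes unnoticed. The following is a minimal sketch of a stricter variant. The helper name rename_column_strict is hypothetical and not part of PySpark; it only relies on the columns attribute and the withColumnRenamed() method of a DataFrame.

```python
def rename_column_strict(df, old_name, new_name):
    """Rename a column, but fail loudly if the old name is missing.

    Hypothetical helper (not part of PySpark). `df` is assumed to be a
    PySpark DataFrame, i.e. it has `.columns` and `.withColumnRenamed()`.
    """
    if old_name not in df.columns:
        # Plain withColumnRenamed() would silently return df unchanged here.
        raise ValueError(
            f"Column {old_name!r} not found; available columns: {df.columns}"
        )
    return df.withColumnRenamed(old_name, new_name)
```

With the example DataFrame, rename_column_strict(df, "language", "column 1") behaves like the call above, while a misspelled name such as "langauge" raises a ValueError instead of passing silently.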
Rename Multiple Columns
Now, we would like to rename the column "framework" to "column 2" and the column "users" to "column 3".
To do this, we chain several calls to the withColumnRenamed() method of PySpark and pass the existing column name and the new column name as arguments to each call:
df = df.withColumnRenamed("framework", "column 2") \
    .withColumnRenamed("users", "column 3")
df.show()
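When many columns need new names, chaining the calls by hand gets long. As a rough sketch, the same chain can be built from a dictionary that maps old names to new names. The helper name rename_columns is hypothetical and not part of PySpark; it simply applies withColumnRenamed() once per entry.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Apply withColumnRenamed() once for every old -> new pair in `mapping`.

    Hypothetical helper (not part of PySpark). `df` is assumed to be a
    PySpark DataFrame and `mapping` a dict of {old_name: new_name}.
    """
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,  # start from the original DataFrame
    )
```

For the example above, df = rename_columns(df, {"framework": "column 2", "users": "column 3"}) should give the same result as the chained calls. Recent Spark releases (3.4 and later, to the best of my knowledge) also offer a built-in withColumnsRenamed() method that accepts such a dictionary directly.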
Conclusion
Congratulations! Now you are one step closer to becoming an AI expert. You have seen that it is very easy to rename a PySpark DataFrame column: we simply use the withColumnRenamed() method of PySpark. Try it yourself!