Data Engineer

As a Data Engineer, I collect, extract and transform raw data in order to provide clean, reliable and usable data.

63 posts
PySpark - User Defined Function (UDF)
Academy Membership, PySpark, Python

Introduction: In this tutorial, we create a UDF and apply it to a PySpark DataFrame. We show two ways to do this: using the udf() function and using the @udf decorator. Import Libraries: First, we import the following Python modules: from pyspark.sql...

PySpark - Aggregate Functions
Academy Membership, PySpark, Python

Introduction: In this tutorial, we perform aggregate operations on columns of a PySpark DataFrame using PySpark's aggregate functions. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession; from pyspark.sql.functions import * Create SparkSession: Before...

PySpark - Concatenate DataFrames
Academy Membership, PySpark, Python

Introduction: In this tutorial, we concatenate multiple PySpark DataFrames using the union() method of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession Create SparkSession: Before we can work with PySpark, we need to create...

PySpark - Join DataFrames
Academy Membership, PySpark, Python

Introduction: In this tutorial, we join PySpark DataFrames using the join() method of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession Create SparkSession: Before we can work with PySpark, we need to create a...

PySpark - Replace Null Values in a DataFrame

Introduction: In this tutorial, we replace null values in a PySpark DataFrame using the fillna() method of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession; from pyspark.sql.functions import mean Create SparkSession: Before...

PySpark - Remove Null Values from a DataFrame

Introduction: In this tutorial, we drop rows with null values from a PySpark DataFrame using the dropna() method of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession Create SparkSession: Before we can work with...

PySpark - Remove Duplicates from a DataFrame

Introduction: In this tutorial, we drop duplicates from a PySpark DataFrame using the dropDuplicates() method of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession Create SparkSession: Before we can work with PySpark, we need...

PySpark - Filter Rows from a DataFrame

Introduction: In this tutorial, we filter rows from a PySpark DataFrame based on specific conditions using the filter() method of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession Create SparkSession: Before we can...

PySpark - Sort a DataFrame

Introduction: In this tutorial, we sort a PySpark DataFrame by specific columns using the orderBy() method of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession Create SparkSession: Before we can work with PySpark, we...
