Data Engineer

As a Data Engineer, I collect, extract and transform raw data in order to provide clean, reliable and usable data.

63 posts
How to set up a FastAPI Project
Academy Membership, FastAPI, Python

Introduction: FastAPI has quickly gained popularity as a modern, fast and easy-to-use Python web framework for building RESTful APIs. In this tutorial, we show you step by step how to set up a FastAPI project. Prerequisites: First, make sure you have Python installed on your system. Furthermore, it is recommended...

PySpark - Write DataFrame to CSV File

Introduction: In this tutorial, we write a PySpark DataFrame to a CSV file. To do this, we use the csv() method and the format("csv").save() method of the PySpark DataFrameWriter. We use DataFrame.write to create a DataFrameWriter instance. Import Libraries: First, we...

What is a Data Lakehouse?
Academy Membership, Data Engineering, Databricks

Introduction: In this tutorial, we explain the characteristics of a Data Lakehouse. To do this, we take a closer look at the key features of Data Lakes and Data Warehouses and at how a Data Lakehouse combines the best of both worlds. Definition: At its core,...

PySpark - Read CSV File into DataFrame

Introduction: In this tutorial, we read a CSV file into a PySpark DataFrame. To do this, we use the csv() method and the format("csv").load() method of the PySpark DataFrameReader. We use spark.read to create a DataFrameReader instance. Import Libraries: First, we...

PySpark - Explode Arrays into Rows of a DataFrame
Academy Membership, PySpark, Python

Introduction: In this tutorial, we explode arrays into rows of a PySpark DataFrame. To do this, we use the explode() function and the explode_outer() function of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession from pyspark.sql.functions...

PySpark - Date and Timestamp

Introduction: In this tutorial, we add the current date and the current timestamp to a PySpark DataFrame. To do this, we use the current_date() function and the current_timestamp() function of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import...

PySpark - Regular Expressions (Regex)
Academy Membership, PySpark, Python

Introduction: In this tutorial, we use regular expressions (regex) to filter, replace and extract strings in a PySpark DataFrame based on specific patterns. To do this, we use the rlike() method, the regexp_replace() function and the regexp_extract() function of PySpark. Import Libraries...
