Data Engineering

57 posts
Create a View in a Warehouse in Microsoft Fabric
Academy Membership · Microsoft Fabric · DP-600

Introduction An essential feature of Microsoft Fabric is the ability to create views in a data warehouse, enabling data transformation without physical duplication. Views simplify and aggregate data, improving accessibility and efficiency for analysis. For the DP-600 certification exam, knowing how to create and manage views is crucial. This task...

PySpark - Convert Column Data Types of a DataFrame
Academy Membership · PySpark · DP-600

Introduction When working with PySpark DataFrames, handling different data types correctly is essential for data preprocessing. Mismatched or incorrect data types can lead to errors in Spark operations such as filtering, aggregations, and machine learning workflows. In this tutorial, we’ll explore how to convert column data types in a...

Choose between a Lakehouse, Warehouse or Eventhouse in Microsoft Fabric (DP-600)
Academy Membership · Microsoft Fabric · DP-600

Introduction Microsoft Fabric offers multiple storage and processing solutions for different analytical needs, including lakehouses, warehouses, and eventhouses. For the DP-600 certification exam, understanding when to use each option is crucial for designing efficient data architectures. In this tutorial, you'll learn how to differentiate between lakehouses, warehouses, and...

PySpark - Replace Empty Strings with Null Values
Academy Membership · PySpark · Python

Introduction When working with PySpark DataFrames, handling missing or empty values is a common task in data preprocessing. In many cases, empty strings ("") should be treated as null values for better compatibility with Spark operations, such as filtering, aggregations, and machine learning workflows. In this tutorial, we’ll...

PySpark - Split a Column into Multiple Columns
Academy Membership · PySpark · Python

Introduction When working with data in PySpark, you might often encounter scenarios where a single column contains multiple pieces of information, such as a combination of names, categories, or attributes. In such cases, it is essential to split these values into separate columns for better data organization and analysis. In...

PySpark - Parse a Column of JSON Strings
Academy Membership · PySpark · Python

Introduction Parsing JSON strings with PySpark is an essential task when working with large datasets in JSON format. By transforming JSON data into a structured format, you can enable efficient processing and analysis. PySpark provides a powerful way to parse these JSON strings and extract their contents into separate columns,...
