Deep Learning Nerds Academy
The ultimate Learning Platform for AI, Data Science, Data Analytics and Data Engineering.

What’s new?
PySpark coalesce() Function Explained
Academy Membership · PySpark · Python

📘 Introduction In many real-world datasets, the same type of information can appear in more than one column. A customer may provide an email address, a phone number, or a backup contact, and different systems may populate different fields. When you want to select the first available non-null value from several...
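To illustrate the idea, here is a minimal PySpark sketch of coalesce() picking the first available non-null value across several contact columns; the data and column names are invented for the demo.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("coalesce-demo").getOrCreate()

# Hypothetical contact records: different systems fill different fields.
df = spark.createDataFrame(
    [("alice@example.com", None, None),
     (None, "+1-555-0100", None),
     (None, None, "bob.backup@example.com")],
    ["email", "phone", "backup_contact"],
)

# coalesce() returns the first non-null value among its arguments,
# evaluated left to right for each row.
df = df.withColumn(
    "first_contact",
    F.coalesce(F.col("email"), F.col("phone"), F.col("backup_contact")),
)
df.show(truncate=False)
```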

Previous posts
Kafka Producers and Consumers Explained: How Data Flows in Apache Kafka
Academy Membership · Kafka

📘 Introduction Modern systems produce endless streams of real-time data — from app events and online purchases to sensor readings and transactions. To handle this flow efficiently, applications need a fast, reliable way to move data between services as it happens. That’s where Apache Kafka comes in. Kafka is a distributed...
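As a rough sketch of that flow, the snippet below pairs a producer and a consumer using the kafka-python client; the broker address (localhost:9092), the topic name (events), and the consumer group id are placeholder assumptions.

```python
from kafka import KafkaProducer, KafkaConsumer

# Producer: writes a message to the "events" topic (names are hypothetical).
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", value=b'{"user": "alice", "action": "purchase"}')
producer.flush()  # block until the message is delivered

# Consumer: reads from the same topic as part of a consumer group.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    group_id="demo-group",
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)
    break  # stop after one message for the demo
```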

How to Ingest Data from Kafka Streams to Delta Tables Using PySpark in Databricks
Academy Membership · Databricks · PySpark

📘 Introduction Real-time data ingestion is a critical part of modern data architectures. Organizations need to process and store continuous streams of information for analytics, monitoring, and machine learning. Databricks, with the combined power of PySpark and Delta Lake, provides an efficient way to build end-to-end streaming pipelines that handle data...
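A condensed sketch of such a pipeline might look like the following; the broker address, topic name, and storage paths are placeholders, and spark refers to the SparkSession that Databricks provides.

```python
from pyspark.sql import functions as F

# Read a continuous stream of records from a Kafka topic.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to a string.
events = raw.select(F.col("value").cast("string").alias("payload"))

# Append the stream to a Delta table, with a checkpoint for fault tolerance.
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .outputMode("append")
    .start("/tmp/delta/events")
)
```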

Overview of all important YAML Files in dbt
Academy Membership · dbt · Data Engineering

📘 Introduction When working with dbt (data build tool), YAML files are the backbone of your project’s configuration. They define how dbt behaves, how your models connect to data sources, and how metadata, documentation, and tests are managed. Understanding these YAML files and knowing where they are located within your...
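For a flavor of what these files look like, here is a minimal model properties file (commonly named schema.yml); the model and column names are invented for illustration.

```yaml
# models/schema.yml -- a minimal properties file; all names are illustrative.
version: 2

models:
  - name: stg_customers
    description: "Staging model for raw customer records"
    columns:
      - name: customer_id
        description: "Primary key"
        tests:
          - unique
          - not_null
```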

How to Generate a Hash from Multiple Columns in PySpark
Academy Membership · PySpark · Data Engineering

📘 Introduction When processing massive datasets in PySpark, it’s often necessary to uniquely identify rows or efficiently detect changes across records. Using multiple columns as a composite key can quickly become cumbersome and inefficient — especially during joins or deduplication. A better solution is to generate a single hash value derived...
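One common way to do this is to combine the columns with concat_ws() and hash the result with sha2(); the sketch below shows the pattern with made-up data and column names.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hash-demo").getOrCreate()

# Hypothetical records; column names are illustrative.
df = spark.createDataFrame(
    [("alice", "2024-01-01", "42.00"), ("bob", "2024-01-02", "17.50")],
    ["name", "order_date", "amount"],
)

# concat_ws() joins the columns with a separator (null values are skipped),
# and sha2() turns the result into a 256-bit hex digest that can serve as
# a single surrogate key for joins, deduplication, or change detection.
df = df.withColumn(
    "row_hash",
    F.sha2(F.concat_ws("||", "name", "order_date", "amount"), 256),
)
df.show(truncate=False)
```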
