Premium

A list of premium posts for Academy members, containing hands-on tutorials, best practices, career advice and learning paths.

118 premium posts
PySpark - Convert Column from String to Date Format
Academy Membership | PySpark, Python

Introduction: In data processing, date fields are often stored as strings. Converting these strings into a proper date type is crucial for accurate data analysis and processing. In this tutorial, we will explore how to convert a string column into a date column in a PySpark DataFrame. Import...

dbt Core vs dbt Cloud
Academy Membership | dbt, Data Engineering

Introduction: dbt comes in two versions, dbt Core and dbt Cloud. While both provide the core data transformation functionality, they serve different purposes and suit different requirements. In this tutorial, we'll dive into the features of dbt Core and dbt Cloud, highlighting the key differences...

PySpark - Extract a Substring from a DataFrame Column
Academy Membership | PySpark, Python

Introduction: When working with large datasets in PySpark, you often need to manipulate string data in DataFrame columns. A common operation is extracting a portion of a string, also known as a substring, from a column. In this tutorial, we will...

How to create Azure Data Lake Storage (ADLS) in Microsoft Azure: A Step-by-Step Guide
Academy Membership | Azure

Introduction: Azure Data Lake Storage (ADLS) is a cloud-based storage solution designed for large-scale analytics on both structured and unstructured data. Its flexible architecture and integration with other Azure services make it well suited for organizations that want to use their data for advanced analytics and business intelligence. In...

PySpark - Count Rows and Columns of a DataFrame
Academy Membership | PySpark, Python

Introduction: When processing and analyzing data with PySpark, you often need to know the shape of your data, that is, the number of rows and columns in a DataFrame. This information is important for data validation, transformations and general exploration. In this tutorial, we'll...

How to Containerize a Gradio App with Docker: A Step-by-Step Guide
Academy Membership | Gradio, Docker

Introduction: Building and deploying machine learning apps can be complex, but Gradio, a Python library for quickly building interactive web apps for ML models, simplifies the process significantly. To make the app easy to deploy across different environments, you can containerize it with Docker. In...

PySpark - Count Distinct Values of a DataFrame Column
Academy Membership | PySpark, Python

Introduction: In this tutorial, we count the distinct values of a PySpark DataFrame column. To do this, we chain the distinct() and count() methods and use the countDistinct() function of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession from pyspark.sql....

PySpark - How to create and use Broadcast Variables
Academy Membership | PySpark, Python

Introduction: In distributed computing environments like Apache Spark, efficient data handling is critical for performance. Broadcast variables are a useful feature for optimizing such computations: they let you share large read-only data across all nodes in a Spark cluster without sending a copy of the data with each task. In this tutorial,...

How to load a Sample Dataset for Real-Time Intelligence into an existing KQL Database in Microsoft Fabric

Introduction: Microsoft Fabric offers a gallery of sample datasets for practicing Real-Time Analytics. These sample datasets are a great opportunity to gain experience with, and familiarize yourself with, the technologies and services of the Real-Time Intelligence experience. In this tutorial, we will show you...

How to install Python on Mac
Academy Membership | Python

Introduction: Installing Python is a straightforward process that gives you access to one of the most widely used programming languages today. Whether you're running macOS, Windows or Linux, you can install Python from official sources and start programming quickly. This tutorial will show you how...

Power BI - CONCATENATE function in DAX
Academy Membership | Power BI, Data Analytics

Introduction: Power BI comes with Data Analysis Expressions (DAX), a powerful formula language for implementing custom calculations. DAX provides numerous operators and functions. One important DAX function is CONCATENATE, which merges two text values into one text...

Power BI - Import Data from JSON file
Academy Membership | Power BI, Data Analytics

Introduction: The first step when creating a Power BI report is to connect to data sources. Power BI can connect to a wide range of data sources, which allows users to access and analyze data from many different sources within Power BI. One important format you will often deal with is...

Power BI - CALCULATE Function in DAX
Academy Membership | Power BI, Data Analytics

Introduction: Power BI comes with Data Analysis Expressions (DAX), a powerful formula language for implementing custom calculations. DAX provides numerous operators and functions. One of the most important DAX functions is CALCULATE, which evaluates an expression in a modified filter context and thereby enables dynamic aggregations based on specific criteria....

PySpark - Create Embedding Vectors with Sentence-Transformers
Academy Membership | PySpark, Python

Introduction: Understanding text data is crucial across many domains, from data analysis to engineering and architecture. However, machine learning models cannot work with raw text directly, so text usually has to be converted into numerical representations first. This is where embedding vectors come into play, offering...

PySpark - Concatenate String Columns of a DataFrame
Academy Membership | PySpark, Python

Introduction: In this tutorial, we will show you how to concatenate multiple string columns of a PySpark DataFrame into a single column. To do this, we use the PySpark functions concat() and concat_ws(). Import Libraries: First, we import the following Python modules: from pyspark.sql...

How to group Data using a Dataflow in Microsoft Fabric
Academy Membership | Microsoft Fabric, Azure

Introduction: Transforming data is a fundamental part of Microsoft Fabric. Whether you are filtering, joining, merging or grouping data, Fabric offers several options for these operations. In this tutorial, we will explain step by step how to group data and apply an aggregation function using a dataflow. Goal: A delta...

PySpark - Group and Concatenate Strings in a DataFrame
Academy Membership | PySpark, Python

Introduction: In this tutorial, we will show you how to group and concatenate strings in a PySpark DataFrame. To do this, we use the groupBy() method in combination with the PySpark functions concat_ws(), collect_list() and array_distinct(). Import Libraries: First, we import the following...

PySpark - How to use Pandas User Defined Function (UDF)
Academy Membership | PySpark, Python

Introduction: In big data processing, PySpark has become a powerful tool for handling large-scale datasets, and its distributed computing framework allows massive volumes of data to be processed efficiently. However, despite these capabilities, some data transformations can be cumbersome and complex to express in PySpark. That'...

Power BI - Add Index Column in Power Query
Academy Membership | Power BI, Data Analytics

Introduction: With the Power Query Editor, Power BI offers a powerful tool for cleaning and transforming data. One important part of data preparation is adding an Index Column. To organize and structure your data, it is crucial that every row is uniquely identified by an ID. An Index Column enables...

Type Hints in Python: A Guide for Beginners
Academy Membership | Python

Introduction: As projects grow in size and complexity, keeping code understandable and easy to work with becomes increasingly important. Type hints are one powerful tool for achieving this. In this tutorial, we will explain why and how to use type hints in Python....
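As a quick illustration of what type hints look like, here is a minimal sketch; the function and data are hypothetical examples, and the built-in generic syntax (list[float], dict[str, ...]) assumes Python 3.9 or newer.

```python
from typing import Optional


def average(values: list[float]) -> Optional[float]:
    """Return the arithmetic mean of `values`, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)


# Variable annotations document the expected shape of the data.
scores: dict[str, list[float]] = {"alice": [1.0, 2.0, 3.0], "bob": []}

for name, values in scores.items():
    print(name, average(values))
```

Python does not enforce these annotations at runtime; a static type checker such as mypy reads them and flags mismatches before the code runs.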

Power BI - Import Data from XML file
Academy Membership | Power BI, Data Analytics

Introduction: The first step when creating a Power BI report is to connect to data sources. Power BI can connect to a wide range of data sources, which allows users to access and analyze data from many different sources within Power BI. One important format you will often deal with is...

Understanding the maths behind Long Short-Term Memory (LSTM) Networks: What happens inside an LSTM cell?
Academy Membership | Deep Learning

Introduction: Traditional RNNs, limited by their simple structure, struggle to retain information over long sequences, largely because of the infamous vanishing gradient problem. Long Short-Term Memory (LSTM) Networks are able to capture and preserve long-term dependencies in sequential data. But how does an LSTM achieve this?...
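For orientation, the widely used LSTM cell formulation with a forget gate can be written as below. The notation is an assumption and may differ from the full post: $x_t$ is the input, $h_{t-1}$ the previous hidden state, $[h_{t-1}, x_t]$ their concatenation, $c_t$ the cell state, $\sigma$ the logistic sigmoid and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```

The additive update of $c_t$, scaled by the forget gate rather than repeatedly multiplied by a weight matrix, is generally credited with letting gradients flow across many time steps.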

How to use Environment Variables in Python
Academy Membership | Python

Introduction: Environment variables are commonly used to keep sensitive data such as credentials out of source code and to manage configuration across different environments. In this tutorial, we will explore how to work with environment variables in Python using the os module and the python-dotenv library. What is an...
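A minimal sketch of the os side of this workflow follows. The variable name DATABASE_URL and its default value are hypothetical. python-dotenv's load_dotenv() is not used here; it essentially fills os.environ from a .env file, after which lookups like these work unchanged.

```python
import os


# DATABASE_URL and the default value are hypothetical, for illustration only.
def get_database_url(default: str = "sqlite:///local.db") -> str:
    # os.getenv returns `default` when the variable is not set.
    return os.getenv("DATABASE_URL", default)


def require_env(name: str) -> str:
    # os.environ behaves like a dict and raises KeyError for missing keys;
    # re-raise it with a clearer message for required settings.
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError(f"Missing required environment variable: {name}") from None


# Setting os.environ only affects the current process and its child processes.
os.environ["DATABASE_URL"] = "postgresql://user@localhost/app"
print(get_database_url())  # postgresql://user@localhost/app
del os.environ["DATABASE_URL"]
print(get_database_url())  # sqlite:///local.db
```

Using os.getenv with a default suits optional settings, while failing loudly through require_env suits secrets that the application cannot run without.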

Power BI - Custom Filtering in Power Query
Academy Membership | Power BI, Data Analytics

Introduction: With the Power Query Editor, Power BI offers a powerful tool for cleaning and transforming data. One important part of data preparation is filtering, which lets you remove irrelevant data and reduce the amount of data. One important type of filtering is custom filtering....

PySpark - Window Functions
Academy Membership | Python, PySpark

Introduction: Window functions in PySpark are a powerful feature for data manipulation and analysis. They compute a value for each row based on a group of related rows (the window), for example a running total or a rank within a partition, without collapsing the rows and without the need for expensive joins or subqueries. In this tutorial, we will show you how to use window functions in PySpark....

Performance Metrics for Classification in Machine Learning: Understanding Accuracy, Precision, Recall and F1 Score
Academy Membership | Machine Learning

Introduction: One essential step in Machine Learning is evaluating the performance of a model. For classification models, the Confusion Matrix is a fundamental tool for this evaluation, as it summarizes a model's predictions against the actual classes. Based on the information from the Confusion Matrix, some...
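The four metrics follow directly from the confusion-matrix counts: accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of precision and recall. A minimal sketch in plain Python; the counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one prediction is required")
    accuracy = (tp + tn) / total
    # Assumed convention: report 0.0 when a ratio's denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.700, precision: 0.800, recall: 0.667, f1: 0.727
```

Here accuracy (0.70) looks reasonable, but recall shows that a third of the actual positives were missed. This is why precision and recall are reported alongside accuracy.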

How to containerize a FastAPI Application with Docker
Academy Membership | FastAPI, Docker

Introduction: Combining FastAPI, a high-performance Python web framework, with Docker, a popular containerization tool, can make your development workflow considerably more efficient. In this blog post, we'll walk you through setting up a FastAPI project with a Dockerfile, providing a flexible and scalable solution...

Confusion Matrix in Machine Learning: A Hands-On Explanation
Academy Membership | Machine Learning

Introduction: One essential step in Machine Learning is evaluating the performance of a model. For classification models, the Confusion Matrix is a fundamental tool for this evaluation. It provides a clear visual summary of a model's prediction accuracy by illustrating the correspondence between the predicted...
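For a binary classifier, each prediction falls into exactly one of the four cells of the Confusion Matrix. The sketch below counts them in plain Python. It assumes binary labels with 1 as the positive class, and the example labels are made up.

```python
from collections import Counter


def confusion_counts(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, int]:
    """Count TP, FP, FN and TN for a binary classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct -> TP, wrong -> FP.
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: missed positive -> FN, correct -> TN.
            counts["FN" if actual == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}


# Hypothetical labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}

# The same counts as a 2x2 grid (rows: actual class, columns: predicted class).
print("          predicted 1  predicted 0")
print(f"actual 1  {counts['TP']:>11}  {counts['FN']:>11}")
print(f"actual 0  {counts['FP']:>11}  {counts['TN']:>11}")
```

Layout conventions vary between tools. For example, scikit-learn's confusion_matrix orders rows and columns by label value, so with labels 0 and 1 it places TN in the top-left cell.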

PySpark - Add an ID Column to a DataFrame
Academy Membership | Python, PySpark

Introduction: A common task when working with large datasets is generating a unique identifier for each record. In this tutorial, we will explore how to add an ID column to a PySpark DataFrame using the monotonically_increasing_id() function of PySpark, which generates unique, increasing IDs that are not guaranteed to be consecutive....

A Beginner's Guide to Docker: Get Started with Containerization
Academy Membership | Docker

Introduction: In the fast-paced world of software development, efficiency and consistency are key. Docker, a containerization platform, has changed the way applications are built, shipped and run. In this tutorial, we show you how to get started with Docker. Step 1: Install Docker (if not already installed): First of...

Get started with PostgreSQL on Mac: A Step-by-Step Guide
Academy Membership | PostgreSQL

Introduction: PostgreSQL is one of the most widely used database management systems. One of the easiest ways to use PostgreSQL on macOS is Postgres.app, which provides a simple interface for setting up a server and includes the psql command-line interface for interacting with databases from the terminal. In...

Structured vs. Semi-structured vs. Unstructured Data
Academy Membership | Data, Data Engineering

Introduction: Data comes in different forms, each with its own characteristics and challenges. There are three main categories of data: structured, semi-structured and unstructured data. In this tutorial, we explore the characteristics of each kind of data and look at some examples. Structured Data: First, let's have a look...

How to set up a FastAPI Project
Academy Membership | FastAPI, Python

Introduction: FastAPI has quickly gained popularity as a modern, fast and easy-to-use Python web framework for building RESTful APIs. In this tutorial, we show you step by step how to set up a FastAPI project. Prerequisites: First of all, make sure you have Python installed on your system. Furthermore, it is recommended...

How to use Power BI on Mac
Academy Membership | Power BI

Introduction: Power BI is one of the most widely used BI tools, but using it on a Mac can be a challenge because Microsoft does not offer a macOS version of Power BI Desktop. Nevertheless, there are workarounds for using Power BI on a Mac....

What is a Data Lakehouse?
Academy Membership | Data Engineering, Databricks

Introduction: In this tutorial, we explain the characteristics of a Data Lakehouse. To do this, we take a closer look at the key features of Data Lakes and Data Warehouses and at how a Data Lakehouse combines the best of both worlds. Definition: At its core,...

PySpark - Explode Arrays into Rows of a DataFrame
Academy Membership | PySpark, Python

Introduction: In this tutorial, we explode arrays into rows of a PySpark DataFrame using the PySpark functions explode() and explode_outer(). Unlike explode(), explode_outer() keeps rows whose array is null or empty and produces null for them. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession from pyspark.sql.functions...

Generative AI - Why now?

Introduction: In this tutorial, we explain why Generative AI (GenAI) has become possible now by describing the key factors responsible for its rise. Factors: The key factors that enable Generative AI are the availability of large datasets, computational power, and...

Deep Learning - How the McCulloch-Pitts Neuron works
Academy Membership | Deep Learning

Introduction: In this tutorial, we cover the very first and simplest mathematical neuron model in history, the McCulloch-Pitts Neuron, and look at its architecture and functionality. History: The McCulloch-Pitts Neuron is the simplest form of a neuron model and was published in 1943 by Warren McCulloch and Walter...
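The behaviour of the neuron can be sketched in a few lines. This follows one common textbook formulation, which is an assumption here: binary inputs and output, a fixed threshold, and absolute inhibition, where any active inhibitory input prevents the neuron from firing.

```python
def mcculloch_pitts(excitatory: list[int], inhibitory: list[int], threshold: int) -> int:
    """Return 1 if the neuron fires, else 0 (binary inputs and output)."""
    for value in excitatory + inhibitory:
        if value not in (0, 1):
            raise ValueError("McCulloch-Pitts inputs must be binary (0 or 1)")
    # Absolute inhibition: a single active inhibitory input prevents firing.
    if any(inhibitory):
        return 0
    # Otherwise fire when the number of active excitatory inputs reaches the threshold.
    return 1 if sum(excitatory) >= threshold else 0


# Logical AND (threshold 2) and OR (threshold 1) with two excitatory inputs.
for x1 in (0, 1):
    for x2 in (0, 1):
        and_out = mcculloch_pitts([x1, x2], [], threshold=2)
        or_out = mcculloch_pitts([x1, x2], [], threshold=1)
        print(f"x1={x1} x2={x2}  AND={and_out}  OR={or_out}")
```

Because there are no learnable weights, the only way to change what the neuron computes is to choose a different threshold or to wire the inputs differently.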

Keras - One-Hot Encoding
Academy Membership | Keras, Python

Introduction: In this tutorial, we one-hot encode a NumPy array containing categorical values using the to_categorical() function of Keras. Import Libraries: First, we import the following Python modules: import numpy as np from keras.utils import to_categorical Define Data...
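To show what one-hot encoding produces, here is a plain NumPy sketch. It is not Keras's own implementation, and the helper name one_hot is hypothetical. It mirrors the basic behaviour of to_categorical(): integer class labels go in, one row per label comes out with a 1 in that label's column, and the number of classes is inferred from the largest label when it is not given.

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """NumPy sketch of one-hot encoding for integer class labels."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Infer the number of classes from the largest label.
        num_classes = int(labels.max()) + 1
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError("labels must lie in the range [0, num_classes)")
    # Row k of the identity matrix is the one-hot vector for class k,
    # so fancy indexing picks one row per label.
    return np.eye(num_classes)[labels]


y = np.array([0, 2, 1, 2])
print(one_hot(y))
```

Indexing into an identity matrix keeps the sketch short and vectorized. The explicit range check matters because NumPy would otherwise silently accept a negative label as an index from the end.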

Supervised vs. Unsupervised Learning

Introduction: Machine Learning can be divided into two main types: Supervised Learning and Unsupervised Learning. In this tutorial, we take a closer look at these approaches and compare them with each other. Overview: Both supervised learning and unsupervised learning have their own characteristics and are suitable for solving...

Python - Import Stock Prices from Yahoo Finance
Academy Membership | Python

Introduction: In this tutorial, we import stock prices from Yahoo Finance into Python using the Ticker class of yfinance. yfinance: The Python library yfinance provides access to financial data from Yahoo Finance. Yahoo Finance provides various financial market data such as stock...

What is Generative AI?

Introduction: In this tutorial, we explain Generative AI (GenAI). To do this, we describe the terms Artificial Intelligence, Machine Learning, Deep Learning and Generative AI, as well as the relationships between them. Overview: First, let's have a look at the relationships between Artificial...

PySpark - Regular Expressions (Regex)
Academy Membership | PySpark, Python

Introduction: In this tutorial, we use regular expressions (regex) to filter, replace and extract strings in a PySpark DataFrame based on specific patterns. To do this, we use the rlike() method together with the regexp_replace() and regexp_extract() functions of PySpark. Import Libraries...

PySpark - User Defined Function (UDF)
Academy Membership | PySpark, Python

Introduction: In this tutorial, we create a user-defined function (UDF) and apply it to a PySpark DataFrame. We show two different ways to do this: using the udf() function and using the @udf decorator. Import Libraries: First, we import the following Python modules: from pyspark.sql...

PySpark - Aggregate Functions
Academy Membership | PySpark, Python

Introduction: In this tutorial, we perform aggregate operations on columns of a PySpark DataFrame using various aggregate functions of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession from pyspark.sql.functions import * Create SparkSession: Before...

PySpark - Concatenate DataFrames
Academy Membership | PySpark, Python

Introduction: In this tutorial, we concatenate multiple PySpark DataFrames using the union() method of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession Create SparkSession: Before we can work with PySpark, we need to create...

PySpark - Join DataFrames
Academy Membership | PySpark, Python

Introduction: In this tutorial, we join PySpark DataFrames using the join() method of PySpark. Import Libraries: First, we import the following Python modules: from pyspark.sql import SparkSession Create SparkSession: Before we can work with PySpark, we need to create a...

Deep Learning Nerds | The ultimate Learning Platform for AI and Data Science