Introduction
In this tutorial, we use regular expressions (regex) to filter, replace and extract strings in a Pandas DataFrame based on specific patterns. To do this, we use the Pandas methods str.contains(), str.replace() and str.extract().
Import Libraries
First, we import the following Python module:
import pandas as pd
Create Pandas DataFrame
Next, we create a Pandas DataFrame with some example data from a dictionary:
data = {
    "language": ["Python", "JavaScript", "Python", "Java"],
    "framework": ["FastAPI 0.92.0", "ReactJS 18.0", "Django 4.1", "Spring Boot 3.1"],
    "users": [9000, 7000, 20000, 12000],
}
df = pd.DataFrame(data)
df
Filter Data
We would like to filter data of the DataFrame based on a certain string pattern.
In this example, we want to select all rows where the value of the column "language" starts with "Py".
To do this, we use the str.contains() method of Pandas. The parameter "regex" is True by default; we set it explicitly to make clear that the pattern is a regular expression:
pattern = r"^Py"
df_new = df[df["language"].str.contains(pattern, regex=True)]
df_new
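The following is a minimal sketch of the same filter under two assumptions that are not part of the example above: the column may contain missing values, and the match should ignore case. The Series "languages" and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: mixed case plus one missing value
languages = pd.Series(["Python", "python", "JavaScript", None, "Java"])

# case=False makes the match case-insensitive.
# na=False treats missing values as non-matches, so the mask
# contains only True/False and can be used for boolean indexing.
mask = languages.str.contains(r"^py", case=False, na=False, regex=True)

filtered = languages[mask]
print(filtered.tolist())
```

Without na=False, the missing value would stay missing in the mask, which can make the boolean indexing step fail.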
Replace Data
We would like to replace data of the DataFrame based on a certain string pattern.
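In this example, we want to replace the version numbers in the values of column "framework" with an empty string. We write the result into a new column with the name "framework_name", so that the original values of column "framework" remain available for the next step.
To do this, we use the str.replace() method of Pandas and set the parameter "regex" to True (since Pandas 2.0, the default is False):
pattern = r"\s*(\d+(\.\d+){0,2})"
df["framework_name"] = df["framework"].str.replace(pattern, "", regex=True)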
df
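As a small sketch of what else str.replace() can do with the same pattern, the replacement string may refer back to a capture group. The example below works on a standalone Series with the framework strings from above; the names "names" and "labelled" and the "(v...)" output format are our own choices for illustration.

```python
import pandas as pd

frameworks = pd.Series(
    ["FastAPI 0.92.0", "ReactJS 18.0", "Django 4.1", "Spring Boot 3.1"]
)

# Same idea as above; the inner group is non-capturing (?:...),
# so group 1 is the complete version number.
pattern = r"\s*(\d+(?:\.\d+){0,2})"

# Replace the version number with an empty string
names = frameworks.str.replace(pattern, "", regex=True)

# Keep the version number, but wrap it: \1 refers to capture group 1
labelled = frameworks.str.replace(pattern, r" (v\1)", regex=True)

print(names.tolist())
print(labelled.tolist())
```

Note that the pattern is not anchored, so it also removes digits that appear inside a name. If names can contain digits, one option is to require whitespace before the version and anchor the pattern at the end of the string, for example r"\s+\d+(?:\.\d+){0,2}$".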
Extract Data
We would like to extract data of the DataFrame based on a certain string pattern.
In this example, we want to extract the version numbers in the values of column "framework" and write them into a new column with the name "version".
To do this, we use the str.extract() method of Pandas. The method returns one column per capture group, so we select column 0, which holds the outer group with the complete version number:
pattern = r"(\d+(\.\d+){0,2})"
df["version"] = df["framework"].str.extract(pattern)[0]
df
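If we want the name and the version in one step, named capture groups are one possible approach: str.extract() uses the group names as column names. The sketch below assumes that every value has the form "name version"; the group names "name" and "version" and the variable "parts" are our own choices.

```python
import pandas as pd

frameworks = pd.Series(
    ["FastAPI 0.92.0", "ReactJS 18.0", "Django 4.1", "Spring Boot 3.1"]
)

# (?P<name>...) and (?P<version>...) are named capture groups.
# The lazy .+? lets the name contain spaces ("Spring Boot"),
# while the version must be the last token in the string.
pattern = r"^(?P<name>.+?)\s+(?P<version>\d+(?:\.\d+){0,2})$"

# The result is a DataFrame with one column per named group
parts = frameworks.str.extract(pattern)
print(parts)
```

Rows that do not match the pattern get missing values in all extracted columns, so it is worth checking the result for missing values before using it.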
Conclusion
Congratulations! Now you are one step closer to becoming an AI Expert. You have seen how to use regular expressions (regex) to filter, replace and extract strings in a Pandas DataFrame based on specific patterns, using the Pandas methods str.contains(), str.replace() and str.extract(). Try it yourself!