Thursday, 1 February 2024

Pandas: Filter DataFrame rows using the contains() method

Using the str.contains() method, we can filter DataFrame rows based on whether a column value contains a given substring.
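Two keyword arguments of str.contains() matter when you use its result as a row filter. By default the pattern is treated as a regular expression (regex=True), so characters such as '.', '+' or '(' have special meaning; pass regex=False for a plain substring match. Missing values (NaN) produce NaN in the result rather than True/False, and indexing a DataFrame with such a mask raises an error; na=False treats missing values as non-matches. The small 'Skills' DataFrame below is hypothetical, made up only to illustrate both arguments.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing value and one entry containing a literal '+'
df = pd.DataFrame({'Skills': ['C++, Java', 'Python', np.nan, 'C, Go']})

# regex=False: search for the literal text 'C++' ('+' would otherwise be a regex quantifier)
# na=False: treat the missing value as "does not contain", so the mask is purely Boolean
mask = df['Skills'].str.contains('C++', regex=False, na=False)

print(mask.tolist())  # [True, False, False, False]
print(df[mask])       # only the 'C++, Java' row
```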

 

Example 1: Filter rows where the column value contains a given substring.

book_in_hobbies = df['Hobbies'].str.lower().str.contains('book')
persons_hobby_contain_book = df[book_in_hobbies]
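The Boolean Series returned by contains() can also be inverted with the ~ operator to keep the rows that do not match. This sketch rebuilds the 'Name' and 'Hobbies' columns of this post's sample data so that it runs on its own; the variable names readers and non_readers are just illustrative.

```python
import pandas as pd

# Subset of the sample data used in this post
df = pd.DataFrame({
    'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', 'Krishna'],
    'Hobbies': ['Football,Cricket', 'Tennis, cricket', 'Trekking, reading books',
                'Chess', 'Read Books', 'Cricket'],
})

# True where the lower-cased hobby text contains 'book'
book_in_hobbies = df['Hobbies'].str.lower().str.contains('book')

readers = df[book_in_hobbies]        # rows whose hobbies contain 'book'
non_readers = df[~book_in_hobbies]   # ~ flips True/False, keeping all other rows

print(readers['Name'].tolist())      # ['Joel', 'Jitendra']
print(non_readers['Name'].tolist())  # ['Krishna', 'Sailu', 'Chamu', 'Krishna']
```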

We can also perform a case-insensitive search without lower-casing the column first, by passing case=False to the contains() method.

 

book_in_hobbies = df['Hobbies'].str.contains('book', case=False)
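To check that lower-casing the column and passing case=False select the same rows, the two masks can be compared with Series.equals(). The Hobbies values below are copied from this post's sample data.

```python
import pandas as pd

hobbies = pd.Series(['Football,Cricket', 'Tennis, cricket', 'Trekking, reading books',
                     'Chess', 'Read Books', 'Cricket'], name='Hobbies')

# Approach 1: lower-case every value, then search for a lower-case substring
lowered = hobbies.str.lower().str.contains('book')

# Approach 2: let contains() ignore case itself
ignore_case = hobbies.str.contains('book', case=False)

print(lowered.equals(ignore_case))  # True
```

For a plain substring like 'book' the choice is mostly a matter of taste; case=False keeps the intent in a single call.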

Example 2: Check for substring in multiple columns

contains_c = df[['Name', 'City', 'Hobbies']].apply(lambda x: x.str.contains('c', case=False))

We applied the contains() method to the columns 'Name', 'City' and 'Hobbies' using the apply() method. The result is a DataFrame in which each cell holds a Boolean value indicating whether the corresponding element contains the substring 'c' (ignoring case).
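To turn that Boolean DataFrame into an actual row filter, one option is to reduce each row with any(axis=1) (at least one column matches) or all(axis=1) (every column matches), then index the original DataFrame with the result. A sketch using the 'Name', 'City' and 'Hobbies' columns of this post's sample data:

```python
import pandas as pd

# Subset of the sample data used in this post
df = pd.DataFrame({
    'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', 'Krishna'],
    'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
    'Hobbies': ['Football,Cricket', 'Tennis, cricket', 'Trekking, reading books',
                'Chess', 'Read Books', 'Cricket'],
})

# Boolean DataFrame: one True/False value per cell
contains_c = df[['Name', 'City', 'Hobbies']].apply(lambda x: x.str.contains('c', case=False))

# Rows where at least one of the three columns contains 'c'
any_c = df[contains_c.any(axis=1)]

# Rows where all three columns contain 'c'
all_c = df[contains_c.all(axis=1)]

print(any_c.index.tolist())  # [0, 1, 3, 5]
print(all_c.index.tolist())  # [3]
```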

 

Find the complete working application below.

 

contains.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', "Krishna"],
        'Age': [34, 35, 234, 35, 52, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
        'Hobbies': ['Football,Cricket', 'Tennis, cricket', 'Trekking, reading books', 'Chess', 'Read Books', 'Cricket']}
df = pd.DataFrame(data)

print('Original DataFrame')
print(df)

# Convert the column values to lower case to perform the case insensitive search
book_in_hobbies = df['Hobbies'].str.lower().str.contains('book')
persons_hobby_contain_book = df[book_in_hobbies]

print('\nBoolean series to find the people whose hobby contain the string "book"\n', book_in_hobbies)
print('\nPeople whose hobby contains the string "book"\n',persons_hobby_contain_book)

book_in_hobbies = df['Hobbies'].str.contains('book', case=False)
persons_hobby_contain_book = df[book_in_hobbies]
print('\nBoolean series to find the people whose hobby contain the string "book"\n', book_in_hobbies)
print('\nPeople whose hobby contains the string "book"\n',persons_hobby_contain_book)

# Check for substring in multiple columns
contains_c = df[['Name', 'City', 'Hobbies']].apply(lambda x: x.str.contains('c', case=False))
print('\nRow that contains the character "c" case insensitive : \n', contains_c)

Output

Original DataFrame
       Name  Age       City                  Hobbies
0   Krishna   34  Bangalore         Football,Cricket
1     Sailu   35  Hyderabad          Tennis, cricket
2      Joel  234  Hyderabad  Trekking, reading books
3     Chamu   35    Chennai                    Chess
4  Jitendra   52  Bangalore               Read Books
5   Krishna   34    Chennai                  Cricket

Boolean series to find the people whose hobby contain the string "book"
 0    False
1    False
2     True
3    False
4     True
5    False
Name: Hobbies, dtype: bool

People whose hobby contains the string "book"
        Name  Age       City                  Hobbies
2      Joel  234  Hyderabad  Trekking, reading books
4  Jitendra   52  Bangalore               Read Books

Boolean series to find the people whose hobby contain the string "book"
 0    False
1    False
2     True
3    False
4     True
5    False
Name: Hobbies, dtype: bool

People whose hobby contains the string "book"
        Name  Age       City                  Hobbies
2      Joel  234  Hyderabad  Trekking, reading books
4  Jitendra   52  Bangalore               Read Books

Row that contains the character "c" case insensitive : 
     Name   City  Hobbies
0  False  False     True
1  False  False     True
2  False  False    False
3   True   True     True
4  False  False    False
5  False   True     True
