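Using the str.contains() method, we can filter DataFrame rows based on whether a column value contains a given substring. The method returns a Boolean Series that can be used to index the DataFrame.
Example 1: Filter rows where the column value contains a given substring.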
book_in_hobbies = df['Hobbies'].str.lower().str.contains('book')
persons_hobby_contain_book = df[book_in_hobbies]
The snippet above lowercases the column values before searching. We can achieve the same case-insensitive search by setting the case argument to False when calling the contains() method.
book_in_hobbies = df['Hobbies'].str.contains('book', case=False)
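Two other arguments of str.contains() are often useful alongside case. The following is a minimal sketch, assuming a small hypothetical DataFrame in which one 'Hobbies' value is missing: na=False makes missing values count as "no match" so the mask can be used for filtering, and regex=False treats the search text as a literal string rather than a regular expression.

```python
import pandas as pd

# Hypothetical sample data with one missing hobby (not the data used in this post)
df = pd.DataFrame({
    'Name': ['Krishna', 'Sailu', 'Joel'],
    'Hobbies': ['Football,Cricket', None, 'Trekking, reading books'],
})

# case=False  -> ignore upper/lower case
# na=False    -> treat missing values as "does not contain"
# regex=False -> match 'book' as plain text, not as a regular expression
mask = df['Hobbies'].str.contains('book', case=False, na=False, regex=False)

# Keep only the rows where the mask is True
result = df[mask]
print(result)
```

Without na=False, a missing value would produce a missing entry in the mask, which can cause an error when the mask is used to filter rows.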
Example 2: Check for a substring in multiple columns.
contains_c = df[['Name', 'City', 'Hobbies']].apply(lambda x: x.str.contains('c', case=False))
We applied the contains() method to the columns 'Name', 'City', and 'Hobbies' using the apply() method. The result is a DataFrame in which each cell holds a Boolean value indicating whether the corresponding element contains the substring 'c' (ignoring case).
Find the complete working application below.
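The Boolean DataFrame can also be reduced to a single row mask. The sketch below assumes a shortened, hypothetical version of the sample data and uses any(axis=1) to keep the rows in which at least one of the checked columns contains 'c'; the variable name rows_with_c is only illustrative.

```python
import pandas as pd

# Shortened, hypothetical sample data
df = pd.DataFrame({
    'Name': ['Krishna', 'Joel', 'Chamu'],
    'City': ['Bangalore', 'Hyderabad', 'Chennai'],
    'Hobbies': ['Football,Cricket', 'Trekking, reading books', 'Chess'],
})

# One Boolean column per checked column
contains_c = df[['Name', 'City', 'Hobbies']].apply(
    lambda col: col.str.contains('c', case=False)
)

# any(axis=1) is True for a row if any of its cells is True
rows_with_c = df[contains_c.any(axis=1)]
print(rows_with_c)
```

Replacing any(axis=1) with all(axis=1) would instead keep only the rows in which every checked column contains the substring.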
contains.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', 'Krishna'],
        'Age': [34, 35, 234, 35, 52, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
        'Hobbies': ['Football,Cricket', 'Tennis, cricket', 'Trekking, reading books', 'Chess', 'Read Books', 'Cricket']}
df = pd.DataFrame(data)
print('Original DataFrame')
print(df)
# Convert the column values to lower case to perform the case insensitive search
book_in_hobbies = df['Hobbies'].str.lower().str.contains('book')
persons_hobby_contain_book = df[book_in_hobbies]
print('\nBoolean series to find the people whose hobby contain the string "book"\n', book_in_hobbies)
print('\nPeople whose hobby contains the string "book"\n',persons_hobby_contain_book)
book_in_hobbies = df['Hobbies'].str.contains('book', case=False)
persons_hobby_contain_book = df[book_in_hobbies]
print('\nBoolean series to find the people whose hobby contain the string "book"\n', book_in_hobbies)
print('\nPeople whose hobby contains the string "book"\n',persons_hobby_contain_book)
# Check for substring in multiple columns
contains_c = df[['Name', 'City', 'Hobbies']].apply(lambda x: x.str.contains('c', case=False))
print('\nRow that contains the character "c" case insensitive : \n', contains_c)
Output
Original DataFrame
       Name  Age       City                  Hobbies
0   Krishna   34  Bangalore         Football,Cricket
1     Sailu   35  Hyderabad          Tennis, cricket
2      Joel  234  Hyderabad  Trekking, reading books
3     Chamu   35    Chennai                    Chess
4  Jitendra   52  Bangalore               Read Books
5   Krishna   34    Chennai                  Cricket

Boolean series to find the people whose hobby contain the string "book"
 0    False
1    False
2     True
3    False
4     True
5    False
Name: Hobbies, dtype: bool

People whose hobby contains the string "book"
        Name  Age       City                  Hobbies
2      Joel  234  Hyderabad  Trekking, reading books
4  Jitendra   52  Bangalore               Read Books

Boolean series to find the people whose hobby contain the string "book"
 0    False
1    False
2     True
3    False
4     True
5    False
Name: Hobbies, dtype: bool

People whose hobby contains the string "book"
        Name  Age       City                  Hobbies
2      Joel  234  Hyderabad  Trekking, reading books
4  Jitendra   52  Bangalore               Read Books

Row that contains the character "c" case insensitive : 
     Name   City  Hobbies
0  False  False     True
1  False  False     True
2  False  False    False
3   True   True     True
4  False  False    False
5  False   True     True