Using the str.contains() method, we can filter DataFrame rows based on whether a column value contains a given substring.
Example 1: Filter rows where the column value contains a given substring.
book_in_hobbies = df['Hobbies'].str.lower().str.contains('book')
persons_hobby_contain_book = df[book_in_hobbies]
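One thing to watch for: if the column has missing values, str.contains() returns NaN for those rows, and indexing a DataFrame with a mask that contains NaN raises a ValueError. The minimal sketch below passes na=False so that missing values count as "no match". It also passes regex=False, because contains() treats the pattern as a regular expression by default; for a plain word like 'book' both settings give the same result. The small DataFrame, including the 'Ravi' row and its missing hobby, is hypothetical and made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the 'Ravi' row has a missing (None) hobby
df = pd.DataFrame({
    'Name': ['Krishna', 'Joel', 'Jitendra', 'Ravi'],
    'Hobbies': ['Football,Cricket', 'Trekking, reading books', 'Read Books', None],
})

# regex=False: match 'book' literally instead of as a regular expression
# na=False: treat missing values as "does not contain", keeping the mask boolean
book_in_hobbies = df['Hobbies'].str.lower().str.contains('book', regex=False, na=False)
persons_hobby_contain_book = df[book_in_hobbies]
print(persons_hobby_contain_book)
```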
We can also perform a case-insensitive search by setting the case argument to False when calling the contains() method, instead of lowercasing the column first.
book_in_hobbies = df['Hobbies'].str.contains('book', case=False)
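To check that the two approaches agree, here is a small self-contained comparison on a subset of the sample hobbies. It assumes nothing beyond pandas itself.

```python
import pandas as pd

df = pd.DataFrame({'Hobbies': ['Football,Cricket', 'Trekking, reading books', 'Chess', 'Read Books']})

# Approach 1: lowercase the column, then search for the lowercase substring
lowered = df['Hobbies'].str.lower().str.contains('book')

# Approach 2: let contains() ignore case directly
no_case = df['Hobbies'].str.contains('book', case=False)

# Both approaches produce the same Boolean mask
print(lowered.equals(no_case))  # True
print(df[no_case])
```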
Example 2: Check for a substring in multiple columns
contains_c = df[['Name', 'City', 'Hobbies']].apply(lambda x: x.str.contains('c', case=False))
Here we apply the contains() method to the 'Name', 'City', and 'Hobbies' columns using the apply() method. The result is a DataFrame in which each cell holds a Boolean value indicating whether the corresponding element contains the substring 'c' (case-insensitive).
The complete working application is given below.
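The Boolean DataFrame marks individual cells rather than rows. To filter whole rows, one option (a sketch, not the only way) is to reduce the mask across columns with any(axis=1) or all(axis=1). The three-row DataFrame below is a trimmed copy of the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Krishna', 'Joel', 'Chamu'],
    'City': ['Bangalore', 'Hyderabad', 'Chennai'],
    'Hobbies': ['Football,Cricket', 'Trekking, reading books', 'Chess'],
})

# Cell-level mask: True where a cell contains 'c' (case-insensitive)
contains_c = df[['Name', 'City', 'Hobbies']].apply(lambda x: x.str.contains('c', case=False))

# Row-level filters: any() keeps rows where at least one listed column matches,
# all() keeps rows where every listed column matches
rows_any = df[contains_c.any(axis=1)]
rows_all = df[contains_c.all(axis=1)]
print(rows_any)
print(rows_all)
```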
contains.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', "Krishna"],
'Age': [34, 35, 234, 35, 52, 34],
'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
'Hobbies': ['Football,Cricket', 'Tennis, cricket', 'Trekking, reading books', 'Chess', 'Read Books', 'Cricket']}
df = pd.DataFrame(data)
print('Original DataFrame')
print(df)
# Convert the column values to lowercase to perform a case-insensitive search
book_in_hobbies = df['Hobbies'].str.lower().str.contains('book')
persons_hobby_contain_book = df[book_in_hobbies]
print('\nBoolean series to find the people whose hobby contain the string "book"\n', book_in_hobbies)
print('\nPeople whose hobby contains the string "book"\n',persons_hobby_contain_book)
# Alternatively, pass case=False to contains() for a case-insensitive search
book_in_hobbies = df['Hobbies'].str.contains('book', case=False)
persons_hobby_contain_book = df[book_in_hobbies]
print('\nBoolean series to find the people whose hobby contain the string "book"\n', book_in_hobbies)
print('\nPeople whose hobby contains the string "book"\n',persons_hobby_contain_book)
# Check for substring in multiple columns
contains_c = df[['Name', 'City', 'Hobbies']].apply(lambda x: x.str.contains('c', case=False))
print('\nRow that contains the character "c" case insensitive : \n', contains_c)
Output
Original DataFrame
       Name  Age       City                  Hobbies
0   Krishna   34  Bangalore         Football,Cricket
1     Sailu   35  Hyderabad          Tennis, cricket
2      Joel  234  Hyderabad  Trekking, reading books
3     Chamu   35    Chennai                    Chess
4  Jitendra   52  Bangalore               Read Books
5   Krishna   34    Chennai                  Cricket

Boolean series to find the people whose hobby contain the string "book"
 0    False
1    False
2     True
3    False
4     True
5    False
Name: Hobbies, dtype: bool

People whose hobby contains the string "book"
        Name  Age       City                  Hobbies
2      Joel  234  Hyderabad  Trekking, reading books
4  Jitendra   52  Bangalore               Read Books

Boolean series to find the people whose hobby contain the string "book"
 0    False
1    False
2     True
3    False
4     True
5    False
Name: Hobbies, dtype: bool

People whose hobby contains the string "book"
        Name  Age       City                  Hobbies
2      Joel  234  Hyderabad  Trekking, reading books
4  Jitendra   52  Bangalore               Read Books

Row that contains the character "c" case insensitive :
     Name   City  Hobbies
0  False  False     True
1  False  False     True
2  False  False    False
3   True   True     True
4  False  False    False
5  False   True     True