In this post, I am going to explain how to filter a DataFrame using comparison operators.
I am going to use below data set to demonstrate the examples.
Name Age City Gender Percentage 0 Krishna 34 Bangalore Male 81 1 Sailu 35 Hyderabad Female 76 2 Joel 29 Hyderabad Male 67 3 Chamu 35 Chennai Female 100 4 Jitendra 52 Bangalore Male 87 5 Raj 34 Chennai Male 89
Following comparison operators supported by Pandas to filter the data set.
a. == (equal to)
b. != (not equal to)
c. > (greater than)
d. < (less than)
e. >= (greater than or equal to)
f. <= (less than or equal to)
Extract the rows with index 2 and 5
To extract the rows with index 2 and 5, we need to define a Boolean series, where the values at indexes 2 and 5 are set to True and set to False at remaining places.
0 False 1 False 2 True 3 False 4 False 5 True
my_list = [False, False, True, False, False, True]
my_series = pd.Series(my_list)
Pass the Boolean series to data frame to extract the rows at index 2 and 5. In this case, df[my_series] return the rows at indexes 2 and 5.
Name Age City Gender Percentage 2 Joel 29 Hyderabad Male 67 5 Raj 34 Chennai Male 89
Extract the rows where Gender is Male
male_gender_boolean_series = (df['Gender'] == 'Male')
male_gender_rows = df[male_gender_boolean_series]
Above statement return below rows.
Name Age City Gender Percentage 0 Krishna 34 Bangalore Male 81 2 Joel 29 Hyderabad Male 67 4 Jitendra 52 Bangalore Male 87 5 Raj 34 Chennai Male 89
Extract the rows where the Percentage is > 85
percentage_greater_85_boolean_series = (df['Percentage'] > 85)
percentage_greater_85_rows = df[percentage_greater_85_boolean_series]
Above statement return below rows.
Name Age City Gender Percentage 3 Chamu 35 Chennai Female 100 4 Jitendra 52 Bangalore Male 87 5 Raj 34 Chennai Male 89
filter_by_comparison_operator.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', "Raj"],
'Age': [34, 35, 29, 35, 52, 34],
'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Male'],
'Percentage': [81, 76, 67, 100, 87, 89]}
df = pd.DataFrame(data)
print('Original DataFrame')
print(df)
print('\nExtract the rows with index 2 and 5')
my_list = [False, False, True, False, False, True]
my_series = pd.Series(my_list)
print(df[my_series])
print('\nExtract the rows where gender is Male')
male_gender_boolean_series = (df['Gender'] == 'Male')
male_gender_rows = df[male_gender_boolean_series]
print(male_gender_rows)
print('\nExtract the rows where Percentage > 85')
percentage_greater_85_boolean_series = (df['Percentage'] > 85)
percentage_greater_85_rows = df[percentage_greater_85_boolean_series]
print(percentage_greater_85_rows)
Output
Original DataFrame Name Age City Gender Percentage 0 Krishna 34 Bangalore Male 81 1 Sailu 35 Hyderabad Female 76 2 Joel 29 Hyderabad Male 67 3 Chamu 35 Chennai Female 100 4 Jitendra 52 Bangalore Male 87 5 Raj 34 Chennai Male 89 Extract the rows with index 2 and 5 Name Age City Gender Percentage 2 Joel 29 Hyderabad Male 67 5 Raj 34 Chennai Male 89 Extract the rows where gender is Male Name Age City Gender Percentage 0 Krishna 34 Bangalore Male 81 2 Joel 29 Hyderabad Male 67 4 Jitendra 52 Bangalore Male 87 5 Raj 34 Chennai Male 89 Extract the rows where Percentage > 85 Name Age City Gender Percentage 3 Chamu 35 Chennai Female 100 4 Jitendra 52 Bangalore Male 87 5 Raj 34 Chennai Male 89
No comments:
Post a Comment