In this post and the posts that follow, I am going to explain group by operations on a pandas DataFrame.
What is a Group By operation?
A Group By operation splits the rows of a DataFrame into groups, where all rows in a group share the same value(s) in one or more chosen columns.
Let’s use the data set below to demonstrate the examples.
Name Age City Gender
0 Krishna 34 Bangalore Male
1 Chamu 25 Chennai Female
2 Joel 29 Hyderabad Male
3 Gopi 41 Hyderabad Male
4 Sravya 52 Bangalore Female
5 Raj 23 Chennai Male
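Conceptually, grouping maps each distinct key value to the rows that carry it. The plain-Python sketch below illustrates that idea for the City column of the data set above. It is only an illustration of the concept (no pandas involved), not how pandas implements groupby internally; the names `rows` and `groups` are made up for this example.

```python
# Conceptual sketch only: map each City to the positions of the rows that have it.
# pandas' real groupby is far more optimized; this just illustrates the idea.
from collections import defaultdict

rows = [
    ("Krishna", 34, "Bangalore", "Male"),
    ("Chamu", 25, "Chennai", "Female"),
    ("Joel", 29, "Hyderabad", "Male"),
    ("Gopi", 41, "Hyderabad", "Male"),
    ("Sravya", 52, "Bangalore", "Female"),
    ("Raj", 23, "Chennai", "Male"),
]

# Each key (city) collects the row positions that share it
groups = defaultdict(list)
for index, (name, age, city, gender) in enumerate(rows):
    groups[city].append(index)

# Print the groups in sorted key order
for city in sorted(groups):
    print(city, groups[city])
```

Each list holds the positions of the rows that belong to one city, which is exactly the partition of rows that pandas builds when you group by City.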
Group by city
group_by_city = df.groupby('City')
df.groupby('City') returns an object of type ‘DataFrameGroupBy’, in which the rows are grouped by ‘City’. You can visualize the data like this:
Group Name: Bangalore
Name Age City Gender
0 Krishna 34 Bangalore Male
4 Sravya 52 Bangalore Female
Group Name: Chennai
Name Age City Gender
1 Chamu 25 Chennai Female
5 Raj 23 Chennai Male
Group Name: Hyderabad
Name Age City Gender
2 Joel 29 Hyderabad Male
3 Gopi 41 Hyderabad Male
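Besides iterating over it, a DataFrameGroupBy object can be inspected directly. The sketch below rebuilds the sample DataFrame and uses three standard GroupBy members: `ngroups` (the number of groups), `groups` (a mapping from each group name to its row labels) and `get_group()` (one group returned as a DataFrame). The exact printed format of `groups` may differ between pandas versions.

```python
import pandas as pd

# Same sample data as in this post
data = {'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', 'Raj'],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

group_by_city = df.groupby('City')

# Number of distinct groups (one per city)
print(group_by_city.ngroups)

# Mapping of group name -> row index labels belonging to that group
print(group_by_city.groups)

# Fetch a single group as an ordinary DataFrame
chennai = group_by_city.get_group('Chennai')
print(chennai)
```

`get_group()` is handy when you only need one group and do not want to loop over all of them.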
Group By Gender and City
To group by more than one column, pass a list of column names to the groupby method. Each group is then identified by a tuple of values, one per column, in the same order as the list.
group_by_gender_city = df.groupby(['Gender', 'City'])
The groups in ‘group_by_gender_city’ look like this:
Group Name: ('Female', 'Bangalore')
Name Age City Gender
4 Sravya 52 Bangalore Female
Group Name: ('Female', 'Chennai')
Name Age City Gender
1 Chamu 25 Chennai Female
Group Name: ('Male', 'Bangalore')
Name Age City Gender
0 Krishna 34 Bangalore Male
Group Name: ('Male', 'Chennai')
Name Age City Gender
5 Raj 23 Chennai Male
Group Name: ('Male', 'Hyderabad')
Name Age City Gender
2 Joel 29 Hyderabad Male
3 Gopi 41 Hyderabad Male
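Because each group name is a tuple when grouping by several columns, the same tuple is used to fetch a single group. The sketch below rebuilds the sample DataFrame, fetches the ('Male', 'Hyderabad') group with `get_group()`, and uses the standard GroupBy `size()` method to count the rows in every group; the result is a Series indexed by (Gender, City).

```python
import pandas as pd

# Same sample data as in this post
data = {'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', 'Raj'],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

group_by_gender_city = df.groupby(['Gender', 'City'])

# With several keys, the group name is a tuple in the same order as the column list
male_hyderabad = group_by_gender_city.get_group(('Male', 'Hyderabad'))
print(male_hyderabad)

# size() counts the rows in each group; the index has one level per grouping column
counts = group_by_gender_city.size()
print(counts)
```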
Here is the complete working application.
hello_world.py
import pandas as pd

# Print the content of a DataFrameGroupBy object, one group at a time
def print_group_by_result(group_by_object, label):
    print('*' * 50)
    print(label, '\n')
    # Iterating over a GroupBy object yields (group name, sub-DataFrame) pairs
    for group_name, group_data in group_by_object:
        print("Group Name:", group_name)
        print(group_data)
        print()
    print('*' * 50)

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', 'Raj'],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)
print(df)

# Group by a single column
group_by_city = df.groupby('City')
print('\nGroup by city is')
print('type of group_by_city is : ', type(group_by_city))
print_group_by_result(group_by_city, 'Group by city details')

# Group by two columns; each group name is a (Gender, City) tuple
group_by_gender_city = df.groupby(['Gender', 'City'])
print('\nGroup by Gender and City is')
print('type of group_by_gender_city is : ', type(group_by_gender_city))
print_group_by_result(group_by_gender_city, 'Group by Gender and City details')
Output
Name Age City Gender
0 Krishna 34 Bangalore Male
1 Chamu 25 Chennai Female
2 Joel 29 Hyderabad Male
3 Gopi 41 Hyderabad Male
4 Sravya 52 Bangalore Female
5 Raj 23 Chennai Male
Group by city is
type of group_by_city is : <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
**************************************************
Group by city details
Group Name: Bangalore
Name Age City Gender
0 Krishna 34 Bangalore Male
4 Sravya 52 Bangalore Female
Group Name: Chennai
Name Age City Gender
1 Chamu 25 Chennai Female
5 Raj 23 Chennai Male
Group Name: Hyderabad
Name Age City Gender
2 Joel 29 Hyderabad Male
3 Gopi 41 Hyderabad Male
**************************************************
Group by Gender and City is
type of group_by_gender_city is : <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
**************************************************
Group by Gender and City details
Group Name: ('Female', 'Bangalore')
Name Age City Gender
4 Sravya 52 Bangalore Female
Group Name: ('Female', 'Chennai')
Name Age City Gender
1 Chamu 25 Chennai Female
Group Name: ('Male', 'Bangalore')
Name Age City Gender
0 Krishna 34 Bangalore Male
Group Name: ('Male', 'Chennai')
Name Age City Gender
5 Raj 23 Chennai Male
Group Name: ('Male', 'Hyderabad')
Name Age City Gender
2 Joel 29 Hyderabad Male
3 Gopi 41 Hyderabad Male
**************************************************
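In the output above, the groups appear in sorted key order because `groupby` sorts the group keys by default (`sort=True`). The sketch below compares that with `sort=False`, which, per the pandas documentation, keeps the groups in the order their keys first appear in the DataFrame. The variable names `sorted_names` and `unsorted_names` are just for this illustration.

```python
import pandas as pd

# Same sample data as in this post
data = {'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', 'Raj'],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Default (sort=True): group names come out in sorted order
sorted_names = [name for name, _ in df.groupby(['Gender', 'City'])]

# sort=False: group names come out in order of first appearance in df
unsorted_names = [name for name, _ in df.groupby(['Gender', 'City'], sort=False)]

print(sorted_names)
print(unsorted_names)
```

Skipping the sort can also save a little work on large data when the group order does not matter.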