In this post and the subsequent posts, I am going to explain the group by operations on a DataFrame.
What is a Group By operation?
A Group By operation splits the rows of a DataFrame into groups, where all rows in a group share the same value in one or more columns.
Let’s use the data set below to demonstrate the examples.
      Name  Age       City  Gender
0  Krishna   34  Bangalore    Male
1    Chamu   25    Chennai  Female
2     Joel   29  Hyderabad    Male
3     Gopi   41  Hyderabad    Male
4   Sravya   52  Bangalore  Female
5      Raj   23    Chennai    Male
Group By City
group_by_city = df.groupby('City')
df.groupby('City') returns an object of type ‘DataFrameGroupBy’, in which the rows are grouped by ‘City’. You can visualize the grouped data as below.
Group Name: Bangalore
      Name  Age       City  Gender
0  Krishna   34  Bangalore    Male
4   Sravya   52  Bangalore  Female

Group Name: Chennai
    Name  Age     City  Gender
1  Chamu   25  Chennai  Female
5    Raj   23  Chennai    Male

Group Name: Hyderabad
   Name  Age       City  Gender
2  Joel   29  Hyderabad    Male
3  Gopi   41  Hyderabad    Male
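Apart from looping over the groups, a ‘DataFrameGroupBy’ object can be inspected directly. The snippet below is a minimal sketch that rebuilds the same sample data and uses the ngroups attribute, the groups attribute and the get_group method; the variable names city_count, city_to_rows and hyderabad_rows are only illustrative.

```python
import pandas as pd

# Same sample data as used in this post
df = pd.DataFrame({
    'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', 'Raj'],
    'Age': [34, 25, 29, 41, 52, 23],
    'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male'],
})

group_by_city = df.groupby('City')

# Number of distinct groups (one per city)
city_count = group_by_city.ngroups

# Map each group name to the row labels that belong to it
city_to_rows = {city: list(rows) for city, rows in group_by_city.groups.items()}

# Rows of a single group, returned as a regular DataFrame
hyderabad_rows = group_by_city.get_group('Hyderabad')

print(city_count)
print(city_to_rows)
print(hyderabad_rows)
```

get_group is handy when you only need one group and do not want to iterate over all of them.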
Group By Gender and City
If you want to group by more than one column, pass a list of column names as the argument to the groupby method.
group_by_gender_city = df.groupby(['Gender', 'City'])
You can visualize the data of ‘group_by_gender_city’ as below. Each group name is a tuple with one value per grouping column.
Group Name: ('Female', 'Bangalore')
     Name  Age       City  Gender
4  Sravya   52  Bangalore  Female

Group Name: ('Female', 'Chennai')
    Name  Age     City  Gender
1  Chamu   25  Chennai  Female

Group Name: ('Male', 'Bangalore')
      Name  Age       City  Gender
0  Krishna   34  Bangalore    Male

Group Name: ('Male', 'Chennai')
   Name  Age     City  Gender
5   Raj   23  Chennai    Male

Group Name: ('Male', 'Hyderabad')
   Name  Age       City  Gender
2  Joel   29  Hyderabad    Male
3  Gopi   41  Hyderabad    Male
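When grouping by several columns, a single group is looked up with a tuple key. The following is a small sketch on the same sample data; the names group_count, group_keys and male_hyderabad are only illustrative. Note that only the (Gender, City) combinations that actually occur in the data become groups, so there is no ('Female', 'Hyderabad') group here.

```python
import pandas as pd

# Same sample data as used in this post
df = pd.DataFrame({
    'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', 'Raj'],
    'Age': [34, 25, 29, 41, 52, 23],
    'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male'],
})

group_by_gender_city = df.groupby(['Gender', 'City'])

# Number of (Gender, City) combinations present in the data
group_count = group_by_gender_city.ngroups

# Each group name is a tuple: (gender, city)
group_keys = list(group_by_gender_city.groups.keys())

# Look up one group by passing the tuple key
male_hyderabad = group_by_gender_city.get_group(('Male', 'Hyderabad'))

print(group_count)
print(group_keys)
print(male_hyderabad)
```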
Find the complete working application below.
hello_world.py
import pandas as pd


# Print the content of DataFrameGroupBy object
def print_group_by_result(group_by_object, label):
    print('*' * 50)
    print(label, '\n')
    # Iterating a DataFrameGroupBy yields (group name, group rows) pairs
    for group_name, group_data in group_by_object:
        print("Group Name:", group_name)
        print(group_data)
        print()
    print('*' * 50)


# Create a sample DataFrame
data = {'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', "Raj"],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)
print(df)

# Group by a single column
group_by_city = df.groupby('City')
print('\nGroup by city is')
print('type of group_by_city is : ', type(group_by_city))
print_group_by_result(group_by_city, 'Group by city details')

# Group by more than one column
group_by_gender_city = df.groupby(['Gender', 'City'])
print('\nGroup by Gender and City is')
print('type of group_by_gender_city is : ', type(group_by_gender_city))
print_group_by_result(group_by_gender_city, 'Group by Gender and City details')
Output
      Name  Age       City  Gender
0  Krishna   34  Bangalore    Male
1    Chamu   25    Chennai  Female
2     Joel   29  Hyderabad    Male
3     Gopi   41  Hyderabad    Male
4   Sravya   52  Bangalore  Female
5      Raj   23    Chennai    Male

Group by city is
type of group_by_city is : <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
**************************************************
Group by city details

Group Name: Bangalore
      Name  Age       City  Gender
0  Krishna   34  Bangalore    Male
4   Sravya   52  Bangalore  Female

Group Name: Chennai
    Name  Age     City  Gender
1  Chamu   25  Chennai  Female
5    Raj   23  Chennai    Male

Group Name: Hyderabad
   Name  Age       City  Gender
2  Joel   29  Hyderabad    Male
3  Gopi   41  Hyderabad    Male

**************************************************

Group by Gender and City is
type of group_by_gender_city is : <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
**************************************************
Group by Gender and City details

Group Name: ('Female', 'Bangalore')
     Name  Age       City  Gender
4  Sravya   52  Bangalore  Female

Group Name: ('Female', 'Chennai')
    Name  Age     City  Gender
1  Chamu   25  Chennai  Female

Group Name: ('Male', 'Bangalore')
      Name  Age       City  Gender
0  Krishna   34  Bangalore    Male

Group Name: ('Male', 'Chennai')
   Name  Age     City  Gender
5   Raj   23  Chennai    Male

Group Name: ('Male', 'Hyderabad')
   Name  Age       City  Gender
2  Joel   29  Hyderabad    Male
3  Gopi   41  Hyderabad    Male

**************************************************