Wednesday 24 April 2024

Pandas: Apply one or more aggregation functions on groupby data

Using ‘agg’ method of DataFrameGroupBy object, we can apply one or more aggregation functions on groupby data.

I am using below data set to demonstrate the examples.

      Name  Age       City  Gender  Weight
0  Krishna   34  Bangalore    Male      74
1    Chamu   25    Chennai  Female      58
2     Joel   29  Hyderabad    Male      85
3     Gopi   41  Hyderabad    Male      87
4   Sravya   52  Bangalore  Female      63
5      Raj   23    Chennai    Male      79

Let’s aggregate the data by City column prior to applying the aggregation functions.

Group Name: Bangalore
      Name  Age       City  Gender  Weight
0  Krishna   34  Bangalore    Male      74
4   Sravya   52  Bangalore  Female      63

Group Name: Chennai
    Name  Age     City  Gender  Weight
1  Chamu   25  Chennai  Female      58
5    Raj   23  Chennai    Male      79

Group Name: Hyderabad
   Name  Age       City Gender  Weight
2  Joel   29  Hyderabad   Male      85
3  Gopi   41  Hyderabad   Male      87

Example 1: Applying ‘sum’ function on Age column and ‘mean’ function on ‘Weight’ column.

agg_result = group_by_city.agg({
    'Age' : 'sum',
    'Weight' : 'mean'
})

In the above example, to apply multiple aggregation functions to specific columns, I passed a dictionary to the agg() method. The keys of the dictionary represent the column names, and the values can be either a single aggregation function or a list of multiple aggregation functions.

 

‘agg_result’ point to below dataframe.

           Age  Weight
City                  
Bangalore   86    68.5
Chennai     48    68.5
Hyderabad   70    86.0

Example 2: Perform multiple operations on a given column

agg_result = group_by_city.agg({
    'Age' : ['sum', 'size'],
    'Weight' : 'mean'
})

Above snippet apply 'sum', 'size' functions on ‘Age’ column and ‘mean’ function on ‘Weight’ column.

 ‘agg_result’ point to below dataset.


          Age      Weight
          sum size   mean
City                     
Bangalore  86    2   68.5
Chennai    48    2   68.5
Hyderabad  70    2   86.0 

Example 3: If you have only numeric columns in the data frame, and want to apply same set of operations on the columns, then you can pass list of operations to the agg method like below.

group_by_city = df[['Age', 'City', 'Weight']].groupby('City')
agg_result = group_by_city.agg(['sum', 'mean', 'size'])

‘agg_result’ point to below dataset.

          Age            Weight           
          sum  mean size    sum  mean size
City                                      
Bangalore  86  43.0    2    137  68.5    2
Chennai    48  24.0    2    137  68.5    2
Hyderabad  70  35.0    2    172  86.0    2

Above snippet applies sum, mean and size operations on ‘Age’ and ‘Weight’ columns.

 

Find the below working Application.

 

one_or_more_aggregations.py

import pandas as pd

# Print the content of DataFrameGroupBy object
def print_group_by_result(group_by_object, label):
    print('*'*50)
    print(label,'\n')
    for group_name, group_data in group_by_object:
        print("Group Name:", group_name)
        print(group_data)
        print()
    print('*' * 50)


# Create a sample DataFrame
data = { 'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', "Raj"],
         'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male'],
         'Weight': [74, 58, 85, 87, 63, 79]}

df = pd.DataFrame(data)
print(df)

group_by_city = df.groupby('City')
print('\nGroup by city is')
print('type of group_by_city is : ', type(group_by_city))
print_group_by_result(group_by_city, 'Group by city details')

# specify aggregation for each column separately
agg_result = group_by_city.agg({
    'Age' : 'sum',
    'Weight' : 'mean'
})
print('\nSum on Age column and mean on Weight column')
print(agg_result)

# Specify multiple operations on a column
agg_result = group_by_city.agg({
    'Age' : ['sum', 'size'],
    'Weight' : 'mean'
})

print('\nMultiple operations on a column')
print(agg_result)

group_by_city = df[['Age', 'City', 'Weight']].groupby('City')
agg_result = group_by_city.agg(['sum', 'mean', 'size'])
print('\nMultiple operations on a column')
print(agg_result)

Output

      Name  Age       City  Gender  Weight
0  Krishna   34  Bangalore    Male      74
1    Chamu   25    Chennai  Female      58
2     Joel   29  Hyderabad    Male      85
3     Gopi   41  Hyderabad    Male      87
4   Sravya   52  Bangalore  Female      63
5      Raj   23    Chennai    Male      79

Group by city is
type of group_by_city is :  <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
**************************************************
Group by city details 

Group Name: Bangalore
      Name  Age       City  Gender  Weight
0  Krishna   34  Bangalore    Male      74
4   Sravya   52  Bangalore  Female      63

Group Name: Chennai
    Name  Age     City  Gender  Weight
1  Chamu   25  Chennai  Female      58
5    Raj   23  Chennai    Male      79

Group Name: Hyderabad
   Name  Age       City Gender  Weight
2  Joel   29  Hyderabad   Male      85
3  Gopi   41  Hyderabad   Male      87

**************************************************

Sum on Age column and mean on Weight column
           Age  Weight
City                  
Bangalore   86    68.5
Chennai     48    68.5
Hyderabad   70    86.0

Multiple operations on a column
          Age      Weight
          sum size   mean
City                     
Bangalore  86    2   68.5
Chennai    48    2   68.5
Hyderabad  70    2   86.0

Multiple operations on a column
          Age            Weight           
          sum  mean size    sum  mean size
City                                      
Bangalore  86  43.0    2    137  68.5    2
Chennai    48  24.0    2    137  68.5    2
Hyderabad  70  35.0    2    172  86.0    2


 

 

Previous                                                 Next                                                 Home

No comments:

Post a Comment