Saturday 13 April 2024

Pandas: Get first row from every group

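Using the ‘first()’ method of a DataFrameGroupBy object, we can retrieve the first row of each group formed by the grouping criteria. Note that first() returns the first non-null value of each column within a group, so the result matches the group's first row only when that row has no missing values.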

 

Example

import pandas as pd

data = {'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', 'Raj'],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male']}

df = pd.DataFrame(data)

group_by_city = df.groupby('City')
first_row_of_each_group = group_by_city.first()

In the example above, we define a DataFrame ‘df’ with the columns "Name", "Age", "City" and "Gender". We group the DataFrame by the "City" column using groupby('City') and store the result in the ‘group_by_city’ variable.

 

Calling the first() method on the grouped object ‘group_by_city’ returns the first record of every group. The result is a DataFrame whose index holds the unique group values (Bangalore, Chennai, Hyderabad), while the "City" column itself no longer appears among the regular columns.
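
One detail worth keeping in mind: first() picks the first non-null value of each column within a group, so it can combine values from different rows when data is missing. The sketch below uses a small hypothetical DataFrame (not the one from this post) with a missing Age to compare first() with head(1), which returns the literal first row of each group.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the first Bangalore row has a missing Age
df = pd.DataFrame({
    'Name': ['Krishna', 'Chamu', 'Sravya', 'Raj'],
    'Age': [np.nan, 25, 52, 23],
    'City': ['Bangalore', 'Chennai', 'Bangalore', 'Chennai'],
})

grouped = df.groupby('City')

# first(): first non-null value per column, indexed by the group key
first_non_null = grouped.first()
print(first_non_null)

# head(1): the actual first row of each group, original index preserved
first_rows = grouped.head(1)
print(first_rows)
```

Here first() reports Krishna with an Age of 52, a value taken from Sravya's row, while head(1) keeps Krishna's row together with its missing Age.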

 

The complete working example is given below.

 

get_first_row_of_each_group.py

import pandas as pd

# Print the content of DataFrameGroupBy object
def print_group_by_result(group_by_object, label):
    print('*'*50)
    print(label,'\n')
    for group_name, group_data in group_by_object:
        print("Group Name:", group_name)
        print(group_data)
        print()
    print('*' * 50)


# Create a sample DataFrame
data = {'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', "Raj"],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male']}

df = pd.DataFrame(data)
print(df)

group_by_city = df.groupby('City')
print('\nGroup by city is')
print('type of group_by_city is : ', type(group_by_city))
print_group_by_result(group_by_city, 'Group by city details')

first_row_of_each_group = group_by_city.first()
print('\ntype of first_row_of_each_group : ', type(first_row_of_each_group))
print('first of each group are')
print(first_row_of_each_group)

Output

      Name  Age       City  Gender
0  Krishna   34  Bangalore    Male
1    Chamu   25    Chennai  Female
2     Joel   29  Hyderabad    Male
3     Gopi   41  Hyderabad    Male
4   Sravya   52  Bangalore  Female
5      Raj   23    Chennai    Male

Group by city is
type of group_by_city is :  <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
**************************************************
Group by city details 

Group Name: Bangalore
      Name  Age       City  Gender
0  Krishna   34  Bangalore    Male
4   Sravya   52  Bangalore  Female

Group Name: Chennai
    Name  Age     City  Gender
1  Chamu   25  Chennai  Female
5    Raj   23  Chennai    Male

Group Name: Hyderabad
   Name  Age       City Gender
2  Joel   29  Hyderabad   Male
3  Gopi   41  Hyderabad   Male

**************************************************

type of first_row_of_each_group :  <class 'pandas.core.frame.DataFrame'>
first of each group are
              Name  Age  Gender
City                           
Bangalore  Krishna   34    Male
Chennai      Chamu   25  Female
Hyderabad     Joel   29    Male
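
In the output above, "City" is the index of the result rather than a regular column. If a flat table is preferred, one possible approach (a minimal sketch using the same sample data; the variable names are only illustrative) is to pass as_index=False to groupby(), or to call reset_index() on the result.

```python
import pandas as pd

data = {'Name': ['Krishna', 'Chamu', 'Joel', 'Gopi', 'Sravya', 'Raj'],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male']}

df = pd.DataFrame(data)

# as_index=False keeps 'City' as a regular column in the result
first_with_city_column = df.groupby('City', as_index=False).first()
print(first_with_city_column)

# Alternative: group as usual, then move the index back into a column
first_reset = df.groupby('City').first().reset_index()
print(first_reset)
```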

