Thursday, 4 April 2024

Pandas: Sort a multi-indexed DataFrame

Using sort_index method, we can sort a multi indexed data frame.

Let’s experiment with the below data set.

    Year  Quarter  Sales       City
0  2022        1    100  Bangalore
1  2022        3    150  Bangalore
2  2020        2    115  Bangalore
3  2021        3    120  Hyderabad
4  2021        1    180  Hyderabad
5  2021        2     90  Hyderabad
6  2020        1    130    Chennai
7  2022        2    160    Chennai

 

Set the Year and Quarter as multi index columns

df.set_index(['Year', 'Quarter'], inplace=True)

Above snippet convert the data set like below.

               Sales       City
Year Quarter                  
2022 1          100  Bangalore
     3          150  Bangalore
2020 2          115  Bangalore
2021 3          120  Hyderabad
     1          180  Hyderabad
     2           90  Hyderabad
2020 1          130    Chennai
2022 2          160    Chennai

Sort the data by ascending order of index columns

new_df = df.sort_index()

 

new_df points to below data set.

               Sales       City
Year Quarter                  
2020 1          130    Chennai
     2          115  Bangalore
2021 1          180  Hyderabad
     2           90  Hyderabad
     3          120  Hyderabad
2022 1          100  Bangalore
     2          160    Chennai
     3          150  Bangalore

 

Sort the data by descending order of index columns

By setting the ascending argument to False, we can sort the data by ascending order of index columns.

new_df = df.sort_index(ascending=False)

 ‘new_df’ point to below data set.

               Sales       City
Year Quarter                  
2022 3          150  Bangalore
     2          160    Chennai
     1          100  Bangalore
2021 3          120  Hyderabad
     2           90  Hyderabad
     1          180  Hyderabad
2020 2          115  Bangalore
     1          130    Chennai

 

Sort the data by ascending order of Year and descending order of Quarter

By passing the list of Booleans to the ascending argument, we can achieve this.

new_df = df.sort_index(ascending=[True,False])

 

List of booleans that we passed to ascending argument are in the order of the multi index columns. In this case, True mapped to Year column and False mapped to the ‘Quarter’ column.

 

‘new_df’ point to below data set.

               Sales       City
Year Quarter                  
2020 2          115  Bangalore
     1          130    Chennai
2021 3          120  Hyderabad
     2           90  Hyderabad
     1          180  Hyderabad
2022 3          150  Bangalore
     2          160    Chennai
1	100  Bangalore

 

Sort the data by descending order of Year and ascending order of Quarter

By passing the list of Booleans to the ascending argument, we can achieve this.

 

new_df = df.sort_index(ascending=[False,True])

 

‘new_df’ point to below data set.

               Sales       City
Year Quarter                  
2022 1          100  Bangalore
     2          160    Chennai
     3          150  Bangalore
2021 1          180  Hyderabad
     2           90  Hyderabad
     3          120  Hyderabad
2020 1          130    Chennai
     2          115  Bangalore

 

Find the below working application.

 

sort_multi_index_data.py

import pandas as pd

# Create a sample DataFrame
data = {'Year': [2022, 2022, 2020, 2021, 2021, 2021, 2020, 2022],
        'Quarter': [1, 3, 2, 3, 1, 2, 1, 2],
        'Sales': [100, 150, 115, 120, 180, 90, 130, 160],
        'City': ['Bangalore', 'Bangalore', 'Bangalore', 'Hyderabad', 'Hyderabad', 'Hyderabad', 'Chennai', 'Chennai']
        }
df = pd.DataFrame(data)
print('Original DataFrame\n', df)

# Set Year and Quarter as indexes
df.set_index(['Year', 'Quarter'], inplace=True)
print('\nAfter setting index columns Year and Quarter\n',df)

# Sort the data by ascending order of the Year and Quarter columns
new_df = df.sort_index()
print('\nSort the data set by ascending order of index columns\n',new_df)

# Sort the data by descending order of the Year and Quarter columns
new_df = df.sort_index(ascending=False)
print('\nSort the data set by descending order of index columns\n',new_df)

# Sort the data by ascending order of the Year and descending order Quarter columns
new_df = df.sort_index(ascending=[True,False])
print('\nSort the data by ascending order of the Year and descending order Quarter columns\n',new_df)

# Sort the data by descending order of the Year and ascending order Quarter columns
new_df = df.sort_index(ascending=[False, True])
print('\nSort the data by descending order of the Year and ascending order Quarter columns\n',new_df)

 

Output

Original DataFrame
    Year  Quarter  Sales       City
0  2022        1    100  Bangalore
1  2022        3    150  Bangalore
2  2020        2    115  Bangalore
3  2021        3    120  Hyderabad
4  2021        1    180  Hyderabad
5  2021        2     90  Hyderabad
6  2020        1    130    Chennai
7  2022        2    160    Chennai

After setting index columns Year and Quarter
               Sales       City
Year Quarter                  
2022 1          100  Bangalore
     3          150  Bangalore
2020 2          115  Bangalore
2021 3          120  Hyderabad
     1          180  Hyderabad
     2           90  Hyderabad
2020 1          130    Chennai
2022 2          160    Chennai

Sort the data set by ascending order of index columns
               Sales       City
Year Quarter                  
2020 1          130    Chennai
     2          115  Bangalore
2021 1          180  Hyderabad
     2           90  Hyderabad
     3          120  Hyderabad
2022 1          100  Bangalore
     2          160    Chennai
     3          150  Bangalore

Sort the data set by descending order of index columns
               Sales       City
Year Quarter                  
2022 3          150  Bangalore
     2          160    Chennai
     1          100  Bangalore
2021 3          120  Hyderabad
     2           90  Hyderabad
     1          180  Hyderabad
2020 2          115  Bangalore
     1          130    Chennai

Sort the data by ascending order of the Year and descending order Quarter columns
               Sales       City
Year Quarter                  
2020 2          115  Bangalore
     1          130    Chennai
2021 3          120  Hyderabad
     2           90  Hyderabad
     1          180  Hyderabad
2022 3          150  Bangalore
     2          160    Chennai
     1          100  Bangalore

Sort the data by descending order of the Year and ascending order Quarter columns
               Sales       City
Year Quarter                  
2022 1          100  Bangalore
     2          160    Chennai
     3          150  Bangalore
2021 1          180  Hyderabad
     2           90  Hyderabad
     3          120  Hyderabad
2020 1          130    Chennai
     2          115  Bangalore

 

Sort at given index level

By specifying ‘level’ argument, we can sort the data frame at specific index level.

 

Example 1: Sort the data set by ascending order of Year column.

 

We can specify the index level to sort by specify the level index or level label.

new_df = df.sort_index(level=0)
new_df = df.sort_index(level='Year')

 

Example 2: Sort the data by descending order of the Quarter.

new_df = df.sort_index(level=1, ascending=False)
new_df = df.sort_index(level='Quarter', ascending=False)

 

Find the below working application.

 

sort_multi_index_specific_level.py

import pandas as pd

# Create a sample DataFrame
data = {'Year': [2022, 2022, 2020, 2021, 2021, 2021, 2020, 2022],
        'Quarter': [1, 3, 2, 3, 1, 2, 1, 2],
        'Sales': [100, 150, 115, 120, 180, 90, 130, 160],
        'City': ['Bangalore', 'Bangalore', 'Bangalore', 'Hyderabad', 'Hyderabad', 'Hyderabad', 'Chennai', 'Chennai']
        }
df = pd.DataFrame(data)
print('Original DataFrame\n', df)

# Set Year and Quarter as indexes
df.set_index(['Year', 'Quarter'], inplace=True)
print('\nAfter setting index columns Year and Quarter\n',df)

# Sort the data set by ascending order of Year column
new_df = df.sort_index(level=0)
print('\nSort the data set by ascending order of Year column\n',new_df)

new_df = df.sort_index(level='Year')
print('\nSort the data set by ascending order of Year column\n',new_df)

# Sort the data by descending order of the Quarter
new_df = df.sort_index(level=1, ascending=False)
print('\nSort the data by descending order of the Quarter\n',new_df)

new_df = df.sort_index(level='Quarter', ascending=False)
print('\nSort the data by descending order of the Quarter\n',new_df)

 

Output

Original DataFrame
    Year  Quarter  Sales       City
0  2022        1    100  Bangalore
1  2022        3    150  Bangalore
2  2020        2    115  Bangalore
3  2021        3    120  Hyderabad
4  2021        1    180  Hyderabad
5  2021        2     90  Hyderabad
6  2020        1    130    Chennai
7  2022        2    160    Chennai

After setting index columns Year and Quarter
               Sales       City
Year Quarter                  
2022 1          100  Bangalore
     3          150  Bangalore
2020 2          115  Bangalore
2021 3          120  Hyderabad
     1          180  Hyderabad
     2           90  Hyderabad
2020 1          130    Chennai
2022 2          160    Chennai

Sort the data set by ascending order of Year column
               Sales       City
Year Quarter                  
2020 1          130    Chennai
     2          115  Bangalore
2021 1          180  Hyderabad
     2           90  Hyderabad
     3          120  Hyderabad
2022 1          100  Bangalore
     2          160    Chennai
     3          150  Bangalore

Sort the data set by ascending order of Year column
               Sales       City
Year Quarter                  
2020 1          130    Chennai
     2          115  Bangalore
2021 1          180  Hyderabad
     2           90  Hyderabad
     3          120  Hyderabad
2022 1          100  Bangalore
     2          160    Chennai
     3          150  Bangalore

Sort the data by descending order of the Quarter
               Sales       City
Year Quarter                  
2022 3          150  Bangalore
2021 3          120  Hyderabad
2022 2          160    Chennai
2021 2           90  Hyderabad
2020 2          115  Bangalore
2022 1          100  Bangalore
2021 1          180  Hyderabad
2020 1          130    Chennai

Sort the data by descending order of the Quarter
               Sales       City
Year Quarter                  
2022 3          150  Bangalore
2021 3          120  Hyderabad
2022 2          160    Chennai
2021 2           90  Hyderabad
2020 2          115  Bangalore
2022 1          100  Bangalore
2021 1          180  Hyderabad
2020 1          130    Chennai

 

 

Previous                                                 Next                                                 Home

No comments:

Post a Comment