Monday 26 February 2024

Pandas: Working with multi indexes

Multi indexing is a feature, where we have multi levels of indexing to get better performance while processing the data.

 

I am using below data set to demonstrate the example.

    Year  Quarter  Sales       City
0  2020        1    100  Bangalore
1  2020        2    150  Bangalore
2  2020        3    115  Bangalore
3  2021        1    120  Hyderabad
4  2021        2    180  Hyderabad
5  2021        3     90  Hyderabad
6  2022        1    130    Chennai
7  2022        2    160    Chennai

 

Set the Year and Quarter columns as indexes

df.set_index(['Year', 'Quarter'], inplace=True)

In this example, we create a DataFrame with four columns: 'Year', 'Quarter', 'Sales' and 'City'. We then set the multi-index by passing a list of column names ['Year', 'Quarter'] to the set_index() method. The inplace=True parameter modifies the DataFrame in place, setting the multi-index.

 

After setting the index, dataset will be changed like below.

               Sales       City
Year Quarter                  
2020 1          100  Bangalore
     2          150  Bangalore
     3          115  Bangalore
2021 1          120  Hyderabad
     2          180  Hyderabad
     3           90  Hyderabad
2022 1          130    Chennai
     2          160    Chennai

As you see above output, the resulting DataFrame has a multi-level index, where 'Year' and 'Quarter' serve as the levels of indexing.

a.   'Year' column is used for Level 0 index

b.   'Quarter' column is used for Level 1 index

 

 

Extract the 3rd quarter results for the year 2020

Each row is uniquely identified by its (Year, Quarter) combination.

 

quarter_3_results_for_2020 = df.loc[(2020, 3)]

 

Above snippet return all the columns associated with the Year 2020 and Quarter 3.

 

Find the below working application.

 

multi_index_hello_world.py

import pandas as pd

# Create a sample DataFrame
data = {'Year': [2020, 2020, 2020, 2021, 2021, 2021, 2022, 2022],
        'Quarter': [1, 2, 3, 1, 2, 3, 1, 2],
        'Sales': [100, 150, 115, 120, 180, 90, 130, 160],
        'City': ['Bangalore', 'Bangalore', 'Bangalore', 'Hyderabad', 'Hyderabad', 'Hyderabad', 'Chennai', 'Chennai']
        }
df = pd.DataFrame(data)
print('Original DataFrame\n', df)

# Set Year and Quarter as indexes
df.set_index(['Year', 'Quarter'], inplace=True)
print('\nAfter setting index columns Year and Quarter\n',df)

# Access data using the multi-index
quarter_3_results_for_2020 = df.loc[(2020, 3)]
print('\nquarter_3_results_for_2020\n',quarter_3_results_for_2020)

 

Output

Original DataFrame
    Year  Quarter  Sales       City
0  2020        1    100  Bangalore
1  2020        2    150  Bangalore
2  2020        3    115  Bangalore
3  2021        1    120  Hyderabad
4  2021        2    180  Hyderabad
5  2021        3     90  Hyderabad
6  2022        1    130    Chennai
7  2022        2    160    Chennai

After setting index columns Year and Quarter
               Sales       City
Year Quarter                  
2020 1          100  Bangalore
     2          150  Bangalore
     3          115  Bangalore
2021 1          120  Hyderabad
     2          180  Hyderabad
     3           90  Hyderabad
2022 1          130    Chennai
     2          160    Chennai

quarter_3_results_for_2020
 Sales          115
City     Bangalore
Name: (2020, 3), dtype: object

 

 

Previous                                                 Next                                                 Home

No comments:

Post a Comment