Multi indexing is a feature, where we have multi levels of indexing to get better performance while processing the data.
I am using below data set to demonstrate the example.
Year Quarter Sales City 0 2020 1 100 Bangalore 1 2020 2 150 Bangalore 2 2020 3 115 Bangalore 3 2021 1 120 Hyderabad 4 2021 2 180 Hyderabad 5 2021 3 90 Hyderabad 6 2022 1 130 Chennai 7 2022 2 160 Chennai
Set the Year and Quarter columns as indexes
df.set_index(['Year', 'Quarter'], inplace=True)
In this example, we create a DataFrame with four columns: 'Year', 'Quarter', 'Sales' and 'City'. We then set the multi-index by passing a list of column names ['Year', 'Quarter'] to the set_index() method. The inplace=True parameter modifies the DataFrame in place, setting the multi-index.
After setting the index, dataset will be changed like below.
Sales City Year Quarter 2020 1 100 Bangalore 2 150 Bangalore 3 115 Bangalore 2021 1 120 Hyderabad 2 180 Hyderabad 3 90 Hyderabad 2022 1 130 Chennai 2 160 Chennai
As you see above output, the resulting DataFrame has a multi-level index, where 'Year' and 'Quarter' serve as the levels of indexing.
a. 'Year' column is used for Level 0 index
b. 'Quarter' column is used for Level 1 index
Extract the 3rd quarter results for the year 2020
Each row is uniquely identified by its (Year, Quarter) combination.
quarter_3_results_for_2020 = df.loc[(2020, 3)]
Above snippet return all the columns associated with the Year 2020 and Quarter 3.
Find the below working application.
multi_index_hello_world.py
import pandas as pd
# Create a sample DataFrame
data = {'Year': [2020, 2020, 2020, 2021, 2021, 2021, 2022, 2022],
'Quarter': [1, 2, 3, 1, 2, 3, 1, 2],
'Sales': [100, 150, 115, 120, 180, 90, 130, 160],
'City': ['Bangalore', 'Bangalore', 'Bangalore', 'Hyderabad', 'Hyderabad', 'Hyderabad', 'Chennai', 'Chennai']
}
df = pd.DataFrame(data)
print('Original DataFrame\n', df)
# Set Year and Quarter as indexes
df.set_index(['Year', 'Quarter'], inplace=True)
print('\nAfter setting index columns Year and Quarter\n',df)
# Access data using the multi-index
quarter_3_results_for_2020 = df.loc[(2020, 3)]
print('\nquarter_3_results_for_2020\n',quarter_3_results_for_2020)
Output
Original DataFrame Year Quarter Sales City 0 2020 1 100 Bangalore 1 2020 2 150 Bangalore 2 2020 3 115 Bangalore 3 2021 1 120 Hyderabad 4 2021 2 180 Hyderabad 5 2021 3 90 Hyderabad 6 2022 1 130 Chennai 7 2022 2 160 Chennai After setting index columns Year and Quarter Sales City Year Quarter 2020 1 100 Bangalore 2 150 Bangalore 3 115 Bangalore 2021 1 120 Hyderabad 2 180 Hyderabad 3 90 Hyderabad 2022 1 130 Chennai 2 160 Chennai quarter_3_results_for_2020 Sales 115 City Bangalore Name: (2020, 3), dtype: object
No comments:
Post a Comment