Programming for beginners: Pandas: set and reset dataframe index

A DataFrame index is a set of labels, one per row, that identifies the rows of the DataFrame. It is similar to a row id in a database table or a row label in a spreadsheet, although pandas does not require index labels to be unique.

index.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}

df = pd.DataFrame(data)
print(df)

Output

      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

By default, when a DataFrame is created, pandas assigns each row a numeric index starting from 0. You can confirm this from the output above.
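
The default index can also be inspected directly through the index attribute. The snippet below is a minimal sketch that uses a three-row subset of the sample data, chosen only to keep the example short.

```python
import pandas as pd

# Small illustrative frame (a subset of the sample data)
df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Ram'],
                   'Age': [34, 35, 29]})

# With no index specified, pandas creates a RangeIndex: 0, 1, 2, ...
print(type(df.index).__name__)   # RangeIndex
print(list(df.index))            # [0, 1, 2]
```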

Customize the index column using the set_index method

Using the DataFrame set_index method, we can specify a new index column or columns.

Example

new_df = df.set_index(keys=['Name'])

The above snippet replaces the default index with the ‘Name’ column. It does not modify the existing DataFrame (df); it returns a new DataFrame.

         Age       City
Name                   
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore
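
Two points are worth checking here: the original DataFrame is left untouched, and the new index may contain repeated labels. The following is a small sketch on a four-row subset of the sample data; the variable name joel_rows is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Ravi', 'Joel', 'Joel'],
                   'Age': [34, 34, 29, 34],
                   'City': ['Bangalore', 'Bangalore', 'Hyderabad', 'Bangalore']})

new_df = df.set_index(keys=['Name'])

# The original frame still has 'Name' as an ordinary column
print('Name' in df.columns)        # True
print('Name' in new_df.columns)    # False

# Index labels need not be unique: 'Joel' appears twice
print(new_df.index.is_unique)      # False

# Label-based lookup returns every row whose label matches
joel_rows = new_df.loc['Joel']
print(joel_rows)
```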

To apply the change to the current DataFrame itself, set the inplace argument to True.

df.set_index(keys=['Name'], inplace=True)
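
One detail that often surprises beginners is that a call with inplace=True returns None, so its result should not be assigned. The sketch below, on a two-row frame made up for illustration, shows this and the equivalent reassignment style without inplace.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Sailu'], 'Age': [34, 35]})

# inplace=True modifies df itself and returns None
result = df.set_index(keys=['Name'], inplace=True)
print(result)            # None
print(df.index.name)     # Name

# Equivalent without inplace: reassign the returned DataFrame
df2 = pd.DataFrame({'Name': ['Krishna', 'Sailu'], 'Age': [34, 35]})
df2 = df2.set_index(keys=['Name'])
print(df.equals(df2))    # True
```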

How to reset the index?

You can reset the index to the default numeric index by calling the reset_index method. The old index is moved back into the DataFrame as a regular column.

df.reset_index(inplace=True)
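
If the old index is not needed as a column, reset_index also accepts drop=True, which discards it. The following sketch compares the two behaviours on a small illustrative frame; the names restored and dropped are only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Ram'],
                   'Age': [34, 35, 29]}).set_index('Name')

# Default: the old index comes back as a regular column
restored = df.reset_index()
print(list(restored.columns))    # ['Name', 'Age']

# drop=True: the old index is discarded instead of becoming a column
dropped = df.reset_index(drop=True)
print(list(dropped.columns))     # ['Age']
print(list(dropped.index))       # [0, 1, 2]
```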

Find the complete working application below.

customize_index.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}

df = pd.DataFrame(data)
print('\nSet the Name column as index')

new_df = df.set_index(keys=['Name'])

print('Original DataFrame')
print(df)

print('\nDataFrame where Name is the index')
print(new_df)

df.set_index(keys=['Name'], inplace=True)
print('\nOriginal DataFrame')
print(df)

df.reset_index(inplace=True)
print('\nOriginal DataFrame after resetting the index')
print(df)

Output

Set the Name column as index
Original DataFrame
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

DataFrame where Name is the index
         Age       City
Name                   
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore

Original DataFrame
         Age       City
Name                   
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore

Original DataFrame after resetting the index
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

The set_index method replaces the index

The set_index method replaces the current index. Let’s confirm it with the dataset below.

      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

df.set_index(keys=['Name'], inplace=True)

The above snippet sets the ‘Name’ column as the index.

         Age       City
Name                   
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore

Let’s set the index column to ‘Age’.

df.set_index(keys=['Age'], inplace=True)

Once the above statement is executed, it replaces the old index with the Age column. In this case, we lose the Name data completely: by default set_index discards the existing index, and Name was no longer a regular column.

          City
Age           
34   Bangalore
35   Hyderabad
29   Hyderabad
34   Bangalore
29   Hyderabad
34   Bangalore
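
The loss can be confirmed programmatically. The sketch below repeats the two set_index calls on a three-row subset of the sample data and checks that Name survives neither as a column nor as the index.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Ram'],
                   'Age': [34, 35, 29],
                   'City': ['Bangalore', 'Hyderabad', 'Hyderabad']})

df.set_index(keys=['Name'], inplace=True)
df.set_index(keys=['Age'], inplace=True)

# 'Age' is now the index and only 'City' remains as a column
print(df.index.name)                         # Age
print(list(df.columns))                      # ['City']

# Resetting the index now cannot bring 'Name' back
print('Name' in df.reset_index().columns)    # False
```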

Find the complete working application below.

set_index_replacement_demo.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}

df = pd.DataFrame(data)
print(df)

df.set_index(keys=['Name'], inplace=True)
print('\nDataFrame after setting the index to "Name" column')
print(df)

df.set_index(keys=['Age'], inplace=True)
print('\nDataFrame after setting the index to "Age" column')
print(df)

Output

      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

DataFrame after setting the index to "Name" column
         Age       City
Name                   
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore

DataFrame after setting the index to "Age" column
          City
Age           
34   Bangalore
35   Hyderabad
29   Hyderabad
34   Bangalore
29   Hyderabad
34   Bangalore

How to address the above replacement problem?

You can address the above problem by resetting the index to the default numeric index before setting the new index.

df.set_index(keys=['Name'], inplace=True)
df.reset_index(inplace=True)
df.set_index(keys=['Age'], inplace=True)
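
The same reset-then-set sequence can also be written as a single chained expression without inplace. This is a minimal sketch on a three-row subset of the sample data; by_age is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Ram'],
                   'Age': [34, 35, 29],
                   'City': ['Bangalore', 'Hyderabad', 'Hyderabad']})
df = df.set_index(keys=['Name'])

# reset_index() moves 'Name' back to a column, then 'Age' becomes the index
by_age = df.reset_index().set_index(keys=['Age'])

print(by_age.index.name)         # Age
print(list(by_age.columns))      # ['Name', 'City']
```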

Find the complete working application below.

set_index_replacement_demo.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}

df = pd.DataFrame(data)
print(df)

df.set_index(keys=['Name'], inplace=True)
print('\nDataFrame after setting the index to "Name" column')
print(df)

print('\nResetting index')
df.reset_index(inplace=True)
print(df)

df.set_index(keys=['Age'], inplace=True)
print('\nDataFrame after setting the index to "Age" column')
print(df)

Output

      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

DataFrame after setting the index to "Name" column
         Age       City
Name                   
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore

Resetting index
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

DataFrame after setting the index to "Age" column
        Name       City
Age                    
34   Krishna  Bangalore
35     Sailu  Hyderabad
29       Ram  Hyderabad
34      Ravi  Bangalore
29      Joel  Hyderabad
34      Joel  Bangalore
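
Another option, not used in the application above, is the append parameter of set_index. With append=True the new column is added as an extra index level instead of replacing the current index, so no data is lost. The following is a small sketch on a three-row subset of the sample data; the name both is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Ram'],
                   'Age': [34, 35, 29],
                   'City': ['Bangalore', 'Hyderabad', 'Hyderabad']})
df = df.set_index('Name')

# append=True keeps 'Name' and adds 'Age' as a second index level
both = df.set_index('Age', append=True)

print(list(both.index.names))    # ['Name', 'Age']
print(list(both.columns))        # ['City']
```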


Monday, 27 November 2023
