A DataFrame index is a label assigned to each row of the DataFrame. It is similar to a row ID in a database table or a row label in a spreadsheet. Index labels are usually unique, but pandas does not require them to be.
index.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print(df)
Output
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
By default, when a DataFrame is created, pandas assigns a numeric index starting from 0 to each row. The leftmost column of the output above confirms this.
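The default index can also be inspected directly instead of reading it off the printed output. The following is a minimal sketch, assuming a smaller three-row DataFrame than the one above:

```python
import pandas as pd

# A smaller illustrative DataFrame (not the full dataset used in this post)
df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Ram'],
                   'Age': [34, 35, 29]})

# With no index specified, pandas creates a RangeIndex starting at 0
print(df.index)        # RangeIndex(start=0, stop=3, step=1)
print(list(df.index))  # [0, 1, 2]
```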
Customize the index column using the set_index method
Using the DataFrame set_index method, we can specify a new index column or columns.
Example
new_df = df.set_index(keys=['Name'])
The above snippet replaces the default index with the ‘Name’ column. It does not modify the existing DataFrame (df); it returns a new DataFrame instead.
         Age       City
Name
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore
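Note that ‘Joel’ appears twice in the new index. As a rough illustration of what that means for lookups, the sketch below checks whether the index is unique and selects rows by label with loc. It rebuilds the same sample data so that it can run on its own:

```python
import pandas as pd

data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad',
                 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
new_df = df.set_index(keys=['Name'])

# An index may contain duplicate labels
print(new_df.index.is_unique)    # False

# Selecting a duplicated label returns every matching row
print(len(new_df.loc['Joel']))   # 2

# Selecting a unique label together with a column returns a single value
print(new_df.loc['Ram', 'Age'])  # 29
```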
To apply the change to the current DataFrame itself, set the inplace argument to True.
df.set_index(keys=['Name'], inplace=True)
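One detail that is easy to trip over: with inplace=True the method modifies df and returns None, so the result should not be assigned back to a variable. A minimal sketch, assuming a two-row DataFrame for brevity:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Sailu'], 'Age': [34, 35]})

# With inplace=True, df itself is changed and the call returns None
result = df.set_index(keys=['Name'], inplace=True)

print(result)            # None
print(df.index.name)     # Name
print(list(df.columns))  # ['Age']
```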
How to reset the index?
You can reset the index to the default numeric index by calling the reset_index method.
df.reset_index(inplace=True) 
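By default, reset_index moves the old index back into the DataFrame as a regular column. If the old index is not needed, passing drop=True discards it instead. The sketch below compares the two, assuming a small two-row DataFrame indexed by Name:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Sailu'],
                   'Age': [34, 35]}).set_index('Name')

# Default behaviour: the old index comes back as a column
restored = df.reset_index()
print(list(restored.columns))  # ['Name', 'Age']

# drop=True: the old index is discarded
dropped = df.reset_index(drop=True)
print(list(dropped.columns))   # ['Age']
```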
Find the working application below.
customize_index.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print('\nSet the Name column as index')
new_df = df.set_index(keys=['Name'])
print('Original DataFrame')
print(df)
print('\nDataFrame where Name is the index')
print(new_df)
df.set_index(keys=['Name'], inplace=True)
print('\nOriginal DataFrame')
print(df)
df.reset_index(inplace=True)
print('\nOriginal DataFrame after resetting the index')
print(df)
Output
Set the Name column as index
Original DataFrame
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
DataFrame where Name is the index
         Age       City
Name                   
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore
Original DataFrame
         Age       City
Name                   
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore
Original DataFrame after resetting the index
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
The set_index method replaces the index
The set_index method replaces the current index. Let’s confirm it with the dataset below.
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
df.set_index(keys=['Name'], inplace=True)
The above snippet sets the ‘Name’ column as the index.
         Age       City
Name
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore
Let’s set the index column to ‘Age’.
df.set_index(keys=['Age'], inplace=True)
Once the above statement is executed, it replaces the old index with the Age column. In this case, we lose the Name data entirely, because the existing Name index is discarded when Age becomes the index.
          City
Age
34   Bangalore
35   Hyderabad
29   Hyderabad
34   Bangalore
29   Hyderabad
34   Bangalore
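The loss can be confirmed programmatically by checking the index name and the remaining columns. A minimal sketch, assuming a shortened three-row version of the dataset:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Ram'],
                   'Age': [34, 35, 29],
                   'City': ['Bangalore', 'Hyderabad', 'Hyderabad']})

df.set_index(keys=['Name'], inplace=True)
df.set_index(keys=['Age'], inplace=True)  # the Name index is discarded here

print(df.index.name)         # Age
print(list(df.columns))      # ['City']
print('Name' in df.columns)  # False
```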
Find the working application below.
set_index_replacement_demo.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print(df)
df.set_index(keys=['Name'], inplace=True)
print('\nDataFrame after setting the index to "Name" column')
print(df)
df.set_index(keys=['Age'], inplace=True)
print('\nDataFrame after setting the index to "Age" column')
print(df)
Output
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
DataFrame after setting the index to "Name" column
         Age       City
Name                   
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore
DataFrame after setting the index to "Age" column
          City
Age           
34   Bangalore
35   Hyderabad
29   Hyderabad
34   Bangalore
29   Hyderabad
34   Bangalore
How to address the above replacement problem?
You can address the above problem by resetting the index to the default numeric index before setting the new index.
df.set_index(keys=['Name'], inplace=True)
df.reset_index(inplace=True)
df.set_index(keys=['Age'], inplace=True)
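Two variations on this approach may also be useful. The reset and the new set_index can be chained without inplace, and set_index accepts an append=True argument that keeps the existing index as an additional level instead of replacing it. The sketch below shows both, assuming a shortened three-row version of the dataset; the variable names by_age and both are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Ram'],
                   'Age': [34, 35, 29],
                   'City': ['Bangalore', 'Hyderabad', 'Hyderabad']})
df = df.set_index('Name')

# Option 1: restore Name as a column, then index by Age, in one chain
by_age = df.reset_index().set_index('Age')
print(list(by_age.columns))      # ['Name', 'City']

# Option 2: keep Name in the index and add Age as a second index level
both = df.set_index('Age', append=True)
print(list(both.index.names))    # ['Name', 'Age']
print(list(both.columns))        # ['City']
```

The second option produces a MultiIndex, so it suits cases where rows should be looked up by both Name and Age rather than by Age alone.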
Find the working application below.
set_index_replacement_demo.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print(df)
df.set_index(keys=['Name'], inplace=True)
print('\nDataFrame after setting the index to "Name" column')
print(df)
print('\nResetting index')
df.reset_index(inplace=True)
print(df)
df.set_index(keys=['Age'], inplace=True)
print('\nDataFrame after setting the index to "Age" column')
print(df)
Output
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
DataFrame after setting the index to "Name" column
         Age       City
Name                   
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore
Resetting index
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
DataFrame after setting the index to "Age" column
        Name       City
Age                    
34   Krishna  Bangalore
35     Sailu  Hyderabad
29       Ram  Hyderabad
34      Ravi  Bangalore
29      Joel  Hyderabad
34      Joel  Bangalore 