A DataFrame index is a label assigned to each row of the DataFrame. It
is similar to a row id in a database table or a row label in a spreadsheet.
Index labels usually identify rows, but pandas does not require them to be unique.
index.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
'Age': [34, 35, 29, 34, 29, 34],
'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print(df)
Output
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
By default, when a DataFrame is created, pandas assigns each row a numeric index starting from 0, as the output above confirms.
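The default index can also be inspected directly rather than read off the printed table. The following is a minimal sketch on a shortened three-row version of the sample data (the shortening is only for brevity):

```python
import pandas as pd

# Shortened version of the sample data, used only for illustration
df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Ram'],
                   'Age': [34, 35, 29]})

# The default index is a RangeIndex that counts up from 0
print(df.index)           # RangeIndex(start=0, stop=3, step=1)
print(list(df.index))     # [0, 1, 2]

# Rows can be selected by their index label with .loc
print(df.loc[1, 'Name'])  # Sailu
```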
Customize the index column using the set_index method
Using the DataFrame set_index method, we can specify one or more columns to use as the new index.
Example
new_df = df.set_index(keys=['Name'])
The snippet above replaces the default index with the ‘Name’ column. It does not modify the existing DataFrame (df); set_index returns a new DataFrame instead.
         Age       City
Name
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore
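Two details of set_index are easy to miss: the resulting index may contain duplicate labels (here 'Joel' appears twice), and passing several columns produces a multi-level index. The sketch below illustrates both on the sample data; the variable names by_name and by_city_name are made up for this example.

```python
import pandas as pd

data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)

# A single column label works as well as a one-element list
by_name = df.set_index('Name')
print(by_name.index.is_unique)   # False, because 'Joel' appears twice
print(len(by_name.loc['Joel']))  # 2, both 'Joel' rows are selected

# Several columns give a MultiIndex with one level per column
by_city_name = df.set_index(['City', 'Name'])
print(by_city_name.index.nlevels)      # 2
print(list(by_city_name.index.names))  # ['City', 'Name']
print(list(by_city_name.columns))      # ['Age']
```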
To apply the change to the current DataFrame instead, set the inplace argument to True.
df.set_index(keys=['Name'], inplace=True)
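With inplace=True the method returns None, so its result should not be assigned back to a variable. Reassigning the returned DataFrame is a common alternative that has the same effect. A minimal sketch, using a two-row sample and the illustrative names df_a and df_b:

```python
import pandas as pd

# Option 1: modify the DataFrame in place; the call itself returns None
df_a = pd.DataFrame({'Name': ['Krishna', 'Sailu'], 'Age': [34, 35]})
result = df_a.set_index(keys=['Name'], inplace=True)
print(result)           # None
print(df_a.index.name)  # Name

# Option 2: keep the default (inplace=False) and reassign the result
df_b = pd.DataFrame({'Name': ['Krishna', 'Sailu'], 'Age': [34, 35]})
df_b = df_b.set_index(keys=['Name'])
print(df_b.index.name)  # Name
```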
How to reset the index?
You can restore the default numeric index by calling the reset_index method. The old index is moved back into the DataFrame as a regular column.
df.reset_index(inplace=True)
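By default reset_index keeps the old index as a column. If the old index is not needed, the drop parameter discards it. The sketch below compares the two behaviours on a two-row sample; the names restored and dropped are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Sailu'], 'Age': [34, 35]})
df = df.set_index(keys=['Name'])

# Default: the old index comes back as the first column
restored = df.reset_index()
print(list(restored.columns))  # ['Name', 'Age']

# drop=True: the old index is discarded instead of being restored
dropped = df.reset_index(drop=True)
print(list(dropped.columns))   # ['Age']
print(list(dropped.index))     # [0, 1]
```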
The complete working program is given below.
customize_index.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
'Age': [34, 35, 29, 34, 29, 34],
'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print('\nSet the Name column as index')
new_df = df.set_index(keys=['Name'])
print('Original DataFrame')
print(df)
print('\nDataFrame where Name is the index')
print(new_df)
df.set_index(keys=['Name'], inplace=True)
print('\nOriginal DataFrame')
print(df)
df.reset_index(inplace=True)
print('\nOriginal DataFrame after resetting the index')
print(df)
Output
Set the Name column as index
Original DataFrame
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

DataFrame where Name is the index
         Age       City
Name
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore

Original DataFrame
         Age       City
Name
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore

Original DataFrame after resetting the index
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
set_index method replaces the index
The set_index method replaces the current index. Let’s confirm it with the dataset below.
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
df.set_index(keys=['Name'], inplace=True)
The snippet above sets the ‘Name’ column as the index.
         Age       City
Name
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore
Let’s set the index column to ‘Age’.
df.set_index(keys=['Age'], inplace=True)
Once the statement above is executed, it replaces the old index with the Age column. In this case, we lose the Name column data completely, because set_index discards the existing index by default.
          City
Age
34   Bangalore
35   Hyderabad
29   Hyderabad
34   Bangalore
29   Hyderabad
34   Bangalore
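The loss of the Name data can be checked programmatically instead of by reading the printed table. A minimal sketch on the same sample data:

```python
import pandas as pd

data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)

df.set_index(keys=['Name'], inplace=True)
df.set_index(keys=['Age'], inplace=True)

# Name is neither a column nor an index level any more
print('Name' in df.columns)      # False
print('Name' in df.index.names)  # False
print(df.index.name)             # Age
print(list(df.columns))          # ['City']
```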
The complete working program is given below.
set_index_replacement_demo.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
'Age': [34, 35, 29, 34, 29, 34],
'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print(df)
df.set_index(keys=['Name'], inplace=True)
print('\nDataFrame after setting the index to "Name" column')
print(df)
df.set_index(keys=['Age'], inplace=True)
print('\nDataFrame after setting the index to "Age" column')
print(df)
Output
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

DataFrame after setting the index to "Name" column
         Age       City
Name
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore

DataFrame after setting the index to "Age" column
          City
Age
34   Bangalore
35   Hyderabad
29   Hyderabad
34   Bangalore
29   Hyderabad
34   Bangalore
How to address the above replacement problem?
You can avoid this problem by resetting the index to the default numeric index before setting the new index, so that Name becomes a regular column again.
df.set_index(keys=['Name'], inplace=True)
df.reset_index(inplace=True)
df.set_index(keys=['Age'], inplace=True)
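Another way to keep the old index, not covered by the snippet above, is the append parameter of set_index, which adds the new column as an extra index level instead of replacing the existing one. The sketch below is one possible variant on the sample data; the names both and back are chosen only for this example.

```python
import pandas as pd

data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data).set_index(keys=['Name'])

# append=True keeps Name and adds Age as a second index level
both = df.set_index(keys=['Age'], append=True)
print(list(both.index.names))  # ['Name', 'Age']
print(list(both.columns))      # ['City']

# A single level can later be moved back into the columns
back = both.reset_index(level='Name')
print(back.index.name)         # Age
print(list(back.columns))      # ['Name', 'City']
```

The end result in back matches the reset-then-set approach: Age is the index and Name is preserved as a column.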
The complete working program is given below.
set_index_replacement_demo.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
'Age': [34, 35, 29, 34, 29, 34],
'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print(df)
df.set_index(keys=['Name'], inplace=True)
print('\nDataFrame after setting the index to "Name" column')
print(df)
print('\nResetting index')
df.reset_index(inplace=True)
print(df)
df.set_index(keys=['Age'], inplace=True)
print('\nDataFrame after setting the index to "Age" column')
print(df)
Output
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

DataFrame after setting the index to "Name" column
         Age       City
Name
Krishna   34  Bangalore
Sailu     35  Hyderabad
Ram       29  Hyderabad
Ravi      34  Bangalore
Joel      29  Hyderabad
Joel      34  Bangalore

Resetting index
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2      Ram   29  Hyderabad
3     Ravi   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore

DataFrame after setting the index to "Age" column
        Name       City
Age
34   Krishna  Bangalore
35     Sailu  Hyderabad
29       Ram  Hyderabad
34      Ravi  Bangalore
29      Joel  Hyderabad
34      Joel  Bangalore