The DataFrame index is a label assigned to each row of the DataFrame. It is similar to a row id in a database table or a row label in a spreadsheet. Unlike a database primary key, pandas does not require index labels to be unique.
index.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
'Age': [34, 35, 29, 34, 29, 34],
'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print(df)
Output
Name Age City
0 Krishna 34 Bangalore
1 Sailu 35 Hyderabad
2 Ram 29 Hyderabad
3 Ravi 34 Bangalore
4 Joel 29 Hyderabad
5 Joel 34 Bangalore
By default, when a DataFrame is created, pandas assigns each row a numeric index starting from 0. The output above confirms this.
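You can inspect the default index through the DataFrame's index attribute. The sketch below reuses the sample data from index.py. It checks that the default index is a RangeIndex and looks up a row by its label with .loc.

```python
import pandas as pd

# Same sample data as index.py above
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)

# The default index is a RangeIndex: 0, 1, ..., len(df) - 1
print(type(df.index).__name__)  # RangeIndex
print(list(df.index))           # [0, 1, 2, 3, 4, 5]

# .loc looks up rows by index label; label 2 is the third row
print(df.loc[2, 'Name'])        # Ram
```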
Customize the index using the set_index method
Using the DataFrame set_index method, we can specify the new index column or columns.
Example
new_df = df.set_index(keys=['Name'])
The above snippet replaces the default index with the ‘Name’ column. It returns a new DataFrame and does not modify the existing DataFrame (df).
Age City
Name
Krishna 34 Bangalore
Sailu 35 Hyderabad
Ram 29 Hyderabad
Ravi 34 Bangalore
Joel 29 Hyderabad
Joel 34 Bangalore
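The sketch below makes the difference concrete using the same sample data. df keeps its Name column, while new_df moves Name into the index. The sketch also shows that index labels may repeat: ‘Joel’ labels two rows, so .loc returns both. Finally, passing several keys to set_index builds a MultiIndex.

```python
import pandas as pd

data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)

# set_index returns a new DataFrame; df keeps its default index and columns
new_df = df.set_index(keys=['Name'])
print(list(df.columns))      # ['Name', 'Age', 'City']
print(list(new_df.columns))  # ['Age', 'City']

# Index labels need not be unique: 'Joel' labels two rows
print(new_df.index.is_unique)  # False
print(new_df.loc['Joel'])      # both Joel rows

# Passing several keys builds a MultiIndex
multi_df = df.set_index(keys=['Name', 'City'])
print(list(multi_df.index.names))  # ['Name', 'City']
```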
To apply the change to the current DataFrame, set the inplace argument to True.
df.set_index(keys=['Name'], inplace=True)
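When inplace=True is used, set_index modifies df directly and returns None. For this reason, writing df = df.set_index(..., inplace=True) would replace your DataFrame with None. The sketch below confirms this behavior with the same sample data.

```python
import pandas as pd

data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)

# With inplace=True, df itself is changed and the call returns None
result = df.set_index(keys=['Name'], inplace=True)
print(result)            # None
print(df.index.name)     # Name
print(list(df.columns))  # ['Age', 'City']
```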
How to reset the index?
You can restore the default numeric index by calling the reset_index method. By default, reset_index moves the current index back into a regular column.
df.reset_index(inplace=True)
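reset_index also accepts a drop argument. With the default drop=False, the old index values become a column again. With drop=True, those values are discarded. The sketch below, again using the sample data, compares the two.

```python
import pandas as pd

data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data).set_index(keys=['Name'])

# Default: the Name index is moved back into a regular column
restored = df.reset_index()
print(list(restored.columns))  # ['Name', 'Age', 'City']

# drop=True throws the index values away instead of keeping them
dropped = df.reset_index(drop=True)
print(list(dropped.columns))   # ['Age', 'City']
print(list(dropped.index))     # [0, 1, 2, 3, 4, 5]
```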
Find the working application below.
customize_index.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
'Age': [34, 35, 29, 34, 29, 34],
'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print('\nSet the Name column as index')
new_df = df.set_index(keys=['Name'])
print('Original DataFrame')
print(df)
print('\nDataFrame where Name is the index')
print(new_df)
df.set_index(keys=['Name'], inplace=True)
print('\nOriginal DataFrame')
print(df)
df.reset_index(inplace=True)
print('\nOriginal DataFrame after resetting the index')
print(df)
Output
Set the Name column as index
Original DataFrame
Name Age City
0 Krishna 34 Bangalore
1 Sailu 35 Hyderabad
2 Ram 29 Hyderabad
3 Ravi 34 Bangalore
4 Joel 29 Hyderabad
5 Joel 34 Bangalore
DataFrame where Name is the index
Age City
Name
Krishna 34 Bangalore
Sailu 35 Hyderabad
Ram 29 Hyderabad
Ravi 34 Bangalore
Joel 29 Hyderabad
Joel 34 Bangalore
Original DataFrame
Age City
Name
Krishna 34 Bangalore
Sailu 35 Hyderabad
Ram 29 Hyderabad
Ravi 34 Bangalore
Joel 29 Hyderabad
Joel 34 Bangalore
Original DataFrame after resetting the index
Name Age City
0 Krishna 34 Bangalore
1 Sailu 35 Hyderabad
2 Ram 29 Hyderabad
3 Ravi 34 Bangalore
4 Joel 29 Hyderabad
5 Joel 34 Bangalore
The set_index method replaces the index
The set_index method replaces the current index. Let’s confirm this with the dataset below.
Name Age City
0 Krishna 34 Bangalore
1 Sailu 35 Hyderabad
2 Ram 29 Hyderabad
3 Ravi 34 Bangalore
4 Joel 29 Hyderabad
5 Joel 34 Bangalore
df.set_index(keys=['Name'], inplace=True)
The above snippet sets the ‘Name’ column as the index.
Age City
Name
Krishna 34 Bangalore
Sailu 35 Hyderabad
Ram 29 Hyderabad
Ravi 34 Bangalore
Joel 29 Hyderabad
Joel 34 Bangalore
Now let’s set the index to the ‘Age’ column.
df.set_index(keys=['Age'], inplace=True)
Once the above statement is executed, the Age column replaces the old index. By default, set_index discards the existing index. Because the Name data was held only in that index, it is lost completely.
City
Age
34 Bangalore
35 Hyderabad
29 Hyderabad
34 Bangalore
29 Hyderabad
34 Bangalore
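You can verify the data loss programmatically. The sketch below repeats the two inplace calls on the sample data and then checks that Name no longer appears in either the columns or the index.

```python
import pandas as pd

data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
df.set_index(keys=['Name'], inplace=True)
df.set_index(keys=['Age'], inplace=True)

# The old Name index was discarded, so Name is neither a column nor the index
print('Name' in df.columns)  # False
print(df.index.name)         # Age
print(list(df.columns))      # ['City']
```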
Find the working application below.
set_index_replacement_demo.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
'Age': [34, 35, 29, 34, 29, 34],
'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print(df)
df.set_index(keys=['Name'], inplace=True)
print('\nDataFrame after setting the index to "Name" column')
print(df)
df.set_index(keys=['Age'], inplace=True)
print('\nDataFrame after setting the index to "Age" column')
print(df)
Output
Name Age City
0 Krishna 34 Bangalore
1 Sailu 35 Hyderabad
2 Ram 29 Hyderabad
3 Ravi 34 Bangalore
4 Joel 29 Hyderabad
5 Joel 34 Bangalore
DataFrame after setting the index to "Name" column
Age City
Name
Krishna 34 Bangalore
Sailu 35 Hyderabad
Ram 29 Hyderabad
Ravi 34 Bangalore
Joel 29 Hyderabad
Joel 34 Bangalore
DataFrame after setting the index to "Age" column
City
Age
34 Bangalore
35 Hyderabad
29 Hyderabad
34 Bangalore
29 Hyderabad
34 Bangalore
How to address the above replacement problem?
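You can avoid this problem by resetting the index to the default numeric index before setting a new one. reset_index turns the Name index back into a regular column, so its data is kept when Age becomes the index.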
df.set_index(keys=['Name'], inplace=True)
df.reset_index(inplace=True)
df.set_index(keys=['Age'], inplace=True)
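The same fix can be written without inplace by chaining the calls. The sketch also shows an alternative that the original post does not use: set_index's append=True parameter. This keeps the existing Name index and adds Age as a second level, producing a MultiIndex. Whether a MultiIndex suits your data depends on how you plan to look rows up.

```python
import pandas as pd

data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', 'Joel'],
        'Age': [34, 35, 29, 34, 29, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data).set_index(keys=['Name'])

# Option 1: reset the index, then set the new one (no inplace needed)
by_age = df.reset_index().set_index(keys=['Age'])
print(list(by_age.columns))  # ['Name', 'City']

# Option 2: append=True keeps Name and adds Age as a second index level
by_name_age = df.set_index(keys=['Age'], append=True)
print(list(by_name_age.index.names))  # ['Name', 'Age']
print(list(by_name_age.columns))      # ['City']
```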
Find the working application below.
set_index_replacement_demo.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Ram', 'Ravi', 'Joel', "Joel"],
'Age': [34, 35, 29, 34, 29, 34],
'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore']}
df = pd.DataFrame(data)
print(df)
df.set_index(keys=['Name'], inplace=True)
print('\nDataFrame after setting the index to "Name" column')
print(df)
print('\nResetting index')
df.reset_index(inplace=True)
print(df)
df.set_index(keys=['Age'], inplace=True)
print('\nDataFrame after setting the index to "Age" column')
print(df)
Output
Name Age City
0 Krishna 34 Bangalore
1 Sailu 35 Hyderabad
2 Ram 29 Hyderabad
3 Ravi 34 Bangalore
4 Joel 29 Hyderabad
5 Joel 34 Bangalore
DataFrame after setting the index to "Name" column
Age City
Name
Krishna 34 Bangalore
Sailu 35 Hyderabad
Ram 29 Hyderabad
Ravi 34 Bangalore
Joel 29 Hyderabad
Joel 34 Bangalore
Resetting index
Name Age City
0 Krishna 34 Bangalore
1 Sailu 35 Hyderabad
2 Ram 29 Hyderabad
3 Ravi 34 Bangalore
4 Joel 29 Hyderabad
5 Joel 34 Bangalore
DataFrame after setting the index to "Age" column
Name City
Age
34 Krishna Bangalore
35 Sailu Hyderabad
29 Ram Hyderabad
34 Ravi Bangalore
29 Joel Hyderabad
34 Joel Bangalore