DataFrame index
A DataFrame index is the set of labels that identify the rows of a DataFrame (labels are typically unique, although pandas does not require it). When a DataFrame is created without an explicit index, pandas assigns a default index that starts at 0 and increments by 1 for each subsequent row. This default index is known as a RangeIndex.
dataframe_index.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Ram', 'Joel', 'Gopi', 'Jitendra', 'Raj'],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai']}
df = pd.DataFrame(data)
print(df)
Output
       Name  Age       City
0   Krishna   34  Bangalore
1       Ram   25    Chennai
2      Joel   29  Hyderabad
3      Gopi   41  Hyderabad
4  Jitendra   52  Bangalore
5       Raj   23    Chennai
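To check that the default index is a RangeIndex, one option is to inspect df.index directly. The snippet below is a minimal sketch that uses a shortened, three-row version of the sample data (the shortening is an assumption made for brevity).

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
data = {'Name': ['Krishna', 'Ram', 'Joel'],
        'Age': [34, 25, 29]}
df = pd.DataFrame(data)

# No index was passed, so pandas falls back to a RangeIndex
print(type(df.index).__name__)

# The labels are the integers 0, 1, 2
print(list(df.index))
```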
Specify a new index for an existing DataFrame
By assigning to the df.index attribute, we can give an existing DataFrame a new index. The new index must contain exactly one label per row.
Example
df.index = ['a', 'b', 'c', 'd', 'e', 'f']
add_new_index_to_dataframe.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Ram', 'Joel', 'Gopi', 'Jitendra', 'Raj'],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai']}
df = pd.DataFrame(data)
print("with default index")
print(df)
print("\nwith updated index\n")
df.index = ['a', 'b', 'c', 'd', 'e', 'f']
print(df)
print("\nprint with the index label\n")
for label in df.index:
    print("For the index : ", label)
    print(df.loc[label], "\n")
Output
with default index
       Name  Age       City
0   Krishna   34  Bangalore
1       Ram   25    Chennai
2      Joel   29  Hyderabad
3      Gopi   41  Hyderabad
4  Jitendra   52  Bangalore
5       Raj   23    Chennai

with updated index

       Name  Age       City
a   Krishna   34  Bangalore
b       Ram   25    Chennai
c      Joel   29  Hyderabad
d      Gopi   41  Hyderabad
e  Jitendra   52  Bangalore
f       Raj   23    Chennai

print with the index label

For the index :  a
Name      Krishna
Age            34
City    Bangalore
Name: a, dtype: object

For the index :  b
Name        Ram
Age          25
City    Chennai
Name: b, dtype: object

For the index :  c
Name         Joel
Age            29
City    Hyderabad
Name: c, dtype: object

For the index :  d
Name         Gopi
Age            41
City    Hyderabad
Name: d, dtype: object

For the index :  e
Name     Jitendra
Age            52
City    Bangalore
Name: e, dtype: object

For the index :  f
Name        Raj
Age          23
City    Chennai
Name: f, dtype: object
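One thing to watch for when assigning to df.index is that the number of labels has to match the number of rows; otherwise pandas raises a ValueError. The following sketch illustrates this with a shortened three-row DataFrame. The variable name error_message is only illustrative.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame({'Name': ['Krishna', 'Ram', 'Joel'],
                   'Age': [34, 25, 29]})

error_message = None
try:
    # Two labels for three rows: pandas rejects the assignment
    df.index = ['a', 'b']
except ValueError as exc:
    error_message = str(exc)
    print("Could not assign index:", error_message)

# Three labels for three rows: the assignment succeeds
df.index = ['a', 'b', 'c']

# Rows can now be looked up by the new labels
print(df.loc['b', 'Name'])
```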
Use an existing column as the index
Using the set_index() method, we can set one or more columns as the DataFrame index.
Example
df.set_index("Name", inplace=True)
The snippet above uses the 'Name' column as the DataFrame index. Passing inplace=True modifies the DataFrame in place instead of returning a new DataFrame.
set_existing_column_as_index.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Ram', 'Joel', 'Gopi', 'Jitendra', 'Raj'],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai']}
df = pd.DataFrame(data)
print("with default index")
print(df)
print("\n")
print("with Name as index")
df.set_index("Name", inplace=True)
print(df)
Output
with default index
       Name  Age       City
0   Krishna   34  Bangalore
1       Ram   25    Chennai
2      Joel   29  Hyderabad
3      Gopi   41  Hyderabad
4  Jitendra   52  Bangalore
5       Raj   23    Chennai


with Name as index
          Age       City
Name
Krishna    34  Bangalore
Ram        25    Chennai
Joel       29  Hyderabad
Gopi       41  Hyderabad
Jitendra   52  Bangalore
Raj        23    Chennai
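For comparison, when inplace=True is omitted, set_index() leaves the original DataFrame untouched and returns a new one. A minimal sketch of that behaviour, using a shortened three-row DataFrame and an illustrative variable name (indexed):

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame({'Name': ['Krishna', 'Ram', 'Joel'],
                   'Age': [34, 25, 29]})

# Without inplace=True, set_index() returns a new DataFrame
indexed = df.set_index('Name')

# The original still has Name as a regular column
print(list(df.columns))

# In the returned DataFrame, Name has moved into the index
print(list(indexed.columns))
print(indexed.loc['Ram', 'Age'])
```

Keeping the returned DataFrame in its own variable can be convenient when the original layout is still needed later.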
Set multiple columns as the index
The snippet below uses the Name and City columns as the index.
df.set_index(["Name", "City"], inplace=True)
The snippet below accesses a DataFrame row by its Name and City.
row = df.loc[("Krishna", "Bangalore")]
multi_column_index.py
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Ram', 'Joel', 'Gopi', 'Jitendra', 'Raj'],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai']}
df = pd.DataFrame(data)
print("with default index")
print(df)
print("\n")
print("with Name and City as index")
df.set_index(["Name", "City"], inplace=True)
print(df)
# Access row with index ("Krishna", "Bangalore")
print('\nAccess row with index ("Krishna", "Bangalore")')
row = df.loc[("Krishna", "Bangalore")]
print(row)
Output
with default index
       Name  Age       City
0   Krishna   34  Bangalore
1       Ram   25    Chennai
2      Joel   29  Hyderabad
3      Gopi   41  Hyderabad
4  Jitendra   52  Bangalore
5       Raj   23    Chennai


with Name and City as index
                    Age
Name     City
Krishna  Bangalore   34
Ram      Chennai     25
Joel     Hyderabad   29
Gopi     Hyderabad   41
Jitendra Bangalore   52
Raj      Chennai     23

Access row with index ("Krishna", "Bangalore")
Age    34
Name: (Krishna, Bangalore), dtype: int64
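With a two-column index, a row can be addressed by the full (Name, City) pair or by the first level alone. The sketch below is a small illustration on a shortened three-row DataFrame. The variable names age and rows are illustrative, not part of the original example.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
data = {'Name': ['Krishna', 'Ram', 'Joel'],
        'Age': [34, 25, 29],
        'City': ['Bangalore', 'Chennai', 'Hyderabad']}
df = pd.DataFrame(data).set_index(['Name', 'City'])

# The index now has two named levels
print(list(df.index.names))

# Full key: both levels, plus the column to read
age = df.loc[('Krishna', 'Bangalore'), 'Age']
print(age)

# Partial key: only the first level (Name); the result is
# a DataFrame indexed by the remaining level (City)
rows = df.loc['Ram']
print(rows)
```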