Tuesday, 3 October 2023

Customizing Index Values in Pandas DataFrames

DataFrame index

The DataFrame index is the set of labels that identify each row of the DataFrame. When a DataFrame is created without an explicit index, Pandas assigns a default index that starts at 0 and increments by 1 for each subsequent row. This default index is a RangeIndex.

 

dataframe_index.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Ram', 'Joel', 'Gopi', 'Jitendra', "Raj"],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai']}

df = pd.DataFrame(data)
print(df)

Output

       Name  Age       City
0   Krishna   34  Bangalore
1       Ram   25    Chennai
2      Joel   29  Hyderabad
3      Gopi   41  Hyderabad
4  Jitendra   52  Bangalore
5       Raj   23    Chennai
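
To confirm that the default index really is a RangeIndex, the index object can be inspected directly. The following is a minimal sketch on a smaller, three-row frame (the shortened data is only for illustration):

```python
import pandas as pd

# A smaller frame than the one above, used only for illustration
df = pd.DataFrame({'Name': ['Krishna', 'Ram', 'Joel'],
                   'Age': [34, 25, 29]})

# No index was passed, so pandas falls back to a RangeIndex
print(type(df.index).__name__)  # RangeIndex
print(df.index)                 # RangeIndex(start=0, stop=3, step=1)

# The labels are the integers 0, 1, 2
print(list(df.index))           # [0, 1, 2]
```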

Specify a new index for an existing DataFrame

By assigning to the df.index attribute, we can give an existing DataFrame a new index.

 

Example

df.index = ['a', 'b', 'c', 'd', 'e', 'f']

 

add_new_index_to_dataframe.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Ram', 'Joel', 'Gopi', 'Jitendra', "Raj"],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai']}

df = pd.DataFrame(data)
print("with default index")
print(df)

print("\nwith updated index\n")
df.index = ['a', 'b', 'c', 'd', 'e', 'f']
print(df)

print("\nprint with the index label\n")
for label in df.index:
    print("For the index : ",label)
    print(df.loc[label], "\n")

Output

with default index
       Name  Age       City
0   Krishna   34  Bangalore
1       Ram   25    Chennai
2      Joel   29  Hyderabad
3      Gopi   41  Hyderabad
4  Jitendra   52  Bangalore
5       Raj   23    Chennai

with updated index

       Name  Age       City
a   Krishna   34  Bangalore
b       Ram   25    Chennai
c      Joel   29  Hyderabad
d      Gopi   41  Hyderabad
e  Jitendra   52  Bangalore
f       Raj   23    Chennai

print with the index label

For the index :  a
Name      Krishna
Age            34
City    Bangalore
Name: a, dtype: object 

For the index :  b
Name        Ram
Age          25
City    Chennai
Name: b, dtype: object 

For the index :  c
Name         Joel
Age            29
City    Hyderabad
Name: c, dtype: object 

For the index :  d
Name         Gopi
Age            41
City    Hyderabad
Name: d, dtype: object 

For the index :  e
Name     Jitendra
Age            52
City    Bangalore
Name: e, dtype: object 

For the index :  f
Name        Raj
Age          23
City    Chennai
Name: f, dtype: object
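
One detail worth keeping in mind when assigning to df.index is that the new list of labels needs one entry per row. The sketch below, on a shortened three-row frame, looks up a single value by label and then shows that a list of the wrong length is rejected with a ValueError (the variable names name_of_b and length_error are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Ram', 'Joel'],
                   'Age': [34, 25, 29]})

# Three rows, three labels: the assignment succeeds
df.index = ['a', 'b', 'c']

# Look up one cell by row label and column name
name_of_b = df.loc['b', 'Name']
print(name_of_b)  # Ram

# Two labels for three rows: pandas raises a ValueError
length_error = None
try:
    df.index = ['x', 'y']
except ValueError as exc:
    length_error = exc
    print("ValueError:", exc)
```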

Use an existing column as the index

Using the set_index() method, we can set one or more columns as the DataFrame index.

 

Example

df.set_index("Name", inplace=True)

 

The snippet above uses the ‘Name’ column as the DataFrame index. ‘inplace=True’ modifies the DataFrame in place instead of returning a new DataFrame.

 

set_existing_column_as_index.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Ram', 'Joel', 'Gopi', 'Jitendra', "Raj"],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai']}

df = pd.DataFrame(data)
print("with default index")
print(df)

print("\n")
print("with Name as index")

df.set_index("Name", inplace=True)
print(df)

Output

with default index
       Name  Age       City
0   Krishna   34  Bangalore
1       Ram   25    Chennai
2      Joel   29  Hyderabad
3      Gopi   41  Hyderabad
4  Jitendra   52  Bangalore
5       Raj   23    Chennai


with Name as index
          Age       City
Name                    
Krishna    34  Bangalore
Ram        25    Chennai
Joel       29  Hyderabad
Gopi       41  Hyderabad
Jitendra   52  Bangalore
Raj        23    Chennai
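
To make the effect of inplace=True more concrete, the sketch below compares the two calling styles on a two-row frame. Without inplace, set_index() returns a new DataFrame and leaves the original untouched. With inplace=True, it modifies the original and returns None. The names indexed and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krishna', 'Ram'],
                   'Age': [34, 25]})

# Without inplace: a new DataFrame is returned, df keeps its columns
indexed = df.set_index("Name")
print(list(df.columns))           # ['Name', 'Age']
print(list(indexed.columns))      # ['Age']
print(indexed.loc['Ram', 'Age'])  # 25

# With inplace=True: df itself is modified and the call returns None
result = df.set_index("Name", inplace=True)
print(result)                     # None
print(list(df.columns))           # ['Age']
```

Assigning the returned DataFrame to a new name is often easier to follow, because the original frame stays available for comparison.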

Set multiple columns as index

The snippet below uses the Name and City columns as the index, which produces a MultiIndex.

df.set_index(["Name", "City"], inplace=True)

 

The snippet below accesses a DataFrame row by its Name and City.

row = df.loc[("Krishna", "Bangalore")]

 

multi_column_index.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Ram', 'Joel', 'Gopi', 'Jitendra', "Raj"],
        'Age': [34, 25, 29, 41, 52, 23],
        'City': ['Bangalore', 'Chennai', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Chennai']}

df = pd.DataFrame(data)
print("with default index")
print(df)

print("\n")
print("with Name and City as index")

df.set_index(["Name", "City"], inplace=True)
print(df)

# Access row with index ("Krishna", "Bangalore")
print('\nAccess row with index ("Krishna", "Bangalore")')
row = df.loc[("Krishna", "Bangalore")]
print(row)

Output

with default index
       Name  Age       City
0   Krishna   34  Bangalore
1       Ram   25    Chennai
2      Joel   29  Hyderabad
3      Gopi   41  Hyderabad
4  Jitendra   52  Bangalore
5       Raj   23    Chennai


with Name and City as index
                    Age
Name     City          
Krishna  Bangalore   34
Ram      Chennai     25
Joel     Hyderabad   29
Gopi     Hyderabad   41
Jitendra Bangalore   52
Raj      Chennai     23

Access row with index ("Krishna", "Bangalore")
Age    34
Name: (Krishna, Bangalore), dtype: int64
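
Beyond looking up a full (Name, City) pair, a multi-column index can also be queried by its first level alone, and it can be turned back into ordinary columns. The sketch below shows both on a shortened three-row frame. reset_index() is not covered in the sections above and is included here only as one common way to undo set_index(). The variable names are illustrative.

```python
import pandas as pd

data = {'Name': ['Krishna', 'Ram', 'Joel'],
        'Age': [34, 25, 29],
        'City': ['Bangalore', 'Chennai', 'Hyderabad']}

# set_index() without inplace returns the re-indexed DataFrame
df = pd.DataFrame(data).set_index(["Name", "City"])

# Full key plus column name: returns a single value
age = df.loc[("Krishna", "Bangalore"), "Age"]
print(age)  # 34

# First level only: returns the matching rows, indexed by City
rows_for_krishna = df.loc["Krishna"]
print(rows_for_krishna)

# Move both index levels back into regular columns
restored = df.reset_index()
print(list(restored.columns))  # ['Name', 'City', 'Age']
```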



  
