Monday, 4 December 2023

Pandas: loc: Access the DataFrame rows using row labels

In this post, I am going to explain how to access DataFrame rows using row labels.

Using the ‘loc’ accessor, we can access and manipulate data in a DataFrame using label-based indexing.

I am going to use the following dataset to demonstrate the examples. In the examples, the 'Name' column is set as the index, so the row labels are names such as 'Krishna'.

      Name  Age       City  Gender  Rating
0  Krishna   34  Bangalore    Male      81
1    Sailu   35  Hyderabad  Female      76
2     Joel   29  Hyderabad    Male      67
3    Chamu   35    Chennai  Female     100
4  Krishna   52  Bangalore    Male      87
5      Raj   34    Chennai    Male      89

Example 1: Access a single row by its label

df.loc[row_label_1]
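Whether this call returns a Series or a DataFrame depends on the index. In the dataset above, 'Krishna' appears twice, so the working application below prints a DataFrame with two rows; a label that occurs only once gives a Series. The following minimal sketch uses a trimmed, three-row copy of the sample data (trimmed only for brevity):

```python
import pandas as pd

# Trimmed copy of the sample data, indexed by Name ('Krishna' appears twice)
df = pd.DataFrame(
    {'Age': [34, 35, 52], 'City': ['Bangalore', 'Hyderabad', 'Bangalore']},
    index=pd.Index(['Krishna', 'Sailu', 'Krishna'], name='Name'),
)

unique_row = df.loc['Sailu']        # label occurs once -> Series
duplicate_rows = df.loc['Krishna']  # label occurs twice -> DataFrame

print(type(unique_row).__name__)      # Series
print(type(duplicate_rows).__name__)  # DataFrame
print(unique_row['City'])             # Hyderabad

# Passing a one-element list always returns a DataFrame
always_frame = df.loc[['Sailu']]
print(type(always_frame).__name__)    # DataFrame
```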

 

Example 2: Access multiple rows by their labels

df.loc[[row_label_1, row_label_2]]
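Every label in the list must exist in the index. In pandas 1.0 and later, a list that contains a missing label raises a KeyError; `reindex()` is the usual alternative when missing labels should produce empty (NaN) rows instead. A sketch, where 'Ravi' is a hypothetical name that is not in the dataset:

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [34, 35, 29], 'City': ['Bangalore', 'Hyderabad', 'Hyderabad']},
    index=pd.Index(['Krishna', 'Sailu', 'Joel'], name='Name'),
)

# 'Ravi' is a hypothetical label that is not in the index
missing_label_raised = False
try:
    df.loc[['Sailu', 'Ravi']]
except KeyError as err:
    missing_label_raised = True
    print('KeyError:', err)

# reindex() keeps the missing label and fills its row with NaN instead
result = df.reindex(['Sailu', 'Ravi'])
print(result)
```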

 

Example 3: Access a specific column of the row

df.loc[row_label_1, column_label_1]
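When the row label is unique, a row label plus a column label selects a single cell and returns a scalar. (With the duplicate 'Krishna' label in the application below, the same call returns a Series instead.) For single-cell access, pandas also provides the `at` accessor. A minimal sketch on a trimmed copy of the data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [34, 35, 29], 'City': ['Bangalore', 'Hyderabad', 'Hyderabad']},
    index=pd.Index(['Krishna', 'Sailu', 'Joel'], name='Name'),
)

# Unique row label + single column label -> a scalar value
city = df.loc['Joel', 'City']
print(city)  # Hyderabad

# .at is the label-based accessor specialised for exactly one cell
same_city = df.at['Joel', 'City']
print(same_city)  # Hyderabad
```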

 

Example 4: Access specific columns of the row

df.loc[row_label_1, [column_label_1, column_label_2]]

 

Example 5: Access multiple rows and columns

df.loc[[row_label_1, row_label_2], [column_label_1, column_label_2]]

 

Example 6: Access all rows for specific columns

df.loc[:, [column_label_1, column_label_2]]
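Here `:` means "all rows", so for a plain list of columns this is equivalent to `df[[column_label_1, column_label_2]]`. `loc` also accepts a column label slice, which includes both ends. A sketch on a trimmed copy of the data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [34, 35, 29],
     'City': ['Bangalore', 'Hyderabad', 'Hyderabad'],
     'Gender': ['Male', 'Female', 'Male'],
     'Rating': [81, 76, 67]},
    index=pd.Index(['Krishna', 'Sailu', 'Joel'], name='Name'),
)

# ':' selects every row; the list picks columns in the order given
by_loc = df.loc[:, ['City', 'Age']]

# Plain [] with a list of column names selects the same columns
by_brackets = df[['City', 'Age']]
print(by_loc.equals(by_brackets))  # True

# A column label slice is inclusive at both ends: Age, City, Gender
age_to_gender = df.loc[:, 'Age':'Gender']
print(list(age_to_gender.columns))  # ['Age', 'City', 'Gender']
```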

 

Find the complete working application below.

 

loc_hello_world.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Krishna', "Raj"],
        'Age': [34, 35, 29, 35, 52, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Male'],
        'Rating': [81, 76, 67, 100, 87, 89]}

df = pd.DataFrame(data)
print('Original DataFrame')
print(df)

print('\nSet "Name" column as index column')
df.set_index('Name', inplace=True)
print(df)

row_label_1 = 'Krishna'
row_label_2 = 'Chamu'

column_label_1 = 'City'
column_label_2 = 'Age'

# Access a single row by its label
result = df.loc[row_label_1]
print('\nAccess a single row by its label :\n',result)

# Access multiple rows by their labels
result = df.loc[[row_label_1, row_label_2]]
print('\nAccess multiple rows by their labels :\n',result)

# Access specific column of the row
result = df.loc[row_label_1, column_label_1]
print('\nAccess specific column of the row :\n',result)

# Access specific columns of the row
result = df.loc[row_label_1, [column_label_1, column_label_2]]
print('\nAccess specific columns of the row :\n',result)

# Access multiple rows and columns
result = df.loc[[row_label_1, row_label_2], [column_label_1, column_label_2]]
print('\nAccess multiple rows and columns :\n',result)

# Access all rows for specific columns
result = df.loc[:, [column_label_1, column_label_2]]
print('\nAccess all rows for specific columns :\n',result)

Output

Original DataFrame
      Name  Age       City  Gender  Rating
0  Krishna   34  Bangalore    Male      81
1    Sailu   35  Hyderabad  Female      76
2     Joel   29  Hyderabad    Male      67
3    Chamu   35    Chennai  Female     100
4  Krishna   52  Bangalore    Male      87
5      Raj   34    Chennai    Male      89

Set "Name" column as index column
         Age       City  Gender  Rating
Name                                   
Krishna   34  Bangalore    Male      81
Sailu     35  Hyderabad  Female      76
Joel      29  Hyderabad    Male      67
Chamu     35    Chennai  Female     100
Krishna   52  Bangalore    Male      87
Raj       34    Chennai    Male      89

Access a single row by its label :
          Age       City Gender  Rating
Name                                  
Krishna   34  Bangalore   Male      81
Krishna   52  Bangalore   Male      87

Access multiple rows by their labels :
          Age       City  Gender  Rating
Name                                   
Krishna   34  Bangalore    Male      81
Krishna   52  Bangalore    Male      87
Chamu     35    Chennai  Female     100

Access specific column of the row :
 Name
Krishna    Bangalore
Krishna    Bangalore
Name: City, dtype: object

Access specific columns of the row :
               City  Age
Name                   
Krishna  Bangalore   34
Krishna  Bangalore   52

Access multiple rows and columns :
               City  Age
Name                   
Krishna  Bangalore   34
Krishna  Bangalore   52
Chamu      Chennai   35

Access all rows for specific columns :
               City  Age
Name                   
Krishna  Bangalore   34
Sailu    Hyderabad   35
Joel     Hyderabad   29
Chamu      Chennai   35
Krishna  Bangalore   52
Raj        Chennai   34

The following are common uses of the loc accessor:

a. Access rows and columns

b. Slicing with index labels

c. Boolean indexing

d. Assigning values

 

Access rows and columns

This was already covered in the examples at the beginning of this post.

 

Slicing with index labels

Example 1: Slicing rows based on index labels. Here both 'row_label_1' and 'row_label_2' are inclusive.

 

df.loc[row_label_1:row_label_2]
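Because the end label is included, a label slice behaves differently from a positional slice with `iloc`, where the stop position is excluded. A minimal sketch, assuming the index used in slicing.py below (with 'Ram' in place of the second 'Krishna') and keeping only the Age column:

```python
import pandas as pd

names = ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Ram', 'Raj']
df = pd.DataFrame({'Age': [34, 35, 29, 35, 52, 34]},
                  index=pd.Index(names, name='Name'))

# Label slice: both 'Krishna' and 'Joel' are included
by_label = df.loc['Krishna':'Joel']
print(list(by_label.index))  # ['Krishna', 'Sailu', 'Joel']

# Position slice: the stop position 2 ('Joel') is excluded
by_position = df.iloc[0:2]
print(list(by_position.index))  # ['Krishna', 'Sailu']
```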

 

Example 2: Slicing rows and selecting specific columns. Here 'row_label_1', 'row_label_2', 'column_label_1' and 'column_label_2' are inclusive.

 

df.loc[row_label_1:row_label_2, column_label_1:column_label_2]
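Label slicing needs each slice bound to identify a single position. This is presumably why slicing.py uses 'Ram' in place of the second 'Krishna': with a duplicate start label in an unsorted index, pandas raises a KeyError. Sorting the index first places duplicates next to each other, and the slice works again. A sketch of both cases, keeping only the Age column:

```python
import pandas as pd

# Index from the first example: 'Krishna' appears at positions 0 and 4
names = ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Krishna', 'Raj']
df = pd.DataFrame({'Age': [34, 35, 29, 35, 52, 34]},
                  index=pd.Index(names, name='Name'))

slice_failed = False
try:
    df.loc['Krishna':'Chamu']
except KeyError as err:
    # The start label is not unique in an unsorted index
    slice_failed = True
    print('KeyError:', err)

# After sorting, duplicate labels are adjacent and slicing works again
sorted_df = df.sort_index()
result = sorted_df.loc['Chamu':'Krishna']
print(list(result.index))  # ['Chamu', 'Joel', 'Krishna', 'Krishna']
```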

 

slicing.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Ram', "Raj"],
        'Age': [34, 35, 29, 35, 52, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Male'],
        'Rating': [81, 76, 67, 100, 87, 89]}

df = pd.DataFrame(data)
print('Original DataFrame')
print(df)

print('\nSet "Name" column as index column')
df.set_index('Name', inplace=True)
print(df)

row_label_1 = 'Krishna'
row_label_2 = 'Chamu'

column_label_1 = 'Age'
column_label_2 = 'Gender'

# Slicing rows based on index labels
result = df.loc[row_label_1:row_label_2]
print('\nSlicing rows based on index labels:\n',result)

# Slicing rows and selecting specific columns
result = df.loc[row_label_1:row_label_2, column_label_1:column_label_2]
print('\nSlicing rows and selecting specific columns\n',result)

Output

Original DataFrame
      Name  Age       City  Gender  Rating
0  Krishna   34  Bangalore    Male      81
1    Sailu   35  Hyderabad  Female      76
2     Joel   29  Hyderabad    Male      67
3    Chamu   35    Chennai  Female     100
4      Ram   52  Bangalore    Male      87
5      Raj   34    Chennai    Male      89

Set "Name" column as index column
         Age       City  Gender  Rating
Name                                   
Krishna   34  Bangalore    Male      81
Sailu     35  Hyderabad  Female      76
Joel      29  Hyderabad    Male      67
Chamu     35    Chennai  Female     100
Ram       52  Bangalore    Male      87
Raj       34    Chennai    Male      89

Slicing rows based on index labels:
          Age       City  Gender  Rating
Name                                   
Krishna   34  Bangalore    Male      81
Sailu     35  Hyderabad  Female      76
Joel      29  Hyderabad    Male      67
Chamu     35    Chennai  Female     100

Slicing rows and selecting specific columns
          Age       City  Gender
Name                           
Krishna   34  Bangalore    Male
Sailu     35  Hyderabad  Female
Joel      29  Hyderabad    Male
Chamu     35    Chennai  Female

Boolean indexing

Example 1: Access rows based on a condition

age_greater_34 = df.loc[df['Age'] > 34]
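The expression inside the brackets is a boolean Series aligned with the index, and `loc` keeps the rows where it is True. A column selection can be passed as the second argument in the same call. A sketch on the sample data (Gender column dropped for brevity):

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [34, 35, 29, 35, 52, 34],
     'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
     'Rating': [81, 76, 67, 100, 87, 89]},
    index=pd.Index(['Krishna', 'Sailu', 'Joel', 'Chamu', 'Ram', 'Raj'], name='Name'),
)

# The condition alone is a boolean Series aligned with the index
mask = df['Age'] > 34
print(mask.tolist())  # [False, True, False, True, True, False]

# Row mask plus a column list: filter rows and pick columns in one call
result = df.loc[mask, ['City', 'Rating']]
print(result)
```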

 

Example 2: Access rows based on multiple conditions

age_greater_34_city_hyderabad = df.loc[(df['Age'] > 34) & (df['City'] == 'Hyderabad')]
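Besides `&` (and), conditions can be combined with `|` (or) and negated with `~`, and `isin()` tests membership in a list of values. Each comparison needs its own parentheses, because Python's `&` and `|` operators bind more tightly than `>` and `==`. A sketch on the Age and City columns of the sample data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [34, 35, 29, 35, 52, 34],
     'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai']},
    index=pd.Index(['Krishna', 'Sailu', 'Joel', 'Chamu', 'Ram', 'Raj'], name='Name'),
)

# | is element-wise OR; each comparison is wrapped in parentheses
age_over_50_or_chennai = df.loc[(df['Age'] > 50) | (df['City'] == 'Chennai')]

# ~ negates a mask; isin() tests membership in a list of values
not_bangalore_or_chennai = df.loc[~df['City'].isin(['Bangalore', 'Chennai'])]

print(list(age_over_50_or_chennai.index))    # ['Chamu', 'Ram', 'Raj']
print(list(not_bangalore_or_chennai.index))  # ['Sailu', 'Joel']
```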

 

boolean_indexing.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Ram', "Raj"],
        'Age': [34, 35, 29, 35, 52, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Male'],
        'Rating': [81, 76, 67, 100, 87, 89]}

df = pd.DataFrame(data)
print('Original DataFrame')
print(df)

print('\nSet "Name" column as index column')
df.set_index('Name', inplace=True)
print(df)

# Access rows based on a condition
age_greater_34 = df.loc[df['Age'] > 34]
print('\nage_greater_34\n',age_greater_34)

# Access rows based on multiple conditions
age_greater_34_city_hyderabad = df.loc[(df['Age'] > 34) & (df['City'] == 'Hyderabad')]
print('\nage_greater_34_city_hyderabad:\n',age_greater_34_city_hyderabad)

Output

Original DataFrame
      Name  Age       City  Gender  Rating
0  Krishna   34  Bangalore    Male      81
1    Sailu   35  Hyderabad  Female      76
2     Joel   29  Hyderabad    Male      67
3    Chamu   35    Chennai  Female     100
4      Ram   52  Bangalore    Male      87
5      Raj   34    Chennai    Male      89

Set "Name" column as index column
         Age       City  Gender  Rating
Name                                   
Krishna   34  Bangalore    Male      81
Sailu     35  Hyderabad  Female      76
Joel      29  Hyderabad    Male      67
Chamu     35    Chennai  Female     100
Ram       52  Bangalore    Male      87
Raj       34    Chennai    Male      89

age_greater_34
        Age       City  Gender  Rating
Name                                 
Sailu   35  Hyderabad  Female      76
Chamu   35    Chennai  Female     100
Ram     52  Bangalore    Male      87

age_greater_34_city_hyderabad:
        Age       City  Gender  Rating
Name                                 
Sailu   35  Hyderabad  Female      76

Assigning values

Example 1: Assign a value to a specific cell

df.loc[row_label, column_label] = new_value
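If the row label does not exist yet, the same assignment syntax adds a new row instead of raising an error (pandas calls this "setting with enlargement"). A sketch on a two-row copy of the data; 'Priya' and 'Pune' are hypothetical values, not part of the original dataset:

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [34, 35], 'City': ['Bangalore', 'Hyderabad']},
    index=pd.Index(['Krishna', 'Sailu'], name='Name'),
)

# Existing row and column labels: the cell is overwritten
df.loc['Sailu', 'City'] = 'Delhi'

# A row label that does not exist yet: .loc appends a new row
# ('Priya' and 'Pune' are hypothetical values for illustration)
df.loc['Priya'] = [31, 'Pune']
print(df)
```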

 

Example 2: Assign a value to multiple cells based on a condition

df.loc[df['City'] == 'Hyderabad', 'City'] = 'Mumbai'
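Doing the selection and the assignment in a single `loc` call matters. Chained indexing such as `df[mask]['City'] = 'Mumbai'` assigns into a temporary copy and leaves `df` unchanged (pandas may also emit a warning, depending on the version). A sketch contrasting the two on a trimmed copy of the data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [34, 35, 29], 'City': ['Bangalore', 'Hyderabad', 'Hyderabad']},
    index=pd.Index(['Krishna', 'Sailu', 'Joel'], name='Name'),
)
mask = df['City'] == 'Hyderabad'

# Chained indexing: df[mask] builds a new DataFrame, so the assignment
# lands on that temporary copy and df is left unchanged
df[mask]['City'] = 'Mumbai'
print(df['City'].tolist())  # ['Bangalore', 'Hyderabad', 'Hyderabad']

# A single .loc call selects and assigns on df itself
df.loc[mask, 'City'] = 'Mumbai'
print(df['City'].tolist())  # ['Bangalore', 'Mumbai', 'Mumbai']
```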

 

assign_values.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Ram', "Raj"],
        'Age': [34, 35, 29, 35, 52, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Male'],
        'Rating': [81, 76, 67, 100, 87, 89]}

df = pd.DataFrame(data)
print('Original DataFrame')
print(df)

print('\nSet "Name" column as index column')
df.set_index('Name', inplace=True)
print(df)

row_label = 'Joel'
column_label = 'City'
new_value = 'Delhi'
# Assign a value to a specific cell
df.loc[row_label, column_label] = new_value
print("\nSetting Joel's City to Delhi.\n", df)

# Assign a value to multiple cells based on a condition
df.loc[df['City'] == 'Hyderabad', 'City'] = 'Mumbai'
print('\nSetting the city to Mumbai where the City is Hyderabad.\n',df)

Output

Original DataFrame
      Name  Age       City  Gender  Rating
0  Krishna   34  Bangalore    Male      81
1    Sailu   35  Hyderabad  Female      76
2     Joel   29  Hyderabad    Male      67
3    Chamu   35    Chennai  Female     100
4      Ram   52  Bangalore    Male      87
5      Raj   34    Chennai    Male      89

Set "Name" column as index column
         Age       City  Gender  Rating
Name                                   
Krishna   34  Bangalore    Male      81
Sailu     35  Hyderabad  Female      76
Joel      29  Hyderabad    Male      67
Chamu     35    Chennai  Female     100
Ram       52  Bangalore    Male      87
Raj       34    Chennai    Male      89

Setting Joel's City to Delhi.
          Age       City  Gender  Rating
Name                                   
Krishna   34  Bangalore    Male      81
Sailu     35  Hyderabad  Female      76
Joel      29      Delhi    Male      67
Chamu     35    Chennai  Female     100
Ram       52  Bangalore    Male      87
Raj       34    Chennai    Male      89

Setting the city to Mumbai where the City is Hyderabad.
          Age       City  Gender  Rating
Name                                   
Krishna   34  Bangalore    Male      81
Sailu     35     Mumbai  Female      76
Joel      29      Delhi    Male      67
Chamu     35    Chennai  Female     100
Ram       52  Bangalore    Male      87
Raj       34    Chennai    Male      89

 

 
