Monday 27 November 2023

Count the number of unique items in a DataFrame column

In this post, I am going to explain two approaches to count the number of unique items in a DataFrame column.

I will be using the data set below to demonstrate both approaches.

      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2     None   29  Hyderabad
3     None   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
6  Krishna   34  Hyderabad

 

Approach 1: Using the Series unique method.

unique_names = df['Name'].unique()

 

The above snippet returns the following NumPy array.

['Krishna' 'Sailu' None 'Joel']

As the above output shows, the unique method includes missing values when finding the unique items in the series. By applying the built-in len function to the 'unique_names' array, we can get the total number of unique names in the data set.

total_unique_names = len(unique_names)
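If the missing value should not be counted in Approach 1, one option is to drop it before calling unique. The snippet below is a minimal sketch of that idea, assuming the same 'Name' values as the sample data set; the variable names 'count_with_missing' and 'count_without_missing' are only illustrative.

```python
import pandas as pd

# Same 'Name' values as the sample data set used in this post
names = pd.Series(['Krishna', 'Sailu', None, None, 'Joel', 'Joel', 'Krishna'],
                  name='Name')

# unique() keeps the missing value as one of the distinct entries
count_with_missing = len(names.unique())

# Dropping missing values first leaves only the actual names
count_without_missing = len(names.dropna().unique())

print(count_with_missing)     # 4
print(count_without_missing)  # 3
```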

 

Approach 2: Using the nunique method

total_unique_names = df['Name'].nunique()

By default, the 'nunique' method does not count missing values. By setting the 'dropna' argument to False, we can include missing values in the count.

total_unique_names = df['Name'].nunique(dropna=False)
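The nunique method is also available on the DataFrame itself, where it returns one count per column. The snippet below is a small sketch using the same sample data; the variable names 'per_column' and 'per_column_with_missing' are only illustrative.

```python
import pandas as pd

# Same sample data as the rest of this post
df = pd.DataFrame({
    'Name': ['Krishna', 'Sailu', None, None, 'Joel', 'Joel', 'Krishna'],
    'Age': [34, 35, 29, 34, 29, 34, 34],
    'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore',
             'Hyderabad', 'Bangalore', 'Hyderabad'],
})

# One unique count per column, missing values excluded (the default)
per_column = df.nunique()

# Same counts, but a missing value is counted as one extra distinct item
per_column_with_missing = df.nunique(dropna=False)

print(per_column)
print(per_column_with_missing)
```

Only the 'Name' column differs between the two results (3 versus 4), because it is the only column that contains missing values.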

count_unique_items_in_a_column.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', None, None, 'Joel', 'Joel', 'Krishna'],
        'Age': [34, 35, 29, 34, 29, 34, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore', 'Hyderabad']}

df = pd.DataFrame(data)
print('Original DataFrame')
print(df)

unique_names = df['Name'].unique()
total_unique_names = len(unique_names)

print('\nUnique names : ')
print(unique_names)
print('\nTotal unique names including missing values : ', total_unique_names)

total_unique_names = df['Name'].nunique()
print('\nTotal unique names excluding missing values : ', total_unique_names)

total_unique_names = df['Name'].nunique(dropna=False)
print('\nTotal unique names including missing values : ', total_unique_names)

Output

Original DataFrame
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2     None   29  Hyderabad
3     None   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
6  Krishna   34  Hyderabad

Unique names : 
['Krishna' 'Sailu' None 'Joel']

Total unique names including missing values :  4

Total unique names excluding missing values :  3

Total unique names including missing values :  4






 

 

 

 
