In this post, I am going to explain two approaches to counting the number of unique items in a DataFrame column.
I will use the following data set to demonstrate both approaches.
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2     None   29  Hyderabad
3     None   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
6  Krishna   34  Hyderabad
Approach 1: Using the Series unique method
unique_names = df['Name'].unique()
The above snippet returns the following array of distinct values (an array, not a Series).
['Krishna' 'Sailu' None 'Joel']
As the output shows, the unique method includes the missing value (None) among the unique items. Applying len to the 'unique_names' array gives the total number of unique names in the data set, with the missing value counted as one item.
total_unique_names = len(unique_names)
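If you want Approach 1 to ignore missing values, one option is to drop them before calling unique. The sketch below rebuilds only the 'Name' column of the sample data set and compares the two counts; dropna is a standard pandas Series method, but this pattern is my own addition rather than part of the approaches above.

```python
import pandas as pd

# Only the 'Name' column of the sample data set; None marks a missing name
df = pd.DataFrame({'Name': ['Krishna', 'Sailu', None, None, 'Joel', 'Joel', 'Krishna']})

# unique() keeps the missing value as one of the distinct items
with_missing = len(df['Name'].unique())

# Dropping missing values first removes them from the count
without_missing = len(df['Name'].dropna().unique())

print(with_missing)     # 4
print(without_missing)  # 3
```

With missing values dropped first, this count matches what the nunique method returns by default.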
Approach 2: Using the nunique method
total_unique_names = df['Name'].nunique()
By default, the nunique method excludes missing values from the count. Passing dropna=False includes the missing value in the count.
total_unique_names = df['Name'].nunique(dropna=False)
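nunique is also available on the DataFrame itself, which can be handy when you want the unique count for every column at once. The sketch below assumes the same sample data set; DataFrame.nunique returns a Series with one count per column and accepts the same dropna argument.

```python
import pandas as pd

# Same sample data set as above
data = {'Name': ['Krishna', 'Sailu', None, None, 'Joel', 'Joel', 'Krishna'],
        'Age': [34, 35, 29, 34, 29, 34, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore',
                 'Hyderabad', 'Bangalore', 'Hyderabad']}
df = pd.DataFrame(data)

# One unique count per column, missing values excluded (the default)
per_column = df.nunique()
print(per_column)

# One unique count per column, missing values counted as an item
per_column_with_missing = df.nunique(dropna=False)
print(per_column_with_missing)
```

Only the 'Name' column contains missing values here, so it is the only column whose count changes (3 versus 4) when dropna is set to False.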
count_unique_items_in_a_column.py
import pandas as pd

# Create a sample DataFrame; None marks a missing name
data = {'Name': ['Krishna', 'Sailu', None, None, 'Joel', 'Joel', 'Krishna'],
        'Age': [34, 35, 29, 34, 29, 34, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Bangalore', 'Hyderabad', 'Bangalore', 'Hyderabad']}
df = pd.DataFrame(data)
print('Original DataFrame')
print(df)

# Approach 1: unique() returns the distinct values, including the missing value
unique_names = df['Name'].unique()
total_unique_names = len(unique_names)
print('\nUnique names : ')
print(unique_names)
print('\nTotal unique names including missing values : ', total_unique_names)

# Approach 2: nunique() excludes missing values by default
total_unique_names = df['Name'].nunique()
print('\nTotal unique names excluding missing values : ', total_unique_names)

# Passing dropna=False counts the missing value as a distinct item
total_unique_names = df['Name'].nunique(dropna=False)
print('\nTotal unique names including missing values : ', total_unique_names)
Output
Original DataFrame
      Name  Age       City
0  Krishna   34  Bangalore
1    Sailu   35  Hyderabad
2     None   29  Hyderabad
3     None   34  Bangalore
4     Joel   29  Hyderabad
5     Joel   34  Bangalore
6  Krishna   34  Hyderabad

Unique names : 
['Krishna' 'Sailu' None 'Joel']

Total unique names including missing values :  4

Total unique names excluding missing values :  3

Total unique names including missing values :  4