Sunday, 2 March 2025

Ordinal encoding: A simple way to convert categorical data for machine learning

Ordinal encoding converts categorical variables into numbers so that machine learning algorithms can work with them. In this method, you manually assign a numeric value to each category based on its order or meaning.

Ordinal encoding is particularly useful for categorical data that has an ordinal relationship or numerical significance between its categories.
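As a rough sketch of what assigning numbers by order means, the mapping can be written by hand as a plain dictionary. The T-shirt size data and the names sizes and size_to_code are made up for illustration; they do not come from any library.

```python
# Hypothetical example data: T-shirt sizes have a natural order.
sizes = ['small', 'large', 'medium', 'small']

# Hand-written mapping that follows the order small < medium < large.
size_to_code = {'small': 0, 'medium': 1, 'large': 2}

# Replace each category with its number.
encoded_sizes = [size_to_code[size] for size in sizes]

print(encoded_sizes)  # [0, 2, 1, 0]
```

The numbers themselves are a choice; what matters is that a larger number stands for a category that comes later in the order.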

 

For example, consider a variable that records user performance as 'poor', 'average', 'good', or 'excellent'. With ordinal encoding, we can choose the number assigned to each performance level.

 

Example

from sklearn.preprocessing import OrdinalEncoder

# Define the meaningful order of categories
meaningful_order = ['poor', 'average', 'good', 'excellent']

# Create an OrdinalEncoder that uses this order
ordinal_encoder = OrdinalEncoder(categories=[meaningful_order])

 

In the above snippet, I have given the order based on performance (poor < average < good < excellent).
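To see why the explicit order matters, here is a small sketch that compares it with the encoder's default. It assumes scikit-learn's default setting categories='auto', which infers the categories from the data and sorts them, so for strings the order becomes alphabetical. The variable names are only for illustration.

```python
from sklearn.preprocessing import OrdinalEncoder

# One column of ratings, one row per value.
ratings = [['poor'], ['average'], ['good'], ['excellent']]

# Default behaviour: categories are inferred from the data and sorted.
default_encoder = OrdinalEncoder()
default_codes = default_encoder.fit_transform(ratings)

# Explicit order: poor < average < good < excellent.
ordered_encoder = OrdinalEncoder(categories=[['poor', 'average', 'good', 'excellent']])
ordered_codes = ordered_encoder.fit_transform(ratings)

# Show both encodings side by side for each rating.
for (rating,), default_code, ordered_code in zip(ratings, default_codes[:, 0], ordered_codes[:, 0]):
    print(rating, 'default:', default_code, 'explicit:', ordered_code)
```

With the default, 'poor' ends up with the highest number and 'excellent' sits below 'good', so the numbers no longer reflect performance. Passing the order explicitly avoids this.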

 

Find the working application below.

 

ordinal_encoding.py

from sklearn.preprocessing import OrdinalEncoder
import pandas as pd

df = pd.DataFrame({'rating': ['poor', 'average', 'good', 'average', 'good', 'excellent']})

# Define the meaningful order of categories
meaningful_order = ['poor', 'average', 'good', 'excellent']

# Creating an instance of OrdinalEncoder
ordinal_encoder = OrdinalEncoder(categories=[meaningful_order])

# Fit the encoder on the categorical labels and transform them
# (fit_transform returns a NumPy array, not a DataFrame)
encoded_values = ordinal_encoder.fit_transform(df)

# Print the encoded array
# print(encoded_values)

# Print each rating next to its encoded value
for rating, code in zip(df['rating'], encoded_values[:, 0]):
    print(rating, ' -> ', code)

Output

poor  ->  0.0
average  ->  1.0
good  ->  2.0
average  ->  1.0
good  ->  2.0
excellent  ->  3.0
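If you later need the labels back, the fitted encoder can reverse the mapping. The sketch below assumes the same category order as above and uses the encoder's categories_ attribute and inverse_transform method; the variable names are only for illustration.

```python
from sklearn.preprocessing import OrdinalEncoder
import pandas as pd

df = pd.DataFrame({'rating': ['poor', 'average', 'good', 'excellent']})

encoder = OrdinalEncoder(categories=[['poor', 'average', 'good', 'excellent']])
codes = encoder.fit_transform(df)

# categories_ holds the order the encoder used, one array per column.
print(encoder.categories_[0])

# inverse_transform maps the numbers back to the original labels.
decoded = encoder.inverse_transform(codes)
print(decoded[:, 0])
```

This is handy when you want to show model inputs or results with the original category names instead of the numbers.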

 

 
