Ordinal encoding converts categorical variables into numbers for machine learning algorithms. In this method, you manually assign a numeric value to each category based on its order or meaning.
Ordinal encoding is particularly useful when dealing with categorical data that has an ordinal relationship between categories, so that the relative size of the assigned numbers is meaningful.
For example, consider a variable that records user performance as 'poor', 'average', 'good', or 'excellent'. Using ordinal encoding, we can choose the number assigned to each performance level.
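Before reaching for a library, the idea of assigning numbers by hand can be sketched with a plain dictionary. This is only a minimal illustration: the name rating_rank and the specific numbers 0 to 3 are assumptions chosen for this sketch, and any increasing sequence would preserve the same order.

```python
# Hypothetical mapping chosen for illustration: each category gets a rank
# that follows its meaning (poor < average < good < excellent).
rating_rank = {'poor': 0, 'average': 1, 'good': 2, 'excellent': 3}

ratings = ['poor', 'average', 'good', 'average', 'good', 'excellent']

# Replace every label with its rank
encoded_ratings = [rating_rank[rating] for rating in ratings]
print(encoded_ratings)  # [0, 1, 2, 1, 2, 3]
```

A dictionary like this works for a single column, but it has to be maintained by hand. An encoder object keeps the mapping together with the fitted model pipeline.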
Example
from sklearn.preprocessing import OrdinalEncoder

# Define the meaningful order of categories
meaningful_order = ['poor', 'average', 'good', 'excellent']

# Creating an instance of OrdinalEncoder
ordinal_encoder = OrdinalEncoder(categories=[meaningful_order])

In the above snippet, I have specified the order based on performance (poor < average < good < excellent).
Find the complete working application below.
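To see why the explicit order matters, the sketch below compares an encoder that is given no categories with one that is given the meaningful order. When no categories are passed, scikit-learn's OrdinalEncoder sorts the labels it finds, which for strings means alphabetical order. The variable names here are made up for the comparison.

```python
from sklearn.preprocessing import OrdinalEncoder

ratings = [['poor'], ['average'], ['good'], ['excellent']]

# No categories given: the encoder sorts the labels it sees,
# which is alphabetical order for strings.
default_encoder = OrdinalEncoder()
default_codes = default_encoder.fit_transform(ratings)
print(default_encoder.categories_[0])
print(default_codes.ravel())

# Explicit order: the codes follow poor < average < good < excellent.
ordered_encoder = OrdinalEncoder(categories=[['poor', 'average', 'good', 'excellent']])
ordered_codes = ordered_encoder.fit_transform(ratings)
print(ordered_codes.ravel())
```

With the default behaviour, 'average' is encoded as 0 and 'poor' as 3, so the numbers no longer reflect performance. Passing the order explicitly avoids this.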
ordinal_encoding.py
from sklearn.preprocessing import OrdinalEncoder
import pandas as pd

df = pd.DataFrame({'rating': ['poor', 'average', 'good', 'average', 'good', 'excellent']})

# Define the meaningful order of categories
meaningful_order = ['poor', 'average', 'good', 'excellent']

# Creating an instance of OrdinalEncoder
ordinal_encoder = OrdinalEncoder(categories=[meaningful_order])

# Fitting the encoder on the categorical labels and transforming them
# (fit_transform returns a NumPy array, not a DataFrame)
encoded_df = ordinal_encoder.fit_transform(df)

# Print each rating next to its encoded value
for rating, encoded_row in zip(df['rating'], encoded_df):
    print(rating, '->', encoded_row[0])
Output
poor -> 0.0
average -> 1.0
good -> 2.0
average -> 1.0
good -> 2.0
excellent -> 3.0
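The encoded numbers can also be turned back into labels, which helps when inspecting results. The following is a small sketch using the encoder's inverse_transform method. It assumes the same category order as above, and the input codes are made up for illustration.

```python
from sklearn.preprocessing import OrdinalEncoder

meaningful_order = ['poor', 'average', 'good', 'excellent']

encoder = OrdinalEncoder(categories=[meaningful_order])
encoder.fit([['poor'], ['average'], ['good'], ['excellent']])

# Map codes back to their labels (the codes below are illustrative inputs)
decoded = encoder.inverse_transform([[0.0], [3.0], [1.0]])
print(decoded.ravel().tolist())  # ['poor', 'excellent', 'average']
```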