Sunday, 2 March 2025

Label Encoding: A Simple Way to Convert Categorical Data for Machine Learning

Label encoding is a process that converts categorical data into numbers: each unique category is assigned a distinct integer. This is needed because many machine learning algorithms, such as linear regression, decision trees, and support vector machines, require numerical data to perform mathematical computations, so categorical features must be encoded before training.


Examples of Categorical data

  1. favorite_color: red, green, yellow etc.,
  2. gender : male, female
  3. Marital status: single, married, divorced
  4. vehicle_type: car, bike, bus
  5. rating: poor, average, good, excellent
  6. occupation: doctor, teacher, software engineer
  7. Religion: Christian, Muslim, Hindu, Buddhist
  8. education: high school, degree, masters, PhD

 

Let’s write an example using the scikit-learn library.

 

label_encoding.py 

from sklearn.preprocessing import LabelEncoder

# Sample categorical labels
categorical_labels = ['poor', 'average', 'good', 'average', 'good', 'excellent']

# Creating an instance of LabelEncoder
label_encoder = LabelEncoder()

# Fitting the encoder on the categorical labels and transforming them
encoded_labels = label_encoder.fit_transform(categorical_labels)

print(encoded_labels)

 

Output

[3 0 2 0 2 1]

 

In this example, LabelEncoder sorts the unique labels alphabetically and maps each one to its position in that sorted order:

  1. 'average' is mapped to 0,
  2. 'excellent' is mapped to 1,
  3. 'good' is mapped to 2, and
  4. 'poor' is mapped to 3
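You can confirm this alphabetical ordering yourself: after fitting, the encoder's classes_ attribute holds the unique labels in sorted order, and each label's index in that array is its encoded integer. A small sketch:

```python
from sklearn.preprocessing import LabelEncoder

labels = ['poor', 'average', 'good', 'average', 'good', 'excellent']

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the unique labels in sorted (alphabetical) order;
# a label's index in this array is its encoded value.
print(list(encoder.classes_))  # ['average', 'excellent', 'good', 'poor']
print(list(encoded))           # [3, 0, 2, 0, 2, 1]
```

This is why 'average' becomes 0 and 'poor' becomes 3, regardless of the order in which the labels appear in the input.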

 

Advantages of Label encoding

  1. Easy to implement
  2. It is a lossless transformation: the original labels can be recovered from the encoded values using the encoder's inverse_transform method.
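The lossless property can be demonstrated with a short round trip through LabelEncoder's inverse_transform method:

```python
from sklearn.preprocessing import LabelEncoder

labels = ['poor', 'average', 'good', 'average', 'good', 'excellent']

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# inverse_transform maps the integers back to the original labels
decoded = encoder.inverse_transform(encoded)
print(list(decoded))  # ['poor', 'average', 'good', 'average', 'good', 'excellent']
```

The decoded list matches the input exactly, so no information is lost in the encoding.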

 

Limitations of Label encoding

Label encoding assigns arbitrary integers to categories, which implies an ordering and a distance between them that usually does not exist. Many machine learning algorithms misinterpret this artificial order, and the problem gets worse when a feature has many unique categories. In such cases, use other encoding techniques like one-hot encoding.

 

 
