Sunday, 16 March 2025

How does StandardScaler work on a Pandas DataFrame?

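Standardization rescales each numeric column so that it has a mean of 0 and a standard deviation of 1. This is achieved by subtracting the column's mean from every value and then dividing by the column's standard deviation. StandardScaler uses the population standard deviation (dividing by n, not n - 1).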

X_normalized = (X - mean(X)) / std(X)
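As a quick check of the formula, the sketch below applies it by hand to the values 1, 2, 3 using only Python's standard library. It assumes the population standard deviation, and the function name 'standardize' is purely illustrative.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return (x - mean) / std for each x, using the population std."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (divides by n)
    return [(x - mean) / std for x in values]


result = standardize([1, 2, 3])
print([round(v, 8) for v in result])  # [-1.22474487, 0.0, 1.22474487]
```

These are the same values that appear in each column of the StandardScaler output further down.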

How to standardize a Pandas DataFrame?

Step 1: Create a StandardScaler instance

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

 

Step 2: Use the 'fit_transform' method of the StandardScaler object to transform the data.

# Select numeric columns
numeric_columns = df.select_dtypes(include=['float64', 'int64']).columns

# Transform the selected columns
standardized_values = scaler.fit_transform(df[numeric_columns])

The 'fit_transform' method returns a NumPy ndarray, not a DataFrame.
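Because the result is a plain ndarray, the column names and row index are lost. One way to get them back, sketched here with an illustrative variable name 'standardized_df', is to wrap the array in a new DataFrame that reuses the original labels.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'D': [True, False, False]})

numeric_columns = df.select_dtypes(include=['float64', 'int64']).columns
scaler = StandardScaler()
standardized_values = scaler.fit_transform(df[numeric_columns])

# Rebuild a DataFrame with the original column names and row index
standardized_df = pd.DataFrame(
    standardized_values, columns=numeric_columns, index=df.index
)
print(standardized_df)
```

Building a new DataFrame, rather than assigning back into 'df', leaves the original data untouched.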

 

The complete working example is below.

 

standardization.py

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Create a sample DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [True, False, False]  # boolean column; not selected as numeric below
}

df = pd.DataFrame(data)

# Create a StandardScaler instance
scaler = StandardScaler()

# Select numeric columns
numeric_columns = df.select_dtypes(include=['float64', 'int64']).columns

# Transform the selected columns
standardized_values = scaler.fit_transform(df[numeric_columns])

print(f'type of standardized_values : {type(standardized_values)}')
print(f'standardized_values : \n{standardized_values}')

 

Output

type of standardized_values : <class 'numpy.ndarray'>
standardized_values : 
[[-1.22474487 -1.22474487 -1.22474487]
 [ 0.          0.          0.        ]
 [ 1.22474487  1.22474487  1.22474487]]
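To confirm that the output really has a mean of 0 and a standard deviation of 1 per column, a check along the following lines may help. It is a minimal sketch that feeds the same numbers in as a NumPy array (an assumption made to keep the example short) and also prints the statistics the scaler learned during fitting.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same numeric values as columns A, B and C in the sample DataFrame
values = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]], dtype=float)

scaler = StandardScaler()
standardized_values = scaler.fit_transform(values)

# Statistics learned from the input during fitting
print(scaler.mean_)   # per-column means
print(scaler.scale_)  # per-column population standard deviations

# Per-column statistics of the standardized output
column_means = standardized_values.mean(axis=0)
column_stds = standardized_values.std(axis=0)  # NumPy defaults to population std
print(np.allclose(column_means, 0.0), np.allclose(column_stds, 1.0))
```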

 
