Programming for beginners: Adjusted R-squared in Machine Learning

Adjusted R-squared is a modified version of the standard R-squared. It is used to evaluate how well a regression model fits the data.

The main difference from R-squared is that adjusted R-squared takes the number of independent variables (predictors) in the model into account: it penalizes a model for predictors that do not improve the fit enough.
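A minimal sketch of why the number of predictors matters, assuming ordinary least squares with an intercept and synthetic data: plain R-squared cannot go down when a predictor is added, even if that predictor is pure noise. The helper r2_of_least_squares, the seed, and the data are illustrative assumptions, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable
n = 30
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)    # y depends on x only
noise = rng.normal(size=n)          # predictor unrelated to y


def r2_of_least_squares(features, target):
    """Fit ordinary least squares with an intercept and return R-squared."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    ssr = np.sum(residuals ** 2)                  # sum of squared residuals
    tss = np.sum((target - target.mean()) ** 2)   # total sum of squares
    return 1 - ssr / tss


r2_one = r2_of_least_squares(x.reshape(-1, 1), y)
r2_two = r2_of_least_squares(np.column_stack([x, noise]), y)

print("R2 with x only:     ", r2_one)
print("R2 with x and noise:", r2_two)
```

The second value is never smaller than the first, so plain R-squared alone cannot tell a useful predictor from a useless one.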

Formula

Adjusted R² = 1 - [(1 - R²) * (n - 1) / (n - k - 1)]

where:

n is the number of observations (sample size)
k is the number of predictors
R² is the ordinary R-squared

A higher adjusted R² value is generally considered better.
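The formula can be sketched as a small function. The name adjusted_r2 and the sample numbers are illustrative assumptions. The example keeps R² and n fixed and changes only k to show how the penalty grows with the number of predictors.

```python
def adjusted_r2(r2, n, k):
    """Return adjusted R-squared for n observations and k predictors."""
    if n - k - 1 <= 0:
        # The formula divides by (n - k - 1), so it needs n > k + 1.
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same R-squared and sample size, different number of predictors.
few_predictors = adjusted_r2(0.90, n=20, k=2)    # about 0.888
many_predictors = adjusted_r2(0.90, n=20, k=10)  # about 0.789

print(few_predictors)
print(many_predictors)
```

With the same R² of 0.90, the model with 10 predictors gets a noticeably lower adjusted value than the model with 2 predictors.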

How to calculate R²?

Formula

R² = 1 - (SSR / TSS)

where:

SSR stands for ‘sum of squares of residuals’ or ‘sum of squared errors’. It is the sum of the squared distances between the predicted values and the actual values.

SSR = Σ(yᵢ - ŷᵢ)², where ŷᵢ is the value predicted by the regression model and yᵢ is the actual value.

TSS is the total sum of squares. It is the sum of the squared distances between the actual values and their mean.

TSS = Σ(yᵢ - ȳ)², where yᵢ is the actual value and ȳ is the mean of the actual values.

R² measures how well the regression model explains the variance in the data. A higher R² indicates that the model is a better fit for the data. A perfect model would have an R² of 1.
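As a small worked example, R² can be computed by hand from SSR and TSS with NumPy. The four actual and predicted values below are made up for illustration.

```python
import numpy as np

actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 7.0, 9.0])

# SSR: squared distances between actual and predicted values
ssr = np.sum((actual - predicted) ** 2)       # 0.25 + 0.25 + 0 + 0 = 0.5

# TSS: squared distances between actual values and their mean (6.0)
tss = np.sum((actual - actual.mean()) ** 2)   # 9 + 1 + 1 + 9 = 20

r2 = 1 - ssr / tss                            # 1 - 0.5 / 20 = 0.975
print(r2)
```

For the same inputs, sklearn.metrics.r2_score(actual, predicted) should return the same value, since it is based on the same ratio.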

adjusted_r2.py

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

# Sample data: population (in millions) by year
year = np.array([2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015])
population_in_millions = np.array([1425.776, 1417.173, 1407.564, 1396.387, 1384.332, 1371.818, 1359.003, 1346.021, 1332.993])

# Fit two different regression models to the data
model1 = np.polyfit(year, population_in_millions, 1)
model2 = np.polyfit(year, population_in_millions, 2)

# Calculate predicted values for both models
predicted1 = np.polyval(model1, year)
predicted2 = np.polyval(model2, year)

n = len(year)

# Calculate R-squared and adjusted R-squared for both models
r2_model1 = r2_score(population_in_millions, predicted1)
k = 1  # Number of predictors in the degree-1 fit (year)
adjusted_r_squared1 = 1 - (1 - r2_model1) * (n - 1) / (n - k - 1)

r2_model2 = r2_score(population_in_millions, predicted2)
k = 2  # Number of predictors in the degree-2 fit (year and year squared)
adjusted_r_squared2 = 1 - (1 - r2_model2) * (n - 1) / (n - k - 1)

print("R2 for model1:", r2_model1)
print("Adjusted R2 for model1:", adjusted_r_squared1)
print("R2 for model2:", r2_model2)
print("Adjusted R2 for model2:", adjusted_r_squared2)

if adjusted_r_squared1 > adjusted_r_squared2:
    print('prediction1 is more accurate')
else:
    print('prediction2 is more accurate')

# Draw the plot: actual data and the two fitted curves
plt.plot(year, population_in_millions, color='red', label='actual')
plt.plot(year, predicted1, color='blue', label='predicted1')
plt.plot(year, predicted2, color='green', label='predicted2')

plt.legend()

plt.show()

Output

R2 for model1: 0.9960164669550016
Adjusted R2 for model1: 0.9954473908057161
R2 for model2: 0.9996705140149567
Adjusted R2 for model2: 0.9995606853532756
prediction2 is more accurate


Saturday, 22 March 2025
