Adjusted R-squared is a modified version of the standard R-squared, used to evaluate how well a regression model fits the data.
The main difference from R-squared is that adjusted R-squared accounts for the number of independent variables in the model: it penalizes predictors that do not improve the fit enough to justify their inclusion.
Formula
Adjusted R² = 1 - [(1 - R²) * (n - 1) / (n - k - 1)]
where:
- n is the number of observations (sample size)
- k is the number of predictors
- R² is the ordinary R-squared
A higher adjusted R² value is generally considered better when comparing models fitted to the same data.
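The formula above can be sketched as a small helper. This is only an illustration: the function name adjusted_r2 and the sample values (R² = 0.90, n = 20) are assumptions chosen for the example, not something defined elsewhere in this post.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    # The formula is undefined unless there are more observations
    # than predictors plus one.
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same R² (illustrative value) and sample size, different predictor counts
few = adjusted_r2(0.90, n=20, k=1)
many = adjusted_r2(0.90, n=20, k=5)
print(few, many)  # the value for k=5 is lower than the value for k=1
```

With R² held fixed, adding predictors lowers the adjusted value, which is the penalty the formula is designed to apply.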
How to calculate R²?
Formula
R² = 1 - (SSR / TSS)
where:
- SSR stands for ‘sum of squares of residuals’ or ‘sum of squared errors’. It is the sum of the squared distances between the actual values and the predicted values.
SSR = Σ(yᵢ - ŷᵢ)², where yᵢ is the actual value and ŷᵢ is the value predicted by the regression model.
- TSS is the total sum of squares: the sum of the squared distances between the actual values and their mean.
TSS = Σ(yᵢ - ȳ)², where yᵢ is the actual value and ȳ is the mean of the actual values.
R² measures how well the regression model explains the variance in the data. A higher R² indicates that the model is a better fit for the data, and a perfect model would have an R² of 1.
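To make the SSR and TSS terms concrete, here is a minimal pure-Python sketch of the calculation. The function name r_squared and the four data points are made up for illustration only.

```python
def r_squared(actual, predicted):
    """R² = 1 - SSR / TSS for two equal-length sequences."""
    mean_actual = sum(actual) / len(actual)
    # SSR: squared distances between actual and predicted values
    ssr = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # TSS: squared distances between actual values and their mean
    tss = sum((y - mean_actual) ** 2 for y in actual)
    if tss == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    return 1 - ssr / tss


# Illustrative data, not taken from the population example
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]
r2 = r_squared(actual, predicted)
print(r2)  # SSR = 0.5 and TSS = 20, so R² is about 0.975
```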
adjusted_r2.py
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

# Generate some data
year = np.array([2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015])
population_in_millions = np.array([1425.776, 1417.173, 1407.564, 1396.387, 1384.332,
                                   1371.818, 1359.003, 1346.021, 1332.993])

# Fit two different regression models to the data
model1 = np.polyfit(year, population_in_millions, 1)  # linear
model2 = np.polyfit(year, population_in_millions, 2)  # quadratic

# Calculate predicted values for both models
predicted1 = np.polyval(model1, year)
predicted2 = np.polyval(model2, year)

n = len(year)  # Number of observations

# Calculate R-squared and adjusted R-squared for both models
r2_model1 = r2_score(population_in_millions, predicted1)
k = 1  # Number of predictor terms (year)
adjusted_r_squared1 = 1 - (1 - r2_model1) * (n - 1) / (n - k - 1)

r2_model2 = r2_score(population_in_millions, predicted2)
k = 2  # Number of predictor terms (year and year squared)
adjusted_r_squared2 = 1 - (1 - r2_model2) * (n - 1) / (n - k - 1)

print("R2 for model1:", r2_model1)
print("Adjusted R2 for model1:", adjusted_r_squared1)
print("R2 for model2:", r2_model2)
print("Adjusted R2 for model2:", adjusted_r_squared2)

# The model with the higher adjusted R-squared is preferred
if adjusted_r_squared1 > adjusted_r_squared2:
    print('prediction1 is more accurate')
else:
    print('prediction2 is more accurate')

# Draw the plot
plt.plot(year, population_in_millions, color='red')
plt.plot(year, predicted1, color='blue', label='predicted1')
plt.plot(year, predicted2, color='green', label='predicted2')
plt.legend()
plt.show()
Output
R2 for model1: 0.9960164669550016
Adjusted R2 for model1: 0.9954473908057161
R2 for model2: 0.9996705140149567
Adjusted R2 for model2: 0.9995606853532756
prediction2 is more accurate