Root Mean Squared Error (RMSE) is a common metric for measuring the accuracy of a predictive model in machine learning and statistics. A lower RMSE indicates that the model's predictions are closer to the actual values, and a higher RMSE indicates that they are further off. RMSE is expressed in the same units as the quantity being predicted.
Formula
RMSE = sqrt(mean(residuals^2))
where:
- residuals are the differences between the predicted values and the actual values
- mean() is the average of the squared residuals
- sqrt() is the square root function
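The same formula can be written in summation notation. The symbols here are assumptions introduced for illustration and do not appear in the original: n is the number of observations, y_i is the i-th actual value, and \hat{y}_i is the i-th predicted value.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```

Because each residual is squared, the sign of the difference does not matter, and large errors contribute more than small ones.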
In simple terms, RMSE is the square root of MSE.
RMSE = √(MSE)
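To make the relationship concrete, here is a small worked example in plain Python. The four actual and predicted values are made up for illustration and are not taken from any dataset in this post.

```python
import math

# Illustrative values only; not from the population data used later.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 9.0, 9.0]

# Residuals: actual minus predicted, one per observation
residuals = [a - p for a, p in zip(actual, predicted)]  # [1.0, 0.0, -2.0, 0.0]

# MSE: mean of the squared residuals, (1 + 0 + 4 + 0) / 4 = 1.25
mse = sum(r ** 2 for r in residuals) / len(residuals)

# RMSE: square root of MSE
rmse_value = math.sqrt(mse)

print("MSE:", mse)
print("RMSE:", rmse_value)
```

The MSE here is 1.25 and the RMSE is its square root, roughly 1.118.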
The following function calculates the RMSE between predicted and actual values.
import numpy as np

def rmse(actual, predicted):
    # Calculate residuals
    residuals = actual - predicted
    # Calculate mean squared error (MSE)
    mse = np.mean(residuals ** 2)
    # Calculate RMSE by taking the square root of MSE
    return np.sqrt(mse)
In the above example, actual and predicted are two NumPy arrays of the same length.
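Plain Python lists do not support element-wise subtraction, so passing lists to a function like the one above would raise an error. One possible way around this is sketched below. The name rmse_from_sequences is hypothetical, and the shape check is an added assumption rather than part of the original function.

```python
import numpy as np

def rmse_from_sequences(actual, predicted):
    """Hypothetical helper: RMSE for lists, tuples, or NumPy arrays."""
    # Convert inputs to float arrays so that subtraction is element-wise
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Guard against silently broadcasting mismatched inputs
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

perfect = rmse_from_sequences([1, 2, 3], [1, 2, 3])  # identical inputs
off = rmse_from_sequences([0, 0], [3, 4])            # sqrt((9 + 16) / 2)
print(perfect)
print(off)
```

Identical inputs give an RMSE of 0, which is the best possible value. The second call gives the square root of 12.5, about 3.54.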
The following application uses the RMSE of two regression models to compare their performance.
rmse.py
import numpy as np
import matplotlib.pyplot as plt

def rmse(actual, predicted):
    # Calculate residuals
    residuals = actual - predicted
    # Calculate mean squared error (MSE)
    mse = np.mean(residuals ** 2)
    # Calculate RMSE by taking the square root of MSE
    return np.sqrt(mse)

# Generate some data
year = np.array([2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015])
population_in_millions = np.array([1425.776, 1417.173, 1407.564, 1396.387, 1384.332, 1371.818, 1359.003, 1346.021, 1332.993])

# Fit two different regression models to the data:
# model1 is a straight line (degree 1), model2 is a quadratic (degree 2)
model1 = np.polyfit(year, population_in_millions, 1)
model2 = np.polyfit(year, population_in_millions, 2)

# Predict the values for both models
m1 = model1[0]
c1 = model1[1]
population_pred1 = m1 * year + c1

a1 = model2[0]
a2 = model2[1]
a3 = model2[2]
population_pred2 = a1 * year**2 + a2 * year + a3

# Calculate the RMSE for both models
rmse1 = rmse(population_in_millions, population_pred1)
rmse2 = rmse(population_in_millions, population_pred2)

print("RMSE for model1:", rmse1)
print("RMSE for model2:", rmse2)

# The model with the lower RMSE fits the data more closely
if rmse1 > rmse2:
    print('prediction2 is more accurate')
else:
    print('prediction1 is more accurate')

# Draw the plot: actual data in red, model predictions in blue and green
plt.plot(year, population_in_millions, color='red')
plt.plot(year, population_pred1, color='blue', label=f'pred1 : {m1}*x+{c1}')
plt.plot(year, population_pred2, color='green', label=f'pred2 : {a1}*x*x+{a2}*x+{a3}')
plt.legend()
plt.show()
Output
RMSE for model1: 1.922119418734067
RMSE for model2: 0.552795663385624
prediction2 is more accurate
The plot below shows the same result.