Programming for beginners: Root Mean Squared Error (RMSE)

Root Mean Squared Error (RMSE) is a common metric for measuring the accuracy of a predictive model in machine learning and statistics. A lower RMSE indicates that the model's predictions are closer to the actual values, and a higher RMSE indicates larger prediction errors. RMSE is expressed in the same units as the value being predicted.

Formula

RMSE = sqrt(mean(residuals^2))

where:

residuals are the differences between the actual values and the predicted values
mean() is the average of the squared residuals
sqrt() is the square root function

In simple terms, RMSE is the square root of the Mean Squared Error (MSE).

RMSE = √(MSE)
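To make the formula concrete, here is a minimal sketch that works through each step by hand using only the Python standard library. The three actual and predicted values are hypothetical numbers chosen for illustration, and the name rmse_value is just a label used in this sketch.

```python
import math

# Hypothetical values, chosen only to keep the arithmetic easy to follow
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

# Step 1: residuals (actual minus predicted) -> [1.0, 0.0, -2.0]
residuals = [a - p for a, p in zip(actual, predicted)]

# Step 2: square each residual -> [1.0, 0.0, 4.0]
squared = [r ** 2 for r in residuals]

# Step 3: mean of the squared residuals (MSE) -> 5 / 3
mse = sum(squared) / len(squared)

# Step 4: square root of the MSE (RMSE), roughly 1.29
rmse_value = math.sqrt(mse)

print("MSE :", mse)
print("RMSE:", rmse_value)
```

Squaring makes every residual non-negative, so positive and negative errors cannot cancel each other out before the average is taken.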

The following function calculates the RMSE between the actual and predicted values.

import numpy as np

def rmse(actual, predicted):
    # Calculate residuals (actual minus predicted)
    residuals = actual - predicted

    # Calculate mean squared error (MSE)
    mse = np.mean(residuals ** 2)

    # Calculate RMSE by taking the square root of MSE
    return np.sqrt(mse)

In the above example, actual and predicted are two NumPy arrays of the same shape.
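As a quick sanity check, the sketch below calls an rmse function on a few small inputs. It assumes a slightly more defensive variant that converts its arguments with np.asarray, so plain Python lists work too; that conversion is an addition for this illustration and not part of the function above. The input values are hypothetical.

```python
import numpy as np

def rmse(actual, predicted):
    # np.asarray is an assumption added here so that plain lists are accepted
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Hypothetical data for illustration
actual = np.array([10.0, 20.0, 30.0, 40.0])
perfect = actual.copy()        # predictions identical to the actual values
off_by_two = actual + 2.0      # every prediction is too high by 2

print(rmse(actual, perfect))       # 0.0, no error at all
print(rmse(actual, off_by_two))    # 2.0, every residual has magnitude 2
print(rmse([1, 2, 3], [1, 2, 5]))  # plain lists are converted first
```

When every prediction misses by the same amount, the RMSE equals that amount, which is one way to see that it is measured in the same units as the data.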

The following application uses the RMSE of two regression models to compare their performance.

rmse.py

import numpy as np
import matplotlib.pyplot as plt

def rmse(actual, predicted):
    # Calculate residuals (actual minus predicted)
    residuals = actual - predicted

    # Calculate mean squared error (MSE)
    mse = np.mean(residuals ** 2)

    # Calculate RMSE by taking the square root of MSE
    return np.sqrt(mse)

# Generate some data
year = np.array([2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015])
population_in_millions = np.array([1425.776, 1417.173, 1407.564, 1396.387, 1384.332, 1371.818, 1359.003, 1346.021, 1332.993])

# Fit two different regression models to the data
model1 = np.polyfit(year, population_in_millions, 1)
model2 = np.polyfit(year, population_in_millions, 2)

# Predict the values for both models
m1 = model1[0]
c1 = model1[1]
population_pred1 =  m1 * year + c1

a1 = model2[0]
a2 = model2[1]
a3 = model2[2]
population_pred2 = a1 * year**2 + a2 * year + a3

# Calculate the RMSE for both models
rmse1 = rmse(population_in_millions, population_pred1)
rmse2 = rmse(population_in_millions, population_pred2)

print("RMSE for model1:", rmse1)
print("RMSE for model2:", rmse2)

if rmse1 > rmse2:
    print('prediction2 is more accurate')
else:
    print('prediction1 is more accurate')

# Draw the plot
plt.plot(year, population_in_millions, color='red', label='actual')
plt.plot(year, population_pred1, color='blue', label=f'pred1 : {m1}*x+{c1}')
plt.plot(year, population_pred2, color='green', label=f'pred2 : {a1}*x*x+{a2}*x+{a3}')

plt.legend()

plt.show()

Output

RMSE for model1: 1.922119418734067
RMSE for model2: 0.552795663385624
prediction2 is more accurate

You can see the same result in the plot that the script draws.
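The script evaluates each fitted polynomial by writing out its terms by hand. As an alternative sketch, NumPy's np.polyval can evaluate the coefficient array returned by np.polyfit directly, since both use the same highest-degree-first ordering. The helper name rmse_of_fit is hypothetical and introduced only for this illustration; the data is the same as in rmse.py.

```python
import numpy as np

year = np.array([2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015])
population_in_millions = np.array([1425.776, 1417.173, 1407.564, 1396.387,
                                   1384.332, 1371.818, 1359.003, 1346.021,
                                   1332.993])

def rmse_of_fit(x, y, degree):
    # Hypothetical helper: fit a polynomial of the given degree,
    # evaluate it with np.polyval, and return the RMSE of the fit
    coeffs = np.polyfit(x, y, degree)
    predicted = np.polyval(coeffs, x)
    return np.sqrt(np.mean((y - predicted) ** 2))

# Check that np.polyval matches the hand-written linear expression
model1 = np.polyfit(year, population_in_millions, 1)
manual_pred1 = model1[0] * year + model1[1]
polyval_pred1 = np.polyval(model1, year)
print(np.allclose(manual_pred1, polyval_pred1))

for degree in (1, 2):
    print(f"RMSE for degree {degree}:",
          rmse_of_fit(year, population_in_millions, degree))
```

This keeps the evaluation code the same for any degree, so trying a third model only means changing the degree argument.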


Tuesday, 18 March 2025
