Programming for beginners: Mean squared error (MSE)

MSE is one of the commonly used metrics in regression analysis for evaluating the performance of a predictive model, such as linear regression.

Formula

MSE = (1/n) * Σ(yi - ŷi)^2

Where:

MSE: Mean Squared Error
n: The number of data points (observations)
yi: The actual target value for the i-th data point
ŷi: The predicted target value for the i-th data point
Σ: Sum up the squared differences for all data points
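
To make the formula concrete, here is a minimal sketch in plain Python that follows it term by term. The function name mean_squared_error and the four sample values are made up for illustration; they are not part of the population example used later in this post.

```python
def mean_squared_error(actual, predicted):
    """Return (1/n) * sum((yi - y_hat_i)^2) for two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one data point is required")

    n = len(actual)
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sum(squared_errors) / n


# Illustrative values only
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

# Errors: 0.5, 0.0, -2.0, -1.0 -> squares: 0.25, 0.0, 4.0, 1.0 -> sum 5.25
print(mean_squared_error(actual, predicted))  # prints 1.3125
```

Note that the single error of -2.0 contributes 4.0 of the 5.25 total, which is why large errors dominate the result.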

Points to remember

MSE is easy to understand and implement.
MSE is always non-negative.
Lower MSE values indicate that the model's predictions are closer to the actual data points, which means the model is predicting better. Higher MSE values indicate that the predictions are farther from the actual values.
MSE is not always the appropriate metric; that depends on the specific problem you're trying to solve.
MSE is sensitive to outliers because every error is squared, so it may not be the best choice when your dataset contains outliers.
MSE can be used to compare the accuracy of different regression models.
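
The sensitivity to outliers can be seen in a small sketch. The numbers below are invented for illustration, and the mean absolute error (MAE) is included only as a point of comparison; it is not used elsewhere in this post.

```python
import numpy as np


def mse(actual, predicted):
    """Mean of the squared errors."""
    return float(np.mean((actual - predicted) ** 2))


def mae(actual, predicted):
    """Mean of the absolute errors (shown only for contrast with MSE)."""
    return float(np.mean(np.abs(actual - predicted)))


# Illustrative values: every prediction is off by exactly 1
predicted = np.array([11.0, 13.0, 12.0, 14.0, 13.0])
actual_clean = np.array([10.0, 12.0, 11.0, 13.0, 12.0])

# Same data, but the last observation is an outlier (error of 20)
actual_outlier = actual_clean.copy()
actual_outlier[-1] = 33.0

print(f"clean data:   MSE = {mse(actual_clean, predicted):.1f}, MAE = {mae(actual_clean, predicted):.1f}")
print(f"with outlier: MSE = {mse(actual_outlier, predicted):.1f}, MAE = {mae(actual_outlier, predicted):.1f}")
# clean data:   MSE = 1.0, MAE = 1.0
# with outlier: MSE = 80.8, MAE = 4.8
```

A single outlier raises the MSE from 1.0 to 80.8, while the MAE only rises to 4.8, because squaring turns an error of 20 into a contribution of 400.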

For example, the following application uses MSE to compare the accuracy of two regression models.

mse.py

import numpy as np
import matplotlib.pyplot as plt

# Sample data: population (in millions) for each year
year = np.array([2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015])
population_in_millions = np.array([1425.776, 1417.173, 1407.564, 1396.387, 1384.332, 1371.818, 1359.003, 1346.021, 1332.993])

# Fit two different regression models to the data
model1 = np.polyfit(year, population_in_millions, 1)
model2 = np.polyfit(year, population_in_millions, 2)

# Predict the values for both models
m1 = model1[0]
c1 = model1[1]
population_pred1 =  m1 * year + c1

a1 = model2[0]
a2 = model2[1]
a3 = model2[2]
population_pred2 = a1 * year**2 + a2 * year + a3

# Calculate the MSE for both models
mse1 = np.mean((population_in_millions - population_pred1)**2)
mse2 = np.mean((population_in_millions - population_pred2)**2)

print("MSE for model1:", mse1)
print("MSE for model2:", mse2)

if mse1 > mse2:
    print('prediction2 is more accurate')
else:
    print('prediction1 is more accurate')

# Draw the plot
plt.plot(year, population_in_millions, color='red', label='actual')
plt.plot(year, population_pred1, color='blue', label=f'pred1 : {m1}*x+{c1}')
plt.plot(year, population_pred2, color='green', label=f'pred2 : {a1}*x*x+{a2}*x+{a3}')

plt.legend()

plt.show()

Output

MSE for model1: 3.694543059874588
MSE for model2: 0.3055830454579521
prediction2 is more accurate

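The plot drawn by mse.py confirms this: the green quadratic curve (pred2) tracks the red line of actual values more closely than the blue straight line (pred1).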
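
If you want to compare more than two models, the comparison can be wrapped in a small helper. The following is a sketch under a few assumptions: the helper name polynomial_fit_mse is hypothetical, np.polyval replaces the hand-written prediction formulas, and the years are centered around their mean before fitting (a choice made here for numerical stability; it does not change the fitted curve).

```python
import numpy as np

year = np.array([2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015])
population_in_millions = np.array([1425.776, 1417.173, 1407.564, 1396.387, 1384.332,
                                   1371.818, 1359.003, 1346.021, 1332.993])


def polynomial_fit_mse(x, y, degree):
    """Fit a polynomial of the given degree and return its MSE on the same data."""
    # Centering x is an assumption of this sketch, not part of mse.py
    x_centered = x - x.mean()
    coefficients = np.polyfit(x_centered, y, degree)
    predictions = np.polyval(coefficients, x_centered)
    return float(np.mean((y - predictions) ** 2))


# Compare several polynomial degrees on the same data
mse_by_degree = {
    degree: polynomial_fit_mse(year, population_in_millions, degree)
    for degree in (1, 2, 3)
}

for degree, mse in mse_by_degree.items():
    print(f"degree {degree}: MSE = {mse:.4f}")

best_degree = min(mse_by_degree, key=mse_by_degree.get)
print("lowest MSE: degree", best_degree)
```

Keep in mind that this MSE is measured on the same points the model was fitted to. A higher-degree least-squares polynomial can never fit those points worse than a lower-degree one, so a lower MSE here does not by itself show that the model will predict unseen years better.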

Tuesday, 18 March 2025
