MSE (Mean Squared Error) is one of the commonly used metrics in regression analysis for evaluating the performance of a predictive model, such as linear regression.
Formula
MSE = (1/n) * Σ(yi - ŷi)^2
Where:
- MSE: Mean Squared Error
- n: The number of data points (observations)
- yi: The actual target value for the i-th data point
- ŷi: The predicted target value for the i-th data point
- Σ: Sum up the squared differences for all data points
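To make the formula concrete, here is a minimal sketch that applies it to four made-up data points. The function name mean_squared_error and the sample values are assumptions chosen for illustration; they are not part of the population example later in this post.

```python
def mean_squared_error(actual, predicted):
    """Return (1/n) * sum((yi - y_hat_i)^2) for two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one data point is required")
    n = len(actual)
    # Square each difference, add them up, then divide by the number of points
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n


# Hypothetical values, for illustration only
actual = [3, 5, 7, 9]
predicted = [2, 5, 9, 8]

# Differences: 1, 0, -2, 1 -> squares: 1, 0, 4, 1 -> sum 6 -> 6 / 4
mse = mean_squared_error(actual, predicted)
print(mse)  # 1.5
```

Note that the error of -2 contributes 4 to the sum while each error of 1 contributes only 1, so larger errors weigh disproportionately on the result.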
Points to remember
- MSE is easy to understand and implement
- MSE is always non-negative
- Lower MSE values indicate that the model's predictions are closer to the actual data points, which means the model is predicting better. Higher MSE values indicate that the predictions are farther from the actual values.
- MSE is not always the appropriate metric; it depends on the specific problem you are trying to solve.
- MSE is sensitive to outliers because every error is squared, so when your dataset has outliers MSE may not be the best choice.
- MSE can be used to compare the accuracy of different regression models
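The outlier point above can be seen in a small experiment. The sketch below uses made-up values and two helper functions (mse and mae) that are assumptions for illustration. The mean absolute error is included only as a point of comparison and is not otherwise covered in this post.

```python
def mse(actual, predicted):
    # Mean of the squared differences
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    # Mean of the absolute differences, shown only for contrast
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: every prediction is off by exactly 1
predicted = [11, 13, 15, 17, 19]
actual_clean = [10, 12, 14, 16, 18]

# Same data, but the last observation is an outlier (off by 10)
actual_outlier = [10, 12, 14, 16, 29]

print(mse(actual_clean, predicted))    # 1.0
print(mse(actual_outlier, predicted))  # 20.8
print(mae(actual_clean, predicted))    # 1.0
print(mae(actual_outlier, predicted))  # 2.8
```

A single outlying point raises the MSE from 1.0 to 20.8, while the absolute-error average rises only to 2.8. Whether that sensitivity is a drawback depends on how costly large errors are in your problem.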
For example, the following application uses MSE to compare the accuracy of two regression models fitted to the same data.
mse.py
import numpy as np
import matplotlib.pyplot as plt

# Generate some data
year = np.array([2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015])
population_in_millions = np.array([1425.776, 1417.173, 1407.564, 1396.387, 1384.332, 1371.818, 1359.003, 1346.021, 1332.993])

# Fit two different regression models to the data
model1 = np.polyfit(year, population_in_millions, 1)
model2 = np.polyfit(year, population_in_millions, 2)

# Predict the values for both models
m1 = model1[0]
c1 = model1[1]
population_pred1 = m1 * year + c1

a1 = model2[0]
a2 = model2[1]
a3 = model2[2]
population_pred2 = a1 * year**2 + a2 * year + a3

# Calculate the MSE for both models
mse1 = np.mean((population_in_millions - population_pred1)**2)
mse2 = np.mean((population_in_millions - population_pred2)**2)

print("MSE for model1:", mse1)
print("MSE for model2:", mse2)

if mse1 > mse2:
    print('prediction2 is more accurate')
else:
    print('prediction1 is more accurate')

# Draw the plot
plt.plot(year, population_in_millions, color='red')
plt.plot(year, population_pred1, color='blue', label=f'pred1 : {m1}*x+{c1}')
plt.plot(year, population_pred2, color='green', label=f'pred2 : {a1}*x*x+{a2}*x+{a3}')
plt.legend()
plt.show()
Output
MSE for model1: 3.694543059874588
MSE for model2: 0.3055830454579521
prediction2 is more accurate
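You can confirm the same from the plot the program draws: the green curve (pred2) follows the red line of actual values more closely than the blue line (pred1).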