Residual
A residual is the difference between an observed (actual) value and the value predicted by a model. Many evaluation metrics and diagnostics use residuals to assess how well a regression model fits the data.
Formula
εᵢ = yᵢ - ŷᵢ
Where:
- εᵢ represents the residual for the i-th data point.
- yᵢ is the actual value of the dependent variable for the i-th data point.
- ŷᵢ is the predicted value of the dependent variable for the i-th data point. This is generated by the regression model.
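To make the formula concrete, here is a minimal NumPy sketch. The numbers are made-up illustration values, not taken from any real dataset:

import numpy as np

# Hypothetical observed values (yᵢ) and model predictions (ŷᵢ) for illustration
y_actual = np.array([10.0, 12.5, 9.8, 11.2])
y_predicted = np.array([10.4, 12.1, 10.0, 11.0])

# Residuals: εᵢ = yᵢ - ŷᵢ
residuals = y_actual - y_predicted
print(residuals)  # approximately [-0.4  0.4 -0.2  0.2]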
Example
import numpy as np

# Year and actual population values
year = np.array([2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015])
actual_population = np.array([1425.776, 1417.173, 1407.564, 1396.387, 1384.332,
                              1371.818, 1359.003, 1346.021, 1332.993])

# Fit a linear regression model (degree-1 polynomial) to the data
model1 = np.polyfit(year, actual_population, 1)

# Predict the values using the fitted slope and intercept
m = model1[0]
c = model1[1]
predicted_population = m * year + c

# Calculate residuals
residuals = actual_population - predicted_population
Positive residual
A positive residual indicates that the model underpredicted the value (predicted value < observed value).
Negative residual
A negative residual indicates that the model overpredicted the value (predicted value > observed value).
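A tiny sketch (with made-up numbers, separate from the population data) shows how the sign of each residual tells you which case applies:

import numpy as np

y_actual = np.array([100.0, 100.0])
y_predicted = np.array([97.0, 103.0])  # underpredicts the first point, overpredicts the second

residuals = y_actual - y_predicted
print(residuals)       # [ 3. -3.]
print(residuals > 0)   # [ True False] -> positive residual: underprediction
print(residuals < 0)   # [False  True] -> negative residual: overprediction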
We can visualize the distribution of residuals and identify any patterns or outliers using a scatter plot.
residuals.py
import numpy as np
import matplotlib.pyplot as plt

# Generate some data
year = np.array([2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015])
actual_population = np.array([1425.776, 1417.173, 1407.564, 1396.387, 1384.332,
                              1371.818, 1359.003, 1346.021, 1332.993])

# Fit a linear regression model to the data
model1 = np.polyfit(year, actual_population, 1)

# Predict the values using the fitted slope and intercept
m = model1[0]
c = model1[1]
predicted_population = m * year + c

# Calculate residuals
residuals = actual_population - predicted_population

print("Actual Values:", actual_population)
print("Predicted Values:", predicted_population)
print("Residuals:", residuals)

# Create a scatterplot of residuals
plt.scatter(np.arange(len(residuals)), residuals, c='blue', marker='o', label='Residuals')

# Add a horizontal line at y=0 for reference
plt.axhline(y=0, color='red', linestyle='--', linewidth=1, label='Zero Residual Line')

# Add labels and a legend
plt.xlabel('Data Point')
plt.ylabel('Residual')
plt.title('Residual Plot')
plt.legend()

# Show the plot
plt.grid()
plt.show()
Output
Actual Values: [1425.776 1417.173 1407.564 1396.387 1384.332 1371.818 1359.003 1346.021 1332.993]
Predicted Values: [1429.42604444 1417.65472778 1405.88341111 1394.11209444 1382.34077778 1370.56946111 1358.79814444 1347.02682778 1335.25551111]
Residuals: [-3.65004444 -0.48172778 1.68058889 2.27490556 1.99122222 1.24853889 0.20485556 -1.00582778 -2.26251111]
plt.axhline(y=0, color='red', linestyle='--', linewidth=1, label='Zero Residual Line')
I added a horizontal reference line at y = 0 so it is easy to see how far each residual deviates from zero.
How are residuals used?
- An effective model has residuals that are close to zero and randomly scattered around zero, with no obvious pattern. You can check this with a residual plot like the one above.
- We can detect outliers by identifying residuals that are much larger or smaller than the rest, as shown in the sketch below.
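Here is a minimal sketch of one possible outlier check, applied to the residuals printed by residuals.py above. The 2-standard-deviation cutoff is just an illustrative rule of thumb, not a fixed standard:

import numpy as np

# Residuals from residuals.py (copied here so the sketch is self-contained)
residuals = np.array([-3.65004444, -0.48172778, 1.68058889, 2.27490556, 1.99122222,
                      1.24853889, 0.20485556, -1.00582778, -2.26251111])

# Flag residuals that lie more than 2 standard deviations from the mean (illustrative threshold)
threshold = 2 * residuals.std()
outliers = np.where(np.abs(residuals - residuals.mean()) > threshold)[0]

print("Threshold:", threshold)
print("Indices of potential outliers:", outliers)

For this data no residual crosses the threshold, so no index is flagged, which is consistent with the reasonably good linear fit seen in the residual plot.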