The 'mean' is the arithmetic average of a set of data points: the sum of all values divided by the number of values.
Formula to calculate the mean:
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Here,
- μ represents the mean.
- x_i represents each individual value in the dataset.
- n is the total number of values in the dataset.
For example, given the values [1, 2, 3, 4, 5], the mean is calculated as follows.
Mean = (Sum of all values)/(total number of values)
= (1 + 2 + 3 + 4 + 5)/5
= 15 / 5
= 3
Implementation of mean in Python
mean.py
data = [1, 2, 3, 4, 5]
no_of_values = len(data)
sum_of_values = sum(data)
mean = sum_of_values/no_of_values

print(f'data : {data}')
print(f'no_of_values : {no_of_values}')
print(f'sum_of_values : {sum_of_values}')
print(f'mean : {mean}')
Output
data : [1, 2, 3, 4, 5]
no_of_values : 5
sum_of_values : 15
mean : 3.0
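The same result can also be obtained without writing the division by hand. The sketch below assumes Python 3.8 or later, where the standard library's statistics module provides both mean and fmean; the variable names are only illustrative.

```python
import statistics

data = [1, 2, 3, 4, 5]

# Manual calculation, same approach as mean.py
manual_mean = sum(data) / len(data)

# Standard-library equivalents
stats_mean = statistics.mean(data)    # exact arithmetic, keeps the input's number type where possible
stats_fmean = statistics.fmean(data)  # converts to float first, usually faster

print(f'manual_mean : {manual_mean}')
print(f'stats_mean  : {stats_mean}')
print(f'stats_fmean : {stats_fmean}')
```

One practical difference: sum(data)/len(data) raises ZeroDivisionError on an empty list, while statistics.mean raises statistics.StatisticsError, which is usually the clearer message.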
The mean is mainly used to get an idea of the central tendency of a dataset, that is, the value around which the data points cluster.
visualize_central_tendency.py
import matplotlib.pyplot as plt
import numpy as np

# Sample dataset
data = [10, 20, 30, 40, 50, 60]

# Calculate the mean
mean = np.mean(data)

# Create a scatter plot of data points
plt.scatter(data, range(len(data)), label='Data Points', color='blue', marker='o')

# Add a vertical line at the mean
plt.axvline(x=mean, color='red', linestyle='--', label='Mean')

# Add labels and a title
plt.xlabel('Actual value')
plt.ylabel('Index')
plt.title('Scatter Plot with Mean')

# Add a legend
plt.legend()

# Show the plot
plt.show()
Output
Mean is affected by outliers
The mean is sensitive to outliers (extreme values) in the dataset. If extreme outliers are present, the mean might not represent the central tendency accurately.
For example, consider the data below.
[10, 20, 30, 40, 50, 100000]
The extreme value 100000 pulls the mean far away from the other five values, so the mean no longer reflects where most of the data lies. The figure below shows this.
mean_outliers.py
import matplotlib.pyplot as plt
import numpy as np

# Sample dataset
data = [10, 20, 30, 40, 50, 100000]

# Calculate the mean
mean = np.mean(data)

# Create a scatter plot of data points
plt.scatter(data, range(len(data)), label='Data Points', color='blue', marker='o')

# Add a vertical line at the mean
plt.axvline(x=mean, color='red', linestyle='--', label='Mean')

# Add labels and a title
plt.xlabel('Actual value')
plt.ylabel('Index')
plt.title('Scatter Plot with Mean')

# Add a legend
plt.legend()

# Show the plot
plt.show()
Output
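To put numbers on the distortion, the mean can be computed with and without the extreme value. This is a minimal sketch that reuses the dataset above; the names clean, with_outlier and below_mean are only illustrative.

```python
import statistics

clean = [10, 20, 30, 40, 50]
with_outlier = clean + [100000]

mean_clean = statistics.fmean(clean)           # 150 / 5
mean_outlier = statistics.fmean(with_outlier)  # 100150 / 6

# Count how many values fall below the mean once the outlier is included
below_mean = sum(1 for x in with_outlier if x < mean_outlier)

print(f'mean without outlier : {mean_clean}')        # 30.0
print(f'mean with outlier    : {mean_outlier:.2f}')  # 16691.67
print(f'values below mean    : {below_mean} of {len(with_outlier)}')  # 5 of 6
```

With the outlier included, five of the six values lie below the mean, so the mean describes none of them well.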
What are the other ways to calculate central tendency?
- Median
- Mode
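Both measures are available in Python's standard statistics module. The sketch below uses a small made-up dataset with one extreme value to compare them with the mean; the data is an assumption chosen for illustration.

```python
import statistics

# Hypothetical sample with one extreme value
data = [1, 2, 2, 3, 100]

mean_value = statistics.fmean(data)     # pulled upwards by 100
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequent value

print(f'mean   : {mean_value}')
print(f'median : {median_value}')  # 2
print(f'mode   : {mode_value}')    # 2
```

Here the median and mode both stay at 2, while the mean is dragged above 21 by the single large value.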
There are also several variants of the mean that can be used to measure central tendency:
- Weighted mean
- Geometric mean
- Harmonic mean
- Trimmed mean
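A rough sketch of the four variants follows, assuming Python 3.8 or later for statistics.geometric_mean. The helpers weighted_mean and trimmed_mean are hypothetical functions written for this example, not library calls, and the sample values are made up.

```python
import math
import statistics


def weighted_mean(values, weights):
    """Hypothetical helper: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def trimmed_mean(values, proportion):
    """Hypothetical helper: drop `proportion` of the values from each end, then average."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values to cut from each end
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(trimmed)


# Weighted mean: the second score counts three times as much as the first
w_mean = weighted_mean([80, 90], [1, 3])

# Geometric mean: suited to growth rates and ratios
g_mean = statistics.geometric_mean([1, 4, 16])

# Harmonic mean: suited to rates, such as speeds over equal distances
h_mean = statistics.harmonic_mean([40, 60])

# Trimmed mean: cutting 20% from each end removes the outlier 100000
t_mean = trimmed_mean([10, 20, 30, 40, 50, 100000], 0.2)

print(f'weighted mean  : {w_mean}')
print(f'geometric mean : {g_mean:.2f}')
print(f'harmonic mean  : {h_mean:.2f}')
print(f'trimmed mean   : {t_mean}')
```

The trimmed mean is the variant most directly aimed at the outlier problem described earlier: on the outlier dataset it returns 35.0 instead of roughly 16691.67.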