Wednesday, 5 March 2025

The Significance of the Mean in Measuring the Central Tendency of Data

The ‘mean’ is the arithmetic average of a set of data points: the sum of all values divided by the number of values.

Formula to calculate the mean:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Here,

  1. μ represents the mean.
  2. xᵢ represents each individual value in the dataset (the i-th value, for i = 1 to n).
  3. n is the total number of values in the dataset.
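The formula μ = (1/n) Σ xᵢ can be mirrored term by term in plain Python. The function name `arithmetic_mean` is a hypothetical helper used only for illustration: it adds each xᵢ to a running total, divides by n, and raises an error for an empty dataset, where n = 0 and the mean is undefined.

```python
def arithmetic_mean(values):
    """Hypothetical helper: return mu = (1/n) * (x_1 + x_2 + ... + x_n)."""
    n = len(values)
    if n == 0:
        # The mean is undefined for an empty dataset (division by n = 0)
        raise ValueError("mean requires at least one data point")
    total = 0
    for x_i in values:  # accumulate each x_i, as in the summation
        total += x_i
    return total / n


print(arithmetic_mean([1, 2, 3, 4, 5]))  # 3.0
```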

 

For example, for the values [1, 2, 3, 4, 5], the mean is calculated as follows.

 

Mean = (Sum of all values)/(total number of values)

         = (1 + 2 + 3 + 4 + 5)/5

         = 15 / 5

         = 3

 

Implementation of mean in Python

mean.py

data = [1, 2, 3, 4, 5]

no_of_values = len(data)
sum_of_values = sum(data)
mean = sum_of_values/no_of_values

print(f'data : {data}')
print(f'no_of_values : {no_of_values}')
print(f'sum_of_values : {sum_of_values}')
print(f'mean : {mean}')

Output

data : [1, 2, 3, 4, 5]
no_of_values : 5
sum_of_values : 15
mean : 3.0
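Python's standard library also provides the mean directly in the `statistics` module. As a small sketch, `statistics.mean` and `statistics.fmean` (available from Python 3.8) should agree with the manual calculation in mean.py; `fmean` converts the data to floats before averaging.

```python
import statistics

data = [1, 2, 3, 4, 5]

# Manual calculation, as in mean.py
manual_mean = sum(data) / len(data)

# Standard-library equivalents
stats_mean = statistics.mean(data)    # arithmetic mean
stats_fmean = statistics.fmean(data)  # converts to floats first (Python 3.8+)

print(f'manual_mean : {manual_mean}')
print(f'statistics.mean : {stats_mean}')
print(f'statistics.fmean : {stats_fmean}')
```

One practical difference: on an empty list, `sum(data) / len(data)` raises `ZeroDivisionError`, while both `statistics` functions raise `statistics.StatisticsError`.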

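The mean is mainly used to get an idea of the central tendency of the data: a single value around which the data points are centered. The following script plots the data points along with a vertical line at the mean.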

 

visualize_central_tendency.py

import matplotlib.pyplot as plt
import numpy as np

# Sample dataset
data = [10, 20, 30, 40, 50, 60]

# Calculate the mean
mean = np.mean(data)

# Create a scatter plot of data points
plt.scatter(data, range(len(data)), label='Data Points', color='blue', marker='o')

# Add a vertical line at the mean
plt.axvline(x=mean, color='red', linestyle='--', label='Mean')

# Add labels and a title
plt.xlabel('Actual value')
plt.ylabel('Index')
plt.title('Scatter Plot with Mean')

# Add a legend
plt.legend()

# Show the plot
plt.show()

 

Output

[Figure: Scatter Plot with Mean]


Mean is affected by outliers

The mean is sensitive to outliers (extreme values) in the dataset. Because every value contributes to the sum, a single extreme value can pull the mean far away from where most of the data points lie, so the mean might not represent the central tendency accurately.
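To see this numerically, the sketch below compares the mean of five values with the mean of the same five values plus one extreme value, 100000 (the same data used in the example that follows).

```python
import statistics

without_outlier = [10, 20, 30, 40, 50]
with_outlier = without_outlier + [100000]  # add one extreme value

mean_without = statistics.fmean(without_outlier)
mean_with = statistics.fmean(with_outlier)

print(f'mean without outlier : {mean_without}')   # 30.0
print(f'mean with outlier    : {mean_with:.2f}')  # 16691.67
```

Adding one value out of six moves the mean from 30.0 to about 16691.67, which is larger than every other value in the dataset.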

 

For example, consider the following data.

[10, 20, 30, 40, 50, 100000]

 

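This dataset contains an extreme value, 100000, which pulls the mean up to about 16691.67, far above the other five values, so the mean no longer describes where most of the data lies. You can see this in the plot produced by the script below.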

 

mean_outliers.py

 

import matplotlib.pyplot as plt
import numpy as np

# Sample dataset
data = [10, 20, 30, 40, 50, 100000]

# Calculate the mean
mean = np.mean(data)

# Create a scatter plot of data points
plt.scatter(data, range(len(data)), label='Data Points', color='blue', marker='o')

# Add a vertical line at the mean
plt.axvline(x=mean, color='red', linestyle='--', label='Mean')

# Add labels and a title
plt.xlabel('Actual value')
plt.ylabel('Index')
plt.title('Scatter Plot with Mean')

# Add a legend
plt.legend()

# Show the plot
plt.show()

Output

[Figure: Scatter Plot with Mean]

What are the other ways to calculate central tendency?

  1. Median
  2. Mode
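As a brief sketch using Python's `statistics` module, the median of the outlier dataset from the previous section stays close to the bulk of the values, while the mean does not. The mode is shown on a separate, hypothetical `ratings` list, because every value in the outlier dataset occurs exactly once.

```python
import statistics

data = [10, 20, 30, 40, 50, 100000]

print(f'mean   : {statistics.fmean(data):.2f}')  # pulled up by 100000
print(f'median : {statistics.median(data)}')     # average of the middle values 30 and 40

# Mode: the most frequent value (hypothetical sample with repeated values)
ratings = [1, 2, 2, 3, 3, 3, 4]
print(f'mode   : {statistics.mode(ratings)}')
```

With an even number of values, the median is the average of the two middle values, here (30 + 40) / 2 = 35.0, which is unaffected by how large the extreme value is.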

 

There are also other variants of the mean that can be used to measure central tendency:

  1. Weighted mean
  2. Geometric mean
  3. Harmonic mean
  4. Trimmed mean
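A minimal sketch of each variant follows, applied to the outlier dataset from earlier. The weights and the trimming proportion are hypothetical choices for illustration. `statistics.geometric_mean` (Python 3.8+) and `statistics.harmonic_mean` (Python 3.6+) come from the standard library, while the weighted mean and the trimmed mean are written out by hand; `trimmed_mean` is a hypothetical helper, and libraries such as SciPy provide their own versions.

```python
import statistics

data = [10, 20, 30, 40, 50, 100000]

# Weighted mean: sum(w_i * x_i) / sum(w_i); these weights are hypothetical
weights = [1, 1, 1, 1, 1, 0.5]
weighted = sum(w * x for w, x in zip(weights, data)) / sum(weights)

# Geometric mean: n-th root of the product of the values (all must be positive)
geometric = statistics.geometric_mean(data)

# Harmonic mean: n divided by the sum of the reciprocals
harmonic = statistics.harmonic_mean(data)


def trimmed_mean(values, proportion):
    """Hypothetical helper: drop `proportion` of the values from each end, then average."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("proportion too large: no values left to average")
    return sum(kept) / len(kept)


trimmed = trimmed_mean(data, 0.2)  # int(6 * 0.2) = 1 value cut from each end

print(f'weighted  : {weighted:.2f}')
print(f'geometric : {geometric:.2f}')
print(f'harmonic  : {harmonic:.2f}')
print(f'trimmed   : {trimmed}')
```

Because the trimmed mean discards the smallest value and the extreme value 100000, it lands at 35.0, the same as the median here. The geometric and harmonic means are also far less affected by the outlier than the arithmetic mean, although they are still different summaries of the data.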

 

