Sunday, 9 March 2025

Median to represent central tendency

 

The median represents the middle value of the data points when they are arranged in either ascending or descending order.

 

How to calculate the median?

Step 1: Sort the data points in either ascending or descending order.

 

Step 2: Find the middle value.

 

  1. If you have an odd number of data points, the median is the middle value in the ordered list.
  2. If you have an even number of data points, the median is the average of the two middle values.
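
The two steps above can be sketched in plain Python. This is only a minimal illustration of the procedure, not a library implementation; the function name manual_median and the sample lists are made up for this sketch.

```python
def manual_median(values):
    """Return the median of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("median of an empty sequence is undefined")

    # Step 1: sort the data points (ascending order here).
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2

    # Step 2: find the middle value.
    if n % 2 == 1:
        # Odd count: the single middle element is the median.
        return ordered[mid]
    # Even count: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical sample data, chosen only for illustration.
print(manual_median([7, 1, 3]))     # odd count  -> 3
print(manual_median([7, 1, 3, 5]))  # even count -> 4.0
```

Sorting in descending order instead would give the same result, because the middle position does not change.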

 

Example 1: with odd number of values

 

Suppose you have the data points below.

 

[3, 5, 6, 2, 1, 4, 5]

 

Sorting the data points gives [1, 2, 3, 4, 5, 5, 6], and the middle value, 4, is the median.

 

Example 2: with even number of values

 

Suppose you have the data points below.

 

[3, 5, 6, 2, 1, 4]

 

Sorting the data points gives [1, 2, 3, 4, 5, 6], and the two middle values are 3 and 4.

 

The median is (3 + 4)/2 = 7/2 = 3.5

 

median.py

import numpy as np

# Create a sample dataset
data1 = np.array([3, 5, 6, 2, 1, 4, 5])
data2 = np.array([3, 5, 6, 2, 1, 4])

# Calculate the median
median1 = np.median(data1)
median2 = np.median(data2)

# Print the median
print(f"median1: {median1}")
print(f"median2: {median2}")

Output

median1: 4.0
median2: 3.5
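
If NumPy is not available, the same two medians can be computed with the statistics module from Python's standard library. This is an alternative sketch for the same sample data, not part of the original script.

```python
import statistics

# Same sample datasets as above, as plain Python lists
data1 = [3, 5, 6, 2, 1, 4, 5]
data2 = [3, 5, 6, 2, 1, 4]

# statistics.median sorts a copy internally, so the input order does not matter
median1 = statistics.median(data1)
median2 = statistics.median(data2)

print(f"median1: {median1}")
print(f"median2: {median2}")
```

Note that for the odd-length list this returns the middle element itself (the integer 4), whereas np.median returns a float (4.0).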

 

Unlike the mean, the median is less affected by extreme values. The example below illustrates this with house prices that include some outliers.

 

median_using_pyplot.py

import matplotlib.pyplot as plt
import numpy as np

# Sample dataset
house_prices = [350000, 100000, 120000, 150000, 170000, 200000, 700000, 900000]

# Calculate the mean and the median
mean = np.mean(house_prices)
median = np.median(house_prices)

# Create a scatter plot of data points
plt.scatter(house_prices, range(len(house_prices)), label='Data Points', color='blue', marker='o')

# Add vertical lines at the mean and the median
# (the median line is green so it is not confused with the blue data points)
plt.axvline(x=mean, color='red', linestyle='--', label='Mean')
plt.axvline(x=median, color='green', linestyle='--', label='Median')

# Add labels and a title
plt.xlabel('House prices')
plt.ylabel('Index')
plt.title('Scatter Plot with Mean and Median')

# Add a legend
plt.legend()

# Show the plot
plt.show()

 

Output

 


As the plot shows, the mean (336,250) is pulled toward the high-priced outliers, while the median (185,000) stays close to the bulk of the data points.
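
To put numbers on this, the sketch below compares the mean and the median of the same house prices with and without the highest values. Treating the two highest prices as the outliers is an assumption made only for this illustration, and the standard library statistics module is used so the snippet runs without extra packages.

```python
import statistics

house_prices = [350000, 100000, 120000, 150000, 170000, 200000, 700000, 900000]

# Assumption for this sketch: the two highest prices are the outliers.
typical_prices = sorted(house_prices)[:-2]

# How far does each measure move when the outliers are included?
mean_shift = statistics.mean(house_prices) - statistics.mean(typical_prices)
median_shift = statistics.median(house_prices) - statistics.median(typical_prices)

print(f"mean shift:   {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
```

Under this assumption, including the two outliers moves the mean by roughly 154,583 but moves the median by only 25,000.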

 

Advantages of median

  1. It is easy to calculate.
  2. It is less affected by outliers than the mean.

