The median is the middle value of a set of data points once they are arranged in ascending or descending order.
How to calculate the median?
Step 1: Sort the data points in either ascending or descending order.
Step 2: Find the middle value.
- If you have an odd number of data points, the median is the middle value in the ordered list.
- If you have an even number of data points, the median is the average of the two middle values.
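The two steps above can be sketched in plain Python without any library. This is only a minimal illustration: the function name `median` and the error raised for empty input are assumptions made for this example, not part of the original post.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if len(values) == 0:
        # Assumption for this sketch: an empty dataset is treated as an error.
        raise ValueError("median() requires at least one data point")

    ordered = sorted(values)  # Step 1: sort the data points
    n = len(ordered)
    mid = n // 2              # Step 2: locate the middle position

    if n % 2 == 1:
        # Odd count: the single middle value is the median.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 5, 6, 2, 1, 4, 5]))  # odd number of values
print(median([3, 5, 6, 2, 1, 4]))     # even number of values
```

Sorting dominates the cost here, so this sketch runs in O(n log n) time. That is fine for small datasets.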
Example 1: odd number of values
Suppose you have the following data points.
[3, 5, 6, 2, 1, 4, 5]
Sorting the data points gives [1, 2, 3, 4, 5, 5, 6]. The middle value, 4, is the median.
Example 2: even number of values
Suppose you have the following data points.
[3, 5, 6, 2, 1, 4]
Sorting the data points gives [1, 2, 3, 4, 5, 6]. The two middle values are 3 and 4.
The median is (3 + 4)/2 = 7/2 = 3.5
median.py
import numpy as np

# Create sample datasets
data1 = np.array([3, 5, 6, 2, 1, 4, 5])
data2 = np.array([3, 5, 6, 2, 1, 4])

# Calculate the median
median1 = np.median(data1)
median2 = np.median(data2)

# Print the median
print(f"median1: {median1}")
print(f"median2: {median2}")
Output
median1: 4.0
median2: 3.5
Unlike the mean, the median is less affected by extreme values. The example below shows this with house prices that include some outliers.
median_using_pyplot.py
import matplotlib.pyplot as plt
import numpy as np

# Sample dataset
house_prices = [350000, 100000, 120000, 150000, 170000, 200000, 700000, 900000]

# Calculate the mean and the median
mean = np.mean(house_prices)
median = np.median(house_prices)

# Create a scatter plot of the data points
plt.scatter(house_prices, range(len(house_prices)), label='Data Points', color='blue', marker='o')

# Add vertical lines at the mean and the median
# (the median line is green so it is not confused with the blue data points)
plt.axvline(x=mean, color='red', linestyle='--', label='Mean')
plt.axvline(x=median, color='green', linestyle='--', label='Median')

# Add labels and a title
plt.xlabel('House prices')
plt.ylabel('Index')
plt.title('Scatter Plot with Mean and Median')

# Add a legend
plt.legend()

# Show the plot
plt.show()
Output
As the plot shows, the mean (336,250) is pulled toward the outliers, while the median (185,000) stays close to the bulk of the data points.
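To put numbers on this comparison, the sketch below computes the mean and median of the same house prices with and without the two largest values. Treating 700000 and 900000 as the outliers is an assumption made for this illustration. The variable names are hypothetical, and the standard library `statistics` module is used here in place of NumPy.

```python
from statistics import mean, median

# Same house prices as in the plot above.
house_prices = [350000, 100000, 120000, 150000, 170000, 200000, 700000, 900000]

# Assumption for this sketch: the two largest prices are the outliers.
without_outliers = sorted(house_prices)[:-2]

# How far does each statistic move once the outliers are included?
mean_shift = mean(house_prices) - mean(without_outliers)
median_shift = median(house_prices) - median(without_outliers)

print(f"mean   without outliers: {mean(without_outliers):.2f}, with: {mean(house_prices):.2f}")
print(f"median without outliers: {median(without_outliers):.2f}, with: {median(house_prices):.2f}")
print(f"mean shift: {mean_shift:.2f}, median shift: {median_shift:.2f}")
```

Under this assumption, the two outliers move the mean by roughly 155,000 but the median by only 25,000.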
Advantages of the median
- It is easy to calculate.
- It is less affected by outliers than the mean.