Monday, 10 March 2025

Standard Deviation: A Measure of Variability

Standard deviation quantifies the amount of variation in a data set, that is, how far the values typically lie from their mean.

Formula

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu\right)^2}$$

Where

  1. N : Represents the number of data points in the population.
  2. Xi : Represents each data point.
  3. μ : Represents the mean of the population.
  4. σ : Represents the standard deviation of the population.
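
The formula can be translated almost term by term into Python. The following is a minimal sketch, not part of the original examples: the function name population_std and the sample values are assumptions chosen for illustration. It cross-checks the result against statistics.pstdev from the standard library, which computes the population standard deviation.

```python
import math
import statistics


def population_std(values):
    """Population standard deviation, following the formula term by term."""
    n = len(values)                                        # N
    mu = sum(values) / n                                   # μ
    squared_deviations = [(x - mu) ** 2 for x in values]   # (Xi - μ)^2
    return math.sqrt(sum(squared_deviations) / n)          # σ


# Hypothetical values, chosen so that the result is a round number
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = population_std(sample)
print(f'sigma : {sigma}')
print(f'matches statistics.pstdev : {math.isclose(sigma, statistics.pstdev(sample))}')
```

For these values the mean is 5, the squared deviations sum to 32, and 32 / 8 = 4, so the standard deviation is 2.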

 

 

Low and high standard deviations

A low standard deviation indicates that the data points tend to be very close to the mean.

 

A high standard deviation indicates that the data points are spread out over a wider range of values.

 

How to know whether a standard deviation is low or high?

There is no fixed threshold that universally defines whether a standard deviation is low or high. It depends on the context of the data and the specific field of study.

 

As a loose rule of thumb, a standard deviation that is two or three times larger than the mean is considered high. More generally, you can judge it relative to the mean using the rules below.

 

If the standard deviation is small compared to the mean, it is considered low. This means that the data points are relatively close to the mean.

 

If the standard deviation is large compared to the mean, it is considered high. This indicates that the data points are more spread out and less concentrated around the mean.
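
One way to make "small or large compared to the mean" concrete is to divide the standard deviation by the mean, a ratio known as the coefficient of variation. The sketch below is an illustration under assumptions: the function name coefficient_of_variation and the variable names are hypothetical, and the data are the two house price lists used in the examples later in this post.

```python
import statistics


def coefficient_of_variation(values):
    """Ratio of the population standard deviation to the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# The two house price lists from the examples in this post
similar_prices = [100000, 110000, 125000, 95000, 115000, 118000, 123000, 105000]
varied_prices = [50000, 300000, 950000, 500000, 750000, 100000,
                 325000, 650000, 25000, 200000, 35000]

cv_similar = coefficient_of_variation(similar_prices)
cv_varied = coefficient_of_variation(varied_prices)

print(f'similar prices : {cv_similar:.2f}')   # about 0.09
print(f'varied prices  : {cv_varied:.2f}')    # about 0.86
```

The ratio is only meaningful when the mean is positive and not close to zero, and where to draw the line between low and high still depends on the field.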

 

Real world use cases

  1. A higher standard deviation in the changes of a company's stock price indicates greater price fluctuation, which is considered riskier.
  2. We can use standard deviation to evaluate the prices of houses in a neighbourhood. A low standard deviation indicates that the houses are all similarly priced, while a high standard deviation indicates a wider range of prices.
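
The stock price use case can be sketched as follows. The daily percentage changes are invented purely for illustration and do not describe any real stock; the variable names are likewise assumptions.

```python
import statistics

# Hypothetical daily price changes in percent, invented for illustration only
steady_stock = [0.2, -0.1, 0.3, -0.2, 0.1, 0.0, -0.3]
volatile_stock = [2.5, -3.1, 4.0, -2.2, 1.8, -4.5, 3.3]

steady_std = statistics.pstdev(steady_stock)
volatile_std = statistics.pstdev(volatile_stock)

print(f'steady stock   : {steady_std:.2f}')
print(f'volatile stock : {volatile_std:.2f}')

# The stock with the larger standard deviation fluctuates more
riskier = 'volatile_stock' if volatile_std > steady_std else 'steady_stock'
print(f'riskier        : {riskier}')
```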

 

 

Low standard deviation example

house_prices = [100000, 110000, 125000, 95000, 115000, 118000, 123000, 105000]

 

The mean and standard deviation of the above data points are given below.

 

mean : 111375.0

std_deviation : 10111.101572034573

 

Given that the house prices span from $95,000 to $125,000, a standard deviation of approximately $10,111 can be considered relatively low: it is only about 9% of the mean. This suggests that the house prices are relatively close to the mean of $111,375.

 

low_standard_deviation.py

import numpy as np
import matplotlib.pyplot as plt

# House prices in a neighbourhood where all houses are similarly priced
house_prices = [100000, 110000, 125000, 95000, 115000, 118000, 123000, 105000]

# Calculate the standard deviation of the data
mean = np.mean(house_prices)
std_deviation = np.std(house_prices)

print(f'mean : {mean}')
print(f'std_deviation : {std_deviation}')

# Plot each house price against its position in the list
plt.scatter(house_prices, range(len(house_prices)), label='House prices', color='green', marker='o')

# Mark the mean and one standard deviation on either side of it
plt.axvline(mean, color='blue', linestyle='dashed', linewidth=2, label=f'Mean: {mean:.2f}')
plt.axvline(mean - std_deviation, color='r', linestyle='dashed', linewidth=2, label=f'Mean ± Standard Deviation ({std_deviation:.2f})')
plt.axvline(mean + std_deviation, color='r', linestyle='dashed', linewidth=2)

plt.xlabel('House price')
plt.ylabel('House index')
plt.legend()
plt.title('Scatter plot with Standard Deviation')
plt.show()

Output

mean : 111375.0
std_deviation : 10111.101572034573
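
Note that np.std, with its default arguments, divides by N and therefore computes the population standard deviation from the formula above. If the eight houses were treated as a sample drawn from a larger neighbourhood, the sample standard deviation (dividing by N - 1) would be the usual choice; in NumPy that corresponds to passing ddof=1. The sketch below shows the difference using only the standard library's statistics module; the variable names are assumptions.

```python
import statistics

house_prices = [100000, 110000, 125000, 95000, 115000, 118000, 123000, 105000]

population_std = statistics.pstdev(house_prices)  # divides by N, like np.std(house_prices)
sample_std = statistics.stdev(house_prices)       # divides by N - 1, like np.std(house_prices, ddof=1)

print(f'population std : {population_std:.2f}')
print(f'sample std     : {sample_std:.2f}')
```

The sample value is always slightly larger than the population value, and the gap shrinks as the number of data points grows.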

 


 

High standard deviation example

house_prices = [50000, 300000, 950000, 500000, 750000, 100000, 325000, 650000, 25000, 200000, 35000]

 

The mean and standard deviation of the above data points are given below.

 

mean : 353181.8181818182

std_deviation : 303818.67145382444

 

Given the wide range of values, which span from $25,000 to $950,000, a standard deviation of approximately $303,819 can be considered relatively high: it is about 86% of the mean. This indicates that the data points are spread over a wide range around the mean of approximately $353,182.

 

The complete working application is given below.

 

high_standard_deviation.py 

import numpy as np
import matplotlib.pyplot as plt

# House prices in a neighbourhood with a wide range of prices
house_prices = [50000, 300000, 950000, 500000, 750000, 100000, 325000, 650000, 25000, 200000, 35000]

# Calculate the standard deviation of the data
mean = np.mean(house_prices)
std_deviation = np.std(house_prices)

print(f'mean : {mean}')
print(f'std_deviation : {std_deviation}')

# Plot each house price against its position in the list
plt.scatter(house_prices, range(len(house_prices)), label='House prices', color='green', marker='o')

# Mark the mean and one standard deviation on either side of it
plt.axvline(mean, color='blue', linestyle='dashed', linewidth=2, label=f'Mean: {mean:.2f}')
plt.axvline(mean - std_deviation, color='r', linestyle='dashed', linewidth=2, label=f'Mean ± Standard Deviation ({std_deviation:.2f})')
plt.axvline(mean + std_deviation, color='r', linestyle='dashed', linewidth=2)

plt.xlabel('House price')
plt.ylabel('House index')
plt.legend()
plt.title('Scatter plot with Standard Deviation')
plt.show()

Output

mean : 353181.8181818182
std_deviation : 303818.67145382444

 


 

 
