Standard deviation quantifies the amount of variation in a data set.
Formula
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2}$$
Where
- N : Represents the number of data points in the population.
- Xi : Represents each data point.
- μ : Represents the mean of the population.
- σ : Represents the standard deviation of the population.
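To make the symbols concrete, here is a minimal sketch that computes the population standard deviation term by term and compares the result with np.std. The helper name population_std is an assumption made for illustration; the data is the low standard deviation house price list used in this article.

```python
import math

import numpy as np

# Data taken from the low standard deviation example in this article.
house_prices = [100000, 110000, 125000, 95000, 115000, 118000, 123000, 105000]


def population_std(values):
    """Population standard deviation (hypothetical helper, for illustration)."""
    n = len(values)                                  # N
    mu = sum(values) / n                             # μ
    squared_diffs = [(x - mu) ** 2 for x in values]  # (Xi - μ)^2 for each point
    return math.sqrt(sum(squared_diffs) / n)         # σ


manual_std = population_std(house_prices)
# np.std divides by N by default (ddof=0), which matches the population formula.
numpy_std = float(np.std(house_prices))

print(f'manual std : {manual_std:.2f}')
print(f'numpy std  : {numpy_std:.2f}')
```

Both values should agree, since np.std with its default ddof=0 uses the same population formula.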
Low and high standard deviations
A low standard deviation indicates that the data points tend to be very close to the mean.
A high standard deviation indicates that the data points are spread out over a wider range of values.
How to know whether a standard deviation is low or high?
There is no fixed threshold that universally defines a standard deviation as low or high; it depends on the context of the data and the specific field of study.
For data whose values are all positive, a standard deviation that is two or three times larger than the mean would be considered high in almost any setting, but the following rules of thumb are a more practical guide.
If the standard deviation is small compared to the mean, it is considered low. This means that the data points are relatively close to the mean.
If the standard deviation is large compared to the mean, it is considered high. This indicates that the data points are more spread out and less concentrated around the mean.
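One way to apply these rules is to divide the standard deviation by the mean, a ratio commonly called the coefficient of variation. The sketch below is only illustrative: the function name coefficient_of_variation is an assumption, the data is the two house price lists from the examples in this article, and the ratio is meaningful only for positive-valued data with a nonzero mean.

```python
import numpy as np

# House price lists from the low and high standard deviation examples.
low_spread_prices = [100000, 110000, 125000, 95000, 115000, 118000, 123000, 105000]
high_spread_prices = [50000, 300000, 950000, 500000, 750000, 100000,
                      325000, 650000, 25000, 200000, 35000]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (hypothetical helper)."""
    return float(np.std(values) / np.mean(values))


low_cv = coefficient_of_variation(low_spread_prices)
high_cv = coefficient_of_variation(high_spread_prices)

print(f'low spread  : {low_cv:.2f}')   # standard deviation is about 9% of the mean
print(f'high spread : {high_cv:.2f}')  # standard deviation is about 86% of the mean
```

Whether a given ratio counts as low or high still depends on the field; the ratio only makes data sets with different scales comparable.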
Real world use cases
- If the changes in a company's stock price have a higher standard deviation, the price fluctuates more, and the stock is considered riskier.
- We can use standard deviation to evaluate the prices of houses in a neighbourhood. A low standard deviation indicates that the houses are all similarly priced, and a high standard deviation indicates that there is a wider range of prices.
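The stock price use case can be sketched as follows. The two price series are invented purely for illustration and do not describe any real stock, and the helper name daily_change_std is an assumption. The sketch measures the standard deviation of the day-to-day percentage changes, which is one simple way to express price fluctuation.

```python
import numpy as np

# Hypothetical daily closing prices, invented purely for illustration.
stable_stock = [100.0, 101.0, 100.5, 101.5, 101.0, 102.0]
volatile_stock = [100.0, 108.0, 97.0, 110.0, 95.0, 104.0]


def daily_change_std(prices):
    """Standard deviation of day-to-day percentage changes (hypothetical helper)."""
    prices = np.asarray(prices, dtype=float)
    # Percentage change from each day to the next.
    pct_changes = np.diff(prices) / prices[:-1] * 100
    return float(np.std(pct_changes))


stable_std = daily_change_std(stable_stock)
volatile_std = daily_change_std(volatile_stock)

print(f'stable stock   : {stable_std:.2f}')
print(f'volatile stock : {volatile_std:.2f}')
```

The series with the larger swings produces the larger standard deviation, which is the sense in which it would be called riskier.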
Low standard deviation example
house_prices = [100000, 110000, 125000, 95000, 115000, 118000, 123000, 105000]
The mean and standard deviation of the above data points are given below.
mean : 111375.0
std_deviation : 10111.101572034573
The house prices span from $95,000 to $125,000, and the standard deviation of approximately $10,111 is small compared to the mean of $111,375 (about 9% of it), so it can be considered relatively low. This suggests that the house prices are relatively close to the mean.
low_standard_deviation.py
import numpy as np
import matplotlib.pyplot as plt

# House prices that lie close to each other
house_prices = [100000, 110000, 125000, 95000, 115000, 118000, 123000, 105000]

# Calculate the mean and the population standard deviation (np.std divides by N)
mean = np.mean(house_prices)
std_deviation = np.std(house_prices)

print(f'mean : {mean}')
print(f'std_deviation : {std_deviation}')

# Plot each price against its position in the list
plt.scatter(house_prices, range(len(house_prices)), label='House prices', color='green', marker='o')
# Mark the mean and one standard deviation on either side of it
plt.axvline(mean, color='blue', linestyle='dashed', linewidth=2, label=f'Mean: {mean:.2f}')
plt.axvline(mean - std_deviation, color='r', linestyle='dashed', linewidth=2, label=f'Mean ± Standard Deviation ({std_deviation:.2f})')
plt.axvline(mean + std_deviation, color='r', linestyle='dashed', linewidth=2)
plt.xlabel('Price')
plt.ylabel('Index')
plt.legend()
plt.title('Scatter plot with Standard Deviation')
plt.show()
Output
mean : 111375.0
std_deviation : 10111.101572034573
High standard deviation example
house_prices = [50000, 300000, 950000, 500000, 750000, 100000, 325000, 650000, 25000, 200000, 35000]
The mean and standard deviation of the above data points are given below.
mean : 353181.8181818182
std_deviation : 303818.67145382444
The values span a wide range, from $25,000 to $950,000, and the standard deviation of approximately $303,819 is large compared to the mean of approximately $353,182 (about 86% of it), so it can be considered relatively high. This indicates that the data points are spread widely around the mean.
The complete working program is given below.
high_standard_deviation.py
import numpy as np
import matplotlib.pyplot as plt

# House prices that are spread over a wide range
house_prices = [50000, 300000, 950000, 500000, 750000, 100000, 325000, 650000, 25000, 200000, 35000]

# Calculate the mean and the population standard deviation (np.std divides by N)
mean = np.mean(house_prices)
std_deviation = np.std(house_prices)

print(f'mean : {mean}')
print(f'std_deviation : {std_deviation}')

# Plot each price against its position in the list
plt.scatter(house_prices, range(len(house_prices)), label='House prices', color='green', marker='o')
# Mark the mean and one standard deviation on either side of it
plt.axvline(mean, color='blue', linestyle='dashed', linewidth=2, label=f'Mean: {mean:.2f}')
plt.axvline(mean - std_deviation, color='r', linestyle='dashed', linewidth=2, label=f'Mean ± Standard Deviation ({std_deviation:.2f})')
plt.axvline(mean + std_deviation, color='r', linestyle='dashed', linewidth=2)
plt.xlabel('Price')
plt.ylabel('Index')
plt.legend()
plt.title('Scatter plot with Standard Deviation')
plt.show()
Output
mean : 353181.8181818182
std_deviation : 303818.67145382444