Ten days of statistics (2) - Quatiles, Standard Deviation


cover

Quartiles

The quartiles of an ordered data set are the 33 points that split the data set size nn into 44 equal groups. Q2Q_2 is the median of the set, Q1Q_1 is the median of upper half, Q3Q_3 is the median of lower half. When nn odd, the median is not included in the upper half and lower half set.

Example 1

Take a set {6,7,15,36,39,40,41,42,43,47,49}\{6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49\}.

Then Q1=15;Q2=40;Q3=43Q_1 = 15; Q_2 = 40; Q_3 = 43

Example 2

Take set {7,15,36,39,40,41,42,43,47,49}\{7, 15, 36, 39, 40, 41, 42, 43, 47, 49\}.

Then Q1=36;Q2=40+412=40.5;Q3=43Q_1 = 36; Q_2 = \frac{40+41}{2} = 40.5; Q_3 = 43

Standard deviation

Variance σ2\sigma^2 of a set XX is the average magnitude of fluctuations of XX from its mean μ\mu. Formally:

σ2=i=1n(xiμ)2n\sigma^2 = \frac{\sum_{i=1}^n (x_i - \mu)^2}{n}

The standard deviation σ\sigma of a set XX is square root of it’s variance. Formally:

σ=σ2=i=1n(xiμ)2n\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^n (x_i - \mu)^2}{n}}

Application of standard deviation

High standard deviation means the average data point is far away from the mean. In plain English, we are not very sure about using the mean as an expected value.

Conversely, low standard deviation means the average data point is close to the mean. Which means, we can use the mean as an expected value confidently.

For example, let’s take 2 stock’s historical performance over 10 years:

  • Stock A: mean annual return = 7%, standard deviation of annual returns = 7%
  • Stock B: mean annual return = 7%, standard deviation of annual returns = 2%

Both stocks have the same average rate of return, but the volatility is much higher with stock A. If you are an adventurous man, you should pick stock A.

Practice

Hackerrank has some exercises for you to test your knowledge:

Next lesson: Probability