Ten days of statistics (8) - Correlation



Covariance

This is a measure of how two random variables $X$ and $Y$ change together. Formally:

$$cov(X,Y) = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x}) \times (y_i - \bar{y})$$

Where

  • $\bar{x}$ is the mean of $x$
  • $\bar{y}$ is the mean of $y$
  • $x_i$ and $y_i$ are the $i$-th paired observations, and $n$ is the number of pairs
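
To make the formula concrete, here is a minimal sketch in plain Python. The function name `covariance` and the sample data are my own choices for illustration. It divides by $n$, exactly as in the formula above (the population covariance), not by $n - 1$.

```python
def covariance(xs, ys):
    """Population covariance: the mean of the products of deviations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    x_bar = sum(xs) / n  # mean of x
    y_bar = sum(ys) / n  # mean of y
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


# Made-up sample data: y grows whenever x grows, so the covariance is positive.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(covariance(xs, ys))  # 4.0
```

A positive value means the two variables tend to move in the same direction, and a negative value means they tend to move in opposite directions.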

Pearson correlation coefficient

The Pearson correlation coefficient $\rho(X, Y)$ is given by:

$$\rho(X,Y) = \frac{cov(X,Y)}{\sigma_X\sigma_Y}$$

Where

  • $\sigma_X$ is the standard deviation of $X$
  • $\sigma_Y$ is the standard deviation of $Y$
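
As a rough sketch, the coefficient can be computed directly from that definition. The function name `pearson` and the data are assumptions made for illustration. The standard deviations divide by $n$, to stay consistent with the covariance formula in this post.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    if sigma_x == 0 or sigma_y == 0:
        # The coefficient is undefined when one variable is constant.
        raise ValueError("standard deviation is zero")
    return cov / (sigma_x * sigma_y)


# Made-up data: a perfect increasing line and a perfect decreasing line.
xs = [1, 2, 3, 4, 5]
print(pearson(xs, [2, 4, 6, 8, 10]))  # close to 1.0, up to floating-point rounding
print(pearson(xs, [10, 8, 6, 4, 2]))  # close to -1.0, up to floating-point rounding
```

Dividing by the standard deviations removes the units, so the result always lies between -1 and 1.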

Spearman’s rank correlation coefficient

Given two random variables $X$ and $Y$ with the same sample size, let $rank_X$ and $rank_Y$ denote the ranks of the data points of $X$ and $Y$ respectively. Spearman’s rank correlation coefficient $r_S$ of $X$ and $Y$ is equal to the Pearson correlation coefficient of $rank_X$ and $rank_Y$:

$$r_S = \frac{cov(rank_X, rank_Y)}{\sigma_{rank_X}\sigma_{rank_Y}}$$

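If $X$ and $Y$ contain no duplicates, $rank_X$ and $rank_Y$ are both permutations of $1, \dots, n$, so they share the same standard deviation $\sigma$ and the formula simplifies to:

$$r_S = \frac{cov(rank_X, rank_Y)}{\sigma^2}$$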
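
Both versions can be sketched in plain Python. The helper names (`ranks`, `covariance`, `spearman`, `spearman_no_ties`) and the data are assumptions made for illustration. Giving tied values the average of their positions is a common convention, not something this post specifies. The shortcut relies on the fact that the ranks $1, \dots, n$ have variance $\sigma^2 = \frac{n^2 - 1}{12}$.

```python
import math


def ranks(values):
    """Rank from 1 (smallest) to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def covariance(xs, ys):
    """Population covariance (divides by n)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def spearman(xs, ys):
    """General case: Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    sigma_rx = math.sqrt(covariance(rx, rx))
    sigma_ry = math.sqrt(covariance(ry, ry))
    return covariance(rx, ry) / (sigma_rx * sigma_ry)


def spearman_no_ties(xs, ys):
    """Shortcut, valid only without duplicates: both rank lists have variance (n^2 - 1) / 12."""
    n = len(xs)
    variance = (n * n - 1) / 12
    return covariance(ranks(xs), ranks(ys)) / variance


# Made-up data without duplicates: both functions give about 0.3.
xs = [3, 1, 4, 2, 5]
ys = [30, 10, 20, 50, 40]
print(spearman(xs, ys))
print(spearman_no_ties(xs, ys))
```

Because only the ranks matter, any strictly increasing relationship between $X$ and $Y$ gives $r_S = 1$, even when it is far from linear. Note that `spearman` in this sketch does not guard against a constant input, where the standard deviation of the ranks is zero.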

Practice

HackerRank has some exercises for you to test your knowledge.

Next lesson: Linear regression