Ten days of statistics (9) - Linear Regression


Least squares regression line

The regression line is the straight line that best describes the relationship between two variables $X$ and $Y$. Formally:

$$Y = a + bX$$

where $a$ is the intercept and $b$ is the slope.
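To make "best" concrete: in the least squares sense, the line is the one whose coefficients minimize the sum of squared vertical distances between the observed values and the line. As a sketch, writing the $n$ observed pairs as $(x_i, y_i)$ (notation assumed here for illustration):

$$\min_{a,\,b} \sum_{i=1}^n \left(y_i - (a + b x_i)\right)^2$$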

Find the value of $b$

The value of $b$ can be calculated using either of the following:

$$\begin{align*} b &= \rho(X,Y) \frac{\sigma_Y}{\sigma_X} \\ b &= \frac{n\sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2} \end{align*}$$
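As a quick sanity check, the two forms of the slope can be compared numerically. The sketch below assumes a small made-up data set (`xs`, `ys` are hypothetical, not from the exercise) and computes the Pearson correlation and population standard deviations by hand, so it needs nothing beyond the standard library.

```python
from math import sqrt

# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Population standard deviations, covariance and Pearson correlation.
sigma_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
sigma_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
rho = cov_xy / (sigma_x * sigma_y)

# First form: slope from the correlation and standard deviations.
b_from_correlation = rho * sigma_y / sigma_x

# Second form: slope from the raw sums.
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
b_from_sums = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Both values should agree up to floating point rounding (about 0.6 here).
print(b_from_correlation, b_from_sums)
```

The second form is usually the more convenient one when coding by hand, since it needs only a single pass over the data to accumulate the four sums.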

Find the value of $a$

$$a = \bar{y} - b\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$, respectively.
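Putting the slope and intercept together, a minimal sketch of the whole fit might look like the following. The function name `fit_line` and the sample data are assumptions made for illustration; the slope uses the raw-sums formula and the intercept uses the means.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")

    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All x values are identical, so the slope is undefined.
        raise ValueError("x values must not all be equal")

    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * (sum_x / n)  # a = mean(y) - b * mean(x)
    return a, b


# Hypothetical data, for illustration only.
a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Use the fitted line to predict Y at a new X value.
predicted = a + b * 6
print(a, b, predicted)  # roughly 2.2, 0.6 and 5.8
```

Once `a` and `b` are known, predicting a new value is just a matter of evaluating `a + b * x`.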

Practice

HackerRank has an exercise for you to test your knowledge:

Next lesson: Multiple regression