Ordinary least squares (OLS) is a method for finding the parameters of a linear regression model. It "fits" a function to data by minimizing the sum of squared errors (residuals). A residual is the difference between a sample point and the fitted line.
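To make the definition concrete, here is a tiny sketch (with made-up numbers) of computing residuals: each residual is the observed value minus the value the fitted line predicts.

```python
# A residual is the observed y minus the fitted line's prediction at x.
fitted = lambda x: 2 * x + 1            # some already-fitted line y = 2x + 1
observed = [(1, 4.0), (2, 4.5), (3, 8.0)]  # (x, y) sample points, made up

residuals = [y - fitted(x) for x, y in observed]
print(residuals)  # [1.0, -0.5, 1.0]
```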
But why minimize squared errors? Why not minimize the errors themselves, or their absolute values? To understand this, let's look at the following example from http://en.wikibooks.org/wiki/Econometric_Theory/Ordinary_Least_Squares_(OLS), where sweater sales are plotted against temperature:
Notice that the sum of residuals in model A is 5 + 10 - 5 - 10 = 0, and the sum of residuals in model B is 3 - 3 + 3 - 3 = 0. So are both models great fits?
No!
Clearly, it makes no rational sense for sweater sales to rise with temperature, yet model A's residuals still sum to zero, because positive and negative errors cancel each other out. Hence, we minimize not the sum of the errors but the sum of their squares, which accounts for the sign.
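The cancellation above is easy to check directly. This snippet uses the residual values quoted in the example: both models' plain sums of residuals are zero, but the sums of squares differ sharply.

```python
# Residuals of the two candidate lines from the sweater-sales example.
model_a = [5, 10, -5, -10]
model_b = [3, -3, 3, -3]

sum_a = sum(model_a)                  # plain sums: errors cancel
sum_b = sum(model_b)
ssr_a = sum(r ** 2 for r in model_a)  # squared sums: no cancellation
ssr_b = sum(r ** 2 for r in model_b)

print(sum_a, sum_b)  # 0 0   -> both look "perfect" by plain sums
print(ssr_a, ssr_b)  # 250 36 -> model B is clearly the better fit
```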
Now, how do we fit a regression line to these data? Suppose the model has an intercept β0 and a slope β1, so we can write it as:
yi = β0 + β1xi + ui
But it is not possible to find the true values of β0 and β1, since these pertain to the population as a whole. So we take a sample from the population and estimate them with the sample parameters b0 and b1, as shown in the diagram below.
[Diagram: population regression line and fitted sample regression line. Source: Chapter 2, Wooldridge, Jeffrey M. 2006. Introductory Econometrics, 3rd edition, Thomson South-Western]
Now, how do we compute values for the sample parameters? The sum of squared residuals (SSR) is given by:
SSR = Σi (yi - b0 - b1xi)²
To minimize the SSR, we take the partial derivatives with respect to the two sample parameters and equate the resulting equations to zero (dropping the common factor of -2):
Σi (yi - b0 - b1xi) = 0
Σi xi (yi - b0 - b1xi) = 0
Now, solving these equations simultaneously, we get the following formulae for the sample parameters:
b1 = Σi (xi - x̄)(yi - ȳ) / Σi (xi - x̄)²
b0 = ȳ - b1x̄
where x̄ and ȳ denote the sample means of x and y. For the complete derivation of this result, see Derivation of OLS estimators (http://www.yongyoon.net/ecmetrics201/ols_derivation.pdf).
So this is basically how the method of least squares leads to a line of best fit!
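The closed-form estimators can be turned into code directly. Below is a minimal sketch, with data points made up for illustration; they lie exactly on the line y = 2x + 1, so the fit should recover an intercept of 1 and a slope of 2.

```python
def ols_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    # Intercept: the fitted line passes through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up points lying exactly on y = 2x + 1
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = ols_fit(x, y)
print(b0, b1)  # 1.0 2.0
```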
References:
- Econometric Theory/Ordinary Least Squares (OLS): http://en.wikibooks.org/wiki/Econometric_Theory/Ordinary_Least_Squares_(OLS)
- Wooldridge, Jeffrey M. 2006. Introductory Econometrics, 3rd edition, Chapter 2. Thomson South-Western. http://www.swlearning.com/economics/wooldridge/wooldridge2e/powerpoint.html
- Derivation of OLS estimators: http://www.yongyoon.net/ecmetrics201/ols_derivation.pdf