by Nick V. Flor (firstname.lastname@example.org) • March 17, 2015 • @ProfessorF
It can be difficult for people without a math background to follow the derivations that mathematicians post online for the equation of the best-fit line through a set of N points. So I decided to write out the proof step by step.
For starters, imagine a cloud of points and a line that passes through them. In your imagination, you should see a few points fall exactly on the line, while most of the points do not. Recall that the equation of a line is:

y = mx + b
Our goal is to calculate m, the slope of the line, and to calculate b, the y-intercept.
For any point that does not fall exactly on the line, say (x’, y’), the error (e) is:

e = y’ - (mx’ + b)
Why? Because for the input x’, the line predicts y = mx’ + b. However, we know that the true value that corresponds to x’ is y’. Thus, the error is the difference y’ - y, which is e above.
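Okay, so next let’s square the error so that it can never be negative (otherwise positive and negative errors would cancel out when we add them up), and let’s drop the ’ to reduce clutter:

e² = (y - (mx + b))²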
The idea of squaring the errors (the method of least squares) is usually credited to Gauss in the early 19th century, although Legendre was the first to publish it, in 1805. And it’s a genius move, as you’ll see later. As an aside: it’s amazing what “old” scientists were able to accomplish without the aid of calculators and computers.
Now let’s add up the squared errors for all N points, and call this total error E (Σ means “sum over all N points”):

E = Σ(y - (mx + b))²
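To make E concrete, here is a minimal Python sketch that computes it for a candidate slope and intercept. The function name `total_squared_error` and the three sample points are hypothetical, chosen only for illustration.

```python
def total_squared_error(points, m, b):
    """Return E, the sum of squared errors (y - (m*x + b))**2 over all points."""
    E = 0.0
    for x, y in points:
        e = y - (m * x + b)  # error for this point: true y minus the line's prediction
        E += e ** 2          # square it and add it to the running total
    return E

points = [(1, 2), (2, 3), (3, 5)]  # hypothetical sample points
print(total_squared_error(points, m=1.5, b=0.5))  # 0.25
```

Trying different values of m and b by hand shows that E changes as the line moves; the rest of the derivation finds the m and b that make it as small as possible.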
What makes squaring the error and summing all errors a genius move by Gauss?
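Well, if you know calculus, you know that setting a function’s derivative equal to zero finds the flat points on its curve, and for an upward-opening parabola that flat point is the minimum. Because E is a sum of squares, as a function of m and b it is a bowl-shaped, parabola-like surface with a single lowest point (as long as not all the x values are equal). Thus, we can find the m and b that minimize the error by taking the partial derivatives of E with respect to m and b and setting them equal to zero.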
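To calculate m, take the partial derivative of E with respect to m and set it equal to 0, then solve for m:

∂E/∂m = -2Σx(y - mx - b) = 0, which simplifies to Σxy - mΣx² - bΣx = 0

Solving this for m (or, equivalently, for b) gives the following, where Σx² is the sum of the squared x values, not (Σx)²: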
m=(Σxy-bΣx)/Σx² or b=(Σxy-mΣx²)/Σx
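To calculate b, take the partial derivative of E with respect to b and set it equal to 0, then solve for b (adding b once for each of the N points gives Nb):

∂E/∂b = -2Σ(y - mx - b) = 0, which simplifies to Σy - mΣx - Nb = 0

b=(Σy-mΣx)/N or m=(Σy-Nb)/Σx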
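The two partial derivatives can be sanity-checked numerically. This sketch evaluates ∂E/∂m = -2Σx(y - mx - b) and ∂E/∂b = -2Σ(y - mx - b) directly and compares them with central finite differences of E. The function names and sample points are hypothetical; because E is quadratic in m and b, the two estimates should agree up to floating-point rounding.

```python
def total_squared_error(points, m, b):
    # E = Σ(y - (m*x + b))²
    return sum((y - (m * x + b)) ** 2 for x, y in points)

def dE_dm(points, m, b):
    # Partial derivative of E with respect to m: -2 Σ x (y - m x - b)
    return -2 * sum(x * (y - m * x - b) for x, y in points)

def dE_db(points, m, b):
    # Partial derivative of E with respect to b: -2 Σ (y - m x - b)
    return -2 * sum(y - m * x - b for x, y in points)

def numeric_dE(points, m, b, h=1e-6):
    # Central-difference estimates of (∂E/∂m, ∂E/∂b), used only for comparison
    d_m = (total_squared_error(points, m + h, b) - total_squared_error(points, m - h, b)) / (2 * h)
    d_b = (total_squared_error(points, m, b + h) - total_squared_error(points, m, b - h)) / (2 * h)
    return d_m, d_b

points = [(1, 2), (2, 3), (3, 5)]  # hypothetical sample points
print(dE_dm(points, 1.0, 0.0), dE_db(points, 1.0, 0.0))  # -18.0 -8.0
print(numeric_dE(points, 1.0, 0.0))  # approximately (-18.0, -8.0)
```

At the line m = 1, b = 0 both derivatives are negative, which says E would shrink if we increased m or b; the minimum is where both reach zero.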
We’re not done, because we have a kind of catch-22: our m equations contain b, and our b equations contain m.
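To fix this and find m, set the two b equations equal to one another:

(Σxy-mΣx²)/Σx = (Σy-mΣx)/N

Cross-multiplying gives NΣxy - mNΣx² = ΣxΣy - m(Σx)². Collecting the m terms on one side:

m=(NΣxy-ΣxΣy)/(NΣx²-(Σx)²)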
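Next, to find b, set the two m equations equal to one another:

(Σxy-bΣx)/Σx² = (Σy-Nb)/Σx

Cross-multiplying gives ΣxΣxy - b(Σx)² = Σx²Σy - NbΣx². Collecting the b terms on one side:

b=(Σx²Σy-ΣxΣxy)/(NΣx²-(Σx)²)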
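Since these last two steps are pure algebra, it is worth checking them with exact arithmetic. This sketch uses Python’s `fractions.Fraction` so there is no rounding: it computes m = (NΣxy-ΣxΣy)/(NΣx²-(Σx)²) and b = (Σx²Σy-ΣxΣxy)/(NΣx²-(Σx)²), then confirms they satisfy both equations obtained by setting the partial derivatives to zero, Σxy - mΣx² - bΣx = 0 and Σy - mΣx - Nb = 0. The helper name `closed_form` and the sample points are hypothetical.

```python
from fractions import Fraction

def closed_form(points):
    """Return exact (m, b) and the sums used to compute them."""
    N = len(points)
    Sx = sum(Fraction(x) for x, _ in points)        # Σx
    Sy = sum(Fraction(y) for _, y in points)        # Σy
    Sxy = sum(Fraction(x) * y for x, y in points)   # Σxy
    Sxx = sum(Fraction(x) * x for x, _ in points)   # Σx²
    denom = N * Sxx - Sx ** 2                       # NΣx² - (Σx)²
    m = (N * Sxy - Sx * Sy) / denom
    b = (Sxx * Sy - Sx * Sxy) / denom
    return m, b, (N, Sx, Sy, Sxy, Sxx)

points = [(1, 2), (2, 3), (3, 5)]  # hypothetical sample points
m, b, (N, Sx, Sy, Sxy, Sxx) = closed_form(points)
print(m, b)                    # 3/2 1/3
print(Sxy - m * Sxx - b * Sx)  # 0  (∂E/∂m = 0 is satisfied)
print(Sy - m * Sx - N * b)     # 0  (∂E/∂b = 0 is satisfied)
```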
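And we are done! So to find the equation of the best-fit line through your point cloud of N points, calculate:

m=(NΣxy-ΣxΣy)/(NΣx²-(Σx)²)
b=(Σx²Σy-ΣxΣxy)/(NΣx²-(Σx)²)

The shared denominator NΣx²-(Σx)² is zero only when all the x values are equal, in which case the best line is vertical and cannot be written as y = mx + b.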
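Putting it all together, a minimal Python sketch of the final formulas might look like this. The name `fit_line` and the choice to raise `ValueError` when the denominator NΣx²-(Σx)² is zero (every x is the same, so the best line is vertical) are assumptions made for illustration.

```python
def fit_line(points):
    """Return (m, b) for the least-squares line y = m*x + b through points."""
    N = len(points)
    if N == 0:
        raise ValueError("need at least one point")
    sum_x = sum(x for x, _ in points)          # Σx
    sum_y = sum(y for _, y in points)          # Σy
    sum_xy = sum(x * y for x, y in points)     # Σxy
    sum_x2 = sum(x * x for x, _ in points)     # Σx², not (Σx)²
    denom = N * sum_x2 - sum_x ** 2            # zero only when every x is the same
    if denom == 0:
        raise ValueError("all x values are equal; the best line is vertical")
    m = (N * sum_xy - sum_x * sum_y) / denom
    b = (sum_x2 * sum_y - sum_x * sum_xy) / denom
    return m, b

m, b = fit_line([(1, 2), (2, 3), (3, 5)])  # hypothetical sample points
print(m, b)  # 1.5 0.3333333333333333
```

For real data you would typically use a library routine instead, such as NumPy’s `polyfit` with degree 1 or, in Python 3.10 and later, `statistics.linear_regression`; both fit the same least-squares line.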