Following John R. Taylor, "An Introduction to Error Analysis," 2nd edition, Chapter 8:
We have a linear relation as follows, and want to fit the constants $A$ and $B$ to the data $(x_1, y_1), \ldots, (x_N, y_N)$:
$$y = A + Bx$$
Assume the random error in each $y_i$ follows the same Gaussian distribution (same width $\sigma_y$ for all points). Not necessary, but it simplifies the derivation and results.
Principle of maximum likelihood
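For a given $A$ and $B$, the probability of obtaining each observed value $y_i$ is:
$$\mathrm{Prob}_{A,B}(y_i) \propto \frac{1}{\sigma_y} \exp\!\left(-\frac{(y_i - A - Bx_i)^2}{2\sigma_y^2}\right)$$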
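And the probability of getting all of the data points is the product of the individual probabilities:
$$\mathrm{Prob}_{A,B}(y_1, \ldots, y_N) = \mathrm{Prob}_{A,B}(y_1) \times \cdots \times \mathrm{Prob}_{A,B}(y_N)$$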
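Each term has the same $\sigma_y$, so the product can be simplified as:
$$\mathrm{Prob}_{A,B}(y_1, \ldots, y_N) \propto \frac{1}{\sigma_y^N} \, e^{-\chi^2/2}$$
$$\chi^2 = \sum_{i=1}^{N} \frac{(y_i - A - Bx_i)^2}{\sigma_y^2}$$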
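To maximize the probability, minimize the chi-squared sum: set the derivatives of $\chi^2$ with respect to $A$ and $B$ to zero, solve the resulting system of two equations, and obtain:
$$A = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}$$
$$B = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{\Delta}$$
$$\Delta = N \sum x_i^2 - \left(\sum x_i\right)^2$$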
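As a minimal sketch, the closed-form result for an unweighted straight-line fit $y = A + Bx$ can be evaluated directly from the sums. The function name `fit_line` and the sample data are illustrative assumptions, not part of the text:

```python
def fit_line(x, y):
    """Unweighted least-squares fit of y = A + B*x.

    Hypothetical helper: returns the intercept A and slope B computed
    from the closed-form sums for equal uncertainties in every y_i.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # The four sums that appear in the closed-form solution.
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    # Delta = N * sum(x^2) - (sum x)^2; zero only if all x are identical.
    delta = n * sum_xx - sum_x ** 2
    if delta == 0:
        raise ValueError("all x values are identical; slope is undefined")

    A = (sum_xx * sum_y - sum_x * sum_xy) / delta
    B = (n * sum_xy - sum_x * sum_y) / delta
    return A, B


# Made-up data lying exactly on y = 1 + 2x.
A, B = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(A, B)
```

Because the sample points lie exactly on a line, the fit should recover that line's intercept and slope; with real data the residuals will not vanish.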
Can also derive formulas for weighting each point individually
Also, formulas for calculating uncertainty in fit parameters
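The two extensions can be sketched as follows, using the forms in which these results are usually quoted: for equal uncertainties, $\sigma_y$ is estimated from the residuals with $N - 2$ in the denominator, with $\sigma_A = \sigma_y \sqrt{\sum x_i^2 / \Delta}$ and $\sigma_B = \sigma_y \sqrt{N / \Delta}$; for individual uncertainties $\sigma_i$, each point gets the weight $w_i = 1/\sigma_i^2$. The function names and sample data are hypothetical, and this is an illustration rather than a reference implementation:

```python
import math


def fit_line_with_uncertainties(x, y):
    """Unweighted fit of y = A + B*x, plus estimated uncertainties.

    Returns (A, B, sigma_y, sigma_A, sigma_B). sigma_y is estimated
    from the scatter of the points about the fitted line.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three points to estimate sigma_y")
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    delta = n * sxx - sx ** 2
    A = (sxx * sy - sx * sxy) / delta
    B = (n * sxy - sx * sy) / delta
    # N - 2 degrees of freedom: two parameters were fit to the data.
    ss_res = sum((yi - A - B * xi) ** 2 for xi, yi in zip(x, y))
    sigma_y = math.sqrt(ss_res / (n - 2))
    sigma_A = sigma_y * math.sqrt(sxx / delta)
    sigma_B = sigma_y * math.sqrt(n / delta)
    return A, B, sigma_y, sigma_A, sigma_B


def fit_line_weighted(x, y, sigma):
    """Weighted fit of y = A + B*x with per-point uncertainties sigma_i.

    Returns (A, B, sigma_A, sigma_B), using weights w_i = 1 / sigma_i^2.
    """
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = sw * swxx - swx ** 2
    A = (swxx * swy - swx * swxy) / delta
    B = (sw * swxy - swx * swy) / delta
    return A, B, math.sqrt(swxx / delta), math.sqrt(sw / delta)


# Made-up data scattered about a straight line.
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]
print(fit_line_with_uncertainties(xs, ys))
print(fit_line_weighted(xs, ys, [0.2, 0.2, 0.4, 0.4]))
```

When every $\sigma_i$ is the same, the weights cancel and the weighted formulas reduce to the unweighted ones, which is a convenient sanity check.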