Physics307L:Help/Fitting a line

Take home message from this class

There are statistically sound methods for obtaining the maximum likelihood slope and intercept to fit a set of data of the form [math]\displaystyle{ (x_i,y_i) }[/math]. This really is the take home message...I want you to remember enough to know that you can do it and to be able to quickly find the resources so you can remind yourself of the necessary assumptions about the data and the formulas (or algorithms) for calculating the best-fit values, along with their uncertainties. Two good resources:

  • Chapter 6 ("Least-squares fit to a straight line") of Bevington and Robinson second edition.
  • Chapter 8 ("Least-squares fitting") of Taylor second edition.

In order to leave the class with this confidence (knowing you can do it and where to find material to refresh your memory), you'll need to practice the techniques during your labs! There are plenty of labs (in fact a majority of them) where least-squares fitting to a line can and should be implemented.

Theoretical background

Assumptions

It is beyond the scope of this class to describe these methods with the fewest possible assumptions. For example, you can do least-squares fitting when uncertainties in both x and y are important, but here we'll assume uncertainty only in y. We're also only talking about a linear fit (y = A + B*x)...extension to quadratic and higher-order fits is not too difficult, but we're not doing that here.

  • Assume that the data should follow a linear relationship. You can assess this assumption by examining the residuals of the best fit line (see the residual sketch after this list).
  • Assume that the uncertainty in each [math]\displaystyle{ y_i }[/math] is normally distributed, with a standard deviation of [math]\displaystyle{ \sigma_i }[/math].
    • Sometimes, for clarity, we'll assume that there is one common σ for all data points...and many of the built-in algorithms make this assumption. (If your algorithm in MATLAB or Excel does not ask you for an array of uncertainties, then you know it's assuming a fixed uncertainty!)
    • If your [math]\displaystyle{ y_i }[/math] are each the mean of a bunch of independent measurements with a constant parent distribution, then the central limit theorem says this mean will be normally distributed.
    • If your [math]\displaystyle{ y_i }[/math] are single measurements, then a normal distribution may still be valid...provided central limit theorem "version 2" applies: that your error in [math]\displaystyle{ y_i }[/math] results from the accumulation of a bunch of independent sources of random error.
    • If your [math]\displaystyle{ y_i }[/math] measurements arise from processing another variable that has normally distributed error (for example, through a nonlinear transformation), then you may need to challenge this assumption.
  • Assume the principle of maximum likelihood is valid.
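
As a quick way to check the linearity assumption above, here is a minimal Python sketch that fits a line and plots the residuals. It assumes NumPy and matplotlib are installed, and the data values are made up purely for illustration:

  import numpy as np
  import matplotlib.pyplot as plt

  # Made-up illustrative data; substitute your own measurements.
  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

  # np.polyfit with degree 1 returns the coefficients [B, A] for y = A + B*x
  B, A = np.polyfit(x, y, 1)
  residuals = y - (A + B * x)

  # Residuals scattered randomly about zero support the linear model;
  # systematic curvature or trends suggest it is inadequate.
  plt.scatter(x, residuals)
  plt.axhline(0, color='gray')
  plt.xlabel('x')
  plt.ylabel('residual y - (A + B x)')
  plt.show()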

Derivation

See the Bevington or Taylor books for the full derivations, including the special case of a fixed σ for all [math]\displaystyle{ y_i }[/math].
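
In outline, both derivations apply the principle of maximum likelihood: for normally distributed uncertainties, the most probable A and B are the ones that minimize

[math]\displaystyle{ \chi^2 = \sum \frac{\left ( y_i - A - B x_i \right )^2}{\sigma_i^2} }[/math]

Setting [math]\displaystyle{ \partial \chi^2/\partial A = 0 }[/math] and [math]\displaystyle{ \partial \chi^2/\partial B = 0 }[/math] gives two linear equations in A and B, and solving that pair produces the formulas below.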

Formula for best fit (maximum likelihood) parameters

General case, individual σi

[math]\displaystyle{ A=\frac{\sum \frac{x_i^2}{\sigma_i^2} \sum \frac{y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2}}{\Delta} }[/math]
[math]\displaystyle{ B=\frac{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{y_i}{\sigma_i^2}}{\Delta} }[/math]
[math]\displaystyle{ \Delta=\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \left (\sum \frac{x_i}{\sigma_i^2} \right)^2 }[/math]
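
The corresponding uncertainties in the fitted parameters (also derived in chapter 6 of Bevington and Robinson) are

[math]\displaystyle{ \sigma_A^2=\frac{1}{\Delta}\sum \frac{x_i^2}{\sigma_i^2}, ~~~~ \sigma_B^2=\frac{1}{\Delta}\sum \frac{1}{\sigma_i^2} }[/math]

To make the sums concrete, here is a minimal Python sketch that computes A, B, and their uncertainties directly from these formulas. It assumes NumPy is available; the function name weighted_linear_fit is chosen here for illustration, not a library routine.

  import numpy as np

  def weighted_linear_fit(x, y, sigma):
      # Maximum-likelihood fit of y = A + B*x with individual uncertainties
      # sigma_i, using the sums in the formulas above (Bevington and Robinson,
      # chapter 6).
      x, y, sigma = np.asarray(x), np.asarray(y), np.asarray(sigma)
      w = 1.0 / sigma**2                     # weights 1/sigma_i^2
      S = w.sum()                            # sum 1/sigma_i^2
      Sx = (w * x).sum()                     # sum x_i/sigma_i^2
      Sy = (w * y).sum()                     # sum y_i/sigma_i^2
      Sxx = (w * x**2).sum()                 # sum x_i^2/sigma_i^2
      Sxy = (w * x * y).sum()                # sum x_i*y_i/sigma_i^2
      delta = S * Sxx - Sx**2                # the Delta defined above
      A = (Sxx * Sy - Sx * Sxy) / delta      # intercept
      B = (S * Sxy - Sx * Sy) / delta        # slope
      sigma_A = np.sqrt(Sxx / delta)         # sigma_A^2 = (sum x_i^2/sigma_i^2)/Delta
      sigma_B = np.sqrt(S / delta)           # sigma_B^2 = (sum 1/sigma_i^2)/Delta
      return A, B, sigma_A, sigma_B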

Special case, constant σ (note: Δ has different units)

[math]\displaystyle{ A=\frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta_{fixed}} }[/math], [math]\displaystyle{ \sigma_A^2 = \frac{\sigma^2}{\Delta_{fixed}} \sum x_i^2 }[/math]
[math]\displaystyle{ B=\frac{N\sum x_i y_i - \sum x_i \sum y_i}{\Delta_{fixed}} }[/math], [math]\displaystyle{ \sigma_B^2 = \frac{N \sigma^2}{\Delta_{fixed}} }[/math]
[math]\displaystyle{ \Delta_{fixed}=N \sum x_i^2 - \left ( \sum x_i \right )^2 }[/math] (This is actually N² times the variance of x...not sure if that helps in any kind of understanding, though.)
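
A matching Python sketch for this constant-σ case, under the same assumptions as above (NumPy available; simple_linear_fit is an illustrative name, not a library routine):

  import numpy as np

  def simple_linear_fit(x, y, sigma_y):
      # Least-squares fit of y = A + B*x with one common sigma for all
      # points (Taylor, chapter 8).
      x, y = np.asarray(x), np.asarray(y)
      N = len(x)
      Sx, Sy = x.sum(), y.sum()
      Sxx, Sxy = (x**2).sum(), (x * y).sum()
      delta = N * Sxx - Sx**2                    # Delta_fixed above
      A = (Sxx * Sy - Sx * Sxy) / delta          # intercept
      B = (N * Sxy - Sx * Sy) / delta            # slope
      sigma_A = sigma_y * np.sqrt(Sxx / delta)   # sigma_A^2 = sigma^2 sum x_i^2 / Delta_fixed
      sigma_B = sigma_y * np.sqrt(N / delta)     # sigma_B^2 = N sigma^2 / Delta_fixed
      return A, B, sigma_A, sigma_B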