Physics307L:Help/Fitting a line: Difference between revisions

Latest revision as of 11:48, 19 October 2009

Physics 307L, Fall 2010


Take home message from this class

There are statistically sound methods for obtaining the maximum likelihood slope and intercept to fit a set of data of the form [math]\displaystyle{ (x_i,y_i) }[/math]. This really is the take-home message: I want you to remember enough to know that you can do it, and to be able to quickly find the resources that remind you of the necessary assumptions about the data and the formulas (or algorithms) for calculating the best-fit values, along with their uncertainties. Two good resources:

Chapter 6 ("Least-squares fit to a straight line") of Bevington and Robinson second edition.
Chapter 8 ("Least-squares fitting") of Taylor second edition.

In order to leave the class with this confidence (knowing you can do it and where to find material to refresh your memory), you'll need to practice the techniques during your labs! There are plenty of labs (in fact a majority of them) where least-squares fitting to a line can and should be implemented.

Theoretical background

Assumptions

It is beyond the scope of this class to describe the methods with the fewest possible assumptions. For example, you can do least-squares fitting when uncertainties in both x and y are important, but here we'll assume uncertainties only in y. We're also only talking about a linear fit (y = A + B*x); extension to quadratic and higher order is not too difficult, but we're not doing that here.

Assume that the data should follow a linear relationship. You can assess this assumption by examining the residuals of the best fit line.
Assume that the uncertainty in each [math]\displaystyle{ y_i }[/math] is normally distributed, with a standard deviation of [math]\displaystyle{ \sigma _i }[/math].
- Sometimes, for clarity, we'll assume that there is one common σ for all data points, and many of the built-in algorithms make this assumption. (If your algorithm in MATLAB or Excel does not ask you for an array of uncertainties, then you know it's assuming a fixed uncertainty!)
- If your y_i are each the mean of a bunch of independent measurements with a constant parent distribution, then the central limit theorem says this mean will be normally distributed.
- If your y_i are single measurements, then a normal distribution may still be valid...provided central limit theorem "version 2" applies: that your error in y_i results from the accumulation of a bunch of independent sources of random error.
- If your y_i measurements arise from processing of another variable with normally distributed error, then you may need to challenge this assumption.
Assume the principle of maximum likelihood is valid.
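The central limit theorem claim in the list above can be checked numerically. The following is a minimal sketch, not part of the original course material: it assumes a uniform parent distribution on [0, 1), and the function name `simulate_means`, the sample sizes, and the seed are all arbitrary illustrative choices. If the means behave like a normal distribution, their spread should be close to the parent σ divided by √n, and roughly 68% of them should fall within one such spread of the true mean.

```python
import random
import statistics


def simulate_means(n_per_mean, n_trials, seed=0):
    """Return n_trials sample means, each averaging n_per_mean uniform draws on [0, 1)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.fmean(rng.random() for _ in range(n_per_mean))
        for _ in range(n_trials)
    ]


n_per_mean = 25
means = simulate_means(n_per_mean=n_per_mean, n_trials=2000)

# A uniform distribution on [0, 1) has mean 0.5 and standard deviation sqrt(1/12).
parent_sigma = (1 / 12) ** 0.5
expected_spread = parent_sigma / n_per_mean ** 0.5

# Fraction of means within one expected standard deviation of the true mean.
fraction_within = sum(abs(m - 0.5) <= expected_spread for m in means) / len(means)

print("mean of means:      ", statistics.fmean(means))
print("spread of means:    ", statistics.stdev(means))
print("expected sigma/sqrt(n):", expected_spread)
print("fraction within 1 sigma:", fraction_within)
```

The parent distribution here is far from normal, yet the means of 25 draws should already look close to normal. This only illustrates the first sub-case in the list (each y_i is a mean of repeated measurements).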

Derivation

See the Bevington or Taylor books for derivations. For the special case of fixed σ for all [math]\displaystyle{ y_i }[/math], you can see the derivation here.
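For orientation, here is a brief sketch of where the formulas come from. It is the standard textbook argument in outline, not a substitute for the derivations cited above. With normally distributed errors in y, maximizing the likelihood is equivalent to minimizing χ², and setting the two partial derivatives to zero gives a pair of linear equations in A and B:

```latex
\begin{aligned}
\chi^2(A,B) &= \sum_i \frac{(y_i - A - B x_i)^2}{\sigma_i^2} \\
\frac{\partial \chi^2}{\partial A} = 0 \;&\Rightarrow\; \sum \frac{y_i}{\sigma_i^2} = A \sum \frac{1}{\sigma_i^2} + B \sum \frac{x_i}{\sigma_i^2} \\
\frac{\partial \chi^2}{\partial B} = 0 \;&\Rightarrow\; \sum \frac{x_i y_i}{\sigma_i^2} = A \sum \frac{x_i}{\sigma_i^2} + B \sum \frac{x_i^2}{\sigma_i^2}
\end{aligned}
```

Solving these two equations for A and B (for example by Cramer's rule) yields the general-case formulas in the next section, with Δ appearing as the determinant of the 2×2 system.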

Formula for best fit (maximum likelihood) parameters

[math]\displaystyle{ y = A + B*x }[/math]

General case, individual σ_i

[math]\displaystyle{ A=\frac{\sum \frac{x_i^2}{\sigma_i^2} \sum \frac{y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2}}{\Delta} }[/math] [math]\displaystyle{ \mbox{,}~~~~~~~~~~~~~~~~~ \sigma_a^2 = \frac{1}{\Delta} \sum \frac {x_i^2}{\sigma_i^2} }[/math]

[math]\displaystyle{ B=\frac{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{y_i}{\sigma_i^2}}{\Delta} }[/math] [math]\displaystyle{ \mbox{,}~~~~~~~~~~~~~~~~~ \sigma_b^2 = \frac{1}{\Delta} \sum \frac {1}{\sigma_i^2} }[/math]

[math]\displaystyle{ \Delta=\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \left (\sum \frac{x_i}{\sigma_i^2} \right)^2 }[/math]
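As a hedged illustration, the general-case formulas above translate almost line for line into code. This is a minimal sketch using only the Python standard library. The function name `weighted_line_fit` and the data values are made up for illustration and do not come from the course or from Bevington.

```python
import math


def weighted_line_fit(x, y, sigma):
    """Fit y = A + B*x given per-point uncertainties sigma_i in y.

    Returns (A, B, sigma_A, sigma_B) using the general-case formulas.
    """
    w = [1.0 / s ** 2 for s in sigma]                      # weights 1/sigma_i^2
    s_w = sum(w)                                           # sum 1/sigma_i^2
    s_x = sum(wi * xi for wi, xi in zip(w, x))             # sum x_i/sigma_i^2
    s_y = sum(wi * yi for wi, yi in zip(w, y))             # sum y_i/sigma_i^2
    s_xx = sum(wi * xi * xi for wi, xi in zip(w, x))       # sum x_i^2/sigma_i^2
    s_xy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))  # sum x_i y_i/sigma_i^2

    delta = s_w * s_xx - s_x ** 2
    a = (s_xx * s_y - s_x * s_xy) / delta
    b = (s_w * s_xy - s_x * s_y) / delta
    sigma_a = math.sqrt(s_xx / delta)
    sigma_b = math.sqrt(s_w / delta)
    return a, b, sigma_a, sigma_b


# Made-up example: points lying exactly on y = 1 + 2x, with unequal uncertainties.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
sigma = [0.5, 1.0, 0.5, 1.0]

a, b, sigma_a, sigma_b = weighted_line_fit(x, y, sigma)
print(f"A = {a:.4f} +/- {sigma_a:.4f}")
print(f"B = {b:.4f} +/- {sigma_b:.4f}")
```

Because the example points lie exactly on a line, the fit should recover A = 1 and B = 2 regardless of the weights. The uncertainties depend only on the x values and the σ_i, not on the y values.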

Special case, constant σ_y (note: Δ has different units)

[math]\displaystyle{ A=\frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta_{fixed}} }[/math] [math]\displaystyle{ \mbox{,}~~~~~~~~~~~~~~~~~ \sigma_a^2 = \frac{\sigma_y^2}{\Delta_{fixed}} \sum x_i^2 }[/math]

[math]\displaystyle{ B=\frac{N\sum x_i y_i - \sum x_i \sum y_i}{\Delta_{fixed}} }[/math] [math]\displaystyle{ \mbox{,}~~~~~~~~~~~~~~~~~ \sigma_b^2 = N \frac{\sigma_y^2}{\Delta_{fixed}} }[/math] (Since Δ_fixed is N² times σ_x², this is the same as 1/N * σ_y² / σ_x², where σ_x² is the population variance of the experimental x values, not the variance of an individual x measurement.)

[math]\displaystyle{ \Delta_{fixed}=N \sum x_i^2 - \left ( \sum x_i \right )^2 }[/math] (This is actually N² times the population variance of x...not sure if that helps in any kind of understanding, though.)
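The constant-σ_y formulas can be sketched the same way. The example below is a rough illustration with made-up data and an assumed σ_y of 0.2, and `line_fit_fixed_sigma` is a hypothetical name. It also checks numerically that Δ_fixed equals N² times the population variance of x, and that σ_b² equals σ_y² / (N σ_x²).

```python
import math
import statistics


def line_fit_fixed_sigma(x, y, sigma_y):
    """Fit y = A + B*x assuming one common uncertainty sigma_y for every y_i.

    Returns (A, B, sigma_A, sigma_B, delta_fixed).
    """
    n = len(x)
    s_x = sum(x)
    s_y = sum(y)
    s_xx = sum(xi * xi for xi in x)
    s_xy = sum(xi * yi for xi, yi in zip(x, y))

    delta_fixed = n * s_xx - s_x ** 2
    a = (s_xx * s_y - s_x * s_xy) / delta_fixed
    b = (n * s_xy - s_x * s_y) / delta_fixed
    sigma_a = sigma_y * math.sqrt(s_xx / delta_fixed)
    sigma_b = sigma_y * math.sqrt(n / delta_fixed)
    return a, b, sigma_a, sigma_b, delta_fixed


# Made-up data scattered around a straight line; sigma_y = 0.2 is an assumption.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
sigma_y = 0.2

a, b, sigma_a, sigma_b, delta_fixed = line_fit_fixed_sigma(x, y, sigma_y)
n = len(x)
var_x = statistics.pvariance(x)  # population variance of the x values

print(f"A = {a:.4f} +/- {sigma_a:.4f}")
print(f"B = {b:.4f} +/- {sigma_b:.4f}")
print("Delta_fixed:", delta_fixed, " N^2 * var(x):", n ** 2 * var_x)
print("sigma_b^2:", sigma_b ** 2, " sigma_y^2 / (N var(x)):", sigma_y ** 2 / (n * var_x))
```

The last two printed lines should each show a matching pair of numbers, which is one way to convince yourself of the parenthetical remarks attached to the formulas.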

[math]\displaystyle{ \sigma_y }[/math] can be inferred from the chi-squared value and the number of degrees of freedom. If you have an independent estimate of [math]\displaystyle{ \sigma_y }[/math] (e.g. the SEM of several independent measurements), it should be consistent with the inferred uncertainty. LINEST (Excel's linear fitting function) uses the implied value when calculating the uncertainties of the fit parameters.

[math]\displaystyle{ \sigma_{y~implied}^2 = \frac {1}{N-2} \sum (y_i - A - Bx_i)^2 }[/math] [math]\displaystyle{ \mbox{,}~~~~~~~\mbox{recall,}~~ \chi^2 = \frac {1}{\sigma_y^2} \sum (y_i - A - Bx_i)^2 }[/math]
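A short sketch of the implied-σ_y calculation follows. The function names `implied_sigma_y` and `chi_squared` are illustrative, the data are made up, and the values A = 1.09 and B = 1.94 are the constant-σ best-fit parameters for exactly these four points. The residual list it builds is also what you would inspect to judge whether a straight line is a reasonable model.

```python
import math


def implied_sigma_y(x, y, a, b):
    """Estimate the common sigma_y from the scatter of the data about the line A + B*x."""
    residuals = [yi - a - b * xi for xi, yi in zip(x, y)]
    dof = len(x) - 2  # two parameters (A and B) were fitted
    return math.sqrt(sum(r * r for r in residuals) / dof)


def chi_squared(x, y, a, b, sigma_y):
    """Chi-squared of the line A + B*x for a common uncertainty sigma_y."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / sigma_y ** 2


# Made-up data; a and b are the constant-sigma best-fit values for these points.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
a, b = 1.09, 1.94

sigma_implied = implied_sigma_y(x, y, a, b)
print("implied sigma_y:", sigma_implied)
print("chi^2 with an independent sigma_y = 0.2:", chi_squared(x, y, a, b, 0.2))
print("chi^2 with the implied sigma_y:", chi_squared(x, y, a, b, sigma_implied))
```

By construction, χ² evaluated with the implied σ_y equals the number of degrees of freedom, N − 2. That is why the implied value cannot by itself tell you whether the fit is good; for that you need the comparison against an independent estimate of σ_y.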

Example Excel Sheet

I have used the values from Table 6.1 in Chapter 6 of Bevington, 2nd edition. This is the Excel sheet I showed in class on November 3, 2008. I've tried to make it a little easier to read, but the sheet will not make any sense to you unless you look at the formulas on this page and play around with the numbers a bit.

File:Fitting a Line Chapter 6 of Bevington.xls

Notes

Steve Koch 16:37, 9 November 2008 (EST): Thanks to Justin for finding a typo in the implied sigma!

@@ Line 1: / Line 1: @@
+{{Template:Physics307L}}
+<div style="padding: 10px; width: 720px; border: 5px solid #008;">
 ==Take home message from this class==
 '''There are statistically sound methods for obtaining the maximum likelihood slope and intercept to fit a set of data of the form <math>(x_i,y_i)</math>.'''  This really is the take home message...I want you to remember enough to know that you can do it and to be able to quickly find the resources so you can remind yourself of the necessary assumptions about the data and the formulas (or algorithms) for calculating the best fit values, ''along with uncertainty''.  Two good resources:
@@ Line 23: / Line 26: @@
 ==Formula for best fit (maximum likelihood) parameters==
+:<math> y = A + B*x</math>
 ===General case, individual σ<sub>i</sub>===
-:<math>A=\frac{\sum \frac{x_i^2}{\sigma_i^2} \sum \frac{y_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2}}{\Delta}</math>
+:<math>A=\frac{\sum \frac{x_i^2}{\sigma_i^2} \sum \frac{y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2}}{\Delta}</math> <math>\mbox{,}~~~~~~~~~~~~~~~~~ \sigma_a^2 = \frac{1}{\Delta} \sum \frac {x_i^2}{\sigma_i^2} </math>
-:<math>B=\frac{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{y_i}{\sigma_i^2}}{\Delta}</math>
+:<math>B=\frac{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{y_i}{\sigma_i^2}}{\Delta}</math> <math>\mbox{,}~~~~~~~~~~~~~~~~~ \sigma_b^2 = \frac{1}{\Delta} \sum \frac {1}{\sigma_i^2} </math>
 :<math>\Delta=\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \left (\sum \frac{x_i}{\sigma_i^2} \right)^2 </math>
-===Special case, constant σ (note: Δ has different units)===
+===Special case, constant σ<sub>y</sub> (note: Δ has different units)===
+:<math>A=\frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta_{fixed}}</math> <math> \mbox{,}~~~~~~~~~~~~~~~~~ \sigma_a^2 = \frac{\sigma_y^2}{\Delta_{fixed}} \sum x_i^2 </math>
+:<math>B=\frac{N\sum x_i y_i - \sum x_i \sum y_i}{\Delta_{fixed}}</math> <math> \mbox{,}~~~~~~~~~~~~~~~~~ \sigma_b^2 = N \frac{\sigma_y^2}{\Delta_{fixed}} </math> (I believe this is the same as 1/N * σ<sub>y</sub><sup>2</sup> / σ<sub>x</sub><sup>2</sup>, where σ<sub>x</sub><sup>2</sup> is the population variance of the experimental x values (not the variance of an individual x measurement))
+:<math>\Delta_{fixed}=N \sum x_i^2 - \left ( \sum x_i \right )^2</math>  (This is actually N<sup>2</sup> times the population variance of x...not sure if that helps in any kind of understanding, though.)
-:<math>A=\frac{\sum x_i^2 \sum y_i^2 - \sum x_i \sum x_i y_i}{\Delta_{fixed}}</math>
+<math>\sigma_y</math> can be inferred from the chi-squared value and the number of degrees of freedom.  If you have an independent estimate of <math>\sigma_y</math> (e.g. SEM of several independent measurements), this should be consistent with the inferred uncertainty.  LINEST (Excel linear fitting) uses the implied value for calculating the uncertainty of the fit parameters.
+:<math>\sigma_{y~implied}^2 = \frac {1}{N-2} \sum (y_i - A - Bx_i)^2</math> <math>\mbox{,}~~~~~~~\mbox{recall,}~~ \chi^2 = \frac {1}{\sigma_y^2} \sum (y_i - A - Bx_i)^2 </math>
-:<math>B=\frac{N\sum x_i y_i - \sum x_i \sum y_i}{\Delta_{fixed}}</math>
+==Example Excel Sheet==
+I have used the values from Table 6.1 in Chapter 6 of Bevington, 2nd edition.  This is the excel sheet I showed in class on November 3, 2008.  I've tried to make it a little easier to read -- but the sheet will not make any sense to you if you don't look at the formulas on this page and play around with the numbers a bit.
+* [[Image:Fitting a Line Chapter 6 of Bevington.xls]]
-:<math>\Delta_{fixed}=N \sum x_i^2 - \left ( \sum x_i \right )^2</math>  (This is actually N<sup>2</sup> times the variance of x...not sure if that helps in any kind of understanding, though.)
+==Notes==
+* [[User:Steven J. Koch|Steve Koch]] 16:37, 9 November 2008 (EST): Thanks to Justin for finding a typo in the implied sigma!
+</div>
