
11.1 Linear Regression

The “least squares” or “linear regression” algorithm produces a best-fitting straight line through the middle of a set of N data points (x1,y1), ..., (xN,yN). In Chart this means a set of prices Y and dates X (with non-trading days collapsed out).

For a candidate fitted line L(X) = a + b*X, the vertical distance from the line to each point is squared, and the squares are summed to form a total deviation.

SumSquares = (y1 - L(x1))^2 + ... + (yN - L(xN))^2
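As a small illustration of this total, the following Python sketch evaluates SumSquares for two candidate lines over the same points. The function name sum_squares and the sample data are made up for the example and are not part of Chart.

```python
def sum_squares(xs, ys, a, b):
    """Total squared vertical distance from the line L(x) = a + b*x
    to the points (xs[i], ys[i])."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data: five days (already shifted so Mean(X)=0)
# and five prices.
xs = [-2, -1, 0, 1, 2]
ys = [10.0, 11.0, 13.0, 12.0, 14.0]

sloped = sum_squares(xs, ys, 12.0, 0.9)  # a line rising 0.9 per day
flat = sum_squares(xs, ys, 12.0, 0.0)    # a horizontal line at the mean price

print(sloped, flat)
```

For this data the sloped candidate gives a much smaller total than the flat one, so it counts as the better fit of the two.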

The line parameters a and b are then chosen to make SumSquares as small as possible (hence the name “least squares”), and there’s just one line with that smallest SumSquares. The calculation is made easier if the X coordinates are shifted so that Mean(X)=0. With that the formulas for a and b are

             y1 + ... + yN
a = Mean Y = -------------
                   N

    x1*y1 + ... + xN*yN
b = -------------------
     x1^2 + ... + xN^2
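The formulas for a and b can be sketched in Python as follows. This is only an illustration, not Chart's own code: the name fit_line and the sample data are assumptions for the example. The X values are shifted inside the function so that Mean(X)=0, which means the returned a is the height of the line at the mean X rather than at X=0.

```python
def fit_line(xs, ys):
    """Least squares fit using X values shifted so their mean is zero.

    Returns (a, b, mean_x); the fitted line in the original
    coordinates is L(x) = a + b * (x - mean_x).
    Needs at least two distinct X values, otherwise the
    denominator of b is zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    shifted = [x - mean_x for x in xs]

    a = sum(ys) / n  # Mean Y
    b = (sum(x * y for x, y in zip(shifted, ys))
         / sum(x * x for x in shifted))
    return a, b, mean_x


# Hypothetical sample data: day numbers and prices.
days = [0, 1, 2, 3, 4]
prices = [10.0, 11.0, 13.0, 12.0, 14.0]

a, b, mean_x = fit_line(days, prices)
print(a, b, mean_x)
```

For this data a comes out as the mean price 12, and b as about 0.9 price units per day.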

A least squares fit is “best” under certain mathematical assumptions: basically that the data points lie on a straight line to which normally distributed random amounts (positive or negative) have been added. Of course an underlying straight line is unlikely in market price data, or in economics generally, and in particular any cyclical component invalidates the assumptions. Even so, the algorithm is quite widely used because it offers an objective basis for fitting a line.

11.1.1 Slope

The slope of the linear regression line, the b above, is sometimes called the regression coefficient. This is available as an indicator (Linear Regression Slope), to show how steep the fitted trend line is. The units are price change per day, which is negative for a downward sloping line. This may or may not be particularly useful so it’s under “Low Priority” in the indicator lists.

11.1.2 Standard Error

Standard error (stderr) is a statistical measure of how much values differ from an assumed underlying curve. It’s calculated as the quadratic mean of the vertical distances from each point to the curve.

Standard error from a linear regression line y=a+bx is

               / (y1 - (a+b*x1))^2 + ... + (yN - (a+b*xN))^2 \
Stderr = sqrt |  -------------------------------------------  |
               \                     N                       /

Notice the numerator is the same SumSquares which was minimized above. Standard error is similar to standard deviation (see Standard Deviation); but where stddev takes differences from a horizontal line (the Y mean), stderr here goes from the sloping linear regression line.
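A rough Python sketch of the comparison between stderr and stddev is below. The function names and sample data are assumptions for the example. Both functions divide by N, following the formula above, rather than by N-2 or N-1 as some statistics texts do.

```python
import math


def linreg_stderr(xs, ys):
    """Quadratic mean of the vertical distances from the points
    to their least squares line y = a + b*x (dividing by N)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    total = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / n)


def stddev(ys):
    """Quadratic mean of the distances from the horizontal line
    at Mean Y (dividing by N)."""
    n = len(ys)
    mean_y = sum(ys) / n
    return math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)


# Hypothetical sample data: day numbers and prices.
days = [0, 1, 2, 3, 4]
prices = [10.0, 11.0, 13.0, 12.0, 14.0]

print(linreg_stderr(days, prices), stddev(prices))
```

Since a horizontal line is one of the lines the least squares fit could have chosen, stderr from the fitted line is never larger than stddev on the same data.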

For reference, there’s no need to actually calculate the linear regression a and b; the stderr can be formed directly as

               /               Covariance(X,Y)^2 \
Stderr = sqrt |  Variance(Y) - -----------------  |
               \                  Variance(X)    /

where variance and covariance are as follows (and notice they simplify if X values are chosen to make Mean(X) zero),

Covariance X,Y = Mean (X*Y) - (Mean X) * (Mean Y)
Variance X = Mean(X^2) - (Mean X)^2
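The direct form can be sketched in Python as below, using the variance and covariance definitions just given. The names mean and stderr_direct and the sample data are assumptions for the example. The max() guard is an addition of this sketch, there only because floating point rounding can leave a tiny negative value under the square root when the points are almost exactly on a line.

```python
import math


def mean(values):
    return sum(values) / len(values)


def stderr_direct(xs, ys):
    """Stderr from the least squares line, formed from variance
    and covariance without calculating a and b."""
    var_x = mean([x * x for x in xs]) - mean(xs) ** 2
    var_y = mean([y * y for y in ys]) - mean(ys) ** 2
    cov_xy = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)
    # Guard against a slightly negative result from rounding.
    return math.sqrt(max(var_y - cov_xy ** 2 / var_x, 0.0))


# Hypothetical sample data: day numbers and prices.
days = [0, 1, 2, 3, 4]
prices = [10.0, 11.0, 13.0, 12.0, 14.0]

print(stderr_direct(days, prices))
```

For this data Variance(X) and Variance(Y) are both 2 and the covariance is 1.8, so the result is sqrt(2 - 1.8^2/2) = sqrt(0.38), about 0.616.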

Standard error from a linear regression like this is used as a channel width in Kirshenbaum Bands (see Kirshenbaum Bands). It can also be viewed directly as an indicator, but this is probably of limited use and for that reason is under “Low Priority” in the indicator lists.

11.1.3 Additional Resources


Copyright 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2014, 2015, 2016, 2017 Kevin Ryde

Chart is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.