LabWindows/CVI


How do I create a line of best fit and force the intercept to be zero?

I have some data for which I want to find a linear line of best fit, but I want to force it through zero, i.e. have the intercept be zero and fit only a slope. This is like choosing a trend line in MS Excel and checking the box that says 'set intercept to = zero'.
 
The LinFit() function appears to just do the ordinary best fit, so the resulting equation will have a nonzero intercept whenever that minimises the error. Any ideas?
Message 1 of 6
(27,393 Views)

I hope I understand your question correctly.

If you want a line that is parallel to the best-fit line found by LinFit() but has a y-intercept of zero, simply set the returned intercept to zero before using the values from LinFit() to plot your data (I am assuming you are plotting something).

If you instead want to pivot the line so that it passes through zero, recalculate the slope from the point (0,0) and a point that is far away from the y-axis. To do this, use the slope and intercept returned by LinFit() to find that far point: Y2 = m*X2 + b, where X2 is the x value you picked away from the y-axis, m is the slope from LinFit(), and b is the intercept from LinFit(). Now recalculate the slope from (0,0) and (X2,Y2), which is simply Y2/X2. You can call LinFit() again on those two points if you want to find the new slope, or just use the standard two-point slope formula.
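The two-point recalculation described above can be sketched in a few lines of plain C. This is only an illustration: the function name pivot_slope and its signature are hypothetical, and m and b stand for whatever slope and intercept LinFit() returned.

```c
/* Slope of the line through (0,0) and the point (x2, m*x2 + b) that lies
   on the original fitted line. pivot_slope is a hypothetical helper name.
   Returns 0 on success, -1 if x2 is zero (the slope is then undefined). */
int pivot_slope(double m, double b, double x2, double *slope)
{
    if (x2 == 0.0)
        return -1;

    double y2 = m * x2 + b;   /* far point on the original fit */
    *slope = y2 / x2;         /* algebraically equal to m + b / x2 */
    return 0;
}
```

Because the result equals m + b/X2, it moves toward the original slope m as X2 is chosen farther out, so this gives a line through the origin but not a least-squares fit of the data.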

Message 2 of 6
(27,375 Views)
Murray,

If you want to calculate the slope of the best-fit line through (0,0) using the least-squares method, there may be a simpler way. See the attached code.

Regards,
Colin.

Message Edited by cdk52 on 08-11-2006 03:15 AM

Message 3 of 6
(27,369 Views)

Thanks guys. I think we are on the right track, but it seems I have not explained my issue correctly.

Colin, your idea of calculating the slope (from the included code) works, but it does not force the line through zero, i.e. it returns the same slope as the LinFit() routine.

I think the idea of using a point far outside the data and then an additional point at (0,0) is also quite smart and will force the line through zero, but the resulting line is not a 'best fit' of the data. The resulting line could be quite different.

For example:

X, Y
0, 0
1, 2
2, 3
3, 4
4, 5

The ordinary linear line of best fit is Y = 1.2 X + 0.4; if I force the fit through (0,0), the line of best fit is Y = 1.333 X. Using the idea of a point 'way out' on the first line of best fit (say X = 20), the equation above gives Y = 1.2 * 20 + 0.4 = 24.4, so the line from (0,0) through that point has a slope of only 1.22. Whereas the real result I am hoping to achieve would be Y = 1.333 * 20 = 26.67. Quite a different result.

I assume I could just iterate over values of m in Y = mX and look for the minimum least-squares error, but I thought that would be rather agricultural.

Message 4 of 6
(27,346 Views)
Murray,

I'm sorry, but I believe you are wrong when you state "...the slope will return the same slope as the LinFit() routine." This will only be true when the Y-intercept calculated by LinFit() is zero (or in the special case where the X values sum to zero).

I developed my algorithm with the explicit assumption that the line of best fit passes through the origin (0,0), i.e. the equation is of the form Y = aX. I differentiated the expression for the sum of the squares of the point errors with respect to a, set the resulting expression to zero, and solved for a.
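For anyone who wants to check the algebra, the steps described above can be written out as follows. This is a sketch; the symbols S, x_i, y_i, and n are notation assumed here rather than taken from the attached code.

```latex
S(a) = \sum_{i=1}^{n} \left( y_i - a x_i \right)^2

\frac{dS}{da} = -2 \sum_{i=1}^{n} x_i \left( y_i - a x_i \right) = 0

a = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}
```

The second derivative is 2 times the sum of x_i^2, which is positive as long as at least one x_i is nonzero, so this value of a is a minimum.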

Note: If you look at the code, it is clear adding a point at the origin contributes nothing to the sumXY and sumXSquared variables, and therefore does not affect the value of slopeEstimate.

I suggest you test both methods using real data, and see what you think.
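To try this on data such as the table in Message 4, both fits can be computed in closed form. The sketch below is a reconstruction and not the attached code: only the names sumXY, sumXSquared, and slopeEstimate come from the description above. The function names and signatures are assumptions, and it is plain C that does not call the CVI analysis library.

```c
#include <stddef.h>

/* Ordinary least-squares line y = slope * x + intercept.
   Returns 0 on success, -1 if the fit is undefined
   (fewer than two points, or all x values identical). */
int fit_line(const double *x, const double *y, size_t n,
             double *slope, double *intercept)
{
    double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumXSquared = 0.0;

    for (size_t i = 0; i < n; i++) {
        sumX += x[i];
        sumY += y[i];
        sumXY += x[i] * y[i];
        sumXSquared += x[i] * x[i];
    }

    double denom = (double)n * sumXSquared - sumX * sumX;
    if (n < 2 || denom == 0.0)
        return -1;

    *slope = ((double)n * sumXY - sumX * sumY) / denom;
    *intercept = (sumY - *slope * sumX) / (double)n;
    return 0;
}

/* Least-squares line y = slopeEstimate * x, forced through (0,0).
   Returns 0 on success, -1 if every x is zero. */
int fit_line_through_origin(const double *x, const double *y, size_t n,
                            double *slopeEstimate)
{
    double sumXY = 0.0, sumXSquared = 0.0;

    for (size_t i = 0; i < n; i++) {
        sumXY += x[i] * y[i];        /* a point at (0,0) adds nothing here */
        sumXSquared += x[i] * x[i];  /* ...and nothing here either */
    }

    if (sumXSquared == 0.0)
        return -1;

    *slopeEstimate = sumXY / sumXSquared;
    return 0;
}
```

Calling both functions on the same X and Y arrays and comparing the two slopes shows whether they differ for a given data set.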

Regards,
Colin.

Message Edited by cdk52 on 08-11-2006 03:53 PM

Message 5 of 6
(27,343 Views)
I was just looking for this same thing, at least if I understand the question. To restate: given a set of points, what is the best-fit slope m for the equation y = mx? That is, this is a one-parameter fit; there is no 0th-order term b.

I was refactoring someone else's code and came across a place where they claimed to be doing this. When I first looked at the code, I thought they were simply averaging the individual slopes between consecutive points, then setting b=0. This would be quite incorrect. However, what they were actually doing is averaging the slopes of the individual lines that originate at 0 and pass through each point. This is a very simple calculation: take the two arrays of Y and X, divide them element by element, and average the resulting array. Unfortunately, although this is generally close (for reasonable data), it is not quite correct.
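The difference between the two calculations is easy to see side by side. The following is a minimal sketch in plain C; the function names are hypothetical, and skipping points with x equal to zero in the ratio average is an assumption made here to avoid dividing by zero.

```c
#include <stddef.h>

/* Average of the per-point slopes y[i] / x[i] (lines from the origin to
   each point). Points with x == 0 are skipped. Returns 0 on success,
   -1 if no point could be used. */
int mean_ratio_slope(const double *x, const double *y, size_t n,
                     double *slope)
{
    double sumRatio = 0.0;
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        if (x[i] != 0.0) {
            sumRatio += y[i] / x[i];
            used++;
        }
    }

    if (used == 0)
        return -1;

    *slope = sumRatio / (double)used;
    return 0;
}

/* Least-squares slope for y = m * x: sum(x*y) / sum(x*x).
   Returns 0 on success, -1 if every x is zero. */
int least_squares_origin_slope(const double *x, const double *y, size_t n,
                               double *slope)
{
    double sumXY = 0.0, sumXSquared = 0.0;

    for (size_t i = 0; i < n; i++) {
        sumXY += x[i] * y[i];
        sumXSquared += x[i] * x[i];
    }

    if (sumXSquared == 0.0)
        return -1;

    *slope = sumXY / sumXSquared;
    return 0;
}
```

The least-squares slope can be read as an average of the same y/x ratios weighted by x^2, so points far from the origin count for more, whereas the plain ratio average weights every point equally. The two agree only when all the ratios are the same.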

What is really desired is to find the m that minimizes the sum of the squares of the errors for y = mx. This can be done with the Generalized LS Fit routine, but the math is so simple that I knew it must be possible to boil it down to a simple equation. So I ran some sample data through the Generalized LS Fit routine and then coded up Colin's suggestion above. These two approaches agree perfectly for the data I've tested. So although I haven't worked through the equations to prove it, I suspect that Colin's algorithm is the desired simplification of the more general approach. In text, his algorithm is: find the average of x*y for all points, and divide that by the average of x^2 for all points (equivalently, the sum of x*y divided by the sum of x^2). I'm sure the algebra has been done a zillion times for this problem, so if someone could verify it from that perspective, that would be great.

Cheers,
    Dave
-------------------------------------------------------------
David Thomson Original Code Consulting
www.originalcode.com
National Instruments Alliance Program Member
Certified LabVIEW Architect
Certified Embedded Systems Developer
-------------------------------------------------------------
There are 10 kinds of people: those who understand binary, and those who don't.
Message 6 of 6
(26,738 Views)