Why is a linear fit giving different results with different X values?

tst · 11-27-2014

OK, math geeks (Darin, that's probably you...):

Why are the two calls to the VI returning different results? Ostensibly, I would expect them to return the same values, as the actual slope is the same. I understand this is probably something that has to do with the fitting algorithm (other algorithms return other results), but my math sucks - http://en.wikipedia.org/wiki/Least_squares

Incidentally, this doesn't accurately reflect the issue I was seeing. In my case, the difference between the X values was 1, but the X values themselves were also much larger (they were timestamps, so 3.something billion). I made the numbers smaller to make it easier to see, because presumably the cause is the same.

___________________
Try to take over the world!

Sam_Sharp · 11-27-2014

Interesting problem - I wonder if it has something to do with it being a floating point number, and when you add 1000 it drops off the fractional part? According to this table http://zone.ni.com/reference/en-XX/help/371361H-01/lvhowto/numeric_data_types_table/ a DBL holds about 15 decimal digits, and when you add the 1000 it gets to about 11 digits...perhaps it loses precision internal to the VI (because of the operations it performs)?

I think if you were to repeat it by replicating the algorithm using EXTs instead of DBLs you would get the right answer?

LabVIEW Champion, CLA, CLED, CTD
(blog)

johnsold · 11-27-2014

I think Sam may be on the right track with regard to resolution. However, the fitting VIs are not polymorphic (they use Call Library Function Nodes internally), so you are stuck with doubles unless you want to rewrite everything from scratch.

Lynn

johnsold · 11-27-2014

Another thought: Try subtracting the start time (or any other fixed, known time close to the times in the data) from your timestamps. Then you have much smaller numbers unless your data spans decades or centuries.
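A minimal sketch of Lynn's suggestion in Python, whose float type is an IEEE 754 double (DBL). It assumes the slope is computed from the textbook closed-form least-squares sums; LabVIEW's Linear Fit VI may well do something different internally, and the timestamps and y values below are invented for illustration.

```python
def naive_slope(xs, ys):
    """Least-squares slope from the textbook sums (prone to cancellation)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx  # difference of two huge, nearly equal numbers
    if denom == 0.0:
        # mimic the infinite slope seen in the thread instead of raising
        return float("inf")
    return (n * sxy - sx * sy) / denom


def offset_slope(xs, ys):
    """Same formula after subtracting the first timestamp from every x."""
    t0 = xs[0]
    return naive_slope([x - t0 for x in xs], ys)


# Hypothetical data: two timestamps around 3.5 billion, 1 s apart.
xs = [3.5e9, 3.5e9 + 1.0]
ys = [0.0, 2.0]
print(naive_slope(xs, ys))   # inf: n*sxx and sx*sx round to the same double
print(offset_slope(xs, ys))  # 2.0
```

Subtracting the offset keeps the x values small, so the squares no longer need more digits than a DBL can hold.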

Lynn

tst · 11-27-2014

Just to be clear, I don't have an actual problem - once the source of the problem was clear (which required logging the relevant data at the client's, since this would only happen around once a day, when they managed to generate a specific circumstance), coming up with alternatives was easy.

Anyway, my initial thought was also something about limited accuracy, which would chime with the fact that the distance between the X values needed to cause this changes as the X values themselves grow, but it still doesn't make much sense to me.

altenbach · 11-28-2014

Note that you can use the "least absolute residual" method instead and it no longer blows up. If you only have two points, it should give you the equivalent result.

LabVIEW Champion.

Darin.K · 12-01-2014

Almost slipped this one through over the long weekend. Still do not have LV available to test, but there are enough hints in the original post to figure out what happens using a pen and paper.

In a linear least squares fit, the precision is limited by a quantity related to the difference between the sum of the squares of x and the square of the sum of the x values. When this quantity becomes zero, the slope is infinite.
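For reference, here is the textbook closed-form slope (an assumption: the thread never shows what LabVIEW computes internally) and what that quantity reduces to for two points, where eps is the epsilon described next:

```latex
% Textbook least-squares slope (assumed form)
b = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}

% For n = 2 and x_1 = x_0(1+\delta):
2\left(x_0^2 + x_1^2\right) - \left(x_0 + x_1\right)^2 = \left(x_1 - x_0\right)^2 = x_0^2\,\delta^2

% Both subtracted terms are about 4x_0^2, so the result is only a fraction
% \delta^2/4 of them; it is swamped by rounding once \delta^2 is of the
% order of \varepsilon, i.e. \delta \sim \sqrt{\varepsilon}.
```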

For normalized floating point operations (first mantissa digit nonzero) you can estimate the allowed precision using epsilon which is the smallest power of 2 which can be added to 1 and give a result which is distinguishable from 1. In this case you need to estimate using the square root of epsilon because you are squaring values.
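Darin's definition of epsilon translates directly into a loop. A Python float is an IEEE 754 double, so this sketch reproduces the DBL values only; it cannot measure LabVIEW's EXT type. The function name is made up for illustration.

```python
import math
import sys


def machine_epsilon() -> float:
    """Smallest power of 2 that, added to 1.0, gives a result != 1.0."""
    eps = 1.0
    while 1.0 + eps / 2 != 1.0:
        eps /= 2
    return eps


eps = machine_epsilon()
print(eps == sys.float_info.epsilon)  # True: 2**-52 for DBL
print(math.sqrt(eps) == 2.0 ** -26)   # True: sqrt(eps) has exponent -26
```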

I wish the OP had posted the largest x0 value that worked instead of leaving a gap between working (10^0) and not working (10^3). Correct me if I am wrong, but I will guess that 10^2 works and that you get the right value for x = 100 and 100.0000001. So x1 = x0(1+10^-9) works, but x1 = x0(1+10^-10) does not. Using my handy conversion 2^10 ~ 10^3, I can estimate the limit to be between 2^(-30) and 2^(-33). I know that for DBL the sqrt(eps) has an exponent of -26, so it seems that LV is using EXT precision in the LS calculation. That is a relief; I do not have to get into another battle over something "working as intended".
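The powers-of-10 to powers-of-2 conversion written out, using log2(10) ≈ 3.32 and the IEEE 754 double epsilon 2^-52:

```latex
10^{-9}  = 2^{-9\log_2 10}  \approx 2^{-29.9}
10^{-10} = 2^{-10\log_2 10} \approx 2^{-33.2}
\sqrt{\varepsilon_{\mathrm{DBL}}} = \sqrt{2^{-52}} = 2^{-26}
```

Both relative spacings are well below 2^-26, so a pure DBL calculation would have failed at both; the observed limit points to a wider type.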

Now comes the LV exercise to test my hypothesis:

Instead of using powers of 10, create two x values from powers of 2: x0 = 2^m, x1 = x0 + 2^(-n). You should find a value of m+n between 30 and 33 for which the fit works for m+n and fails for m+n+1. That is not really a bold prediction; the original post seems to demonstrate that. But now check against a value I do not have off the top of my head: take an epsilon constant and choose EXT precision. Take its square root and feed it into the Mantissa and Exponent function (or log2). The value of sqrt(eps) should lie between 2^-(m+n) and 2^-(m+n+1).
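The same experiment can be sketched in Python to see where a pure DBL calculation would break. It assumes the two-point denominator 2(x0^2 + x1^2) - (x0 + x1)^2 from the textbook formula, and the helper names are made up for illustration. Because Python floats are DBL, the threshold it finds is the DBL one, not the EXT one.

```python
def two_point_denominator(m: int, n: int) -> tuple[float, float]:
    """Return (computed, exact) n*Sxx - Sx**2 for x0 = 2**m, x1 = x0 + 2**-n."""
    x0 = 2.0 ** m
    x1 = x0 + 2.0 ** -n
    sxx = x0 * x0 + x1 * x1
    sx = x0 + x1
    computed = 2 * sxx - sx * sx
    exact = (2.0 ** -n) ** 2  # (x1 - x0)**2 is a power of 2, so exact in DBL
    return computed, exact


def first_failing_sum(m: int = 0, max_k: int = 60) -> int:
    """Smallest m+n for which the computed denominator is no longer exact."""
    for k in range(1, max_k):
        computed, exact = two_point_denominator(m, k - m)
        if computed != exact:
            return k
    return -1


for m in (0, 10, 20):
    print(m, first_failing_sum(m))  # 26 each time: only m+n matters
```

Scaling both x values by a power of 2 is exact, which is why only m+n matters. For DBL the first failure is at m+n = 26, about half of the 52 fraction bits.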

tl;dr : If my guess is right and the last working value for x0 is 100, then a pat on the back to NI for using EXT precision in the LS calculation. If my guess is wrong and the last working value is 1 instead, then I will be shaking my fists (probably to no avail as usual).

If fiddling with constants until it works is sufficient, then altenbach has the answer: try a method that is more roundoff friendly, for instance one using abs(x) to give positive definite values instead of x^2. Math folks like x^2 because it is analytic, but those stuck with floating point math have other issues to consider.

You guys are probably too lazy, er, busy, and I will have to write the LV code myself to find the answer.....

altenbach · 12-01-2014

Of course if you have exactly two points, you can calculate the slope directly in a way that remains safer over a wider range. No need to deal with least squares and such. 😉
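A sketch of that in Python (DBL), reusing invented timestamps: with two points the slope is just rise over run. Subtracting two doubles that are within a factor of 2 of each other is exact (Sterbenz lemma), so nothing is squared and the large offset cancels cleanly. The function name is made up for illustration.

```python
def two_point_slope(x0: float, y0: float, x1: float, y1: float) -> float:
    """Slope of the line through (x0, y0) and (x1, y1)."""
    dx = x1 - x0  # exact when x0 and x1 are within a factor of 2 of each other
    return (y1 - y0) / dx


print(two_point_slope(3.5e9, 0.0, 3.5e9 + 1.0, 2.0))  # 2.0
```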

(Also note that even your slope1 is incorrect, it should be 100000, not 99997.5.)

tst · 12-01-2014

@Darin.K wrote:

I wish the OP had posted the largest x0 value that worked instead of leaving a gap between working (10^0) and not working (10^3)

Very well. I don't have access to LV either, but I did ask the coworker who actually ran into this to test that exact thing for X values which differ by 1, and that email says that the last X value where this doesn't happen is 3037000499.

Anyway, as said, solutions are easy once you know what the problem is. This was mainly a matter of curiosity. I have no idea which solution was actually implemented, though.

Darin.K wrote:
You guys are probably too lazy, er, busy, and I will have to write the LV code myself to find the answer.....

Not only will I not have time for this today or tomorrow, I haven't even had time to properly follow and understand your reply, which I expect is probably correct. That said, the main issue is less with lack of time and more with lack of understanding of the math. Hopefully once I follow your explanation and the code that you will inevitably post, I will understand it better. 😉

Darin.K · 12-01-2014

Well I managed to test it out:

The last working combination is m+n = 31; m+n = 32 gives an infinite slope.

log2(sqrt(eps)) = -31.5 for EXT precision.

So LV uses EXT precision for the LS calculation after all.

And in case you are wondering, 1/sqrt(eps) = 3037000499.976; perhaps that looks familiar.
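The arithmetic behind that remark, assuming EXT is the 80-bit x87 extended format with epsilon 2^-63 (Python has no such type, so the value is only computed here in DBL). With X values 1 apart the relative spacing is 1/x0, and the fit breaks once that drops below sqrt(eps), i.e. once x0 exceeds 1/sqrt(eps).

```python
import math

EPS_EXT = 2.0 ** -63  # assumed epsilon of the 80-bit extended (EXT) type

limit = 1.0 / math.sqrt(EPS_EXT)  # = 2**31.5
print(limit)              # about 3037000499.976
print(math.floor(limit))  # 3037000499, the last working X value tst reported
```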