09-30-2010 11:04 AM
I am using 2009 SP1
For orders 0-3 it seems to work. Orders 4 and above produce results that are so clearly not a best fit that I can't imagine how it's possible. I've seen other posts about this, but no one seems to fully acknowledge that this is clearly a bug. The other techniques work well, by eye anyway, and totally agree with each other for the least-squares fit.
Can anyone explain this? Does NI have a bug report for this? Does 2010 still have this issue?
I am attaching a VI with my data set and a loop so you can play with the fits and watch the nonsense.
09-30-2010 11:57 AM
Your data does not look much like a polynomial of any order to me. What is the underlying model for the data source?
I suspect that the very rapid fluctuations in the data from zero to 5000 are driving a high-order polynomial fit crazy. In general, using too high an order for a polynomial fit produces extreme results outside the dataset, and in your case within it as well.
Lynn
09-30-2010 01:30 PM
There really is not an underlying source model for the data.
I understand that a high-order polynomial is terrible for extrapolation, since it is no longer confined to fitting data and, as a result, usually shoots rapidly up or down beyond the region where data was used in the fit.
I am not extrapolating in this case, so that argument is not really applicable. In addition, 4th order is not really a high-order polynomial, and I have more than 400 data points.
If you look at the fit you can see a large DC offset. Simply shifting the curve upward would clearly reduce the mean squared error, wouldn't you agree?
The 4th order fit never ever crosses my data set and that makes no sense under any circumstance.
Do you see the same offset with the 4th-order fit that I am referring to? I am asking just in case it somehow looked different on your computer.
09-30-2010 01:45 PM
Your problem is ill-conditioned because your x-range is astronomical: 65000^4 is on the order of 10^19. The higher the order, the more serious the problem!
All you need to do is, e.g., divide the x-values by 10000 and see if things improve. 🙂
(If you divide [x] by 10000, SVD is good up to an order of 15!)
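If you want to see the conditioning problem in actual numbers outside of LabVIEW, here is a rough NumPy sketch (the 0..65000 range and the divide-by-10000 trick are the ones from this thread; the rest is made up for illustration and is not your attached data):

import numpy as np

x = np.linspace(0, 65000, 400)   # x values on the scale discussed in this thread
order = 4

# Design matrix with columns 1, x, x^2, ..., x^order
H_raw = np.vander(x, order + 1, increasing=True)
H_scaled = np.vander(x / 1e4, order + 1, increasing=True)

# Condition numbers: the raw x range gives an astronomically large value,
# the rescaled one a tame value
print("cond, raw x:     %.3e" % np.linalg.cond(H_raw))
print("cond, x / 10000: %.3e" % np.linalg.cond(H_scaled))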
09-30-2010 01:50 PM
Wire an indicator to the Residue output. Then try the various algorithms and fitting methods. The SVD algorithm with the Least Square method gives a much larger residue than any of the other combinations. Since the code is hidden behind Call Library Function Nodes, you can only go by what is in the Help files.
For your data set, that particular combination of algorithm and method does not calculate a good fit. The residue can be used as an indicator of the quality of the fit.
Especially if you are fitting data to a curve and there is no theoretical basis for choosing the function you are fitting, you need to be careful to evaluate the "goodness" of the fit.
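The same kind of check is easy to do in any environment. Here is a small NumPy sketch of the idea, with synthetic stand-in data (not your data set), comparing the residue for the raw and the rescaled x rather than between LabVIEW's algorithm choices:

import numpy as np

x = np.linspace(0, 65000, 400)
y = 2500 + 2000 * np.sin(x / 5000)   # synthetic stand-in for the real data

def poly_residue(x, y, order):
    # Fit y = sum(a_k * x^k) by linear least squares and return a
    # sum-of-squared-residuals figure, analogous to the Residue output
    # (the LabVIEW VI may normalize its residue differently).
    H = np.vander(x, order + 1, increasing=True)
    coeffs, _, _, _ = np.linalg.lstsq(H, y, rcond=None)
    return np.sum((y - H @ coeffs) ** 2)

print("residue, raw x:     %.3e" % poly_residue(x, y, 4))
print("residue, x / 10000: %.3e" % poly_residue(x / 1e4, y, 4))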
Lynn
09-30-2010 02:08 PM - edited 09-30-2010 02:11 PM
There is a VI in the linear algebra palette that can give you the condition number of a matrix.
You don't need to use the built-in polynomial fit; you can equally well use the general linear fit and set up the H matrix with columns corresponding to integer powers of your X vector.
You will notice that the H^T H matrix is very ill-conditioned for high X values and high orders. You are skating on thin ice leaving the x-range as-is. It is numerically suspect.
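For what it's worth, here is what that general-linear-fit setup looks like as a NumPy sketch (NumPy standing in for the LabVIEW VIs; the y vector is left out because I don't have your data):

import numpy as np

x = np.linspace(0, 65000, 400)   # the raw x range from this thread
order = 4

# H has one row per data point and columns x^0, x^1, ..., x^order
H = np.column_stack([x**k for k in range(order + 1)])

# Condition numbers of H and of the normal-equations matrix H^T H.
# Mathematically cond(H^T H) = cond(H)^2, and for this x range both are
# far beyond what double precision can resolve.
print("cond(H)     = %.3e" % np.linalg.cond(H))
print("cond(H^T H) = %.3e" % np.linalg.cond(H.T @ H))

# The general linear fit itself would then be a least-squares solve:
# coeffs, residue, rank, sv = np.linalg.lstsq(H, y, rcond=None)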
09-30-2010 02:09 PM
It appears altenbach is correct.
What exactly is the origin of this issue? Simply dividing x by a constant should not change the accuracy of the floating point representation of the number.
Why do the other algorithms not suffer from this? To push it, I multiplied my x by 10^20 and took the fit up to 6th order, which means values around 10^144, and all the other algorithms have no problem.
I tried reading the documentation on this and did not find any mention of a limitation on the maximum x value raised to the fit order. Did I miss that footnote?
09-30-2010 02:27 PM
altenbach,
Your pointer above to the matrix condition number has led me to more questions. I need to read more on this, I must admit. I feel that simply scaling the data should not change the ability to solve a set of linear equations. This just feels wrong to me.
At any rate, your solution does in fact work. Thank you both for your interest in my problem.
09-30-2010 03:07 PM
Floating-point numbers are stored in the form mantissa*2^(exp), with a specified number of bits to store each part. Let's say you use m bits to store the mantissa and n for the exponent. Typically, the leading bit of the mantissa is assumed to be one, so you effectively have m+1 bits. For a number whose exponent value is e, the smallest increment you can represent is (1/2^m)*2^e, or 2^(e-m). As the number increases, the increment increases as well; that is why we normally say you have a fixed number of digits for a given representation. The accuracy, as expressed by the magnitude of that last digit, depends on the size of the number.
What is my point? Scaling X works in this case because the smallest X values are not fully exploiting all of the bits at their disposal. Dividing by, say, 10^4 doesn't affect the accuracy of the small-number representations but helps dramatically with the large X values raised to large powers. At some point you start losing information on the small X values and scaling no longer helps. Then you either declare victory, change algorithms, change representations, change the problem, or move on.
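You can see this growth of the increment directly in any language that exposes IEEE doubles. A tiny NumPy sketch (using the 65000 and 10^4 figures from this thread) prints the spacing to the next representable double at a few magnitudes:

import numpy as np

# Spacing between a value and the next representable double: the absolute
# increment grows with the magnitude even though relative precision is fixed.
for value in [1.0, 6.5e4, 6.5e4**4, (6.5e4 / 1e4)**4]:
    print("value %.3e -> spacing to next double: %.3e" % (value, np.spacing(value)))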
The Givens and Householder transformations (I always use Householder) are inherently effective on this type of problem (it is no coincidence that they are popular). Most algorithms include scaling steps to increase numerical stability.
11-10-2015 09:58 AM
Just a note: I ran into this again by accident. Some code that I wrote using this VI seemed to be working fine for a while, then suddenly produced crazy results. It turns out I had begun sending in data sets whose x values were larger by roughly a factor of 10. As mentioned above, the residue did tell the story; massive residues tipped me off. I had totally forgotten about this post and found it again when I did a search on this issue.
As Darin.K. stated, SVD is just not a good method here. Householder and Givens seem to work really well. I wish SVD were not the default method, because I feel many will fall into this trap like I did. At least there should be a note in the help file. This can really catch you off guard.
Darin.K., I apparently never accepted your advice as the solution. Sorry about that. I accepted it just now. 5 years later!
James