
NI FPGA Floating Point Mismatch from Host

Solved!

I created a simulation, relevant to a project I'm working on, to educate myself on the use of the NI Single-Precision Floating Point palette.

 

I created an FPGA top-level, and a host-VI simulation using the desktop execution node (DEN).

 

The FPGA runs a loop every 100 ticks.  During each iteration, it computes sin(x) and 32767*sin(x).

 

The host VI is meant to compare the host implementations of the SGL polymorphic Sine and Multiply functions to the FPGA SGL Sine and Multiply VI results.  It contains an initialization loop that creates 1024 points in the range [0..2*Pi].  That array is wired as an input to an auto-indexing For-Loop, which instantiates a DEN in parallel with the typical host VIs described above.  The output of that For-Loop feeds some array arithmetic to see how much error exists between the FPGA and host single-precision-float implementations.

 

The DEN is configured to execute one time for every 100 simulated clock ticks (corresponding to the 100 tick loop rate in the FPGA).

Results:  While the values are *close*, they're not exact, and I was hoping to understand where that loss comes from.  My assumption is that it comes down to the CORDIC sine implementation on the FPGA, compared to whatever sine implementation the host uses.
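For reference, here is a rough Python sketch of the textbook CORDIC rotation algorithm, which is my assumption of what the FPGA sine block does internally (the iteration count, gain handling, and argument range are my guesses, not NI's actual implementation):

    import math

    def cordic_sin_cos(theta, iterations=24):
        # CORDIC in rotation mode; valid for theta roughly in [-pi/2, pi/2].
        # Pre-scale x by the CORDIC gain so the final vector has unit length.
        k = 1.0
        for i in range(iterations):
            k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
        x, y, z = k, 0.0, theta
        for i in range(iterations):
            d = 1.0 if z >= 0.0 else -1.0
            x, y, z = (x - d * y * 2.0 ** -i,
                       y + d * x * 2.0 ** -i,
                       z - d * math.atan(2.0 ** -i))
        return y, x  # approximately sin(theta), cos(theta)

    print(cordic_sin_cos(0.5)[0], math.sin(0.5))

The approximation error shrinks with the iteration count, so a hardware CORDIC rounded to single precision would not necessarily bit-match whatever sine routine the host math library uses.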

 

Is there anything I can do to more closely align the FPGA single-precision-float results with the host single-precision-float results, such that the error is minimized?

 

and

 

Please explain/confirm the difference between the implementations?

 

Please see the attached project, in LV2019. I attempted to save it for LV2017 but got an 'enter password to verify' prompt, which I skipped for all of the SGL FPGA VIs.  So here are LV2019 (works) and LV2017 (maybe works?) versions of the project.

 

Thanks in advance!

J

 

Message 1 of 14

One concept that is good to review: https://en.m.wikipedia.org/wiki/Floating-point_error_mitigation

 

Also, is there a seed setting somewhere? 


Certified LabVIEW Architect, Certified Professional Instructor
ALE Consultants

Introduction to LabVIEW FPGA for RF, Radar, and Electronic Warfare Applications
Message 2 of 14

I haven't seen any mention of seeding related to the Sine and Multiplication blocks, either in the LabVIEW tool palettes or the LabVIEW FPGA Single Precision Math toolkit.

 

Any suggestion on where I might look?

Message 3 of 14

I did notice a dedicated forum page for this toolkit https://forums.ni.com/t5/Reference-Design-Content/NI-LabVIEW-FPGA-Floating-Point-Library/ta-p/372512... (not much activity but good to know).

 

I opened the code; each function has an option of Instance 0 to 4.  I do not know what this means; I haven't used floats on FPGAs.

 

I am unable to find help for the individual functions.  I found this https://zone.ni.com/reference/en-XX/help/371599P-01/lvfpgaconcepts/fpgasingleprecisfloat/ which says it is compliant with the IEEE standard.

 

Also, I rarely use the DEN.  I make test benches using DMA FIFOs to get data to/from the FPGA.  Maybe it's something to do with the number of clock cycles?  I set it to 40 (39+1) but still see errors.

 

 


Certified LabVIEW Architect, Certified Professional Instructor
ALE Consultants

Introduction to LabVIEW FPGA for RF, Radar, and Electronic Warfare Applications
Message 4 of 14

Due to the handling of certain values and internal uncertainties (rounding and so on) specified within the IEEE 754 standard (the floating-point standard practically all modern hardware and software adheres to), comparisons of two SGL values from different calculation paths may show slight differences. This is part of the wonderful world of floating-point arithmetic.
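A tiny NumPy illustration of the effect (nothing to do with the sine blocks specifically; the operand values are just ones I picked to force the issue):

    import numpy as np

    a = np.float32(1.0)
    b = np.float32(6e-8)   # slightly more than half an ULP of 1.0f

    left  = (a + b) + b    # round after each addition, in this order
    right = a + (b + b)    # same operands, rounding happens at different points

    print(left == right, left, right)

Both orderings are correctly rounded per IEEE 754, yet they land on adjacent single-precision values.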

 

So to judge whether your problem can be at least partially solved, how far away are the values?

 

Shane

Message 5 of 14

Terry_ALE

 

I had already come across the first link, which is very helpful for its tables of latencies for each block within a while-loop.  Regarding your other observations:

1) Instance 0 to 4 appears to implicitly generate a CLIP that can be time-shared between different loops, based on the "polymorphic-ish" VI.  In this manner you can create up to 5 or 6 SGL compute elements [some FPGA Float blocks don't have this restriction] for use throughout the design.

 

2) The DEN can either treat each FPGA clock cycle (for the fastest clock domain) as a single DEN block execution in the DEN's parent loop, or perform multiple FPGA loop iterations for each single DEN block execution.  The 1:1 ratio works great for SCTLs, and the 1:N ratio works great when using a while loop with a loop timer.  Example:  with the loop timer set to 100 clock cycles, you may only want to sample inputs (or send outputs) at a 1:100 ratio of DEN executions to FPGA clock cycles.  So if you generate a new set of FPGA input stimulus on each DEN execution, but only sample it every 100 clock ticks in the FPGA, there is less "interface processing" performed at the borders of the DEN.


Intaris

 

I figured that there might be weirdness between the floating-point implementations, though as Terry_ALE pointed out, the SGL FPGA library is supposed to be IEEE 754 compliant.

With regard to error, I have used the formula Error = (Expected - Observed) / Expected, where Expected is the host function result and Observed is the result of the FPGA blocks executed with the Desktop Execution Node.  The aim was to define a percentage deviation of the observed values from the expected ones.
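In rough Python/NumPy terms (the array construction mirrors my VI; the observed array here is just a placeholder, since the real values come from the DEN):

    import numpy as np

    x = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False, dtype=np.float32)
    expected = np.sin(x)          # stand-in for the host SGL sine results
    observed = expected.copy()    # placeholder; really the FPGA/DEN SGL results

    with np.errstate(divide='ignore', invalid='ignore'):
        error = (expected - observed) / expected   # relative error, per the formula above

    print(np.nanmax(np.abs(error)))   # nanmax skips the sin(0) = 0 element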

 

For array X = {0*(2pi/1024), 1*(2pi/1024), ... 1023*(2pi/1024)}:
Y1 = sin(X)
Y2 = 32767 * sin(X), where both the multiply and the sine ops are SGL float FPGA lib blocks.

Max error for all results in array Y1, per the formula above: 0.0465384 = 4.6% error
Max error for all results in array Y2, per the formula above: 0.0465384 = 4.6% error

This leads me to believe that the arithmetic error is in the SGL implementation of the sine calculation block.

My application is a very low-frequency sine wave that my systems team would like generated in the R-Series FPGA, to save money on function generators or additional PXI cards.

I am working through this exercise to see if I can simplify the coding of a sine-wave generator for arbitrarily slow signals without much block-RAM impact.  Alternatively, I'm considering a pre-initialized memory to use as a lookup table.  So really this is a learning exercise, while exploring various ways to solve this problem.
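As a sanity check for the lookup-table idea, here's a quick Python sketch of the classic DDS/phase-accumulator approach (the table size and accumulator width are just my illustrative picks, not final numbers):

    import math

    TABLE_BITS = 10   # 1024-entry I16 table; this is what would occupy block RAM
    ACC_BITS = 32     # phase accumulator width

    table = [round(32767 * math.sin(2 * math.pi * i / 2**TABLE_BITS))
             for i in range(2**TABLE_BITS)]

    def dds_samples(f_out_hz, f_update_hz, n):
        # Phase increment per update; the top TABLE_BITS of the accumulator index
        # the table, so arbitrarily low frequencies cost no additional memory.
        incr = round(f_out_hz / f_update_hz * 2**ACC_BITS)
        acc = 0
        for _ in range(n):
            yield table[acc >> (ACC_BITS - TABLE_BITS)]
            acc = (acc + incr) & (2**ACC_BITS - 1)

    # Example: a 0.1 Hz sine updated at 10 kHz.
    print(list(dds_samples(0.1, 10_000, 5)))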




Message 6 of 14

4.6% seems high, but I'm not sure such a superficial calculation yields real-world insight.

 

Try changing the calculations you are doing to N/1024*(2pi) and see if the values change at all. Avoiding very large and very small numbers in the chain of calculations can actually have an effect on the precision of the final outcome.

 

https://en.wikipedia.org/wiki/Loss_of_significance
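A small NumPy illustration of the loss-of-significance effect the link describes (not your FPGA code, just the general idea):

    import numpy as np

    x = np.float32(1e-3)

    naive  = np.float32(1.0) - np.cos(x)           # subtracts two nearly equal numbers
    stable = np.float32(2.0) * np.sin(x / 2) ** 2  # algebraically identical, no cancellation

    print(naive, stable)   # the naive form loses several significant digits in SGL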

 

Fair notice: I have zero experience with SGL on FPGA. We use FXP exclusively so that we can manage these things ourselves.

Message 7 of 14

Stepping back, do you need floating point? Where are you in the project? Is the purpose here to determine if you will use this function, etc? 

 

Not trying to talk you out of it and clearly it is interesting. But if there is a deadline this is holding up, at some point I'd go fixed point. 


Certified LabVIEW Architect, Certified Professional Instructor
ALE Consultants

Introduction to LabVIEW FPGA for RF, Radar, and Electronic Warfare Applications
Message 8 of 14

Varying the number of points in the range [0,2*pi] at which sine is evaluated changed the max observed error:
128 points: 17% error

1024 points: 4.6% error

65536 points: 0.23% error

 

Changing from Sine to Square-Root (restricting this to single-input floating point operations):

128 points: 0% error

1024 points: 0% error

65536 points: 0% error

 

Changing from Square-Root to Natural Log (restricting this to single-input floating point operations):

128 points: 0% error

1024 points: 0% error

65536 points: 0% error

 

So now the question is:  is there a bug in the SGL FPGA library for the CORDIC functions, or is it a natural loss of precision inherent in the sine/cosine implementation?

 

Changing from Natural Log to Cosine:

128 points: 40% error

1024 points: 6.4% error

65536 points: 0.73% error

 

Note: all inputs for all of the functions above were still within the 0..2*pi range, spaced equidistantly within that range based on the number of points. This was because I was lazy and didn't want to change the VI to sweep through different ranges.



Message 9 of 14

Good point - I don't require floating point, and have recently implemented similar systems with either FXP LUTs or FXP evaluation.  At the moment I have the luxury of a few hours for experimentation, so I am using that time to explore alternate problem-solving approaches.  My background is ASIC/FPGA for DSP applications, so fixed point has always been my go-to.

 

However, for better or worse, with the industry moving toward abstracting out the mathematics, it seems like a good time to learn things like HLS and to evaluate the efficacy of floating-point implementations.

 

-J

Message 10 of 14