FPGA timing different in execution mode and simulation mode

MrJackHamilton · ‎01-29-2020

Why are you using the Timed Loop in the FPGA? I'm shocked LabVIEW actually lets you put one there... Take the Timed Loops out, drop in the FPGA time delay (Wait) functions and set them accordingly in normal For or While loops. You can set the delay in clock ticks or µs.

In the FPGA, an untimed While or For loop with an I/O call like an analog read will take exactly as many FPGA clock cycles as the analog conversion needs, which is more than one FPGA clock cycle, usually many more.

You should ALWAYS put a shift register with the Tick Count function in your FPGA loops and read the actual loop timing: T(now) - T(previous) = loop iteration time. The more code you put into a single FPGA loop, the more clock cycles it will take... thankfully FPGAs run at 40 or 80 MHz, so you have lots of cycles to do things.
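The tick-difference measurement above is plain modular arithmetic. Below is a Python model of it (not LabVIEW code), assuming a 32-bit tick counter and a 40 MHz clock; the counter readings are hypothetical. Unsigned subtraction keeps the result correct even when the counter wraps between two reads.

```python
TICK_BITS = 32          # assumed width of the tick counter
CLOCK_HZ = 40_000_000   # assumed 40 MHz FPGA clock


def elapsed_ticks(t_now: int, t_prev: int, bits: int = TICK_BITS) -> int:
    """Ticks between two counter reads, valid across one wrap-around.

    Models unsigned N-bit subtraction: (now - prev) mod 2**N.
    """
    return (t_now - t_prev) % (1 << bits)


def ticks_to_us(ticks: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a tick count to microseconds at the given clock rate."""
    return ticks * 1_000_000 / clock_hz


# Hypothetical counter values read once per loop iteration; the counter
# wraps past 2**32 between the second and third reads.
reads = [4_294_967_000, 4_294_967_200, 104, 304]
prev = reads[0]                  # initial value of the shift register
for now in reads[1:]:
    d = elapsed_ticks(now, prev)
    print(f"{d} ticks = {ticks_to_us(d):.3f} us")   # 200 ticks = 5.000 us
    prev = now                   # value fed back through the shift register
```

All three iterations report 200 ticks, including the one that straddles the wrap-around.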

FPGA coding is deceptively simple, elegant, and maddening for those same reasons. Any loop you create in the FPGA is a timed loop: it actually runs in hardware... FPGAs are amazing things.

Thomas444 · ‎01-29-2020

https://knowledge.ni.com/KnowledgeArticleDetails?id=kA00Z000000P8sWSAS&l=cs-CZ

The single-cycle Timed Loop (SCTL) is a special use of the LabVIEW Timed Loop structure. Timed Loop structures are always SCTLs when used in an FPGA VI. When used with an FPGA target this loop executes all functions inside within one tick of the FPGA clock you have selected.

MrJackHamilton · ‎01-30-2020

Although this is interesting, it's not very useful, as only loops doing bit shifting would be able to run in a single FPGA clock cycle.

It's inappropriate to use them with I/O calls or FIFO reads/writes, as they cannot possibly run in a single FPGA clock cycle. I am amazed you'd not get a compilation warning when using an STL with FPGA I/O calls, as they would create a timing violation.

The FPGA STL is not magical, and will not compress multi-clock cycle processes into one cycle.

cbutcher · ‎01-31-2020

@MrJackHamilton wrote:

Although this is interesting, it's not very useful, as only loops doing bit shifting would be able to run in a single FPGA clock cycle.

It's inappropriate to use them with I/O calls or FIFO reads/writes, as they cannot possibly run in a single FPGA clock cycle. I am amazed you'd not get a compilation warning when using an STL with FPGA I/O calls, as they would create a timing violation.

The FPGA STL is not magical, and will not compress multi-clock cycle processes into one cycle.

Unfortunately (or perhaps more accurately, fortunately) this is just not true.

Digital IO will typically run in a single cycle, so you can also include "bit-banging" as a use of SCTLs.

Since this allows you to write high-speed communication drivers (SPI is a simple application I've used this for) it's very important on FPGA.

FIFO reads/writes will also operate in a single cycle, but must have timeout = 0.


You must also set arbitration to either the "as-needed" or "none" option (I'm not sure of the exact wording off the top of my head), and if you set it to none and then place two FIFO reads/writes in the same cycle, the behaviour will be undesirable.

So make sure not to read or write twice in one cycle... Also note that Flat Sequence Structures have no effect in SCTL, which can be a misleading problem - everything happens at the same time (for want of a better description) in the SCTL.

These issues will NOT produce failed compilation - just garbage input/output.


Setting a non-zero timeout will produce a compilation warning (I think at intermediate file generation).

Putting lots of stuff in a single cycle will produce a timing violation during compilation (much later in the process, so try to avoid this). Pipelining is the common solution.
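To make pipelining concrete, here is a small Python simulation (a model of the idea, not LabVIEW code). The function y = a*b + c and the name pipelined_mac are hypothetical; in an SCTL the registers between stages would typically be shift registers or feedback nodes. Every register computes its next value from the current values and all of them update together, like a clock edge.

```python
from typing import List, Optional, Tuple


def pipelined_mac(samples: List[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Simulate y = a*b + c split across two register stages.

    Stage 1 multiplies and registers (a*b, c); stage 2 adds and registers y.
    A new sample enters every tick. Returns (tick, y) for each valid output.
    """
    stage1: Optional[Tuple[int, int]] = None  # register after the multiply
    stage2: Optional[int] = None              # register after the add
    results: List[Tuple[int, int]] = []
    for tick in range(len(samples) + 1):      # one extra tick drains stage 2
        x = samples[tick] if tick < len(samples) else None
        # Next values are computed from the CURRENT register contents first,
        # then all registers update at once.
        next1 = (x[0] * x[1], x[2]) if x is not None else None
        next2 = stage1[0] + stage1[1] if stage1 is not None else None
        stage1, stage2 = next1, next2
        if stage2 is not None:                # None plays the role of "not valid"
            results.append((tick, stage2))
    return results


print(pipelined_mac([(1, 2, 3), (4, 5, 6), (7, 8, 9)]))
# [(1, 5), (2, 26), (3, 65)]
```

Each result appears one tick later than an unpipelined version would give it, but a new result still arrives every tick, and each stage only has to fit part of the logic into one clock period.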

Some complex blocks (e.g. FFT, various processing/numerical operations) are implemented via something akin to Express VIs. They have configuration that controls the number of cycles necessary to produce valid output, plus Boolean inputs and outputs for handshaking/pipelining.

The LabVIEW FPGA online course (available if you have the SSP) explains these and many more concepts in great detail.

Intaris · ‎01-31-2020

@MrJackHamilton wrote:

Although this is interesting, it's not very useful, as only loops doing bit shifting would be able to run in a single FPGA clock cycle.

It's inappropriate to use them with I/O calls or FIFO reads/writes, as they cannot possibly run in a single FPGA clock cycle. I am amazed you'd not get a compilation warning when using an STL with FPGA I/O calls, as they would create a timing violation.

As a generalisation, this is obviously false. There is SOME code which cannot be used in SCTLs, but there's absolutely nothing inherently incompatible between SCTLs and I/O. Our entire FPGA code runs exclusively in SCTLs. With FIFOs. At up to 160 MHz.

MrJackHamilton · ‎02-01-2020

The issue I am attempting to point out is that timing in an FPGA is quite deterministic, within the constraints of the operations a particular loop in the FPGA performs.

Unlike a CPU running an OS, there are few tweaks that can be employed at the FPGA level to 'speed up' processes. They take what they take. On LabVIEW for Windows, using a Timed Loop can improve loop jitter and performance over a standard While/For loop, as it negotiates with the OS for a higher-priority timeslice of the CPU, which is shared with the other threads running in the OS.

There is no 'processor' in an FPGA. What you code becomes a circuit. This is awesome, but it also requires a coding approach closer to circuit design than to traditional LabVIEW coding, because only a limited set of optimizations can be employed.

Why am I saying this, and why do I feel it needs to be pointed out? Because we have come across quite a few engineers new to LabVIEW, Real-Time and FPGA who are making some broad assumptions about what they do. And at times NI Sales and Marketing is not helping with that message. [They love to sell PXI systems, where quite often other systems will do the task.]

I've recovered quite a few situations where customers were ready to discard cRIO and FPGA systems due to 'poor performance', when really their own lack of experience was the limiting factor: they expected magical performance from the systems.

Can you use an STL in FPGA? Sure. Will it solve an unsolvable timing problem in an FPGA loop with too many operations in it? No. That has to be solved using a different approach.

I don't want to cross swords here, only to add another viewpoint to the discussion.

Regards

Jack Hamilton

cbutcher · ‎02-01-2020

Dear OP - I'd guess perhaps this question is solved, and I'd suggest marking GerdW's response in message #7 as the conclusion.

If that doesn't answer your question, could you please reiterate it now, because I feel we're moving a little off-topic.

Dear Mr Hamilton,

Whilst it's true the code that can be placed in a SCTL is a reduced subset of all LabVIEW code, it's still pretty powerful and I think a large benefit of LabVIEW's FPGA module is making that available to people (like me) who have no experience with VHDL or similar.

I don't believe the design is significantly different to desktop or RT design (actually, I've recently had far more problems with RT...) but as someone who's done a small amount of circuit design (PCBs), it's definitely more like LabVIEW than a hardware system. The abstraction is immense.

@MrJackHamilton wrote:

The issue I am attempting to point out is that timing in an FPGA is quite deterministic, within the constraints of the operations a particular loop in the FPGA performs.

...

There is no 'processor' in an FPGA. Can you use an STL in FPGA? Sure. Will it solve an unsolvable timing problem in an FPGA loop with too many operations in it? No. That has to be solved using a different approach.

...

Regards

Jack Hamilton

This is probably a fair set of comments, but I'm less sure about the rest of your post. Of course there are limitations to what can be done in 25 ns (or less at higher clock speeds), but really it's pretty incredible... and in any case, it's irrelevant to the OP's question (why is simulated timing not accurate? The answer seemingly being: it never claimed it would be!).

If the OP would like to continue this thread (with further questions or probing of the topic) perhaps we can move back to that, in the spirit of your latter remark:

@MrJackHamilton wrote:

I don't want to cross swords here, only to add another viewpoint to the discussion.

I suspect we (or I; I shouldn't speak for Intaris) just wanted to clear up some possible confusion for future readers regarding "timed loops" on FPGA: namely, that there are no "timed loops" in the same fashion as on RT, but rather Single-Cycle Timed Loops, with specific limitations and specific promises (i.e. one iteration per clock tick). This promise makes programming much easier in some ways...

Thomas444 · ‎02-03-2020

The question persists: how should I improve my code to get my processing finished within 10 μs? That is despite the fact that, in theory (and as verified in simulation mode), I would expect the processing to finish in about 1 μs.

Removing the SCTLs is not an option, as they actually improve timing. I could write multiple elements to the FIFO so the write takes only one cycle, but it would be more complicated, as some of the elements would be zero.
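The zero-padding bookkeeping for a multi-element write is simple. A Python sketch, assuming each loop iteration produces a fixed number of results (3 here, hypothetical), that the FIFO's elements-per-write is limited to the powers of two mentioned in this thread, and that the host reads the FIFO as a flat stream of elements. The names write_width, to_frame and from_frames are illustrative only.

```python
from typing import List, Sequence

# Elements-per-write choices mentioned in this thread for FIFO properties.
ALLOWED_WIDTHS = (1, 2, 4, 8, 16, 32)


def write_width(n: int) -> int:
    """Smallest allowed elements-per-write that holds n values."""
    for w in ALLOWED_WIDTHS:
        if w >= n:
            return w
    raise ValueError(f"{n} values do not fit in a single write")


def to_frame(values: Sequence[int], width: int) -> List[int]:
    """FPGA side: pad one iteration's results with zeros to the write width."""
    if len(values) > width:
        raise ValueError("too many values for one write")
    return list(values) + [0] * (width - len(values))


def from_frames(flat: Sequence[int], width: int, n_valid: int) -> List[List[int]]:
    """Host side: split a flat read into frames and drop the padding."""
    return [list(flat[i:i + n_valid]) for i in range(0, len(flat), width)]


w = write_width(3)                                    # 4
stream = to_frame([11, 22, 33], w) + to_frame([44, 55, 66], w)
print(stream)                    # [11, 22, 33, 0, 44, 55, 66, 0]
print(from_frames(stream, w, 3)) # [[11, 22, 33], [44, 55, 66]]
```

The cost is one wasted slot per write (25% of the FIFO bandwidth in this example), and the host must know the frame width to discard the padding.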

cbutcher · ‎02-03-2020

Ok, thanks for the restatement.

Looking at your opening images again, I'd be tempted to see if I could rework the 3rd picture (the filter) into a single pipelined SCTL (and remove the outer loop, the For loop and the inner SCTLs). You should then be able to get a clearer statement about the number of ticks required from input to valid output.

I'm not sure how you have all of the nodes configured, so I can't try and really describe how to go about that, or if it's even possible.

I will note that writing multiple times to the same FIFO inside a SCTL won't do anything good for you - see the details here: Understanding Arbitration Options (FPGA Module) in particular the end section about "Never Arbitrate" (which is, I think, the only way you could compile multiple writes in a SCTL).

Thomas444 · ‎02-03-2020

@cbutcher wrote:

Ok, thanks for the restatement.

Looking at your opening images again, I'd be tempted to see if I could rework the 3rd picture (the filter) into a single pipelined SCTL (and remove the outer loop, the For loop and the inner SCTLs).

You actually can't do that, as the Butterworth filter is itself an SCTL and you can't nest SCTLs.

@cbutcher wrote:

Looking at your opening images again,

I uploaded a simplified FPGA project for GerdW and am re-uploading it here.

@cbutcher wrote:

I will note that writing multiple times to the same FIFO inside a SCTL won't do anything good for you - see the details here: Understanding Arbitration Options (FPGA Module) in particular the end section about "Never Arbitrate" (which is, I think, the only way you could compile multiple writes in a SCTL).

The arbitration option is for SIMULTANEOUS writes, i.e. only when you have multiple accessors which could write at the same moment. Functionally, arbitration is just a complex FPGA semaphore for FIFO access. Also, when configuring multiple elements per write in the FIFO properties, you can only choose values like 1, 2, 4, 8, 16 or 32 elements per write, which indicates that the writing uses a different method.

My problem, as described in the first post, is that in simulation mode I get roughly the timing I expect: 30 FPGA ticks per cycle, whereas in execution mode, after compilation, I often find that the timing is more than 400 ticks per cycle, so every second piece of data is missed. I was told here that it is normal for simulation timing to be shorter than in execution, but I would like to find out WHY it takes more than ten times longer in execution mode. I hoped for some basic mistake in my project.
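The numbers in this post can be cross-checked with simple arithmetic. A Python sketch, assuming a 40 MHz clock (the post does not state it) and a new sample every 10 µs: the budget then comes out at exactly 400 ticks, 30 ticks is 0.75 µs, and anything above 400 ticks makes one iteration span two sample periods. That matches "every second piece of data" being missed, assuming the loop simply picks up the newest sample once it is free. The value 450 is a hypothetical stand-in for "more than 400".

```python
CLOCK_HZ = 40_000_000  # assumed 40 MHz FPGA clock


def budget_ticks(period_us: int, clock_hz: int = CLOCK_HZ) -> int:
    """Clock ticks available per sample period (integer math, no rounding)."""
    return period_us * clock_hz // 1_000_000


def periods_spanned(loop_ticks: int, budget: int) -> int:
    """Number of sample periods one loop iteration occupies (ceiling division).

    1 means every sample is processed; 2 means every second one is missed.
    """
    return -(-loop_ticks // budget)


budget = budget_ticks(10)                 # 10 us period from the post -> 400
for loop_ticks in (30, 450):              # 450: hypothetical "more than 400"
    us = loop_ticks * 1_000_000 / CLOCK_HZ
    print(f"{loop_ticks} ticks = {us} us, spans "
          f"{periods_spanned(loop_ticks, budget)} sample period(s)")
# 30 ticks = 0.75 us, spans 1 sample period(s)
# 450 ticks = 11.25 us, spans 2 sample period(s)
```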
