07-12-2024 10:30 AM
Hi,
I am trying to implement a fast measurement and control loop on the FPGA of a NI9049 cRIO. I will use a NI9220 to acquire 16 AI, plus m NI9477 modules to control m·32 digital outputs (pulses). Because of the large number of DO, using arrays simplifies the design a lot.
I have implemented user-controlled I/O sampling targeting 10 us per cycle (i.e. the 100 kHz rate of the NI9220). The loop contains (1) the AI, (2) some logic on the inputs that times the pulses and controls the pulse duration, (3) the DO, and (4) a FIFO for communication with the RT side, which is in charge of data logging and other parts of the code.
However, the code is not able to meet the specified 10 us per loop. To locate the culprit, I added case structures that deactivate different components of the code, and found that the problem is in the array operations (3). I have tried a few options: (a) for loops, or (b) sums, multiplications and logical operations applied to the whole array.
With option (a) I can get down to 20 us; with option (b), to my surprise, I get stuck at 25 us. In either case, I am far from the expected 10 us, and this is for 3 NI9477, not for the planned 7 cards.
Any idea of what to use? I am certainly surprised, since I didn't expect these simple operations to affect the performance so much.
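To put the numbers above in clock ticks, here is a rough sketch of the timing budget. It assumes a 40 MHz top-level FPGA clock, which the post does not state, and a purely sequential pass over the output channels. The function names are made up for illustration.

```python
# Rough timing-budget arithmetic for the loop described above.
# ASSUMPTION: 40 MHz top-level FPGA clock (not stated in the post).
CLOCK_HZ = 40_000_000
CHANNELS_PER_MODULE = 32  # 32 DO per NI9477, as stated in the post


def ticks_in(period_us: float, clock_hz: int = CLOCK_HZ) -> int:
    """Number of clock ticks that fit in a period given in microseconds."""
    return round(period_us * clock_hz / 1_000_000)


def ticks_per_channel(period_us: float, modules: int) -> float:
    """Tick budget per channel if the channels are handled one after another."""
    return ticks_in(period_us) / (modules * CHANNELS_PER_MODULE)


if __name__ == "__main__":
    print(ticks_in(10))                 # budget for the 10 us target
    print(ticks_in(20), ticks_in(25))   # measured loop times, in ticks
    print(ticks_per_channel(10, 3))     # 3 modules (96 channels)
    print(ticks_per_channel(10, 7))     # 7 modules (224 channels)
```

Under that clock assumption the 10 us target is 400 ticks per iteration. A sequential pass over 7 modules (224 channels) would leave fewer than 2 ticks per channel, so the channels would have to be processed in parallel to fit.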
Version a: FPGAmain.vi
Version b: FPGAmain_v2.vi
Operation without the array operations (version b):
Operation with the array operations (version b):
07-12-2024 12:35 PM
Hi Litos,
At first glance, I see a few problems with your FPGA code:
- Reading / writing large arrays on the front panel at each iteration can be slow. Controls / indicators on the front panel of the top-level FPGA VI are actually IO registers that are made to communicate with the outside. Use shift registers, FPGA registers or FPGA Memories instead if you want to store data internally.
- Race conditions: you are reading and writing the controls "MODx control" / "MODx request" at each iteration, and you also expect the external user to set their values. The user inputs will most likely be overwritten by the writes at the end of the iteration. A FIFO mechanism for sending user commands might be more appropriate. That would also save resources compared to having these large I32 array registers.
- Case structures rarely shorten execution time on FPGA, because all code paths are compiled and turned into hardware circuits that "propagate" in parallel. When every case has finished propagating, the output of the structure is selected from the case that was "active". This also adds a selection mechanism that takes even more FPGA resources, especially with large arrays. If you really want to disable parts of the code, use a conditional disable structure instead and recompile.
- Your "cooldown" and "pulse" mechanism seems a bit unclear. Could you explain in more detail what you are trying to do, and how you are going to use it from your RT program?
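As a rough illustration of the race condition point, here is a small Python simulation (not LabVIEW code). It models the worst case, where the host write always lands in the middle of an iteration and the loop then writes back the value it read at the start. The function names and the write-back behaviour are assumptions made for the sketch, not taken from the attached VIs.

```python
from collections import deque


def run_with_register(host_writes, iterations):
    """Shared register: the loop writes its own value back every iteration."""
    register = 0
    seen = []
    for i in range(iterations):
        value = register                # loop reads the control at the start
        seen.append(value)
        if i in host_writes:            # host writes while the iteration runs
            register = host_writes[i]
        register = value                # loop writes back: host value is lost
    return seen


def run_with_fifo(host_writes, iterations):
    """Command FIFO: host writes are queued and consumed by the loop."""
    fifo = deque()
    current = 0
    seen = []
    for i in range(iterations):
        if fifo:                        # take a pending command, if any
            current = fifo.popleft()
        seen.append(current)
        if i in host_writes:            # host queues a command mid-iteration
            fifo.append(host_writes[i])
    return seen


if __name__ == "__main__":
    commands = {1: 5}                   # host sends the value 5 during iteration 1
    print(run_with_register(commands, 4))
    print(run_with_fifo(commands, 4))
```

In this model the register version never sees the host's value, while the FIFO version picks it up on the next iteration. Real timing is less regular than this, so some host writes would survive in practice, but which ones survive is not under your control.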
Regards,
Raphaël.
07-12-2024 03:31 PM
It'll end up taking more space but you could try manually unrolling the for loops in a SubVI. You know how large the input arrays will be so you could copy that code 32(?) times while indexing and rebuilding the arrays to make sure the FPGA is doing the operations in parallel.
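A text-language sketch of the unrolling idea, since the real thing is a LabVIEW diagram. The per-element logic here (output is on while a counter is below a duration) is a hypothetical stand-in for the actual pulse logic, and the width is cut down to 4 to keep it readable.

```python
WIDTH = 4  # kept small for readability; the thread has 32 outputs per module


def pulse_outputs_loop(counters, durations):
    """Loop form: one element is handled per iteration."""
    out = []
    for i in range(WIDTH):
        out.append(counters[i] < durations[i])
    return out


def pulse_outputs_unrolled(counters, durations):
    """Unrolled form: one independent expression per element."""
    o0 = counters[0] < durations[0]
    o1 = counters[1] < durations[1]
    o2 = counters[2] < durations[2]
    o3 = counters[3] < durations[3]
    return [o0, o1, o2, o3]             # rebuild the array from the elements


if __name__ == "__main__":
    counters = [0, 5, 2, 9]
    durations = [3, 5, 4, 1]
    print(pulse_outputs_loop(counters, durations))
    print(pulse_outputs_unrolled(counters, durations))
```

Python runs both versions sequentially, so this only shows the structure: none of the unrolled expressions depends on another, which is what lets each one become its own circuit on the FPGA and run at the same time as the rest.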
07-16-2024 11:21 AM
I tried a couple of things to solve the issue of the slow array operations. (1) Interestingly, putting the array operations in a SCTL allowed the loop to meet the 10 us computation time... but at the cost of the DSPs being allocated for the math during compilation!
(2) As Jacobson suggested, doing the maths element by element solved the issue, as it forced the compiled code to do the maths in parallel.
I am not very happy that the user is forced to do this manually, as it is time-consuming and annoying... but it now seems to be working properly.
07-16-2024 06:01 PM