I wish... FPGA Best Practices
So I'm starting a new thread to discuss the possibility of a presentation covering FPGA 'Best Practices', or common pitfalls/design choices.
This began in the main "I wish there was a presentation on..." thread but is continued here to avoid derailing that thread too much further.
I'd like a presentation on common patterns in FPGA code.
This presentation from 2014(?) is somewhat similar: LabVIEW FPGA Design Patterns and Best Practices (NIWeek 2014?)
I'd like to know about common mistakes and the better way of writing FPGA-based code.
Tom McQuillan: Possible presentation on GoF designs (with Sam Taggart?)
Me: Might not be exactly what I'm imagining - GoF patterns often require dynamic dispatch.
Terry Stratoudakis (Terry_ALE):
I am interested in this but first some comments and questions.
... software optimizations and techniques are mostly single core minded where on an FPGA things are spatial and so forth.
Has a new thread for this been made?
Yes, here now.
I gave a talk in May 2020 https://www.youtube.com/watch?v=i_nC_sGOqUw&t which talks about some of these techniques at a general level. How does this compare to what you are looking for?
I enjoyed that presentation but it mainly focused (as I understand it) on making faster things possible.
My problems are not necessarily related to making things as fast as possible, but rather about making them as readable/conceptually understandable as possible.
There are good LabVIEW FPGA shipping examples that have best practices as well.
Perhaps a review of the best of these examples could form the beginning of this hypothetical presentation (or, if nobody submits this presentation, I'd be happy to receive some pointers here).
Other best practices can be found in the VST2 and RTSA code but they are not openly available. A talk could be made that speaks to those practices without revealing the code.
Also, what is the typical application and NI hardware (e.g. cRIO or PXI)?
For me, cRIO, but I'd like to think that the problems I'm facing might not be specific to the hardware or the clock speeds. I guess that as speeds get faster and faster, more sacrifices to readability might need to be made though...
To give a concrete example of what I might mean with regards to pitfalls/design choices, I'll describe some cRIO code I've been recently rewriting.
My system uses some NI-9402 modules to communicate via SPI with a PCB that I designed, which contains an ADC and an "octal switch" (see ADG714). The switch controls various "almost static" inputs to the ADC, for example the shutdown, reset and oversampling digital inputs.
Most of the time, the ADC acquires continuously (this could be controlled by either the RT system, or by a switch using an NI-9344 module). The results are streamed over DMA FIFO to the RT system, which bundles them together in nice packages for communication to a desktop system, for logging, display, further analysis, etc.
Sometimes we might want to change some settings - e.g. oversampling ratio, or the sample rate, etc. To change something like the oversampling rate, the ADC must stop acquiring, the ADG needs to be updated with new values, the ADC must be reset (again requiring a pair of changes to the ADG switches), and then the sampling should resume.
Previously, the code ran in a sort of nested state machine structure. To update the settings, the RT system would change some FPGA controls, then set a boolean ("Requesting Update", or something) to true. The FPGA would poll that control, then step through a series of states along the lines of "Updating", "Finished Updating" and "Ready to Acquire". The RT system would wait for "Ready to Acquire", then empty the FIFO, then set "Start" to true, resuming the acquisition.
This required lots of different booleans, and states, and seemingly worked at best "most" of the time. Clearly there were some situations in which the end state was not valid, but digging into this mess was pretty tricky - keeping the changes to state in your head continuously wasn't very practical.
This situation was vastly simplified by a recent change I made - now, the FPGA will always acquire a "block" of data of a certain length, depending on an enum "Sample Rate" value, which also includes the number of channels to sample (e.g. typical values are "10kHz x 8Ch", or "50kHz x 3Ch", or similar).
The DMA sends a 'header' element that conveys the contents of the upcoming block - how many elements, how many channels do they represent?
By promising to always output that number of elements (even if some of them are 0, because the acquisition died due to e.g. power failure to the board, or a broken wire, or whatever), the RT system is much simpler.
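As a rough illustration of why the RT side gets simpler, here is a minimal Python sketch of a reader that consumes one header and then exactly the number of elements it promises. The real code is LabVIEW; the header layout (channel count in the top 8 bits, samples per channel in the low 24), the channel-interleaved ordering, and the names `pack_header`, `unpack_header` and `read_block` are all assumptions made for this example, and a `deque` stands in for the DMA FIFO.

```python
from collections import deque
from typing import Deque, List, Tuple


def pack_header(n_channels: int, n_samples: int) -> int:
    """Hypothetical header layout: channels in the top 8 bits,
    samples per channel in the low 24 bits of one 32-bit element."""
    assert 0 < n_channels < 256 and 0 <= n_samples < (1 << 24)
    return (n_channels << 24) | n_samples


def unpack_header(word: int) -> Tuple[int, int]:
    """Inverse of pack_header: returns (n_channels, n_samples)."""
    return word >> 24, word & 0xFFFFFF


def read_block(fifo: Deque[int]) -> List[List[int]]:
    """Read one header, then exactly the number of elements it promises.

    Assumes samples are channel-interleaved (ch0, ch1, ..., ch0, ch1, ...).
    Returns one list of samples per channel.
    """
    n_channels, n_samples = unpack_header(fifo.popleft())
    flat = [fifo.popleft() for _ in range(n_channels * n_samples)]
    return [flat[ch::n_channels] for ch in range(n_channels)]
```

Because the element count is known before the data arrives, the reader never has to guess where one configuration ends and the next begins; zeros inside a block just mean the acquisition failed for those samples.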
Now, a new setting request can simply be enqueued on a FIFO to the FPGA, and when the end of a block is reached, the FIFO can be checked to see if it should continue sampling, or change something.
No complicated handshaking is necessary between RT and FPGA.
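The FPGA side of that idea might look something like the following Python simulation. It is only a sketch under assumptions: `Setting`, `run_blocks` and `read_sample` are hypothetical names, the header packing is invented for the example, and a `deque` stands in for the RT-to-FPGA settings FIFO. The point it tries to show is that the settings FIFO is only looked at on a block boundary, and that a block is always padded out to its promised length.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass(frozen=True)
class Setting:
    n_channels: int
    samples_per_block: int  # samples per channel in one block


def run_blocks(
    settings_fifo: Deque[Setting],
    initial: Setting,
    read_sample: Callable[[int], Optional[int]],
    n_blocks: int,
) -> List[int]:
    """Simulate the acquisition loop and return the elements sent to RT.

    read_sample(ch) returns None when the acquisition failed
    (e.g. broken wire); the block is then padded with 0.
    """
    out: List[int] = []
    current = initial
    for _ in range(n_blocks):
        # Header first (hypothetical layout: channels << 24 | samples).
        out.append((current.n_channels << 24) | current.samples_per_block)
        for _ in range(current.samples_per_block):
            for ch in range(current.n_channels):
                value = read_sample(ch)
                out.append(0 if value is None else value)
        # Only at the end of a block: is there a new setting waiting?
        if settings_fifo:
            current = settings_fifo.popleft()
            # Stopping, updating the switches and resetting the ADC
            # would happen here, before the next block starts.
    return out
```

A request enqueued mid-block simply takes effect at the next boundary, so the RT system never needs to wait on status booleans.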
I don't know that this is a common problem, or a common solution (enforcing a block of data rather than individual elements, or e.g. 1 sample cycle with N results (one per channel sampled)), but it wouldn't surprise me in hindsight to learn that it was. If I'd considered this approach a long time ago, I could probably have saved a non-negligible amount of time and effort.
At the same time, modifying various parts of the code to use objects and simpler abstractions (e.g. a VI that carries out "Pulse Reset", rather than setting the ADG switches value to 28, then setting "Update Switches", then waiting for "Finished", then setting the values to 12, then "Update Switches", then...) makes it easier to spot problems in the code. For example, the ADC is triggered by a pulse on one line, but the results are actually transferred partially during the next sampling cycle. Previously, if the sample rate increased, it was possible for the "Conversion Start" line to pulse repeatedly during the transfer of a previous sample, leading to a whole collection of "Start Time" values being put on a FIFO with no accompanying data.
Now, it's clearer that this can be a problem, and when the sample rate changes, an additional pause is given between the last CONVST of the previous "block" and the first CONVST of the new block at a different rate.
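To make those two ideas a bit more concrete, here is a small Python sketch (the real code is LabVIEW, so this only mirrors the shape). `SwitchBank`, `pulse_reset` and `first_convst_tick` are hypothetical names. The values 28 and 12 are the ones mentioned above, but which switch each bit drives, and whatever steps follow them, are not shown. The pause rule (wait for the longer of the transfer time and the new sample period) is an assumption about one reasonable way to do it, not necessarily what my code does.

```python
from typing import List, Tuple

# Values quoted in the post; the bit meanings are not given there, and
# any further steps after the second write are elided in the post.
RESET_PULSE_STEPS: Tuple[int, ...] = (28, 12)


class SwitchBank:
    """Stand-in for the octal switch: records each value written.

    In the real system, update() would set the switch values, trigger
    "Update Switches" and wait for "Finished".
    """

    def __init__(self) -> None:
        self.writes: List[int] = []

    def update(self, value: int) -> None:
        if not 0 <= value <= 0xFF:
            raise ValueError("octal switch takes an 8-bit value")
        self.writes.append(value)


def pulse_reset(switches: SwitchBank) -> None:
    """One named operation instead of a hand-written write/wait sequence."""
    for value in RESET_PULSE_STEPS:
        switches.update(value)


def first_convst_tick(
    last_convst_tick: int, transfer_ticks: int, new_period_ticks: int
) -> int:
    """Earliest tick for the first CONVST of a new block (assumed rule).

    Never earlier than the end of the previous sample's transfer, so a
    faster new rate cannot pulse CONVST during that transfer.
    """
    return last_convst_tick + max(transfer_ticks, new_period_ticks)
```

With the sequence wrapped in one call, the caller reads as "pulse reset, then resume", and the timing constraint between blocks lives in one place rather than being implied by several state transitions.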