FPGA Target Scoped FIFO Race Condition

notadoctor · 09-22-2016

Howdy folks,

We have recently stumbled upon an interesting race condition in our product's code. We use a target scoped FIFO to collect status info from various places in the FPGA, then read the target scoped FIFO and pass the packets directly to DMA. On the host side, we've been getting packets that contain data that was never written (our data consists of typedef headers followed by a data element, not just random data, so if the header is corrupt we know something went wrong).

After investigation, we realized that a target scoped FIFO write that is Not Arbitrated can result in corrupt data when two writes occur on the same clock tick of the FPGA. I've attached a dumbed-down VI showing my issue. Setting Data0=0, Data1=1, Data2=2 results in a received data packet of 3. I believe the FIFO is overlapping the inputs and doing a Boolean OR, but it's hard to say, as other possibilities are tough to eliminate. In any case, the data is clearly corrupt.
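A minimal Python sketch of the OR hypothesis, for what it's worth. This only models the suspected behaviour; it says nothing about how the FIFO is actually implemented, and `colliding_write` is a made-up name. It also shows why 0, 1, 2 is a weak test vector: a bitwise OR and a plain sum both give 3, whereas inputs with overlapping bits (for example 1, 1, 2) would tell the two apart.

```python
from functools import reduce
from operator import or_

def colliding_write(values):
    """Model the hypothesis: writes landing on the same tick are bitwise ORed."""
    return reduce(or_, values, 0)

# The values from the attached VI: OR and sum agree, so they cannot be distinguished.
print(colliding_write([0, 1, 2]), sum([0, 1, 2]))

# Values that share a set bit: OR and sum now disagree.
print(colliding_write([1, 1, 2]), sum([1, 1, 2]))
```

If the hardware returns 3 for the second set of inputs, that is consistent with an OR; if it returns something else, the hypothesis is wrong.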

NI's solution would be to select a different built-in Arbitration option. Our writes are all in single-cycle loops, so selecting a built-in Arbitration option would require pulling the code out of the loop, which blows up our fabric usage to the point that the design no longer compiles, so we must manually arbitrate. I am looking for suggestions on how best to handle this issue in a way that takes up the least space.

We have discussed using 30 individual target scoped FIFOs (one for each write), using 30 or so registers (which introduces the potential for lost packets due to overwriting before a read), using some sort of semaphore implementation (but what if two writers try to grab the semaphore on the same clock tick?), and a few other off-the-wall options.
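To make the register idea concrete, here is a rough behavioural model in Python, not LabVIEW FPGA code. It assumes each write site gets a one-deep slot (a register plus a valid flag) and that a single loop drains the slots round-robin into the shared FIFO, so only one place ever writes the FIFO. All names (`RoundRobinArbiter`, `offer`, `tick`) are hypothetical, and the model ignores how much fabric the scan would cost on a real target.

```python
from collections import deque

class RoundRobinArbiter:
    """Behavioural model: N one-deep slots drained by a single FIFO writer."""

    def __init__(self, n_sources):
        self.slots = [None] * n_sources  # one register + valid flag per source
        self.next_idx = 0                # slot at which the next scan starts
        self.fifo = deque()              # stands in for the shared FIFO

    def offer(self, src, packet):
        """Source-side write. Returns False if the slot is still occupied,
        so the caller holds its packet instead of overwriting the old one."""
        if self.slots[src] is not None:
            return False
        self.slots[src] = packet
        return True

    def tick(self):
        """One clock tick: forward at most one pending packet to the FIFO.
        Returns the index of the source served, or None if all slots are empty."""
        n = len(self.slots)
        for offset in range(n):
            i = (self.next_idx + offset) % n
            if self.slots[i] is not None:
                self.fifo.append(self.slots[i])
                self.slots[i] = None
                self.next_idx = (i + 1) % n  # rotate priority past the winner
                return i
        return None

# Three sources "write" on the same tick; the arbiter serialises them.
arb = RoundRobinArbiter(3)
for src, data in enumerate([0, 1, 2]):
    arb.offer(src, data)
for _ in range(3):
    arb.tick()
print(list(arb.fifo))
```

The valid flag is what addresses the lost-packet worry: a source whose slot is still full sees `False` and has to retry, rather than silently overwriting. Because only the draining loop decides who goes next, there is no shared semaphore for two writers to grab on the same tick.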

Is there an intra-FPGA communication method we are overlooking? Has anyone seen this issue before, and if so, how did you solve it? What, in your opinion, is the least space-consuming option we have?

Thanks in advance everyone,

Patrick

luda.mattski · 09-23-2016

Hey notadoctor,

What you're seeing here definitely makes sense. Naturally, writing from multiple loops to one memory location at such a high rate will cause a race condition. I don't know the low-level details of how those FIFOs are treated in the compiler, but it stands to reason that arbitrating multiple writes to one FIFO is going to be very difficult to fit into a SCTL within one clock tick.

With that in mind, if you are looking for reliable performance, having 30 FIFOs is probably the best option. It won't be memory efficient, but perhaps you could play around with having slightly fewer FIFOs than SCTLs and hope that writing from two SCTLs to one FIFO will arbitrate correctly.

Matt | NI Systems Engineering

notadoctor · 09-28-2016

Hi Matt,

Thanks for the response. It is indeed a known issue with a known solution; we're just backed into a corner and have limited options. Unfortunately, using unique FIFOs for each write location turned out not to be a good option, as we update status and alarm data from 100+ places in our FPGA.

I've also tried moving FIFO writes out of SCTLs and turning on Arbitration, and have tried moving FIFO writes out of SCTLs into non-reentrant subVI calls. Interestingly, both failed compilation with the exact same lack of space. Perhaps arbitration is simply making the write non-reentrant during compilation.

What sort of asynchronous communication on FPGA is best suited for writes from many locations in the FPGA?

If the answer is FIFOs, then we are basically forced to arbitrate and deal with recovering the space elsewhere.

Best,

Patrick

yea_likethecity · 09-30-2016

Patrick,

Have you considered merging some of the SCTLs? You shouldn't need more than one SCTL per available clock on the FPGA. You may be able to skip some of the FIFO stuff if you run some operations in parallel in the same SCTL. You could also programmatically create the FIFOs, which would allow you to bundle them together and move them around your code instead of having 30 FIFOs in your project.

(Edit: spelling)

Austin
Staff Software Engineer
NI

James_McN · 09-30-2016

I'm afraid I don't have a solution, but I might be able to rule a few things out.

I presume the data is sequential and can't be overwritten?

I did think you could use a single memory block and write to different portions of that. You would need some sort of synchronisation to implement a FIFO on top of this. However, if you are using the block memory on the FPGA, the issue is that this is a specific shared resource with 2 ports (normally 1 read, 1 write). Even with memory, you will run into arbitration issues here.

I can't remember if this is still the case with a LUT implementation. It will be difficult either way, as you will need some sort of synchronisation to say where to read from, and it will be difficult or impossible to avoid your race conditions.

I think if you are using SCTLs and need multiple access, you are always going to hit issues with a single shared resource. Either increasing the number of FIFOs/resources or merging the SCTLs somehow may be the only route. It's hard to say without seeing your code and what/how you're sending, but merging might be my first approach, since managing 30x FIFOs will be a PITA! But you're limited in how much data can be in a single element.

James Mc
========
CLA and cRIO Fanatic
My writings on LabVIEW Development are at devs.wiresmithtech.com
