

FPGA FIFO BUG?

In essence, the question boils down to:

If I write to one built-in BRAM FIFO in one loop (every second iteration @ 80MHz) and read from it in another loop (every iteration @ 40MHz), and then send the value back over a second built-in BRAM FIFO, what is the latency between sending a new value and receiving it back via the FIFOs?

My tests indicate it's in the region of more than 20 cycles.

Shane

Message 11 of 18

@Intaris wrote:

The data goes to a DAC via a CLIP and is then read back in from the ADC, again via a CLIP. All communication is at the FPGA level only.


Does the data actually pass through the DAC and back in through the ADC? Depending on the settings of the particular DAC and ADC you are using, you will see tens of clock cycles of latency before a sample makes its way through either of those components.

The CLIP can also add several clock cycles of latency, depending on how it serializes/deserializes the data passed to and from the ADC and DAC.

Message 12 of 18

Like I said, I can account for certain delays, but a delay of nearly 1 microsecond for a 40MHz DAC and ADC seems a bit excessive to be purely a hardware issue.

I'm running my modulator at 160MHz and the DAC at 40MHz (I have 4 modulators). I'm seeing nearly 160 cycles of my fast loop between sending a value to my hardware loop (which runs at 80MHz, alternating between two states; I read from the FIFOs in every second iteration of the hardware loop) and reading it at the input of my demodulator (also running at 160MHz).

That's a lot of cycles. I have multiple FIFOs between the fast loop and the slow loop, one for each channel, which means each FIFO is written to and read from at a rate of 40MHz. According to my calculations, I have a delay of exactly 32 cycles at 40MHz, which I have trouble attributing to hardware (that corresponds to a whopping 128 cycles of my 160MHz loop). The CLIP DOES add some delay, but tests made with a completely different architecture show a delay that is smaller by a large margin.
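To keep the units straight across clock domains, here is a small Python helper that converts the figures quoted above: 32 cycles at 40MHz is 800 ns, which is 128 ticks of a 160MHz loop. The function names are illustrative only and not part of any NI API.

```python
def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Duration in nanoseconds of `cycles` ticks of a `clock_mhz` MHz clock."""
    return cycles * 1000.0 / clock_mhz


def convert_cycles(cycles: float, from_mhz: float, to_mhz: float) -> float:
    """Express a latency counted at `from_mhz` as ticks of a `to_mhz` clock."""
    return cycles * to_mhz / from_mhz


if __name__ == "__main__":
    delay_ns = cycles_to_ns(32, 40.0)              # 32 cycles at the 40MHz rate
    fast_ticks = convert_cycles(32, 40.0, 160.0)   # same delay in 160MHz ticks
    print(f"{delay_ns:.1f} ns = {fast_ticks:.0f} cycles at 160MHz")
    # prints: 800.0 ns = 128 cycles at 160MHz
```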

 

But despite all of this, my question remains: what is the expected latency when passing data via a built-in Block RAM FIFO? The help states something along the lines of 5 or 6 cycles (read/write cycles, I assume). Would it help if I wrote to my FIFOs every second cycle of the 160MHz loop and read them in EVERY iteration of my 80MHz loop? If the latency is counted in read/write cycles, this would effectively halve the absolute time associated with it.
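A quick way to see what the proposed change would buy, assuming (as the question does) that FIFO latency is counted in read/write operations rather than raw clock ticks. The 5-operation figure and the function name are assumptions for illustration; if the latency turns out to be a fixed number of clock ticks, this arithmetic does not apply.

```python
FIFO_LATENCY_OPS = 5  # assumed; the help quotes "5 or 6"


def fifo_delay_ns(access_rate_mhz: float, latency_ops: int = FIFO_LATENCY_OPS) -> float:
    """Absolute delay if the FIFO needs `latency_ops` accesses, each one
    access period (1 / access_rate) long, before data appears."""
    return latency_ops * 1000.0 / access_rate_mhz


if __name__ == "__main__":
    # Current scheme: each channel FIFO is serviced at 40MHz.
    # Proposed scheme: written every 2nd cycle at 160MHz, read every cycle at 80MHz.
    for rate_mhz in (40.0, 80.0):
        print(f"accessed at {rate_mhz:.0f} MHz -> {fifo_delay_ns(rate_mhz):.1f} ns")
    # prints 125.0 ns for 40MHz and 62.5 ns for 80MHz
```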

 

I realise this is really getting into the nitty-gritty of implementations, but I don't like using functions like this without more detailed information about how they work and what certain design decisions mean for particular applications.

Shane

Message 13 of 18

@Intaris wrote:

Like I said, I can account for certain delays, but a delay of nearly 1 microsecond for a 40MHz DAC and ADC seems a bit excessive to be purely a hardware issue.


I can't speak to the hardware that you are using, but with the hardware I've used this wouldn't be excessive when operating at 40MHz. For some perspective, here are some numbers from a product I'm more experienced with.

The 5782 by default runs its ADC at 250MHz, its DAC at 500MHz, and the SCTL in which their respective IO nodes reside at 125MHz. Using the default settings, the round trip through the CLIP, through the DAC, through the wire looping the DAC to the ADC, through the ADC, and back through the CLIP takes about 300 nanoseconds. If you enable interpolation on the DAC, this increases to around 500 nanoseconds.

Regarding the FIFO delay, I ran a quick test: it looks like, regardless of data type or clock domain, it takes 5 ticks inside an SCTL for data to make its way through a FIFO. Attached is the test bench I whipped up.
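To make the "5 ticks" figure concrete, here is a toy cycle-level model in Python: a value written on tick t only becomes readable on tick t + 5. The class and its methods are hypothetical and only mimic the measured single-clock-domain behaviour; they say nothing about how NI actually implements the FIFO.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class FixedLatencyFifo:
    """Toy model: a value written on tick t is readable from tick t + latency."""

    def __init__(self, latency: int = 5) -> None:
        self.latency = latency
        self.tick = 0
        self._pending: Deque[Tuple[int, int]] = deque()  # (ready_tick, value)

    def write(self, value: int) -> None:
        self._pending.append((self.tick + self.latency, value))

    def read(self) -> Optional[int]:
        """Pop the oldest value if it is visible on this tick; otherwise return
        None, roughly like a zero-timeout read that times out."""
        if self._pending and self._pending[0][0] <= self.tick:
            return self._pending.popleft()[1]
        return None

    def clock(self) -> None:
        """Advance the model by one clock tick."""
        self.tick += 1


def measure_latency(latency: int = 5) -> int:
    """Write one value on tick 0 and count ticks until a read succeeds."""
    fifo = FixedLatencyFifo(latency)
    fifo.write(42)
    waited = 0
    while fifo.read() is None:
        fifo.clock()
        waited += 1
    return waited


if __name__ == "__main__":
    print(measure_latency())    # 5
    print(measure_latency(6))   # 6, e.g. with one extra cycle of control logic
```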

Message 14 of 18

I can't open the files; your LV version is newer than mine.

So am I correct in saying that the FIFO essentially acts as if it has five "invisible" elements that need to be written before any can be read (which would be perfectly in line with the documentation)? If so, that's roughly what I'm seeing. I used "built-in" for the FIFO control logic, which apparently increases this by 1, making it 6. Both directions together make 12. That's a relatively large chunk of my 32 cycles.
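The budget in this post, written out as arithmetic. It assumes, as the post does, that the per-FIFO latency and the 32 observed cycles are counted in the same (40MHz) domain, and that "built-in" control logic adds exactly one cycle; both are assumptions, not measured facts.

```python
TICKS_PER_FIFO = 5          # from the quick test bench measurement
BUILT_IN_CONTROL_EXTRA = 1  # "apparently" +1 with built-in control logic (assumed)
FIFOS_PER_ROUND_TRIP = 2    # one FIFO in each direction
OBSERVED_CYCLES = 32        # total observed delay, in 40MHz cycles

fifo_cycles = (TICKS_PER_FIFO + BUILT_IN_CONTROL_EXTRA) * FIFOS_PER_ROUND_TRIP
unexplained = OBSERVED_CYCLES - fifo_cycles

print(f"FIFOs account for {fifo_cycles} of {OBSERVED_CYCLES} cycles "
      f"({fifo_cycles / OBSERVED_CYCLES:.1%})")
print(f"Left for everything else (CLIP, converters, other logic): {unexplained} cycles")
```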

 

Tomorrow I'll have to check the turnaround times without FIFOs (our last-generation FPGA code) so that I can use them as a baseline.

Even so, I could theoretically cut the 12 read cycles down to 6 by writing and reading twice as much data through the FIFO as is required (and discarding every second item). That way the "invisible" FIFO elements have less bearing.
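The proposed workaround, sketched in Python under the assumption that "writing double the amount" means sending each sample twice (one hypothetical reading); the reader keeps every second element. Note that this only helps if the latency is counted in FIFO elements; if it is a fixed number of clock ticks, as the 5-tick measurement suggests, doubling the write rate changes nothing. The function names are illustrative.

```python
from typing import List, Sequence


def oversample(samples: Sequence[int]) -> List[int]:
    """Writer side (hypothetical): push every sample twice, so the FIFO is
    accessed at twice the data rate."""
    out: List[int] = []
    for s in samples:
        out.extend((s, s))
    return out


def decimate(stream: Sequence[int]) -> List[int]:
    """Reader side: keep every second element, discard the duplicates."""
    return list(stream[::2])


if __name__ == "__main__":
    sent = [10, 20, 30]
    on_the_wire = oversample(sent)   # [10, 10, 20, 20, 30, 30]
    print(decimate(on_the_wire))     # [10, 20, 30]
```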

 

While I freely admit that this behaviour IS documented, it should perhaps be added to the excellent new document on high-performance FPGA techniques that was released earlier this year. That document is a treasure trove of great information, so whoever put it together: megakudos.

We don't use NI's DACs or ADCs; we drive our own custom electronics with the digital lines of the FPGA card (the CLIP runs at 200MHz). Our DAC has an analog output rate of 40MHz, and we send data to it in parallel: a full 14-bit value every 25ns, passed on to the DAC over 14+ digital lines. (I still need to check the ADC, but I thought it was the same.) Because the interface is parallel rather than serial, we don't expect any protocol issues with SPI or anything similar.
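For readers less familiar with parallel converter interfaces: sending a full 14-bit value every 25ns over 14 digital lines means each line carries one bit of the code, refreshed at 40MHz (1000 / 40 = 25 ns). A minimal Python sketch, assuming an unsigned code and bit i on line i; the real pin mapping and number format of this custom DAC are not given in the thread, and the names are hypothetical.

```python
from typing import List

DAC_BITS = 14                                # one digital line per bit
SAMPLE_RATE_MHZ = 40.0                       # analog output rate
WORD_PERIOD_NS = 1000.0 / SAMPLE_RATE_MHZ    # 25.0 ns per parallel word


def word_to_lines(code: int, bits: int = DAC_BITS) -> List[bool]:
    """Split an unsigned DAC code into line levels, LSB first (line i = bit i).
    The bit-to-line mapping is a hypothetical assumption."""
    if not 0 <= code < (1 << bits):
        raise ValueError(f"code {code} does not fit in {bits} bits")
    return [bool((code >> i) & 1) for i in range(bits)]


def lines_to_word(lines: List[bool]) -> int:
    """Inverse of word_to_lines: rebuild the code from the line levels."""
    return sum(1 << i for i, level in enumerate(lines) if level)


if __name__ == "__main__":
    levels = word_to_lines(5)
    print(levels[:3], WORD_PERIOD_NS)   # [True, False, True] 25.0
```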

 

I'll check the baseline numbers and report back. But it's good to know that at least some of my observed delays are already explained.

As always, thanks for taking the time to humour me with my weird questions.

Shane

PS: I've already built in a custom delay on the demodulator side (a single Block RAM node used as a big shift register to delay the data by the amount I am seeing). This doesn't really change anything on the output side, but I no longer have much of a frequency-dependent phase shift. Before, the phase went from 0 to 360 degrees between 0 and 1.25 MHz. Now there is barely any shift.
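A pure delay τ shifts a tone of frequency f by 360·f·τ degrees, so a constant delay shows up as a phase that grows linearly with frequency. With τ = 800 ns (the 32 cycles at 40MHz quoted earlier), the lag reaches a full 360° at exactly 1.25 MHz, consistent with the sweep described in the PS. The delay itself can be modelled as a circular buffer, one common way to use a single block of RAM as a long shift register. This is a minimal Python sketch; the class and function names are hypothetical and it is not the LabVIEW FPGA memory API.

```python
from typing import List


def phase_shift_deg(freq_mhz: float, delay_ns: float) -> float:
    """Phase lag in degrees that a pure delay causes at a given frequency."""
    return 360.0 * freq_mhz * delay_ns / 1000.0


class DelayLine:
    """Software model of a RAM-based delay line: each sample is written to one
    address and read back `depth` samples later, using a rotating address
    instead of physically shifting data."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self._mem: List[int] = [0] * depth
        self._addr = 0

    def step(self, sample: int) -> int:
        delayed = self._mem[self._addr]    # read the oldest stored sample
        self._mem[self._addr] = sample     # overwrite it with the newest
        self._addr = (self._addr + 1) % len(self._mem)
        return delayed


if __name__ == "__main__":
    print(phase_shift_deg(1.25, 800.0))                # 360.0
    line = DelayLine(3)
    print([line.step(x) for x in [1, 2, 3, 4, 5]])    # [0, 0, 0, 1, 2]
```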

Message 15 of 18

@Intaris wrote:

I can't open the files, LV version is newer than mine.


The project may be weird, since it uses a target that doesn't exist in 2011, but the FPGA VI should be fine.

Message 16 of 18

False alarm: the old version shows very similar delays.

It seems the FIFOs are just a very small part of the overall delay observed. Thanks for the answers; they helped me anyway.

Shane.

Message 17 of 18

Just a small follow-up.

I checked out the benchmark code you posted and immediately noticed that the BRAM FIFOs were transferring only within a single clock domain; there was no clock domain crossing going on.

I implemented my own version of the test with an 80MHz loop writing to a FIFO (name "There", datatype I32), which was read in a 160MHz loop and passed straight to a return FIFO (name "Back again", datatype I32), which was then read again in the 80MHz loop. By passing the loop iteration counter at the time of activation, the round trip could be measured in relation to the 80MHz process clock.

I got 7 cycles for the "there and back again" task. I presume this is to be interpreted as 5 clock cycles per FIFO in the appropriate writing domain. I would thus expect that transferring from an 80MHz loop to a 160MHz loop takes longer than transferring from a 160MHz loop to an 80MHz loop (in absolute time terms). Is that correct? I suppose the crux of my question is: which loop controls the propagation, the writer or the reader?
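The two hypotheses in this question can be written down directly. A minimal Python sketch, assuming each FIFO costs a fixed 5 ticks of either its writer's clock or its reader's clock; the function names are illustrative.

```python
def fifo_ns(ticks: int, clock_mhz: float) -> float:
    """Duration of `ticks` cycles of a `clock_mhz` MHz clock, in ns."""
    return ticks * 1000.0 / clock_mhz


def round_trip_cycles(slow_mhz: float, fast_mhz: float, ticks: int = 5,
                      writer_paced: bool = True) -> float:
    """Round trip slow -> fast -> slow, in cycles of the slow loop.

    writer_paced=True:  each FIFO costs `ticks` cycles of the loop writing it.
    writer_paced=False: each FIFO costs `ticks` cycles of the loop reading it.
    """
    if writer_paced:
        there = fifo_ns(ticks, slow_mhz)   # "There" is written by the slow loop
        back = fifo_ns(ticks, fast_mhz)    # "Back again" is written by the fast loop
    else:
        there = fifo_ns(ticks, fast_mhz)   # "There" is read by the fast loop
        back = fifo_ns(ticks, slow_mhz)    # "Back again" is read by the slow loop
    return (there + back) * slow_mhz / 1000.0


if __name__ == "__main__":
    print(round_trip_cycles(80.0, 160.0, writer_paced=True))    # 7.5
    print(round_trip_cycles(80.0, 160.0, writer_paced=False))   # 7.5
```

Both variants give 93.75 ns, i.e. 7.5 cycles of the 80MHz loop, close to the measured 7. Because the two hypotheses only swap which leg is the slow one, they predict the same round trip; telling writer-paced from reader-paced latency apart would need a one-way measurement in each direction.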

 

Shane.

Message 18 of 18