For a DMA-FIFO running from host to 7976 is there an optimal datatype for FIFO?


For transmission to the 7976 from the Host, I can pack my data into say U64s or break it up into U8s.  Is there a datatype that will give best throughput?  Maybe based on the bus (PXIe) or the RIO drivers implementation behind the scenes?


Certified LabVIEW Architect, Certified Professional Instructor
ALE Consultants

Introduction to LabVIEW FPGA for RF, Radar, and Electronic Warfare Applications
Message 1 of 9

My experience, although with an older version of LabVIEW and a different board, is that you'll run into a limit on the number of elements that can be transferred per second. For maximum speed, your best bet is to make each element as wide as possible; use a 64-bit data type if it's convenient to do so.

Message 2 of 9
Solution
Accepted by topic author Terry_ALE

My experience is different.  I once ran a load of tests with DMA FIFOs of different widths to the FPGA and measured throughput and latency.  I saw the same total bits/sec whether sending data over U8, U16, or U32 DMAs.  My explanation for this?  Since the DMA width is 32 bits, LV does bit packing to ensure that every portion of the 32 bits is used.  That means a U8 DMA transfers four elements at once, a U16 DMA two at once, and a U32 DMA one at a time.  A 64-bit element is split over two individual transfers.

 

Just don't use FXP.  Even a 1-bit FXP is represented internally as 64 bits and will require TWO DMA transfers to get to the FPGA.

 

Aside from that, U8, U16, or U32 makes essentially no difference, as they are all packed to 32 bits internally.
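To make the packing argument concrete, here's a quick back-of-the-envelope model (plain Python, not LabVIEW; the 32-bit word rate is a made-up number for illustration, but the conclusion that bits/sec comes out identical for U8, U16, and U32 follows from the packing alone):

```python
# Model of 32-bit DMA bit packing: smaller elements are packed so
# every bit of each 32-bit word is used.
DMA_WORD_BITS = 32
WORDS_PER_SEC = 100_000_000  # hypothetical 32-bit-word transfer rate

bits_per_sec = {}
for elem_bits in (8, 16, 32):
    elems_per_word = DMA_WORD_BITS // elem_bits   # 4, 2, 1
    elems_per_sec = WORDS_PER_SEC * elems_per_word
    bits_per_sec[elem_bits] = elems_per_sec * elem_bits

# Every width yields the same total bit rate; only elements/sec differs.
# A 64-bit element (or a 1-bit FXP, stored internally as 64 bits)
# instead costs two 32-bit transfers per element.
for bits, rate in bits_per_sec.items():
    print(f"U{bits}: {rate / 1e9:.1f} Gbit/s")
```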

Message 3 of 9

@Intaris wrote:

...  I saw the same total bit/sec transfer whether sending data over U8, U16 or U32 DMAs.  ...


My experience is similar.  The difference is almost negligible.  I've seen signs (though I didn't invest the time to prove it) that the FPGA uses a little more real estate for U16 or U8, probably to handle fractions of a transfer.  But if the narrower type complicates your program at all, the extra complexity probably cancels out any saving.

 

There might be a minor difference if you are using Single-Cycle Timed Loops to read, but an SCTL isn't that common for FIFO reads [Edit: for the cases I deal with] (I never have), so I usually forget to point that out.

 

BTW, signed/unsigned makes no difference.  U32 and I32 perform and compile exactly the same.

___________________
CLD, CPI; User since rev 8.6.
Message 4 of 9

Thanks for the responses.

 

On the topic of FIFOs in SCTLs: I normally do put them into SCTLs.  Which cases don't work for FIFO reads in an SCTL?  Is it due to the host-target bus choking from time to time?

 


Certified LabVIEW Architect, Certified Professional Instructor
ALE Consultants

Introduction to LabVIEW FPGA for RF, Radar, and Electronic Warfare Applications
Message 5 of 9

@Intaris wrote:

My experience is different.  I once ran a load of tests with DMA FIFOs of different widths to the FPGA and measured throughput and latency.  I saw the same total bits/sec whether sending data over U8, U16, or U32 DMAs.  My explanation for this?  Since the DMA width is 32 bits, LV does bit packing to ensure that every portion of the 32 bits is used.  That means a U8 DMA transfers four elements at once, a U16 DMA two at once, and a U32 DMA one at a time.  A 64-bit element is split over two individual transfers.

 

Just don't use FXP.  Even a 1-bit FXP is represented internally as 64 bits and will require TWO DMA transfers to get to the FPGA.

 

Aside from that, U8, U16, or U32 makes essentially no difference, as they are all packed to 32 bits internally.


I have not done my own testing, but I have heard this as well. One thing mentioned in the High Throughput Developer's Guide is that the DMA engine will fire if any of the following holds:

 

1. The FPGA-side buffer is one quarter full

2. The FPGA-side buffer has at least 512 bytes (a full PCIe packet)

3. The eviction timer of a DMA controller fires

 

If you are concerned about maximizing throughput, the second condition is probably the one coming into play. This might help explain why you see no noticeable difference in transfer rates.
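For a sense of scale, condition 2 can be worked out per data type. A small Python sketch (element sizes are the standard integer widths; the 512-byte packet size is taken from the guide as quoted above):

```python
# Elements needed to fill one 512-byte PCIe packet (condition 2).
PACKET_BYTES = 512
ELEMENT_BYTES = {"U8": 1, "U16": 2, "U32": 4, "U64": 8}

elements_per_packet = {
    name: PACKET_BYTES // size for name, size in ELEMENT_BYTES.items()
}
for name, count in elements_per_packet.items():
    print(f"{name}: {count} elements per full packet")
```

So a U8 FIFO needs four times as many queued elements as a U32 FIFO before this condition triggers, but the same number of bytes.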

Matt J | National Instruments | CLA
Message 6 of 9

I wonder if the differing results about transferring elements versus bytes are architectural. I did my testing on an sbRIO, transferring host-to-target; perhaps the other direction, or other hardware, performs differently. I found that I could transfer about the same number of elements per second in a U8 or a U32 DMA FIFO, but the U32 carried four times as much data (of course). I no longer have access to that code, but I do have access to a newer sbRIO, so if I have a few spare minutes maybe I'll duplicate the test and perhaps be proven wrong and learn something ;). If you want to build your own test, here's what I'd do. On the FPGA side:

1. Put a DMA FIFO read in a loop (I'd use an SCTL) and wait for the first element to be read; save the tick count.

2. Keep reading until you've received a set number of elements, then get the tick count again and subtract.

3. Change the data type and repeat.

On the host side, write an appropriately sized array to the DMA FIFO all at once.
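The bookkeeping for turning the two tick counts into a throughput figure is simple arithmetic; a Python sketch (the 40 MHz clock is an assumption based on the common default FPGA clock rate; use whatever clock your loop actually runs on, and note the tick counter must not wrap between readings):

```python
def throughput(n_elements, start_tick, end_tick,
               clock_hz=40_000_000, elem_bits=32):
    """Turn two FPGA tick-count readings into elements/s and bits/s.

    Assumes the tick counter runs at clock_hz and did not wrap
    between the two readings.
    """
    seconds = (end_tick - start_tick) / clock_hz
    elems_per_sec = n_elements / seconds
    return elems_per_sec, elems_per_sec * elem_bits

# Hypothetical reading: 1,000,000 U32 elements in 2,000,000 ticks
eps, bps = throughput(1_000_000, 0, 2_000_000)
print(f"{eps:.0f} elements/s, {bps / 1e6:.0f} Mbit/s")
```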

 

I haven't run into any problems putting a DMA FIFO read inside an SCTL, other than needing to handle the case where there's nothing to read. If you're reading inside an SCTL and you can keep the DMA buffer constantly full, then you're better off with a wider data type, simply because you can read more bits per loop cycle (you can read one element per cycle, regardless of how wide it is). On the sbRIO I was using, though, the host side could not keep the DMA FIFO saturated, so it wasn't possible to read an element every cycle of an SCTL.

Message 7 of 9

nathand, I did something very similar, but you must make sure that the DMA read on the FPGA is fast enough to actually keep up with the data.  If the channel is 32 bits wide and running at 100 MHz, then you need a 400 MHz U8 read to saturate it.

 

You need to bear this in mind when analysing the data.
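The required read rate scales inversely with element width; for the 32-bit, 100 MHz channel in the example above, the arithmetic looks like this (Python sketch):

```python
# FPGA-side read rate needed to saturate a 32-bit DMA channel
# clocked at 100 MHz, for each element width.
DMA_BITS = 32
DMA_CLOCK_HZ = 100_000_000
channel_bits_per_sec = DMA_BITS * DMA_CLOCK_HZ  # 3.2 Gbit/s

required_read_hz = {bits: channel_bits_per_sec // bits
                    for bits in (8, 16, 32)}
# U8 reads must run at 400 MHz, U16 at 200 MHz, U32 at 100 MHz.
```

If the benchmark's read loop runs slower than this, you're measuring the read loop, not the DMA channel.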

Message 8 of 9

Good point. It's been long enough since I ran my test that I have no idea whether I took that into account at the time. I was able to get it running fast enough that the sbRIO was no longer the limiting factor with a 32-bit DMA FIFO, and that was good enough for me: my code could drive the SPI interface to an external device at full speed constantly, and that was all I needed. My recollection is that I couldn't keep a U32 DMA FIFO full constantly, but even if I'd been able to, I couldn't have sent that data over SPI that fast, so it's quite possible I didn't test it that thoroughly.

Message 9 of 9