LabVIEW


FPGA DMA performance Bitfile or VI

Hi, I'm seeing some weird behaviour in my FPGA code.

 

We are looking at expanding the main DMA FIFO in our code, and I wanted to run some benchmarks on the throughput with different data types (U8, U16, U32, U64 and so on). I have noticed something quite unexpected.

 

The performance of a VI (running on the RT system) depends on whether I have the "Open FPGA Reference" opening from a VI or from a Bitfile.  I have added a 250 ms wait before running any timed operations, but you can see the results of writing a 4M-element FIFO of U8 to the FPGA target below.
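For anyone who wants the sequence in text form rather than a screenshot, the measurement pattern is roughly the sketch below. It uses the NI FPGA Interface Python API purely for illustration (my actual host code is a LabVIEW RT VI, and the bitfile path and FIFO name here are placeholders):

```python
# Sketch of the host-side benchmark (illustrative only; the real code is a LabVIEW RT VI).
# "benchmark.lvbitx" and "HostToTargetFIFO" are placeholder names, not the real project names.
import time
import nifpga

N_ELEMENTS = 4 * 1024 * 1024                    # 4M U8 elements per write
data = bytes(range(256)) * (N_ELEMENTS // 256)  # arbitrary U8 payload

with nifpga.Session(bitfile="benchmark.lvbitx", resource="RIO0") as session:
    session.reset()                             # restart the FPGA VI from a known state
    session.run()
    fifo = session.fifos["HostToTargetFIFO"]
    fifo.configure(N_ELEMENTS)                  # ask for a host buffer big enough for one block
    fifo.start()

    time.sleep(0.25)                            # 250 ms settle time before timing anything

    t0 = time.perf_counter()
    fifo.write(data, timeout_ms=5000)           # single blocking write of the whole block
    t1 = time.perf_counter()
    print("Write of %d U8 elements took %.1f ms" % (N_ELEMENTS, (t1 - t0) * 1000.0))
```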

 

FPGA DMA Speed variation.png

 

This is using the exact same code after the "Open Reference".

 

Any ideas as to why this would be the case?  It's completely reproducible in my code.

 

FPGA DMA Speed variation BD.png

 

Shane.

 

PS I'm using LV 2011 SP1

Message 1 of 21

Oh, I think my VI version might be throwing errors.... need to wait a while to compile before I can confirm.... D'OH!

Message 2 of 21

Nope, I'm still seeing differences.

 

For sending 8M U8s per DMA transfer I'm seeing write times of 42 ms for the VI and 81 ms for the Bitfile.  Read times are the same.
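For a back-of-the-envelope comparison, that works out to roughly 190 MiB/s versus 99 MiB/s (assuming "8M" means 8 × 1024 × 1024 one-byte elements):

```python
# Rough throughput from the write times above, assuming 8M = 8 * 1024 * 1024 one-byte elements.
elements = 8 * 1024 * 1024
for label, ms in (("VI reference", 42.0), ("Bitfile reference", 81.0)):
    mib_per_s = (elements / 2**20) / (ms / 1000.0)
    print("%-18s ~%.0f MiB/s" % (label, mib_per_s))
```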

 

Weird.

Message 3 of 21

Send times for 8M U16s are 60 ms for the Bitfile and 23 ms for the VI.

Message 4 of 21

I found the source of the weirdness.

 

If I attach a constant to the "Open Reference" node with my Bitfile, the format of the constant makes a difference to the timing.

 

If I have a format like "rio://192.168.0.X/RIO0" then it takes longer than if I have an alias defined simply as "RIO0".

 

I didn't say it's less weird, I just said I found the cause.  Is there some kind of extra work being done AFTER opening the reference which is dependent on the format of the location specifier?  It seems like this should not matter in the slightest, but what do I know?
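To be explicit about what I'm comparing: it's literally the same open-wait-write sequence with only the resource string changed, i.e. something like the sketch below (again using the Python API just for illustration, with placeholder bitfile and FIFO names; the ".X" in the address is left as in my post above):

```python
# Same benchmark, run once per resource string format (sketch only; placeholder names).
import time
import nifpga

N_ELEMENTS = 8 * 1024 * 1024
data = bytes(N_ELEMENTS)                            # 8M zero-valued U8 elements

for resource in ("RIO0", "rio://192.168.0.X/RIO0"): # alias vs. full remote-style address
    with nifpga.Session(bitfile="benchmark.lvbitx", resource=resource) as session:
        fifo = session.fifos["HostToTargetFIFO"]
        fifo.start()
        time.sleep(0.25)                            # settle after opening, before timing

        t0 = time.perf_counter()
        fifo.write(data, timeout_ms=5000)
        print("%-26s write took %.1f ms" % (resource, (time.perf_counter() - t0) * 1000.0))
```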

 

Shane.

Message 5 of 21

Oh man.  My alias was pointing to a different chassis.  Different hardware was the reason for the different timings.

 

Ignore me, I'm full of useless information recently.

 

Bye.  I need a coffee.

 

Shane.

 

PS No, coffee didn't help.  The differences are still there if I compare Bitfiles with "rio://192.168.0.X/RIO0" format device names.

Message 6 of 21

I'm having a little difficulty following which cases you're looking at.  I believe there are 4 possible permutations:

 

1. VI Mode with resource string of "RIO0"

2. VI Mode with resource string of "rio://192.168.0.X/RIO0"

3. Bitfile Mode with resource string of "RIO0"

4. Bitfile Mode with resource string of "rio://192.168.0.X/RIO0"

 

Have you tried each of these cases?

Message 7 of 21

I have only tried cases 3 and 4, with the VI call not having any resource input since the VI itself has that defined.  Of these, the VI method is fast, as is case 3 in your list, whereas case 4 is slower.

 

What I have also noticed is that the throughput scaling from U8 to U64 is as expected (U16 requires double the time of U8, U32 double that of U16, and so on), whereas ALL FXP values require the same time (exactly the same as U64).  It seems that ALL FXP datatypes are transmitted as 64-bit, which is kind of inefficient.
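To put numbers on that: the payload per transfer doubles with element width as you'd expect, while every FXP configuration appears to move as a full 64-bit element. A quick illustration of what I mean (the FXP word lengths below are just made-up examples):

```python
# Bytes moved per 4M-element transfer, assuming (per the observation above) that every FXP
# element travels as a 64-bit container regardless of its configured word length.
ELEMENTS = 4 * 1024 * 1024
host_bytes_per_element = {
    "U8": 1, "U16": 2, "U32": 4, "U64": 8,
    "FXP <24,8>": 8,    # hypothetical FXP configurations: all occupy 8 bytes on the host side
    "FXP <32,16>": 8,
}
for dtype, size in host_bytes_per_element.items():
    print("%-11s %5.1f MiB per transfer" % (dtype, ELEMENTS * size / 2**20))
```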

 

Shane.

Message 8 of 21

As soon as I try the VI method with a resource control, it tells me I need to recompile for the target...  I'm telling it to run on the same hardware, but now I need to recompile...

Message 9 of 21

My guess/expectation is that if you tried cases 1 and 2, case 1 would have the same performance as case 3, and case 2 the same as case 4.

 

If that is indeed the case, I believe that is expected.  "rio://..." is the format for accessing remote devices, so by using that format it's involving at least some portion of the network stack.

 

>> It seems that ALL FXP datatypes are transmitted as 64-bit which is kind of inefficient.

I believe this is as designed, and it actually avoids introducing worse performance.  The format for fixed-point values in software is always 64-bit (you can think of arrays of these in software as being 8-byte aligned).  If the transferred buffer of 64-bit values did not have this alignment, the processor would have to "manually" copy these values out of the DMA buffer into another buffer (increasing CPU usage and memory usage).
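As a rough illustration of that tradeoff (plain Python, nothing to do with the actual NI-RIO driver internals): with one 64-bit slot per element the host can reinterpret the whole DMA buffer in one bulk operation, whereas a tightly packed layout forces a per-element copy and shift before software can use the values.

```python
# Illustration only: why one 64-bit slot per FXP value is cheaper for the host than tight packing.
import struct

values = [0x123456, 0x0ABCDE, 0x7FFFFF, 0x000001]   # pretend these are 24-bit FXP words

# Tightly packed layout: 3 bytes per element -> software must copy each element out individually.
packed = b"".join(v.to_bytes(3, "little") for v in values)
unpacked = [int.from_bytes(packed[i:i + 3], "little") for i in range(0, len(packed), 3)]

# 64-bit-per-element layout: fixed-size slots -> one bulk reinterpret, no per-element work.
aligned = b"".join(struct.pack("<Q", v) for v in values)
bulk = list(struct.unpack("<%dQ" % len(values), aligned))

assert unpacked == bulk == values
```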

Message 10 of 21