
LabVIEW 2019 DMA transfer speed differences

We have a long-standing code base in LV 2015 SP1 and 2019 SP1. It implements quite a few control loops on RT, including FPGA communications.

The main time-consuming portion of the RT loop is the transfer to and from the FPGA via DMA.

 

On a PXIe-8840 with Linux (NI Linux Real-Time x64 5.10.59-rt52) we have an interesting and hitherto completely unexplained effect.

 

When executing the main RT loop at 20kHz, we would expect a CPU load of approximately 68-70%, with the DMA transfers taking approximately 6us for the read and 8us for the write. However, when I deploy from my development machine, I see pretty much half that: 3us for read and 4us for write. The software seems to execute correctly, meaning the data is actually being sent to and from the FPGA correctly. Taking the exact same code and moving it to a different directory (to force a recompile rather than reusing the compile cache) shows no difference; I still see the unexpectedly "fast" execution.
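
 

As a back-of-the-envelope check, using only the numbers above: at 20kHz the loop period is 1/20,000 s = 50us. Read plus write at 6us + 8us = 14us means 28% of each period is spent in DMA; at the halved 3us + 4us = 7us that drops to 14%. So a genuine halving of the transfer times would free roughly 14 percentage points of CPU load per loop, which is why this would be such a significant result if real.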

 

When a colleague takes the same code to their PC, the performance observed is as we would usually expect. Same hardware, same bitfile, same code. So the difference seems to lie in the LabVIEW environment or installation on my development PC.

 

One difference I have noted is that I'm running LabVIEW 2019 SP1 f3 whereas my colleague runs the f4 patch; I don't know if that's relevant. I'm also running Windows 7 (due to FPGA toolchain limitations) while my colleague is running Windows 10. Both are 64-bit OSes.

 

Does anyone have any input on this really unexpected behaviour? If this performance could be replicated, it would be the single most impressive boost to performance we've ever been able to produce for our software. Something tells me it can't be real... I'm at a loss here.

Message 1 of 11

I would not expect the f3 to f4 patch level to have such an influence. But have you checked which RIO software you have installed on both machines? That is the software actually doing the DMA transfers, so there is a real possibility that changes there influence the transfer speed, for instance by packing more data into the 64-bit alignment that the RIO DMA channels normally use.
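
 

To illustrate with purely hypothetical numbers: if a FIFO of U16 elements were transferred one element per 64-bit word, each element would move 8 bytes across the bus; a driver that packs four U16s per word moves only 2 bytes per element, a quarter of the traffic. A change from two-per-word to four-per-word packing would halve the transfer time, which is exactly the kind of factor-of-two difference you are describing.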

Rolf Kalbermatter
My Blog
Message 2 of 11

The RIO driver on the RT system is the same; it's the same RT device used for both tests. The driver responsible is the RIO driver installed on the RT target, right? There may be differences on the host, but are they relevant here?

 

We can both connect to the same hardware (in turn, obviously) and see such hugely different results. The system image is listed in the post above; it's NI Linux Real-Time.

Message 3 of 11

They can definitely be relevant. The host side is one half of the DMA channel and does some of the actual control and maintenance of it, so yes, there is definitely a possibility that the host part of the RIO driver is responsible for this difference.

 

NI has changed several things about DMA transfers across the various versions but has of course kept backwards compatibility whenever possible. So there might be a minor change in the host driver that suddenly enables a major performance improvement that was already supported by the RT side but not exercised by the older host driver.

 

I'm not saying that the LabVIEW bugfix difference CAN'T be the reason, but it is very unlikely, and not seeing any version specified for the host NI-RIO installation makes me suspect that the difference is much more likely to be there.

Rolf Kalbermatter
My Blog
Message 4 of 11

The DMA reads are purely on the RT system; I don't see how the host RIO driver could have any influence.

 

We transfer only processed data to the host via TCP.

 

The RT system is the DMA master, so to speak.

 

I have a suspicion that it may all actually be an issue with the reported timings on the RT system. My colleague has actually just said he also ran the software at similar speeds (with much higher reported CPU load, but it ran stably). It might be that the timings I'm seeing are off by a factor of 2, i.e. that the reported 3us are actually 6us. I'm going to run more tests.
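
 

One way I plan to cross-check this is to time a large block of reads with a monotonic clock and divide, so the per-call instrumentation can't fool me. A minimal sketch in C (read_one_dma_element() is a hypothetical stand-in for the actual FIFO read in our RT loop; the same idea works as a LabVIEW VI with a Tick Count around a For Loop):

/* Hedged sketch: average a large block of reads with a monotonic clock
 * so a wrong per-call timestamp cannot skew the result.
 * read_one_dma_element() is a hypothetical stand-in for the real read. */
#include <stdio.h>
#include <time.h>

static void read_one_dma_element(void)
{
    /* placeholder for the actual DMA FIFO read */
}

int main(void)
{
    const int iterations = 100000;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++)
        read_one_dma_element();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed_us = (t1.tv_sec - t0.tv_sec) * 1e6
                      + (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("average per read: %.3f us\n", elapsed_us / iterations);
    return 0;
}

If the per-call numbers say 3us but the block average says 6us, the reported timings are the problem, not the DMA.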

Message 5 of 11

On the host, do you configure and start the DMA FIFO, or do you go straight to reading?

 

Can you make an isolated throughput test?
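
 

If it helps, here's roughly what I mean, sketched in C with the NI FPGA Interface C API and run on the RT target. The bitfile path, signature, resource name, and FIFO constant below are placeholders for the names in the NiFpga header generated for your bitfile; in LabVIEW the same sequence is the Configure, Start, and Read FIFO methods:

/* Hedged sketch of an isolated throughput test with the NI FPGA Interface
 * C API.  All "placeholder" names must be replaced with the constants from
 * the generated NiFpga header.  Error handling is minimal for brevity. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include "NiFpga.h"

#define FIFO       0        /* placeholder: generated FIFO constant */
#define BLOCK      4096     /* elements per read */
#define ITERATIONS 1000

int main(void)
{
    static uint64_t data[BLOCK];
    NiFpga_Session session;
    size_t remaining;
    struct timespec t0, t1;

    NiFpga_Status status = NiFpga_Initialize();
    /* placeholder bitfile path, signature, and resource name */
    status = NiFpga_Open("/home/lvuser/test.lvbitx", "SIGNATURE", "RIO0",
                         0 /* run the FPGA VI */, &session);

    /* Configure and start the FIFO explicitly instead of relying on the
     * implicit start performed by the first read. */
    NiFpga_ConfigureFifo(session, FIFO, 8 * BLOCK);
    NiFpga_StartFifo(session, FIFO);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERATIONS; i++)
        NiFpga_ReadFifoU64(session, FIFO, data, BLOCK,
                           5000 /* ms timeout */, &remaining);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double seconds = (t1.tv_sec - t0.tv_sec)
                   + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1f M elements/s, %.2f us per %d-element read\n",
           ITERATIONS * (double)BLOCK / seconds / 1e6,
           seconds * 1e6 / ITERATIONS, BLOCK);

    NiFpga_StopFifo(session, FIFO);
    NiFpga_Close(session, 0);
    NiFpga_Finalize();
    return (int)status;
}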


Certified LabVIEW Architect, Certified Professional Instructor
ALE Consultants

Introduction to LabVIEW FPGA for RF, Radar, and Electronic Warfare Applications
Message 6 of 11

Host = RT or Host = Windows PC?

Message 7 of 11

What hardware are you using and what software versions?

 

Any chance the different setups are using different RIO resource names? I.e., PXI1Slot2 vs rio://localhost/PXI1Slot2

Message 8 of 11

Nope. It's literally the exact same source code.

Message 9 of 11

What's the VI Properties»Memory Usage»Compiled Code Complexity? Is it possible your Options»Environment»Limit compiler optimizations setting differs between the two LabVIEW installs?

Message 10 of 11