09-09-2008 08:19 AM
I have an application that is heavily threaded and has multiple (i.e., 6 or more) parallel loops. I recently migrated it from LV 8.2.1 to LV 8.5.1. With one exception (a bad control that crashed LV) it upgraded fine. However, performance has degraded immensely! The program does serial I/O, GPIB I/O, DAQ, digital I/O, etc., all in different loops.
The DAQ used to run on a 1 second loop and keep up with 80 kHz on 32 channels (6033 board). Now it can't keep up most of the time even at 16 kHz, and I am plagued by buffer overruns. That is 5 times slower and still not keeping up. This is a dual-CPU system, and it has now almost maxed out one of the CPUs.
I have also migrated from VISA 4.2 to 4.4 and NIDAQmx Base from 3.0 to 3.2. This is all under Mac OS X, just so folks don't suggest using regular NIDAQmx. The VISA updates are minor and should not affect this, as are the NIDAQmx Base updates.
Any ideas as to why the simple upgrade from 8.2.1 to 8.5.1 would result in such a terrible loss of performance?
09-09-2008 08:28 AM
sth,
what kind of mechanism are you using to pass data between the loops?
Norbert
09-09-2008 08:40 AM
Hi Scott,
This is outside my area of expertise but I'll throw this out there just in case.
You are opening the DAQ tasks before the loop and using the reference inside, correct?
Ben
09-09-2008 09:56 AM
Actually, the loops all run on independent 1 second cycles. Very little data is passed between loops; mostly it is written to a local array. One slow loop (on a 15 minute cycle) writes a data log file from that array. Others do independent PID control, etc. One checks and pages operators.
The only synchronization is an occurrence that I use to create an interruptible 1 second wait in the loops.
Ben,
Yes, I open the DAQ reference and never close it. In fact, this is one of the big problems with NIDAQmx Base: I have to acquire the data continuously and then read it out. If I keep the reference but "restart" the DAQ every loop, it re-initializes the hardware and is VERY slow. So I actually start the DAQ outside the loop and then read continuously. When the inevitable buffer overrun occurs, the DAQ stops; I detect that error and restart the task only in that case. But that is a good point, and it was a BIG problem in moving from the old DAQ driver to NIDAQmx Base when I upgraded from LV 5.1 to LV 8.0!
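A rough sketch of that start-once/restart-on-overrun pattern, in Python against a purely hypothetical driver object (the real code is LabVIEW G calling NIDAQmx Base; `FakeDaq`, `OverrunError`, and the method names are all made up for illustration):

```python
class OverrunError(Exception):
    """Raised by the (hypothetical) driver when the circular buffer overflows."""

class FakeDaq:
    """Stand-in driver for illustration: overruns on every third read."""
    def __init__(self):
        self.running = False
        self.starts = 0
        self.reads = 0

    def start(self):
        self.starts += 1          # on real hardware this re-init is expensive
        self.running = True

    def stop(self):
        self.running = False

    def read(self, n_scans):
        self.reads += 1
        if self.reads % 3 == 0:   # simulate a buffer overrun
            self.running = False  # an overrun stops the task...
            raise OverrunError()
        return [0.0] * n_scans

def acquisition_loop(daq, n_cycles):
    daq.start()                   # start ONCE, outside the loop
    for _ in range(n_cycles):
        try:
            data = daq.read(2500)  # drain the buffer each 1 s cycle
        except OverrunError:
            daq.stop()            # ...so detect the error and restart the
            daq.start()           # task instead of re-creating it from scratch
    daq.stop()
```

The point is that `start()` is only called again on the error path, never on every iteration.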
This app has exposed many weaknesses in LV since I wrote it as a massively parallel system back in 199X (Started in LV 3 and finished in LV 5.1). Since then I have made minimal tweaks as it upgraded to newer hardware and newer LV.
At the moment I suspect the asynchronous VISA calls. I have found that asynchronous VISA calls actually chew up more CPU than synchronous ones. (Yes, I did not reverse the terminology.) An asynchronous VISA call internally spins a poll on the I/O and uses a lot of CPU time, while a synchronous call just blocks its thread. Since the app is in maintenance mode, there may be an instrument that is unplugged, taking a long time to time out and tying up the CPU.
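A toy illustration of that difference (plain Python threads, not VISA itself): the "async" style spins on a flag and burns CPU for the whole wait, while the "sync" style blocks in the OS:

```python
import threading
import time

def busy_poll(evt):
    # "Asynchronous"-style wait: spin on a flag until the I/O completes.
    # Every pass through the loop costs CPU even though no data has arrived.
    spins = 0
    while not evt.is_set():
        spins += 1
    return spins

def blocking_wait(evt):
    # "Synchronous"-style wait: the thread sleeps inside the OS until the
    # event fires, using essentially no CPU while it waits.
    evt.wait()

done = threading.Event()
# Simulate "I/O complete" arriving from elsewhere after ~50 ms.
t = threading.Thread(target=lambda: (time.sleep(0.05), done.set()))
t.start()
spins = busy_poll(done)   # burns CPU for the entire ~50 ms wait
t.join()
```

With a long timeout on an unplugged instrument, the polling version pins a CPU for the whole timeout period; the blocking version does not.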
Today's project is to run the profiler, find the VISA calls (there should be only 4 of them), and experiment with the async vs. sync setting. Also to crank up the number of threads allocated to the execution systems. AFAIK this system ran perfectly fine yesterday with LV 8.2 and is now having buffer overruns with LV 8.5. This was supposed to be a simple upgrade. Of course, I have all the old versions saved and can back out.
09-09-2008 03:55 PM
09-09-2008 04:38 PM
I don't know how they made that VI so slow!! Anyway, here is an improved version of "ESeries-- AI DMA Scale.vi". For my canonical case of 2500 scans of 32 channels it is about 7800 times faster (limited by the 1 ms resolution of my clock), cutting the computation time from 7800 ms to 1 ms!!! If I go to bigger arrays, I'll bet the speedup is even greater!
Judicious use of array multiplication can remove nested FOR loops. For loops bad..... Bad, bad, bad.
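The fix itself is LabVIEW G code, but the same idea can be sketched in NumPy (the channel count, gains, and offsets below are made up for illustration): scale the whole 2-D acquisition buffer with one broadcast multiply-add instead of nested per-element loops.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans, n_chans = 2500, 32                      # 1 second at 2500 scans/s

# Simulated raw ADC counts plus per-channel calibration constants.
raw = rng.integers(-2048, 2048, size=(n_scans, n_chans)).astype(np.float64)
gain = rng.uniform(0.9, 1.1, size=n_chans)       # per-channel scale factor
offset = rng.uniform(-0.01, 0.01, size=n_chans)  # per-channel offset

def scale_loops(raw, gain, offset):
    # Slow version: nested FOR loops, touching one element at a time.
    out = np.empty_like(raw)
    for i in range(raw.shape[0]):
        for j in range(raw.shape[1]):
            out[i, j] = raw[i, j] * gain[j] + offset[j]
    return out

def scale_vectorized(raw, gain, offset):
    # Fast version: one broadcast multiply-add over the whole 2-D array.
    return raw * gain + offset
```

Both produce identical results; the vectorized form just lets the array library do the iteration internally, which is the same trick the improved VI plays with LabVIEW's array primitives.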
Norbert, if you "own" this issue or thread, can you get a CAR issued against it? I am attaching my "improved" VI, named "ESeries-- AI DMA Scale-sth.vi", and the test harness.
Enjoy! And feel free to use this code (it would be nice if my name were in the comments, but what the hell). If it gets fixed, we all win.
7800 times faster!!!
-Scott
PS: I sure hope they get the troubles with Safari on the forums fixed; switching back and forth to alternate browsers is annoying.
PPS: Sorry this got posted in the wrong forum; it probably should have ended up in Multifunction DAQ.
09-09-2008 05:53 PM - edited 09-09-2008 05:55 PM
Final result:
I plugged my fixed VI into the actual application. CPU usage dropped to 20% of one CPU, even after I cranked the rate back up by a factor of 5 to 2500 scans/sec. This was obviously the problem VI. A code search for all the similar scaling VIs in NIDAQmx Base would be a really good idea, since I assume this bug is duplicated throughout the package.
That was a monstrous speedup. There may be a way to tweak it a bit more, but three orders of magnitude is enough for today.
I only have one comment about the distributed code. Being as this is Florida... 🙂
-Scott
09-10-2008 05:49 AM
Nice work Scott!
Kudos to you.
Ben
09-10-2008 08:06 AM
Thanks Ben,
The rewrite is fairly straightforward once you realize this is the problem VI. What surprises me is that it was necessary; even a simple test suite should have picked this up. This was on one of the slow 100 kS/s boards, where it took 7.5 seconds to scale 1 second's worth of data; I hadn't even tried this update on one of my 1 MS/s boards. The standard "Stream Data To Disk" (or whatever it is called) example should fail spectacularly with this update. I just checked: since the NI version scales as N^2, it takes about 770 seconds to process 1 second (1 MSample) of data!! The improved routine takes 5 ms.
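For what it's worth, one common way per-sample scaling ends up O(N^2) is by growing the output array inside the loop, which forces a reallocation and full copy on every iteration. This is only a guess at the mechanism (the original VI's diagram is NI's, not reproduced here), sketched in Python:

```python
import numpy as np

def scale_quadratic(raw, gain):
    # Speculative reconstruction of an O(N^2) pattern: np.append copies
    # every previously scaled sample on each pass, so total work grows
    # as 1 + 2 + ... + N ~ N^2 / 2.
    out = np.empty(0)
    for x in raw:
        out = np.append(out, x * gain)  # realloc + copy of all prior samples
    return out

def scale_linear(raw, gain):
    # Preallocated / vectorized form: one pass over the data, O(N).
    return raw * gain

raw = np.arange(1000, dtype=np.float64)
```

Both give the same answer; only the cost curve differs, which is exactly why the slowdown only became obvious at high sample rates.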
I assume that they at least run the examples with the different families of hardware and push them to some of their rated speeds. But obviously I could be wrong.
When I first ran the profiler on it, my reaction was that the profiler was returning false data! 🙂 I figured there was some wait or timing issue falsely inflating the reading for that VI; simple data scaling could not be such a bottleneck!
Now I have to put a special patch file in my software imaging system to make sure all the systems get this identical setup.
It's always something. This was the first LV programming I had done in a couple of months, and it was only supposed to be a simple software update! I suppose the word "simple" was just false hope. Anyway, I hope a CAR gets issued and this gets thoroughly fixed for all cases in the next release; I have only patched the E Series boards for DMA.
09-11-2008 02:51 PM
Hi Scott-
Thanks for bringing this to our attention. We'll look into it for the next release of DAQmx Base, but thanks for posting your proposed solution here. I logged this issue as CAR 125846.