DAQmx program using almost 100% of RT controller CPU

Texas_Diaz · 10-18-2013

Hey Sarah,

Ah, I think I understand your use-case a little better now. Running as a startup app means you'll likely never stop the DAQmx task - that's fine, but when you're debugging (or if you remotely connect to the target) you probably want to be able to stop the app cleanly.

Ugh, I think I preferred the previous implementation (while loop surrounding the timed loop). Is it important that you NEVER MISS data, or is it more important that you always have the LATEST data? Your previous architecture always made sure you had the LATEST data, but your current architecture makes sure you NEVER MISS data. The new architecture has the possibility of "backing up" - meaning it's possible the timed loop could provide data faster than the while loop can consume it. Yeah, sure, your consumer loop runs at twice the speed of the producer, but I don't see anything here that keeps track of how many elements are in the queue or does anything in response if the queue gets out of hand. Since you're not using an RT FIFO in the timed loop (and you can't, because the RT FIFO doesn't handle multi-dimensional arrays), you run the risk that the normal FIFO causes the timed loop to get behind.
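
To make the "backing up" risk concrete, here is a minimal Python sketch (not LabVIEW, and not the code from this thread) of a bounded queue that tracks its depth and drops the oldest element when full, which is one way to favor the LATEST data over NEVER MISSING data. The class name, the `simulate` helper, and all the tick counts are hypothetical and chosen only for illustration.

```python
from collections import deque


class LatestDataQueue:
    """Bounded FIFO that discards the oldest element when it is full."""

    def __init__(self, max_depth):
        self._items = deque(maxlen=max_depth)
        self.dropped = 0  # number of elements discarded because the queue was full

    def push(self, item):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # deque(maxlen=...) silently evicts the oldest item
        self._items.append(item)

    def pop(self):
        return self._items.popleft() if self._items else None

    def depth(self):
        return len(self._items)


def simulate(producer_ticks, consume_every, max_depth):
    """Producer pushes once per tick; consumer pops once every `consume_every` ticks.

    Returns (peak queue depth, number of dropped elements).
    """
    q = LatestDataQueue(max_depth)
    peak = 0
    for tick in range(producer_ticks):
        q.push(tick)
        peak = max(peak, q.depth())
        if tick % consume_every == 0:
            q.pop()
    return peak, q.dropped
```

With a consumer that keeps up (`consume_every=1`) the depth never exceeds one element; with a consumer at half the producer's rate the queue fills to `max_depth` and starts dropping. Watching `depth()` and `dropped` is the kind of bookkeeping the producer/consumer architecture above is missing.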

See, the problem here is that you're now allocating a lot of memory within the timed loop that shouldn't need to be allocated - you're allocating memory for the queue element AND allocating memory for the data storage. Memory allocation is not recommended within a timed loop, because the timed loop may have to fight with a lower-priority thread for the memory manager (and that's not possible to predict), which can cause the timed loop problems. Also, in this implementation you're performing a fairly large copy each time you push data into the queue. I really liked your previous architecture because you didn't have unnecessary memory allocations and copies. I agree that you should not be reading "all available samples" - if you're finding that waiting for the buffer to fill up is causing issues, you can manually set the buffer size to ensure you have the proper amount of data available to you. Personally, I don't think waiting for the buffer is going to be a problem.

I liked your old implementation (while loop around the timed loop) because you capture data as you need it, don't hold onto it (you add it and push it into a shift register, and then pass the sum to the while loop directly), and with the recommendations of just having the DAQmx Read and a simple add in the timed loop your timed loop runs very efficiently. The work you'd be doing in the while loop (reading/writing the shared variables) may slow down the capture of the data, but it's not like that's a problem (if you only care about the LATEST data).

-Danny

SarahW · 10-21-2013

Hi Danny,

Thanks again for all your help!

You are absolutely correct, I do only care about the latest data, so have gone back to my previous architecture (attached picture). I've also now added a stop button 🙂

My trigger happens every 20ms. If I set the period of my timed loop to 20ms I get the -200279 buffer overflow error; if I set the period to 10ms it seems happy. Is it bad practice to set the period to 10ms when I know it mostly won't be able to run quicker than 20ms?

It now seems to sit happily with its CPU usage at about 60%. Is this about what you would expect?

Just out of interest, why when you swap the timed loop for a standard loop and add a 10ms wait does the CPU usage jump back up to 100%?

Sarah

Texas_Diaz · 10-21-2013

Hey Sarah.

There's still one thing I'm not sure why we're doing - why do you still have the "ReadAllAvailSamp" node set to TRUE? That's what's driving your 10ms timing restriction: it's not allowing data in the buffer to expire (or else it's throwing the error when it does). If you set that to FALSE and see how long each loop iteration really is, you can see if waiting for the buffer to fill up really impacts your performance (it likely will not). It won't hurt the timed loop if the call is really a blocking call (which it is for that DAQmx function), since while you're waiting on the blocking DAQ call the rest of the system can be doing other things - you just might hit 10-15 microseconds of jitter once the system comes back (which is peanuts when you're talking about waiting milliseconds). Otherwise, you're required to continuously read data as it's put into the buffer, which is what you're currently doing. From the inner loop's "left data node" terminal (and the outer loop's output node terminal) you can pull the previous iteration duration to see how long the loop actually took (by default it shows an error cluster, but you can select anything you want). If it took less than 20ms, and your CPU utilization is less than 50%, you're good.

Yes, you generally never want to set a period smaller than the time the loop actually takes. The danger is that on some platforms (PXI in general) the timed loop "runs to completion" (meaning nothing except hardware and timer interrupts can preempt it), and if your timing is "just right" your timed loop can suck up all the resources on the system.

The crux of the problem is that the timed loop "fires" every dt period. Let's say your dt is set to 10ms and it REALLY takes 5ms to run the contents of your timed loop. Great: every 10ms your loop is scheduled for execution, and it takes 5ms to run, which gives the rest of the system 5ms each period to do everything else it needs to do. Assuming you're not doing any heavy TCP or heavy number crunching elsewhere, you're good to go. However, if dt is set to 10ms and it really takes 19ms to execute your loop, you're in trouble - by default your loop fires every 10ms, which means it fires on the 10ms, 20ms, 30ms, 40ms, etc. time marks. When your loop runs late (takes more time than the set period), the default setting is to keep the same schedule and fire at the next available iteration mark, which in this case means the 20ms mark. If your loop took 19ms and fires again at 20ms, you only have 1ms to execute everything else in the system. OUCH! But the reverse-ish also applies - if your dt is 10ms and it takes 11ms to execute, you won't get another chance to run the loop until the 20ms mark - which means you lose out on a period!

You want to profile the loop (using the iteration duration values inside the loop) to determine the maximum time it really takes the loop to run, and set your iteration dt accordingly (giving time for "other things" in the system to run). How much time you give to the system depends on what else you've got going on; if all you're doing is what's in your code, you don't need a whole lot (though I personally try to make sure it has no less than 10% of the total runtime "free").
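
The period arithmetic above can be checked with a small Python sketch. It is only a model of the default "keep the schedule, fire at the next mark" behavior described in this post, assuming the loop starts exactly on a period mark and always takes the same time to run; the function names are hypothetical and nothing here calls LabVIEW.

```python
def next_fire_time(start_ms, duration_ms, period_ms):
    """First period mark at or after the moment the loop body finishes.

    Marks are assumed to sit at integer multiples of period_ms (0, dt, 2*dt, ...).
    """
    finish = start_ms + duration_ms
    marks = -(-finish // period_ms)  # integer ceiling division
    return marks * period_ms


def skipped_marks(duration_ms, period_ms):
    """Period marks that pass while the loop body is still running."""
    return next_fire_time(0, duration_ms, period_ms) // period_ms - 1


def free_fraction(duration_ms, period_ms):
    """Share of each cycle left over for the rest of the system."""
    cycle = next_fire_time(0, duration_ms, period_ms)
    return (cycle - duration_ms) / cycle


if __name__ == "__main__":
    for duration in (5, 11, 19):
        print(duration,
              next_fire_time(0, duration, 10),
              skipped_marks(duration, 10),
              free_fraction(duration, 10))
```

For dt = 10ms this reproduces the three cases in the post: a 5ms body leaves half of each cycle free, an 11ms body skips the 10ms mark and reruns at 20ms, and a 19ms body leaves only 1ms out of every 20ms for everything else.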

When you set your timed loop to be a while loop, where did you put the 10ms wait? Did you put it in the inner loop, or in the outer loop? If in the inner loop, you may be seeing some inefficiencies in creating/starting the timed loop that are factoring into the lower CPU utilization (there's more work to start up a timed loop than there is to start up a while loop, but that work doesn't necessarily eat CPU time). If in the outer loop, your inner loop is eating all your CPU utilization and you really should be putting that in your inner loop.

-Danny

SarahW · 10-22-2013

Hi Danny,

I've now set "ReadAllAvailSamp" to false. I had to change back to finite samples and use a 'Commit' task instead of 'Start' to keep my timing structure.

It appears that the read task within the timed loop takes 12ms plus however long it has to wait for the trigger (max 20ms), so a maximum of 32ms, but only 12ms of that is spent actually doing anything. So with my period set to 14ms it sits at about 45-50% CPU usage 🙂

Obviously this program just puts the data on the network using the shared variable. I then have another program which reads the shared variable and displays the data, and which can be run from anywhere on the same network. The CPU usage on the RT system spikes again if you have too many of these display VIs open. Is this expected, and something we are just going to have to handle?

Thanks again for all your help.

Sarah

Texas_Diaz · 10-22-2013

The data being "put onto the network using the shared variable" doesn't exactly describe what's going on.

When you write to the Shared Variable, it's making that data available to the Shared Variable Engine. Clients connect to the RT target to get Shared Variable updates (via the Shared Variable Engine); each client creates a TCP connection PER TARGET and the Shared Variable Engine streams that data to the client over that TCP connection. That's one copy of the data being sent over the network by the controller FOR EACH client connected. That's why the CPU utilization increases for each client that connects to the target: it's processing a lot of data, depending on how fast the data is being updated. That's inherent to the design, I'm afraid, and just has to be "dealt with".
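
As a rough back-of-the-envelope illustration of that per-client cost, the Python sketch below multiplies payload size, update rate, and client count. The 50 updates per second comes from the 20ms trigger mentioned earlier in the thread; the 8000-byte payload and the function name are assumptions made up for the example, and protocol overhead is ignored.

```python
def outbound_bytes_per_second(payload_bytes, updates_per_second, clients):
    """Data the controller must send if every client receives its own copy."""
    return payload_bytes * updates_per_second * clients


if __name__ == "__main__":
    # Hypothetical payload size; 50 updates/s corresponds to one update per 20ms trigger.
    for clients in (1, 3, 10):
        print(clients, outbound_bytes_per_second(8000, 50, clients))
```

The load grows linearly with the number of display VIs that are open, which matches the CPU spikes described above.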

-Danny
