strange usb performance disparity

RDVincent · 11-18-2010

I have been using a USB-6221 BNC device to control an experiment. The control software, written in C++, runs as a command-line process in Windows, and is quite simple in its basic approach. It uses a polling loop to continually query a single analog input channel and a single digital output channel. Input processing involves a fairly complex set of calculations (several FFTs and some complex tree searches). For the output, we essentially generate a simple string of digital samples that control a pulse generator. The program data is large (>100 MB) but fits comfortably into the physical memory of all the systems we are testing. We require fairly low latency in response to changes in the input, so we attempt to buffer only about 0.1 s worth of output samples at a time. This means we need to poll the output at least 10 times a second to avoid underflowing the digital output channel.
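As a rough illustration of the buffering arithmetic (a sketch, not code from the original program): assuming the 5000 Hz sample rate mentioned later in the thread, 0.1 s of queued output is 500 samples, and the loop must refill it at least 10 times per second. The helper names are hypothetical.

```cpp
#include <cmath>

// Hypothetical helper: number of output samples to keep queued for a
// given latency target at a given sample rate.
long samplesForLatency(double sampleRateHz, double latencySec) {
    return std::lround(sampleRateHz * latencySec);
}

// Hypothetical helper: minimum number of refills per second so the queue
// never runs dry, assuming each refill tops the queue back up to full.
double minPollRateHz(double latencySec) {
    return 1.0 / latencySec;
}
```

For example, relaxing the latency target to 0.2 s gives 1000 queued samples and only 5 refills per second, which is the buffer-size versus latency trade-off this thread keeps running into.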

What we're seeing is a bizarre performance disparity between different laptops. On one laptop, an AMD 64-bit dual-core system with 4 GB of RAM, the software works fine. On all of the other systems we have tested, the USB activity appears to slow the processing so much that we consistently underflow the digital output channel. The elapsed time for processing the analog input data increases from <20 msec on the "good" system to >500 msec on the "bad" systems.

In a test version of the software, built with dummy calls to the DAQmx interface functions, the processing time of the software is consistent with what we see on the "good" laptop, even on the "bad" systems. So we are certain that the actual processing time is well within the 100 msec or so that we need to keep up with the output. However, as soon as we build with the "real" NI libraries, the performance drops by over an order of magnitude.
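One way to structure such a harness, sketched under assumptions (the class names are hypothetical and not part of NI-DAQmx): the processing code reads samples through a small interface, so a DAQmx-backed implementation and a replay implementation can be swapped without touching the FFT or search code.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical interface: processing code pulls samples through this
// instead of calling the DAQmx C API directly.
class SampleSource {
public:
    virtual ~SampleSource() = default;
    // Writes up to out.size() samples into out; returns the number written.
    virtual std::size_t read(std::vector<double>& out) = 0;
};

// Hypothetical replay implementation: serves a previously recorded buffer,
// so the computation can be timed with no driver or USB traffic involved.
class ReplaySource : public SampleSource {
public:
    explicit ReplaySource(std::vector<double> data) : data_(std::move(data)) {}

    std::size_t read(std::vector<double>& out) override {
        const std::size_t n = std::min(out.size(), data_.size() - pos_);
        std::copy_n(data_.begin() + static_cast<std::ptrdiff_t>(pos_), n, out.begin());
        pos_ += n;
        return n;
    }

private:
    std::vector<double> data_;
    std::size_t pos_ = 0;
};
```

A DAQmx-backed `SampleSource` would wrap `DAQmxReadAnalogF64` inside the same `read` method; the processing loop only ever sees the interface.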

This is with NI-DAQmx 8.6.1 (which we are using for historical reasons) on either Windows Vista or XP. Upgrading to 9.2.2 did not affect the performance, however. The malfunctioning systems all run 32-bit Windows, so I guess that is one possible cause.

Are there any known issues with USB performance which could explain this? I am happy to share the source code of the main program if that will help with the diagnosis.

enderWiggin10 · 11-19-2010

Hi RDVincent,

The nature of the USB host controller could be the cause of the issue. The USB port uses the bulk transfer method, which can be problematic for buffered acquisitions and generations. This has been addressed a bit more with the X Series USB devices, such as the USB-6351. A good way to avoid issues with bulk transfers is to remove all other devices from the USB port so that it is a dedicated port. You can also increase the chunk size of the data you are sending to the buffer, and increase the buffer size as well.

Also, please build with the DAQmx libraries included, but use a simulated device. Here is a Developer Zone article that addresses how to create a simulated device. This will help point the finger at either the hardware or the software.

Here is a KnowledgeBase that discusses DIO USB devices:

http://digital.ni.com/public.nsf/websearch/6AFDC8FF68F3F1D0862570DD004CFA78?OpenDocument

Best,

Adam
Academic Product Manager
National Instruments

RDVincent · 11-19-2010

Adam,

Thanks for the reply and the suggestion of the simulated device. I wasn't aware of that feature before, and it will help in this and future testing. The KnowledgeBase link you provided did not seem relevant, however. Perhaps I misunderstood it.

Upon further investigation, I find that the problem is stranger than I had realized. First off, the software works fine on all systems with the simulated USB-6221 BNC device. However, with the real hardware, the timing of the software actually depends on the state of the analog input line. The problem goes away if I connect the analog input to ground or to a valid port, and it returns if I leave the analog input disconnected.

Here's what I see, a bit more specifically. If I start the control software with AI0 disconnected, the processing delay is 0.4-0.5 seconds and the control software will not run properly. On the other hand, if I connect AI0 to my digital output in a loopback configuration and start the control software, the processing delay is in the acceptable 0.02-0.05 second range. If I then disconnect AI0 with the control program running, the delay gradually increases back to the unacceptable half-second range. But if I reconnect AI0 at any point, the delay immediately drops back to the low range.

To me, this seems to imply that the state of the input pin is changing the USB utilization, such that a floating pin somehow causes more system interrupts or data transfers. Is this possible? Is there some way to work around this?

enderWiggin10 · 11-22-2010

Hi RDVincent,

Interesting observation.

-Could it be at all possible that the act of grounding the signal makes the FFT calculation much simpler?

-Can you try to take the tick count to find out how long the loop iterations take?

-If you disable the FFTs, does the code still lag as you describe?

-Basically, can you isolate the slow execution to a certain part of your code when the signal is not grounded?

-Do you get the same results when you connect the pin to a known signal from a function generator?

-Are you rendering graphs every iteration of the loop?

-It could be that the graph rendering slows down execution if the signal is floating and constantly changing the scale of the chart/graph.

-How fast are you acquiring on the 6221?
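The tick-count suggestion can also be done in standard C++ instead of with `GetTickCount()`, whose resolution is typically only about 10 to 16 ms on Windows. A minimal sketch, assuming each stage of the loop can be wrapped in a callable (`timeMs` is a hypothetical helper):

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs one stage of the loop and returns its
// wall-clock duration in milliseconds, measured with a monotonic clock.
double timeMs(const std::function<void()>& stage) {
    const auto t0 = std::chrono::steady_clock::now();
    stage();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Timing each stage separately (for example `timeMs([&] { runFfts(block); })`, where `runFfts` stands in for the real processing call) rather than the whole iteration shows whether the extra time goes to the DAQmx read/write calls or to the computation itself.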

Best,

Adam
Academic Product Manager
National Instruments

RDVincent · 11-22-2010

Hi Adam,

The FFT calculation requires the same number of cycles regardless of the exact content of the data (assuming there are no floating-point errors or exceptions). The code spends the bulk of its time in the FFT so that is where the problem is most obvious. The other processing steps are also slowed during the pathological condition. I can get the system to run if I remove or reduce the input processing step, but the same basic relationship still exists.

I have run tests where the data recorded during an "ungrounded input" condition was then run through my test harness, which feeds the data to the processing stages using emulated calls to the NI-DAQmx C library. In this case, the computations take the expected, brief time. So it really seems to be something about the activity of the USB interface or drivers.
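Replaying recorded data requires capturing it in some form first. A minimal loader, assuming the samples were logged as whitespace-separated text (both the format and the `loadSamples` name are assumptions, not the author's actual harness):

```cpp
#include <istream>
#include <sstream>  // std::istringstream, convenient for testing the loader
#include <vector>

// Reads whitespace- or newline-separated samples until end of input (or the
// first token that is not a number), returning them in recorded order.
std::vector<double> loadSamples(std::istream& in) {
    std::vector<double> samples;
    double v;
    while (in >> v) samples.push_back(v);
    return samples;
}
```

In practice the stream would be a `std::ifstream` opened on the logged file; the resulting vector can then be fed to the processing code exactly as live data would be.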

When I do my "loopback" test, I am connecting to a digital output, so I can only generate a limited class of signals. However, I did run with both our standard digital output (in the range 0.5-2 Hz) and a pseudorandom pulse sequence near the Nyquist frequency (about 2300 Hz average). Neither seems to affect the timing of the calculations, nor does simply tying the analog input to ground.

As I may have mentioned, the experiment requires fairly low latency, so the output buffer is quite small, on the order of 500 samples. If I increase the buffer size, performance improves, but the added latency is potentially unacceptable. So again, while I can't say I've ruled out every other possibility, the behavior is very consistent, and it seems to point to some kind of heavy overhead from the USB drivers or subsystem.

Also, keep in mind that on at least one laptop, my personal machine, this problem simply can't be reproduced. So it seems unlikely to be inherent to our processing code.

There is no graphical display of any kind. This is a pure command-line program which just uses "fprintf" and other standard Windows I/O calls to log information.

The sample rate is 5000 Hz.

-bert

RDVincent · 11-23-2010

Adam,

I have to eat my words a little bit here. I retested the code with my test harness and some newly-collected data from the 6221 with the input floating. I'm not sure why, but somehow the calculation is slowed down substantially in this case even when the driver is not involved.

While I can't easily account for why the FFT routine is slower, there must be some major change in floating-point timing that can occur under some circumstances. The code itself should execute the same number of instructions and library calls no matter what the exact content of the data buffer, but somehow either the sin() and cos() functions or the floating-point instructions themselves must take a different amount of time under some circumstances.
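One known way for an identical instruction stream to run at very different speeds on x86 is special operands: arithmetic on NaN, infinity, or subnormal (denormal) values can be many times slower than on ordinary numbers on some processors, particularly with x87 code. A quick diagnostic, sketched here with hypothetical names, is to count such values in the intermediate buffers:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tally of values that can trigger slow floating-point paths.
struct FpCensus {
    std::size_t nonFinite = 0;  // NaN or +/-infinity
    std::size_t subnormal = 0;  // denormalized values
};

// Scans a buffer (e.g. FFT input or output) and counts special values.
FpCensus censusOf(const std::vector<double>& buf) {
    FpCensus c;
    for (double x : buf) {
        if (!std::isfinite(x)) {
            ++c.nonFinite;
        } else if (std::fpclassify(x) == FP_SUBNORMAL) {
            ++c.subnormal;
        }
    }
    return c;
}
```

A nonzero count on the slow data and a zero count on the fast data would point at the values themselves rather than at the driver.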

I'm a bit mystified, but it does seem that the NI drivers may be innocent. I'm going to continue to investigate, as I have no idea how this is possible.

Is there some way to feed a specific data pattern from a file through the "official" NI simulated device?

RDVincent · 11-23-2010

Adam,

Here's the issue: the USB-6221 BNC was returning a buffer that was identically zero when the input was disconnected.

In my previous experience with data acquisition devices, it was generally safe to assume that any ADC would produce some noise. So we did not perform the range checks needed to avoid division by zero when the signal energy and range were zero. The resulting divisions by zero did not generate exceptions, but simply slowed the calculations to a crawl.
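With IEEE 754 floating point and exceptions masked (the usual default), x/0 yields an infinity and 0/0 yields NaN instead of trapping, which matches the "no exceptions" behavior described; those values then propagate through every later stage. A minimal guard, assuming the processing normalizes each block by its RMS value (the function and its `minRms` threshold are hypothetical), skips the division when the block carries no energy:

```cpp
#include <cmath>
#include <vector>

// Normalizes a block to unit RMS in place. Returns false and leaves the block
// untouched if it is empty or its RMS is at or below minRms (for example an
// ADC buffer that is identically zero), so no division by zero can occur.
bool normalizeRms(std::vector<double>& x, double minRms = 1e-12) {
    if (x.empty()) return false;
    double sumSq = 0.0;
    for (double v : x) sumSq += v * v;
    const double rms = std::sqrt(sumSq / static_cast<double>(x.size()));
    if (!(rms > minRms)) return false;  // written this way to also reject NaN
    for (double& v : x) v /= rms;
    return true;
}
```

The same pattern applies to a range (max minus min) normalization: test the denominator before dividing, and decide explicitly what a flat block should produce downstream.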

Sorry for the confusion.

enderWiggin10 · 11-23-2010

Hi RDVincent,

I am glad that you were able to discover the cause. Looking into it further, it is possible that the lack of noise, with the input sitting flat at a constant level, would make the signal appear to have no frequency content at all, giving the result you are seeing. Either way, I am glad you found it.

Best,

Adam
Academic Product Manager
National Instruments
