I have a cDAQ-9189 chassis with an NI 9264 (AO) module.
sampleRate = 25000 Samples/sec
DAQmx driver version = 20.7
cDAQ-9189 hardware version = 20.0.0f0
Using the Python nidaqmx API, I created an application to generate 10 sine waveforms in CONTINUOUS mode without regeneration.
Basically, my application works, but I have an important issue!
The output signals are delayed by about 16 seconds with the parameters used in my application!
I attached ni_code.7z (.py inside) and difference.jpg
It seems cDAQ-9189 has very large internal hardware buffer.
I have tried to change 'output_onbrd_buf_size', but only one value is accepted for 10 channels: int(256/10).
task.out_stream.output_onbrd_buf_size = 25  # it complains about any other value!
After the task starts, every 1 sec I print the difference between 'curr_write_pos' and 'total_samp_per_chan_generated'.
In my application this difference increases to ≈408000 samples and then stays at about this value.
See the 'difference' figure.
408000/sampleRate ≈ 16.3 sec <<-- this is exactly the delay I observe!
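The arithmetic above can be written as a small helper. This is a minimal sketch: the function name and the two example positions are hypothetical, and only the 408000-sample backlog and the 25000 samples/sec rate come from the post.

```python
def output_latency_seconds(curr_write_pos, total_generated, sample_rate):
    """Estimate the output delay from the write/generate backlog.

    curr_write_pos and total_generated are per-channel sample counts, as
    read from task.out_stream.curr_write_pos and
    task.out_stream.total_samp_per_chan_generated.
    """
    backlog = curr_write_pos - total_generated
    return backlog / sample_rate


# Hypothetical positions whose difference matches the observed backlog.
latency = output_latency_seconds(1_000_000, 592_000, 25_000)
print(f"{latency:.2f} s")  # 16.32 s
```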
How can I solve this problem?
P.S. How can I insert my code directly into the post?
I couldn't open your attached file and don't really know any of the Python syntax anyway. Sometimes I can make partial sense of things though.
Anyway, the onboard FIFO for the 9189 isn't nearly big enough to explain even 1 second at 25 kHz. Specs say 127 samples per "slot" for non-regenerating output, 8191 samples split among all channels for regenerating output. Either way, not nearly enough to explain your delay.
I'd expect 'curr_write_pos' to accurately track the total # samples you've written to the task buffer on the app side. I have a nagging recollection that 'total_samp_per_chan_generated' might not be quite as truthful. I *think* I recall that it gets incremented when DAQmx moves data from the task buffer to the device FIFO. (Or possibly to the USB transfer buffer in the case of a USB-connected device.)
I draw two conclusions from seeing that difference grow over time to a large stable value:
1. You have a very large task buffer.
2. You're writing to your task buffer faster than your device is generating data. Eventually you fill the task buffer up and DAQmx blocks you from overwriting it, thus the difference stabilizes.
The first section has a steeper slope where you build a backlog of ~320000 in ~7 seconds. You would also have physically generated 7*25000 samples in that time, so you seem to be writing to the task buffer at a rate of around 500000 samples per 7 seconds.
The second section has a shallower slope. Here you increase the backlog by ~90000 in the next ~7 seconds. I'm guessing that this is a point where DAQmx starts to manage things to prevent you from overwriting data that hasn't yet been delivered to the device. Each attempt to write data requires a little extra time because you're gradually closing in on the point where 'curr_write_pos' would catch up to the marker that designates the next data to be transferred to the device. Part of the time is spent waiting for DAQmx to *allow* the write while preventing interference.
The third section is flat and stable. Your difference slightly exceeds 400000, so apparently, that's the size of your task buffer. It's full to the gills and every attempt to write new data is paced by the way DAQmx prevents overwriting. So it only allows you to write at the same 25000 samples/sec that the device is generating at (and thus the rate that the various buffers are making room for new data to be written).
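The reading of the graph above can be checked with a little arithmetic and a toy model. This is only a sketch under assumed numbers: the function names are hypothetical, the 70000 samples/sec write rate and 400000-sample capacity are rounded from the figures quoted above, and the model treats the blocking as a hard cap, so it does not reproduce the shallower middle segment.

```python
def estimate_write_rate(backlog_growth, seconds, sample_rate):
    """Samples/sec written, given how much the backlog grew over an
    interval during which the device kept generating at sample_rate."""
    return (backlog_growth + seconds * sample_rate) / seconds


def simulate_backlog(write_rate, sample_rate, capacity, seconds):
    """Backlog (written minus generated) at the end of each second when
    the writer is blocked once a bounded buffer of `capacity` is full."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # The writer adds write_rate, the device drains sample_rate.
        backlog = backlog + write_rate - sample_rate
        # The backlog can neither exceed the buffer nor go negative.
        backlog = max(0, min(capacity, backlog))
        history.append(backlog)
    return history


# ~320000 backlog built in ~7 s while 7 * 25000 samples were generated.
print(round(estimate_write_rate(320_000, 7, 25_000)))  # 70714

# The backlog climbs by 45000 per second, then sits at the cap.
print(simulate_backlog(70_000, 25_000, 400_000, 12))
```

Once the cap is reached the writer can only proceed at the drain rate, which is the flat third section of the graph.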
Typically, an output task buffer size is defined by the amount of data you write into it before starting the task. I think you can explicitly make it bigger, but I don't think you can force it to be smaller, else the attempt to write excess data would error out since there's nowhere for the previously-written data to go until the task is started. So that means that to reduce your latency, you should probably write only a small chunk of data to the task buffer before starting it.
I'm not sure about this part, but you *might* then find that all subsequent writes need to be limited to be no bigger than the buffer size you established.
Thank you for the reply!
I did not have this issue (output delay) when I used an NI cDAQ-9174 chassis (USB) with an identical application. It worked pretty fast.
About your conclusions:
1. task.out_stream.output_buf_size = 8192
I tried decreasing this value to 2048, and my "delta" went up to 800k!
2. I use task.register_every_n_samples_transferred_from_buffer_event(2048, my_callback)
Either way, when 2048 samples (1/4 of the task buffer) are transferred to the device, I refill the buffer with the same number of samples (2048).
If I write fewer than 2048 samples into the buffer at this event, my task returns an underflow error.
If I write more than 2048 samples into the buffer at this event, nothing changes.
I'll try to implement my task in LabVIEW for more convenient discussion.
- I'm with you that the #'s involved don't seem to make sense. My speculations about buffer size weren't even close. And pacing your writes according to the firing of the DAQmx event *should* prevent the writes from getting way ahead of actual generation
- nevertheless, it still seems there's some important information in the shape of your graphs. There's 3 linear segments -- an initial steep one, a briefer and shallower one, and then a final horizontal one. There's a good clue there, though I don't know what it's trying to tell you.
- can you confirm that not only is that difference growing toward a very large final value, but that there's *also* a long latency time between when you write data to the task and when it gets physically generated on the device? You initially mentioned 16+ seconds delay. I'm hypothesizing about whether those properties might be buggy or lying vs. other aspects of the task acting inexplicably.
- what does that difference curve look like when you run on the USB chassis? It seems like an even *more* important clue that the behavior is chassis-dependent.
Kevin's assessment is correct that you are writing ahead of the data being generated on the front end. The buffer position you are querying is the position in the application buffer, and the event you are relying on just signals when the data moves from the application buffer onto the bus (USB/Ethernet), not when the data is generated on the front end. The device onboard buffer size is used more for onboard regeneration from the hardware.
Your application works on USB cDAQ because the available endpoint buffer and device FIFO on a USB cDAQ chassis is fairly small, no more than 10 KB in my experience. But on Ethernet cDAQ, there is a TCP buffer on the chassis to deal with network dropout and latency issue. And the TCP buffer on the chassis is greedy: it tries to buffer up as much data as it can. It is optimized for continuous streaming and not quick update. This TCP buffer is over 10 MB big if this is the sole DAQmx task running.
Thus, if the application relies on the event that fires when data is transferred onto the bus, it will fill up this TCP buffer, and any new waveform has to wait for the data already in the TCP buffer to drain first. There is no way to programmatically control this TCP buffer. This can be a feature request to NI. But one way to do what you want is to limit the rate at which data is written so that the TCP buffer does not fill up; the application can then update its waveform without a long delay waiting for the TCP buffer to drain.
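One way to limit the write rate can be sketched as follows. This is a hedged illustration, not NI's recommended method: the function names, the 5000-sample target lead, and the 2500-sample chunk are all assumptions, and `write_chunk` is a hypothetical callable that would wrap the actual `task.write(...)`. The number of generated samples is only estimated from wall-clock time, which drifts relative to the hardware sample clock, so a count derived from the hardware clock would be more robust for long runs.

```python
import time


def samples_allowed(elapsed_s, sample_rate, written, target_lead):
    """Samples that may be written now so that the total written stays
    at most target_lead ahead of the estimated number generated."""
    generated_estimate = int(elapsed_s * sample_rate)
    return max(0, generated_estimate + target_lead - written)


def paced_writer(write_chunk, sample_rate=25_000, target_lead=5_000,
                 chunk=2_500, duration_s=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Call write_chunk(chunk) only when a full chunk fits under the
    target lead; return the total number of samples written."""
    start = clock()
    written = 0
    while True:
        elapsed = clock() - start
        if elapsed >= duration_s:
            return written
        if samples_allowed(elapsed, sample_rate, written, target_lead) >= chunk:
            write_chunk(chunk)  # hypothetical: wraps task.write(...) for `chunk` samples
            written += chunk
        else:
            # Wait a fraction of a chunk period before checking again.
            sleep(chunk / sample_rate / 4)
```

With these numbers the writer never gets more than 5000 samples (0.2 seconds at 25 kHz) ahead of the estimate, so the data queued anywhere downstream stays bounded by roughly that amount.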
- Yes, I can confirm that I have a long latency between writing new data and the output values physically changing. Moreover, this latency depends on the *difference*: latency [sec] = difference [samples] / sample_rate [samples/second].
Here is the *difference* on the USB chassis.
I think the problem is in the DAQmx driver and/or in the Ethernet chassis firmware.
I have try to override
Also, I decided not to use the callback. Instead, my application starts a thread that monitors the *difference* and writes new data
if: difference < (buffer_size/2)
And it works! The *difference* stays between 3000 and 6000 samples, but since this is not an internal driver implementation, I have some doubts about the stability of this method (I frequently get buffer underflow errors);
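A minimal sketch of such a polling thread is shown below. It is a reconstruction under assumptions, not the attached code: the function names, the `next_chunk` callable, and the polling interval are hypothetical, and `task` is expected to be an already started nidaqmx output task (only `task.out_stream.curr_write_pos`, `task.out_stream.total_samp_per_chan_generated`, and `task.write` are used).

```python
import threading
import time


def needs_refill(curr_write_pos, total_generated, buffer_size):
    """True when the backlog has dropped below half the buffer size."""
    return (curr_write_pos - total_generated) < buffer_size // 2


def refill_loop(task, next_chunk, buffer_size, stop_event, poll_s=0.005):
    """Poll the backlog and write one chunk whenever it runs low.

    next_chunk() is a hypothetical callable returning one list of
    samples per channel.
    """
    while not stop_event.is_set():
        stream = task.out_stream
        if needs_refill(stream.curr_write_pos,
                        stream.total_samp_per_chan_generated,
                        buffer_size):
            task.write(next_chunk(), auto_start=False)
        else:
            time.sleep(poll_s)


def start_refill_thread(task, next_chunk, buffer_size):
    """Run refill_loop in a daemon thread; set the returned event to stop it."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=refill_loop,
        args=(task, next_chunk, buffer_size, stop_event),
        daemon=True,
    )
    thread.start()
    return thread, stop_event
```

Because the pacing depends on how promptly the thread gets scheduled, a short polling interval and a larger threshold trade latency for fewer underflows.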
Anyway, I think I should describe this behavior to NI and request a fix/feature.
Can you tell me how I can do this in a better way?
- I was previously unaware of the (apparently large and un-configurable) TCP/IP buffer on the 9189 chassis, I merely did a quick look at the spec sheet for the AO FIFO. Thanks to Jerry_X for the info!
- your workaround seems like a pretty good idea, but I'd agree that it seems prudent to have concerns about stability. Still, the bigger the nominal difference you target, the less likely you are to get buffer underflow errors.
- you might have another callback option available if you have another cDAQ module to "waste". You could set up an AI (or maybe a DI) task on another module and configure it to sample based on the AO task's sample clock. (The correct string to designate an external clock source might look something like '/cDAQ-9189/ao/SampleClock')
Decide what you want your minimum "difference" to be, for this example I'll aim for 5000 samples (which is 0.2 seconds at 25 kHz).
So then I'd configure the AI (or DI) task to have a callback for the DAQmx event for "Every N samples acquired" and set N to 2500. I'd write 7500 samples to the AO task, then start the AI task, and then start the AO task.
The first callback will occur after 2500 samples have been generated by AO and 2500 have been acquired by AI. In the callback function, you'll have written 7500 samples and generated 2500 for a difference of 5000. Now write another 2500. And you can ignore the AI data -- the AI task is just there to generate the event that initiates your callback.
Another way to do all this that I'd find even simpler is forget the callback and let the built-in features of DAQmx manage the timing. So no longer worry about the DAQmx event. After starting the tasks, just enter a loop where you:
- request a 2500 sample read from the AI task. Ignore the data
- write the next 2500 AO samples to the AO task.
- return to top of loop
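The loop above might look roughly like this in Python. Treat it as an untested sketch under stated assumptions: the device, channel, and terminal names (`cDAQ1Mod1`, `cDAQ1Mod2`, `/cDAQ1/ao/SampleClock`) are placeholders, the sine helper and function names are hypothetical, and whether a given AI module can be clocked from the AO sample clock depends on the module.

```python
import math


def sine_chunk(start_index, n_samples, n_channels, sample_rate, freq_hz=50.0):
    """Return one list per channel holding n_samples of a phase-continuous sine."""
    row = [math.sin(2.0 * math.pi * freq_hz * (start_index + i) / sample_rate)
           for i in range(n_samples)]
    return [list(row) for _ in range(n_channels)]


def run_read_paced_output(ao_chans="cDAQ1Mod1/ao0:9", ai_chan="cDAQ1Mod2/ai0",
                          ao_clock="/cDAQ1/ao/SampleClock", n_channels=10,
                          rate=25_000, chunk=2_500, n_loops=100):
    # Imported here so sine_chunk can be used without the driver installed.
    import nidaqmx
    from nidaqmx.constants import AcquisitionType, RegenerationMode

    with nidaqmx.Task() as ao, nidaqmx.Task() as ai:
        ao.ao_channels.add_ao_voltage_chan(ao_chans)
        ao.timing.cfg_samp_clk_timing(rate, sample_mode=AcquisitionType.CONTINUOUS)
        ao.out_stream.regen_mode = RegenerationMode.DONT_ALLOW_REGENERATION

        # The AI task exists only to count AO sample clock ticks.
        ai.ai_channels.add_ai_voltage_chan(ai_chan)
        ai.timing.cfg_samp_clk_timing(rate, source=ao_clock,
                                      sample_mode=AcquisitionType.CONTINUOUS)

        # Pre-write 3 chunks (7500 samples), then start AI before AO.
        written = 3 * chunk
        ao.write(sine_chunk(0, written, n_channels, rate), auto_start=False)
        ai.start()
        ao.start()

        for _ in range(n_loops):
            # Blocks until `chunk` AO samples have been clocked out.
            ai.read(number_of_samples_per_channel=chunk)  # data is ignored
            ao.write(sine_chunk(written, chunk, n_channels, rate))
            written += chunk
```

With these values the backlog sits at about 5000 samples right after each read and returns to 7500 after each write.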
Kevin has a good idea to basically count the AO sample clock, and we can do that without using an AI module: we can use the chassis counters. If we want a software event, we can create a hardware-timed counter task, set the AO sample clock as its sample clock, and then rely on the "Every N Samples Acquired" event. Or, if we just want to query how many sample clocks have been generated to figure out how many samples we need to write in our loop, we can create a simple event-counting task that counts the AO sample clock.
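The simple edge-counting variant could be sketched as below. This is an untested illustration under assumptions: the counter name `cDAQ1/_ctr0`, the terminal `/cDAQ1/ao/SampleClock`, the function names, and the `next_samples(start, n)` callable (returning one list of `n` samples per channel) are all placeholders. The chassis counter is 32 bits wide, so at 25 kHz the count would roll over after roughly 47 hours, which this sketch does not handle.

```python
def samples_to_write(generated, written, target_lead):
    """Samples needed to bring the backlog back up to target_lead."""
    return max(0, generated + target_lead - written)


def run_counter_paced_output(next_samples, ao_chans="cDAQ1Mod1/ao0:9",
                             counter="cDAQ1/_ctr0",
                             ao_clock="/cDAQ1/ao/SampleClock",
                             rate=25_000, target_lead=5_000,
                             min_chunk=2_500, total=250_000, poll_s=0.01):
    # Imported here so samples_to_write can be used without the driver installed.
    import time
    import nidaqmx
    from nidaqmx.constants import AcquisitionType, Edge, RegenerationMode

    with nidaqmx.Task() as ao, nidaqmx.Task() as ctr:
        ao.ao_channels.add_ao_voltage_chan(ao_chans)
        ao.timing.cfg_samp_clk_timing(rate, sample_mode=AcquisitionType.CONTINUOUS)
        ao.out_stream.regen_mode = RegenerationMode.DONT_ALLOW_REGENERATION

        # Count rising edges of the AO sample clock: one edge per generated sample.
        chan = ctr.ci_channels.add_ci_count_edges_chan(counter, edge=Edge.RISING)
        chan.ci_count_edges_term = ao_clock

        written = target_lead
        ao.write(next_samples(0, written), auto_start=False)
        ctr.start()  # arm the counter before the AO clock starts running
        ao.start()

        while written < total:
            generated = int(ctr.read())
            n = samples_to_write(generated, written, target_lead)
            if n >= min_chunk:
                ao.write(next_samples(written, n))
                written += n
            else:
                time.sleep(poll_s)
```

Unlike the properties read from the output stream, this count advances with the sample clock itself, so the backlog it implies reflects what has actually been generated.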
👍👍 Definitely a better idea from Jerry_X! No need to waste an AI module when you can use one of the counters that's already built into the chassis.