Continuous write analog voltage NI cDAQ 9178 with callbacks

glucenag · 04-15-2020

Hey there,

I am using the nidaqmx-python and would like to understand how to write an analog output, hardware timed, using the registered callbacks, i.e. making use of register_every_n_samples_transferred_from_buffer_event, which I guess is the best way to do it.

To test my ability, I thought I'd try outputting an ever-increasing voltage until I stop it by pressing ENTER, so I wrote the code below (also attached, since I can't make sense of how to format this so that it appears as source code).

The goal of the code below is to increase the voltage output step by step at some frequency. Given the parameters, I was expecting an output refresh rate of 1 kHz (clock set to a 2000 Hz sampling rate and the callback called every 2 samples read by the device from the buffer), and to check that this works I set my code to increase the voltage by 1/1000 V on every callback. In principle it should thus go from 0 to 1 V in 1 second. It does not, and there is another problem (see below).

(this is NI cDAQ 9178 plugged via USB on a Windows 10 computer, and with the NI 9264 output module wired to a multimeter so that I can check the output changing)

```python
# Continuous write single channel
import numpy as np

import nidaqmx
from nidaqmx.stream_writers import (AnalogSingleChannelWriter)
from nidaqmx import constants

# global datasize
global bufsize
global bufsize_clk
global rate_outcfg
global rate_callback

bufsize = 2  # issues warnings as is; can be increased to stop warnings
bufsize_clk = 2
rate_outcfg = 2000  # chosen at random; constraint is to update output at 100Hz so whatever works would be fine here
rate_callback = 2  # basically rate_outcfg/100 as I would like to update output at 100Hz (note setting it to that does not work)

global data
data = np.empty((bufsize,))  # cannot be vertical for nidaqmx to work
data[:] = 0  # starting voltage in Volts

global stream

def my_callback(task_idx, every_n_samples_event_type, num_of_samples, callback_data):

    data[:] = data[:] + 0.001

    stream.write_many_sample(data, timeout=0.0000001)  # alternatively: timeout=constants.WAIT_INFINITELY
    return 0

def setTask(t):
    t.ao_channels.add_ao_voltage_chan("cDAQ2Mod8/ao0")
    t.timing.cfg_samp_clk_timing(rate=rate_outcfg, sample_mode=nidaqmx.constants.AcquisitionType.CONTINUOUS,
                                   samps_per_chan=bufsize_clk)  # last arg is the buffer size for continuous output

task = nidaqmx.Task()
setTask(task)

stream = AnalogSingleChannelWriter(task.out_stream, auto_start=False)  # with auto_start=True it complains

# Call the my_callback function every time rate_callback samples are read by the device from the PC buffer
task.register_every_n_samples_transferred_from_buffer_event(rate_callback, my_callback)

stream.write_many_sample(data)  # first manual write to buffer, required otherwise it complains it can't start

# time.sleep(1)

task.start()
input('hey')  # task runs for as long as ENTER is not pressed
task.close()  # important otherwise when re-running the code it says specified device is reserved!
```

I have two issues with this simple code:

1/ It yields warnings:

```

While writing to the buffer during a regeneration, the actual data generated might have alternated between old data and new data. That is, while the driver was replacing the old pattern in the buffer with the new pattern, the device might have generated a portion of new data, then a portion of old data, and then a portion of new data again.
Reduce the sample rate, use a larger buffer, or refer to documentation about DAQmx Write for information about other ways to avoid this warning.
error_buffer.value.decode("utf-8"), error_code))
Traceback (most recent call last):

```

I have tried many things but no success so far avoiding these messages.

2/ This code is supposed to increase the voltage (by 1/1000 Volts) every 1/1000th of a second, that is, at 1kHz. However it clearly does so only once every 1/100th of a second, since I need to wait 10 seconds to get to 1 Volt. Somehow something is not working well, and somehow that's exactly by a factor of 10 which is very puzzling...
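The gap between expectation and observation can be put into numbers. A minimal sketch with hypothetical helper names, using the first script's settings (2000 S/s clock, callback every 2 samples, +1 mV per callback) and the observed 10 s ramp; it only does the arithmetic and assumes every callback produces exactly one voltage step:

```python
def callback_period_s(sample_rate_hz: float, samples_per_callback: int) -> float:
    """Seconds between 'every N samples transferred' callbacks if every one fires on time."""
    return samples_per_callback / sample_rate_hz


def implied_period_s(observed_ramp_s: float, target_v: float, step_v: float) -> float:
    """Average seconds per voltage step implied by the observed ramp time."""
    n_steps = round(target_v / step_v)
    return observed_ramp_s / n_steps


expected = callback_period_s(2000, 2)           # 0.001 s, i.e. 1 kHz
observed = implied_period_s(10.0, 1.0, 0.001)   # 10 s to reach 1 V in 1 mV steps
print(f"expected {expected * 1e3:.1f} ms per step, observed {observed * 1e3:.1f} ms per step")
```

An effective 10 ms per step is the number to explain; on its own it does not say whether the limit is the callback mechanism, the USB transfers, or something else.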

Any input would be greatly appreciated on how to do this best or on why it is not doing what I would have expected it to do!

Kevin_Price · 04-15-2020

A bunch of things, but unfortunately I don't really know anything specific about the python DAQmx API. These are a few general thoughts that will hopefully nudge you in a useful direction, but I probably can't guide you all the way to where you're heading.

The DAQmx driver is very versatile and capable, but as a consequence it certainly isn't lightweight in *all* possible ways. (For example, you can do buffered streaming at impressively fast sample rates, but when you poll 1 sample at a time you'll be constrained by some of its overhead.)

1. First and foremost, there are going to be some constraints imposed by working with a cDAQ system due to its USB or Ethernet connectivity. You simply are *not* going to be able to do 1000 interactions a second with a device that's connected by USB or Ethernet.

2. Over USB, the DAQmx driver is much more optimized for net throughput than for low latency. The implication is that you *should* define a much larger buffer; it's no surprise that a 2-sample buffer isn't big enough. Another implication is that DAQmx probably doesn't naturally want to transfer data in 2-sample chunks. It performs better (in terms of throughput) by transferring larger chunks less often rather than tiny chunks very often.

3. A typical DAQ rule of thumb is to service the task buffer about 10 times a second. I might aim for more like 5 under USB. That would equate to a callback for every 200 samples transferred. And I'd make my buffer size larger than that (maybe 2x or 3x?).

4. I'm not sure why you see your suspicious factor of 10 in timing. It's possible though that the factor of 10 is coincidence. The real constraint might be a 10 msec minimum interval for callbacks or for USB transfers or some other thing. You can investigate by setting up a situation that should generate callbacks at, say, 4 msec intervals or 25 msec intervals. That'll give you more than 1 datapoint for your timing observations.

5. Your code comments refer to wanting to be able to update output signals at 100 Hz, presumably under software control. I assume this means you want <= 10 msec latency between your software deciding on an output value and having that output appear as a real world signal.

This is pretty tricky to accomplish with a buffered output task, and (I suspect) likely impossible when the device is connected by USB or Ethernet. You may need to approach this with an unbuffered, software-timed, "on-demand" task and then live with the corresponding irregularities and uncertainties of software timing.
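Points 3 and 4 both come down to choosing how many samples sit between callbacks. A minimal sketch with hypothetical helper names: the 2x buffer factor is Kevin's rough "2x or 3x" guess rather than a DAQmx rule, and the timestamps are assumed to be recorded by your own callback (for example by appending `time.perf_counter()` to a list); nidaqmx does not collect them for you.

```python
import statistics
from typing import Sequence, Tuple


def samples_for_interval(sample_rate_hz: float, interval_s: float) -> int:
    """Samples per 'every N samples' callback that correspond to a desired callback interval."""
    return max(1, round(sample_rate_hz * interval_s))


def suggested_buffer(samples_per_callback: int, factor: int = 2) -> int:
    """Task buffer size as a multiple of the callback size (rough 2x to 3x rule of thumb)."""
    return factor * samples_per_callback


def interval_stats(timestamps: Sequence[float]) -> Tuple[float, float, float]:
    """(mean, min, max) gap in seconds between successive callback timestamps."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return statistics.mean(gaps), min(gaps), max(gaps)


# Rule of thumb at 1 kS/s, servicing the buffer about 5 times per second over USB:
n = samples_for_interval(1000, 0.2)   # 200 samples per callback
buf = suggested_buffer(n)             # 400-sample task buffer

# Timing probe: callbacks that should arrive every 4 ms and every 25 ms at the same rate.
probes = [samples_for_interval(1000, t) for t in (0.004, 0.025)]  # [4, 25]
```

Running the task once with each probe value and comparing `interval_stats` of the recorded timestamps against N / rate shows whether the callbacks follow the requested spacing or settle at some longer floor.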

-Kevin P

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.

glucenag · 04-16-2020

Hey there,

first of all, thank you so much for replying! Your information is very useful to me and I am pretty sure this is going to help me on the road to achieve my goals.

I have some comments:

2/ Is that specifically for output? I'm asking because, for input, I was able to write simple code in a similar fashion that acquired analog voltage at high sampling rates (in small chunks as well). This is actually why I originally thought I could easily do the reverse (output, that is). Are acquisition and generation affected differently by USB, then?

3/ Same comment as above, basically. Any extra light on this would be great!

4/ Yes I was thinking about that exactly. Thanks !

5/ Copy that, I'll try the software-timed version as well (not sure how to do that yet; I guess registering a callback is hardware-timed by definition, so I'd have to go manual and write a loop with a write call in it). Also, yes, your assumption is correct: I would like such low latency.
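The software-timed loop from Kevin's point 5 could look roughly like this. It is a sketch, not an NI-recommended pattern: the write function is passed in so the loop runs without hardware, and the hypothetical `run_on_cdaq` shows the assumed hardware usage, where a task with a channel but no `cfg_samp_clk_timing` call is on-demand and `task.write` of a single value starts it automatically.

```python
import time
from typing import Callable, List


def software_timed_ramp(write: Callable[[float], object], start_v: float, step_v: float,
                        n_updates: int, period_s: float) -> List[float]:
    """Write start_v + k * step_v for k = 0..n_updates-1, one value per period_s (software timed).

    Each deadline is computed from the start time instead of sleeping a fixed amount after
    every write, so small overruns do not accumulate; jitter still depends on the OS scheduler.
    """
    written: List[float] = []
    t0 = time.perf_counter()
    for k in range(n_updates):
        delay = (t0 + k * period_s) - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        value = start_v + k * step_v  # computed from k, so rounding errors do not build up
        write(value)
        written.append(value)
    return written


def run_on_cdaq(channel: str = "cDAQ2Mod8/ao0") -> None:
    """Hardware usage (not called here): no sample clock configured, so the task is on-demand."""
    import nidaqmx  # imported here so the helper above also runs without the driver installed

    with nidaqmx.Task() as task:
        task.ao_channels.add_ao_voltage_chan(channel)
        software_timed_ramp(task.write, 0.0, 0.01, 100, 0.01)  # ~100 updates/s, 0 V up to 0.99 V
```

Each update still pays the USB round trip, so the 10 ms period is a target rather than a guarantee.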

Again, thank you so much for this. I'm going to dig into it now!

Gustavo

Kevin_Price · 04-16-2020

2. I'm not aware that this caveat should apply differently for output than for input. As to why analog input with high sample rate and small buffer seemed to work for you, I have an educated guess.

For context, first see this article. For input tasks, the buffer size you ask for is often not the buffer size you *get*. So let's say you wrote code for a 1 kHz sample rate and a *requested* buffer size of 2, and set up a callback every 2 samples. The *actual* buffer size would be 10000 samples, or 10 seconds' worth.

I don't know the *exact* mechanism for how USB "Signal Stream" transfer occurs and how the corresponding callbacks get fired off. But let's just suppose for a moment that the stream delivers something like 20 samples per packet 50 times a second. I would guess that the driver queues up 10 callbacks. The first callback executes and you retrieve the first 2 of those 20 samples pretty much instantly (because they've already been transferred into system memory). The second callback then fires off pretty immediately and you get the next 2 samples. And so on.

Maybe this all happens fast enough that your app can keep up. Or maybe it isn't *quite* keeping up, but you won't know it until/unless you get a buffer overflow error. If you're *almost* keeping up (say about 80%), you're slowly accumulating unread samples in your 10000 sample buffer. But it's gonna take quite a while (roughly 40-50 seconds) before you'd get an overflow error to become aware that you're lagging behind.

That's one hypothetical possibility.

3. The rule of thumb wouldn't apply to situations where low latency is the priority. A lot of typical data acq and signal generation apps don't need real-time low latency. They generate pre-defined stimulus signals and collect data for post-processing later. The rule of thumb works well in those cases, allowing live displays with only *moderate* latency, enough for an operator to see what's going on.
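The "roughly 40-50 seconds" in point 2 is a simple ratio: buffer size divided by the rate at which unread samples pile up. A sketch with a hypothetical helper name, assuming a constant shortfall and Kevin's hypothetical 10000-sample buffer at 1 kS/s:

```python
def seconds_to_overflow(buffer_samples: int, sample_rate_hz: float, keep_up_fraction: float) -> float:
    """Seconds until a continuous input buffer overflows when the application only reads
    keep_up_fraction of the incoming samples (constant shortfall assumed)."""
    if keep_up_fraction < 0.0:
        raise ValueError("keep_up_fraction must be >= 0")
    if keep_up_fraction >= 1.0:
        return float("inf")  # reading at least as fast as samples arrive: no backlog builds up
    backlog_per_s = sample_rate_hz * (1.0 - keep_up_fraction)  # unread samples added each second
    return buffer_samples / backlog_per_s


# 10000-sample buffer at 1 kS/s, application keeping up at 75 % and at 80 %.
for fraction in (0.75, 0.80):
    print(f"{fraction:.0%} -> {seconds_to_overflow(10_000, 1000, fraction):.0f} s")
```

At 75 % the backlog grows by 250 samples per second (40 s to overflow); at 80 % it grows by 200 per second (50 s).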

-Kevin P


glucenag · 04-17-2020

Hey there, thanks for this other reply too!

I guess I'll be switching to on-demand timing for my purposes, and it's great that I know why that is.

However, before changing gears, I thought I'd try to make the above code work, even if that means relaxed timing and large buffers as you suggested. But I can't seem to manage it. Oddly, even with large buffers as in the code below and only aiming to refresh 5 times a second, I still have two problems:

1/ I get the same warning about buffer regeneration. I can get around it by increasing my bufsize parameter, but that makes problem #2 even worse.

2/ The timing is totally off. It seems that writing the samples takes roughly 1/5th of a second in this configuration. What I mean is that instead of reaching 5 V in 5 seconds (see the code below, where we increase by 0.2 V 5 times per second), it takes approximately 10 seconds. This is not a coincidence. I have checked using the timeout parameter of write_many_sample: the function takes approximately 200 ms to execute. On top of that, the callback is called once every 200 samples, so basically everything takes double the time. Increasing bufsize makes this proportionally worse.

I understand what you have said and why I need to switch to on-demand timing for what I want to achieve, but still, I feel like I should be able to make this simple buffered task work, just to understand how to do it for moderate rates. Do you see something I am missing, or is this way of writing code just not intended to work like that? I thought the whole idea of these callbacks was that you write, say, 200 samples (at a 1 kHz rate, for example) and, while they are being generated by the device (which reads from the PC buffer), the application sends out the next 200 samples. But what actually happens is that the callback is called every time the device has read 200 samples from the buffer, and then it takes the callback another 200 ms to write out 200 samples at 1 kHz.

Any light on this would be fantastic. I really want to gain a basic understanding of how to make the above code work, even for moderate timing.

Code:

```python
# Continuous write single channel
import numpy as np

import nidaqmx
from nidaqmx.stream_writers import (AnalogSingleChannelWriter)
from nidaqmx import constants

global bufsize
global bufsize_clk
global rate_outcfg
global rate_callback

bufsize = 200  # issues warnings as is; can be increased to stop warnings
bufsize_clk = 200
rate_outcfg = 1000  # chosen at random; constraint is to update output at 100Hz so whatever works would be fine here
rate_callback = 200  # basically rate_outcfg/100 as I would like to update output at 100Hz (note setting it to that does not work)

global data
data = np.empty((bufsize,))  # cannot be vertical for nidaqmx to work
data[:] = 0  # starting voltage in Volts

global stream

def my_callback(task_idx, every_n_samples_event_type, num_of_samples, callback_data):

    data[:] = data[:] + 0.2

    stream.write_many_sample(data)  # alternatively: timeout=constants.WAIT_INFINITELY
    # Clearly it takes between 1 and 2 seconds to write the data, with bufsize = 1000
    # With bufsize = 100 it takes between 0.1 and 0.2 seconds
    # Basically it takes slightly over bufsize/outsamprate seconds, hence a bit more than doubling the time to reach N volts
    return 0

def setTask(t):
    t.ao_channels.add_ao_voltage_chan("cDAQ2Mod8/ao0")
    t.timing.cfg_samp_clk_timing(rate=rate_outcfg, sample_mode=nidaqmx.constants.AcquisitionType.CONTINUOUS,
                                   samps_per_chan=bufsize_clk)  # last arg is the buffer size for continuous output

task = nidaqmx.Task()
setTask(task)

stream = AnalogSingleChannelWriter(task.out_stream, auto_start=False)  # with auto_start=True it complains

# Call the my_callback function every time rate_callback samples are read by the device from the PC buffer
task.register_every_n_samples_transferred_from_buffer_event(rate_callback, my_callback)

stream.write_many_sample(data)  # first manual write to buffer, required otherwise it complains it can't start

task.start()
input('hey')  # task runs for as long as ENTER is not pressed
task.close()  # important otherwise when re-running the code it says specified device is reserved!
```

Thanks!

Gustavo

Jerry_X · 04-17-2020

Continuous generation is optimized for streaming throughput. The driver waits for some minimum amount of data to accumulate before submitting the USB transfer request to the OS, or for a flush timer to fire and flush the data. If it submitted every sample as soon as possible, throughput would be very low because of all the overhead.

It is possible to make what you want to do work. You just have to do it without a callback. In this specific case, your waveform is a simple ramp-up step function. You can construct the entire data set as an array and write all of it into the buffer at once. With regeneration, your waveform will then be regenerated continuously with the sample clock timing you have set up until you stop the task. This is a typical streaming workflow.
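Jerry's one-shot approach could be sketched as follows, assuming the staircase from the second script (0.2 V every 200 samples at 1 kS/s). The helper names are hypothetical; the hardware function is not called here and uses the same nidaqmx calls as the thread plus `Task.write`. Because regeneration is allowed by default, the staircase restarts from 0 V each time the buffer wraps rather than rising forever.

```python
import numpy as np


def staircase(step_v: float, samples_per_step: int, n_steps: int) -> np.ndarray:
    """Staircase from 0 V upward: each level held for samples_per_step samples (1-D, float64)."""
    levels = np.arange(n_steps, dtype=np.float64) * step_v
    return np.repeat(levels, samples_per_step)


def generate_regenerating(waveform: np.ndarray, rate_hz: float = 1000.0,
                          channel: str = "cDAQ2Mod8/ao0") -> None:
    """Hardware usage (not called here): write the whole pattern once, let the driver regenerate it."""
    import nidaqmx
    from nidaqmx.constants import AcquisitionType

    with nidaqmx.Task() as task:
        task.ao_channels.add_ao_voltage_chan(channel)
        task.timing.cfg_samp_clk_timing(rate=rate_hz, sample_mode=AcquisitionType.CONTINUOUS,
                                        samps_per_chan=len(waveform))
        task.write(waveform, auto_start=False)  # the entire pattern goes into the buffer at once
        task.start()
        input("Generating; press ENTER to stop")


# 26 levels of 0.2 V, 200 samples each: 0 V to 5 V over 5.2 s at 1 kS/s, then the pattern repeats.
wave = staircase(0.2, 200, 26)
```

No callback is involved, so there is nothing for the application to keep up with; the trade-off is that the output cannot change in response to anything decided after the write.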

Kevin_Price · 04-18-2020

I would say that you've set yourself up for much extra difficulty by planning to replace your buffer contents all at once. Your bufsize, callback # samples, and data array are all the exact same size. Try doubling bufsize to 400 while leaving the others at 200 and see if that doesn't help.

Theory: the driver has different criteria than your app does for servicing the buffer. I don't know what the rules are exactly, but it can take into account things like the board's internal buffer size, the task sample rate, your software task buffer size, the fact you're connected over USB, etc. In general, you should never count on the driver to have the same clump size and clump transfer rate as your app.

To illustrate, let's just suppose that the driver decides to transfer in clumps of 32 samples at a time. You get everything initialized and started, buffer size of 200, filled with 0 values. The driver will get to a point where it's transferred 192 samples, no callback yet, then it transfers 32 more (wrapping around circularly, starting to regenerate data at the beginning of your task buffer) for a total of 224. It will fire off your callback function, while remembering that it has a headstart count of 24 toward the next callback (I presume it does this).

Your callback function now wants to replace the entire 200-sample task buffer. Each time you write to the buffer, DAQmx remembers where that writing left off. The next write starts one sample beyond where the last write ended. Since you filled the whole buffer, this next write needs to start over from the beginning. Do you see the possibility of conflict looming?

You don't know that the driver has actually transferred more than 200 samples when it fires the callback. So you'd expect that samples 200-399 (that you've passed into the driver) get generated out in the real world as samples 200-399. But in fact 224 have already been transferred to the device. It's *too late* to generate them in the real world as #'s 200-399. So now what?

At some point in the past, maybe about a decade ago, DAQmx got smarter about managing this conflict. Your app's successive calls to DAQmx Write need to start from where the last one left off. So your next write needs to get laid down into the buffer starting over again from the beginning. Physically it's at index 0, logically it'll become sample #400.

The driver can quickly copy into physical indices 0-23 because samples 200-223 have already been transferred and are ripe for replacement. But then the driver stops and waits. At some point, 32 more samples are transferred over USB and 32 more of your 200 samples get written to replace them. And so on.

Your call to DAQmx Write gets stuck waiting for the *opportunity* to overwrite the buffer without overtaking the point in the buffer where the next transfer is due to occur. You found this wait time to be right around 200 msec, pretty much the time it takes to wait for the whole buffer to transfer once.

And this is also why your 5 seconds worth of writes takes 10 seconds to appear in the real world. Each buffer full you get to write ends up generating twice before the next write finishes replacing that old data.

Ok, so now what happens if you follow my advice to increase your buffer size to 400? (Also mentioned in point #3 of msg #2). First off, you'll prefill the buffer with 400 0-values. You get your first callback after 224 samples and can *immediately* write all 200 new values into indices 0-199. The next callback comes after 416 samples and you can *immediately* write all 200 values into indices 200-399.

By leaving a little "breathing room" in the buffer, your interactions with it can happen much more quickly because the driver doesn't have to manage conflict avoidance. You've avoided the conflict by replacing *part* of the buffer contents at a time, based on a callback which signals you when the buffer is entirely ready for that part to be replaced.
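The walk-through above can be replayed numerically. This is a minimal sketch of Kevin's hypothetical model only: fixed 32-sample clumps, a callback on the first clump that reaches each multiple of 200 (excess carried over), and, for the 200-sample regenerating case, the next write landing at logical sample 400 as he describes. The real clump size and rules are chosen by the driver and are not known here; the helper names are hypothetical.

```python
import math
from typing import List


def callback_points(clump: int, every_n: int, total: int) -> List[int]:
    """Transferred-sample counts at which an 'every N samples' callback would fire, assuming
    fixed-size clumps and a callback on the first clump that reaches each multiple of every_n."""
    if clump > every_n:
        raise ValueError("model assumes clump <= every_n")
    points, threshold, transferred = [], every_n, 0
    while transferred + clump <= total:
        transferred += clump
        if transferred >= threshold:
            points.append(transferred)
            threshold += every_n  # head start carried toward the next callback
    return points


def immediately_writable(transferred: int, write_pos: int, buffer_size: int) -> int:
    """Samples a write starting at logical position write_pos can place right away: slot i is
    free once the logical sample it held (write_pos - buffer_size + i) has been transferred."""
    return max(0, min(buffer_size, transferred - (write_pos - buffer_size)))


def samples_until_write_done(transferred: int, write_pos: int, buffer_size: int,
                             n: int, clump: int) -> int:
    """Further samples the device must consume (in whole clumps) before an n-sample write finishes."""
    needed = write_pos - buffer_size + n  # every old sample up to this count must be gone
    if transferred >= needed:
        return 0
    return math.ceil(needed / clump) * clump - transferred


print(callback_points(32, 200, 700))                      # [224, 416, 608]
# 200-sample buffer, regenerating: next write at logical 400, first callback at 224.
print(immediately_writable(224, 400, 200))                # 24
print(samples_until_write_done(224, 400, 200, 200, 32))   # 192, about 0.19 s at 1 kS/s
# 400-sample buffer: after the 400-sample prefill the next write also starts at logical 400.
print(immediately_writable(224, 400, 400))                # 224, so all 200 fit at once
print(samples_until_write_done(224, 400, 400, 200, 32))   # 0
```

Under these assumptions the 200-sample case blocks for 192 samples, which is in line with the roughly 200 ms write time measured above, while the 400-sample case never waits.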

-Kevin P


glucenag · 04-20-2020

Hey there, thanks again for sticking with me on this one.

I think I have understood your reply and it makes sense. However, increasing bufsize to 400 (or anything else, for that matter) increases the time it takes to reach 5 V (proportionally to the increase, in fact), rather than reducing it as you suggested. I guess we are missing something here.

I'm still very frustrated by this, honestly. I get that one should not aim for very low latency, but it seems I can't even achieve a decent (200 ms) latency here. I'd still love to manage this in this way, but I guess I'm going to have to ditch that goal.

If you still have other ideas I'm all ears though! And in any event, thanks a lot for your answers. It's helped me understand a lot of things.

Gustavo

Kevin_Price · 04-20-2020

I'm also surprised that the buffer size increase didn't help. I'm even more surprised at how much it hurt.

All this discussion about lowering the latency for buffered, hw-clocked output is likely to be moot for your present application. You're likely to get better (though less consistent) performance from on-demand software-timed output updates. Nevertheless, we should be able to make some sense of this stuff.

I only program in LabVIEW and don't know any text APIs in detail. In LabVIEW, there are "property nodes" for DAQmx tasks. I would expect that any *full* text API would expose the same capabilities somehow, but I wouldn't know what to suggest.

There's a DAQmx Write property that lets you set the regeneration mode explicitly. The default behavior is to allow regeneration and that might be unhelpful here. It can be set to *not* allow regeneration, which might be the better choice.

I note that the specs for the 9178 chassis show that the on-board FIFO for analog output is very much smaller for non-regenerating tasks. It may be the case that you need to *explicitly* declare the task to be non-regenerating in order for the driver to use this smaller on-board buffer.

In LabVIEW there are also DAQmx Channel properties that can affect the way data gets transferred between the task buffer on your PC and the DAQ device (which is then subject to additional buffering from the on-board FIFO). Some relate to USB transfers, though I can't speak to them much as I haven't needed to use USB-connected devices in demanding apps that required such tweaks and fine-tuning.
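In nidaqmx-python, the task-level settings Kevin mentions live on `task.out_stream`. A sketch for reading back what the driver actually configured; the property names (`regen_mode`, `output_buf_size`, `output_onbrd_buf_size`) are assumed to match nidaqmx's OutStream and should be checked against your installed version. The stand-in object and its values are made up so the sketch runs without hardware.

```python
from types import SimpleNamespace


def describe_output_stream(task) -> dict:
    """Read the output-stream settings most relevant to regeneration and buffering.

    Properties that cannot be read are reported instead of raising, since a name may differ
    between nidaqmx versions or the driver may refuse the property for a given device.
    """
    info = {}
    for name in ("regen_mode", "output_buf_size", "output_onbrd_buf_size"):
        try:
            info[name] = getattr(task.out_stream, name)
        except Exception as exc:  # AttributeError here, or a driver error on real hardware
            info[name] = f"unavailable ({type(exc).__name__})"
    return info


# Hypothetical stand-in for a configured nidaqmx.Task; the values are invented.
fake_task = SimpleNamespace(out_stream=SimpleNamespace(regen_mode="DONT_ALLOW_REGENERATION",
                                                       output_buf_size=400))
print(describe_output_stream(fake_task))
```

On a real task this would be called after `cfg_samp_clk_timing` and after assigning `task.out_stream.regen_mode`, to confirm the buffer sizes the driver settled on.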

Try to set your task up explicitly to be non-regenerating and give the bufsize of 400 another go. If there are still issues, post the latest code. I don't know the details of the python DAQmx API, but I'll be able to follow most of it.

-Kevin P


glucenag · 04-23-2020

Hey there,

here's what happened: I had previously had the idea of setting the task to not regenerate the buffer. But I did not manage and also I did not really understand what that meant so I stopped trying. With your input now I've given it another spin and it turns out I was just reading the doc wrong: now I am able to set the task to not regenerate the buffer. I did so, even leaving the bufsize to 200, and the result is essentially what I was originally aiming for: no delay, no warnings. I'm still not 100% sure I understand what is going on exactly here with regeneration but I'm going to play around with this for a bit.

Interestingly now, if I double bufsize and set it to 400 (say, when the other value is 200), it takes double the time (so there's a delay of 100% compared to what I would consider normal, say). With 200 (matching the callback rate vs sample rate value), as said above, it works as I had naively expected at first: you get to 5 volts in 5 seconds exactly as per the following (latest) code:

```python
# Continuous write single channel
import numpy as np

import nidaqmx
from nidaqmx.stream_writers import (AnalogSingleChannelWriter)
from nidaqmx import constants

global bufsize
global bufsize_clk
global rate_outcfg
global rate_callback

bufsize = 200  # issues warnings as is; can be increased to stop warnings
bufsize_clk = 200
rate_outcfg = 1000  # chosen at random; constraint is to update output at 100Hz so whatever works would be fine here
rate_callback = 200  # basically rate_outcfg/100 as I would like to update output at 100Hz (note setting it to that does not work)

global counter_limit
counter_limit = 0  # 0 updates the value on every callback (every rate_callback samples, i.e. 5 Hz with the values above)

global data
data = np.empty((bufsize,))  # cannot be vertical for nidaqmx to work
data[:] = 0  # starting voltage in Volts

global stream

global counter
counter = 0

def my_callback(task_idx, every_n_samples_event_type, num_of_samples, callback_data):
    global counter
    global counter_limit

    if counter == counter_limit:  # e.g. counter_limit = 4 changes the voltage every 5th callback, about once per second here
        counter = 0
        data[:] = data[:] + 0.2
    else:
        counter = counter + 1

    stream.write_many_sample(data)  # alternatively: timeout=constants.WAIT_INFINITELY
    # Observed in the previous (regenerating) version:
    # Clearly it takes between 1 and 2 seconds to write the data, with bufsize = 1000
    # With bufsize = 100 it takes between 0.1 and 0.2 seconds
    # Basically it takes slightly over bufsize/outsamprate seconds, hence a bit more than doubling the time to reach N volts
    return 0

def setTask(t):
    t.ao_channels.add_ao_voltage_chan("cDAQ2Mod8/ao0")
    t.timing.cfg_samp_clk_timing(rate=rate_outcfg, sample_mode=nidaqmx.constants.AcquisitionType.CONTINUOUS,
                                   samps_per_chan=bufsize_clk)  # last arg is the buffer size for continuous output

task = nidaqmx.Task()
setTask(task)

task.out_stream.regen_mode = nidaqmx.constants.RegenerationMode.DONT_ALLOW_REGENERATION

stream = AnalogSingleChannelWriter(task.out_stream, auto_start=False)  # with auto_start=True it complains

# Call the my_callback function every time rate_callback samples are read by the device from the PC buffer
task.register_every_n_samples_transferred_from_buffer_event(rate_callback, my_callback)

stream.write_many_sample(data)  # first manual write to buffer, required otherwise it complains it can't start

task.start()
input('hey')  # task runs for as long as ENTER is not pressed
task.close()  # important otherwise when re-running the code it says specified device is reserved!

# NOTE somehow once ENTER is pressed it takes some seconds to actually stop if bufsize is very large, I don't know why
```
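One way to read the doubling: in this script, `bufsize` sets the length of `data` (how many samples each callback writes), while `bufsize_clk` sets the task buffer. Assuming that with regeneration disabled every written sample is generated exactly once and in order, each 0.2 V level lasts len(data) / rate seconds regardless of how often the callback fires. A sketch of that arithmetic (hypothetical helper name):

```python
def ramp_duration_s(target_v: float, step_v: float, samples_per_write: int, rate_hz: float) -> float:
    """Seconds until target_v is first output when each write holds one level for
    samples_per_write samples and every written sample is generated exactly once."""
    n_levels = round(target_v / step_v)
    return n_levels * samples_per_write / rate_hz


print(ramp_duration_s(5.0, 0.2, 200, 1000))  # 5.0: 200-sample writes match the 200-sample callback
print(ramp_duration_s(5.0, 0.2, 400, 1000))  # 10.0: 400-sample writes double each level's duration
```

Under that assumption, Kevin's "double bufsize while leaving the others at 200" would correspond to enlarging `bufsize_clk` (the task buffer) while keeping 200-sample writes, rather than enlarging `data`.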

Any more thoughts?
