I've attached a couple of slides from a presentation explaining how data transfer across the PCI bus works in the DAQmx driver. The example in the presentation discusses analog output, but the concepts apply equally to digital output. The notes are fairly detailed and will hopefully give you an idea of what's going on. To attach the document, I had to change the file extension (the forums don't allow PowerPoint attachments by default), so you'll have to change it back to .ppt before opening it.
In your case, it sounds like you can fix the problem by changing the Regeneration Mode write property from DAQmx_Val_AllowRegen to DAQmx_Val_DoNotAllowRegen, which you can do with the DAQmxSetWriteRegenMode function. This ensures that multiple copies of the host buffer are not downloaded to the board's FIFO and limits the worst-case latency to one buffer. Hopefully this helps.
Unfortunately, the regeneration mode is a configuration-time setting on the task, not a run-time setting, so it has to be set before the task starts rather than changed while the task is running.
The sample mode and the regen mode are not the same. The sample mode (finite or continuous) controls how many total samples are transferred to the device. The regen mode controls whether the same sample in the buffer can be transferred to the device more than once. Allowing regeneration is equivalent to saying write once, transfer many; disallowing regeneration is equivalent to saying write once, transfer once. For instance, with regeneration allowed, you can write one period of data to the buffer and set the sample mode to continuous to generate the waveform indefinitely without writing new data. With the sample mode set to finite, you can generate exactly N periods of the waveform, again without writing new data. You can accomplish the same things with regeneration disallowed, but instead of writing the waveform period once, you have to keep writing new data to the buffer fast enough to keep up with the generation, or you will receive a buffer underflow error.
Each approach has its trade-offs. Regeneration is convenient when you want to replay a pattern. However, as you've discovered, with regeneration allowed the driver tries to keep the onboard FIFO as full as possible to maximize throughput and tolerance to jitter in the system. This means the worst-case latency before new data appears at the output is (FIFO Size + Buffer Size) samples. If your buffer is significantly smaller than the FIFO, this latency may not be acceptable. On some devices, you may be able to set the data transfer request condition to FIFO empty. This shrinks the worst-case latency to Buffer Size samples, but it also effectively eliminates your FIFO, so your overall throughput will suffer because you have less tolerance to jitter. With this setup, you'll likely only see throughput rates around 10 kS/s. The only other option I can think of to decrease latency while using regeneration is to increase the buffer size, pad the data appropriately, and clock the data out faster. This is pretty inconvenient, but it could solve your problem.
With regeneration disallowed, your worst-case latency is simply the amount of data you've written to the buffer that has not yet been generated. This approach provides low latency, but your application has to work harder to keep up with the generation, since it must continually write new data to the buffer. Without knowing more about your application, this is the approach I would recommend trying first.
I posted the Advanced DAQ System Development presentation and the demo VI on our FTP site, ftp://ftp.ni.com/outgoing/, in the Advanced DAQmx - NI Week folder.
I'm curious: is the problem you mentioned discussed in one of the forum threads?
I hope the presentation gives you some ideas!