Why is my fetch speed slow on my 6542 DIO?

bhetke · 04-07-2008

Hello,

My application is intended to acquire image data from a digital image sensor for display. The sensor outputs streaming data at 30 frames per second. It also outputs a 12 MHz clock, which I use to sample 8 parallel data lines. One frame contains 400K samples. However, the actual image data is contained in 316,224 bytes, which I acquire with a multi-record acquisition: 488 records of 648 samples each. I can acquire the data into onboard memory without a problem. Since the data streams continuously from the sensor, I have a while loop that contains niHSDIO initiate.vi and niHSDIO Fetch Multi Record (2D U32).vi. I fetch all 488 records, which contain all of the image data.

I expect the fetch to run at roughly the rate of the PXI bus, which should be close to 100 MB/s. That means I should be able to fetch my 316,224 bytes in:

0.316224 MB × 1 s / 100 MB ≈ 3.162 ms.

My problem is that the fetch takes much longer than this. I modified my code to acquire just one frame and then loop on the fetch command. I am measuring a fetch rate of about 25 fetches per second, so each fetch takes approximately 40 ms.

When the fetch takes this long, I miss the start trigger for the next acquisition, which drops my frame rate to 15 frames per second because I am actually acquiring only every other frame.
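A quick sanity check of the numbers above, as a small Python sketch. It assumes 1 byte per sample (8 data lines), 488 × 648 = 316,224 bytes per frame, and the nominal 100 MB/s PXI figure from the post. It turns the measured 25 fetches per second into an effective throughput of roughly 8 MB/s, more than ten times below the expected bus rate.

```python
# Expected vs. measured fetch time for one frame (figures from the post).
RECORDS = 488
SAMPLES_PER_RECORD = 648
BYTES_PER_SAMPLE = 1            # 8 parallel data lines -> 1 byte per sample
PXI_BANDWIDTH = 100e6           # assumed ~100 MB/s sustained PXI throughput
MEASURED_FETCHES_PER_S = 25     # observed rate of the fetch loop

frame_bytes = RECORDS * SAMPLES_PER_RECORD * BYTES_PER_SAMPLE
expected_s = frame_bytes / PXI_BANDWIDTH
measured_s = 1 / MEASURED_FETCHES_PER_S
effective_bw = frame_bytes / measured_s

print(f"frame size:          {frame_bytes} bytes")
print(f"expected fetch time: {expected_s * 1e3:.2f} ms")
print(f"measured fetch time: {measured_s * 1e3:.1f} ms")
print(f"effective bandwidth: {effective_bw / 1e6:.1f} MB/s")
```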

Appreciate any help!

DJ_L. · 04-08-2008

Hey bhetke,

When you call the Fetch Multi Record VI, multiple data transfers occur. The current implementation of the HSDIO driver uses a separate DMA transfer for each record. Each DMA transfer carries some setup overhead, which can significantly slow down the overall transfer, especially with a large number of small records. With many samples per record and relatively few records, this overhead is not an issue and performance is much better. NI is aware of this issue (for tracking purposes: CAR ID 37929), and improvements for this particular case are in progress and will be released in a future version of the driver.

So a workaround to improve your data transfer speed is to reduce the number of records that you have to transfer, which in turn reduces the number of DMA transfers. This might also mean that you will want to increase the number of samples per record. This may or may not be possible for you, depending on what you are trying to do as far as triggering your samples goes, which I will discuss next.
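One way to see why per-record DMA overhead dominates here is a simple linear cost model, sketched below in Python. The model (a fixed setup cost per record plus bytes divided by bus bandwidth) and the name `fetch_time` are hypothetical, not the driver's documented behavior; the per-record overhead is simply backed out of the ~40 ms measurement in the original post. Under those assumptions, each record costs roughly 75 µs of overhead, and packing the same frame into a single record brings the fetch close to the ~3.2 ms bus-limited estimate.

```python
BUS_BPS = 100e6            # nominal PXI bandwidth assumed in the original post
BYTES_PER_RECORD = 648     # one image line, 1 byte per sample
RECORDS = 488              # lines per frame
MEASURED_S = 0.040         # ~25 fetches per second

def fetch_time(n_records: int, bytes_per_record: int, overhead_s: float,
               bandwidth: float = BUS_BPS) -> float:
    """Hypothetical model: fixed per-record DMA overhead plus bytes / bandwidth."""
    return n_records * (overhead_s + bytes_per_record / bandwidth)

total_bytes = RECORDS * BYTES_PER_RECORD
# Per-record overhead that would explain the 40 ms measurement under this model.
overhead = (MEASURED_S - total_bytes / BUS_BPS) / RECORDS

print(f"implied overhead: {overhead * 1e6:.0f} us per record")
print(f"{RECORDS} records: {fetch_time(RECORDS, BYTES_PER_RECORD, overhead) * 1e3:.1f} ms")
print(f"1 record:    {fetch_time(1, total_bytes, overhead) * 1e3:.1f} ms")
```

The model is deliberately crude; it only illustrates why reducing the record count, rather than the byte count, is what matters here.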

So it sounds as if you are acquiring a 640x480 video signal with some blanking pixels, which account for the extra 8 pixels. I am assuming that you acquire each line of the image (648 pixels) into its own record, and then take multiple records (488 lines per frame) to get one image as a 2D array. To reduce the number of records, you can put all of the image data into a single record by using other trigger types. For each line of your image, you can set up a Pause Trigger (see the Dynamic Acquisition HW Pause Trigger example program) so that acquisition pauses when Hsync (Line Valid) goes low (or high, depending on your setup). If you have a data enable signal, you could use that instead, which would also exclude your blanking pixels. You can then use the Advance Trigger on your Vsync (Frame Valid) signal, which tells the board that all the lines of the frame have been acquired.

Now all the image data is in onboard memory as one record instead of many. Since each image is one record, the number of fetches drops, which reduces the number of DMA transfers and speeds up your data transfer. The data will also be returned as a 1D array instead of a 2D array, so you may need to take that into account in your image processing code. Replace Fetch Multi Record with Fetch Single Record, which you can place inside a loop to get multiple images. (One thing you don't want to do is place HSDIO Initiate.VI inside the loop.)
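The Pause Trigger approach can be pictured as gating samples on Line Valid and then treating the single 1D record as consecutive lines. The Python sketch below simulates that in software only to show the data layout; in the real setup the gating happens in hardware, and `gate_on_line_valid` and `reshape_rows` are hypothetical helper names. For the frame discussed here, `width` would be 648 and the record should yield 488 rows.

```python
from typing import List, Sequence

def gate_on_line_valid(samples: Sequence[int], line_valid: Sequence[int]) -> List[int]:
    """Keep only samples taken while Line Valid is high (software stand-in
    for what a Pause Trigger on Hsync does in hardware)."""
    return [s for s, lv in zip(samples, line_valid) if lv]

def reshape_rows(record: Sequence[int], width: int) -> List[List[int]]:
    """Split a 1D record into image rows of `width` pixels each."""
    if width <= 0 or len(record) % width:
        raise ValueError("record length is not a whole number of lines")
    return [list(record[i:i + width]) for i in range(0, len(record), width)]

# Toy frame: 2 lines of 4 active pixels, each followed by 2 blanking samples.
samples = list(range(12))
line_valid = [1, 1, 1, 1, 0, 0] * 2
record = gate_on_line_valid(samples, line_valid)
image = reshape_rows(record, width=4)
print(image)
```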

Is there any reason you cannot use the above-mentioned method of acquiring your images into one record instead of multiple records with the 6542? Also, let me know if I have made any incorrect assumptions. I hope this makes sense. Please let me know if you need any clarification or further help on this issue. Thanks, and have a great day.

Regards,

DJ L.

bhetke · 04-08-2008

Hi DJ,

Thanks for the information and advice. Your assumptions about my application are correct. I will try to set up my algorithm the way you suggest. One thing I don't quite understand is why you say not to put HSDIO Initiate.VI inside a loop. My goal is to stream the images to PC memory so that they can be displayed in real time for as long as the application is running. If I don't initiate a new acquisition for each frame (or every 2, 3, 4, 5... frames), the onboard memory will be full after I acquire about 25 frames. (My card has 8 MB per channel.) I need to acquire continuously. Is there a way to keep acquiring data indefinitely other than looping? Perhaps using a software trigger that I never send, so that I constantly fetch pretrigger samples, would work, but can I also trigger from the frame sync and row sync in this mode?

After some more experimenting since yesterday, I also realized that the fetch speed was not the only thing slowing me down. Because I currently loop on HSDIO Initiate.VI to capture each frame (see attached code), the DIO card must be ready for a start trigger in time for the next Vsync rising edge. The problem is that there are only about 400 µs between the end of one frame acquisition and the rising edge of Vsync for the next frame. Therefore I always miss the next frame, even if I remove the fetch completely. I was thinking of getting around this by acquiring 20 frames or so and then fetching in a parallel loop. This way I would lose only one frame in every 20 instead of one after every frame.
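The frame-loss arithmetic behind the "20 frames per acquisition" idea, as a minimal Python sketch. It assumes exactly one frame is missed each time the acquisition is re-initiated. With one frame per Initiate it reproduces the 15 fps observed in the original post, and with 20 frames per Initiate it gives roughly 28.6 fps.

```python
def effective_fps(sensor_fps: float, frames_per_initiate: int) -> float:
    """Captured frame rate if one frame is lost every time the acquisition
    is re-initiated (an assumption based on the behavior described above)."""
    n = frames_per_initiate
    return sensor_fps * n / (n + 1)

for n in (1, 5, 20):
    print(f"{n:>2} frames per Initiate -> {effective_fps(30, n):.1f} fps")
```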

Thanks for any additional info you can give me!

Brian

Message Edited by bhetke on 04-08-2008 04:55 PM

bhetke · 04-08-2008

Hi DJ,

I actually ended up doing what I described in my previous post, and it worked well. I am using a software trigger that never fires, so the board continuously acquires pretrigger samples. I acquire exactly the number of clocks that occur between Vsync rising edges, which gives me close to 400K samples. I set the start trigger to a Vsync rising edge, and I also set the advance trigger to a Vsync rising edge. I fetch in a loop using a single-record fetch. Then I use some array math and reshaping to get the data into the right format and delete the blanking pixels I don't want. I am now streaming at 30 fps! 🙂
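The final approach produces one record per Vsync period: at a 12 MHz pixel clock and 30 fps that is 12,000,000 / 30 = 400,000 samples, matching the "close to 400K samples" above. The post does not show the array math used to strip the blanking, so the Python sketch below is only one possible way to crop the active image out of such a record. The line period and the horizontal/vertical offsets are hypothetical parameters that would have to come from the sensor's timing, and `extract_frame` is an illustrative name.

```python
from typing import List, Sequence

CLOCK_HZ = 12_000_000
FRAME_RATE = 30
SAMPLES_PER_VSYNC = CLOCK_HZ // FRAME_RATE   # samples in one Vsync-to-Vsync record

def extract_frame(record: Sequence[int], line_period: int, h_start: int,
                  width: int, v_start: int, height: int) -> List[List[int]]:
    """Crop the active image from one Vsync-to-Vsync record.

    line_period: samples per line including horizontal blanking (hypothetical)
    h_start, width: first active sample within a line and active pixel count
    v_start, height: first active line and number of active lines
    """
    if h_start + width > line_period:
        raise ValueError("active region does not fit inside one line period")
    frame = []
    for row in range(v_start, v_start + height):
        start = row * line_period + h_start
        line = list(record[start:start + width])
        if len(line) != width:
            raise ValueError("record too short for the requested layout")
        frame.append(line)
    return frame

# Toy record: 3 lines of 10 samples; keep 5 pixels from lines 1 and 2.
toy = list(range(30))
print(extract_frame(toy, line_period=10, h_start=2, width=5, v_start=1, height=2))
```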

Thanks for your help!

Brian

Sci-Vi · 03-13-2016

Having recently run into the same blocking issue with a PXIe-6548 HSDIO board, I want to update this thread with new information.

After remaining open for more than 5 years with no corrective action, the old CAR #37929 mentioned above by DJ_L was closed by NI as "rejected" in 2013, with only a mention of the workaround stated above: reduce the record rate by acquiring fewer records of larger size.

This is certainly one possible workaround, but it can prove very impractical, if not totally impossible, in some circumstances. In our case, we have to acquire small asynchronous bursts of serial communication, each only 12 to 16 bits long but at a high clock frequency (100 to 200 MHz). Either we design the task so that it creates one record per burst, but then we hit the limit of only a few thousand records transferred to the PC per second, at least 5 times too low for our test. Or we acquire continuously between the bursts, but this swamps the system with a huge stream of useless data, and brute force is not the solution I would expect from this class of advanced devices.

We have probably found a workaround: a continuous acquisition controlled by a Pause Trigger. But this makes the software more complicated (the individual bursts of data must be sorted out of the continuous flow) and, more problematic I think, requires generating an additional signal and routing it to a PFI line through external, physical wiring. So we have to explain to the customer that the test PCB he just designed must be modified, and that this modification is required because the $12,000 HSDIO device cannot transfer 25,000 bursts of 12 bits each per second to the PC. That's not even 50 kB/s... What does HS in HSDIO stand for, again? We will surely have trouble the next time we advise him to invest in top-end PXI instruments...

It is a pity that a board in this price range and performance class is limited to transferring only 3000 to 4000 records of a few samples each per second because of its DMA transfer strategy. Note that on NI-Scope devices, a multi-record fetch is clearly much more efficient, so it is probably optimized to use a single DMA transfer for multiple records. I have no insight into the technical details, though; maybe nothing better can be done for hardware reasons?

Anyway, a new CAR (#577552) was filed a few days ago for the same issue; let's hope NI can propose a better fix this time. If not, it would be great to at least update the device specifications to clearly state this rather low record transfer rate limit, so that people designing test systems are aware of it before spending $12,000 on the device, which cannot even be simulated.
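With a Pause Trigger, the bursts arrive back to back in one continuous record, so the software has to cut them apart again. The Python sketch below assumes, hypothetically, that every burst has the same known length and is transmitted MSB first; variable-length bursts (the 12 to 16 bits mentioned above) would need an extra length or framing signal, which this sketch does not handle. `split_bursts` and `to_word` are illustrative names.

```python
from typing import List, Sequence

def split_bursts(bits: Sequence[int], burst_len: int) -> List[List[int]]:
    """Cut a concatenated bit stream into fixed-length bursts (hypothetical helper)."""
    if burst_len <= 0 or len(bits) % burst_len:
        raise ValueError("stream is not a whole number of bursts")
    return [list(bits[i:i + burst_len]) for i in range(0, len(bits), burst_len)]

def to_word(burst: Sequence[int]) -> int:
    """Pack one burst into an integer, assuming MSB-first transmission."""
    value = 0
    for bit in burst:
        value = (value << 1) | (bit & 1)
    return value

# Toy stream: two 12-bit bursts recorded back to back under a Pause Trigger.
stream = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
          0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
words = [to_word(b) for b in split_bursts(stream, 12)]
print([hex(w) for w in words])
```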
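The mismatch described above, worked through as a short Python calculation using only the numbers from this post. At 25,000 bursts of 12 bits per second, the payload is just 37.5 kB/s. A rate of 3,000 to 4,000 records per second corresponds to roughly 250 to 333 µs per record, leaving the record rate about 6 to 8 times too low.

```python
BURSTS_PER_S = 25_000     # required burst rate from the post
BITS_PER_BURST = 12

payload_bytes_per_s = BURSTS_PER_S * BITS_PER_BURST / 8
print(f"payload: {payload_bytes_per_s / 1e3:.1f} kB/s")

# Observed record-transfer limits quoted in the post.
for records_per_s in (3_000, 4_000):
    per_record_us = 1e6 / records_per_s
    shortfall = BURSTS_PER_S / records_per_s
    print(f"{records_per_s} records/s: ~{per_record_us:.0f} us per record, "
          f"{shortfall:.2f}x too slow")
```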

Vincent

marco_inzunza · 02-05-2019

Hello Vincent,

Did you ever find a more efficient way to transfer the data via Multiple Records Fetch? I am using a PXIe 5162 with a 1071 chassis and an 8381 controller. I am transferring (5000 waveforms × 100 points + 448) × 2 bytes ≈ 1 MB of data in 0.55 seconds, with a trigger repetition rate of 20 kHz (50 µs). The total acquisition time is 5000 waveforms / 20,000 Hz = 0.25 seconds.

Using the niScope Stream To Memory Maximum Transfer Rate Single Channel.vi, I measured a maximum transfer speed of about 700 MB/s. I am bottlenecking somewhere, and I do think it has to do with the PXIe bus and the transfer speed of the computer.

The computer runs Windows 7 with 8 GB RAM and a 2.0 GHz Intel Xeon E5 processor, and LabVIEW 2017.
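Marco's figures, worked through in Python using only the numbers given in the post. The ~1 MB transfer in 0.55 s corresponds to an effective rate of about 1.8 MB/s, or about 110 µs per waveform, far below the ~700 MB/s streaming benchmark, at which the same data would move in under 2 ms. The 0.55 s is treated here as the transfer time quoted in the post.

```python
WAVEFORMS = 5_000
POINTS_PER_WAVEFORM = 100
EXTRA_SAMPLES = 448        # as stated in the post
BYTES_PER_SAMPLE = 2
TRIGGER_RATE_HZ = 20_000
TRANSFER_TIME_S = 0.55     # transfer time quoted in the post
BENCHMARK_BPS = 700e6      # ~700 MB/s from the streaming example VI

total_bytes = (WAVEFORMS * POINTS_PER_WAVEFORM + EXTRA_SAMPLES) * BYTES_PER_SAMPLE
acquisition_s = WAVEFORMS / TRIGGER_RATE_HZ
effective_bps = total_bytes / TRANSFER_TIME_S
per_waveform_us = TRANSFER_TIME_S / WAVEFORMS * 1e6
benchmark_s = total_bytes / BENCHMARK_BPS

print(f"data: {total_bytes} bytes, acquired in {acquisition_s} s")
print(f"effective rate: {effective_bps / 1e6:.2f} MB/s "
      f"({per_waveform_us:.0f} us per waveform)")
print(f"at the benchmark rate this would take {benchmark_s * 1e3:.2f} ms")
```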

PahlM · 02-06-2019

Hello Marco,

At this time, there is no new information on the CAR that Vincent mentioned.

What type of trigger are you implementing? Edge, Digital, Immediate, Hysteresis, Software?

My understanding is that the 8381 is a Remote Control Module, so which cable are you using to connect the chassis to the 8381 controller?

Also, for your setup, is it just the PXIe 5162 in slot 2 and the 8381 in the chassis and nothing else? Or are there other cards present?

marco_inzunza · 02-06-2019

Hello PahlM,

Thank you for your reply.

The trigger is a rising edge, using a sync output from a function generator with a repetition rate of 20 kHz. I have checked this two different ways: the scope triggers exactly and acquires in real time (5000 waveforms) without missing any triggers.

Yes, the 8381 controller is connected with an MXI cable (3 meters) that was included with the 8381.

For my setup, the chassis is the 1071, the 8381 controller is in slot 1, and the PXIe-5162 is in slot 4. There are no other cards in the chassis.

At this point, I am leaning towards a data transfer issue over the PC bus, or perhaps the fact that Windows is not a real-time operating system.
