Inefficiency of memory allocation and access with a large number of buffers in NI-IMAQ

Dear all,

I am trying to create a program that has to acquire a large number of images (10,000-100,000) in a Ring acquisition, and then either save them to memory or save them to disk. Unfortunately I have run into severe inefficiency problems, which seem to stem from the way the IMAQ driver allocates and manages the image buffers.

I began by programming in VB.NET (2003) on a Windows XP system with 3GB of RAM, using the CWIMAQ ActiveX control (NI-IMAQ 3.7.1 and Vision 8.2.1). My frame grabber is the PCIe-1429 with a full 10 tap camera, working at ~600Mb/s.

The program flow is very simple: it is basically the shipped "Triggered Ring" sample code with some UI changes. I checked and found exactly the same behavior I am about to describe with the original "Triggered Ring" examples, so the problem I've run into seems to be a general one. Basically, all the program does is add a number of buffers to the cwimaq.images collection using the cwimaq.images.add(numberofbuffers) method. Later I acquire into those buffers using cwimaq.start and cwimaq.stop, and save the captured images to disk using cwimaq.saveimagetodisk.

I am working with a relatively large number of small images: between 10,000 and 100,000 images, of 1280x10 to 1280x1 (width x height, 8-bit depth) respectively. This is because I need to acquire at frame rates of 50,000-250,000 frames per second.
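As a rough check on the figures above, here is a minimal C sketch of the arithmetic. It assumes 1 byte per pixel (the 8-bit depth stated in the post); the function names bytes_per_second and ring_bytes are made up for illustration and are not part of NI-IMAQ.

```c
#include <assert.h>
#include <stdint.h>

/* Sustained data rate in bytes per second for 8-bit frames
   (1 byte per pixel, as stated in the post). */
uint64_t bytes_per_second(uint32_t width, uint32_t height, uint32_t frames_per_second)
{
    assert(width > 0 && height > 0);
    return (uint64_t)width * height * frames_per_second;
}

/* Memory in bytes needed to hold every buffer of a ring at once. */
uint64_t ring_bytes(uint32_t width, uint32_t height, uint32_t num_buffers)
{
    assert(width > 0 && height > 0);
    return (uint64_t)width * height * num_buffers;
}
```

If the 1280x10 frames go with the 50,000 fps end of the range and the 1280x1 frames with the 250,000 fps end (an assumption, the post does not pair them explicitly), this gives 640,000,000 and 320,000,000 bytes per second respectively, and 100,000 buffers of 1280x10 come to 1,280,000,000 bytes.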

The program seems to work just fine as long as the number of buffers is small (~100, or even up to 1000 buffers). The time required for images.add does not seem to depend heavily on the size of the buffers, i.e. on the total memory allocated, and the allocation is fast. However, when I raise the number of buffers to the numbers I need, the time required for images.add rises nonlinearly; it actually seems to slow down as the allocation progresses. This happens whether I issue a single add command with the total number of buffers as the argument, or loop from 1 to the number of buffers and add a single buffer each time. To give a feeling for the time scales: allocating 1000 1280x10 buffers takes ~1 second or less, while ten times that number takes more than 30 seconds. The situation worsens when the number is raised further, reaching an excruciating point where I have to wait several minutes for the images.add method to finish.

To check where the problem lies I did several simple experiments. First I compared the allocation times with the time it takes to allocate and populate a new ArrayList with the same number of buffers of the same size, i.e. I created a new ArrayList and then repeatedly used ArrayList.Add to add buffers. The allocation took a mere fraction of the time the images.add method takes: perhaps 2 seconds to allocate 1.3 gigabytes of 1280x10-byte buffers.
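A comparable baseline can be measured in C. The sketch below is not the VB.NET ArrayList test described above; it swaps in plain malloc and the standard clock() function, and the name time_allocation is hypothetical. It only shows how one might time the allocation of many small buffers outside the driver.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Allocates `count` buffers of `size` bytes, writes to each one, frees them,
   and returns the time reported by clock() for the allocation loop in
   seconds, or -1.0 if any allocation fails. */
double time_allocation(size_t count, size_t size)
{
    assert(count > 0 && size > 0);

    unsigned char **bufs = malloc(count * sizeof *bufs);
    if (bufs == NULL)
        return -1.0;

    clock_t start = clock();
    size_t done = 0;
    for (; done < count; done++) {
        bufs[done] = malloc(size);
        if (bufs[done] == NULL)
            break;
        memset(bufs[done], 0, size);  /* touch the memory so it is really committed */
    }
    clock_t stop = clock();

    for (size_t i = 0; i < done; i++)
        free(bufs[i]);
    free(bufs);

    return (done == count) ? (double)(stop - start) / CLOCKS_PER_SEC : -1.0;
}
```

Calling time_allocation(1000, 1280 * 10) and then time_allocation(10000, 1280 * 10) and comparing the two results gives a reference for how allocation time scales with the buffer count when no driver is involved.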

I then did another experiment and switched to the VC high-level NI-IMAQ dll interface. I used the shipped-in hlring sample code. I ran this code in two ways-
1. The way the sample program is originally written:

for(i=0; i<NUM_RING_BUFFERS; i++)
        ImaqBuffers[i] = NULL;
    // Setup and launch the ring acquisition

    errChk(imgRingSetup (Sid, NUM_RING_BUFFERS, (void**)ImaqBuffers, 0, TRUE));

which lets the driver do the allocation on its own.

2. Using malloc to allocate the buffers myself:

for(i=0; i<NUM_RING_BUFFERS; i++)
        ImaqBuffers[i] = malloc(AcqWinWidth*AcqWinHeight);
    // Setup and launch the ring acquisition

    errChk(imgRingSetup (Sid, NUM_RING_BUFFERS, (void**)ImaqBuffers, 0, TRUE));


Way number 1 took the same extremely long times that the ActiveX control took in the VB.NET program. In way number 2 the allocation itself was very quick (about the same as ArrayList.Add, actually) and the imgRingSetup call took an extra ~15 seconds for a list of 65,000 buffers (1280x10 bytes each).
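The malloc loop in way 2 does not check for failed allocations. A slightly more defensive version might look like the sketch below. The helper names alloc_buffer_list and free_buffer_list are hypothetical, and the call to imgRingSetup is left out so that the sketch compiles without the NI headers.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees the first `count` buffers of the list, then the list itself. */
void free_buffer_list(void **list, size_t count)
{
    if (list == NULL)
        return;
    for (size_t i = 0; i < count; i++)
        free(list[i]);
    free(list);
}

/* Returns a list of `count` buffers of `size` bytes each,
   or NULL if any allocation fails (nothing is leaked in that case). */
void **alloc_buffer_list(size_t count, size_t size)
{
    assert(count > 0 && size > 0);

    void **list = calloc(count, sizeof *list);
    if (list == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        list[i] = malloc(size);
        if (list[i] == NULL) {
            free_buffer_list(list, i);  /* release what was allocated so far */
            return NULL;
        }
    }
    return list;
}
```

The returned list would take the place of ImaqBuffers in the imgRingSetup call shown in way 2, and would presumably have to stay allocated until the acquisition has been stopped.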

I gather that the 15 seconds is the overhead the driver needs to set up the ring's control structure (or however it's implemented), and the several extra minutes way number 1 takes are due to some inefficiency in the allocation/setup mechanism.


I would very much like to be able to work with the ActiveX control in VB.NET: we have all the code written, and VB.NET is the most common environment in my lab, which means that my code can help others in their applications. Is there some way to overcome this problem?
Furthermore, I ran into another problem when using the hlring sample. It seems that somewhere the maximum number of buffers in the bufferlist is defined as a uint16 or something similar, which means that I can't set up a ring with more than ~65000 buffers in it. If I try to do so I receive error code -1074396883. Is there any way to overcome this problem?

Another problem I have with the VB.NET ActiveX control is that I can't seem to find an efficient way to save the data once it has been acquired into memory. I tried to use image.ImageToArray, but it seems to be rather inefficient compared to memcpy when the images collection has a large number of images in it. Is there some way to access the buffer itself directly, so that I can memcpy it in some way?

Thanks for your assistance,
Oded Ben-David
Message 1 of 5
Hi Oded Ben-David,

You are right that the time won't increase linearly when you are allocating multiple buffers.  That is partially due to how IMAQ handles buffers and how they are created in Windows.  There is some additional error handling that causes each additional buffer to take longer to allocate than the previous one.  Is this a problem?  You should only have to allocate memory once, and once the program is finished all of the memory should be deallocated.

Also, for saving the images, have you tried using the write functions?  IMAQ Image to Array isn't actually saving the image; it just changes the place in memory where the image is stored and the way it is stored.  Using memcpy isn't really saving the image either, as far as I know; it just puts a copy of the image in memory.  If you want these images to be saved you'll need to use IMAQ Write JPEG or one of the other IMAQ Write functions.

There was a lot of information in your message, so I hope I understood your question correctly.  If not let me know.

GG

Message 2 of 5
Dear GG,

Thank you for your reply. I am aware that "saving" the captured images into a different part of the RAM (using memcpy or something similar) is not really saving them, and of course I save them to disk later (I should have mentioned that in my original post). However, for my application I need to take several short "movies" in every run, each one capturing a rapid event (which takes about a tenth of a second, hence the need for high frame rates). The events I am capturing are separated in time by a second or two, so I need an efficient way to capture a quick movie, put it somewhere in the RAM for the duration of that run, and get ready to capture the next movie (the next event).

My idea was to set up the ring of buffers, and then copy the captured images to some other place (in the RAM, or on disk if that weren't so time consuming). I could then use the same ring of buffers, already allocated and set up, to capture the next set of images. This will probably work if I use the high-level IMAQ dll API and allocate the buffers myself (except for the problem with the limit on the maximum number of buffers). I would much rather use the ActiveX controls, however, as they make everything much easier and quicker to program, and make it easier for other people in my group to use and maintain the code.

The problem with the long allocation time is twofold:
1. With the number of buffers my application requires, the allocation time (using the ActiveX control, or letting the driver do the allocation in the dll API) is several minutes (about five minutes, perhaps a little more). This is extremely inconvenient, as I need to vary the number of buffers, the buffer size etc. between runs, and a wait of several minutes will be a serious problem.
2. Notwithstanding problem 1, I would still not be able to copy the captured images to a different part of the RAM rapidly enough, assuming I want to use the ActiveX control.

I must admit that I do not understand why the error handling keeps the allocation from being an O(n) operation, as I would have expected it to be. The same goes for the ImageToArray conversion. I must be missing something pretty basic here.

I hope my problems are clearer now,
Thanks again for the assistance,
Oded



Message 3 of 5
Hi,

I see why this could be a problem now.  I think the best way to copy the images to another place in memory is the memcpy call that you were using.  As for allocating memory for the buffers, the reason it takes longer with each additional image is that the driver has to do error checking against all the previous buffers, and finding a place in memory for the buffers also becomes more difficult when you are allocating a large number of them.
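To illustrate why per-buffer work that grows with the number of existing buffers is not O(n) overall, here is a toy cost model in C. It is only a sketch of the explanation given above, not a description of the driver's actual internals, and total_checks is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: adding buffer number k costs k units of work because all
   buffers present so far are revisited. Returns the total for n buffers. */
uint64_t total_checks(uint64_t n)
{
    assert(n <= 0xFFFFFFFFu);  /* keeps the sum within 64 bits */

    uint64_t total = 0;
    for (uint64_t k = 1; k <= n; k++)
        total += k;            /* work for the k-th add */
    return total;              /* equals n * (n + 1) / 2 */
}
```

Under this model 1,000 buffers cost 500,500 units and 10,000 buffers cost 50,005,000 units, so ten times the buffers means roughly a hundred times the work. That is the same kind of faster-than-linear growth as the timings reported earlier in the thread (about 1 second for 1,000 buffers, more than 30 seconds for 10,000), although those measurements alone do not pin down the exact growth law.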

But it seems the real problem is going to be the inability to save to disk at fast enough speeds.  You may want to consider using a RAID array to write multiple images to disk simultaneously.  If you are writing to disk fast enough, you shouldn't have to vary your buffer size between runs; just set a number of buffers that allows you to write your images to disk (or do image processing) fast enough that you aren't losing images. I hope this helps.

Have a great day,
GG
Message 4 of 5
Hi,

I found a solution which seems to work fine. I am posting it here in case anyone runs into the same problem.
The trick I used is as follows:
I allocate two large buffers, each big enough to accommodate half of the number of buffers I wish to use in the ring's bufferlist. I don't need the ring to be very long; in my case about 1000-2000 image buffers is enough.
Then I create a list of pointers, i.e. **bufferlist, and set the pointers to point to addresses within the two large buffers I've allocated (the first half of the ring points into one buffer, and the second half into the other). As far as the driver is concerned, I've just given it a list of pre-allocated image-sized buffers.
This allocation scheme then allows me to use a fast multimedia timer to monitor the acquisition into the ring. Whenever the timer's callback function detects that half of the ring has been acquired, it uses memcpy to copy that entire half of the ring IN ONE memcpy call into another pre-allocated buffer. This does the trick. Without the overhead of having to memcpy each buffer separately, the copying takes next to no time compared to the time it takes to acquire half of the ring. This, in fact, is nothing but double-buffering on the acquisition ring.

Have fun,
o.

Here is a code snippet of the allocation, and then of the copying:

// Allocation: allocate the buffers myself, in two big chunks.
// ShouldBufferSize is the size of each image in bytes.
    BigChunk1 = (unsigned char *)malloc(ShouldBufferSize * NUM_RING_BUFFERS / 2);
    BigChunk2 = (unsigned char *)malloc(ShouldBufferSize * NUM_RING_BUFFERS / 2);
    // Create the logical bufferlist that is passed to the driver.
    ImaqBuffers = (unsigned char **)malloc(sizeof(*ImaqBuffers) * NUM_RING_BUFFERS);
    // The first half of the ring points into BigChunk1...
    for (i = 0; i < NUM_RING_BUFFERS / 2; i++)
        ImaqBuffers[i] = &(BigChunk1[i * ShouldBufferSize]);
    // ...and the second half points into BigChunk2.
    for (i = NUM_RING_BUFFERS / 2; i < NUM_RING_BUFFERS; i++)
        ImaqBuffers[i] = &(BigChunk2[(i - NUM_RING_BUFFERS / 2) * ShouldBufferSize]);

// Double buffering (inside the OnTimer callback):
    if (currBufNum > NUM_RING_BUFFERS / 2 && first)
    {
        // Done with the first half: copy it into the pre-allocated ring of big buffers, CopiedBufs.
        memcpy(CopiedBufs[CopiedIndex], BigChunk1, ShouldBufferSize * NUM_RING_BUFFERS / 2);
        first = false;
        CopiedIndex++;
        if (CopiedIndex == NUM_COPIED_BUFS)
            CopiedIndex = 0;
    }
    else if (currBufNum < NUM_RING_BUFFERS / 2 && !first)
    {
        // Done with the second half.
        memcpy(CopiedBufs[CopiedIndex], BigChunk2, ShouldBufferSize * NUM_RING_BUFFERS / 2);
        first = true;
        CopiedIndex++;
        if (CopiedIndex == NUM_COPIED_BUFS)
            CopiedIndex = 0;
    }
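
For anyone who wants to try the scheme without the hardware, the pointer layout and the one-call copy can be sketched as a small self-contained C unit. This is an illustrative rewrite under stated assumptions, not the code used in the post: the HalfRing type and the half_ring_* functions are hypothetical names, the ring size is assumed to be even, and no NI-IMAQ calls are made.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *chunk[2];   /* two contiguous halves of the ring */
    unsigned char **buffers;   /* per-image pointers, as handed to the driver */
    size_t image_size;         /* bytes per image */
    size_t num_buffers;        /* total ring length, assumed even */
} HalfRing;

/* Allocates both halves and the pointer list. Returns 0 on success, -1 on failure. */
int half_ring_init(HalfRing *r, size_t num_buffers, size_t image_size)
{
    assert(num_buffers >= 2 && num_buffers % 2 == 0 && image_size > 0);

    size_t half = num_buffers / 2;
    r->image_size = image_size;
    r->num_buffers = num_buffers;
    r->chunk[0] = malloc(half * image_size);
    r->chunk[1] = malloc(half * image_size);
    r->buffers = malloc(num_buffers * sizeof *r->buffers);
    if (r->chunk[0] == NULL || r->chunk[1] == NULL || r->buffers == NULL) {
        free(r->chunk[0]);     /* free(NULL) is a no-op */
        free(r->chunk[1]);
        free(r->buffers);
        return -1;
    }
    /* Buffer i lives in chunk i / half, at slot i % half. */
    for (size_t i = 0; i < num_buffers; i++)
        r->buffers[i] = r->chunk[i / half] + (i % half) * image_size;
    return 0;
}

/* Copies one whole half (0 or 1) of the ring into dst with a single memcpy.
   dst must have room for (num_buffers / 2) * image_size bytes. */
void half_ring_copy(const HalfRing *r, int which, unsigned char *dst)
{
    assert(which == 0 || which == 1);
    memcpy(dst, r->chunk[which], (r->num_buffers / 2) * r->image_size);
}

void half_ring_free(HalfRing *r)
{
    free(r->chunk[0]);
    free(r->chunk[1]);
    free(r->buffers);
}
```

In a real acquisition, r.buffers would be the list passed to the driver, and half_ring_copy would be called from the timer callback once the acquisition has moved on to the other half.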


Message 5 of 5