I've been working on a LabVIEW 8.2 app on Windows NT that performs high-speed streaming to disk of data acquired by PXI modules. I'm running a PXI-8186 controller with 1 GB of RAM and a Seagate 5400.2 120 GB HD. My current implementation creates a separate DAQmx task for each DAQ module in the 8-slot chassis. I initially tried to provide semaphore-protected Write to Binary File access to a single log file to record the data from each module, but I ran into problems once I reached the upper sampling rate of my 6120s, which is 1 MS/s, 16-bit, 4 channels per board. At the higher sampling rates, I was not able to "start off" the file streaming without the DAQmx input buffers reaching their limit. I think this might have to do with the larger initial memory allocations that are required: I have the distinct impression that an initial request for a number of large memory blocks causes a long startup delay, which doesn't work well with a real-time streaming app.
In an effort to improve performance, I tried replacing my reentrant file-writing VI with a reentrant VI that flattens each module's data record to a string and adds it to a named queue. In a parallel loop on the main VI, I dequeue the elements and write the flattened strings to the binary file. This producer/consumer approach gives me better throughput than doing the semaphore-controlled write from each module's data acquisition task, which makes sense: each task can get back to acquiring data more quickly.
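Since I can't paste a block diagram here, this is roughly the structure I've built, sketched in Python (all names are mine, for illustration only; the real code is LabVIEW VIs with a named queue, not threads):

```python
import os
import queue
import threading

# Sketch of the producer/consumer structure: each DAQ task thread flattens
# its data block and enqueues it; a single writer loop drains the queue,
# so disk writes happen in exactly one place.
data_queue = queue.Queue()   # analogous to the LabVIEW named queue
SENTINEL = None              # tells the writer loop to stop

def daq_task(board_id, num_blocks):
    """Stands in for one module's acquisition loop."""
    for _ in range(num_blocks):
        block = bytes(8)     # placeholder for a flattened data record
        data_queue.put((board_id, block))
        # the task returns to acquiring immediately; no disk wait here

def writer_loop(outfile):
    """Single consumer: the only place disk writes occur."""
    while True:
        item = data_queue.get()
        if item is SENTINEL:
            break
        board_id, block = item
        outfile.write(block)

with open("stream.bin", "wb") as f:
    writer = threading.Thread(target=writer_loop, args=(f,))
    writer.start()
    producers = [threading.Thread(target=daq_task, args=(b, 3)) for b in range(3)]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    data_queue.put(SENTINEL)   # all producers done; let the writer exit
    writer.join()

written = os.path.getsize("stream.bin")   # 3 boards x 3 blocks x 8 bytes
```

The key point is that the producers never block on the disk, which is why the acquisition loops keep up better than with the semaphore scheme.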
I am able to achieve a streaming rate of about 25 MB/s, running three 6120s at 1 MS/s and two 4472s at 1 kS/s. I have the program set up so that I can run multiple data collections in sequence, i.e. acquire for 5 minutes, stop, restart, acquire for 5 minutes, etc. This keeps each file to a reasonable size. When I run in this mode, I can perform a couple of runs, but at some point the memory shown in Task Manager starts running away. I have monitored the memory use of my VIs in the profiler and do not see any of them increasing their memory requirements. What I do see is that the number of elements in the queue starts creeping up, which is probably what eventually causes the failure.
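As a sanity check, that ~25 MB/s figure lines up with the raw data rate of the boards (the arithmetic below is mine; the 4472 contribution at 1 kS/s is negligible, a few tens of KB/s at most):

```python
# Aggregate raw data rate: the three 6120s dominate.
# boards * samples/s * channels * bytes per 16-bit sample
bytes_per_sec_6120 = 3 * 1_000_000 * 4 * 2
mb_per_sec = bytes_per_sec_6120 / 1e6   # raw rate in MB/s, vs ~25 MB/s observed
```

So the writer loop is keeping up with essentially the full raw rate, at least at first.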
Because this works for multiple iterations before the memory starts to increase, I am left with only theories as to why it happens, and am looking for suggestions for improvement.
Here are my theories:
1) As the streaming process continues, the disk writes occur on the inner portion of the disk, resulting in lower throughput. If this is what is happening, there is no solution other than a hardware upgrade. But how can I tell whether this is the reason?
2) As the program continues to run, lots of memory is being allocated, reallocated, and deallocated. The streaming queue, for instance, shrinks and grows. Perhaps memory is becoming too fragmented, and it's taking longer to allocate the large block sizes. My block size is 1 second of data, which can be up to a 1M x 4-channel 16-bit array (about 8 MB) from each 6120's DAQmx task. I tried adding a Request Deallocation VI when each DAQmx VI finishes, and this seemed to help between successive collections. Before I added it, Task Manager showed about 7 MB more memory usage after each data collection than after the previous one; now the usage is about the same each time (until it starts blowing up). To complicate matters, each flattened string can be a different size, because I acquire data from each DAQ board at a different rate, so I'm not sure preallocating the queue would even matter.
3) There is a memory leak in part of the system that I cannot monitor (such as DAQmx). I would think this would manifest itself from the very first collection, though.
4) There is some threading/threadlocking relationship that changes over time.
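For theory 1, one test I could run offline (sketched in Python since it doesn't need LabVIEW; the chunk/batch sizes are just my guesses) is to fill the drive with sequential writes and time each batch. If the outer-to-inner zone effect is real, sustained throughput should fall noticeably as the file grows:

```python
import os
import time

CHUNK = 8 * 1024 * 1024   # 8 MB, roughly one second of one 6120's data
BATCH = 4                 # chunks written per timing sample

def measure_batches(path, num_batches, chunk=CHUNK, batch=BATCH):
    """Write fixed-size chunks sequentially; return MB/s for each batch."""
    rates = []
    buf = b"\0" * chunk
    with open(path, "wb") as f:
        for _ in range(num_batches):
            t0 = time.perf_counter()
            for _ in range(batch):
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())   # force data to the platters, not the cache
            dt = time.perf_counter() - t0
            rates.append(batch * chunk / dt / 1e6)
    return rates
```

Run with enough batches to cover a large fraction of the disk, then compare the early rates against the late ones; a steady decline would point at the zone geometry rather than at my code.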
Does anyone have any other theories, or comments about one of the above theories? If memory fragmentation appears to be the culprit, how can I collect the garbage in a predictable way?
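For what it's worth, the main mitigation I'm considering for the queue growth is bounding the queue at creation (the max queue size input on Obtain Queue) so memory can't run away silently. Sketched in Python below; the drop-counting behavior is my own choice for illustration, not anything LabVIEW gives you for free:

```python
import queue

# A bounded queue caps memory growth: once full, the producer either
# blocks until the writer catches up, or the enqueue times out and the
# overflow becomes a visible count instead of silent memory growth.
bounded = queue.Queue(maxsize=8)   # e.g. at most 8 one-second blocks in flight

dropped = 0
for _ in range(12):
    try:
        bounded.put(bytes(4), timeout=0.01)   # give the writer a brief chance
    except queue.Full:
        dropped += 1   # data loss, but bounded memory and a clear error signal
```

Losing data isn't great, but at least the failure mode would be explicit instead of a slow crawl toward an out-of-memory crash.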