Thoughts on Stream-to-Disk Application and Memory Fragmentation

Ran another overnight test, and again my timed loop had slowed down.  I started the profiler and took a few snapshots (not sure if this is totally valid, since I started it while it was running), and it looks like excessive time is being spent in a 3rd party DLL that I am using to interface to a Star Controller/GPS card. The calls are happening in a different timed loop, but it seems to be affecting my queue loop.  The timed loop with the DLL calls seems to be running at the correct rate, however (I have a loop counter indicator showing). Just one more thing to add to the list...
Message 11 of 21
"a 3rd party DLL that I am using "
 
on that subject....
 
DLL calls can be configured to NOT operate in the UI thread if they are thread-safe.
 
If the UI thread is a bottleneck because the DLL is running in that thread, configuring the DLL call as thread-safe may help (see the sketch below).
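
To illustrate what "thread safe" means here: the exported function can't touch shared mutable state. A rough C sketch (the exports are hypothetical, not the actual Star Controller API):

__declspec(dllexport) double scale_unsafe(double x)
{
    static double last_result;   /* shared static state: NOT reentrant,    */
    last_result = x * 2.0;       /* so this call must stay in one thread   */
    return last_result;          /* (LabVIEW's UI thread)                  */
}

__declspec(dllexport) double scale_safe(double x)
{
    return x * 2.0;              /* locals only: reentrant, safe to mark   */
}                                /* "thread safe" in the Call Library node */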
 
Can you run a test where the DLL is not being used?
 
Ben
Message 12 of 21
Timed loops do have a relatively large processor overhead.  Are you running on a Windows system?  If so, you can get some fairly specific timing information using the following technique.
  1. Create a VI which performs a Call Library call into kernel32.dll.  Use the function OutputDebugStringA with a single CStr input and the WINAPI calling convention.  Here is a screenshot of the LabVIEW 8.5 config page for Call Library.

[Screenshot: Call Library configuration page for OutputDebugStringA]
    The string input can be any unique identifier you wish.  It is trivial to just use numbers, or you can put errors, subVI call chain lists, etc. into it.  You can also embed it in a conditional compile structure to turn it on and off easily.
  2. Place said VI in strategic locations throughout your code.  Every time it is called, a message will be posted to the Windows debug location.
  3. Download a copy of DebugView from Microsoft SysInternals.
  4. Run DebugView while your code is running.  Every time a message is posted, it will show up in the DebugView window, complete with processor tick count accurate timing.  Note that there are a plethora of options for DebugView.  Set it up the way you want it.
  5. Save the DebugView data to a text file and analyze with LabVIEW to get your timing information (you can also use a spreadsheet ;) - I find Excel's pivot tables to be highly useful when doing this)
Using a binary-search-style approach should isolate your performance problems within an hour or so of work.
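
For reference, the Call Library configuration above amounts to roughly the following C, which you can compile to sanity-check your DebugView setup (the message text is just an example):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    char msg[64];
    for (int i = 0; i < 10; i++) {
        /* any unique identifier works; numbers are the trivial case */
        snprintf(msg, sizeof msg, "checkpoint %d", i);
        OutputDebugStringA(msg);   /* appears in DebugView, timestamped */
        Sleep(100);
    }
    return 0;
}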


Message 13 of 21

I’m not sure if these are covered already, but I think they may be worth mentioning...

 

1. Every call to Obtain Queue creates a copy of the queue reference, so every Obtain Queue must be matched by a Release Queue. If you call Obtain Queue within a loop, or have calls scattered throughout the program, and this isn't taken care of, it can obviously become a problem.
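
In a rough C analogy (this is reference counting in general, not LabVIEW's actual internals), the pattern looks like this:

#include <stdlib.h>

typedef struct { int refcount; /* ... queue data ... */ } Queue;

Queue *obtain_queue(Queue *q) { q->refcount++; return q; }

void release_queue(Queue *q)
{
    if (--q->refcount == 0) free(q);    /* last reference frees the queue */
}

int main(void)
{
    Queue *q = calloc(1, sizeof *q);
    q->refcount = 1;                    /* the original reference         */

    for (int i = 0; i < 1000; i++) {
        Queue *ref = obtain_queue(q);   /* every Obtain...                */
        /* ... use ref ... */
        release_queue(ref);             /* ...needs a matching Release,   */
    }                                   /* or 1000 references pile up     */

    release_queue(q);
    return 0;
}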

 

2. To my understanding, the memory space allocated to a given queue in LabVIEW can grow to any size as more data is enqueued. However, I have been told that this memory space only grows; it does not deallocate itself as the queue flushes. So if at some point during your acquisition the disk gets backed up, the queue grows rather quickly, since acquisition is uninterrupted. Even if the disk catches up and the queue returns to a reasonable size, the memory allocated within LabVIEW for that queue's data remains allocated.
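
A sketch of that grow-only behavior in C (again, an assumption about LabVIEW's internals based on what I've been told, not documented behavior):

#include <stdlib.h>
#include <string.h>

typedef struct {
    char **items;                 /* backing store                */
    size_t head, tail, capacity;  /* items[head..tail) are live   */
} GrowOnlyQueue;

void enqueue(GrowOnlyQueue *q, char *item)
{
    if (q->tail == q->capacity) {
        if (q->head > 0) {
            /* slide live elements down instead of growing */
            memmove(q->items, q->items + q->head,
                    (q->tail - q->head) * sizeof *q->items);
            q->tail -= q->head;
            q->head = 0;
        } else {
            /* grow by doubling; capacity never shrinks again */
            q->capacity = q->capacity ? q->capacity * 2 : 16;
            q->items = realloc(q->items, q->capacity * sizeof *q->items);
        }
    }
    q->items[q->tail++] = item;
}

char *dequeue(GrowOnlyQueue *q)
{
    /* draining the queue moves head, but the allocation stays at its
       high-water mark -- nothing is handed back to the allocator */
    return (q->head < q->tail) ? q->items[q->head++] : NULL;
}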

 

I don’t have much experience with high-speed data streaming in LabVIEW, so forgive me if I’m pointing out the obvious.

 

I would think the best way to achieve maximum throughput with limited jitter would be a combination of...

 

-Free-space defragmentation of the disk.

-Pre-allocation of a large sequential data file on the disk.

-Limiting the write queue's memory footprint. This should be managed both automatically with 'max queue size' and by programmatically limiting the size of each block of data enqueued (a bound sketched below).

-And the obvious last (or first) step would be getting more RAM and a RAID 0 setup.
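
On the footprint point: if both the element count and the block size are capped, worst-case memory is fixed no matter how far the disk falls behind. A rough C sketch with illustrative numbers (64 elements x 1 MB = a 64 MB cap):

#include <string.h>

enum { MAX_ELEMS = 64, MAX_BLOCK = 1 << 20 };   /* 64 x 1 MB = 64 MB cap */

typedef struct {
    char   data[MAX_ELEMS][MAX_BLOCK];
    size_t len[MAX_ELEMS];
    size_t head, tail, count;
} BoundedQueue;

/* Returns 0 when the queue is full or the block is oversized: the
   producer must wait or drop, which is exactly the back-pressure a
   'max queue size' setting provides. Worst-case memory is fixed. */
int bq_enqueue(BoundedQueue *q, const char *block, size_t n)
{
    if (q->count == MAX_ELEMS || n > MAX_BLOCK) return 0;
    memcpy(q->data[q->tail], block, n);
    q->len[q->tail] = n;
    q->tail = (q->tail + 1) % MAX_ELEMS;
    q->count++;
    return 1;
}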

 

 

Message 14 of 21

Well, y'all will get a chuckle out of what I found to be the problem with my timed loop slowing down (I, however, am not amused). I discovered it by adding indicators to other loop indices and trying to right-justify them.

I posted this issue in another thread, but in a nutshell, LabVIEW apparently creates a numeric spinner type when you right-click on a loop index in a while loop and do a Create Indicator.  The created indicator looks just like a normal numeric indicator, and I left it left-justified.  When I came in the next morning after running my loop overnight, the loop index had incremented far enough that I was no longer seeing the least-significant digit, which is why it appeared to have 'slowed down' by a factor of ten.  I believe I would have immediately known this to be the problem if the entire field had been filled with numbers; however, because LabVIEW created a spinner indicator, there is a blank area where the spinner control would normally be.  I assumed that I still had not filled the field completely, and that the loop had slowed down.

In trying to diagnose the problem, I created some other loop indicators for other parallel loops and tried to right-justify them.  They wouldn't right-justify all the way!  It turns out that if you have your default control style set to System, LabVIEW creates the spinner indicator, and the 'blank area' is reserved for the spinner, which of course will never be seen since it's an indicator.  I tried doing the same thing with my preference set to Classic, and LabVIEW did exactly what I expected: right-justifying goes all the way to the right.

So, let this be a warning to everyone who uses System style controls - you may not get what you expect from a numeric indicator.

Message 15 of 21

Hi wired,

Does this mean all of the observations in this thread are null and void?

If not, how would you summarize your observations regarding performance?

Trying to stay on top of this,

Ben

Message 16 of 21

No, Ben,

That was a completely different issue that I felt needed to be addressed before getting back to the original issue of streaming performance.  I'm in a bit of a multi-tasking mode right now, so I haven't been able to spend much time on the performance issue.  I did write my own driver for the 3rd party card, so I am no longer calling a DLL; everything is being done using VISA calls.

One thing I did do that was helpful for my startup problem was to pre-allocate my flattened string queue and fill it with strings larger than my max expected size.  So far, that has seemed to take care of the 'slow startup' problem.  I need to perform some more tests, which I am running while working on another project.  I will be sure to keep this thread updated with my results.
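
For anyone curious, the idea in a rough C analogy (this assumes the allocations stay warm once made, which matches what I observed rather than anything documented; in LabVIEW the equivalent was enqueueing oversized flattened strings and then flushing the queue):

#include <stdlib.h>
#include <string.h>

enum { DEPTH = 256, MAX_MSG = 65536 };   /* illustrative sizes */

char *slots[DEPTH];

/* Allocate and touch worst-case-sized blocks before acquisition starts,
   so no growth (and no page faults) happens once real data flows. */
void prefill(void)
{
    for (int i = 0; i < DEPTH; i++) {
        slots[i] = malloc(MAX_MSG);
        memset(slots[i], 0, MAX_MSG);
    }
}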

Kevin

Message 17 of 21

"I will be sure to keep this thread updated with my results."

5-stars for the promise.

Another 5 stars are waiting for your debrief. ;)

Ben

Message 18 of 21
Just a tidbit of an update: I have decided that writing my flattened data to disk in chunks does not make any noticeable improvement. I suppose the overhead of looping to extract the chunks from the string may negate any possible benefit of doing this.  Either that, or the Windows I/O driver or the Binary File Write VI is doing the same thing (chunking) internally, and I am just duplicating what it would be doing itself (compare the two variants sketched below)...
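
For the curious, the two variants I compared amount to roughly the following (Win32 sketch; the buffer and handle names are illustrative):

#include <windows.h>

void write_chunked(HANDLE h, const char *buf, size_t len, size_t chunk)
{
    DWORD written;
    for (size_t off = 0; off < len; off += chunk) {
        DWORD n = (DWORD)((len - off < chunk) ? len - off : chunk);
        WriteFile(h, buf + off, n, &written, NULL);  /* one call per chunk */
    }
}

void write_whole(HANDLE h, const char *buf, size_t len)
{
    DWORD written;
    WriteFile(h, buf, (DWORD)len, &written, NULL);   /* single large call */
}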
 
I think my main problem is still disk fragmentation: sometimes I can fill the disk at the maximum rate, and sometimes I can't.  I'm thinking it has to do with the sessions I previously ran, where I acquired large files and then deleted them to make room for the next tests.  Pre-writing a file is not always feasible, because my file size could be up to 100 GB, and that would take too long.  What I'd really like to do is create a partition for the data that I can quick-format before the acquisition - I'll need to ask the users if this is permissible.
 
I can say for certain that doing the flatten-to-string in each of my tasks and enqueueing the strings, then dequeueing and writing to disk from another loop, provides a significant benefit over doing the semaphore-controlled write in each DAQ task.
 
That's all for now...
Message 19 of 21

One thing that we've run across as we've run many of our LV-delivered applications is file-system fragmentation.  We test digitally-controlled actuators in a test environment, and we can have a couple of tests running at the same time, streaming about 800 MB/day per actuator to the disk.  After running these and observing various hits to the quasi-real-time performance (all that you can expect under Windows XP [lots of RAM and a good graphics card are our defenses against degradation]), we noticed that the disk was becoming badly, badly fragmented. (Tool of choice at the time was Piriform's "Defraggler".)  After looking up the issue on-line ("XP file-system fragmentation" was one search), I wondered if pre-allocating a file size and writing to the pre-allocated file would help.  Certain caveats would apply.

 

One of the challenges we faced was that these test programs were controlled externally (by a test chamber), and could run for an arbitrary amount of time.  At 800 Mbytes/day per device, we could get two days' worth of data in a data file before the file size crept past 2 gigabytes (2^31 -1 bytes).  Logic was placed in the program to adjust just how often the data file would get rolled so as to keep the file sizes under the 2 gigabyte limit.  This wasn't too hard.  However, with two test programs running over a period of days, file-system fragmentation would always increase.  It was not uncommon to see a 1.6 Gbyte file that had over 100,000 fragments.  That was worrisome. 
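
The roll-over check itself was simple arithmetic; a hypothetical version (names illustrative, not our actual code) is below.  At ~800 MB/day, two days of data is ~1.6 GB, comfortably under the cap.

#include <stdint.h>

#define SAFETY_MARGIN (16LL * 1024 * 1024)        /* 16 MB of slack */
#define MAX_FILE_BYTES ((int64_t)INT32_MAX)       /* 2^31 - 1 bytes */

/* Returns nonzero when the current file should be closed and a new
   one opened, before the next record would approach the limit. */
int should_roll(int64_t current_bytes, int64_t next_record_bytes)
{
    return current_bytes + next_record_bytes > MAX_FILE_BYTES - SAFETY_MARGIN;
}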

 

Assumptions about the file-system:

  • From what I'd read, and from my experience working with various operating systems, I knew that as data was written to disk, it fell to the operating system to allocate space on the disk for the data being added to the file, updating the file-system structure as it went.  Our data streamed out at 50 Hz per test program, and with two programs running we could see that allocating disk space would consume a certain amount of system resources, especially as the disk filled.  From what we'd read about the NT file system under XP, a contiguous portion of the disk would be allocated when the file was initially opened, depending on the buffer size of the program doing the writing.  Rather than mess with the buffer sizes used in LabVIEW (we are using 7.1, and file-system buffer size settings seem to be somewhat hidden from the user), we decided to make a good upper-limit guess at the size of the file we would be writing, and then, when the test concluded, to update the End-of-File pointer before the file was closed.  VIs to perform these operations exist in 7.1.  One of the challenges we faced was that for character-based files, the upper limit on file size (for random access) was 2 gigabytes (2^31 - 1 bytes), based on 7.1's use of a signed 32-bit integer for file size.  (We know there are packages that allow larger file access in 7.1, but we looked for simpler solutions initially.)
  • Once the space on disk was pre-allocated, we assumed that the "find-space-on-disk" portion of the file-system code wouldn't need to be executed, since the space was already there, and all that would be needed would be to schedule the write of the data to the disk. 

 

Caveats to using a pre-allocated data-storage file:

  • Never write past the end of the file (meaning, always allocate a file bigger than you think you'll need, because it can be cut back).  Writing past the pre-allocated end would almost surely cause fragmentation.
  • Before closing the file, ensure the End-of-File pointer is set to the current file position.  Otherwise the file is padded with NULs out to the originally set length.  (Both caveats appear in the sketch below.)
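
In Win32 terms (which LabVIEW's file VIs ultimately sit on), the whole pattern looks roughly like the following.  This is a minimal sketch with error handling elided and an illustrative size guess, not our production code:

#include <windows.h>

int main(void)
{
    HANDLE h = CreateFileA("D:\\data\\run001.dat", GENERIC_WRITE, 0,
                           NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    LARGE_INTEGER size;
    DWORD written;

    /* 1. Pre-allocate: seek to the guessed upper bound and set EOF so
          NTFS reserves (ideally contiguous) space up front. */
    size.QuadPart = 1600LL * 1024 * 1024;          /* ~1.6 GB guess */
    SetFilePointerEx(h, size, NULL, FILE_BEGIN);
    SetEndOfFile(h);

    /* 2. Rewind and stream data into the reserved region, never
          writing past the pre-allocated end. */
    size.QuadPart = 0;
    SetFilePointerEx(h, size, NULL, FILE_BEGIN);
    WriteFile(h, "sample", 6, &written, NULL);     /* ... real data here */

    /* 3. Before closing, trim EOF back to the current file position so
          the file is not padded with NULs to the original length. */
    SetEndOfFile(h);
    CloseHandle(h);
    return 0;
}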

 

Results:

  • None of the new files produced by this method has shown any fragmentation.  The only fragmentation seen is in the XP-scheduled system snapshots and operating-system logs.  As a matter of course, we ZIP our data files (using Info-ZIP's command-line utilities), and those ZIP files do end up fragmented, but they are much smaller than our original data files.  I suppose our next step would be to use LabVIEW to zip up our data files, using the same trick to avoid fragmentation.

One thing we did was to sort of "manually" buffer the data that was to be written to disk, rather than let the operating system do it.  After doing some arithmetic on the data sizes, we settled on a threshold such that writes would occur at about 1-second intervals.  We aren't sure this was the right thing to do -- second-guessing an operating system is not always wise, unless one is sure of what one is doing.

Bob Seegmiller
NG UMS Ryan Aeronautical Center
Message 20 of 21