
Large data array to disk

I need to save a 1D array of 10,000,000 singles (32 bits -> 38MB) to disk. The "Write to spreadsheet file" complains that it runs out of memory when trying to execute, so I tried using "format into file," inside a for loop, in which I just pop off and write one sample per iteration. That solution is on track to consume two hours. Anybody know what the deal is here?

 

I'm pretty sure I can speed up the loop by removing the indicator that shows me the current loop iteration, but other than that, I'm stuck.

 

I'm running LabVIEW 8.6 on Windows XP.

 

Any help appreciated.

Message 1 of 37

That doesn't sound that bad.

 

Can you attach your code?

 

Have you tried rebooting your PC and trying again?

Message 2 of 37

I'm not allowed to attach the code. Otherwise, I would. I've rebooted the PC multiple times (and tried running with a smaller data set), and it still comes out slow.

Message 3 of 37
What's the source of the data? How much arrives at once? Do you have to save it as ASCII text or will a faster and more efficient binary save be okay?
Message 4 of 37

@sjandrew wrote:

I'm not allowed to attach the code. Otherwise, I would. I've rebooted the PC multiple times (and tried running with a smaller data set), and it still comes out slow.


Then attach the simplest form of code that you can that still exhibits the problem.

 

Even the act of just cleaning up the code to something you can attach might be enough for you to find the problem yourself.

Message 5 of 37

@sjandrew wrote:

I need to save a 1D array of 10,000,000 singles (32 bits -> 38MB) to disk. The "Write to spreadsheet file" complains that it runs out of memory when trying to execute, so I tried using "format into file," inside a for loop, in which I just pop off and write one sample per iteration. That solution is on track to consume two hours. Anybody know what the deal is here?


Check your math here. You are writing the text version of your singles, not the raw binary representation. That means you have no idea how much space they'll require on disk. For example, if for each single you only need to write one digit, then each occupies 1 byte (plus a delimiter on each side), but chances are you're writing much larger, non-integer values (otherwise you wouldn't use a single). If you need to write 9 digits plus a decimal point for each single, plus a delimiter (tab or comma), then each value requires 11 bytes, which is substantially more than the 4 bytes that would be required if you wrote in raw binary. Converting to text also requires substantially more processing time, and the array to spreadsheet string can't determine in advance how much space to allocate (again, because it doesn't know how many characters are required for each value) so there's no way to allocate a large block of memory in advance.
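
To put rough numbers on it, here's the arithmetic as a quick Python sketch (not LabVIEW, and the 9-significant-digit text format is just an assumption):

import struct

n = 10_000_000                                         # samples in the array
print(n * 4 / 1e6, "MB as raw 32-bit binary")          # ~40 MB
# assume ~9 digits + a decimal point + a delimiter per value as text
print(n * (9 + 1 + 1) / 1e6, "MB as delimited text")   # ~110 MB
# and every value also pays for a number-to-string conversion:
print(len(struct.pack("<f", 12345.678)), "bytes packed vs",
      len("%.9g\t" % 12345.678), "bytes formatted")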

 

What are you doing with this file after you write it? The best solution would be to write raw binary, if the program that will later read it can open such a file.

 

Otherwise, I'm going to guess that in your "format into file" approach, you are wiring the file path into the function, and not using the file refnum output. That means that for every single write, the operating system opens the file, scans to the end-of-file mark, writes the new data, flushes it to disk, and closes the file. The writes will be much faster if you open the file once, do all the writes, then close it. For even more speed, if you know approximately how large the file will be after you write it, you can set the end-of-file mark to that number of bytes (plus some extra), which will cause the operating system to preallocate the file to that size. When you finish writing you can then move the end-of-file mark back to the point where writing actually stopped.
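
For what it's worth, the same pattern outside LabVIEW looks roughly like this (a Python sketch; the file name, chunk size, and random stand-in data are made up). The point is one open, one preallocation, many sequential writes, one close:

import numpy as np

data = np.random.rand(10_000_000).astype(np.float32)   # stand-in for the measured singles
CHUNK = 1_000_000                                       # samples per write, an arbitrary choice

with open("samples.bin", "wb") as f:
    f.truncate(data.nbytes)    # push the end-of-file mark out so the space is allocated up front
    for start in range(0, data.size, CHUNK):
        f.write(data[start:start + CHUNK].tobytes())    # sequential writes into the preallocated file
    f.truncate(f.tell())       # pull the end-of-file mark back to where writing actually stopped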

Message 6 of 37

Binary vs. text... good call. That, plus cutting the indicator out of the loop, brings exec time down to 36 minutes. Still longer than I would expect, though. I wasn't opening the file every loop, either - I was just shifting the file pointer around each time.

 

I can't post any of the code, since that machine is isolated and I don't have LabVIEW on this one. But the code grabs some data from the measurement source and is able to graph it in less than 10 seconds. Then it hands the array of singles to the for loop and writes each one to the file (now in binary). Nothing else is performed in the loop. Then it closes the file and terminates execution.

 

I will always know how much data I have to write - how do I preallocate the proper size for the file? I assume that by not doing this, I am incurring a boatload of malloc() calls?

 

Also, in my first run, all my data after sample 10,000 was set to zero (as in, the measurement device showed that it was not zero, but after going through the VI it became zero). This was running with 10 million samples. Running with only 1 million samples, none of the data gets set to zero. Is LabVIEW overwriting a buffer or something?

Message 7 of 37

Did you try breaking the data into chunks and writing smaller chunks to one contiguous file? By doing a single write call, you're trusting that whatever LabVIEW is doing under the hood is optimized for a large data set, which is probably not a great assumption. It shouldn't take that long to code up a variant that lets you break the file down into n iterations that each append to the file.
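
If it helps to see the difference in isolation, here's a small timing sketch in Python (not LabVIEW; the array size and chunk count are arbitrary) comparing one write call per sample against a handful of larger appends:

import time
import numpy as np

data = np.random.rand(1_000_000).astype(np.float32)   # smaller than the real data set so the slow case finishes

t0 = time.perf_counter()
with open("per_sample.bin", "wb") as f:                # roughly what a one-sample-per-iteration loop does
    for x in data:
        f.write(x.tobytes())
t1 = time.perf_counter()

t2 = time.perf_counter()
with open("chunked.bin", "wb") as f:                   # the same data in 10 larger appends
    for chunk in np.array_split(data, 10):
        f.write(chunk.tobytes())
t3 = time.perf_counter()

print(f"per-sample writes: {t1 - t0:.2f} s")
print(f"chunked writes:    {t3 - t2:.2f} s")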

Message 8 of 37

If you're working with binary data, then you should have no problem writing the entire array at once. Do not use a loop. The data will take exactly the same amount of space on disk as in memory (plus a few extra bytes if you leave the "Prepend array or string size" input set to true, the default), and if you write it as a single block there's no need to preallocate anything.

The calls to malloc aren't the problem in terms of writing to disk; the problem is that when you write small blocks of data at a time, the operating system only sets aside enough space on disk for each little block. When the next block of data arrives, if it can't be appended to the existing location, the operating system has to set up a pointer to somewhere else on disk for that next small block, leading to fragmentation. This takes more time than a call to malloc to get more RAM. If you write a larger chunk of data at a time, or you set the end of file to a large value first, the operating system allocates the space on disk once, and then you're simply filling that space.
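
In Python terms the "no loop" version is just one call (a sketch with made-up data; the 4-byte big-endian count header is only an analogy to LabVIEW's "prepend array or string size" option, and you can drop it if the reading program doesn't expect it):

import struct
import numpy as np

data = np.random.rand(10_000_000).astype(np.float32)   # stand-in for the measured singles

with open("samples.bin", "wb") as f:
    f.write(struct.pack(">i", data.size))   # optional element-count header, by analogy only
    f.write(data.tobytes())                 # the whole ~40 MB array in a single write call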

 

I hate to suggest this, but if there's really no way to get the VI or a screenshot off the development computer, you might take a photo of the screen and post that.

Message 9 of 37

sjandrew wrote:

Also, in my first run, all my data after sample 10,000 was set to zero (as in, the measurement device showed that it was not zero, but after going through the VI it became zero). This was running with 10 million samples. Running with only 1 million samples, none of the data gets set to zero. Is LabVIEW overwriting a buffer or something?


Was this with the text version, or binary? With the text version, that might be where it ran out of memory (or at least, that's as much as it had written to disk when it ran out of memory). You really haven't provided enough information to give a good answer. How did you determine that the data was all 0's - if it was in binary, did you read it back with a LabVIEW program, or with some other program?

Message 10 of 37