11-01-2010 02:28 PM
For the first time I have tried using Read from Binary File on sizable data files, and I'm seeing some real performance problems. To prevent possible data loss, I write data as I receive them from DAQ, 10 times per second. Each write is a 2-D array of double, 1000 data points x 2-4 channels. When reading the file back, I wish I could read it as a 3-D array all at once. That doesn't seem to be supported, so I repeatedly read 2-D arrays and use Build Array in a shift register to assemble the 3-D array that I really need. But it's incredibly slow! It seems that I can only read a few hundred of these 2-D items per second.
It also occurred to me that Build Array used in a shift register to keep appending to the array could be a quadratic-time operation, depending on how it is implemented: continually allocating bigger and bigger chunks of memory and copying the entire growing array at every step.
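That growth pattern is easy to demonstrate outside LabVIEW. Here is a minimal Python/NumPy sketch (NumPy arrays standing in for LabVIEW arrays; the sizes are arbitrary) contrasting append-in-a-loop with a single copy at the end:

```python
import numpy as np

# 100 acquisition blocks of 1000 points x 2 channels (arbitrary sizes).
chunks = [np.full((1000, 2), float(i)) for i in range(100)]

# Quadratic pattern: like Build Array in a shift register, the growing
# array is reallocated and fully copied on every iteration.
grown = np.empty((0, 1000, 2))
for c in chunks:
    grown = np.concatenate([grown, c[np.newaxis]])

# Linear pattern: collect the blocks, then copy each element exactly once.
stacked = np.stack(chunks)
```

With 100 blocks the difference is invisible; with tens of thousands, the first loop's total copying grows with the square of the block count.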
So I'm looking for suggestions on how to efficiently store, efficiently read, and efficiently reassemble my data into the 3-D array that I need. Perhaps I could simplify life if I had "write raw data" and "read raw data" operations that write and read only the numbers and no metadata. Then I could write the file and read it back in any size chunks I please, and read it with other programs besides. But I don't see them in the menus.
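For what it's worth, the "raw" layout described here is trivial in any language once there are no headers: the file is just back-to-back 8-byte doubles. A Python sketch of the idea (the file name and geometry are made up; NumPy's tofile/fromfile stand in for the hoped-for raw write/read primitives):

```python
import numpy as np

POINTS, CHANS = 1000, 2

# "Write raw data": append just the 8-byte doubles, no length headers.
# Each DAQ callback appends one 2-D block of POINTS x CHANS values.
with open("run.bin", "wb"):
    pass                                 # truncate any previous run
for i in range(10):                      # ten blocks, as in 1 s of DAQ
    block = np.full((POINTS, CHANS), float(i))
    with open("run.bin", "ab") as f:
        block.tofile(f)                  # 1000 * 2 * 8 = 16000 bytes

# "Read raw data": read the whole file back and reshape to 3-D at once;
# the geometry is known out of band, so no metadata is needed in the file.
raw = np.fromfile("run.bin", dtype=np.float64)
runs = raw.reshape(-1, POINTS, CHANS)    # block x point x channel
```

Because there is no metadata, any other program that knows the geometry can read the same file.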
Suggestions?
Ken
11-01-2010 03:17 PM
Have you tried auto-indexing the 2D data out of the loop to form the 3D array?
11-01-2010 07:53 PM
[post body not preserved; likely a snippet or attachment]
11-01-2010 07:57 PM - edited 11-01-2010 07:59 PM
@altenbach wrote:
just read it as 1D and reshape to 3D right away.
Would he run into problems with the 4 byte array length headers?
Also, Ken, are you able to supply a pared-down version and a sample data file?
11-01-2010 08:03 PM
[post body not preserved; likely a snippet or attachment]
11-02-2010 02:17 PM
I quote the detailed help from Read from Binary File:
data type sets the type of data the function uses to read from the binary file. The function interprets the data starting at the current file position to be count instances of data type. If the type is an array, string, or cluster containing an array or string, the function assumes that each instance of that data type contains size information. If an instance does not include size information, the function misinterprets the data. If LabVIEW determines that the data does not match the type, it sets data to the default for the specified type and returns an error.
So I see how I could write data without any array metadata by turning off "prepend array or string size information", but I don't see any way to read it back in such bare form. If I did, I'd have to tell it how long an array to read, and I don't see where to do that. If I could overcome this, I could indeed read in much larger chunks.
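If memory serves, Read from Binary File does support this: wire a scalar DBL as the data type and the number of elements to the count input, and it reads exactly that many bare values with no header expected. Byte-wise, the operation looks like this Python sketch (the file name and geometry are assumptions; big-endian is used because that is LabVIEW's default byte order):

```python
import struct

POINTS, CHANS = 1000, 2
BLOCK_BYTES = POINTS * CHANS * 8           # 8 bytes per DBL, no headers

# Build a headerless test file of 3 known blocks (big-endian doubles).
with open("bare.bin", "wb") as f:
    for b in range(3):
        f.write(struct.pack(">%dd" % (POINTS * CHANS),
                            *([float(b)] * POINTS * CHANS)))

# Read it back one bare 2-D block at a time: the reader supplies the
# count (POINTS * CHANS scalars) instead of relying on embedded sizes.
blocks = []
with open("bare.bin", "rb") as f:
    while chunk := f.read(BLOCK_BYTES):
        blocks.append(struct.unpack(">%dd" % (POINTS * CHANS), chunk))
```

Reshaping each flat tuple of 2000 values back into 1000 x 2 is then a constant-time bookkeeping step.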
I'll try the auto-indexing tunnel anyway. I didn't tell you the whole truth: the 3-D array is actually further sliced up based on metadata that I carry, and ends up as a 4-D array of "runs". But I can do that after the fact instead of trying to do it with shift registers as I'm reading.
Thanks,
Ken
11-02-2010 02:40 PM
Can you post your code? At least the read/write loops?
11-02-2010 02:53 PM
My generic advice in these situations is to read the largest practical chunk (usually the whole file) as a 1D array of bytes. I then try to treat that array as read-only to avoid memory issues, and parse the file using strategic applications of Array Subset followed by Unflatten From String. You can chain that function so that each call starts where the previous one left off in the string, which is very helpful for mixed datatypes.
Another trick I use is to prepend an array size to data that does not have one: if I know the length, I flatten an I32 to a string, prepend that to the data, and then Unflatten. It peels off the array and passes the remainder of the string downstream, which saves a lot of string subsetting and datatype-dependent math.
Works like a charm, especially if you have control over the binary format. After a few times you learn useful tips like grouping datatypes for easy extraction, building in chunk size info for rapid traversing, etc.
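The chained-parsing idea maps cleanly onto byte offsets. A small Python sketch of the same pattern (struct.unpack_from playing the role of Unflatten From String; the record layout, an I32 count followed by that many big-endian DBLs, is invented for illustration):

```python
import struct

# Two size-prefixed records packed back to back, mimicking a LabVIEW
# flattened string: I32 count, then that many big-endian doubles.
buf = (struct.pack(">i3d", 3, 1.0, 2.0, 3.0)
       + struct.pack(">i2d", 2, 4.0, 5.0))

# Walk the buffer with an explicit offset, like chaining Unflatten From
# String so each call starts where the previous one left off.
records, offset = [], 0
while offset < len(buf):
    (n,) = struct.unpack_from(">i", buf, offset)        # peel the size header
    offset += 4
    values = struct.unpack_from(">%dd" % n, buf, offset)
    offset += 8 * n
    records.append(list(values))
```

Each size header tells the parser how far to advance, so mixed-type records can be traversed without any datatype-dependent math in the caller.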
11-02-2010 04:47 PM
Ken Brooks wrote:
So I see how I could write data without any array metadata by turning off "prepend array or string size information", but I don't see any way to read it back in such bare form. If I did, I'd have to tell it how long an array to read, and I don't see where to do that. If I could overcome this, I could indeed read in much larger chunks.
You can calculate the total number of elements from the file size and the bytes/scalar (8 for DBL).
You could even write the size header for a 3D array explicitly, then append flat data (no embedded sizes). Whenever you append data, you can also update the header so it describes the current 3D array.
If you do this right, you can later read the 3D array directly.
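This header trick can be sketched byte-for-byte. LabVIEW flattens an array's dimension sizes as big-endian I32s ahead of the data, so a 3-D file carries three of them; this hypothetical Python writer appends flat blocks and patches the first dimension in place (file name and geometry are made up):

```python
import struct

POINTS, CHANS = 4, 2
FNAME = "runs.bin"

# Write a placeholder 3-D header: pages x rows x cols as big-endian I32s.
with open(FNAME, "wb") as f:
    f.write(struct.pack(">3i", 0, POINTS, CHANS))

def append_block(values):
    """Append one flat 2-D block, then bump the page count in the header."""
    with open(FNAME, "r+b") as f:
        f.seek(0, 2)                                    # jump to end of file
        f.write(struct.pack(">%dd" % (POINTS * CHANS),
                            *[float(v) for v in values]))
        f.seek(0)
        (pages,) = struct.unpack(">i", f.read(4))       # current page count
        f.seek(0)
        f.write(struct.pack(">i", pages + 1))           # patch it in place

append_block(range(8))
append_block(range(8, 16))
```

Reading the file back with a 3-D DBL array wired as the data type should then yield the whole array in one call, since the header now matches the data that follows.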
11-03-2010 08:45 AM
Thanks, Darin. That will be helpful, but I'm back to this:
"If the type is an array, string, or cluster containing an array or string, the function assumes that each instance of that data type contains size information."
Have you found any way to read in your 1D array of bytes without having to stick this header into the file? Some of the files I deal with will be too large to ever hold in memory all at once, and it would really help if I could just seek to an appropriate place in the data and read from there, without having to insert this header into the file.
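If the file stays headerless, random access is just arithmetic on the block size, and nothing ever needs to be skipped or inserted. A Python sketch of the seek-and-read pattern (NumPy and the file name are stand-ins; the 50-block file mimics one too large to load whole):

```python
import numpy as np

POINTS, CHANS = 1000, 2
BLOCK_BYTES = POINTS * CHANS * 8          # 8 bytes per DBL, no headers

# Build a headerless file of 50 consecutive blocks as a stand-in for a
# file too large to hold in memory all at once.
data = np.arange(50 * POINTS * CHANS, dtype=np.float64)
data.tofile("big.bin")

# Seek straight to block 42 and read only that block.
with open("big.bin", "rb") as f:
    f.seek(42 * BLOCK_BYTES)
    block = (np.fromfile(f, dtype=np.float64, count=POINTS * CHANS)
               .reshape(POINTS, CHANS))
```

The same arithmetic works from Set File Position in LabVIEW, since a bare file has a fixed, computable offset for every block.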
Thanks,
Ken