LabVIEW

Parse binary file with different datatypes

Hi,

 

I'm trying to parse binary files that are about 60 kB each, but I'm having trouble figuring out how to do it efficiently. Each binary file has a metadata file associated with it. The metadata file is organized something like this:

 

Field ID, Field Name, DataType, Offset

0, Count, INT32, 0

1, Energy, FLOAT, 4

2, Mode, UINT8, 8

3, Wavelength, INT32, 9

etc...

 

All the fields together make up a chunk, and these chunks are repeated (usually 2500 times) for the length of the binary file. Because the data fields are organized differently for every binary file, I first have to parse the metadata file to know how the binary file is organized. Hope this makes sense so far...
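Since I can't paste a LabVIEW diagram as text, here's roughly what reading that metadata looks like, sketched in Python pseudocode (the exact file format is an assumption based on the example above):

import csv

# Rough sketch (Python, not LabVIEW): read the metadata file into a
# field table. Assumes a comma-separated file with the header shown above.
def read_metadata(path):
    fields = []
    with open(path, newline="") as f:
        rows = csv.reader(f, skipinitialspace=True)
        next(rows)  # skip the "Field ID, Field Name, DataType, Offset" header
        for field_id, name, dtype, offset in rows:
            fields.append((name, dtype, int(offset)))
    return fields  # e.g. [("Count", "INT32", 0), ("Energy", "FLOAT", 4), ...]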

 

First Method.jpg shows how I first tried to parse the binary file, but each file took over 15 seconds to parse (too long for my application). Essentially, I parse field by field and, depending on the datatype, format the data accordingly. I'm pretty convinced there's a faster way to do this, based on Second Method.jpg.

 

Second Method.jpg shows a much faster way to parse the binary files (it takes only a few hundred msec), but it requires the chunks to always have the same datatype. I parse based on a typedef, and typedefs can't be declared dynamically at runtime.
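To make the contrast concrete, here's a rough text-language analogue of the second method, with the record layout fixed at edit time (Python sketch; little-endian byte order is an assumption, and the layout is the example from the metadata above):

import struct

# Fixed layout matching the example metadata above:
# INT32 Count, FLOAT Energy, UINT8 Mode, INT32 Wavelength = 13 bytes/chunk.
record = struct.Struct("<ifBi")

def parse_fixed(path):
    with open(path, "rb") as f:
        data = f.read()                    # a single read of the whole file
    # One (Count, Energy, Mode, Wavelength) tuple per chunk.
    return list(record.iter_unpack(data))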

 

Are there any suggestions on a more efficient way to parse these files?

 

Happy Halloween!
Message 1 of 9

What is the type of the data you eventually want out of this function? In one situation it's a 2D array of strings; in the other it's an array of clusters. Your first attempt is slow for two reasons: you are building an array in a for loop using Build Array, and you are doing a lot of conversion from strings to numbers. In the second method, LabVIEW can pre-allocate the entire array ahead of time, and there's no processing of the data coming from the file, so it's very fast; it just reads the file into memory. But you also don't end up with an array of formatted strings.

 

You could speed up your first attempt somewhat by autoindexing the string array on the outer loop instead of using Build Array.
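A rough text-language analogue of the difference (hypothetical data): Build Array in a loop behaves like repeated concatenation, which copies the growing array on every pass, while an autoindexed tunnel lets LabVIEW allocate the output once, like a comprehension:

lines = ["0, Count, INT32, 0", "1, Energy, FLOAT, 4"]  # hypothetical input

slow = []
for s in lines:
    slow = slow + [s.split(",")]       # like Build Array: copies the whole array each pass

fast = [s.split(",") for s in lines]   # like an autoindexed output tunnel: allocated once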

 

If you can provide more information about what sort of data is in the file, or upload your code instead of a screenshot, it will be easier to suggest ways to improve it.

Message 2 of 9

If you don't have to worry about reading strings or arrays (data types that are variable in length), you can precalculate how many bytes are in a chunk. Then you read off that many bytes with a single read and use Unflatten From String to get your data types out. Because I really like the Producer/Consumer pattern, you could have a producer loop read the data in chunks and a consumer loop process the chunk data.
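A rough Python sketch of that idea (the type table and names are assumptions from this thread; Unflatten From String is approximated with struct, and the producer/consumer split is omitted):

import struct

# DataType name -> format character (little-endian byte order assumed).
FMT = {"INT32": "i", "UINT8": "B", "UINT16": "H", "UINT32": "I", "FLOAT": "f"}

def parse_chunks(data, fields):
    # fields: [(name, dtype, offset), ...] from the metadata, in offset
    # order and assumed contiguous, so the formats concatenate into one
    # chunk layout whose byte size struct precalculates for us.
    layout = struct.Struct("<" + "".join(FMT[dtype] for _, dtype, _ in fields))
    names = [name for name, _, _ in fields]
    return [dict(zip(names, values)) for values in layout.iter_unpack(data)]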


Message 3 of 9

Wow, it's been a long time since I've had any time to work on this project...

 

As requested, I'm uploading the metadata file, which has a header at the top: "Field ID, Field Name, DataType, Offset." The possible DataTypes are

0:     INT32

1:     UINT8

2:     UINT16

3:     UINT32

4:     FLOAT(SGL)

 

I'm also attaching an example binary file that I'm trying to parse. The binary file is composed of chunks with the data structure outlined in the metadata file. I want to find an efficient way to parse the binary file, since I'm going to have to parse hundreds at once and don't want the user to have to wait. The files have already been streamed to me before I get to them, so they are complete when they're passed into the parser function I'm trying to create.
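For reference, this is the shape of what I'm after, sketched in Python-style pseudocode rather than LabVIEW (names are hypothetical, little-endian assumed): build a record type from those DataType codes and decode all ~2500 chunks in one pass:

import numpy as np

# DataType codes from the list above -> NumPy scalar types.
NP_TYPES = {0: "<i4", 1: "u1", 2: "<u2", 3: "<u4", 4: "<f4"}

def parse_file(bin_path, fields):
    # fields: [(name, type_code), ...] parsed from the metadata file.
    # A packed dtype (no padding) matches the byte offsets in the metadata.
    dtype = np.dtype([(name, NP_TYPES[code]) for name, code in fields])
    records = np.fromfile(bin_path, dtype=dtype)  # every chunk in one call
    return records  # records["Energy"] would give the whole Energy column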

 

The output of the function should be an array (usually of size 2500) of clusters.

 

Are there suggestions on how to do this efficiently? An answer with code will get mucho kudos!!

Message 4 of 9

This is the kind of thing I would normally be slapping into a database, especially since it appears that you have a fixed set of fields to choose from. Short of that, and without knowing what you want to do with the data other than read it, I would probably go with a variant next.

 

SimpleFileParsing.png

 

Your fields end up as variant attributes, which you can then look up by name. There's still some work to decide how you want the data: you could promote everything to DBL (makes things bigger), or you could use strings (bigger still, with some ambiguity for floats).
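In text form, the variant-attribute idea is roughly a name-to-value map per chunk (hypothetical values):

# Each chunk behaves like a set of variant attributes looked up by name.
chunk = {"Count": 42, "Energy": 1.5, "Mode": 3, "Wavelength": 532}
energy = chunk["Energy"]   # analogous to Get Variant Attribute "Energy"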

 


Message 5 of 9

I'm going to try your solution by recreating the block diagram. Is there any way you can upload that code snippet as a .vi?

Message 6 of 9

He already posted it as a snippet.  Save the block diagram image to your computer.  Then drag the file onto a new block diagram.  Instant code.


Message 7 of 9

OK, that works pretty well for creating the variant, but how can I use this variant to read the rest of the file? I want to end up with usable data for all 2500 chunks.

Message 8 of 9

Can you be more specific than "usable data"? You wrote in an earlier post that "The output of the function should be an array (usually of size 2500) of clusters" but as you're already aware, that's not an option because the cluster type would need to be defined at edit-time. So, what do you plan to do with this data once you've parsed the file?

 

One possibility is to adapt Darin's solution to produce an array of variants. Each individual variant provides easy access to a specific field by name, using Get Variant Attribute, and you would loop through every variant in the array to retrieve a column.

 

Another option is to parse the metadata into an array of clusters, where the cluster has elements matching the fields in the metadata file. You can easily sort or search that array. If you need to retrieve only specific elements, don't parse the data file until that specific element is requested, at which point you do a lookup on the metadata and jump to the appropriate offset in the file (or, read the entire file into memory as an array of bytes, and select the correct array subset), then typecast (or unflatten from string) to the correct element type.
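A rough sketch of that lazy lookup (Python, not LabVIEW; the function and parameter names are hypothetical):

import struct

def read_field(data, chunk_size, fields, chunk_index, name):
    # data: the entire binary file as bytes.
    # fields: {name: (format_char, offset_within_chunk)} from the metadata.
    # Decodes a single field of a single chunk on demand.
    fmt_char, offset = fields[name]
    return struct.unpack_from("<" + fmt_char, data,
                              chunk_index * chunk_size + offset)[0]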

 

Or, you could create a 2D array of 32-bit values and store all the data in it (so every data type gets stored in an I32, regardless of length), but then you have to typecast the SGLs when they're requested.
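The typecast step, in a rough Python equivalent (LabVIEW's Type Cast reinterprets the bits rather than converting the value):

import struct

def i32_bits_to_sgl(value):
    # Reinterpret the I32's bit pattern as a single-precision float.
    return struct.unpack("<f", struct.pack("<i", value))[0]

i32_bits_to_sgl(1065353216)   # bit pattern 0x3F800000 -> 1.0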

 

Explaining what you want to do with the data will make it easier to provide suggestions specific to your needs.

Message 9 of 9