
Unexpected behaviour of Preallocated Read From Binary File

So I've been playing with some code that calculates a SHA256 hash on a file, and for this purpose I read the file in chunks to balance disc accesses against the CPU work of calculating the hash. To try and get the best performance, I use the Preallocated Read from Binary File node, passing it a U8 array that I pre-initialise to my chunk size.

What I've discovered is that if my chunk size does not divide the file size evenly, then I get rather unexpected behaviour on the final read.

The node correctly returns the actual number of bytes left to read in the file, and correctly throws an Error 4 (which I can safely clear since I know this is my last read), but what seems wrong is that it clears the output buffer (i.e. returns a zero-length array).

What I would expect is that the first n elements of the buffer are overwritten with the final n bytes of the file (since my buffer is a U8 array) and the remaining elements are left unchanged. I would then use the node's output that reports the number of elements read to process the correct part of the output buffer.
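
For reference, the pattern I'm aiming for looks something like this (sketched in Python rather than LabVIEW, purely to illustrate; the file name and chunk size are placeholders):

```python
import hashlib

CHUNK_SIZE = 10 * 1024 * 1024          # placeholder chunk size (10 MB)
buf = bytearray(CHUNK_SIZE)            # preallocated buffer, reused for every read

sha = hashlib.sha256()
with open("some_large_file.dxf", "rb") as f:   # placeholder file name
    while True:
        n = f.readinto(buf)            # fills the first n bytes, leaves the rest untouched
        if n == 0:
            break
        sha.update(memoryview(buf)[:n])  # hash only the bytes actually read

print(sha.hexdigest())
```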

The actual behaviour can be worked around, but it seems inconsistent with what the non-preallocated Read from Binary File node does and, unless I've missed the warning in big flashing letters in the help for the node, it seems under-documented.

The attached (2019 SP1) VI is a minimal example - I see similar behaviour in 2018 SP1. I haven't been able to find the equivalent node in NXG.

--
Gavin Burnell
Condensed Matter Physics Group, University of Leeds, UK
http://www.stoner.leeds.ac.uk/
Message 1 of 6

What do you mean it returns a zero length array?

I built a little example to see how this function works (I had never used it). I'm reading a binary file which contains 25 bytes with values from 0 to 24 in order, in 10-byte chunks. Indeed, the last read returns the 5 expected bytes and the other 5 bytes have a value of 0, even though the data array in does not contain any zeroes. The documentation says that those bytes are invalid data, so you should not care about their values. You use the num elements read output and split/reshape the output array to the correct size. The buffer allocation tool shows no copies made of the data in array, so for me the function works as expected.
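
In pseudocode, the split/reshape step looks something like this (a Python sketch of the same 25-byte file read in 10-byte chunks, just to illustrate the idea; not LabVIEW):

```python
import io

data = bytes(range(25))        # 25 bytes, values 0 to 24
f = io.BytesIO(data)

buf = bytearray(10)            # 10-byte preallocated buffer
while True:
    n = f.readinto(buf)        # number of bytes actually read this pass
    if n == 0:
        break
    print(n, list(buf[:n]))    # last pass prints: 5 [20, 21, 22, 23, 24]
```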

Preallocated Read from binary file.png

Lucian
CLA
0 Kudos
Message 2 of 6
(2,092 Views)

Ok, so I took your example, gave it a 52 MB file (an ASCII DXF file as it happens, but I don't think that's important) and asked it to read the file in 10485760-byte blocks (10 MB). I asked it to run the for loop 6 times to make sure I read the whole of the file. I've also reversed the lengths, numbers of elements and errors arrays just to make it easier to see the last few entries.

This is what happens...

Front_panel.png snippet.png

--
Gavin Burnell
Condensed Matter Physics Group, University of Leeds, UK
http://www.stoner.leeds.ac.uk/
Message 3 of 6

Ok, a little more playing reveals that the 'magic' buffer size where this happens is 1 MB: at 1048576 elements it works as you would expect, at 1048577 elements the last array gets truncated to zero elements.

If you change the array element to U16 representation, then that magic length falls by a factor of 2 - so it looks like there is a 1 MB limit somewhere in that node that causes a difference in operation.

Definitely not documented!
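
If the factor-of-2 scaling with element size holds, the limit is presumably a fixed byte count rather than an element count. A quick back-of-envelope check of that (this is only my assumption of a fixed 1048576-byte threshold):

```python
LIMIT_BYTES = 1048576                      # the apparent 1 MB threshold

def threshold_in_elements(bytes_per_element):
    # The same byte limit expressed in array elements:
    # 1048576 for a U8 buffer, 524288 for a U16 buffer, and so on.
    return LIMIT_BYTES // bytes_per_element

print(threshold_in_elements(1))   # 1048576 (U8)
print(threshold_in_elements(2))   # 524288  (U16)
```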

--
Gavin Burnell
Condensed Matter Physics Group, University of Leeds, UK
http://www.stoner.leeds.ac.uk/
Message 4 of 6

Yes, I see the same behavior if the chunk size is greater than 1048576 bytes (LabVIEW 2018 SP1f4).

Lucian
CLA
0 Kudos
Message 5 of 6
(2,080 Views)

I was doing some performance benchmarks on file IO and found this issue too.  In my case I had a buffer size of 0x400000 bytes.  Each read was fine except the last, which returned an empty array.  There are workarounds, like reading the file size at the start and then, on the last read, either reading the rest of the file with a normal Read from Binary File or creating a second preallocated array of that size and using it.  In all the test cases I ran I could not get the preallocated read to perform better than the normal Read from Binary File when using the same chunk sizes.  I also tested with my chunk size reduced to 0x100000 and that didn't perform as well either.  So I'm not sure how this thing is supposed to be used, but it always does worse.
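
For what it's worth, the workaround pattern looks roughly like this (a Python sketch only, to show the idea; the path, the chunk size, and process are placeholders):

```python
import os

CHUNK = 0x400000                          # 4 MiB buffer, as in my benchmark
path = "test_file.bin"                    # placeholder path

def process(chunk):                       # placeholder for whatever consumes the data
    pass

size = os.path.getsize(path)              # grab the file size up front
full_chunks, remainder = divmod(size, CHUNK)

buf = bytearray(CHUNK)
with open(path, "rb") as f:
    for _ in range(full_chunks):
        f.readinto(buf)                   # full-sized reads into the preallocated buffer
        process(buf)
    if remainder:
        tail = f.read(remainder)          # final read sized to the leftover bytes
        process(tail)
```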

 

I also tested it in LabVIEW 2020 Community and this issue still exists.  This is a bug, can we get a CAR?

Message 6 of 6