Accessing parts of large files (~2GB)

Azazel · 02-03-2007

Hello all,

I am using LV 8.2, winXP

I am streaming data to disk continuously and storing it in files of ~2GB (CPU limited). I then need to process the data I have stored. The information is a series of data sets that, when processed, will make images for a medical device. Since the data is saved continuously, I would like to separate the file so I can process frames individually. The file header stores information such as # of frames, frame width, frame height, etc. I can then use this information to pick out individual frames based on their position in the large file. I have two questions.

1) When I read the header with my current method, I do not think I am closing something correctly, because the VI is huge (~57 MB for my sample raw data). I think I am saving all the data I have loaded, when I only need a couple of bytes of the header to determine the characteristics of the file (i.e., # of frames, width). So what is the most efficient way to read specific bytes in the file? I have attached .png images of the VIs I wrote to accomplish this. The Read Position VI reads each of my requirements individually (is there a better way?) and passes these values to my top-level VI (Read Header VI), which displays the positions.

2) Does anyone have a sample VI or words of wisdom for processing small sets of data from a huge file without crashing the computer (i.e., without keeping the entire 2GB file in memory while processing the data)?

Thanks for your help,

Azazal

Azazel

Pentium 4, 3.6GHz, 2 GB Ram, Labview 8.5, Windows XP, PXI-5122, PCI-6259, PCI-6115

altenbach · 02-03-2007

I am not familiar with the tool you use to read the header. Who wrote it? How many bytes does it actually read? In any case, there does not seem to be a logical reason to read four positions in parallel, since file operations are serial by nature anyway. Can't you just string 'em up along the same error cluster? If the first one fails, it is not reasonable to continue with the remaining operations, banging your head against the same wall four times in a row. 😉 You might also want to flatten this out with low-level file I/O. I am guessing that each of the subVI calls opens and closes the file.

The way you are reading the position also seems a bit odd. First you (possibly?) set the offset correctly, but then you read a big chunk whose size depends only on the size of the entire file (4/14, or ~30% of the entire file), possibly running beyond the EOF depending on the offset. I don't quite see the relation between reading I32 and dividing by a data size of 14. What is the structure of the file?

In general, you should open the file once, then do all the reads, and only close when done with all operations.
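LabVIEW diagrams don't translate to plain text, but as a rough text-language analogue, here is a Python sketch of the open-once, read-serially pattern. Assumptions: the header values are I32 stored big-endian (LabVIEW's default byte order for binary file I/O), and the offsets you pass in are placeholders, not the real header layout. A short read raises an exception, so the remaining reads are skipped, much like chaining the reads on one error cluster.

```python
import io
import struct

# Assumption: big-endian 4-byte signed integers (LabVIEW's default byte order).
I32 = struct.Struct(">i")


def read_fields(f, offsets):
    """Read one I32 at each byte offset, in order, through a single open handle.

    If any read fails (here: a short read past EOF), an exception is raised
    and the remaining reads are never attempted.
    """
    values = []
    for off in offsets:
        f.seek(off)
        raw = f.read(I32.size)
        if len(raw) != I32.size:
            raise EOFError(f"only {len(raw)} bytes available at offset {off}")
        values.append(I32.unpack(raw)[0])
    return values


def read_header_fields(path, offsets):
    # Open once, do all the reads, close once (the with-block closes the file).
    with open(path, "rb") as f:
        return read_fields(f, offsets)


if __name__ == "__main__":
    demo = io.BytesIO(struct.pack(">3i", 1, 192, 512))
    print(read_fields(demo, [0, 4, 8]))  # [1, 192, 512]
```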

How big is one dataset? How big is the header? I guess you can calculate the size of each dataset from the header information. I assume all records are the same size.
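If all records are the same size, the position of any frame follows directly from the header values. A minimal Python sketch, assuming frames are stored back to back right after a fixed-size header; the header size and bytes per sample are parameters because they are assumptions here (4 bytes per sample would correspond to I32 data):

```python
def frame_bytes(width, height, bytes_per_sample=4):
    """Size of one frame in bytes; assumes every frame has the same shape."""
    return width * height * bytes_per_sample


def frame_offset(k, header_bytes, width, height, bytes_per_sample=4):
    """Byte offset of frame k (0-based), assuming frames follow the header back to back."""
    return header_bytes + k * frame_bytes(width, height, bytes_per_sample)


def frames_in_file(file_size, header_bytes, width, height, bytes_per_sample=4):
    """Return (complete_frames, leftover_bytes) after the header.

    A nonzero leftover hints that the header values or the assumed
    sample size do not match the file.
    """
    if file_size < header_bytes:
        raise ValueError("file is smaller than its header")
    return divmod(file_size - header_bytes, frame_bytes(width, height, bytes_per_sample))
```

For example, with a hypothetical 512-byte header and 192 × 512 frames of I32, frame 2 would start at byte 512 + 2 × 393,216 = 786,944.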

Why don't you attach a simplified version of your code along with a small data file that contains a handful of frames so we can test? Thanks! :)

Message Edited by altenbach on 02-03-2007 08:40 AM

LabVIEW Champion.

tst · 02-03-2007

I haven't looked deeply into your post (I assume Altenbach can help you better there), but as for some possible resources which will help you:

NI has a tutorial called Managing Large Data Sets in LabVIEW. You can find it on this site.
I remember seeing something in the LAVA forums about a problem when working with very big files (I think LV had a problem reading past a certain boundary). I suggest you search for that thread.
OpenG has the Large File I/O package, which uses Windows API functions to let you process files larger than the 2GB limit. You might want to check it out. Note that these are only equivalents of the file primitives and do not add any additional management functionality.

___________________
Try to take over the world!

altenbach · 02-03-2007

Thanks tst. Since the OP mentioned LabVIEW 8.2, file size is probably not an issue (and since the files are under 2GB, there would not be a problem with pre-8.0 versions either).

Starting with LabVIEW 8.0, the plain LabVIEW file I/O functions can use files up to 9.2 exabytes (reference).
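For a sense of scale, here is the arithmetic in Python, assuming the old 2GB limit comes from signed 32-bit byte offsets and the 8.0+ limit from signed 64-bit byte offsets:

```python
# Largest byte offset representable with a signed 32-bit integer: just under 2 GiB.
MAX_I32_OFFSET = 2**31 - 1
# Largest byte offset representable with a signed 64-bit integer.
MAX_I64_OFFSET = 2**63 - 1

print(MAX_I32_OFFSET)         # 2147483647 bytes
print(MAX_I64_OFFSET / 1e18)  # about 9.22, i.e. roughly 9.2 exabytes (decimal)
```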

(Paraphrasing Bill Gates, "9.2 exabytes ought to be enough for anyone". :))

Here is the link for Managing Large Data Sets in LabVIEW, a useful resource in general. 🙂

LabVIEW Champion.

Azazel · 02-08-2007

Altenbach,

I have attached a small data set, my VI (based on the Read Binary File example that ships with LabVIEW), and a JPG of the header info I need. One thing to note: since LabVIEW arrays start at position 0, the header data I need is at the Start Byte - 1 position. The correct values for the raw data (attached file) are Number of files = 1, width = 192, and depth = 512. The file consists of long integers. The size of the files will vary, but the header is always the same, ending at the (Start Byte - 1) position of 512. The raw data starts at byte position 513. Thanks in advance.
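To keep the 1-based "Start Byte" numbering and 0-based seek offsets straight, a small Python sketch. It reads the post as describing a 512-byte header (1-based bytes 1..512), so the data begins at 1-based byte 513, which is 0-based offset 512; the I32 sample size and the width × depth frame shape are assumptions:

```python
# Assumption: header occupies 1-based bytes 1..512, i.e. 0-based offsets 0..511.
HEADER_BYTES = 512
# First data byte: 1-based byte 513 is 0-based offset 512.
DATA_START = HEADER_BYTES


def start_byte_to_offset(start_byte):
    """Convert a 1-based 'Start Byte' from the header description to a 0-based seek offset."""
    if start_byte < 1:
        raise ValueError("start bytes are 1-based")
    return start_byte - 1


# Values the post gives for the attached sample file.
NUM_FILES, WIDTH, DEPTH = 1, 192, 512
# Assumption: each sample is a 4-byte long integer (I32) and a frame is WIDTH x DEPTH samples.
FRAME_BYTES = WIDTH * DEPTH * 4
```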

Azazal

PS: the raw data is not really in .txt format; I had to change the file extension because the forum would not let me attach the original .frg extension.

Message Edited by Azazel on 02-08-2007 10:16 AM

Azazel

Pentium 4, 3.6GHz, 2 GB Ram, Labview 8.5, Windows XP, PXI-5122, PCI-6259, PCI-6115

altenbach · 02-08-2007

OK, here are some very preliminary comments:

Look at your "Read Header": for each call, you open the file, set the offset, then read the entire rest of the file into an array (you actually try to read past the end of the file by the amount of the offset), close the file, and return that array to the caller.

Repeat five times!

In the following code, each of the arrays contains nearly the entire data file, starting with the desired element. You now have five copies of the entire huge file (!!) in memory when all you want is 20 bytes. 🙂

Quickest fix for that: after setting the position, read only one I32 element by leaving the count unwired and replacing the output array with a number (or simply wire a "1" to count and return an array with one element).

Still, there seems little reason to do all these individual calls. Just set the offset to "16" and count to "5" and you get an array with five elements (20 bytes), corresponding to the five desired values (# of images, depth, ..., # of volumes), all in one call... and nothing else.
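The same single-read idea as a Python sketch: one seek to offset 16 and one 20-byte read, decoded as five I32 values. Big-endian byte order is an assumption (LabVIEW's default); the returned tuple is left unnamed because only some of the field meanings are given here.

```python
import io
import struct

# Five consecutive 4-byte signed integers = 20 bytes (big-endian assumed).
HEADER_FIELDS = struct.Struct(">5i")


def read_header_block(f, offset=16):
    """Read the five consecutive I32 header values in a single 20-byte read."""
    f.seek(offset)
    raw = f.read(HEADER_FIELDS.size)
    if len(raw) != HEADER_FIELDS.size:
        raise EOFError("header shorter than expected")
    return HEADER_FIELDS.unpack(raw)


if __name__ == "__main__":
    # Synthetic 512-byte header with placeholder values at offset 16.
    fake = io.BytesIO(bytes(16) + struct.pack(">5i", 10, 20, 30, 40, 50) + bytes(476))
    print(read_header_block(fake))  # (10, 20, 30, 40, 50)
```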

Similarly, once you know the file position and dimension of a frame, read the exact amount of data.
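A Python sketch of reading exactly one frame, plus a generator that walks the frames one at a time so only the current frame is held in memory (the concern in question 2). Assumptions: frames of I32 samples follow a 512-byte header back to back, stored big-endian; all of these are parameters.

```python
import io
import struct


def read_frame(f, k, width, height, header_bytes=512, byteorder=">"):
    """Read frame k (0-based) as a flat tuple of width*height I32 samples.

    Seeks straight to the frame and reads exactly its size, so memory use
    is one frame, not the whole file.
    """
    n = width * height
    nbytes = 4 * n
    f.seek(header_bytes + k * nbytes)
    raw = f.read(nbytes)
    if len(raw) != nbytes:
        raise EOFError(f"frame {k} is incomplete")
    return struct.unpack(f"{byteorder}{n}i", raw)


def iter_frames(f, n_frames, width, height, header_bytes=512, byteorder=">"):
    """Yield frames one at a time; only the current frame is held in memory."""
    for k in range(n_frames):
        yield read_frame(f, k, width, height, header_bytes, byteorder)


if __name__ == "__main__":
    # Tiny synthetic file: 8-byte header, then two 3x2 frames of big-endian I32.
    data = bytes(8) + struct.pack(">12i", *range(12))
    with io.BytesIO(data) as f:
        for frame in iter_frames(f, 2, 3, 2, header_bytes=8):
            print(frame)
```

With a real file, you would open it once with `open(path, "rb")` and pass that handle in place of the `BytesIO` object.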

Message Edited by altenbach on 02-08-2007 09:19 AM

LabVIEW Champion.
