I have seen a few posts about using HDF5 for large data sets. This is probably fairly simple, but I would appreciate a little help. I would like to write a 3D array to disk and be able to access either a 2D or 1D data set in any direction through the array. A typical data set would be 255x64x100k 16-bit integers. I don't need all of the fancy features supplied by the sfpFile VIs. I am sure that with enough wading through I can get them to do what I need, but I don't see an easy way.
I need options for working with files larger than 2 GB, which is the limit for LabVIEW writes, whether binary or otherwise. I have made some progress using the information here: "Can I Edit and Create Hierarchical Data Format (HDF5) files in LabVIEW?" My mission has changed slightly: I would now prefer to write 2D arrays into the file one at a time to build up a 3D array. It is a similar idea, but it solves my problem of limited computer memory. I have downloaded a couple of sample programs that work with 2D arrays, and I am trying to build on them.
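For scale, here is a quick back-of-the-envelope check of the data set size quoted above against the 2 GB limit. It is only a sketch: it assumes 2 bytes per 16-bit sample, takes "2 GB" to mean 2^31 bytes, and ignores any file-format overhead.

```python
# Size of a 255 x 64 x 100,000 array of 16-bit integers.
dims = (255, 64, 100_000)
bytes_per_sample = 2  # 16-bit integer

total_samples = dims[0] * dims[1] * dims[2]
total_bytes = total_samples * bytes_per_sample
two_gib = 2**31  # assumption: the "2 GB" limit is 2^31 bytes

print(total_bytes)            # 3264000000
print(total_bytes > two_gib)  # True
```

So the raw samples alone come to roughly 3.26 GB, about one and a half times the limit.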
To get around the file size limit, would it be acceptable for your application to distribute the data across multiple files? You could keep track of the number of bytes you write, and when a subsequent write would put you over the limit, simply close the current file and create/open another one. You could name these files programmatically, which would make reading them back easy as well.
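The rollover idea above could look something like the following. This is a minimal sketch in Python rather than LabVIEW, and the class name, file naming scheme, and size limit are all made up for illustration.

```python
import os
import tempfile


class RollingWriter:
    """Append bytes to numbered files, starting a new file before a size limit is exceeded."""

    def __init__(self, directory, base_name, max_bytes):
        self.directory = directory
        self.base_name = base_name
        self.max_bytes = max_bytes
        self.index = 0
        self.written = 0  # bytes written to the current file
        self.paths = []
        self._file = None
        self._open_next()

    def _open_next(self):
        # Close the current file (if any) and open the next numbered one.
        if self._file is not None:
            self._file.close()
            self.index += 1
        path = os.path.join(self.directory, f"{self.base_name}_{self.index:04d}.bin")
        self._file = open(path, "wb")
        self.paths.append(path)
        self.written = 0

    def write(self, block):
        # Roll over if this block would push the current file past the limit.
        if self.written > 0 and self.written + len(block) > self.max_bytes:
            self._open_next()
        self._file.write(block)
        self.written += len(block)

    def close(self):
        self._file.close()


# Tiny demonstration: 10-byte limit, 4-byte blocks, so two blocks fit per file.
with tempfile.TemporaryDirectory() as tmp:
    writer = RollingWriter(tmp, "data", max_bytes=10)
    for _ in range(5):
        writer.write(b"\x00" * 4)
    writer.close()
    sizes = [os.path.getsize(p) for p in writer.paths]

print(sizes)  # [8, 8, 4]
```

Because the file names carry a running index, a reader can reopen them in order and treat them as one logical stream.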
Multiple files are one possibility I have considered. But this is data I will need to read multiple times for analysis, so the speed of reading various data points out of the file is critical. For example, I would like to read every 65,000th integer from a one-gigabyte file. This has been slow on my systems.
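For reference, the access pattern described here (every Nth integer from a flat binary file) can be sketched with explicit seeks, as below. This is an illustration only, not a claim about how LabVIEW reads files. The function name is hypothetical, the file is assumed to be a headerless run of 16-bit integers, and the byte order is a parameter because it depends on how the file was written.

```python
import os
import struct
import tempfile


def read_every_nth_int16(path, step, byte_order="<"):
    """Return every `step`-th 16-bit integer from a raw binary file, using seek."""
    values = []
    item_size = 2
    total_items = os.path.getsize(path) // item_size
    with open(path, "rb") as f:
        for index in range(0, total_items, step):
            f.seek(index * item_size)  # jump straight to the wanted sample
            (value,) = struct.unpack(byte_order + "h", f.read(item_size))
            values.append(value)
    return values


# Small demonstration file holding the integers 0..99 (little-endian).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(struct.pack("<100h", *range(100)))
    sample = read_every_nth_int16(path, step=25)

print(sample)  # [0, 25, 50, 75]
```

Each sample costs one seek and one tiny read, so this pattern tends to be bound by seek time rather than by raw transfer rate.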
What you are describing is a rather trivial application of HDF5, but the code you need is not in the sfpFile set of VIs (which was designed to handle 1D waveforms). However, the sfpFile code can easily be modified to do what you want. First, a digression on how HDF5 handles data, which will help you on your way. HDF5 is a self-describing file format. Every file contains all the information needed to get the data back out: such things as compression, byte order, data type, and array dimensions. When you create a data set in a file, you have to specify the array dimensions (in HDF5 parlance, this is part of the dataspace). You can choose whether or not to make the array expandable. If you make it expandable, you have to specify the chunk size the array will grow by. Note that HDF5 handles multiple dimensions (32 is the limit, I think) as easily as one.
Go to the HDF5 website for documentation on how all this works. It is very low-level, but the sfpFile code will show you working examples of 1D and 2D waveforms. For example, open H5D Create-Write 1D DBL array.vi. The first call creates a data space that can be expanded with initial size equal to the data being written. The creation parameters list for the dataset is then created and populated with the chunk size and compression parameters in the next two VI calls. Next, the dataset itself is created. Finally, the dataset is written to disk. The references are then all closed.
One important thing to note. When writing data, two dataspaces are required - the memory dataspace and the disk dataspace. They do NOT need to be the same. You can write a 2D array in memory to a section of a 3D array on disk. The full dataspace specification includes not only the size of the array, but what portion of it you are reading or writing. That portion can be as simple as a continuous section or as complex as a bunch of random points, with many variants between. See the HDF5 documentation for details.
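To make the memory-versus-disk distinction concrete, here is a toy model in plain Python. It does not use HDF5 at all: a bytearray stands in for the file, the array sizes are tiny, and the helper names are invented. The point is only that a 2D array in memory maps onto a set of positions inside a larger 3D array on disk, which is the bookkeeping HDF5 does for you through the two dataspaces.

```python
import struct

# Shape of the 3D array "on disk" (tiny, for illustration): rows x cols x planes.
ROWS, COLS, PLANES = 3, 2, 4
ITEM = 2  # bytes per 16-bit integer

# A bytearray stands in for the disk file; it starts out zero-filled.
disk = bytearray(ROWS * COLS * PLANES * ITEM)


def flat_index(i, j, k):
    """Row-major position of element (i, j, k) in the 3D array."""
    return (i * COLS + j) * PLANES + k


def write_plane(plane, k):
    """Copy a 2D ROWS x COLS array from memory into plane k of the 3D array."""
    for i in range(ROWS):
        for j in range(COLS):
            struct.pack_into("<h", disk, flat_index(i, j, k) * ITEM, plane[i][j])


def read_line_along_planes(i, j):
    """Read the 1D run of PLANES values at fixed (i, j); contiguous in this layout."""
    return list(struct.unpack_from(f"<{PLANES}h", disk, flat_index(i, j, 0) * ITEM))


# Write plane k filled with the value 10 * k + row index.
for k in range(PLANES):
    write_plane([[10 * k + i for _ in range(COLS)] for i in range(ROWS)], k)

print(read_line_along_planes(2, 1))  # [2, 12, 22, 32]
```

Here the 2D "memory" shape is 3 x 2 and the "disk" shape is 3 x 2 x 4; each write touches a different, non-contiguous set of positions in the same 3D array.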
To modify this VI for a 3D data set, change the DU64 Dims input of H5Screate_simple.vi from a one-element array to a three-element array giving your initial 3D array size. The chunk size input of H5Pset_chunk.vi will also need to be changed to a three-element array. I have found that performance is best when the chunk size is 65,000 bytes on any Windows system. Going larger or smaller will result in slower performance, sometimes dramatically so. Your disk read/write should be essentially hardware limited (somewhere between about 10 MB/s and 25 MB/s, depending on how new your PC is and how defragmented your drive is).
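As a rough illustration of turning the 65,000-byte figure into a three-element chunk size for the 255x64x100k case, here is a small calculation. Keeping whole 255 x 64 planes in each chunk is my assumption for the sketch, not something the post prescribes, and the helper name is made up.

```python
ITEM_SIZE = 2          # bytes per 16-bit integer
TARGET_BYTES = 65_000  # chunk size suggested in the post above


def chunk_bytes(chunk_dims, item_size=ITEM_SIZE):
    """Size in bytes of one chunk with the given dimensions."""
    total = item_size
    for d in chunk_dims:
        total *= d
    return total


# Hypothetical choice: keep whole 255 x 64 planes in a chunk and vary the plane count.
plane_bytes = chunk_bytes((255, 64, 1))
planes_per_chunk = max(1, round(TARGET_BYTES / plane_bytes))
chunk_dims = (255, 64, planes_per_chunk)

print(plane_bytes)              # 32640
print(chunk_dims)               # (255, 64, 2)
print(chunk_bytes(chunk_dims))  # 65280
```

Whole-plane chunks like this favor reading 2D planes. A 1D read along the third dimension would touch a great many chunks, so a more cube-like chunk shape of similar byte size may be the better trade-off if reads in every direction matter equally.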
When you write the data using a modification of H5Dwrite xxx.vi, you will need to create two dataspaces: a 2D one for your input data (wire to mem_space_id), and a 3D one describing where in your disk data you are storing it (wire to file_space_id). See the HDF5 documentation and the sfpFile examples for details. sfpFile should contain all the HDF5 primitives you need. The VI filenames are usually exact copies of the HDF5 subroutine names, with changes where different data types are involved (e.g. HDF5 has only one H5Dwrite, while LabVIEW has one for every supported data type, and more are trivial to make). Note that sfpFile was written before the higher-level API for HDF5 existed, so it contains similar higher-level functionality, but its subVIs do not correspond to the HDF5 higher-level API subroutine calls.
Two more practical notes. Be very careful with HDF5 references. You must close them when you are finished with them or the HDF5 file will remain open, even after you exit LabVIEW. You can sledgehammer the problem away by calling H5close.vi, but this totally shuts down the HDF5 runtime engine, stopping any other users at the same time. Second, HDF5 is not multi-thread safe. Make sure you don't try to use it simultaneously from two locations in your code (very easy to do with LabVIEW). You will get errors and could corrupt your data.
Don't expect to learn HDF5 quickly. It is a very complex and low-level API. However, it can do just about anything you would want in a binary file API, so it is definitely worth the effort. Once you figure it out, you will wonder what you ever did without it. Good luck. Let me know if you have problems. The HDF5 helpdesk is also fairly responsive (within 24hrs) and highly informative if you hit a sticky spot.
This account is no longer active. Contact ShadesOfGray for current posts and information.
I have seen some of your posts on HDF5 and have started down that path. I started with the sfpFile set, but it adds a bunch of functionality I don't need, such as writing a lot of supporting information to the file along with the waveform. I will use it as the basis, as you suggest. I am currently struggling through what all the terms mean, e.g. file space versus memory space, groups versus datasets, slabbing, etc. Once I figure this out, it should be exactly what I am looking for.
For others who are traveling down this path: I have found the free HDFView program helpful as a way to look at the files I am creating and to make sure they are being formed correctly. It also helps show the relationship between groups and datasets, etc. The program can be found by following a few links on the official HDF5 website referenced earlier.
Now one more question before I go full tilt ahead. My understanding is that once I have created this 3D file by building it up from a bunch of 2D data sets, I should be able to easily and quickly grab any chunk of that data, i.e. a 2D array slice out of it, or a 1D array in any direction through the array. Is this correct?
Thanks for all the help. I will post future questions here and a vi if I am successful in creating one.
You are correct. Once you have your 3D data set on disk, you can easily access 1 point, or 1D, 2D, or 3D sections of it. The sfpFile function which sets this is H5Sselect_hyperslab.vi. It is part of the dataspace API. Hyperslab is the HDF5 term for a generic subset of a dataspace. Said subspace can be contiguous, regularly spaced, or randomly specified, all in multiple dimensions.
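For anyone trying to picture what a regular hyperslab selects, here is a toy model in plain Python. It does not call HDF5; it only enumerates the coordinates picked out by the four per-dimension parameters that the HDF5 call takes (start, stride, count, block). The function name and the small 4 x 3 x 6 array are invented for the example.

```python
from itertools import product


def hyperslab_coords(start, stride, count, block):
    """List the coordinates selected by a regular hyperslab.

    Per dimension: `count` blocks of `block` consecutive elements, with block
    origins `stride` apart, beginning at `start`.
    """
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        per_dim.append([s + n * st + e for n in range(c) for e in range(b)])
    return list(product(*per_dim))


# In a small 4 x 3 x 6 array: a 1D line along the last axis at (row 1, col 2).
line = hyperslab_coords(start=(1, 2, 0), stride=(1, 1, 1), count=(1, 1, 6), block=(1, 1, 1))
# Every second element along the last axis, at the same (row, col).
strided = hyperslab_coords(start=(1, 2, 0), stride=(1, 1, 2), count=(1, 1, 3), block=(1, 1, 1))
# A full 2D plane at last-axis index 5.
plane = hyperslab_coords(start=(0, 0, 5), stride=(1, 1, 1), count=(4, 3, 1), block=(1, 1, 1))

print(len(line), len(strided), len(plane))  # 6 3 12
print(strided)  # [(1, 2, 0), (1, 2, 2), (1, 2, 4)]
```

The same four arrays, with one element per dimension, are what you would wire into the hyperslab selection for the file dataspace; the memory dataspace then only needs to have the same number of selected elements.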
You are also correct about HDFView. My life would have been MUCH harder without it. Are you aware that you can create and edit files with it? This makes creating test files very easy.
I certainly feel your pain about all the nomenclature. It took me a week to feel like I had some idea of what was going on. After a month, I was pretty much there, but I still missed a couple of key points (e.g. HDF5 is a directed graph, not a tree, so the concept of parent to an object is fuzzy, at best, but you can have circular references as a result - very useful).
Folks - I'm thrilled to see more HDF5 / LabVIEW activity. We have been
using the HDF5 libraries quite actively for the last year in exactly
the manner that DF Gray suggests, i.e., reading the HDF5 API and then
trying to translate that into the underlying sfpFile functions. At
times I have had to call functions in the DLL directly since there was
no wrapper provided in the sfpFile functions. This is a little hairy
since you don't always know exactly how the types will match up.
Anyway, one quick comment on large files: We stream terabytes to disk
and ended up using a feature of HDF5 called "file families" -- it's a
"driver" that HDF5 provides that splits your file into consecutively
named pieces of size 2^n bytes, where you can specify n (we use n=31).
Search for 'family' in the HDF5 docs. Note that file families are not
supported by the HDF5 viewer program that you mention.
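To give a feel for the numbers, here is a small arithmetic sketch of how
a family with n=31 would divide up the 255x64x100k data set discussed
earlier in the thread. It is plain Python, not an HDF5 call. The helper
names and the file name pattern are hypothetical, and the byte count
covers raw samples only, with no allowance for HDF5's own overhead.

```python
MEMBER_BYTES = 2**31  # n = 31, as in the post above


def locate(address, member_bytes=MEMBER_BYTES):
    """Map a byte address in the logical file to (member index, offset in member)."""
    return divmod(address, member_bytes)


def member_name(pattern, index):
    """Build a member file name from a printf-style pattern (hypothetical name)."""
    return pattern % index


total_bytes = 255 * 64 * 100_000 * 2              # raw data only, no file overhead
members_needed = -(-total_bytes // MEMBER_BYTES)  # ceiling division

print(members_needed)                 # 2
print(locate(3_000_000_000))          # (1, 852516352)
print(member_name("run_%05d.h5", 1))  # run_00001.h5
```

With n=31 every member stays at or below 2^31 bytes, so a data set of
this size would land in two pieces.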
And a question to the HDF5/LV community: have any of you managed to do
the hacking required to upgrade to HDF5 1.6.4? Unfortunately we all
seem to be stuck with 1.4.4. The NCSA people provided a DLL for 1.6.4,
but unfortunately now some of the constants, such as H5T_NATIVE_INT,
are not constant anymore -- they are generated at runtime. If it
weren't for that, I think most of the sfpFile wrappers would still
work. Anyway, just curious if anyone has tried this.
Please feel free to email me directly about this stuff. I would love to
create a smaller HDF5/LabVIEW subgroup for sharing knowledge.