

data storage format

Hi all,

I am sure what I am about to ask is a fairly common question, but I can't seem to find a sensible answer on the forum, so here goes: (apologies for the long post, I am trying to be thorough)

What is the best format to store data in?

Of course there are dozens of answers, and in reality it depends on the application. So without further ado, here is some additional information:

The data is logged from a DAQ device at rates up to 10 kHz, across several channels, for up to a few hours. So the data files can potentially grow to several hundred megabytes (typically, if the acquisition period is more than a few minutes, the sampling rate is limited to 1 kHz).

At the moment, I am collecting data and writing it, on the fly, as text to a file. As mentioned, the files can get up to a few hundred megs big. This file is then visualised in MATLAB, and can take several minutes on my rusty Pentium IV to display, but MATLAB manages the necessary file reading/memory allocation/graphing without too much trouble.
I wish to remove the MATLAB requirement, so I decided to create a viewer application in LabVIEW.

I know that text format is quite inefficient, so I moved over to a binary format in the flavour of TDM (not TDMS, I am using LV 7.11), and this is where the trouble starts. Firstly, writing to a TDM file on the fly caused my PC's memory usage to climb without bound! This was solved using the TDM header VIs which someone on the forum suggested. The next problem comes when trying to view the data. I have a separate viewer application, which is just a simple read-from-TDM-file-and-plot-to-graph. Wow, the performance is horrible! As far as I can tell, TDM does not allow only portions of the data to be read, so the whole application basically hangs while Windows tries to allocate several gigabytes of memory to process my 200 MB data file! So that's a dead end there...

I have read about hierarchical waveform storage (HWS), and also know that streaming TDM (TDMS) exists in newer versions of LV.

Can anyone provide any insight into which of these formats would be more suitable for my application? I quite like the TDM format; it seems to provide most of what I require except the performance, so I am hoping TDMS will solve most of these problems. I don't know that much about HWS. Another reason I want to use TDM(S) is that it integrates well with DIAdem.


Any advice??

Thanks to anyone that took the time to read this far, and more thanks to anyone who replies! :)

Message 1 of 8
First let me say that if you really do have a "...rusty Pentium IV..." the first thing you might want to do is invest in a good dehumidifier! :D
 
But to your main point: the issue that you mention in relation to TDM files is a known problem. One alternative would be to save the data in a plain binary form rather than ASCII. The conversion to and from this format isn't hard, and as of V7.1 I believe there were still example VIs that illustrated streaming DAQ data to a binary file.
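Just to illustrate the idea outside of G, here is a rough sketch (in Python, purely for illustration; the channel count, file name and float64 data type are placeholders) of what streaming raw samples to a flat binary file amounts to:

import numpy as np

CHANNELS = 4                              # placeholder channel count
data_file = open("run001.bin", "ab")      # append as the acquisition runs

def write_block(block):
    # block: 2-D array of shape (samples, CHANNELS) straight from the DAQ read
    np.asarray(block, dtype=np.float64).tofile(data_file)

The shipping examples do the same thing with LabVIEW's binary file write functions inside the acquisition loop.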
 
Another thing to consider, though, is whether or not you really need all that data. For example, if you know that in your application changes of less than 1% are not significant, you can process the data before saving it: add timestamps to the datapoints and discard any value that differs from the previously saved point by less than 1%. To see how this would work, let's say that the input voltage at the start of the test is 100 V (to keep the math simple). As long as the voltage does not go above 101 V or below 99 V, you would not save another datapoint. When the signal does exceed those limits, a new timestamped datapoint is saved, and a new set of limits is calculated from it.
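In text form (Python here only to show the logic; the 1% deadband and the 100 V example follow from the paragraph above, and the timestamp source is just the system clock) the processing step would look roughly like this:

import time

DEADBAND = 0.01            # 1% of the last saved value, per the example above
last_saved = None          # value of the most recently saved datapoint
log = []                   # each saved entry is a (timestamp, value) pair

def process_sample(value):
    # Save the sample only if it leaves the +/-1% band around the last saved point
    global last_saved
    if last_saved is None or abs(value - last_saved) > DEADBAND * abs(last_saved):
        log.append((time.time(), value))
        last_saved = value          # new reference point -> new limits

With the 100 V example, nothing further is stored until a reading goes above 101 V or below 99 V, at which point that reading becomes the new reference.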
 
Depending upon your application, this approach can be very valuable because it allows you to sample very fast without having to save tons of essentially repetitive data.
 
Mike...

Certified Professional Instructor
Certified LabVIEW Architect
LabVIEW Champion

"... after all, He's not a tame lion..."

Message 2 of 8
Thanks Mike,

Thanks for the tip, I have put in a request to management for a dehumidifier...

I can quite successfully write to binary files; it's just that I am quite fond of the whole TDM approach, where the DAQ info (sampling rate etc.) is stored along with the data.

Your idea about only storing points that differ by a certain delta is really interesting, and is something worth considering, but for now I just want to write everything to disk and process it afterwards. Thanks for keeping the maths simple, I knew my master's in engineering would come in handy some day :D

Do you have any experience using TDMS or HWS?
Message 3 of 8
To retain the simplicity and speed of the binary file while keeping the documentation of TDM, you could write two files, perhaps with the same name and different extensions. One would be the binary data file. The other would be a text file documenting the setup. Not quite as convenient, but perhaps a useful workaround until the dehumidifier comes through.
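As a rough sketch of what I mean (Python only to show the layout; every field name and value in the header below is invented for illustration):

import numpy as np

base = "run001"                        # same name, two extensions

# Text companion file: the setup information TDM would normally carry
with open(base + ".hdr", "w") as hdr:
    hdr.write("sample_rate_hz=10000\n")
    hdr.write("channels=ai0,ai1,ai2,ai3\n")
    hdr.write("dtype=float64\n")

# Binary data file: raw samples appended block by block as they are acquired
with open(base + ".bin", "ab") as dat:
    block = np.zeros((1000, 4))        # stand-in for one DAQ read
    block.astype(np.float64).tofile(dat)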

Lynn
Message 4 of 8
I am not holding my breath for the dehumidifier!

My problem is not really in writing the data; I have done some tests and can stream several channels to disk at 100 kHz with no trouble. The headache arises when trying to process the data later on. Reading large files using TDM causes my machine to bog down: I assume it is trying to allocate a buffer for all the data, but it seems to allocate a disproportionately large one!

I suppose if the data were just in plain binary format then I could read it in chunks and decimate it before attempting to graph it. I had just hoped, since this is surely a fairly common problem, that there is some sensible data format (and associated VIs) that does all the heavy lifting for me.
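Something like this is what I have in mind (sketched in Python rather than LabVIEW just to pin the idea down; the chunk size, channel count and decimation factor are arbitrary):

import numpy as np

CHANNELS   = 4
CHUNK_ROWS = 100000         # rows per read, keeps memory bounded
DECIMATE   = 100            # keep every 100th row for the overview plot

def decimated_overview(path):
    keep = []
    with open(path, "rb") as f:
        while True:
            chunk = np.fromfile(f, dtype=np.float64, count=CHUNK_ROWS * CHANNELS)
            if chunk.size == 0:
                break
            chunk = chunk[:chunk.size - chunk.size % CHANNELS]   # guard against a truncated last row
            keep.append(chunk.reshape(-1, CHANNELS)[::DECIMATE])
    return np.concatenate(keep)      # small enough to hand straight to a graph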

I have had a look at the Giga_LabVIEW library that NI provides, and it offers some routines for processing large data sets before plotting them, but there still does not appear to be any way to interface it to the TDM format.

Does TDMS solve this problem? i.e. are the native storage VIs smart enough not to try to read all the data in a file at once?
Message 5 of 8
HWS was designed to handle the problem you are facing, and it is available in LV 7.1. You can find it on the driver CD with the computer-based instruments (which it was designed to support). Its performance is essentially system limited (I have seen 15 MBytes/sec streaming speeds on a 650 MHz Pentium III). HWS gives you the ability to read subsections of a file. The new TDMS format gives many of the same benefits, but is only available in newer versions of LabVIEW (8.2 and higher, if memory serves).

HWS is based on HDF5, so it should be readable by most analysis packages; I believe MATLAB has a reader. Please ask if you do this and need help, since the file layout is somewhat involved (it started as SCPI-DIF implemented in HDF5). HWS allows you to create files as large as your disk. It offers lossless compression. HDF5 writers can add other information to the file without corrupting the HWS portions.
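Because it is HDF5 underneath, any HDF5 reader can pull out just a slice of a channel without touching the rest of the file. A hedged sketch using Python's h5py (the dataset path below is a placeholder, not the real HWS layout, which as I said is somewhat involved; browse the file with f.visit(print) to find where the samples actually live):

import h5py

with h5py.File("run001.hws", "r") as f:
    ds = f["wfm_group0/traces/trace0/y"]      # placeholder path, inspect your own file
    window = ds[1000000:1100000]              # only this slice is read from disk

The HWS VIs expose the same kind of windowed read from within LabVIEW.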

Use HWS.  If you have problems, let us know.
Message 6 of 8
Thanks for this,

I will try HWS, although after giving the problem some more consideration, I don't think I am really going to make much headway. My goal of producing a better data viewer than MATLAB was quite naive, I suppose!

I have created a simple data viewer that can stream data at will from a binary file, but my desire to produce a single plot of 200 MB of data, in as little time as possible, is probably better left to MATLAB (which I suppose has tens of thousands of man-hours of programming behind it).

Thanks
nrp

Message 7 of 8
Don't lose hope immediately. You can modify the GigaLabVIEW memory store and browse VI to do what you want fairly easily by replacing the in-memory data access with on-disk access. The functions should map almost one-to-one, so it should not take more than a couple of hours, mostly spent learning the HWS interface. You probably want to remove the overwrite capabilities as well.
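The replacement boils down to "given a start index and a length, return only that slice from disk". As a stand-in for the access pattern (Python/NumPy, flat binary file assumed; this is not the actual GigaLabVIEW or HWS interface):

import numpy as np

CHANNELS = 4

def read_window(path, start_row, n_rows):
    # Memory-map the file so only the requested rows are actually paged in
    data = np.memmap(path, dtype=np.float64, mode="r").reshape(-1, CHANNELS)
    return np.array(data[start_row:start_row + n_rows])    # copy just the window into RAM

With HWS you get the same effect from its read VIs, without having to manage the byte offsets yourself.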

Remember also that the LabVIEW graph has a lot of zoom/pan capabilities, but some of these will be somewhat useless if you decimate your data.

But... you are also correct that it is difficult to do better than someone who has had a lot of time to tweak their code. LabVIEW makes things easier, but it is not (quite) a miracle ;)
Message 8 of 8