LabVIEW

TDMS write buffer performance

I am writing log data to a TDMS file.  We log one sample for each of 16 channels at the beginning and end of each step in a test profile and every minute in between.

The profile has 640 steps, most shorter than one minute.  This results in approximately 400,000 writes to the TDMS file with only one data point per channel.  As a result, the file is very fragmented and takes a long time to open or defragment.

 

In this post Brad Turpin mentions a fix that works well but greatly diminishes the TDMS write performance.

http://forums.ni.com/ni/board/message?board.id=170&message.id=403179&query.id=7209265#M403179

I also found that it takes about 40 seconds to set the NI_MinimumBufferSize attribute on 10,240 channels (640 groups * 16 channels).

 

I did test this and it works very well, but it took hours to generate a file of dummy data using this method.  Generating the dummy data file with the same number of writes, but without the buffer size attribute, took seconds.

 

In this post Brad also mentioned that LV 2009 contains TDMS VIs with an updated option to create a single binary header for the entire file.

 

I have not been able to find any more references to this nor have I found the attribute to set this functionality.

 

Does anybody know how to set this attribute or have any suggestions on how to better deal with my file structure?

 

Thanks,

Dave 

Message 1 of 6

Are you writing one value per channel for all 16 channels with a single call to TDMS Write, or are you calling TDMS Write 16 times for that? Consolidating this into one call to TDMS Write should improve performance and reduce fragmentation quite a bit. In addition to that, you could set NI_MinimumBufferSize to the maximum number of values a channel in a step can have. After each step, you could call TDMS Flush so that these buffers are flushed to disk. That should further reduce fragmentation.
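The buffering idea above can be sketched in a few lines. This is only a toy model of the behavior, not NI's actual TDMS implementation; the class and its names are made up for illustration. With a minimum buffer size set, single-point writes accumulate in memory, and one flush per step produces one on-disk segment instead of one per write:

```python
# Toy model of NI_MinimumBufferSize + TDMS Flush (not real TDMS internals):
# buffered writes create no segment; each flush emits exactly one segment.

class BufferedChannel:
    def __init__(self):
        self._buffer = []
        self.segments = []  # each entry stands for one on-disk segment

    def write(self, value):
        # buffered write: no segment (and no binary header) is created yet
        self._buffer.append(value)

    def flush(self):
        # one flush -> one segment holding the whole step's data
        if self._buffer:
            self.segments.append(list(self._buffer))
            self._buffer = []

channel = BufferedChannel()
for step in range(640):       # one iteration per test step
    channel.write(step)       # sample at step start
    channel.write(step)       # sample at step end
    channel.flush()           # flush once, after the step completes

print(len(channel.segments))  # 640 segments instead of 1280
```

Without buffering, every single-point write would have produced its own segment; flushing once per step halves the segment count in this two-writes-per-step pattern.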

 

The feature Brad mentioned is used automatically in 2009; you don't need to enable it. Unfortunately, it won't do much for your application, because you create new channels for every step, which results in a new binary header for every step. The 2009 improvements would only kick in if you used the same channels all the way through.

 

We are currently working on some improvements to the TDMS API that will help make your use case a lot more efficient. These are not yet available to customers, though, so I'll describe some ways of working around the issue. It's not going to be pretty, but it'll work.

 

1) The TDM Streaming API is built for high performance, but even more than that, it is built so that whatever you do with it, it will always create a valid TDMS file. These safety measures come at a cost, especially if you have a large number of channels and/or properties versus a rather small number of values per channel. To better address use cases like that for the time being, we published a set of VIs a while ago that write TDMS files based on LabVIEW File I/O functions. This API does a lot less data processing in the background than the built-in API and is therefore a lot more efficient at tackling use cases with thousands of channels.

 

2) Another way of improving performance during your test steps is to push some of the costly tasks out into a post-processing step. You can merge multiple TDMS files by concatenating them at the file-system level. A possible workaround would be to write one TDMS file per step, or maybe one every 10 or 100 steps, and merge them after the test is done. An example VI for concatenating TDMS files can be found here.
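The file-system-level merge above amounts to plain byte-wise concatenation, since each TDMS segment is self-describing. Here is a minimal sketch in Python rather than LabVIEW; the function name and file names are made up for illustration:

```python
# Sketch of merging per-step TDMS files at the file-system level.
# Because TDMS segments are self-describing, concatenating complete
# TDMS files byte-for-byte yields a valid TDMS file.

import shutil


def concatenate_files(part_paths, merged_path):
    """Append the raw bytes of each part file to one merged file."""
    with open(merged_path, "wb") as merged:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, merged)


# Hypothetical usage: one file per step, merged after the test is done.
# concatenate_files(["step_001.tdms", "step_002.tdms"], "run.tdms")
```

In LabVIEW the same effect is achieved with the File I/O functions (or the example VI mentioned above); the key point is that no TDMS-aware parsing is needed during the merge.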

 

Hope that helps,

Herbert

Message Edited by Herbert Engels on 05-12-2010 11:33 AM
Message 2 of 6

"Are you writing one value per channel for all 16 channels with a single call to TDMS Write, or are you calling TDMS Write 16 times for that?"

I am writing 16 channels with a single call; I can see where 16 individual calls would cause even more issues.

 

I'll give the buffer size / TDMS Flush approach a try; that might get me the performance I need, as I'm obviously not streaming large amounts of data to disk.

 

Would it be a bad idea to write the attributes for each channel in a group just before writing data to that group, instead of setting all of the channel attributes before starting the test?  That might make the attribute setup seamless to the user.

 

I had noticed the G TDMS VIs but didn't see how they were going to help from the information I found on them.  The Caveats and Recommendations still encouraged writing larger arrays instead of single samples to avoid fragmentation.  Am I missing something here?

Message 3 of 6

It doesn't really matter at what point you write the properties to the file. The only thing to watch out for is that you set NI_MinimumBufferSize on each channel before you write the first data values to that channel. Other, purely descriptive properties can be set at any point between opening and closing the file.

 

The remark about larger arrays is a general recommendation, but since you're not writing large arrays, the best you could achieve is writing all values for an entire step in a single operation (which is exactly what the buffer size/flush method does).

 

What slows your application down the most is that the TDM Streaming API keeps a complete list of all groups and channels for an open file in memory. Every time you create a new group or channel, that list will be searched in order to find out whether the group or channel already exists. If it does, we apply optimizations and sanity checks. If it does not, you'll still suffer the performance impact, but you might not get anything out of it except maybe for some sanity checking. We use lookup methods that perform very well, but with small amounts of data written to 10,000+ channels, you will see a significant impact on writing performance.
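The bookkeeping described above can be sketched with a toy registry. This is only an illustration of the lookup-per-write pattern, not NI's actual data structures; the class and its names are made up:

```python
# Toy model of the group/channel registry the streaming API keeps in
# memory: every write first looks up whether the channel already exists.

class ChannelRegistry:
    def __init__(self):
        self.known = set()   # all (group, channel) pairs seen so far
        self.lookups = 0     # lookup cost is paid on *every* write

    def ensure(self, group, channel):
        self.lookups += 1
        key = (group, channel)
        if key not in self.known:
            self.known.add(key)  # new channel: extra header bookkeeping

registry = ChannelRegistry()
for group in range(640):                 # one group per test step
    for channel in range(16):
        registry.ensure(group, channel)  # single-point write at step start
        registry.ensure(group, channel)  # single-point write at step end

print(len(registry.known), registry.lookups)  # 10240 20480
```

Even with a fast set lookup, the 20,480 writes each pay the bookkeeping cost while only a single data point is written per call, which is why the overhead dominates in this use case.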

That's where the VI based API comes in. It doesn't do any of the checks or optimizations and therefore doesn't need to do any lookups, either. For your application, the absence of the lookups is going to make a huge difference, whereas the absence of checks and optimizations won't matter. If you use the VIs from that API the same way you're currently using the TDM Streaming functions (they match pretty much 1:1), you should not see any repercussions for your application.

 

Herbert

Message 4 of 6

OK, I ran this using the G_TDMS VIs and it does take care of the performance hit, but the file is still horribly fragmented.

It created a 478MB file.  After defragmentation it was 59MB. 

Defrag took 15 minutes 34 seconds.

 

I have the buffer size set to 1000 and I'm only saving 696 samples.

 

It seems to me the file should only have one write per channel, so why is it so fragmented?

 

Would saving 16 groups (one group per channel) and thousands of channels (one channel per step) be a better idea?
Message 5 of 6

Sorry, I didn't think that through well enough.

It would mean more writes to the file that way, as I would have to write each single point individually.

 

 

Message 6 of 6