"TDMS Delete Data" extremely slow for large TDMS files

EthanJ · 01-30-2018

In certain situations my software was hanging for a really long time before eventually completing the operation it was performing. When I probed down to find the source of the hang, I found it was a "TDMS Delete Data" function. I use that function to "clean up" my data files by deleting groups/channels that did not have any data saved to them. For the particular operation I was testing, the data files were about 4 GB in size, and "TDMS Delete Data" was taking over three minutes to complete. This is on a solid state drive.

I threw together a quick test that creates dummy TDMS files of various sizes, then writes a property to each file and deletes it again, timing the delete operation. Note that in this test I was creating a property with the text "test" and deleting only that. So the amount of data actually being deleted does not seem to matter; only the overall size of the file does. These were the results:

To the best of my knowledge, this function is relatively new (LabVIEW 2015 and later) and is the only way in LabVIEW to delete data from a TDMS file. Is there a faster way of doing this that I am unaware of? And if not, does NI perhaps have an explanation for why it takes so long? It seems odd that I can write massive amounts of data to these files at such fast rates, yet it takes several minutes to delete a few bytes of data.
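The timing test above can be sketched in Python for readers who want to reproduce its shape outside LabVIEW. This is only a sketch under stated assumptions. It does not touch the real TDMS format or the TDMS Delete Data function. The helpers `make_dummy_file`, `delete_by_rewrite` and `time_delete` are hypothetical. The sketch also assumes that a delete is implemented as a full rewrite of the file. Under that assumption, the delete time grows with total file size even though only four bytes are removed.

```python
import os
import tempfile
import time

PROPERTY = b"test"  # the tiny "property" that gets written and then deleted


def make_dummy_file(path: str, payload_bytes: int) -> None:
    """Hypothetical helper: write filler data followed by a small property record."""
    chunk = b"\x00" * (1 << 20)  # 1 MiB of filler
    with open(path, "wb") as f:
        remaining = payload_bytes
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])
            remaining -= n
        f.write(PROPERTY)


def delete_by_rewrite(path: str, start: int, length: int) -> None:
    """Hypothetical rewrite-style delete: copy everything except bytes [start, start + length).

    The cost is proportional to the size of the whole file, not to `length`.
    """
    tmp_path = path + ".tmp"
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        copied = 0
        while copied < start:  # copy everything before the deleted range
            buf = src.read(min(1 << 20, start - copied))
            if not buf:
                break
            dst.write(buf)
            copied += len(buf)
        src.seek(start + length)  # skip over the deleted bytes
        while True:  # copy everything after the deleted range
            buf = src.read(1 << 20)
            if not buf:
                break
            dst.write(buf)
    os.replace(tmp_path, path)


def time_delete(payload_bytes: int) -> float:
    """Create a dummy file of the given size and time deleting the trailing property."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dummy.bin")
        make_dummy_file(path, payload_bytes)
        t0 = time.perf_counter()
        delete_by_rewrite(path, payload_bytes, len(PROPERTY))
        return time.perf_counter() - t0


if __name__ == "__main__":
    for size_mb in (1, 4, 16):
        print(f"{size_mb:>3} MB: {time_delete(size_mb << 20):.4f} s")
```

The absolute numbers depend on the machine and on OS caching. The point of interest is how the time scales as the file gets bigger.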

adena.l · 01-31-2018

Hi EthanJ - What kind of computer are you using? In particular, do you have a solid state drive (SSD) or a hard disk drive (HDD)? What is the read/write speed of your drive? Also, have you tried testing this trend with file types other than TDMS?

Adena L.
Technical Support Engineer
National Instruments

MaxJoseph · 02-01-2018

Have you tried TDMS defragment? It should perform a similar action to what you are asking for. I am not quite sure what is meant by channels that do not contain data; if they do not contain data then they should not exist in the file. It is my understanding that all data put into a TDMS file has a header which includes things like group and channel, data length and type. You don't get headers etc without data.

Consider that during the delete operation, the whole TDMS_index file must be searched for matches and then the TDMS file recreated without the relevant data. For large file sizes this process will just take a long time! Can you avoid writing the data in the first place?

CLA - Kudos is how we show our appreciation for comments that helped us!

EthanJ · 02-01-2018

Thanks Adena,

The files are written to a Samsung 850 EVO SSD. The specs list 540 MB/s read and 520 MB/s write, and running winsat on the drive confirms these speeds:

The rest of the computer's specs are well up to standard, and nothing should be acting as a bottleneck for deleting small amounts of data:

There are not any other file read/write operations that are performing at an unexpectedly slow rate. Only the "TDMS Delete Data" function is running slowly.
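These drive numbers allow a rough sanity check with a small Python calculation. The check assumes that a delete reads the whole file once and writes it back once; the implementation is not documented in this thread. Under that assumption, the drive's sequential speeds set a floor on the delete time. `min_rewrite_seconds` is a hypothetical helper that ignores every other cost.

```python
def min_rewrite_seconds(file_mb: float, read_mb_s: float, write_mb_s: float) -> float:
    """Lower bound on the time to read a file once and write it back once.

    Assumes purely sequential I/O at the drive's rated throughput and ignores
    every other cost (parsing the index, metadata updates, caching effects).
    """
    return file_mb / read_mb_s + file_mb / write_mb_s


if __name__ == "__main__":
    # ~4 GB file from the first post; rated 540 MB/s read, 520 MB/s write.
    print(f"{min_rewrite_seconds(4096, 540, 520):.1f} s")  # prints 15.5 s
```

So even a full rewrite of a ~4 GB file at these speeds comes out at roughly 15 s. On its own, that does not obviously account for a delete taking over three minutes.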

EthanJ · 02-01-2018

Thanks MaxJoseph,

My mistake, I should have explained in more detail what is happening.

When I run a routine in my software, the routine is made up of a number of sub-routines. I create one data file per routine, and the data file has one group for each sub-routine. Within the group is a collection of provenance data (time stamps, parameters for the sub-routine, system configuration settings at that time, etc.) as well as a channel containing the actual data acquired during that sub-routine.

However, the data acquisition for each sub-routine is started by a user interaction, and the provenance data for that sub-routine is written to the file before that interaction. If the user cancels instead of starting the sub-routine, I delete the provenance data for that sub-routine so that the file only contains provenance data for the sub-routines that were actually acquired.

So, what I'm trying to do in my software is not what is accomplished with the "TDMS Defragment" function.

I know that there are a number of workarounds for this particular problem, but my main question is regarding the "TDMS Delete Data" function so that I can learn more about why it is so slow, what conditions cause it to be slow, and whether or not there are any other methods of achieving the same behavior. This is so that moving forward, I can have a sense of whether I should be ignoring this function altogether when designing performant software.

Ethan

MaxJoseph · 02-01-2018

I think the problem is not so much that the TDMS delete function itself is slow, but that TDMS delete operations are inherently slow on files of your size and complexity. I think a correct workaround would avoid the problem rather than 'fix' it per se.

It sounds like you aren't actually deleting much data relative to the total size of the file. If the channel was unused and contains no acquired data, just some provenance data, is it harmful to just leave it there? You could write an additional Boolean flag, 'Used?', that is false if (and only if) the user hits cancel, and later ignore any channel that does not have this flag set to true.
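Below is a minimal Python sketch of the 'Used?' flag idea. The `Group` class, `write_provenance`, `mark_used` and `acquired_groups` are all hypothetical. The sketch uses an in-memory stand-in for the data file, not a TDMS library. Provenance is written up front with the flag set to false. The flag is flipped only when the user actually starts the sub-routine. Readers keep only groups whose flag is true.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Group:
    """Hypothetical in-memory stand-in for one sub-routine's group in the data file."""
    name: str
    properties: Dict[str, object] = field(default_factory=dict)


def write_provenance(groups: List[Group], name: str, provenance: Dict[str, object]) -> Group:
    # Provenance is written before the user interaction, flagged as not yet used.
    group = Group(name, {**provenance, "Used?": False})
    groups.append(group)
    return group


def mark_used(group: Group) -> None:
    # Called only when the user actually starts the sub-routine.
    group.properties["Used?"] = True


def acquired_groups(groups: List[Group]) -> List[Group]:
    # Readers ignore any group whose flag is missing or not True.
    return [g for g in groups if g.properties.get("Used?") is True]
```

In LabVIEW, the flag would presumably be updated with TDMS Set Properties when the sub-routine starts. Writing one small property should be far cheaper than rewriting a multi-gigabyte file.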


Hooovahh · 02-01-2018

It sounds like you are getting some of the help you need, but I'd also suggest you look into writing your data correctly in the first place so that a delete isn't necessary. Don't get me wrong: if this function can be made better, bugging NI about it is the right thing to do. But if you can minimize the amount of deleting (by not writing that data in the first place), that would help your situation.
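One way to avoid writing data that may later be deleted is to defer it. Provenance is captured at the same moment as before but held in memory, and it reaches the file only after the user confirms. The Python sketch below shows this pattern. `DeferredProvenance` is a hypothetical class, and `write_group` is a hypothetical callback standing in for whatever actually writes a group to the TDMS file.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical callback that writes one group's provenance to the data file.
WriteGroup = Callable[[str, Dict[str, object]], None]


class DeferredProvenance:
    """Hold a sub-routine's provenance in memory until the user commits or cancels."""

    def __init__(self, write_group: WriteGroup) -> None:
        self._write_group = write_group
        self._pending: Optional[Tuple[str, Dict[str, object]]] = None

    def stage(self, group: str, provenance: Dict[str, object]) -> None:
        # Capture provenance before the user interaction, but keep it in memory.
        self._pending = (group, dict(provenance))

    def commit(self) -> None:
        # The user started the sub-routine: write the staged provenance for real.
        if self._pending is not None:
            self._write_group(*self._pending)
            self._pending = None

    def cancel(self) -> None:
        # The user cancelled: nothing reached the file, so nothing needs deleting.
        self._pending = None
```

The trade-off is crash safety. If the application dies between `stage` and `commit`, the staged provenance is lost, whereas writing it up front (as in the original design) keeps it on disk.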

Unofficial Forum Rules and Guidelines
Get going with G! - LabVIEW Wiki.

16 Part Blog on Automotive CAN bus. - Hooovahh - LabVIEW Overlord

EthanJ · 02-01-2018

Thanks all for the helpful suggestions for workarounds! I certainly have some paths for moving forward and this issue is not a blocker for me.

My main interest in posting about this is in highlighting how slow this function is and perhaps learning more about the causes for it being slow so that I know when I should be avoiding it in the future (and also perhaps help anyone else with similar issues who finds this thread with a Google search in the future).

It sounds like this function doesn't actually delete data from the TDMS file, as the name implies, but rather recreates the file excluding the data from the specified group and channel? This is not particularly clear in the documentation for the function, which may be why I assumed that the "under the hood" implementation for this function might be a bit more optimized. Then again, perhaps I don't have a good understanding of how data is typically deleted from files on disk in the first place, so this might just be the norm.

That being said, I'm still not totally certain that this is the only issue with this VI based on some additional profiling I did. I would assume that the following VI would be a slower (or, best-case scenario, the same speed) drop-in replacement for the built-in TDMS Delete Data Function, given that it is manually looping over all groups and channels of the TDMS file and creating a copy with one group and channel excluded:

When I profile this VI against the built-in one, mine is in fact faster:

I would have assumed that my implementation is the worst-case scenario for what the built-in LabVIEW function could be doing "under the hood" but it seems that is not so. I've attached the code I used for the profiling if anyone's curious.
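The attached VI is not reproduced here, but the manual approach described above can be sketched in Python. That approach loops over every group and channel and copies everything except the target. The sketch is illustrative only. `FileModel` and `copy_excluding` are hypothetical, and the "file" is a nested dict of group -> channel -> samples, not the TDMS format or any TDMS library.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for a TDMS file: group -> channel -> samples.
FileModel = Dict[str, Dict[str, List[float]]]


def copy_excluding(src: FileModel, group: str, channel: Optional[str] = None) -> FileModel:
    """Return a copy of `src` without `group` (or only without `group/channel` if given).

    Every kept group and channel is visited and copied, so the work is
    proportional to the data kept, i.e. roughly the whole file.
    """
    dst: FileModel = {}
    for g_name, channels in src.items():
        if g_name == group and channel is None:
            continue  # drop the entire group
        dst[g_name] = {}
        for c_name, samples in channels.items():
            if g_name == group and c_name == channel:
                continue  # drop just this channel
            dst[g_name][c_name] = list(samples)  # copy the channel's data
    return dst
```

Deleting a whole group corresponds to passing only `group`. Passing both `group` and `channel` removes a single channel and keeps the rest of the group.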

Anyway, I think the takeaway is that you should avoid using this function on files larger than a few hundred MB if you can't afford to have your software hang for more than a few seconds. As a few posters have mentioned, the only way around this seems to be to design your code so that you never have to delete anything from TDMS files once they have been created.
