
TDMS logging from multiple concurrent tasks


@Herbert Engels wrote:

 


@Ben wrote:
Using multiple file Opens on a TDMS file does not work like every other file type I have seen, where, if the file is already open, we get a pointer to the existing resource.

 

In TDMS, a separate open results in a separate reference, and any data written using one is not visible through the other.


What you're saying is that if you have two TDMS Opens in the same application and you write to one of them, the data you wrote through that reference cannot be read from the other one? If so, are you referring to data values or to properties?

In this use case, it does happen that data you write to one reference is temporarily invisible to the other one. You wouldn't be losing data, though, because as soon as the first reference flushes that data to disk, it is visible to the second one. This rarely applies to data values, but it happens with properties a lot. If immediate visibility for all open references is important to you, you can enforce that at any point in time by calling "TDMS Flush" on the reference you were writing to. Does that sound like it might fix this for you?
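For readers outside LabVIEW, here is a minimal plain-Python analogy of that flush behavior (the file name is hypothetical): a write through one handle only becomes visible to a second, independently opened handle once the first handle flushes.

```python
# Two independent handles on the same file, like two TDMS Opens on one path.
writer = open("results.tmp", "w")          # hypothetical file name
reader = open("results.tmp", "r")

writer.write("Group1/Channel1: 1.0 2.0 3.0\n")
print(repr(reader.read()))                 # likely '': the write is still buffered

writer.flush()                             # analogous to calling "TDMS Flush"
print(repr(reader.read()))                 # now the second handle sees the data

writer.close()
reader.close()
```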

 

If you're referring to 2 references being opened in two different processes, things are a bit different. In that case, you would actually need to close and re-open the second reference in order for the reading application to pick up the most recent changes. This is certainly not pretty by any means, but it should solve the problem for now.

 

Did I get this right or is the problem you were bringing up different from what I described?

 

Thanks,

Herbert
Yes, 2 references in separate processes...

 

That behaviour of TDMS is not the same as other file types, where both opens point to the same File Attribute Block (?), but each keeps a separate Record Attribute Block (?) to track its own read/write index.

 

Just pointing out what is special about TDMS: not that it is wrong, only that it is different, and it's an important difference to know about.

 

Ben

 

FAB and RAB are the logical names of the internal data structures that were used by VMS. I don't know what they are called in the Windows version.

 

Message 11 of 19

You guys are fast!!!

 

Let me describe my scenario in greater detail:

 

I have multiple parallel processes doing various types of signal analysis on the same WDT recorded in a TDMS file. This is stuff like FFT, octave, and a bunch of other analyses. This part of the app works great and is automatically distributed to multiple cores, just as advertised.

 

The outputs of these processes are then written to separate groups and channel names in the same TDMS file.

 

Up front, when I initialize the whole ball of wax, I create a new TDMS file, and then create multiple references to it, which I then use as handles to write the different results to that same file. See the enclosed screenshot tdmsinit.jpg.

 

On that screenshot, I first open the TDMS file, write some overall project metadata, and shut it down; I then create an array of 28 handles to the same file. In the middle of the diagram there are a bunch of parallel loops embedded in Case structures so they can be activated selectively. After the processing, I shut down all the queues and TDMS refs. (Here we did find a probably unrelated bug, indicated by the arrows: first opening one TDMS ref, as in the VI on the left of the screenshot, then opening the 28 refs, and shutting them down in reverse order gave TDMS crashes (errors). When the closes were re-ordered, the errors went away.)

 

For each parallel analysis task I have a producer/consumer loop pair. I do the analysis in the producer loop, stuff the result onto a queue, and eat it in the consumer loop (with ketchup and mayo), which writes it to the TDMS file using one of the many references.
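For comparison, the same producer/consumer shape in a minimal Python sketch (threads stand in for LabVIEW loops; the analysis function and data are placeholders):

```python
import threading
import queue

def producer(chunks, q, analyze):
    """Analyze each data chunk and hand the result to the consumer via the queue."""
    for chunk in chunks:
        q.put(analyze(chunk))
    q.put(None)  # sentinel: tells the consumer it can shut down

def consumer(q, write_result):
    """Drain the queue and write each result out (one TDMS reference in the real app)."""
    while True:
        result = q.get()
        if result is None:
            break
        write_result(result)

# One queue and one producer/consumer pair per analysis task, as described above.
q = queue.Queue()
chunks = [[1.0, 2.0], [3.0, 4.0]]                        # placeholder data
p = threading.Thread(target=producer, args=(chunks, q, sum))
c = threading.Thread(target=consumer, args=(q, print))  # print stands in for TDMS Write
p.start(); c.start(); p.join(); c.join()
```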

 

For a while I thought I had re-entrancy problems, and I spent a fair amount of time playing with the re-entrancy of the various routines. Each parallel TDMS Write lives in a re-entrant VI and gets its reference from a functional global containing the array of refs. But today I noted that when I ran precisely the same operation on the same data set, the results seen by the LabVIEW TDMS data reader varied from run to run. And when I put a graphic probe on one of the write-to-TDMS VIs, the apparent file corruption went away.

 

Some corruption screenshots are shown. I am careful not to have the TDMS file viewer active while my main program is running.

 

 

At the start I thought it was important to have separate references for these parallel write operations, but I have also tried it with just a single reference, and it seems to make no difference. Both cases work, most of the time.

 

The file corruption appears as monster spikes in the data, typically in the range of 1E38 or so!!! These spikes are interleaved with data that appears to be sensible, and they differ from run to run on precisely the same data set.

 

Unfortunately, I am out of the office Monday, so I don't have more time now to give you more info.  Also, since I am not a very good programmer, there is still a chance that it is my fault.

 

Finally, is the TDMS defrag bug fixed in LabVIEW 2010?

 

Again, thanks for your responsiveness. 

 

Carsten
Message 12 of 19

Parallelism

 

The TDMS API is not re-entrant at this point. Any running TDMS function will block any other TDMS function from executing in the same process at the same time, regardless of whether they are accessing the same file or separate files. Getting rid of this rather crude method of protecting our in-memory data structures has been on our to-do list for a long time, but it wasn't until recently that we got customer requests confirming the urgency of getting this done, so it got deferred a few times. We are taking steps right now to first make sure we can have multiple threads writing to separate files at the same time without any interaction between threads as far as TDMS goes. From there, we will lift [some of] the restrictions on accessing the same file from multiple threads. I can't currently comment on when this functionality will be available to customers.
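In other words, the current behavior is as if every TDMS call in the process contended for one global lock. A rough Python sketch of that kind of serialization (the function and data structures are hypothetical stand-ins, not the actual implementation):

```python
import threading

_tdms_lock = threading.Lock()  # one lock for the whole process, not one per file

def tdms_write(file_ref: dict, group: str, channel: str, data: list) -> None:
    """Hypothetical stand-in for a TDMS call: every call, against any file,
    takes the same process-wide lock, so no two calls ever overlap."""
    with _tdms_lock:
        file_ref.setdefault((group, channel), []).extend(data)

# Even writes to *separate* files serialize against each other:
file_a, file_b = {}, {}
tdms_write(file_a, "FFT", "ch0", [1.0, 2.0])
tdms_write(file_b, "Octave", "ch0", [3.0])
print(file_a, file_b)
```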

 

Even assuming a fully re-entrant API, threads would have to take turns accessing a file at the operating-system level. Multiple threads writing to the same file have an inherent race condition between them. For the existing TDMS API, that shouldn't be a problem as long as you don't have multiple threads writing to the same groups and channels (which I understand you don't). The worst that should happen is that objects appear in a different order, depending on how the race conditions turn out.

 

=> Your application should work the same if you have all TDMS functions operate on the same refnum. Sorry if our documentation is not clear about this; I take it you put some effort into implementing parallel file access that unfortunately is not supported on our side. I will double-check with our docs team to make sure we communicate this properly.

 

=> The parallel architecture itself of course is still advantageous in that the data processing routines do run in parallel, so you're probably still seeing a good load distribution for that.

 

File Corruption

 

Addressing the file corruption (and the error on TDMS Close) is a matter of us being able to reproduce it. I played around with a VI that I figure does something along the lines of what your application does, but so far I haven't seen a corruption. Therefore, I have quite a few questions that might help us narrow down the issue; please feel free to skip any that would take a lot of effort to answer or that don't apply to your solution.

 

- Do you still see a corruption if you wire "FALSE" to the "disable buffering" input on TDMS Open?

- Could you try to delete the index file for a corrupt file, open the TDMS file again and see if it is still corrupt?

- How often does the corruption occur? Is it more like once in 10 executions or once in 100 or 1000?

- Does it always occur in the same area of the file (e.g., in the same channels, for the same task)? Or does it occur in varying areas?

- Do you see any other variations in the files that are in line with the corruption? For example, do certain objects swap positions in files where you see the corruption? Or do certain channels contain fewer or more values than usual?

- Is it correct that the file has been written by another application, and the application we're talking about opens that file, sets a few properties, reads and processes the data in the file and finally appends the results to the end of the file? How do the processing functions retrieve the waveform data? Is it part of the global variable you're using?

- How big is the waveform you are processing? Do you need to read and process it piecemeal and then merge together the results or can you pass it around as a whole?

- What role do the parallel loops play, i.e., what are they looping over?

- Do you use a producer- and a consumer-loop for each task or do you use a producer-loop for each task with a single consumer loop for all of them?

- Now that your app can handle 28 separate refnums for the processing tasks, what will happen if you have them write to separate files? Would you still see a corruption? Would you see it in the same area? Would it occur only after you have merged the files together?

 

Is there any way we could get a copy of your code so we could run it on our development machines (we have an FTP dropbox for that, so no need to post it to the forum)? Or do you have a smaller application that we could use to reproduce the problem? Any way we can get some of the corrupted files?

 

Thanks,

Herbert

 

Message 13 of 19

Hi Herbert,

 

Thank you for your comprehensive reply and set of questions, which I will answer in italics interleaved with your text.

 

Also, kindly send me the FTP upload site (my email is cth@delta.dk). It may be a day or two before I can upload, due to travel tomorrow.

 

The TDMS API is not re-entrant at this point. Any running TDMS function will block any other TDMS function from executing in the same process at the same time, regardless of whether they are accessing the same file or separate files. Getting rid of this rather crude method of protecting our in-memory data structures has been on our to-do list for a long time, but it wasn't until recently that we got customer requests confirming the urgency of getting this done, so it got deferred a few times. We are taking steps right now to first make sure we can have multiple threads writing to separate files at the same time without any interaction between threads as far as TDMS goes. From there, we will lift [some of] the restrictions on accessing the same file from multiple threads. I can't currently comment on when this functionality will be available to customers.

 

Even assuming a fully re-entrant API, threads would have to take turns accessing a file at the operating-system level. Multiple threads writing to the same file have an inherent race condition between them. For the existing TDMS API, that shouldn't be a problem as long as you don't have multiple threads writing to the same groups and channels (which I understand you don't). The worst that should happen is that objects appear in a different order, depending on how the race conditions turn out.

 

Yes, we are reading and writing at the same time, but to different files. The read file is a single file with raw time-domain data; the write file holds multiple analysis results, such as FFT. We have seen some out-of-order groups due to the race conditions you mention, but this has not been a problem.

 

=> Your application should work the same if you have all TDMS functions operate on the same refnum. Sorry if our documentation is not clear about this; I take it you put some effort into implementing parallel file access that unfortunately is not supported on our side. I will double-check with our docs team to make sure we communicate this properly.

 

This was just a dumb guess on my part, and creating multiple refs is easy if you stick them in an array in a functional global and then index them out by a type-def'ed name, so that the refs have meaningful names. I later changed this to use just one ref, and that worked just as well and also had the same issues.
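That "named refs in a functional global" idea, sketched in Python (the enum of ref names and the registry are hypothetical):

```python
import enum

class RefName(enum.Enum):
    """Stands in for the type-def'ed enum that gives each ref a meaningful name."""
    FFT = 0
    OCTAVE = 1

_refs: dict = {}  # a module-level dict plays the role of the functional global

def store_refs(refs: dict) -> None:
    """Store one opened file reference per analysis task, keyed by name."""
    _refs.update(refs)

def get_ref(name: RefName):
    """Look up a reference by its meaningful, type-checked name."""
    return _refs[name]

store_refs({RefName.FFT: "<ref to results.tdms>", RefName.OCTAVE: "<ref to results.tdms>"})
print(get_ref(RefName.FFT))
```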

 

=> The parallel architecture itself of course is still advantageous in that the data processing routines do run in parallel, so you're probably still seeing a good load distribution for that.

 

I have thought about dropping it due to development-time constraints. Originally I used a purely sequential write to the same TDMS file, and of course there were no issues. However, for long analysis times (we sometimes process 24 hours of raw time-domain files), the CPU use is not optimal, since the 24 hours are often broken down into many shorter time slices with different processing requirements; the totally parallel, asynchronous processing gives optimum CPU utilization, if we can get it to work reliably.

 

File Corruption

 

Addressing the file corruption (and the error on TDMS Close) is a matter of us being able to reproduce it. I played around with a VI that I figure does something along the lines of what your application does, but so far I haven't seen a corruption. Therefore, I have quite a few questions that might help us narrow down the issue; please feel free to skip any that would take a lot of effort to answer or that don't apply to your solution.

 

The code I sent a screenshot of I pulled out into a sandbox last week, and I wasn't able to reproduce the issue when I didn't call the underlying parallel file-access VIs. Perhaps some of the many parallel refs got lost in the woods, or I did something stupid. I'll upload the entire app after you give me the upload link.

 

- Do you still see a corruption if you wire "FALSE" to the "disable buffering" input on TDMS Open?

 

I believe it is always disabled; I don't have my code at home, so I don't recall whether it is explicitly wired. I thought it defaults to False.

 

- Could you try to delete the index file for a corrupt file, open the TDMS file again and see if it is still corrupt?

 

Will try on Tuesday when I get back from my trip.

- How often does the corruption occur? Is it more like once in 10 executions or once in 100 or 1000?

 

Almost always, if one of my parallel processes is FFT. I may have some 40 different results in my TDMS file, and a handful of them may be corrupted in an unpredictable manner.

 

- Does it always occur in the same area of the file (e.g., in the same channels, for the same task)? Or does it occur in varying areas?

 

In the application, I can analyze the same files again and again, and I don't necessarily get corruption in the same places from run to run.

 

- Do you see any other variations in the files that are in line with the corruption?

I've seen that sometimes the corruption jumps into a third-octave result.

For example, do certain objects swap positions in files where you see the corruption?

I'll have to look at this.

Or do certain channels contain fewer or more values than usual?

Yes, when my FFT spectrum gets corrupted, it contains twice as many values. I can see this in the TDMS properties automatically written into the files by the TDMS Write function. I have also seen cases where the first half looks OK and the second half is filled with garbage numbers.

 

- Is it correct that the file has been written by another application, and the application we're talking about opens that file, sets a few properties, reads and processes the data in the file and finally appends the results to the end of the file? How do the processing functions retrieve the waveform data?

It is a fairly large program that works in several stages: a .wav file is imported, converted, and saved to TDMS. Later on, the user may choose to perform batch processing on one or more saved TDMS raw-data files. The results from this are saved in one TDMS result file.

Is it part of the global variable you're using?

No.

 

 

- How big is the waveform you are processing?

The total waveform can be seconds, minutes, or hours.

Do you need to read and process it piecemeal?

It is read piecemeal and, for some analysis types, averaged over the entire length (for example FFT and octave), while other analyses perform decimation and filtering. Typical chunk sizes are limited to a maximum of 50 kSamples, which seems to give optimal performance (at least on a laptop).

And then merge together the results, or can you pass it around as a whole?

Some results are written to the TDMS file only at the end of the analysis, while others are written for every chunk being analyzed, appending the decimated waveform to the new TDMS file.
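A small Python/NumPy sketch of that chunked pattern, assuming a 50 kSample chunk size: the FFT magnitude is averaged across all chunks, while a decimated stream is produced per chunk (the data and decimation factor are placeholders):

```python
import numpy as np

CHUNK = 50_000  # max chunk size mentioned above (50 kSamples)

def process(samples: np.ndarray, decimate_by: int = 10):
    """Average an FFT magnitude over all chunks; emit a decimated stream per chunk."""
    spectrum_sum = None
    n_chunks = 0
    decimated_parts = []
    for start in range(0, len(samples), CHUNK):
        chunk = samples[start:start + CHUNK]
        mag = np.abs(np.fft.rfft(chunk, n=CHUNK))  # zero-pads the final short chunk
        spectrum_sum = mag if spectrum_sum is None else spectrum_sum + mag
        n_chunks += 1
        decimated_parts.append(chunk[::decimate_by])  # appended per chunk in the real app
    return spectrum_sum / n_chunks, np.concatenate(decimated_parts)

avg_spectrum, decimated = process(np.random.randn(260_000))
print(avg_spectrum.shape, decimated.shape)
```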

 

 

- What role do the parallel loops play, i.e., what are they looping over?

- Do you use a producer- and a consumer-loop for each task or do you use a producer-loop for each task with a single consumer loop for all of them?

In my current scenario, I have 3 concurrent loops, each containing a producer (read the TDMS file containing audio-rate WDT and stuff it onto a queue) and a consumer (eat WDT data from the queue (the queues are unique to each loop), analyze, and stuff the results into the TDMS result file). None of this processing is real-time, i.e., while data is being acquired; it is done as post-processing of previously recorded data.

 

- Now that your app can handle 28 separate refnums for the processing tasks, what will happen if you have them write to separate files? Would you still see a corruption? Would you see it in the same area? Would it occur only after you have merged the files together?

 

Haven't tried.

 

Is there any way we could get a copy of your code so we could run it on our development machines (we have an FTP dropbox for that, so no need to post it to the forum)? Or do you have a smaller application that we could use to reproduce the problem? Any way we can get some of the corrupted files?

Yes. Send me an email.

 

Thank you again for your responsiveness. 

 

Carsten

Message 14 of 19

Thank you so much for taking the time to write up the answers to all of that. I'm looking forward to trying out your code.

 

The fact that you're seeing twice the number of results on an FFT might be a hint at what is going wrong. Apparently, the index information that precedes (and describes) a binary data block in the file declares two occurrences of a binary block where there is only one. It could also be that the file is correct, but we're calculating the number of blocks the wrong way when reading. That would also be a good explanation for the peaks you're seeing, because absurd floating-point values like the ones you describe are typically a sign that we're interpreting something that was never meant to represent floating-point values (in this case, the header of the following data segment) as floating-point values. With this information at hand, we now need to dig further into whether this happens on writing or on reading.
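To illustrate that failure mode: reinterpreting header bytes as IEEE-754 doubles produces exactly these absurd magnitudes. In this Python sketch, "TDSm" is the genuine 4-byte lead-in tag of a TDMS segment header, while the four trailing bytes are contrived so the result lands near 1E38:

```python
import struct

# Eight bytes that were never meant to be a sample: the real TDMS segment
# lead-in tag "TDSm", plus four made-up header bytes.
header_bytes = b"TDSm\x00\x00\xe0\x47"

(value,) = struct.unpack("<d", header_bytes)  # reinterpret as little-endian float64
print(value)  # ~1.7e38: metadata misread as data shows up as a monster spike
```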

 

There are two suggestions from my previous post that I think have a chance of fixing the problem for you (admittedly, not a huge chance). I'd suggest first of all trying to write the file with Windows buffering enabled, which means you would need to wire FALSE to "disable buffering" on "TDMS Open" in order to override the default value of TRUE (which, obviously, disables buffering). The good news is that we do have parameters that are more self-explanatory than this one.

 

The other suggestion is a bit more of a long shot. It has happened in the past that index files and TDMS files get out of sync. If that is the case, it can easily be fixed by deleting the index file; LabVIEW will generate a new one the next time the file is opened.

 

I still owed you a response regarding the defrag CAR. We have indeed submitted a fix in 2010 that, to the best of our knowledge, eliminates the problem you were seeing.

 

Thanks again for working with us to get all this fixed,

Herbert

Message 15 of 19

Hello again,

 

As per your suggestions, we have tried removing the index file and reading again, and the file corruption was still there. We have also tried setting the buffering input to False, and this did not help. Yesterday we also tried using a single reference for writing to the TDMS file instead of a separate reference for each parallel loop. This seemed to rectify the problem, but our testing is still incomplete.
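For the record, a sketch of the pattern that seemed to fix it: every parallel loop funnels its results through one queue to a single consumer that owns the only file reference (a pure-Python analogy; the file name, task names, and "analysis" are placeholders):

```python
import threading
import queue

results_q = queue.Queue()  # all parallel loops feed this one queue

def analysis_task(name, chunks):
    """One per parallel loop: analyze, then enqueue (group, channel, data)."""
    for i, chunk in enumerate(chunks):
        results_q.put((name, f"ch{i}", [x * 2 for x in chunk]))  # placeholder analysis
    results_q.put(None)  # sentinel: this producer is done

def writer_task(n_producers):
    """The only code that touches the output file: one reference, never shared."""
    done = 0
    with open("results.txt", "w") as f:  # stands in for the single TDMS reference
        while done < n_producers:
            item = results_q.get()
            if item is None:
                done += 1
                continue
            group, channel, data = item
            f.write(f"{group}/{channel}: {data}\n")

producers = [threading.Thread(target=analysis_task, args=(n, [[1.0], [2.0]]))
             for n in ("FFT", "Octave", "Decimate")]
writer = threading.Thread(target=writer_task, args=(len(producers),))
writer.start()
for p in producers:
    p.start()
for p in producers:
    p.join()
writer.join()
```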

 

Thanks for your continued effort to help fix this.  We'll let you know when testing is complete.

 

Carsten

Message 16 of 19

@CarstenPXI wrote:

...

using a single reference for writing to the TDMS file instead of a separate reference for each parallel loop. This seemed to rectify the problem, ...

 

Carsten


Thanks for the update.

 

Ben

Message 17 of 19

Carsten,

 

Based on what appears possible from this thread, I am attempting a very simple proof of concept that builds a single file from data produced in parallel loops running at variable speeds. Are you saying that you are able to do that?

 

I get an error if I run the TDMS Advanced Open function on the same file path more than once; your posts seem to suggest you are able to do that.

 

FYI, one main reason I switched to the Advanced palette is that I don't want to write group/channel data every time I call the write function. Every performance enhancement helps, as we are battling performance limits in some of our loops.

 

Regards,

 

Dan

 

 

Message 18 of 19

It may be faster to start a new thread on the forums; it looks like the last post before yours was a few years ago. Creating a new post would get more of the community involved and probably help solve your issue faster.

Rob S
Applications Engineer
National Instruments
Message 19 of 19