

Reading a chunked TDMS file

Solved!

Hi!

 

I'm having trouble reading TDMS files with NI software (LabVIEW 2018, DIAdem 20.1.0f7908 SP 1, and dependent software such as the TDMS File Viewer/SuperViewer from DMC) when the files have been written in chunks using the mechanism described here: https://www.ni.com/en-us/support/documentation/supplemental/07/tdms-file-format-internal-structure.h...


Issue Description

 

According to the document linked above (see the "Raw Data" section), raw data can be appended to the most recent segment (omitting a new lead-in and metadata) if there are no changes to that information:

 

If meta information between segments does not change, the lead in and meta information parts can be completely omitted and raw data can just be appended to the end of the file. Each following raw data chunk has the same binary layout, and the number of chunks can be calculated from the lead in and meta information by the following steps:

1. Calculate the raw data size of a channel. Each channel has a Data type, Array dimension and Number of values in meta information. Refer to the Meta Data section of this article for details. Each Data type is associated with a type size. You can get the raw data size of the channel by: type size of Data type × Array dimension × Number of values. If Total size in bytes is valid, then the raw data size of the channel is this value.

2. Calculate the raw data size of one chunk by accumulating the raw data size of all channels.

3. Calculate the raw data size of total chunks by: Next segment offset - Raw data offset. If the value of Next segment offset is -1, the raw data size of total chunks equals the file size minus the absolute beginning position of the raw data.

4. Calculate the number of chunks by: Raw data size of total chunks ÷ Raw data size of one chunk.
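For reference, here is a minimal Python sketch of those four steps as I read them (the function and parameter names are my own and not from any NI API; the lead-in and metadata fields are assumed to be parsed already):

    # Steps 1-4 from the document above (my own naming, not an NI API).
    def channel_raw_size(type_size, array_dimension, number_of_values,
                         total_size_in_bytes=None):
        # Step 1: raw data size of one channel within a single chunk.
        if total_size_in_bytes is not None:   # "Total size in bytes" from the raw data index, if valid
            return total_size_in_bytes
        return type_size * array_dimension * number_of_values

    def chunk_count(channel_sizes, next_segment_offset, raw_data_offset,
                    file_size=None, raw_data_start=None):
        # Step 2: one chunk is the sum of all channels' raw data sizes.
        chunk_size = sum(channel_sizes)
        # Step 3: total raw data size of the segment.
        if next_segment_offset == 0xFFFFFFFFFFFFFFFF:    # -1 stored in the unsigned field
            total_raw_size = file_size - raw_data_start
        else:
            total_raw_size = next_segment_offset - raw_data_offset
        # Step 4: number of chunks in the segment.
        return total_raw_size // chunk_size

    # Applied to the third segment of my test file (4 uint32 channels, 100 values each):
    sizes = [channel_raw_size(4, 1, 100) for _ in range(4)]   # 4 x 400 bytes
    assert chunk_count(sizes, 3200, 0) == 2                   # (3200 - 0) / 1600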

 


What I expected

 

I think I've done exactly that, and processing my generated file with that algorithm yields the expected amount of data. However, LabVIEW, DIAdem, and the third-party software all show very different results. The file consists of one group ('group1') containing four channels ('channel1' ... 'channel4'), each of which holds 100 values of data type uint32 per chunk. The raw data is written interleaved.

The file consists of three segments.

  • The first one contains a new-object-list (and the object descriptions for all the objects) and one chunk of raw data.
  • The second segment only contains one chunk of raw data using the established objects and raw data index.
  • The third segment contains two chunks of raw data. This is reflected by a proper value for next segment offset (see below).

(Please find the file attached to this post)
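To make the segment layout explicit, here is a small Python sketch (my own code, not an NI tool) that just walks the lead-ins and prints the ToC flags and offsets; the constants and the 28-byte layout are taken from the document linked above, and "test.tdms" stands for the attached file:

    # Walk the segment lead-ins of a TDMS file and print ToC flags and offsets.
    # Lead-in layout: "TDSm" tag, uint32 ToC bitmask, uint32 version,
    # uint64 next segment offset, uint64 raw data offset
    # (both offsets are counted from the end of the 28-byte lead-in).
    import struct

    TOC_FLAGS = {
        "kTocMetaData":        1 << 1,
        "kTocNewObjList":      1 << 2,
        "kTocRawData":         1 << 3,
        "kTocInterleavedData": 1 << 5,
        "kTocBigEndian":       1 << 6,
        "kTocDAQmxRawData":    1 << 7,
    }

    def dump_lead_ins(path):
        with open(path, "rb") as f:
            while True:
                lead_in = f.read(28)
                if len(lead_in) < 28:
                    break
                tag, toc, version, next_seg, raw_off = struct.unpack("<4sIIQQ", lead_in)
                assert tag == b"TDSm", "not at a segment boundary"
                flags = [name for name, bit in TOC_FLAGS.items() if toc & bit]
                print(f"segment @ {f.tell() - 28}: version={version}, "
                      f"next_segment_offset={next_seg}, raw_data_offset={raw_off}, flags={flags}")
                if next_seg == 0xFFFFFFFFFFFFFFFF:   # -1: segment was never finished
                    break
                f.seek(next_seg, 1)                  # next lead-in starts right after this segment

    dump_lead_ins("test.tdms")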

 

My understanding of the algorithm quoted above is the following:

  • The raw data size of each of the four channels is type size of Data type × Array dimension × Number of values, i.e. 4 × 1 × 100 = 400 bytes.
  • The raw data size of one chunk is the sum of all channels' raw data sizes: 4 × 400 = 1600 bytes.
  • The raw data size of total chunks is Next segment offset − Raw data offset, which for the attached file is 3200 − 0 = 3200 bytes (there is no metadata in that third segment, so the raw data offset is 0).
  • The number of chunks is simply Raw data size of total chunks ÷ Raw data size of one chunk, i.e. 3200 / 1600 = 2.

 

So the conclusion would be that the last segment holds two chunks, while the first two segments hold one chunk each, giving four chunks in total. With 100 values per channel per chunk, that means 400 values per channel in the entire file.

 

 

Reality (what NI and third-party software give me)

 

The tested applications behave very differently; I'll do my best to sum up the results. Please feel free to ask clarifying questions!

 

If I am storing the data interleaved (attached test file):

  • LabVIEW thinks each channel contains only 300 values (NI_ChannelLength). The extracted data shows that the second chunk in the third segment is ignored. If I append another full segment to the file, the same behavior can be observed: the second chunk of the third segment is ignored, and the chunk of raw data provided by the next full segment is read properly, as if there were no data in between at all.
  • DIAdem errors out with many "unexpected EOF" messages and reports the same channel length. The first chunk of data is read correctly, but from there on the results are very strange: beginning with the second segment/chunk, all channels read as 0, except for the first value of each chunk (values 101 and 201).
  • DMC SuperViewer shows the same results as LabVIEW, which is not much of a surprise considering it is built on top of LabVIEW.
  • npTDMS (a third-party Python library for reading and writing TDMS) reports the expected channel length of 400 values and the correct data, regardless of whether the data is stored interleaved or not.

If the data in each chunk is stored non-interleaved, the applications I tested (except for npTDMS, which gives the expected results) also show only 300 values per channel, but are at least consistent about the read-out of those 300 values (they just miss the chunk appended to segment 3).
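For reference, this is roughly how the file can be cross-checked with npTDMS (the file name is a placeholder; the indexing API is the one documented by current npTDMS versions):

    # Cross-check the channel lengths with npTDMS (pip install npTDMS).
    from nptdms import TdmsFile

    tdms_file = TdmsFile.read("test.tdms")      # placeholder for the attached file
    group = tdms_file["group1"]
    for name in ("channel1", "channel2", "channel3", "channel4"):
        channel = group[name]
        print(name, len(channel), channel[:5])  # npTDMS reports 400 values per channel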

 

Interestingly, the description of the TDMS file internals linked above, which I based my implementation on, is exactly the document I get when I follow the "support document" link ("file format version") in the LabVIEW help for the TDMS Open function. So it seems to be official...

 

I also attached a log file (read.log) generated by my application. It might help in understanding my interpretation of the data; the buffer dumps can most likely be ignored, as they are incomplete anyway.

 

Also, just for the sake of completeness: I am not referring to https://forums.ni.com/t5/Example-Code/Read-TDMS-Channel-Data-in-Chunks-Using-LabVIEW/ta-p/3500008 , which is about chunked reading. These are two different things: the chunking I am talking about happens deliberately when writing a file and fixes the chunk size, whereas the chunked-reading issue described and solved in that link is about reading variable-sized chunks of data to reduce memory usage while reading.

 

So what am I missing here?

 

Any help is appreciated - Thank you!

 

Sebastian

Message 1 of 6

Hi Sebastian,

 


@SebastianBuettner wrote:

If I am storing the data interleaved (attached test file):

  • LabVIEW thinks each channel contains only 300 values (NI_ChannelLength). The extracted data shows that the second chunk in the third segment is ignored. If I append another full segment to the file, the same behavior can be observed: the second chunk of the third segment is ignored, and the chunk of raw data provided by the next full segment is read properly, as if there were no data in between at all.
  • DIAdem errors out with many "unexpected EOF" messages and reports the same channel length. The first chunk of data is read correctly, but from there on the results are very strange: beginning with the second segment/chunk, all channels read as 0, except for the first value of each chunk (values 101 and 201).
  • DMC SuperViewer shows the same results as LabVIEW, which is not much of a surprise considering it is built on top of LabVIEW.
  • npTDMS (a third-party Python library for reading and writing TDMS) reports the expected channel length of 400 values and the correct data, regardless of whether the data is stored interleaved or not.

If the data in each chunk is stored non-interleaved, the applications I tested (except for npTDMS, which gives the expected results) also show only 300 values per channel, but are at least consistent about the read-out of those 300 values (they just miss the chunk appended to segment 3).


The TDMS file in your message contains 400 samples per channel, with the largest value being 1599 (0x0000063F), as can be seen in a hex file viewer. The file size also gives a hint: it is expected to be at least 4 channels × 400 samples × 4 bytes/sample = 6400 bytes, plus the TDMS file headers…

 

The DIAdem message gives a clue: did you write and close this TDMS file properly? There seems to be an error in your file that prevents LabVIEW/DIAdem/DMC from reading it completely. npTDMS apparently doesn't care about some internal file format errors…

Best regards,
GerdW


using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019
Message 2 of 6

Hi and thanks for your reply!

 

So to clarify: I used my own implementation of TDMS to write that file. My intent at this point is to verify my implementation, but unfortunately it seems that either DIAdem and LabVIEW do not support the "chunking" described in the link above, or my implementation contains an error I have not spotted yet.

 

I was hoping someone could confirm my interpretation of the "spec" (or alternatively point out my error) and either reproduce the problem or validate my file somehow...

Either way I would end up with a clue as to what my error is...

 

Best regards,

Sebastian

Message 3 of 6

@SebastianBuettner wrote:

Hi and thanks for your reply!

 

So to clarify: I used my own implementation of TDMS to write that file. My intent at this point is to verify my implementation, but unfortunately it seems that either DIAdem and LabVIEW do not support the "chunking" described in the link above, or my implementation contains an error I have not spotted yet.

 

I was hoping someone could confirm my interpretation of the "spec" (or alternatively point out my error) and either reproduce the problem or validate my file somehow...

Either way I would end up with a clue as to what my error is...

 

Best regards,

Sebastian


Well, it is difficult to spot the error in the code that you did not attach.

Message 4 of 6

That's true.

 

This is an attempt to debug the issue step by step. As a first step I was hoping for either a confirmation of, or a correction to, the interpretation I described (by example) in the initial post. If the assumption I made there is wrong, there is no need to discuss the code.

 

If it ends up being confirmed (or at least if people think the interpretation is correct), that would be a good point to dig into the code. However, just knowing whether the interpretation is correct would be a helpful start.

 

Thanks and kind regards,

Sebastian

Message 5 of 6
Solution
Accepted by topic author SebastianBuettner

Apparently I found the issue. I would guess it is actually a bug in NI's reference implementation, or at least a discrepancy between the documentation of the format and NI's implementation.

 

In case someone finds this useful:

You can append data in chunks as described in the document quoted in the initial post, but unfortunately not always. You can only do so if the segment you are appending to contains an object list. That object list may be empty (then obviously with the kTocNewObjList flag unset), but it has to be present (kTocMetaData must be set). Only then does the NI implementation accept multiple chunks within that segment (and thus chunks appended to that segment).

 

Chunks appended to segments that do not contain metadata are ignored. So, for example, if I take the same file from the initial post but set kTocMetaData for the last segment and write an object count of 0, NI software will happily accept the appended chunk. For the sake of completeness: writing a new object list, or appending objects to an existing list, works as well. To be precise: it works as long as the segment contains a metadata object list, even if that list is empty.
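For anyone implementing this, here is a minimal Python sketch of how such a "raw data only" segment with an empty object list can be assembled, based on my reading of the format document (the helper name and the dummy values are made up for illustration):

    # Last segment of the fixed file: kTocMetaData set, object count 0,
    # followed by two chunks of interleaved uint32 raw data.
    import struct

    kTocMetaData        = 1 << 1
    kTocRawData         = 1 << 3
    kTocInterleavedData = 1 << 5

    def last_segment(chunks):
        """chunks: list of chunks, each a list of [ch1, ch2, ch3, ch4] sample rows."""
        meta_data = struct.pack("<I", 0)             # empty object list: 0 objects
        raw_data = b"".join(struct.pack("<4I", *row)
                            for chunk in chunks for row in chunk)   # interleaved layout
        toc = kTocMetaData | kTocRawData | kTocInterleavedData      # note: no kTocNewObjList
        lead_in = struct.pack("<4sIIQQ",
                              b"TDSm",
                              toc,
                              4713,                                 # TDMS 2.0 version number
                              len(meta_data) + len(raw_data),       # next segment offset
                              len(meta_data))                       # raw data offset
        return lead_in + meta_data + raw_data

    # Two dummy chunks of 100 samples for channel1..channel4
    # (illustrative values only, not the exact data from my test file):
    chunk_a = [[n, n, n, n] for n in range(100)]
    chunk_b = [[n, n, n, n] for n in range(100, 200)]
    segment_bytes = last_segment([chunk_a, chunk_b])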

 

I attached the file for reference.

 

Thanks and kind regards,

Sebastian

Message 6 of 6