Example Code

Read and Write array of Cluster to TDMS - Get Set Cluster and Enum Properties using VIM

Products and Environment

This section reflects the products and operating system used to create the example.

To download NI software, including the products shown below, visit ni.com/downloads.

    Software

  • LabVIEW

Code and Documents

Attachment

Description

Overview

This example program shows how to write any array of clusters to a TDMS channel, and then shows how to read it back.  With the advent of VIMs, the ability to have an input or output change based on an input makes this task much easier.  Also included is a set of VIMs for setting and getting Enum and Cluster data types as properties.

 

Read/Write Cluster Description

Attached are a 2015 version and a 2018 version.  The 2018 version uses VIMs and is a much cleaner way of using the type adaptation functionality.  In both examples the array of cluster data is flattened into its data and its type.  The type is written as a TDMS property, while the data is written as the channel data.  Reading it back reads the type along with the data, and then Variant To Data converts it back into the array of clusters that was first written.  This solution is much more flexible because, unlike other solutions, it doesn't rely on unbundling and bundling data based on the specific type of cluster you are reading or writing.  To update the data type, just update the type def.
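A rough sketch of the data/type split described above, written in Python because a LabVIEW block diagram cannot be shown as text.  The dictionary standing in for a TDMS file, the `Type` property name, and the use of a `struct` format string as the type descriptor are all assumptions made for illustration; the attached VIs use LabVIEW's own flatten primitives instead.

```python
import struct

# Hypothetical stand-in for a TDMS file: channel properties plus channel data.
tdms_file = {"properties": {}, "channel": []}

# Assumed record layout (illustration only): little-endian float64, int32, uint8.
RECORD_TYPE = "<diB"


def write_records(f, type_desc, records):
    """Store the type descriptor once as a property, and each record as raw data."""
    f["properties"]["Type"] = type_desc
    for rec in records:
        f["channel"].append(struct.pack(type_desc, *rec))


def read_records(f):
    """Read the stored type first, then use it to decode every record."""
    type_desc = f["properties"]["Type"]
    return [struct.unpack(type_desc, raw) for raw in f["channel"]]


records = [(1.5, 42, 7), (-2.25, -1, 255)]
write_records(tdms_file, RECORD_TYPE, records)
restored = read_records(tdms_file)
```

Because the reader takes the layout from the file rather than from its own source code, changing the record layout only means changing the one descriptor, which mirrors updating the type def.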

Main Write Read Example_BD.png

 

Known Issues and Limitations

Because of how data and properties are stored, the Write function must not be used to write a different cluster type to the same TDMS group and channel pair.  Similarly, the compression option used for a group and channel pair should not be changed.

 

Compatibility/Compression

Starting in version 3, the amount of padding needed in the data is greatly reduced, making file sizes smaller.  Version 5 improves platform portability, which matters when a file is made on one platform and then copied to another.  The Read function reads both the older and the newer versions automatically.  The Write function only writes Version 5, but Version 3 and later come with three different write modes.  The first uses no compression, the second compresses each cluster as it is written, and the third compresses the whole array of elements being written.  When using the last mode, a second TDMS channel is used to keep track of the number of elements written with each array.
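The three write modes can be sketched roughly as follows.  This is a Python illustration using `zlib`, not the toolkit's implementation: the mode names, the in-memory lists standing in for TDMS channels, and the 4-byte length prefix used to split a decompressed array are all assumptions.

```python
import zlib


def write_array(channel, counts, elements, mode):
    """Append one array write; `elements` is a list of bytes objects."""
    if mode == "none":
        channel.extend(elements)
    elif mode == "per_element":
        channel.extend(zlib.compress(e) for e in elements)
    elif mode == "per_array":
        # Length-prefix each element so the combined blob can be split again.
        blob = b"".join(len(e).to_bytes(4, "big") + e for e in elements)
        channel.append(zlib.compress(blob))
        counts.append(len(elements))  # second channel: elements per write
    else:
        raise ValueError(f"unknown mode: {mode}")


def read_all(channel, mode):
    """Return every element written to the channel, in order."""
    if mode == "none":
        return list(channel)
    if mode == "per_element":
        return [zlib.decompress(c) for c in channel]
    out = []
    for compressed in channel:
        blob = zlib.decompress(compressed)
        pos = 0
        while pos < len(blob):
            n = int.from_bytes(blob[pos:pos + 4], "big")
            out.append(blob[pos + 4:pos + 4 + n])
            pos += 4 + n
    return out


elements = [b"abc" * 10, b"abd" * 10, b"abc" * 10]
results = {}
for mode in ("none", "per_element", "per_array"):
    channel, counts = [], []
    write_array(channel, counts, elements, mode)
    results[mode] = (read_all(channel, mode), counts)
```

Compressing the whole array lets the compressor exploit repetition across elements, at the cost of needing the extra count channel.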

 

Version Mutation

Due to how metadata is stored in the TDMS file, there are restrictions on appending data to existing files.  If, for instance, a file was made and written with Version 3, and then Version 5 was used to append more data to the existing group and channel, the old data could not be read properly.  For this reason there is a check before writing data that no old data in an old format exists.  If there is, the write is aborted and an error is generated.  Included with Version 5 is Mutate Previous Cluster TDMS Version To Newest.vi.  This VI can read versions 2 through 4, convert the data to Version 5, and re-write it.  Developers only need this if they intend to append data to older files using the newer version.  Reading old data works just fine without any extra work.
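A minimal sketch of the pre-write check described above, in Python.  The `Version` property name, the exception type, and the dictionary/list stand-ins for a channel are hypothetical; the source does not say how the toolkit records the format version.

```python
CURRENT_VERSION = 5


class VersionMismatchError(Exception):
    """Raised when appending would mix two on-disk formats in one channel."""


def append_clusters(channel_props, channel_data, new_data):
    """Append only if the channel is empty or already in the current format."""
    existing = channel_props.get("Version")
    if channel_data and existing != CURRENT_VERSION:
        # Abort before touching the file so the old data stays readable.
        raise VersionMismatchError(
            f"channel holds version {existing} data; mutate it before appending"
        )
    channel_props["Version"] = CURRENT_VERSION
    channel_data.extend(new_data)
```

Checking before writing, rather than after, means a failed append leaves the existing file untouched.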

 

LabVIEW Version Compatibility

Because this tool uses the LabVIEW primitives for flattening and unflattening strings, it is restricted by the features of those tools.  One such restriction is that clusters that are written in a newer version of LabVIEW, might not be able to be opened in older versions of LabVIEW.  An attempt to read it may return error 122 (0x7A).

 

Hardware and Software Requirements

The file name indicates the required LabVIEW version.

OpenG Zip Library

 

Read/Write Cluster Steps to Implement or Execute Code

Unzip the attached zip, and open and run Main Write Read Example.vi and follow the instructions on the front panel which mention running the VI, changing the cluster, and running it again.  Each run will write to a temporary TDMS file, and then read from it.

 

Get/Set Enum Cluster Property Description

VIMs for reading and writing Enums and Clusters as properties are also included.  The Enums will be set as strings, while the Clusters will have their data and type flattened to strings and written to two separate properties.  This allows for reading the cluster data as a Variant if the data type of the cluster is unknown.

photo.jpg

 

History 

Version 2: Added compression and reduced the padding needed.

Version 3: Bug fix to address padding type.

Version 5: Platform portability was added, allowing files to be copied between targets and have the data read properly.

Other Resources

This source is part of the Tremendous TDMS toolkit posted over on VIPM.IO.  That package contains dependency information to other packages, as well as has several useful features not part of this source.

A discussion is over on LAVAG.

 

 

 

Example code from the Example Code Exchange in the NI Community is licensed with the MIT license.

Comments
brentjustice
Member
Member
on

Writing flattened string data to TDMS as a tab delimited U8 array is a neat trick.  I just now needed to do this, and was discovering that TDMS will reject non-standard characters.  Thanks Hooovahh!

And very cool VIMs

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Is that how I did this?  When you say it out loud it does sound inefficient.  Instead of using %d and then formatting it into a flattened string with tabs, I think a better way would be to format it with %02x.  This would represent each U8 as a hex value.  Since each U8 will then always be exactly two ASCII characters, the tab wouldn't be needed as a delimiter.  Anyway glad you like it, I use this often, but one thing I recently found is that the Excel TDMS add-on version 18.0 doesn't like large amounts of values in a property and often crashes.  17.0 is fine and other viewers are fine.
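The difference between the two text encodings discussed here can be shown in a few lines of Python.  This is only an illustration of the idea; the sample bytes are made up, and the LabVIEW code uses format specifiers rather than these calls.

```python
data = bytes([0, 7, 65, 255])

# Variable-width decimal needs a delimiter to tell the values apart.
tab_decimal = "\t".join(str(b) for b in data)

# Fixed-width hex (two characters per byte) needs no delimiter.
hex_text = data.hex()

# Both decode back to the original bytes.
from_decimal = bytes(int(tok) for tok in tab_decimal.split("\t"))
from_hex = bytes.fromhex(hex_text)
```

Here `hex_text` is `"000741ff"`: every byte occupies exactly two printable characters, so the reader can split the string by position alone.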

brentjustice
Member
Member
on

Hooovahh, I do have a question for you:

is there any reason that you chose to use the "Flatten/unflatten string to variant" functions?  Perhaps I'm missing something, but it seems that it would be more advantageous to use the "flatten/unflatten to string" functions since these functions do not require the "Type" input.

 

I created the following VI:

Flatten_debate.png

 

As expected, string 2 is not equal to the other flattened strings.  This is explained by the following article.  (Something I just learned.)  As such, it does indeed seem better to use the variant to flattened string function:

https://knowledge.ni.com/KnowledgeArticleDetails?id=kA00Z0000019RPPSA2

 

With respect to unflattening the data though, it seems that I can use either the "unflatten string to variant" or the "unflatten from string" functions.  Both seem to work just fine.  However, the unflatten to variant solution requires that I store the "Type" property within the tdms file properly - which isn't terribly much work, but it is an extra step.  Is there a downside to simply using the "unflatten from string" function?

brentjustice
Member
Member
on

Follow up to your initial response (apologies for double-posting)

I made the following VI:

flatten2.png

I tested your "better way" by flattening to %02x.  At first glance, this seems to work like a charm!  This eliminates the need for a delimiter.  Great suggestion.

 

This VI also demonstrates how I think that I can get by without having to write to the "Type" property in the tdms file.  Is it potentially a bad idea for me to be using the "unflatten from string" function here?  It seems to work well for me.

 

Lastly, thanks for linking your CAR topic.  In my use-case here, I don't need to write any tdms properties... only string data.  (Assuming that I don't update the "Type" property.)  So, it sounds like I'm safe from the issue that you were seeing.

 

Thoughts?  Thanks Hooovahh!

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Okay so in making this I discovered two things, one of which you likely have already discovered.  When flattening to a string, it needs to be a valid real string.  By that I mean it can't be made up of \00 or any other nonprintable characters.  For this reason whenever the type or data is being flattened, it needs to become something that contains only printable characters.  The TDMS write will truncate a string at the first null and so the data won't be written properly.  This is why in my case the type, and data, are both turned into an array of bytes, which then becomes ASCII string.  If the flatten, or unflatten to string allows for creating a string that only contains printable characters (which I think it does) then that solution could be used too.

 

The second thing I wanted to do is allow for storing the type of the data in the file.  Ideally all you need to store is the Data as you've discovered, and Type isn't required in most situations.  This is because the Type is provided by the cluster input on the read function.  However, let's think of a scenario where we write some software that stores a cluster in a TDMS file.  Then years later we update that cluster in our software.  Now if we try to perform a read with the updated cluster on the older file format, we will get an error as expected.  However, if we are only storing the Data and not the Type, then even LabVIEW can't understand the data format in the file.  If you store the Type and Data in the file together, then you can at least pull the data out of the file as a Variant.  Then looking at the Variant you can display the data, or recreate the cluster needed later.

 

Basically this allows the data to be recovered if the type is unknown, which your logging type won't allow since it relies on the type to be provided, and requires it to be correct.  That's why I store the Type as a property of the TDMS file once, and then the data is written into the channel.

brentjustice
Member
Member
on

Got it!  I definitely see the merit in storing the "Type" now that you have provided the data mutation example.

 

Everything here makes good sense now.  I've gone with the %02x string format solution and it's working pretty well.  This will always yield printable characters, so I should be safe here with respect to writing to tdms.

 

Thanks Hooovahh!

brentjustice
Member
Member
on

I recognize that this thread is 3 years old now, but I just revisited this topic and made a silly discovery.  The array to/from spreadsheet primitives don't accept "empty string constant" as a valid delimiter, and will override to using a tab delimiter.  I didn't catch this 3 years ago.  (And, actually, the documentation for these primitives don't mention this behavior, which is lame.)  The code all still works fine, but my string-based tdms files have been getting unnecessary tab delimiters, bloating the files by... 33% or so I would assume.

I've adjusted things with the following code:

to hex

from hex
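The size cost of the stray tab delimiters mentioned in this comment can be checked with a quick calculation.  This Python sketch uses a made-up 1024-byte payload; the exact figures depend only on the payload length.

```python
payload = bytes(range(256)) * 4  # 1024 arbitrary bytes

with_tabs = "\t".join(f"{b:02x}" for b in payload)  # two hex chars + tab per byte
without_tabs = payload.hex()                        # two hex chars per byte

# Fraction of the tab-delimited text that is made up of tabs.
tab_share = (len(with_tabs) - len(without_tabs)) / len(with_tabs)

# How much larger the tab-delimited text is than the undelimited text.
growth = len(with_tabs) / len(without_tabs) - 1
```

For long payloads the tabs make up about one third of the delimited text, which is the same as saying the delimited text is about 50% larger than it needs to be.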

brentjustice
Member
Member
on

I just now realized that I updated this thread exactly 3 years later to the day.  winning

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Hey no worries, added context and bug discovery are great, even if it is for one person years later.  Revisiting this topic to come up with a solution that pads the data by the smallest amount would probably be a great idea.  I've tried coming up with other solutions that would make this work better, but one issue I've run into is that if I change the format, then the files I already have won't be able to be opened without some kind of mutation preservation code.  I think I can probably come up with some dummy property on a channel that would tell it whether it is reading the old format or a newer one.  I think I could maybe just add escape codes for the unprintable characters, but then there is a performance hit for reading and writing.  It really is a balance that should probably be up to other developers since there isn't a clear best solution for the read.  The write can probably just be a lookup table regardless of the format.

 

EDIT: Actually now that I'm testing this, only the \00 null character doesn't work, this might simplify things.

brentjustice
Member
Member
on

Oh, interesting.

Per the "TDMS Write Function" documentation:

Alphanumeric strings that do not contain null characters

 

I heavily assumed that this meant only [A-Za-z0-9]

If you're right about the null character being the only disallowed character, then that could open the door for just escaping that single character.

 

I ran into the same issue as you.  I have many many files using the old tab delimited hex data format.

I solved this mutation issue in exactly the same manner that you suggested:

  • Add "Version" property name to TDMS file.  Datatype  = I32
  • If "Version" property is not found, assume Version = 1 (which would be the delimited hex data format files)
  • New file format type uses "Version = 2"
  • I added a subvi for performing version mutations
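The version scheme in the list above can be sketched in Python as follows.  The `Version` property name and the "missing means 1" rule come from the list; the decoders are assumptions, since the comment does not spell out what the version 2 format looks like (undelimited hex is used here purely for illustration).

```python
def detect_version(properties):
    """A file without a Version property predates versioning, so treat it as 1."""
    return properties.get("Version", 1)


def decode_v1(text):
    """Version 1: tab-delimited hex bytes."""
    return bytes(int(tok, 16) for tok in text.split("\t")) if text else b""


def decode_v2(text):
    """Version 2 (assumed for this sketch): undelimited hex bytes."""
    return bytes.fromhex(text)


DECODERS = {1: decode_v1, 2: decode_v2}


def mutate_to_v2(properties, text):
    """Decode with whichever version the file declares, then re-encode as version 2."""
    raw = DECODERS[detect_version(properties)](text)
    return {**properties, "Version": 2}, raw.hex()
```

Treating an absent property as the oldest version means files written before the scheme existed need no changes to remain readable.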
brentjustice
Member
Member
on

oh wow, you might actually be right.  This snippet here returns TRUE and NO ERROR... soo it seems like the null character is the only disallowed character:

tdms string.png

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Yeah thanks, I'm working on some improvements, hopefully I'll post it soon.  I've also added a couple of compression options which will help even more with file size.

brentjustice
Member
Member
on

Ughh, it doesn't look like this forum allows for me to upload files, or else I'd zip this up and upload it.  Maybe we should move this to LAVA?

Anyways, here's a bunch of snippets.

 

After confirming that the null character appears to be the only disallowed character, I wrote some escape/un-escape code.

Disclaimer!: I am not an escape artist.  (heh.)  So if I'm doing something silly, let me know.

 

But, basically, here was my strategy:

During escape, replace

\ --> \\

\00 --> \0

 

During un-escape, replace everything back to the correct characters.

Un-escape was harder code since I had to walk through the string linearly and keep track of offset.  A dumb replace-all would not work for un-escape.
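A Python sketch of the escape strategy described in this comment: a backslash becomes two backslashes, and a null byte becomes a backslash followed by the character 0.  This is an illustration of the idea rather than a transcription of the posted snippets, and the function names are made up.

```python
def escape(data: bytes) -> bytes:
    """Remove every null byte. Backslashes must be doubled first."""
    return data.replace(b"\\", b"\\\\").replace(b"\x00", b"\\0")


def unescape(data: bytes) -> bytes:
    """Walk the bytes once, consuming each two-byte escape as a unit."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x5C:  # backslash starts a two-byte escape
            if i + 1 >= len(data):
                raise ValueError("dangling escape at end of data")
            nxt = data[i + 1]
            if nxt == 0x5C:
                out.append(0x5C)  # escaped backslash
            elif nxt == 0x30:
                out.append(0x00)  # escaped null
            else:
                raise ValueError(f"unknown escape at offset {i}")
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

The linear walk is what makes un-escape safe: a literal backslash followed by `0` is stored as backslash, backslash, `0`, and a blind replace of backslash-`0` would wrongly match the last two of those three bytes.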

 

To the best of my knowledge and testing, it looks like this code here should work correctly.  (And, hopefully, be decently performant.)

 

Thoughts?  Thanks!

 

String TDMS.png

Escape tdms.png

En-Escape tdms.png

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Yes everything you are suggesting is basically what I developed yesterday.  I did replace the hex 0x00 with the ascii "\\00" adding 3 bytes of padding.  Still it is only for when there is a 0x00 byte which isn't all that often.  I won't be using the search and replace method because I believe it will be less efficient, but I'll do some speed tests to see what method is best for the pad and unpad functions.  So far the fastest pad method is to convert the array of bytes to a string, then using a for loop find all 0x00s, then using a second loop use the Replace Substring adding the padding. 

 

On top of that I have a compression method using zlib inflate and deflate.  This works by finding patterns in the bits and replacing them with place holders.

 

But that got me thinking.  If I am writing an array of a cluster, then there are probably lots of patterns in the data for the data types that haven't changed, and compression would work even better.  So for that reason my next release will have 3 modes.  No compression (just add 0x00 byte padding as needed) compression on each scalar, and compression on each array that is written.  This will require another channel to track how many elements are in each array write.  This information is a bit redundant because this information can be found in the written array data, but if I want to read the array subset starting at element 100 for instance, it would require reading all the data up to that point and uncompress each write.
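The role of the extra count channel can be sketched in Python.  Given the number of elements in each compressed write, a reader can work out which writes to decompress for a requested subset without touching the earlier ones.  The function name and the sample counts are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def blocks_for_range(counts, start, length):
    """Return (first_block, last_block, offset_in_first_block) for a subset read."""
    ends = list(accumulate(counts))  # cumulative element count after each write
    first = bisect_right(ends, start)               # write containing element `start`
    last = bisect_right(ends, start + length - 1)   # write containing the last element
    offset = start - (ends[first - 1] if first else 0)
    return first, last, offset


# Three array writes of 50, 60, and 40 elements; read 20 elements from index 100.
first, last, offset = blocks_for_range([50, 60, 40], 100, 20)
```

In this example only the second and third writes need decompressing, and the requested data starts 50 elements into the second one.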

brentjustice
Member
Member
on

You had my curiosity, but now you have my attention.

Looking forward to your code

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Okay you likely already got pinged, but I updated the code to have a version 2 working the way I described.  I tested it in several different ways, but please be sure and let me know if you find any bugs.  The older no compression mode hasn't changed much from the first version and I have lots of confidence in it.  The second compression method had lots of goofy math to get the reading of subsets of data to work and while I believe it is right, I could have made a mistake.

brentjustice
Member
Member
on

Wait, I'm not sure that I'm completely following your null character escape strategy.

What happens if the raw data string literally has "\\00" in it?

It seems like this will get converted into a null character on the un-escape method
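The concern raised here can be reproduced with a small Python sketch.  It assumes the scheme under discussion replaces each null byte with the four bytes 0x5C 0x5C 0x30 0x30 and does nothing else; the function names are made up.

```python
MARKER = b"\\\\00"  # the four bytes 0x5C 0x5C 0x30 0x30


def naive_escape(data: bytes) -> bytes:
    """Replace nulls with the marker, without escaping the marker itself."""
    return data.replace(b"\x00", MARKER)


def naive_unescape(data: bytes) -> bytes:
    """Turn every marker back into a null."""
    return data.replace(MARKER, b"\x00")


# Ordinary data with a null survives the round trip.
ok = naive_unescape(naive_escape(b"a\x00b"))

# Data that already contains the marker bytes does not.
raw = b"path" + MARKER + b"end"
round_tripped = naive_unescape(naive_escape(raw))
```

After the round trip, the literal marker in `raw` has been turned into a null byte, so the data read back differs from the data written.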

brentjustice
Member
Member
on

example.  This code fails using your escape logic.

Thoughts?  Am I misunderstanding something?

My previously attached snippets handle this situation correctly

 

hooovahh escape.png

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Are you suggesting that the escape strategy is fine, but the characters I used aren't ideal?  I ask because your example also has a similar flaw.  What if the literal raw string contained 0x5C 30?  Then with your strategy it will also replace it with 0x00 when you go to unescape it.  If there is a literal string that contains my escape data of 0x5C 5C 30 30, then it will also contain your escape data of 0x5C 30.  I figured the likelihood of the actual data accidentally containing the 4 bytes I look for in that order would be quite low using random data.  However it is more likely to come up since I used printable characters.

 

If that is the case then what 4 characters should never, or rarely be ever seen next to each other?  Probably stuff in the Extended codes greater than 128?  Or a series of random unprintables at less than 32?

 

Edit: Oh I see you do two replaces, give me some time to try stuff out, I think your method would be better but does have a larger performance hit.

brentjustice
Member
Member
on

yep!  Unless I missed a bug, my example should work for all string values.

I agree that your example is likely to work 99.99999% of the time, but I'd still feel weird about implementing a non bullet-proof solution.

You are probably correct about the performance hit.

I'd be curious if my solution could be made more performant by converting the string to a byte array and then performing inline array operations.

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Okay version 3 posted which uses your escape code method.  I have several speed tests and unit test VIs, but I didn't include them in the zip just to reduce all the random VIs and optional inputs.

brentjustice
Member
Member
on

This needs to be a u8 I think.

brentjustice_0-1630515382552.png

 

Looks great!!  The optimization with a byte array vs my stringified method is cool

 

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Oh yeah good call.  I think there is still room for more improvement, but it is much better than it was.  I also rolled these changes into the VIPM package posted over here, but it takes some time before the new version is approved.

brentjustice
Member
Member
on

I did some performance testing for my own purposes.

I generated random strings with lengths from 0 to 10000.

brentjustice_0-1631292215920.png

 

Where:

stringified escape:

brentjustice_1-1631292252266.png

byte array escape:

brentjustice_2-1631292270973.png

stringified un-escape:

brentjustice_3-1631292290757.png

byte array un-escape:

brentjustice_4-1631292305159.png

 

cheers

 

 

 

 

 

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Thanks, this mostly confirms what I had seen.  I have some test VIs for performance but didn't include them in the last post.  I wasn't as thorough as you, and I just went with what looked the best without spending too much time on it.

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Ugh, okay new issue.  While the TDMS file can read and write 255 bytes just fine (with null needing special attention), when moving the files between targets, other OSs handle the extended ascii differently.  So that means, for compatibility reasons, I may need to have special cases for 0x00, and for 0x80 through 0xFF.  I have a pretty crappy but seemingly working padding routine.

 

Byte Array to String No Null_BD.png

 

String No Null to Byte Array_BD.png
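One possible way to special-case 0x00 and 0x80 through 0xFF, as this comment proposes, is sketched below in Python.  This is not the routine shown in the images above: the backslash-plus-two-hex-digits encoding and the function names are assumptions chosen for illustration.

```python
ESC = 0x5C  # backslash, used here as the escape byte (an assumption)


def to_portable(data: bytes) -> bytes:
    """Encode so the result contains only bytes 0x01 through 0x7F."""
    out = bytearray()
    for b in data:
        if b == 0x00 or b >= 0x80 or b == ESC:
            out += b"\\%02x" % b  # escape byte followed by two hex digits
        else:
            out.append(b)
    return bytes(out)


def from_portable(text: bytes) -> bytes:
    """Reverse to_portable by reading two hex digits after each escape byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == ESC:
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return bytes(out)
```

Escaped bytes cost three characters each, so this trades file size for an encoding that stays within 7-bit ASCII and should not depend on how a platform interprets extended characters.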

brentjustice
Member
Member
on

oops, sorry, just now seeing this.

"other OSs handle the extended ascii differently"
What OSs are we discussing here?  LV-RT?  Mac?

And how do these OSs handle extended ascii differently in a way that affects TDMS tools?  I would find that to be rather shocking.

 

Good find.

My own use-case is entirely Windows-based, so I should be okay without the extended padding-set, but this is kinda nuts

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Well I made a post here, but forum support from NI hasn't been great lately.  I would make a TDMS file on the Windows side, and write to a channel as a string.  Then I would FTP the file over to a Linux RT target.  Then when I would read those TDMS string channels back, the data wouldn't match the data as it was written on the Windows side.  If you viewed the string data as a string it would replace characters with "?" for many in the extended ascii range.  I was just using the string data type as a place holder for an array of bytes, but the TDMS API appears to be manipulating it for displaying the string.  I just bumped the thread again, but I expect NI to just say it is working as intended, and in that case more creative escaping is needed.

brentjustice
Member
Member
on

Thanks for the link to your post.  I'll see if I can maybe poke some NI folks, I'd love to get some better resolution on this.

brentjustice
Member
Member
on

@hooovahh,

I've posted a few new comments on that other post that you linked.

This turned into a high priority task for me, so I dug a bit deeper and pushed out a padding strategy that should be a tad more optimized, if only by a little.

 

What a silly issue

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Okay version 5 is now posted with the help from brentjustice.  Because I started using version 4 internally, I made several files with version 4 numbering, but never posted it publicly.  So now I posted a new version 5 that is able to read all these older versions, but only writes version 5.  I added some mutation code that I hope you don't have to use but I will be so I thought I'd share it anyway.

brentjustice
Member
Member
on

@hooovahh,

Just fyi, I recently managed to get approval to open-source some code that utilizes the string escape strategies discussed here.

 

In fact, there is a standalone VIPM library that I created just for string escape.  I don't expect this to be useful to you, but I wanted to ping this thread with a link to this library for completeness:

https://www.vipm.io/package/blue_origin_lib_bluestringescape_opensource/

Hooovahh
Proven Zealot Proven Zealot
Proven Zealot
on

Geez you've been busy. You're making the rest of us look bad.
