03-30-2022 03:40 AM
Hi,
I'm currently faced with a small design problem that I'd like your input on. For our application, software used for testing audio equipment on production lines and in R&D labs, I need to pick a file format to save our data to. I don't want to set a lot of limits on what the data could look like, but for now it's a series of user metadata (e.g. operator, SN, PN, firmware version, etc.) plus a bunch of waveforms, XY traces, and results coming out of our test executor. These can all be nested in various groups.
I'm trying to optimize 2 things.
First is data interchangeability within our customers' test setups, which are comprised of our software and others as well. This means an easy way of parsing our files, or of getting the information out of our software and into theirs. In principle that means existing support for parsing these files in other languages, since we don't have the resources to write those parsers ourselves. The waveforms can go up to tens of MB, so not too large, and at this point ease of data exchange seems a higher priority than minimizing storage space. We also currently differentiate between waveforms stored in this binary format and other XY pairs that we might output from various analysis steps, which will always be in the range of a few hundred points.
Secondly, we are getting requests to save to various tools like Tableau, WATS, etc., so this file, and the API I'll have to design for our software to write it, will have to minimize the work required to import files directly into these tools.
A good API into our software could do the job, but basically we need our customers to do as little programming as possible. We already save to a database and to various file formats (txt, xlsx, wav, and a binary one as well), but they all have some problems associated with them. I'm deliberately not saying anything about what I'm leaning towards, so as not to bias this thread from the very beginning.
Any thoughts on this are welcome!
Thank you,
Lucian Grec
03-30-2022 03:45 AM
03-30-2022 10:04 AM
I work with OP. NI TDM doesn't seem to have wide industry adoption. It seems like it's just DIAdem plus an Excel converter. Some of our customers use WATS, some of them use Tableau, and some of them make their own in-house parsers with Python. So ideally we'd want a fairly general file format that works with, or could easily work with, all of those.
We're posting here mostly in case someone knows of a widespread industry standard measurement data format that works with common analysis packages.
03-30-2022 11:12 AM
Have you considered HDF5? I know a colleague who settled on that after some similar consideration for a well-defined format with reasonably wide support. There's a VIPM package available.
-Kevin P
(Summary of my reasons in this post, part of a voluminous thread of mostly complaints starting here).
03-30-2022 11:52 AM - edited 03-30-2022 11:53 AM
@lucian.grec wrote:First is data interchangeability within our customers' test setup, comprised of our software and others as well.
What is "others"?
If these are third party programs (not NI based), do some research and see if there is a common standard file type that is understood by all.
Then study the full published file specification and implement it in LabVIEW. You have full control over every last byte in any file you write. 😄
03-30-2022 11:54 AM
I did consider HDF5, but I kind of excluded any binary formats at this point, as they would make inspecting the data harder for the non-programmer/science types. Keep in mind we are not doing heavy streaming to disk, nor do we generate TB-sized files, so from my point of view the advantages of a file like HDF5 or TDMS or any other binary file don't outweigh the disadvantages for our users.
We're trying to move away from CSV, which at the end of the day is pretty easy to view and parse. I wouldn't want them to have to understand yet another complex format with its own API. I want to improve hierarchical data support in the file, which CSV doesn't lend itself to very well due to its tabular structure.
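To illustrate the mismatch between nested groups and a tabular file, here is a minimal Python sketch. It assumes a made-up result record (the field names `operator`, `sn`, `thd_test` and so on are hypothetical, not our actual schema) and flattens it into dotted column names so it can be written as a single CSV row. It is only meant to show the kind of workaround a tabular format forces, not a proposed design.

```python
import csv
import io


def flatten(node, prefix=""):
    """Collapse nested dicts into one flat dict with dotted keys."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical nested test result, for illustration only.
result = {
    "operator": "jdoe",
    "sn": "SN-0001",
    "thd_test": {
        "limits": {"max_percent": 0.5},
        "measured_percent": 0.12,
        "passed": True,
    },
}

row = flatten(result)

# Write the flattened record as a one-row CSV held in memory.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

The hierarchy survives only as a naming convention in the header, and every reader has to know that convention to rebuild the groups.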
03-30-2022 12:21 PM - edited 03-30-2022 12:23 PM
That's a good question that I can't answer at this point. Very likely not NI.
I might've also been a bit loose with the nomenclature. I am not out to write my own worse-than-HDF5 file specification 😄 but rather to choose between various text file types, e.g. JSON, XML, CSV and the like, that are ubiquitous nowadays. I am thinking JSON is easily extensible with other fields, can be nested as much as I please, takes almost no effort for a Python user to load into a dictionary, and won't perform worse (in terms of file size) than a CSV file, since that was already ASCII to start with. On top of that, it should be very easy to read if prettified.
We could base64-encode our waveforms for a ~33% hit in file size compared to binary, but that really doesn't seem to be much of a concern.
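As a rough sketch of what I mean, using only the Python standard library: the layout below (keys such as `groups`, `data_b64`, `dtype`) is entirely hypothetical and not a settled schema, and little-endian float32 is just an assumed sample format. It shows metadata and nested groups as plain JSON, with one waveform packed to bytes and base64-encoded, then read back.

```python
import base64
import json
import struct


def encode_waveform(samples, sample_rate_hz):
    """Pack samples as little-endian float32 and base64-encode the bytes."""
    raw = struct.pack(f"<{len(samples)}f", *samples)
    return {
        "sample_rate_hz": sample_rate_hz,
        "dtype": "float32-le",  # assumed sample format, stated in the file
        "n_samples": len(samples),
        "data_b64": base64.b64encode(raw).decode("ascii"),
    }


def decode_waveform(entry):
    """Reverse encode_waveform: base64 text back to a list of floats."""
    raw = base64.b64decode(entry["data_b64"])
    return list(struct.unpack(f"<{entry['n_samples']}f", raw))


# Hypothetical record: metadata plus one nested group.
waveform = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
record = {
    "metadata": {"operator": "jdoe", "serial_number": "SN-0001"},
    "groups": {
        "thd_test": {
            "waveforms": {"stimulus": encode_waveform(waveform, 48000)},
            "xy_traces": {"thd_vs_freq": {"x": [100, 1000], "y": [0.01, 0.02]}},
        }
    },
}

text = json.dumps(record, indent=2)  # prettified, human-readable
loaded = json.loads(text)            # a plain dict on the Python side
stimulus = loaded["groups"]["thd_test"]["waveforms"]["stimulus"]
restored = decode_waveform(stimulus)
```

Base64 emits 4 characters for every 3 bytes, so the 32 bytes of samples here become 44 characters of text. The small XY traces stay as plain number arrays so they remain readable in a text editor.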
03-30-2022 12:32 PM
@lucian.grec wrote:
That's a good question that I can't answer at this point. Very likely not NI.
I might've also been a bit loose with the nomenclature. I am not out to write my own worse-than-HDF5 file specification 😄 but rather to choose between various text file types, e.g. JSON, XML, CSV and the like, that are ubiquitous nowadays. I am thinking JSON is easily extensible with other fields, can be nested as much as I please, takes almost no effort for a Python user to load into a dictionary, and won't perform worse (in terms of file size) than a CSV file, since that was already ASCII to start with. On top of that, it should be very easy to read if prettified.
We could base64-encode our waveforms for a ~33% hit in file size compared to binary, but that really doesn't seem to be much of a concern.
I was going to suggest JSON. I don't like XML.
03-30-2022 12:46 PM
@Mark_Yedinak wrote:I was going to suggest JSON. I don't like XML.
I am also not a fan of XML. JSON, in my opinion, is easier to read. So if you really need a hierarchical, ASCII/Unicode format, JSON would be the route I would go with.
03-30-2022 12:57 PM - edited 03-30-2022 01:16 PM
I would still recommend TDMS. The hierarchy is specifically designed for measurements and supports attributes and events.
JSON also has good support, but suffers from the DOM standard, and W3C won't / can't access it in a thread-safe manner.
SCOUT by SignalX is a reasonable TDMS Editor and viewer for rough eyeballing
npTDMS is well featured for your py gurus
The TDMS addon for Excel is out there everywhere
MATLAB support is around
Some quick Googling suggests Excel, MATLAB and Python are each supported in Tableau. I'll let y'all dig in for more.
And of course, you can always use the Native LabVIEW functions to do the analysis in a better language environment 😉