

Read from XML hangs

Solved!

I am using an LV 18.0 program to save reports, organized in a cluster with some sub-elements, into XML using Write to XML. I also want to load them back with LabVIEW. However, for larger reports the Read from XML (either the string or the array polymorphic variant) fails to load the data and ends up looping indefinitely, hung or frozen.

 

Is there any hidden or easily overlooked limit on the data size? Or am I missing something trivial?

 

Thanks in advance,

A.

 

PS: The XML data are zipped, as XML is not a valid attachment type on this board.

Message 1 of 11

Maybe it's chugging along and you have to wait for it to finish.  If NI is using the same code they use to parse XML files in general, it's going to be sloooooowwww...

Bill
CLD
(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.
Humble author of the CLAD Nugget.
Message 2 of 11

Maybe ... the smaller file (~500 KB) takes a fraction of a second, but the larger one (~8 MB) had not opened even after 10 minutes.

 

Is it possible there is such a massive, exponential increase? Not that waiting ten minutes to open a text file would feel user-friendly.

Message 3 of 11

It really has to do with how an XML file is parsed. XML is a tree-like structure: the parser has to descend into each branch, come back up to the next higher node, check whether there is another sibling, and so on. It's a very time- and memory-consuming process. XML is not recommended for big datasets, and you currently have a good example of why.
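Not LabVIEW, but as a rough illustration of that depth-first walk, here is a minimal Python sketch (the file name "report.xml" and its contents are assumed):

```python
import xml.etree.ElementTree as ET

def count_elements(element):
    # Descend into each child (branch), then come back to the parent
    # and move on to the next sibling -- every node is visited once.
    total = 1
    for child in element:
        total += count_elements(child)
    return total

tree = ET.parse("report.xml")   # the whole document ends up in memory first
print("elements visited:", count_elements(tree.getroot()))
```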

 

Ben64

Message 4 of 11

Find out which part is slow: Read from XML File.vi or Unflatten from XML.

 

I think you can do without the first...

 

Some unsolicited advice... Note that Unflatten from XML is very inflexible (while, ironically, XML is usually chosen for its flexibility). Any change to the cluster and the unflatten will fail, even if it's just an irrelevant addition. So evolving your program will render old files useless. I'd use the parsing functions to manually go over the XML and put it into LabVIEW structures. Then you'll have the option to fall back on defaults if items are missing from the XML, and you can simply ignore XML items that are obsolete.
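For the idea itself (independent of the LabVIEW XML VIs), a hedged Python sketch of "go over the XML manually, fall back on defaults, ignore obsolete items" might look like this; the element names and defaults are invented:

```python
import xml.etree.ElementTree as ET

SETTINGS_DEFAULTS = {"timeout_s": 10.0, "operator": ""}   # hypothetical settings items

def read_settings(path):
    root = ET.parse(path).getroot()
    settings = dict(SETTINGS_DEFAULTS)          # missing items keep their defaults
    node = root.find("settings")
    if node is not None:
        for child in node:
            if child.tag in settings:           # obsolete/unknown items are simply ignored
                settings[child.tag] = type(SETTINGS_DEFAULTS[child.tag])(child.text)
    return settings
```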

Message 5 of 11

Thank you for your ideas, especially the unsolicited ones; I would be keen to hear more of those, so please see the end of my post.

 

I was aware that XML is not the best option for storing large datasets, but the fact that a few thousand entries is an unprocessable amount was quite surprising. Just for the record, the loading time appears to rise roughly exponentially: 100 entries are fine, 1000 entries take several seconds, and 5000 fail.

 

It is the Read from XML that is slow. It seems the workaround mentioned by wiebe@CARYA might do, but the inflexibility has discouraged me from the whole XML thing entirely.

 

So in a wider context: I have a heterogeneous data structure that basically contains a single cluster (with the occasional subcluster; it holds some "settings") and an array of clusters ("data"). Originally, I was attracted by the easy "persistence", i.e. the ability to save and load the entire structure at once. So please, do you have any ideas how to achieve such a simple store/load approach that (1) would not require traversing the structure entry by entry, (2) would cope with eventual evolution/alterations of the structure (let's say adding items only) and (3) would optimally store the data in a single file?

 

EDIT: I do both the saving and the loading within a single set of programs (that share the functionality); for now I am not concerned about the data already archived.

Message 6 of 11

I would look into a (MySQL) database. DBs are optimized for 'random access'.

 

The trick DBs pull is to use indexing. The indices are loaded, and for access the index can be used to get a pointer to the data. This avoids parsing, and maybe even loading, the entire file.
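The same point in a few lines, using SQLite instead of MySQL just to keep the sketch self-contained; the "entries" table and its columns are made up:

```python
import sqlite3

con = sqlite3.connect("reports.db")
con.execute("CREATE TABLE IF NOT EXISTS entries"
            " (id INTEGER PRIMARY KEY, timestamp REAL, temperature REAL)")
con.execute("CREATE INDEX IF NOT EXISTS idx_ts ON entries (timestamp)")
con.execute("INSERT INTO entries (timestamp, temperature) VALUES (?, ?)",
            (1713567600.0, 21.4))
con.commit()
# The index is consulted first, so only the matching rows are read --
# the engine never has to parse the whole file.
rows = con.execute("SELECT temperature FROM entries WHERE timestamp > ?",
                   (1713500000.0,)).fetchall()
con.close()
```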

 

You can try to make this yourself if you don't like the dependency on a DB. I have a project where the files have several sections of data, and they can be anywhere from 1 MB up to 50 GB. Writing the data is streaming, but access is fast because there's a pointer at a known location to a table of indices vs. file pointers (64-bit, of course!). I never have to load the 50 GB into memory.
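A rough Python sketch of that kind of layout (the format itself is invented for illustration): stream the sections, write a table of 64-bit offsets at the end, and keep a pointer to that table at a fixed position in the header.

```python
import struct

def write_indexed(path, sections):
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", 0))                 # placeholder for index-table offset
        offsets = []
        for payload in sections:                      # stream each section, remember where it starts
            offsets.append(f.tell())
            f.write(struct.pack("<Q", len(payload)))
            f.write(payload)
        index_pos = f.tell()
        f.write(struct.pack("<Q", len(offsets)))      # index table: count, then 64-bit offsets
        f.write(struct.pack("<%dQ" % len(offsets), *offsets))
        f.seek(0)
        f.write(struct.pack("<Q", index_pos))         # patch the header with the table's position

def read_section(path, n):
    with open(path, "rb") as f:
        (index_pos,) = struct.unpack("<Q", f.read(8))
        f.seek(index_pos)
        (count,) = struct.unpack("<Q", f.read(8))
        offsets = struct.unpack("<%dQ" % count, f.read(8 * count))
        f.seek(offsets[n])                            # jump straight to the section; nothing else is loaded
        (size,) = struct.unpack("<Q", f.read(8))
        return f.read(size)
```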

 

DIY can be "better" than a DB, but it's a lot of work if you're new to that kind of thing. One advantage is that writing is faster if you can simply append data to the file. A DB will have overhead, as it's a general solution, not a specific one.

 

As for the flexibility, XML, database fields, INI file keys, JSON, flattened data (bad idea), etc., it doesn't really matter. If you want to automatically 'flatten'/'unflatten', it's inflexible; the alternative is to make VIs that loop over items and get/set the data, either by (un)flattening individual items or with variant magic. There are probably VIs that do that for JSON, INI files and/or databases in libraries, but I don't know them all.

Message 7 of 11

If you look at the output Read XML produces, you'll realize it causes lots of data copies, which would explain the exponential behaviour.
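As a toy illustration of why repeated copying hurts (not the actual Read XML code, just the general effect): if every step re-copies everything accumulated so far, the total work grows quadratically with the number of entries, which feels exponential in practice.

```python
acc = ""
for chunk in ["<entry>...</entry>"] * 5000:
    acc = "".join((acc, chunk))   # every step copies everything accumulated so far: O(n^2) total
```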

I'd just do a Read Text File and Unflatten from XML (or whatever it's called), or, if I only needed specific parts, look into those XML nodes. I've managed to use them but found them a bit messy.

The new JSON functions are nifty and produce less text waste. 🙂

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems
Certified-LabVIEW-Developer
Message 8 of 11
Solution
Accepted by topic author Athaj

Unless I need fancy file validation or web integration, I'd pick JSON as well if a simple ini file doesn't work.

 

XML can get notoriously complex very fast... For most purposes (flexible and/or hierarchical data), JSON is a much more pleasant alternative.
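For the structure described earlier (one "settings" cluster plus an array of "data" clusters), a hedged Python sketch of the JSON route; the field names are invented, and the point is that missing items fall back to defaults while unknown items are simply ignored, so the format tolerates evolution.

```python
import json

SETTINGS_DEFAULTS = {"operator": "", "temperature_c": 20.0}   # hypothetical settings items

def save_report(path, settings, data):
    with open(path, "w") as f:
        json.dump({"settings": settings, "data": data}, f, indent=2)

def load_report(path):
    with open(path) as f:
        raw = json.load(f)
    settings = dict(SETTINGS_DEFAULTS)                 # defaults for anything missing
    settings.update({k: v for k, v in raw.get("settings", {}).items()
                     if k in SETTINGS_DEFAULTS})       # obsolete keys are dropped
    return settings, raw.get("data", [])
```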

 

I'd use DBs if there is a lot of data, or if it gets really complex. Others use them just because they're used to DBs. Once you are, the SQL learning curve disappears (I guess). That goes for all solutions, though.

Message 9 of 11

And how would JSON respond to eventual changes in the data structure? Is there a similar rigidity as with XML?

 

The single dataset represents a single measurement (a calibration run) I want to archive, reopen (possibly after years, on a different computer in a different network), post-process (outside LabVIEW) or just sneak a peek at as a text file to see what the temperature was during the measurement.

 

From my humble experience with RDBMSs (a decade+ with MySQL, Interbase/Firebird, PG) I do not see any benefit of using one in this scenario... not at all. Especially because I would still need to traverse the structure item by item to do the load/save, plus there's the overhead of operating the DBMS alongside LabVIEW. IMHO I would get by with INI or JSON, probably with less hassle.

 

EDIT: OK, to be honest: one significant advantage would be the enforced care for data consistency in case of eventual alterations to the DB schema.

Message 10 of 11