Actor Framework Documents


Plan For Serialization System For LabVIEW Objects.pdf

Current version: 0.4

This is a personal project, a side project, but one that I hope takes the same arc from my machine through the community and into vi.lib proper that the Actor Framework has taken. Having said that, this is completely a side venture for now.

LabVIEW needs a better mechanism for converting objects to and from strings. This need includes both binary formats and human readable formats like XML. Improving object serialization has been planned since LVOOP’s inception, but never implemented. Many people think of the Flatten and Unflatten primitives as covering this task – particularly the binary format. The problem is that those primitives are only appropriate for intra-LabVIEW serialization. In other words, as long as LabVIEW is both the reader and the writer of the data, those primitives work well, but not so well when you must communicate with the outside world. Even within LabVIEW there are support gaps created by complex refactorings and client and server code that revs at different intervals.

The attached document has been through multiple revisions already. It is refined enough that I think a wider audience can comment on it meaningfully -- when an idea is poorly formed, the raw brainstorm just spins, and when it is too finalized, there's too much resistance to changes. At this point, I think it has enough meat to be reasonably chewed on.

Please let me know what you think. No deadline, but know that I am inching forward on this design and at some point, I'll start spinning code and see where it leads me.

Comments
kegghead
Member

Thanks for taking this effort, I believe serialization desperately needs to be addressed if OOP support in LabVIEW is to mature beyond what we have today. This is a dense document so I'll likely have more comments on the weekend once I get a chance to work with the RFC, but for now I agree with and like most of what you have said. Except...

You mention streams in the open issues. I really think this has to be included. Having to serialize an entire object in one go simply isn't an option in some cases. If the object I'm serializing has a 500 MB footprint, do you think I'll be able to reliably serialize that info to a string in 32-bit land? I'd much rather serialize it in smaller chunks so my memory doesn't balloon out. This isn't a fringe use case for me; it's a regular occurrence in the data analysis applications I write, which can be tasked to manage dataspaces orders of magnitude larger.
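To make the chunking idea concrete, here is a minimal sketch in Python (used only because LabVIEW diagrams cannot be pasted as text). Nothing here comes from the RFC: `serialize_chunks`, `write_stream`, `read_stream`, and the length-prefixed layout are all hypothetical. The point is only that the writer holds one chunk in memory at a time rather than one giant string.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator

CHUNK_SIZE = 4096  # hypothetical chunk size, in samples


def _pack(values) -> bytes:
    # 4-byte big-endian count, followed by that many 8-byte doubles.
    return struct.pack(">I", len(values)) + struct.pack(f">{len(values)}d", *values)


def serialize_chunks(samples: Iterable[float], chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield length-prefixed binary chunks instead of building one huge string."""
    buffer = []
    for value in samples:
        buffer.append(value)
        if len(buffer) == chunk_size:
            yield _pack(buffer)
            buffer = []
    if buffer:  # final, possibly short, chunk
        yield _pack(buffer)


def write_stream(samples: Iterable[float], stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Write chunks to the stream as they are produced; return bytes written."""
    return sum(stream.write(chunk) for chunk in serialize_chunks(samples, chunk_size))


def read_stream(stream: BinaryIO) -> Iterator[float]:
    """Read the chunks back one at a time."""
    while True:
        header = stream.read(4)
        if not header:
            return
        (count,) = struct.unpack(">I", header)
        yield from struct.unpack(f">{count}d", stream.read(8 * count))


# Usage: an in-memory stream stands in for a file or socket.
demo = io.BytesIO()
write_stream((float(i) for i in range(10)), demo, chunk_size=4)
```

A real design would also need a way to signal end-of-object and to stream nested objects, which this sketch ignores.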

For that matter, XML DOM is great, but in the same vein as streams, I'd really like to see support for an extensible SAX-style Serializer class. Sometimes the DOM overhead is too great, or flat out impossible to work with.
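As an illustration of the SAX-style alternative, the following Python sketch writes XML one event at a time using the standard library's `xml.sax.saxutils.XMLGenerator`, so no document tree is ever built in memory. The function name `write_samples_sax` and the element names are hypothetical and are not part of the proposal.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_samples_sax(samples, out, tag="sample"):
    """Emit XML element by element; no in-memory DOM is constructed."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("samples", {})
    for value in samples:
        # Each value is written and then forgotten.
        gen.startElement(tag, {})
        gen.characters(repr(value))
        gen.endElement(tag)
    gen.endElement("samples")
    gen.endDocument()


# Usage: a StringIO stands in for a file; samples could be any iterator.
out = io.StringIO()
write_samples_sax(iter([1.5, 2.0, 3.25]), out)
```

Because `samples` can be a generator, the memory footprint stays roughly constant regardless of how many values are written.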

With regards to custom edits to the Serializable overrides:

3. Make any custom edits to the four VIs as needed to handle special cases. Examples include:
a. an array that instead of writing down the whole array will be compressed by leaving out the zero values
b. a picture string field that is going to be written down as a pixmap cluster.

Note that if she re-runs the Magic Serialization Scripting Tool, all these custom edits will be retained. Magic!

I am extremely skeptical about this, but I'll hold judgement. It seems fragile and I don't know if I'd trust it. I realize you're being facetious with the use of "Magic", but it really does seem like magic and magic isn't real. However, I realize you're trying to accomplish this without changing the code in LabVIEW.exe, and getting any extensible solution in here likely isn't possible in the short term, so turning to magical scripting land will be a requirement.

Remember that the “with names” is going to write fields and the names of those fields and is order independent and does not need to write down the fields if a default value will serve.

Urgh. Add me to the list of people hating the default value "situation". However, you later go on about this option being overridable, so perhaps you've finally seen the light? C'mon AQ, the Kool-Aid tastes great.
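For anyone following along, here is a minimal Python sketch of the two behaviors under discussion: writing named fields while skipping defaults, and an override that forces every field to be written. The field names, `DEFAULTS`, and both function names are hypothetical and only illustrate the idea.

```python
# Hypothetical class defaults; a real class would supply its own.
DEFAULTS = {"gain": 1.0, "offset": 0.0, "label": ""}


def to_named_fields(values, defaults=DEFAULTS, write_defaults=False):
    """Return name -> value pairs, skipping fields that equal their default
    unless write_defaults is set."""
    return {
        name: value
        for name, value in values.items()
        if write_defaults or value != defaults[name]
    }


def from_named_fields(fields, defaults=DEFAULTS):
    """Rebuild the full field set; order does not matter and missing
    fields fall back to the default."""
    unknown = set(fields) - set(defaults)
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    return {**defaults, **fields}


# Usage
current = {"gain": 2.5, "offset": 0.0, "label": ""}
compact = to_named_fields(current)                      # only "gain" is written
full = to_named_fields(current, write_defaults=True)    # everything is written
```

The catch with the compact form is that a reader whose defaults differ from the writer's will silently reconstruct different data, which is the usual objection to omitting defaults.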

Finally, I need more time to digest your discussion on paths in relation to dynamic loading. I really think the ability to resolve the path at run-time is a requirement (I'm unclear if you mean it to be or not). I'm thinking a factory here which given a class QName (or UID, or some unambiguous identifier) resolves where to find the class or returns some sort of error for being an unknown type. I'm thinking modules which are dynamically loaded and might register their types at run-time where each module's classes might not be in the central repository.
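Roughly, the kind of factory meant here could look like the Python sketch below. It is only an illustration under assumptions: `ClassRegistry`, `UnknownTypeError`, and the qualified-name strings are hypothetical, and Python's `importlib` stands in for loading a class from disk.

```python
import importlib
from typing import Callable, Dict


class UnknownTypeError(LookupError):
    """Raised when no module has registered the requested class name."""


class ClassRegistry:
    """Maps an unambiguous class identifier to a resolver that loads the class."""

    def __init__(self):
        self._resolvers: Dict[str, Callable[[], type]] = {}

    def register(self, qualified_name: str, resolver: Callable[[], type]) -> None:
        # Modules call this at run-time when they are loaded.
        self._resolvers[qualified_name] = resolver

    def register_module_path(self, qualified_name: str, module_name: str, class_name: str) -> None:
        # Lazy: the module is imported only when an object of this type is first needed.
        self.register(
            qualified_name,
            lambda: getattr(importlib.import_module(module_name), class_name),
        )

    def create(self, qualified_name: str):
        try:
            resolver = self._resolvers[qualified_name]
        except KeyError:
            raise UnknownTypeError(qualified_name) from None
        return resolver()()  # resolve the class, then default-construct it


# Usage: a standard-library class stands in for a plug-in class.
registry = ClassRegistry()
registry.register_module_path("stdlib:Fraction", "fractions", "Fraction")
value = registry.create("stdlib:Fraction")
```

Each plug-in can live wherever it likes on disk, since the registry stores a resolver per class rather than one path for the whole hierarchy.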

-kegg (mje@lava)

AristosQueue (NI)
NI Employee (retired)

kegghead wrote:

It seems fragile and I don't know if I'd trust it. I realize you're being facetious with the use of "Magic", but it really does seem like magic and magic isn't real.


                   

Like you, I was hesitant about how far such a tool could go when I first wrote that. Now, I am actually becoming pretty confident in it. I've been drawing on paper what diagrams would look like for various use cases, and I'm pretty sure that we can do a very efficient job autogenerating these VIs and preserving custom mods. I don't have anything extracted from my notebooks to show just yet -- still a few months out likely -- but I really am starting to think this is fully viable.

kegghead wrote:

Urgh. Add me to the list of people hating the default value "situation". However, you later go on about this option being overridable, so perhaps you've finally seen the light? C'mon AQ, the Kool-Aid tastes great.


                   

I have included a mechanism to make this work, and I do think it is valuable for some file formats. I also think there are many formats that want to conserve space and avoid writing down default values, and so both are included.

kegghead wrote:

I really think the ability to resolve the path at run-time is a requirement (I'm unclear if you mean it to be or not). I'm thinking a factory here which given a class QName (or UID, or some unambiguous identifier) resolves where to find the class or returns some sort of error for being an unknown type. I'm thinking modules which are dynamically loaded and might register their types at run-time where each module's classes might not be in the central repository.


                   

A factory is already a foregone conclusion -- Unflatten From String *is* a factory pattern today. The new system is the same but with one key new ability: it can load new classes into memory. Unflatten From String cannot; it can only instantiate objects for classes that are already loaded.

I have tailored the dynamic loading fairly narrowly -- only a single absolute path for the entire hierarchy -- which does, as you note in your comment, preclude one pluggable class containing another pluggable class where the installation paths on disk are distinct locations. If you'd like to propose ways to open that up, go for it, with the caveat that I'd prefer to avoid putting any additional methods on the Serializable class in order to support it.

Underflow
Active Participant

I'm still reading this.  It's interesting, and I'll support anything that pulls LV into talking with the rest of the programming world!

Would different character sets (say, Unicode) be supported "internally" to the system, or would it be expected from the coder as part of the override of the Serialize class?

jzoller@lavag

AristosQueue (NI)
NI Employee (retired)

Underflow wrote:                       

Would different character sets (say, Unicode) be supported "internally" to the system, or would it be expected from the coder as part of the override of the Serialize class?

I believe that encoding would be coming from the Formatter class, not the Serializer. The Formatter specifies what a data element looks like as a string. The Serializer would decide how to output its own strings (say the text used for the XML tags).
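One possible reading of that split, sketched in Python: the Formatter decides what a single value looks like as text (including how non-ASCII characters are represented), and the Serializer decides the surrounding structure such as tag text. The class names mirror the roles in the discussion, but every method name and the escaping choice below are assumptions, not the proposed API.

```python
class Formatter:
    """Decides how one data element looks as a string."""

    def format_value(self, value) -> str:
        return str(value)


class AsciiEscapingFormatter(Formatter):
    """Hypothetical variant: keep output 7-bit clean by escaping non-ASCII text."""

    def format_value(self, value) -> str:
        text = super().format_value(value)
        return text.encode("unicode_escape").decode("ascii")


class TagSerializer:
    """Decides the structure around each value; here, XML-like tags."""

    def __init__(self, formatter: Formatter):
        self.formatter = formatter

    def write_field(self, name: str, value) -> str:
        # The tag text is the serializer's own string; the value is the formatter's.
        return f"<{name}>{self.formatter.format_value(value)}</{name}>"


# Usage: same serializer, two different formatters.
plain = TagSerializer(Formatter()).write_field("unit", "Ω")
escaped = TagSerializer(AsciiEscapingFormatter()).write_field("unit", "Ω")
```

Swapping the formatter changes how values are encoded without touching the serializer, which is the separation the reply describes.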

AristosQueue (NI)
NI Employee (retired)

Got a new open issue:

The names of fields that are placed in the property bags are strings. A Serializable class could use any string for those names. Options:

1) Make rules for the names that can be used to restrict them to a fairly narrow range of characters (probably English alphanumeric) and return an error if any character outside that range is used when the Serializable adds to the Property Bag

OR

2) Allow any string to be used as the name and make the Serializer classes have to escape the name strings as necessary (i.e., if a particular serializer is using quote marks around the name, it is responsible for somehow escaping any quote mark embedded in the string). The serialization library might provide a utility VI for common escaping patterns, but it would still be incumbent on the Serializer class to actually invoke that VI.

OR

3) <your idea here>
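To compare options 1 and 2 side by side, here is a small Python sketch. `is_restricted_name` shows the kind of check option 1 implies; `escape_quoted` and `unescape_quoted` play the role of the library's escaping utility in option 2. All names and the backslash-escaping convention are hypothetical.

```python
import re

_RESTRICTED = re.compile(r"[A-Za-z0-9_]+")


def is_restricted_name(name: str) -> bool:
    """Option 1: accept only a narrow character range (assumed: ASCII alphanumerics and underscore)."""
    return _RESTRICTED.fullmatch(name) is not None


def escape_quoted(name: str, quote: str = '"', escape: str = "\\") -> str:
    """Option 2 utility: prefix each quote or escape character with the escape character."""
    out = []
    for ch in name:
        if ch in (quote, escape):
            out.append(escape)
        out.append(ch)
    return "".join(out)


def unescape_quoted(text: str, escape: str = "\\") -> str:
    """Inverse of escape_quoted."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == escape:
            ch = next(chars, "")  # take the escaped character literally
        out.append(ch)
    return "".join(out)


def write_quoted_field(name: str, value) -> str:
    """A serializer that quotes names is responsible for calling the utility itself."""
    return f'"{escape_quoted(name)}"={value}'


# Usage
line = write_quoted_field('say "hi"', 3)
```

Under option 2 the property bag accepts `say "hi"` as a name without complaint; only the serializer that uses quote marks has to care.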

RMThebert
Member

3) your idea here: The Property Bagger sounds like a communications protocol to me. The name field is analogous to the 29-bit CAN header, which is followed by an array of data. The Property Bagger might be able to use a similar construct involving flag bits and bytes in a fixed-length name field. The name field would specify the type and size of the data associated with it; including the size will make adoption of the protocol easier for other languages. Interaction with other languages will happen automatically for an easy-to-use, highly capable protocol.
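A very rough Python sketch of what such a fixed-length header might look like follows. The layout (16-byte name, 1-byte type code, 4-byte payload size) and the type codes are invented purely for illustration; nothing in the thread specifies them.

```python
import struct

# Hypothetical type codes and header layout.
TYPE_CODES = {"i32": 1, "f64": 2, "str": 3}
HEADER = struct.Struct(">16sBI")  # name (NUL-padded), type code, payload size


def pack_property(name: str, type_name: str, payload: bytes) -> bytes:
    raw_name = name.encode("ascii")
    if len(raw_name) > 16:
        raise ValueError("name does not fit the fixed-length field")
    return HEADER.pack(raw_name, TYPE_CODES[type_name], len(payload)) + payload


def unpack_property(data: bytes, offset: int = 0):
    """Return (name, type_code, payload, next_offset)."""
    raw_name, code, size = HEADER.unpack_from(data, offset)
    start = offset + HEADER.size
    payload = data[start:start + size]
    return raw_name.rstrip(b"\0").decode("ascii"), code, payload, start + size


# Usage: one f64 property followed by one string property.
packet = pack_property("gain", "f64", struct.pack(">d", 2.5))
packet += pack_property("label", "str", b"ch0")
```

Because every record states its own size, a reader in another language can skip properties it does not understand.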

kegghead
Member

Regarding property names. I don't think the bagger should care. Restrictions on naming will creep up specific to given Serializer implementations, so it ought to fall on the implementation to sort out what to do with illegal names (that is option 2). You're never going to create a naming restriction that satisfies all possible formats.

DavidAMoore
Member

Regarding waveform attributes and variants in general, I would like to see them supported but through decomposition into your fundamental types rather than hex encoding. Would that solve the LabVIEW versioning problem?

MGI coincidentally got into a discussion of this same general topic, because we were contemplating extending MGI Read/Write Anything to have an XML representation, and we want to eliminate hex representations in both .ini and XML formats wherever possible. RWA assumes that it's dealing with arbitrary LabVIEW data (similar caveats for refnums and DAQ strings but not complex numbers), so if an extended RWA were to encounter serializable LabVIEW classes in the data hierarchy it was processing, the idea was to request their contributions. I tend to start with data before code, so I worked up a BNF, and it was fairly straightforward to have my list of containers include clusters, arrays, classes, variants, waveforms, and dynamic data.

As I started looking at implementation, I read your plan, which seems well thought out. It would cover much of the use case for a RWA extension. If your plan were complete today, the updates I'd want would be:

1. Variant support by decomposition, as I said initially.

2. Treat waveforms, dynamic data, and complex numbers as additional containers.

3. Addition of .ini serializing that was compatible with existing RWA data files.

4. An interface for non-object data at the top level. This would let replacements for the existing RWA VIs be written that would piggy-back off the other serializers.

David A. Moore, Ph.D.
President
Moore Good Ideas, Inc.
AristosQueue (NI)
NI Employee (retired)

DavidA.Moore wrote:

Regarding waveform attributes and variants in general, I would like to see them supported but through decomposition into your fundamental types rather than hex encoding. Would that solve the LabVIEW versioning problem?

That's not really an option. Either the framework supports "waveform" as a type and manages marshalling the data into the property bag or every individual Serializable object has to include code to read the attributes and add the parts of the waveform however it sees fit. That's really the only choice for *any* of the data types. Over on LAVA, there's a similar discussion about time stamp. Sure, we could leave time stamp out, because we have string and any Serializable could just add its time stamp data as a string property, but then every object will be choosing its own formatting for the time stamps. It works, but it doesn't work in a desirable way. Same thing applies to waveforms.

We could say, "OK... variants are supported as long as the data within the variant is itself one of the other supported types" and then do type analysis on each variant that is passed in, and return an error if the variant is not one of the supported types. That's potentially viable... I can't state a particular downside, but it leaves a bad taste in my mouth -- sort of the instinctual "it feels like there's something wrong with that solution, but I can't put my finger on it." Am I just seeing ghosts?
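The type analysis described there might look something like this Python sketch, where a Python value stands in for a variant. The set of supported types, `check_variant`, and `UnsupportedVariantError` are all hypothetical.

```python
# Assumed stand-ins for the framework's supported fundamental types.
SUPPORTED_SCALARS = (bool, int, float, str)


class UnsupportedVariantError(TypeError):
    """Raised when a variant contains data outside the supported types."""


def check_variant(value, path: str = "variant") -> None:
    """Walk the value recursively; raise on the first unsupported element."""
    if isinstance(value, SUPPORTED_SCALARS):
        return
    if isinstance(value, (list, tuple)):  # array-like container
        for index, item in enumerate(value):
            check_variant(item, f"{path}[{index}]")
        return
    if isinstance(value, dict):  # cluster-like container
        for key, item in value.items():
            check_variant(item, f"{path}.{key}")
        return
    raise UnsupportedVariantError(f"{path}: unsupported type {type(value).__name__}")


# Usage: passes silently for supported data.
check_variant({"samples": [1, 2.0, "x"], "ok": True})
```

One visible cost is that the check runs on every variant at write time, so the error surfaces at run-time rather than when the code is written.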

DavidA.Moore wrote:

3. Addition of .ini serializing that was compatible with existing RWA data files.

This would be yours to build when you inherit from the framework. The whole point is to make a system that is extensible for any file formats that others need. Do you see any barriers to your being able to supply a Serializer class to support your file format?

DavidA.Moore wrote:

4. An interface for non-object data at the top level. This would let replacements for the existing RWA VIs be written that would piggy-back off the other serializers.

I'm kind of loath to do this. Non-object data simply doesn't have the "liveliness" to support data changes over time. If you write a float one day and you need to write an integer the next, but still read old versions, where do you put the function to handle the data mutation? There's no class to support the data. My thought would be, "If you want to write just one integer, create a class that contains an integer and knows how to serialize and unserialize that integer, because in the future, it might not be just an integer."
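The float-then-integer scenario can be sketched in Python as follows. The class name `Gain`, the `version` field, and the rounding rule are hypothetical; the sketch only shows that a wrapping class gives the mutation code somewhere to live.

```python
class Gain:
    """Wraps a single number so old data can be mutated on read."""

    VERSION = 2  # version 1 stored a float, version 2 stores an integer

    def __init__(self, value: int = 0):
        self.value = value

    def to_serial(self) -> dict:
        return {"version": self.VERSION, "value": self.value}

    @classmethod
    def from_serial(cls, data: dict) -> "Gain":
        version = data.get("version", 1)
        value = data["value"]
        if version == 1:
            # Mutation from the old format: the float becomes an integer.
            value = int(round(value))
        elif version != cls.VERSION:
            raise ValueError(f"unsupported version {version}")
        return cls(value)


# Usage: data written by the old version still loads.
old = Gain.from_serial({"version": 1, "value": 2.6})
new = Gain.from_serial(Gain(7).to_serial())
```

A bare integer written at the top level has no equivalent of `from_serial`, so the reader would have no place to put the version check.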

Is that me being overly defensive? Is there a way you can suggest to add a top-level API without increasing the workload on developers of Serializers and Formatters?

DavidAMoore
Member

AristosQueue wrote:

We could say, "OK... variants are supported as long as the data within the variant is itself one of the other supported types" and then do type analysis on each variant that is passed in, and return an error if the variant is not one of the supported types. That's potentially viable... I can't state a particular downside, but it leaves a bad taste in my mouth -- sort of the instinctual "it feels like there's something wrong with that solution, but I can't put my finger on it." Am I just seeing ghosts?

This is what my data structure would do, so it works on that end, but I did not examine performance implications so that's still a fair question.

AristosQueue wrote:


                       

DavidA.Moore wrote:

3. Addition of .ini serializing that was compatible with existing RWA data files.

This would be yours to build when you inherit from the framework. The whole point is to make a system that is extensible for any file formats that others need. Do you see any barriers to your being able to supply a Serializer class to support your file format?

I don't see barriers, and MGI would certainly publish one.

AristosQueue wrote:


                       

DavidA.Moore wrote:

4. An interface for non-object data at the top level. This would let replacements for the existing RWA VIs be written that would piggy-back off the other serializers.

I'm kind of loath to do this. Non-object data simply doesn't have the "liveliness" to support data changes over time.

 

Is that me being overly defensive? Is there a way you can suggest to add a top-level API without increasing the workload on developers of Serializers and Formatters?


                   

I'm shocked to hear this. I was concerned you'd click the "Report Abuse" link on my message.

Theoretically, you're correct that object data is better, but in practice it would be better to support all data in one way if possible/practical so that tool developments can benefit everyone at once. I'll have to think about what the non-object API would be.

David A. Moore, Ph.D.
President
Moore Good Ideas, Inc.
DavidAMoore
Member

Is the property bag effectively flat? I'd effectively want to structure a property tree, even if implemented using the arrays of each type you describe. "Bag" suggests a flat collection where it's up to the objects to use a naming convention to differentiate has-a properties from is-a properties.

David A. Moore, Ph.D.
President
Moore Good Ideas, Inc.
AristosQueue (NI)
NI Employee (retired)

It is a bag of properties, but among those properties are objects and they each become their own bag of properties when processed, so although we never build a "bag within a bag", it is effectively a tree structure under the proposed usage.
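A small Python sketch of that behavior: each bag is flat, but a property that is itself an object gets its own bag when it is processed, so the overall result is a tree. `PropertyBag`, `process`, and the two example classes are hypothetical and are not the prototype's API.

```python
class PropertyBag:
    """A flat name -> value collection for one object."""

    def __init__(self):
        self.properties = {}

    def add(self, name, value):
        self.properties[name] = value


class Channel:
    def __init__(self, name, gain):
        self.name, self.gain = name, gain

    def to_serial(self, bag):
        bag.add("name", self.name)
        bag.add("gain", self.gain)


class Amplifier:
    def __init__(self, model, channel):
        self.model, self.channel = model, channel

    def to_serial(self, bag):
        bag.add("model", self.model)
        bag.add("channel", self.channel)  # an object stored as an ordinary property


def process(obj) -> dict:
    """Fill a fresh bag for obj, then process any object-valued properties in turn."""
    bag = PropertyBag()
    obj.to_serial(bag)
    return {
        name: process(value) if hasattr(value, "to_serial") else value
        for name, value in bag.properties.items()
    }


# Usage
tree = process(Amplifier("A1", Channel("ch0", 2.0)))
```

No bag ever contains another bag; the nesting comes entirely from processing object-valued properties one at a time.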

AristosQueue (NI)
NI Employee (retired)

I HATE ARRAYS.

There. I needed to get that off my chest. *deep breath*

The prototype is working. I have had to make substantial adjustments to the document posted above in order to accommodate the HELL-SPAWNED ARRAYS. Wow, yes, I have anger issues there. It's a really beautiful, fairly easy to use API right up until ARRAYS HAPPENED. DIE! DIE! DIE!

Now. Having said that, I have, by the grace of LabVIEW, MADE THEM BEND TO MY WILL. It is not the prettiest code. If you want a 2D or higher array, it's damn ugly code, but it appears to be an effective solution. I will DEFINITELY be looking for feedback on this part.

But it works. And the rest is looking really good. I hope I'll have a first draft prototype posted next week. As usual, no promises.

Daklu
Active Participant

[Replied on LAVA because I was having trouble posting images here.]

Ian_Phillips
Member

Hi AristosQueue

Please can you provide an update on this project.

Cheers

Ian

AristosQueue (NI)
NI Employee (retired)

How incredibly timely. 🙂 I've been polishing all morning on draft version 0.2 (i.e. the one that actually works) so I can post it as a document later today. Let me stress -- it's a draft. Undocumented, error codes that make no sense, text icons, poorly chosen conpane layouts... but a clean VI API, minimized data copies, easily scriptable To Serial and From Serial methods and support for all the mutation/save for previous/default value optimization/etc that everyone asked for.

... and it bears only a passing resemblance to the white paper I posted originally. 🙂

I'll have it up later this afternoon. I still want to clean up a few rough edges so you have a hope in Hades of investigating it.

D*
Member

Did the API get posted? Whoops, found it: https://decibel.ni.com/content/docs/DOC-24015
