Dr. Damien's Development - Object Based File I/O III - Top Level Design

DFGray · ‎09-17-2010

Here is a first draft of a very top-level design for file I/O objects (Dia format, file attached). This reflects the advanced or expert level of user. It is designed to allow creation of intermediate and easy user levels. All the classes are abstract.

The top level object is File. It contains Locations, Errors, and Options objects. To open, a path and Options(File) object are passed to its Open method (not shown). Open may or may not create an error, which is added to its Error object. Once open it can be read or written. To read or write, first, a location is created. In the case of read, this will index the file to that location. In the case of write, this will index the file to that location and, if needed, create any parts of the location not yet available. Creating a location also requires a Data object. Each type of data requires another child of the abstract Data class. Reading and writing are methods on the child Data objects.
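To make the relationships above easier to follow in text form, here is a minimal Python sketch of the same skeleton. It is only an illustration, not the LabVIEW implementation: the method names (`open`, `_open`, `read`, `write`) and the `read_only` attribute are assumptions, and errors are simply collected in a list.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Options:
    """Common parent for option sets. 'read_only' is an assumed attribute name."""
    read_only: bool = False


class Data(ABC):
    """Abstract accessor; each data type gets its own child with read/write."""

    @abstractmethod
    def read(self, location):
        ...

    @abstractmethod
    def write(self, location, value):
        ...


class Location(ABC):
    """A position in a file, created together with the Data object that accesses it."""

    def __init__(self, file, address, data):
        self.file = file
        self.address = address
        self.data = data
        file.locations.append(self)  # a File can hold many Locations


class File(ABC):
    """Top-level object that contains Locations, Errors, and Options."""

    def __init__(self):
        self.locations = []
        self.errors = []
        self.options = None

    def open(self, path, options):
        self.options = options
        try:
            self._open(path)
        except OSError as exc:
            # Open may or may not create an error; if it does, keep it on the object.
            self.errors.append(exc)

    @abstractmethod
    def _open(self, path):
        """Format-specific open, supplied by each child such as an XML file class."""
```

A concrete file type would subclass `File` and implement `_open`; callers then inspect `errors` after opening instead of catching exceptions.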

Let's take a concrete example: reading a string tag from an XML file. An Options(File) object is created with a ReadOnly attribute. This, plus the file path, is passed to the Open method of XML.File, a child of File. An XML.Location object is created for the tag, which will convert the file I/O location format to XPath, the native XML location format. An XML.DataScalar(String) object is created for the XML.Location object. After the location is created, the XML.DataScalar(String) object is used to read the data, returning the string.
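The XML walk-through can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a rough illustration under stated assumptions: the class names mirror the ones in the post, the generic location format (`"a/b/c"` relative to the root element) is invented for the example, and the data object is constructed from the location, which is only one possible reading of the design.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FileOptions:
    """Stand-in for Options(File); 'read_only' is an assumed attribute name."""
    read_only: bool = True


class XmlFile:
    """Stand-in for XML.File."""

    def __init__(self):
        self.errors = []
        self.options = None
        self.tree = None

    def open(self, path, options):
        self.options = options
        try:
            self.tree = ET.parse(path)
        except (OSError, ET.ParseError) as exc:
            self.errors.append(exc)  # errors are stored, not raised


class XmlLocation:
    """Stand-in for XML.Location: converts a generic location to XPath."""

    def __init__(self, file, generic_location):
        self.file = file
        # Assumed generic format "a/b/c", relative to the root element.
        self.xpath = "./" + generic_location.strip("/")


class XmlDataScalarString:
    """Stand-in for XML.DataScalar(String)."""

    def __init__(self, location):
        self.location = location

    def read(self):
        root = self.location.file.tree.getroot()
        element = root.find(self.location.xpath)
        if element is None:
            raise KeyError(self.location.xpath)
        return element.text or ""


def demo():
    """Write a small XML file, then read one string tag through the objects."""
    with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
        handle.write("<config><user><name>Ada</name></user></config>")
    try:
        xml_file = XmlFile()
        xml_file.open(handle.name, FileOptions(read_only=True))
        location = XmlLocation(xml_file, "user/name")
        return XmlDataScalarString(location).read()
    finally:
        os.unlink(handle.name)
```

Calling `demo()` opens the temporary file read-only, builds the location for the `name` tag, and returns its text.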

Similar use cases can be constructed for writing the file and for interacting with various other file types, such as PNG, LVM, or TDMS.

Options and Data exist purely to make it easier to construct intermediate and easy APIs. Various types of options and data can be arrayed as inputs, since they all share a common parent.

DataArray and DataScalar will have children for every data type desired. At minimum, this will include the common integer and floating point types, as well as strings. These encapsulate reading and writing methods for the file type.

All objects will be wrapped in a LabVIEW library whose name is that of the file format. This will automatically namespace the different functions, making creating new ones much easier by simply copying the old ones and changing the icon boilerplate (relatively easy with the new icon editor), then writing the file I/O code.

Let me know your thoughts. This will handle all the use cases I could come up with, but I am sure you can come up with more. Comments and suggestions are encouraged. Next time, I will start filling out the objects with data and methods.

Previous Posts In This Series:

Representative File Types with conversation on design
Requirements

F._Schubert · ‎09-19-2010

Very nice, especially the use of UML.

Here are some comments:

If I understand the Location class correctly, there is also a reflexive association (not shown) on the Location class:

/navigableLocations {union, readOnly} [*]

For the simple use-case of a line-based format:

nextLine {subsets navigableLocations} [0..1]

and

previousLine {subsets navigableLocations} [0..1]
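As a rough illustration of the line-based case, here is a Python sketch in which the two [0..1] ends become optional properties and the derived union is computed from them. The class name `LineLocation` and its fields are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineLocation:
    """Hypothetical location in a line-based file: a line index into shared lines."""
    lines: tuple
    index: int

    @property
    def next_line(self):
        """[0..1]: the following line, or None at the end of the file."""
        if self.index + 1 < len(self.lines):
            return LineLocation(self.lines, self.index + 1)
        return None

    @property
    def previous_line(self):
        """[0..1]: the preceding line, or None at the start of the file."""
        if self.index > 0:
            return LineLocation(self.lines, self.index - 1)
        return None

    @property
    def navigable_locations(self):
        """Derived, read-only union of the two subsetting properties."""
        candidates = (self.previous_line, self.next_line)
        return [loc for loc in candidates if loc is not None]

    def text(self):
        return self.lines[self.index]
```

Because `navigable_locations` is computed rather than stored, it cannot get out of sync with `next_line` and `previous_line`, which is the point of marking it derived and read-only.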

Again, for the DataArray there is an association back to data:

/containedData {readOnly} [*]

for the individual data elements in the array.

Felix

Ben · ‎09-28-2010

Sorry, I have been distracted!

Starting with the first dumb question;

What do I use to open a "dia" file?

I did some searching, but in this day of trojans hiding behind free downloads, I'm better off asking rather than trying.

Ben

Retired Senior Automation Systems Architect with Data Science Automation LabVIEW Champion Knight of NI and Prepper LinkedIn Profile YouTube Channel

rex1030 · ‎09-28-2010

http://live.gnome.org/Dia

He linked that site to the word "dia" in his post. It seems to be the program's website. It appears to be a Visio-type program. You can download it there. I haven't checked it out. (I scan everything I download before opening it.)

---------------------------------
[will work for kudos]

F._Schubert · ‎09-28-2010

Dia is a nice lightweight editor. It is very easy to use and supports a lot of different diagrams (electronics, UML, ...). Sadly, it does not support all the features of UML. But it's among my top SW tools.

Regarding my post above: I realized that locations do not need to have a reflexive association, but the operations to get a nextLine (or the like) could all be part of the File class.

But: what about a 'link' (like an html link, only considering that it can link to a place in the same document)? Is it a child of DataType? How is it resolved?

Felix

Ben · ‎09-28-2010

@DFGray wrote:

Here is a first draft of a very top–level design for file I/O objects (Dia format, file attached). This reflects the advanced or expert level of user. It is designed to allow creation of intermediate and easy user levels. All the classes are abstract.

...

DQ2:

The relationship line (jump on me if that term is wrong) between "File" and "Location" is causing me some confusion because I understand the "1" and the "*" as indicating that:

"There is a single instance of the class "Location" associated with a potentially unlimited number of instances of the class "File"."

This seems backwards, since I could be reading from one location in a file and writing to another, in which case I would need two instances of the class "Location" associated with that file.

Am I just reading the diagram backwards?

Thanks!

Ben


F._Schubert · ‎09-28-2010

Ben, you're right. The asterisks (*) should be on the opposite ends.

I think the multiplicity on that side of file should be [0..1] (like the error object can be passed to an instance of an error handler class).

On terminology: the 'relationship line' is called an Association. But Association specializes the metaclass Relationship (and Classifier), so that term is not wrong. BTW: Generalizations are also Relationships.

Felix

Ben · ‎09-28-2010

I'll interject comments/questions in a different color

@DFGray wrote:

Here is a first draft of a very top–level design for file I/O objects (Dia format, file attached). This reflects the advanced or expert level of user.

I understand that as meaning that an End User of this code will never have to see this stuff or know about it while still being able to use it assuming some understanding of how normal file I/O works.

It is designed to allow creation of intermediate and easy user levels. All the classes are abstract.

Meaning that all of the classes shown may never appear on a diagram.

Re: Abstract

The idea's name and implementation are sufficiently non-specific to allow future additions without causing confusion. This is one of my big challenges! Since I have specific widgets in mind when I develop designs, I end up with class names that seemed good for the first versions, but when expanding and adapting I end up with a class that is much too specific.

e.g.

The class I ended up using to handle analog input channels configured for thermocouple readings was originally called "9211", since that was the widget I was playing with at the time. Talk about a bad choice for a class name!

The top level object is File. It contains Locations, Errors, and Options objects. To open, a path and Options(File) object are passed to its Open method (not shown). Open may or may not create an error, which is added to its Error object. Once open it can be read or written. To read or write, first, a location is created. In the case of read, this will index the file to that location. In the case of write, this will index the file to that location and, if needed, create any parts of the location not yet available. Creating a location also requires a Data object. Each type of data requires another child of the abstract Data class. Reading and writing are methods on the child Data objects.

Let's take a concrete example — reading a string tag from an XML file. An Options(File) object is created with a ReadOnly attribute. This, plus the file path, is passed to the Open method of XML.File, a child of File. An XML.Location object is created for the tag, which will convert the file I/O location format to XPath, the native XML location format. An XML.DataScalar(String) object is created for the XML.Location object. After the location is created, the XML.Data(String) object is used to read the data, returning the string.

If I understand that syntax correctly, when we read the term "XML.File" we should understand it as meaning that there will be a child of the class "File" called XML.File that (of course) inherits the properties and methods of the class File but also has extra properties and methods to support XML.

...

Ben


dsb@NI · ‎09-28-2010

In addition to the data, data attributes are also stored uniquely for a given file format. Where do data attributes fit into the proposed model? My understanding is that Data Options describes methods and properties for reading data. Are data attributes also elements of Data Options, other forms of Data Values that are linked to data values, or a new/different box in the UML?

Doug
Enthusiast for LabVIEW, DAQmx, and Sound and Vibration

DFGray · ‎09-28-2010

Varie res...

Felix is correct, I put the multiplicities in backwards. Files can have multiple locations and multiple errors.

DataArray and DataScalar are siblings (and there are associated Options(DataArray) and Options(DataScalar) as well that are children of Options(Data)). I thought long and hard about making DataScalar an element of DataArray, but decided not to do that. In most use cases I could think of, the methods for accessing scalars and arrays end up being different. You can also optimize scalar access quite a bit compared to array access. This is not necessarily the "best" design, and closely mirrors the debate over whether a "square" object should be the sibling or child of a "rectangle" object, both being drawing objects.
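To illustrate the sibling arrangement, here is a small Python sketch in which the scalar and array accessors share only the abstract `Data` parent and each has its own access path. The class names, the `get` method, and the little-endian float64 layout are assumptions made for the example, not part of the actual design.

```python
import struct
from abc import ABC, abstractmethod


class Data(ABC):
    """Abstract parent shared by scalar and array accessors."""

    @abstractmethod
    def get(self, buffer, offset):
        ...


class DataScalarFloat64(Data):
    """Reads exactly one little-endian float64; no count or loop needed."""

    def get(self, buffer, offset):
        return struct.unpack_from("<d", buffer, offset)[0]


class DataArrayFloat64(Data):
    """Reads 'count' consecutive float64 values in a single unpack call."""

    def __init__(self, count):
        self.count = count

    def get(self, buffer, offset):
        return list(struct.unpack_from(f"<{self.count}d", buffer, offset))
```

Neither class inherits from the other, so each can be tuned separately, while both can still be passed wherever a `Data` is expected.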

I envisioned Link being a child of Location. Links can point to another location in the file, another file, on the current computer or over a network. In an internal discussion, we considered the fact that virtually the same API could be used for almost any sort of communication, be it network, DAQ, inter/intra-process etc., but that is a different conversation.
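One possible way to picture Link as a child of Location is sketched below in Python. Everything here is hypothetical: the `file_uri` field (where `None` means the current file), the `target` field, and the hop limit used to guard against cyclic links are assumptions for illustration, not decisions made in this design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """A place in a file. file_uri=None is taken to mean the current file."""
    address: str
    file_uri: Optional[str] = None

    def resolve(self):
        """A plain location already is its own final destination."""
        return self


@dataclass
class Link(Location):
    """A location whose content points at another location."""
    target: Optional[Location] = None

    def resolve(self, max_hops=16):
        """Follow the chain of links until a non-link location is reached."""
        current = self
        for _ in range(max_hops):
            if not isinstance(current, Link):
                return current
            if current.target is None:
                raise ValueError("dangling link at " + current.address)
            current = current.target
        raise RecursionError("link chain too long or cyclic")
```

Since a Link is itself a Location, any code that accepts a location can accept a link and call `resolve()` first, whether the target is in the same file or in another one.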

The abstract objects in the UML are used to make the programming easier, either through factory patterns or setting arrays of things like options on high level VIs. The low level interface delineated here is useful for creating interchangeable interfaces to files that allow easy plug-in of new file types. For example, you could have a VI which searches the disk for installed File I/O objects, populates a ring control, and allows the user to select a file type before writing. Adding another file type is as simple as copying the code to the correct location. So getting an image or a set of spreadsheet data will involve using the particular DataArray.Get function.

I anticipate particular file types will have elements specific to them and them alone. For example, I have never seen the HDF5 concept of I/O model anywhere else, but it is so useful I would be sure to support it. To fully support some file types, you will need to know what they are and write differences into your code. For most operations, this should not be necessary.

At the easy level, the file I/O will be pretty specific to the file. For example, a JPEG file object would have GetImage and GetEXIF functions at the top level. Under the hood, these would be calling DataArray.GetData and DataScalar.GetData functions on the correct locations in the file.
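The plug-in idea can be sketched in Python as a simple factory. Note the substitution: instead of searching the disk for installed file I/O objects, this sketch registers each child class when it is defined, which plays the same role for illustration purposes. The names `registry`, `format_name`, `available_formats`, and `create` are all hypothetical.

```python
class File:
    """Abstract parent; children register themselves under their format name."""
    registry = {}
    format_name = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.format_name:
            File.registry[cls.format_name] = cls


class XmlFile(File):
    format_name = "XML"


class TdmsFile(File):
    format_name = "TDMS"


def available_formats():
    """What a ring control would be populated with."""
    return sorted(File.registry)


def create(format_name):
    """Factory: build the file object for the format the user selected."""
    return File.registry[format_name]()
```

Adding a new file type then means defining one more child class; `available_formats()` picks it up without any change to the calling code.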

I am currently fleshing out the UML and working on code. There is a lot of it. If you would like interim posts, let me know.
