DVR or data copy? How to deal with large data in AOP projects

jlokanis · ‎08-27-2013

I'm working on a large project and have run into a quandary about how to best deal with a large shared data set.

I have a set of classes that define 3 data structures. One is for a script read from disk that the application executes. Another is the output data from the program, generated as the script is executed. The last is a summary of the current state of execution of the script. The first two can be significant in size and are unbounded.

In my application, I have one actor (with sub actors) responsible for reading the script from disk, executing it and collecting the data. It also updates the summary status data. Let's call this the control actor. I have another actor that takes the data and displays it to the user, allowing them to navigate through it while the script is being executed. Let's call this the UI actor. The last actor is responsible for communicating the summary to another application over the network. We will call this the comm actor.

So, the control actor is generating the data and the other two are consumers of the data. I had originally thought to have all the data stored as state data in the control actor and then, as it is updated/created, just message it to the other actors. But then they would essentially have to maintain a copy of the data to do their job. This seems inefficient. Then I thought I could wrap the data classes in DVRs and send the DVRs to the other actors. That way they could share the same data. The problem with that is they would not control when their data gets updated. And I am violating the philosophy of actors by creating what is essentially a back channel for data to be accessed outside of messages. Also, I could block the 'read' actors when I am writing to the DVR wrapped class. I would have to be careful when updating subsections to lock the DVR in an in-place structure to do the modify and write. Then comes the question of how to best alert the readers to changes made to the data by the writer. Simply send them a message with the same DVRs in it? Or send a data-less message and have them look at the updated values and take appropriate action?

So, any best practices or thoughts on how to resolve this issue? I appreciate any feedback.

-John

(this is cross posted in LAVA as well: http://lavag.org/topic/17084-dvr-or-data-copy-how-to-deal-with-large-data-in-aop-projects/)

-John
------------------------
Certified LabVIEW Architect

AristosQueue (NI) · ‎08-27-2013

Not sure what to do with the script. But as far as the generated data -- can the control actor generate a block of data and then hand it off to the comm actor who then hands it off to the UI actor and the UI actor is the only one aggregating the whole data set?

jlokanis · ‎08-27-2013

That is a good suggestion. But I must admit I simplified the project description for the purpose of this discussion. The final project will have the UI running in a separate application across the network. The comm actor will then be responsible for caching all the data it sends in case the UI application goes off-line and reconnects later. So, I figured I would need a copy in the comm actor at all times.

As to why I need a copy in the control actor, perhaps that can be eliminated. I will need to review the design. When sending a data object, is a copy made if the sender does not retain the data?

The main point of the question is if it is acceptable in AOP to use DVRs for data classes when sharing data between actors. And if not, is there a best practice to deal with large data sets. Maybe the answer is just let it make a copy.

In the past I have solved this with SEQs (pre-DVR and LVOOP). I just want to apply the best methods to the new design.

-John
------------------------
Certified LabVIEW Architect

AristosQueue (NI) · ‎08-29-2013

jlokanis wrote:
When sending a data object, is a copy made if the sender does not retain the data?

Depends upon how the data is sent. If it is sent using a queue, no, there is no copy made.

jlokanis wrote:
And if not, is there a best practice to deal with large data sets. Maybe the answer is just let it make a copy.

For the use case you're describing, it sounds like you're going to have to have a copy since the UI may be across the network instead of local.

In the general case, the answer is generally to treat one actor like the database and let others make queries for subsets of the data, subsets that are copied out as needed. There may be times when you truly need to use a DVR so that everyone is pounding on the same copy of the data, but I haven't seen any actual application written yet where that was necessary -- that's not saying they don't exist, just that no one has shown me one (and shown me one in enough depth that I can inspect and play with the code to test out whether or not the DVR is actually helping their code or hurting it).
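To make the "one actor is the database" idea concrete, here is a minimal sketch in Python rather than LabVIEW. Everything in it (the `DataOwner` class, the tuple message shapes, the reply queue) is a hypothetical stand-in and not Actor Framework API. It only illustrates the shape of the pattern: the full data set lives in one place, and a consumer asks for a slice that is copied out on request.

```python
import queue
import threading


class DataOwner(threading.Thread):
    """Hypothetical actor that owns the whole data set and answers subset queries."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self._data = []  # the large data set lives only in this actor

    def run(self):
        while True:
            msg = self.inbox.get()
            kind = msg[0]
            if kind == "append":
                self._data.extend(msg[1])
            elif kind == "query":
                _, start, stop, reply_to = msg
                # Slicing copies only the requested subset, not the whole set.
                reply_to.put(self._data[start:stop])
            elif kind == "stop":
                break


owner = DataOwner()
owner.start()
owner.inbox.put(("append", list(range(1000))))

# A consumer (for example a UI actor) asks for just the rows it will display.
reply = queue.Queue()
owner.inbox.put(("query", 10, 15, reply))
subset = reply.get(timeout=1)
print(subset)  # [10, 11, 12, 13, 14]

owner.inbox.put(("stop",))
owner.join(timeout=1)
```

Because the inbox is processed in order, the query is answered only after the preceding append has been applied, so the consumer never needs direct access to the owner's storage.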

drjdpowell · ‎08-30-2013

You can't do TCP by ref, nor can you do UI indicators without a data copy, so you're going to have to take a subset or summary of your large data set somewhere; why not do it in the actor that is in charge of the data? Only send to the UI what the UI needs.

How about this? When the UI actor connects to the main actor (via the Comm actor) it sends an initial "registration message" that tells the main actor what subset of info it needs. This registration message is resent if the UI goes offline for a while, so there is no need for the Comm actor to cache anything.

Note that you could use the "Visitor Pattern" to specify what info the UI needs: provide an object in the registration message that overrides a 'summarize data' method that the main actor uses to build the update messages to send to the UI. This way the summary calculation is specified in the code base of the UI, but executed in the main actor process, and you can easily write alternate UI's or other actors that can "plug in" to the same system.

-- James
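A rough sketch of the registration idea above, written in Python as an analogy. The names `Summarizer`, `LastNSummarizer`, `MainActor`, `register`, and `add_results` are all invented for illustration, and a plain callback stands in for the message path through the comm actor. The point it tries to show is that the UI side defines what the summary is, while the main actor executes it next to the data.

```python
from abc import ABC, abstractmethod


class Summarizer(ABC):
    """Supplied by a client in its registration message (hypothetical interface)."""

    @abstractmethod
    def summarize(self, data):
        ...


class LastNSummarizer(Summarizer):
    """Example client-side choice: a count plus the most recent n results."""

    def __init__(self, n):
        self.n = n

    def summarize(self, data):
        return {"count": len(data), "tail": data[-self.n:]}


class MainActor:
    """Owns the full data set and runs each registered summarizer against it."""

    def __init__(self):
        self._data = []
        self._subscribers = []  # (summarizer, send callback) pairs

    def register(self, summarizer, send):
        self._subscribers.append((summarizer, send))
        # A client that (re)connects is brought up to date immediately,
        # so nothing has to be cached on its behalf elsewhere.
        send(summarizer.summarize(self._data))

    def add_results(self, results):
        self._data.extend(results)
        for summarizer, send in self._subscribers:
            send(summarizer.summarize(self._data))


received = []
main = MainActor()
main.add_results([1, 2, 3])
main.register(LastNSummarizer(2), received.append)
main.add_results([4, 5])
print(received)
# [{'count': 3, 'tail': [2, 3]}, {'count': 5, 'tail': [4, 5]}]
```

A different UI could register a different `Summarizer` subclass without any change to `MainActor`, which is the "plug in" property described above.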

jlokanis · ‎10-24-2013

I want to revisit this topic because I am now at the point of trying to optimize my application.

Here is a basic description:

Server generates a change to the state data (a data class with ~20 different variables).

Server transmits a copy of this object (wrapped in a message) over the network to the Client.

Client sends the object (in a client side message) to a subscription (message channel with multiple listeners). Also, the message is stored in a DVR for later reuse.

The subscription function sends a copy of the data object (wrapped in a message) to each listener.

The listeners are different UI views. So, one listener will display the data one way and another will display it a different way.

This data object should always be the same for each listener. So the data is 'singleton'.

This communication model can be executed in parallel over 100 times. (so, if there are three copies for each listener group, there can be over 100 groups).

Given that, even if the data object is relatively small, the effect of those copies is multiplied if the system is being highly utilized.

I am considering making the following change:

When the data object arrives from the Server application, I wrap it in a DVR and send the DVR to the subscription. This way there is only one copy of the data. Each time I get a new update, I reuse the DVR and store the new object.

Instead of sending the data object to the listeners, I am sending them the DVR. That is the trigger to tell them that the data has changed and they need to update their displays from the data now in the DVR.

So, I am still using messaging to control program flow. I am simply not copying the data as many times. Instead of ~300 data objects in memory (3 of each x 100) I have only 100 data objects.

Is this reasonable? Is this worth it? (Since I have property node accessors for the object's data, it should be easy to extract it.)

Are there any pitfalls I am not considering?

thanks for the feedback.

-John
------------------------
Certified LabVIEW Architect

Daklu · ‎10-24-2013

So let me see if I understand this correctly...

- The server sends a StateChanged message to one or more clients (desktop pcs) updating them with the new data.

- The client sends a copy of the message to each UI view on that pc. There can be up to three UI views that use the data from a particular message.

And you want to optimize it by sending each UI view a DVR instead of a copy of the data it received?

My initial thought is if somebody has all three views open on their desktop pc there are going to be three copies of the data in memory anyway, even with the DVR. Unless there's something you've left out I don't think it will save you much. All you're doing is changing which code is making the copies.

--------------

My questions:

1. Is your desire to optimise this due to performance issues you have observed, or is it an academic exercise for you?

2. "This data object should always be the same for each listener. So the data is 'singleton'." Just because each listener uses the same data for its view doesn't mean the data object needs to be implemented as a singleton. Are there other reasons you want this data to be a singleton?

3. "Also, the message is stored in a DVR for later reuse." Why are you saving the message? The purpose of messages is to transfer information between actors. Once the client receives the message and reads the information there's no reason to hang on to it. And why are you storing it in a DVR if you are sending by-val copies to the UI views?

jlokanis · ‎10-24-2013

Your understanding is correct. Except: There might in the future be N UI viewers of the message. Currently I have only implemented 3. And each UI does not use all the data in the object, just the parts they need. But, in the future I may decide to use more of the data in a particular UI or perhaps different data. So, I send the whole object to all of them now. And yes, the parts of the object that I display or use to construct the display could generate a copy of that data (there are a few caveats to this that I can think of). But since I do not use all the data in all the UIs, there would not necessarily be a copy of the entire object in each UI if I used DVRs. Also, the UI processes always exist even if they are not being actively viewed. So, the data might not have been written to an indicator until (and if) the user opens that view.

To answer your questions:

1. I recently addressed a severe performance issue in the system using DVRs. This was discovered under light loads. I have yet to stress it to a heavy load, but I suspect that I will hit another wall due to these data copies. So, no, it is not academic.

2. The reason I think of it as a singleton is because each UI should be working on the same data at the same time. There is no need for one UI to have an older version of the data and I have no need for history. By putting it in a DVR, everyone has the same info at the same time. What I am trying to sort out is what is the penalty for sending messages with large data or lots of messages with medium data through my message architecture? And can I improve performance by sending small messages (containing DVRs) when the data allows it?

3. That was the solution to the previous performance issue. For broadcast messages (multiple destinations) I store a copy of the last message in a variant tree. That way, when a new listener is added, they can request the latest value without having to wait for a new broadcast from the sender. I implement this by having the message system simply send the stored message to the new listener and they process it as if it is a new broadcast. The problem was, by storing all of these in the attributes of a central variant, I was doing large copies every time I wanted to access a single element of the tree. So, I instead placed the messages in DVRs and stored those in the attributes, greatly reducing the size of the data in the tree and the penalty for the copy when accessing it.

I guess in the end I am trying to convince myself that it is ok to take my 'by value' message system and turn it into a 'by reference' message system for some types of data. I can see some issues where I am updating the data in the DVR and not telling the target that a change was made or perhaps I only tell the target that the DVR was created and then the target might poll the DVR to update a UI at a specified rate, but that seems to violate the concepts of AOP.

Just trying to stick to the paradigm so I avoid unforeseen consequences, but at the same time I need to make the application work.

-John
------------------------
Certified LabVIEW Architect
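The last-message cache described in point 3 above can be sketched roughly as follows. This is a Python analogy with invented names (`Box`, `Broadcaster`, `subscribe`, `publish`), not the poster's code. `Box` plays the role of the DVR: the lookup table holds only a small handle per topic, and a late subscriber is sent the stored message as if it were a fresh broadcast. Python does not copy values on a dictionary lookup, so the handle is there purely to mirror the structure, not because Python needs it.

```python
class Box:
    """Stand-in for a DVR: a small handle to a possibly large message."""

    def __init__(self, message):
        self.message = message


class Broadcaster:
    """Hypothetical broadcast channel that remembers the last message per topic."""

    def __init__(self):
        self._listeners = {}  # topic -> list of listener callbacks
        self._last = {}       # topic -> Box holding the most recent message

    def subscribe(self, topic, listener):
        self._listeners.setdefault(topic, []).append(listener)
        box = self._last.get(topic)
        if box is not None:
            # Replay the stored message; the listener handles it like a new broadcast.
            listener(box.message)

    def publish(self, topic, message):
        box = self._last.get(topic)
        if box is None:
            self._last[topic] = Box(message)
        else:
            box.message = message  # reuse the existing handle
        for listener in self._listeners.get(topic, []):
            listener(message)


bus = Broadcaster()
early, late = [], []
bus.subscribe("state", early.append)
bus.publish("state", "v1")
bus.publish("state", "v2")
bus.subscribe("state", late.append)  # joins late, gets only the latest value
print(early, late)  # ['v1', 'v2'] ['v2']
```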

AristosQueue (NI) · ‎10-24-2013

You open the door to missing changes. Someone could change the DVR and send a message that says, "DVR changed!" And then, before that message gets received, they could make a second change to the DVR and send a second "DVR changed!" message.

If not seeing all the changes is ok, then, yes, this system works.

If you need to see all the changes, then try this small modification...

If instead of reusing the same DVR you use a different DVR each time, then you solve the problem... now you have three systems that are all getting the same DVR and so they can each open it up, look at the changes, then close it up without making a copy.

If you do that, then you'll have a problem of knowing when to throw away the DVR. So you add a field inside the DVR that is "number of people I sent this DVR to". When a recipient opens the DVR to look at the data, they decrement that count. If the count is zero, they know they're the last recipient, so they throw away the DVR. Be sure to override Drop Message Core.vi to do this also, in case the message never gets handled.

Does that help?
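The counting scheme described above could look something like this sketch. It is Python, not LabVIEW, and `SharedUpdate`, `read_and_release`, and `drop` are made-up names. A lock stands in for the exclusive access an in-place element structure gives you on a DVR, and `drop` plays the part of the Drop Message Core.vi override for a message that is never handled.

```python
import threading


class SharedUpdate:
    """Stand-in for a per-update DVR: one payload plus a count of pending recipients."""

    def __init__(self, payload, recipients):
        self._lock = threading.Lock()
        self._payload = payload
        self._remaining = recipients
        self.released = False

    def read_and_release(self, use):
        """Run use(payload) under the lock, then give up this recipient's reference."""
        with self._lock:
            if self.released:
                raise RuntimeError("update was already released")
            result = use(self._payload)
            self._remaining -= 1
            if self._remaining == 0:
                # Last recipient: throw the shared data away.
                self._payload = None
                self.released = True
        return result

    def drop(self):
        """Call when a message is discarded unhandled, so the count still reaches zero."""
        self.read_and_release(lambda _payload: None)


update = SharedUpdate({"step": 7, "status": "running"}, recipients=3)
seen = [update.read_and_release(lambda d: d["step"]) for _ in range(2)]
update.drop()  # the third recipient never handled its message
print(seen, update.released)  # [7, 7] True
```

Each recipient reads in place and extracts only what it needs, so no recipient holds a full copy, and the shared data is released exactly once.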

jlokanis · ‎10-24-2013

Those are good points. In this case, being lossy on updates is ok. Even desirable. The Server might be very chatty, but the user only needs to see the UI updated once a second at best.

In the past I had constructed complex mechanisms where the status messages would be stored in notifiers and all sent to the same queue. The UI would sleep, then wake and dump the whole queue, discard any dup notifier refs and then process all the unique ones. This gave me a way to only update the changed elements in a summary UI display and only when I was ready. If there were no changes at all when the UI woke, then it just waited until a change came in. If only 20% of the senders had an update while it slept, then the UI only had to process those. And if a sender sent 30 changes while the UI slept, then it would only process the most recent one when it woke. It was very efficient.

I would like to do something similar in this application as well, but I am not sure how to accomplish that in a command pattern message based system. I can receive the data from the server and store it in the DVR, but I am not sure how to alert the UI process to the changes in a lossy way.

-John
------------------------
Certified LabVIEW Architect
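One possible shape for the lossy, coalescing update described above, sketched in Python with hypothetical names (`Coalescer`, `post`, `drain`). A dictionary of latest values plays the role of the per-sender notifiers, and a queue of sender ids plays the role of the shared queue of notifier refs. This is only an analogy under those assumptions; it does not show how to wire the idea into a command pattern message system.

```python
import queue
import threading


class Coalescer:
    """Keeps only the newest value per sender; the reader drains at its own pace."""

    def __init__(self):
        self._latest = {}              # sender id -> most recent value
        self._lock = threading.Lock()
        self._pending = queue.Queue()  # sender ids that have posted since the last drain

    def post(self, sender, value):
        with self._lock:
            self._latest[sender] = value
        self._pending.put(sender)

    def drain(self):
        """Return {sender: latest value} for each sender that posted since the last drain."""
        changed = set()
        while True:
            try:
                changed.add(self._pending.get_nowait())  # duplicates collapse in the set
            except queue.Empty:
                break
        with self._lock:
            return {sender: self._latest[sender] for sender in changed}


coalescer = Coalescer()
for i in range(30):
    coalescer.post("unit-A", i)  # a chatty sender
coalescer.post("unit-B", "idle")

batch = coalescer.drain()        # the UI wakes up once
print(batch["unit-A"], batch["unit-B"])  # 29 idle
second = coalescer.drain()       # nothing changed since the last drain
print(second)  # {}
```

Unlike the original design, `drain` returns an empty result instead of blocking when nothing has changed; a reader that should sleep until the next change could first wait on `self._pending.get()` and then drain the rest.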
