High Frequency Messaging in Debug Fork

ChuckDiesel · ‎10-18-2012

Hello,

I recently began rewriting a large application that used queues and (*gasp*) global variables for asynchronous comms so that it uses the AF instead. All seemed to be going well until I got to stress testing the core of the app.

I am basically sending a message (from an actor we'll call DAQ) containing a single 2D array of doubles (roughly 10x10 in size) at 1-5kHz to the DAQ actor's caller. The caller (Motion Controller Actor) then does some data manipulation and sends 1 row of data to its caller actor (Top Level Actor with test UI for now). The Top Level Actor then decides if enough time has passed to update its 1 row 10 item double array for user viewing (something like 10Hz).
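The "decide if enough time has passed" step in the Top Level Actor can be sketched in text form. This is a Python analogy, not LabVIEW or AF code; `Throttle`, `ready`, and the 100 ms interval are illustrative assumptions, and timestamps are integer milliseconds to keep the arithmetic exact.

```python
class Throttle:
    """Let an update through only if at least min_interval_ms have elapsed."""

    def __init__(self, min_interval_ms):
        self.min_interval_ms = min_interval_ms
        self._last_ms = None

    def ready(self, now_ms):
        if self._last_ms is None or now_ms - self._last_ms >= self.min_interval_ms:
            self._last_ms = now_ms
            return True
        return False


# Simulate one second of rows arriving at 1 kHz (one row per millisecond).
ui_throttle = Throttle(min_interval_ms=100)   # roughly 10 Hz display rate
shown = [t for t in range(1000) if ui_throttle.ready(t)]
print(len(shown))   # 10 rows reach the display, the other 990 are skipped
```

Every row is still received and handled; only the redraw is skipped, so the message rate into the Top Level Actor is unchanged by this step.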

Top Level Actor -> Control Manager -> DAQ

The issue is that when running this system at 500Hz and upwards for more than 30 seconds or so I start seeing slowdowns in when a message is processed. A message will not be missed; the timestamp of the data appearing on the test UI is correct, but instead of coming in at say 1 second intervals it slows down to 2-3 seconds.

If I try to close the program quickly after the splash screen finishes (I am basing this off of the AF template) then it will close without error. However, if I wait until things become sluggish then I get error 1556 ("The reference is invalid. This error might occur because the reference has been deleted.") upon stopping the actors. It seems as though a stop message for one of the actors doesn't get through before the actor reference is closed (or something along those lines).

Should I be seeing an issue using the Debug AF this way or am I doing something wrong? See some code snippets below. BTW, this problem does not seem to happen with the shipping version of AF in LV 2012, just the debug fork.

Thanks,

Chuck, using AF 4.1.1.34 DEBUG FORK on LV 2012, Win 7 x64

ycui7 · ‎10-18-2012

I am new to the actor framework stuff. Some of my reply might be wrong.

I ran into similar issues in my program. My problem turned out to be the queue building up. Too much data was being sent to the queue and the processing loop could not keep up, so the queue took a lot of memory. When memory use hit 2 GB on a 32-bit system, the program basically behaved strangely.

About the error, it might be due to a race condition. When the program runs slowly, the quitting process executes through an unusual path. This causes the queue to be released too early. The processing loop might still be dequeuing data from the queue, and since the queue has been released, the reference becomes invalid. That could be why the error occurs. Also, because it is a race condition, sometimes the error does not appear at all.
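The memory build-up described above can be estimated with a little arithmetic. This is a rough Python sketch that counts only the raw payload of a 10x10 array of doubles; per-message overhead is ignored, and the worst case of nothing being consumed at 5 kHz is an assumption, not a measurement from the thread.

```python
GIB = 1024 ** 3


def payload_bytes(rows=10, cols=10, bytes_per_double=8):
    """Raw size of one 2D array of doubles, ignoring per-message overhead."""
    return rows * cols * bytes_per_double


def seconds_to_fill(limit_bytes, shortfall_hz, msg_bytes):
    """Time for the unprocessed backlog alone to reach limit_bytes.

    shortfall_hz is how many more messages per second are produced than consumed.
    """
    return limit_bytes / (shortfall_hz * msg_bytes)


msg = payload_bytes()   # 800 bytes for a 10x10 array of doubles
# Assumed worst case: 5 kHz produced and nothing consumed at all.
worst_case_s = seconds_to_fill(2 * GIB, shortfall_hz=5000, msg_bytes=msg)
print(round(worst_case_s))
```

Under these assumptions the payload alone would need about 537 seconds to reach 2 GB, so watching actual memory use is the only way to tell whether this is what is happening in a given application.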

komorbela · ‎10-19-2012

Hi Chuck,

I am not familiar with the Debug Fork, so I might miss some details, but for now let's just assume that the difference compared to the one in lv2012 is that it's slower.

I also don't fully understand your code based on these 3 screenshots, but let's give it a try:

So in the Actor Core of the DAQ actor you register a user event and bundle it into the parent Actor Core. I don't know what you do with it, but I assume you call it in the Stop Core of DAQ. Is that so?

Anyway, as I see it, this Actor Core gets stopped by receiving a stop user event generated somewhere. If that stop event generation is somehow in sync with the DAQ actor stopping (for example if it's in the Stop Core), then you have a race condition: a race between the DAQ stopping and the Actor Core still trying to execute the "Send Data" message-send function, which wants to use the queue of the DAQ actor. You should avoid this race condition.

What might have happened is the DAQ Actor Core already stopped when it still should process the "Send Data" messages.

I am not sure about my explanation, as I don't fully understand your program or the Debug Fork, so sorry if I misunderstood something.

By the way, if I am right, then I would suggest having more control over how the DAQ Actor stops. Only let it stop when you are sure your Actor Core override is no longer running (I mean when it no longer wants to send messages).


Ben_Phillips · ‎10-19-2012

I can't speak directly to your error, but Aristos advised in one of the white papers or intros to AF, that for high speed synchronous transfers, you should not be using messages to initiate something, time after time after time at high speeds. And you should DEFinitely not be using AF messages to transfer DAQ data. Probably better to use a single message at startup and send it to both the sending and receiving actors of that synch transfer. That message would contain the ref to something like a queue or notifier that would contain the data. Both of the actors then need to have a parallel loop in actor core that work on that transfer mechanism, whatever it is you chose. Those loops would essentially work like old school LV.

This adds extra work to provide a way for those loops to shut down, but I don't think there's any way around this (it's really not a big deal, though, just a little extra baggage like you've always done). I use a light wrapper class around a T/F notifier to make a stopping mechanism to stop parallel loops without having to see the notifier primitives on the block diagram, but you could also have another messaging mechanism (not AF) if you're less lazy than me.
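The "light wrapper around a T/F notifier" idea can be sketched in text form. This is a Python analogy using `threading.Event` in place of a LabVIEW notifier; `StopSignal`, `request_stop`, `stopped`, and `transfer_loop` are hypothetical names, not part of AF.

```python
import threading
import time


class StopSignal:
    """Hides the underlying primitive so loops only see request_stop/stopped."""

    def __init__(self):
        self._event = threading.Event()

    def request_stop(self):
        self._event.set()

    def stopped(self, wait_s=0.0):
        # Returns True as soon as a stop was requested, else False after wait_s.
        return self._event.wait(wait_s)


def transfer_loop(stop, out):
    # Stand-in for the parallel loop that services the transfer mechanism.
    while not stop.stopped(wait_s=0.001):
        out.append(len(out))


stop = StopSignal()
samples = []
loop = threading.Thread(target=transfer_loop, args=(stop, samples))
loop.start()
time.sleep(0.05)        # let the loop run briefly
stop.request_stop()     # e.g. called from the actor's shutdown path
loop.join(timeout=1.0)
print(loop.is_alive())  # False once the loop has seen the stop request
```

Using the wait on the signal as the loop's pacing delay means a stop request is noticed within one iteration instead of after a separate sleep.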

Daklu has some really good posts about the way he breaks down what he will permit in each message, and what type of thing they contain or do. Can't remember the name of the post, though. I won't butcher his idea too much except to say messages stating a status should be handled/grouped/named differently than something that is an action. That's as far as I've gotten with the idea, but his stuff is worth looking at. He provides extremely detailed examples and explanation, highly recommended reading. I almost went with his LapDog thing, turns out I did not, but I'm getting ready to resurrect the old reading and give it a go again.

Really worth listening to his posts about slave processes, how he deals with them, and when they're appropriate to use. Next place I have something complicated to do that isn't AF, I'm going to try to do this.

For my own app, I have to provide simulation data to some other thing that will do something with it; better to keep that out of my mind, I just need to produce the data and make sure it gets there. An extra wrinkle is that I have to have data models that are different, and can produce data at any rate they choose. And an extra wrinkle still, I have to be able to do this from any server, so that we can distribute the processing load. So I've got a process that runs at each server, and I need to be able to fire up a Data Overlord, so to speak, that can get everything out at the fastest possible speed but handle independently-timed slaves.

I might change this later, but for this I chose a notifier. My top level only does things like a web marketer does. It puts the right people into contact, but has no further involvement. Each new source of data, aka an Actor, has to produce data and it needs somewhere to put it. It's a notifier. The program at this point grabs all that and does things like removing duplicate device names, then sends it out at a repeated period. Here is where I think you're going wrong. Don't use an AF message containing data, just push it directly to whoever is using it.

Aristos and Co actually encourage this if you have situations like yours or mine. Messages do not solve everything.

Apologies if I did not answer, but did my best.

komorbela · ‎10-19-2012

Hi again. I totally misunderstood your VIs. You send the message from the Actor Core to the caller actor, not to itself. So sorry. The revised suggestion:

It is still a race condition. Your caller actor probably stops before you stop sending messages to it. I would send a stop request from the caller and only stop the caller when the DAQ actor sent back a "stopped successful" message. I still may be wrong but I feel that it's close to the truth 🙂
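The stop handshake suggested above can be sketched in text form. This is a Python analogy with plain threads and queues standing in for actors and their message queues; `daq`, `caller`, and the message names are hypothetical, not AF calls.

```python
import queue
import threading
import time


def daq(inbox, caller_inbox):
    sent = 0
    while True:
        try:
            if inbox.get_nowait() == "stop":
                break
        except queue.Empty:
            pass
        caller_inbox.put(("data", sent))
        sent += 1
        time.sleep(0.001)
    # Last message this actor ever sends: the caller may shut down after it.
    caller_inbox.put(("stopped", sent))


def caller(inbox, daq_inbox, stop_after):
    received = 0
    stop_requested = False
    while True:
        kind, value = inbox.get()
        if kind == "data":
            received += 1
            if received >= stop_after and not stop_requested:
                daq_inbox.put("stop")      # request the stop, but keep running
                stop_requested = True
        elif kind == "stopped":
            return received, value         # safe to exit: nothing else is coming


daq_inbox, caller_inbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=daq, args=(daq_inbox, caller_inbox))
worker.start()
received, sent = caller(caller_inbox, daq_inbox, stop_after=50)
worker.join()
print(received == sent)   # True: no data message arrived after the caller quit
```

The caller keeps its queue alive until the confirmation arrives, so a late data message never targets a released queue.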

ChuckDiesel · ‎10-19-2012

Thanks for the replies,

ycui: I have been monitoring my memory situation, and it does not seem to be getting that high. I have 8 GB of memory since I am using 64-bit Windows (but only 32-bit LabVIEW), so I don't believe that is the issue. However, you may be correct about the error.

komorbela: Yes, the stop core sends the user event to stop the while loop in actor core. This is basically taken exactly from the AF template. I think to avoid this condition I will wrap the send message into the user event case structure and only execute it if the user event has not been triggered. On your second post, yes the DAQ sends the data update to its caller, the Control Manager. I will also experiment with the necessity for the DAQ sending back a "stopped" message to its caller.

Ben: I have read through https://decibel.ni.com/content/message/39166#39166 which seems to explain what you are describing, I just thought I could get away easy! So the proper implementation would be having a queue reference in the DAQ and the caller; I am struggling with how to have the caller also be able to react to messages that manipulate the data coming in through the queue consumer loop within its actor core override (since from what you are saying it wouldn't even be appropriate to send a "New Data Set Ready in Queue" message to the caller actor at the same frequency without any data). Maybe it just isn't going to be as easy as it seemed at first, so thanks for reassuring me in this.

All told, the program just ran fine for 12 hours straight overnight without losing any time at about 5000 Hz of updates from the DAQ. This is without using the debug fork. So although it seems to be working I will probably end up doing it the "right way" anyway.

Chuck

LVB · ‎11-02-2012

Just catching up on some AF threads and this one caught my attention.

Ben_Phillips wrote:
I can't speak directly to your error, but Aristos advised in one of the white papers or intros to AF, that for high speed synchronous transfers, you should not be using messages to initiate something, time after time after time at high speeds. And you should DEFinitely not be using AF messages to transfer DAQ data. Probably better to use a single message at startup and send it to both the sending and receiving actors of that synch transfer. That message would contain the ref to something like a queue or notifier that would contain the data. Both of the actors then need to have a parallel loop in actor core that work on that transfer mechanism, whatever it is you chose. Those loops would essentially work like old school LV.

If this is the case, I would like to hear this from AQ directly. I am pretty sure that sharing references between actors is an anti-pattern. Actors are supposed to be completely self-contained. Sharing references within an actor is completely acceptable. If there is something high-speed that needs to occur and references must be used, then it should be contained within a single actor and implemented in the Actor Core via a parallel loop (not another actor).

Sharing references within an Actor Core is mentioned in the User Interfaces for Actor section of the white paper.

For an example of high speed/bandwidth actors, I would recommend looking at the Angry Eagles project. The DrawManager.lvclass is a good example which uses a DVR and notifiers to update the UI. Note, the references are only created from the "Send … Msg" or within the DrawManager.lvclass actor and not shared with other actors…

Ben_Phillips wrote:
I might change this later, but for this I chose a notifier. My top level only does things like a web marketer does. It puts the right people into contact, but has no further involvement. Each new source of data, aka an Actor, has to produce data and it needs somewhere to put it. It's a notifier. The program at this point grabs all that and does things like removing duplicate device names, then sends it out at a repeated period. Here is where I think you're going wrong. Don't use an AF message containing data, just push it directly to whoever is using it.
Aristos and Co actually encourage this if you have situations like yours or mine. Messages do not solve everything.

I am pretty sure there are two problems with this method. I don't think that AQ and Co encourage this.

1. Communicating between actors with anything other than a message

This is discussed above. Only use messages between actors. Nothing else.

2. Short-circuiting the actor tree

Short-circuiting the actor tree is not suggested. It has been mentioned that short-circuiting has its use cases (Linked Network Actors) and pitfalls.

It would be nice if "AQ and Co" could chime in to clarify some of the topics brought up in this discussion.

CLA, CTA

Jed394 · ‎11-02-2012

I'm not sure why crosslinking between actors would be a bad thing for data publishing. There is definitely overhead in messaging and dynamic dispatching that will slow down some high-rate DAQ processes. I'm new to this, but the rule I'm living by is that when I need to perform some "Action", it has to be propagated through a message. However, if I have some actor just spitting out data to a queue that I can subscribe to anywhere in my architecture, I'm not sure how that is bad. As long as you follow the rule that any "action" based upon this data must be a message, the same protections seem to apply.

Example: If I need to update a UI element for a sensor actor, then I just subscribe to that sensor's data queue. However, if I need to react or change a UI component based on that data, then I message the UI actor with the action to be performed.

justACS · ‎11-02-2012

It sounds very much like the slowdown is due to messages stacking up in the consumer's receive queue. You just aren't processing the data as fast as you are generating it. You'll need to look carefully at how long it takes to process data, and see if you can make adjustments. Perhaps fewer messages containing larger data sets, or some kind of pipelining, is the answer.
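The message-stacking effect can be put into numbers. This is a small Python sketch; the 400 Hz consumption rate is an assumed figure for illustration, not something measured in this thread.

```python
def backlog_after(seconds, produce_hz, consume_hz):
    """Messages waiting in the receive queue after running for `seconds`."""
    return max(0, (produce_hz - consume_hz) * seconds)


def wait_for_newest(backlog, consume_hz):
    """Seconds until a message enqueued now reaches the front of the queue."""
    return backlog / consume_hz


# Producer at 500 Hz, consumer assumed to manage only 400 Hz, for 30 seconds.
queued = backlog_after(30, produce_hz=500, consume_hz=400)
print(queued, wait_for_newest(queued, consume_hz=400))   # 3000 7.5
```

No message is lost in this model, which matches the symptom of correct timestamps arriving later and later.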

You can test for this by changing your stop message to have a higher priority. If the system shuts down right away with no errors, then message stacking is the likely problem.
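The effect of raising the stop message's priority can be sketched in text form. This is a Python analogy built on `queue.PriorityQueue`; it does not reproduce how the AF priority queue is implemented, and `send`, `HIGH`, and `NORMAL` are hypothetical names.

```python
import itertools
import queue

HIGH, NORMAL = 0, 1          # lower number is dequeued first
_seq = itertools.count()     # tie-breaker keeps FIFO order within a priority


def send(inbox, msg, priority=NORMAL):
    inbox.put((priority, next(_seq), msg))


inbox = queue.PriorityQueue()
for i in range(1000):                        # a backlog of data messages
    send(inbox, ("data", i))
send(inbox, ("stop", None), priority=HIGH)   # enqueued last

_, _, first = inbox.get()
print(first[0])   # "stop": it is handled ahead of the 1000 waiting data messages
```

With normal priority the same stop message would be handled only after the whole backlog, which is the situation the test above is meant to reveal.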

FWIW, this is a common issue with any kind of producer/consumer architecture, not just AF.

BTW, it is perfectly acceptable to register an actor as a listener of another actor. Say you have a system with a top level actor that launches an analysis actor and a DAQ actor. You can pass the queue of the analysis actor to the DAQ actor, allowing DAQ to pass data straight to analysis, without going through top level. We make you do this explicitly because we want you to have a reason for doing it. Cutting your message traffic in half for repeat messages from the same source to the same destination (DAQ data from the acquisition loop to the analysis loop) is a good reason.

What is NOT OK is adding a different mechanism for transmitting data. AF is built around the assumption that you are using the message queues, and only the message queues, to move data between the actors (what happens *within* an actor is a different story). I've come across a few tightly constrained corner cases where I've found it necessary to do something else, but they are rare. Moving a block of data from one actor to another is not one of those corner cases. There is certainly no reason to use a separate LabVIEW queue, because the AF message queue is a wrapper around LabVIEW queues! All you would do is add a backchannel communication path that offers no performance improvement and adds a risk of breaking the AF model. And a notifier is right out, because it has the same issues as a regular queue, and is lossy as well!
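The "lossy" point can be illustrated in text form. This is a Python sketch in which `LatestValue` is a simplified stand-in for a notifier (a single slot that keeps only the most recent value); it is not the LabVIEW primitive itself.

```python
import queue


class LatestValue:
    """Single-slot mailbox: a new write replaces whatever was not read yet."""

    def __init__(self):
        self._slot = []

    def send(self, value):
        self._slot = [value]

    def read_all(self):
        values, self._slot = self._slot, []
        return values


notifier, fifo = LatestValue(), queue.Queue()
for sample in range(5):      # producer runs 5 times before the consumer wakes up
    notifier.send(sample)
    fifo.put(sample)

from_notifier = notifier.read_all()
from_queue = [fifo.get() for _ in range(fifo.qsize())]
print(from_notifier, from_queue)   # [4] [0, 1, 2, 3, 4]
```

Whenever the consumer falls behind, the single slot silently drops samples, while the queue keeps every one of them at the cost of a growing backlog.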

If you need to stream continuously acquired data from actor A to actor B, where B is not A's caller, just give queue B to actor A, and send the data using AF messages. Yes, there is a (very) small performance hit for dynamic dispatch calls (the Do method), but it's measured in microseconds, and should not adversely affect a 500 Hz system. You will need to balance the production and consumption rates, just as you do for any producer/consumer system.
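The "traffic cut in half" argument for a registered listener can be sketched by counting sends. This is a Python analogy with plain queues standing in for actor message queues; `send`, `sends`, and the 500-row batch are illustrative assumptions.

```python
import queue

sends = {"relayed": 0, "direct": 0}


def send(inbox, msg, tally):
    inbox.put(msg)
    sends[tally] += 1


top_inbox, analysis_inbox = queue.Queue(), queue.Queue()
rows = [[float(i)] * 10 for i in range(500)]

# Relayed: DAQ -> top level -> analysis, two sends per row.
for row in rows:
    send(top_inbox, row, "relayed")
while not top_inbox.empty():
    send(analysis_inbox, top_inbox.get(), "relayed")

# Registered listener: top level hands the analysis queue to DAQ once,
# then DAQ sends each row straight to it.
listener = analysis_inbox
for row in rows:
    send(listener, row, "direct")

print(sends)   # {'relayed': 1000, 'direct': 500}
```

The data still travels as ordinary messages on the receiver's normal queue; only the intermediate hop is removed.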

Daklu · ‎11-02-2012

LVB wrote:
Just catching up on some AF threads and this one caught my attention.
Ben_Phillips wrote:
Probably better to use a single message at startup and send it to both the sending and receiving actors of that synch transfer. That message would contain the ref to something like a queue or notifier that would contain the data. Both of the actors then need to have a parallel loop in actor core that work on that transfer mechanism, whatever it is you chose.
If this is the case, I would like to hear this from AQ directly. I am pretty sure that sharing references between actors is an anti-pattern. Actors are supposed to be completely self-contained. Sharing references within an actor is completely acceptable. If there is something high-speed that needs to occur and references must be used, then it should be contained within a single actor and implemented in the Actor Core via a parallel loop (not another actor).

Umm, I'm not AQ, but I'll throw in my $.02 on the matter.

------

[Edit - It looks like I have a slightly different take on the issue than Allen.]

------

I do not think it is a good idea to share data references between actors. Things like single element queues, globals, DVRs, etc. In my designs each bit of data is "owned" by only one actor. That actor, and only that actor, has the responsibility to update the data and distribute it to the rest of the application. Shared data references blur the relationship between the data and the owning actor.

Ben is describing a dedicated data pipe between two actors. It's not a data reference, it's a separate queue for getting high speed data (thousands of messages per second) from one actor to another without loading down the messaging system. IMO it is a good idea--it helps keep your application responsive and allows more flexibility in your design.

When I use a data pipe I make sure it is only used to send data. I do not transmit any control or status messages on it. All control/status messages are transmitted on the normal messaging tree.
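The data pipe arrangement can be sketched in text form. This is a Python analogy under stated assumptions: `Receiver`, the `attach_pipe` message, and the drain loop are hypothetical names, and plain queues and threads stand in for actors. Control messages use the inbox only; data uses the pipe only.

```python
import queue
import threading


class Receiver:
    def __init__(self):
        self.inbox = queue.Queue()     # control and status messages only
        self.rows = []                 # data that arrived through the pipe
        self._pipe = None
        self._stop = threading.Event()

    def run(self):
        drain = None
        while True:
            kind, payload = self.inbox.get()
            if kind == "attach_pipe":  # one-time setup message carrying the pipe
                self._pipe = payload
                drain = threading.Thread(target=self._drain_pipe)
                drain.start()
            elif kind == "stop":
                self._stop.set()
                break
        if drain is not None:
            drain.join()

    def _drain_pipe(self):
        # Separate loop for the pipe; runs until stop is requested and the
        # pipe has been emptied.
        while not (self._stop.is_set() and self._pipe.empty()):
            try:
                self.rows.append(self._pipe.get(timeout=0.01))
            except queue.Empty:
                pass


receiver = Receiver()
actor_thread = threading.Thread(target=receiver.run)
actor_thread.start()

pipe = queue.Queue()
receiver.inbox.put(("attach_pipe", pipe))
for i in range(1000):
    pipe.put([float(i)] * 10)          # data goes through the pipe only
receiver.inbox.put(("stop", None))     # control goes through the inbox only
actor_thread.join()
print(len(receiver.rows))              # 1000
```

The extra setup message and the dedicated drain loop are the additional work this approach requires on the receiving side.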

------

[Edit #2 - Allen's technique ("registered listener") and my technique ("data pipe") are very similar and accomplish the same thing. The differences are mostly in how you conceptualize this out-of-band communication.

With the RL this communication is sent through the normal--albeit short-circuited--messaging channels. High speed data is received the same way as normal control and status messages. It avoids loading down the message hierarchy by skipping all the intermediate steps. Think of it as an overnight letter instead of normal snail mail. The DP implementation sets up a separate transport mechanism. It's more like a text message in that it doesn't sit in the mailbox (message handler) with all the other letters but instead gets sent directly to where it's needed. RLs short circuit the message tree. DPs bypass the message tree altogether.

Which is better? *shrug* Neither. It's mostly a matter of style and how you think about your system. Which one makes more sense to you? I don't like having to think too much about my messaging system's performance, so I prefer DPs. The disadvantage of DPs is there is more overhead to implement it--you need to have something create the two pipe endpoints and send them to the sender and receiver, then set up a separate loop in the receiver just for servicing the pipe.

The disadvantage of RLs is you have some messages that are not following the message tree. Ideally it will only be data messages, but once you have the queue of that actor way over there you can send it any message you want. Furthermore, it's harder (for me) to construct a mental model of the application when actor interactions aren't clearly organized.]
