Actor Framework favorite (mis?)use cases

crandiba · ‎03-05-2020

Hello all,

As for most, I imagine, AF was a game changer for me. It has consistently proven to be a great solution (on many levels) as a base framework, catering to many use cases.

However, I admit that I sometimes need to massage or work around what were perhaps the intended boundaries of use to get the design that I want. As I'm sure everyone does this at some level, I wanted to try to collect some of your favorite cases.

A classic example: often in distributed systems, Actor A needs data (or something) from Actor Zed, all the way across the tree. As routing the message up/down the tree doesn't scale, often we see actors being passed the enqueuer of some "service" actor and communicating directly.

There have been ample discussions on this (for example). I would argue it's also present in some less obvious aspects, like those described in the AF whitepaper "How to use the AF" (see the part on self-addressed messaging). Personally, I agree with what was said previously by some better devs than me: prefer the tree, but accommodate the possibility of actors being imbued/having-a-mandate to communicate with some other actor in the application. This is more obvious in larger applications where there are bound to be "service" Actors (central error handling, configuration managers, Broker/servers, etc).

I have another, less classic example that I will confess to you here, that I hope won't bring upon me the anger of the gods:

Say you have an Actor whose job is to respond to subscription (or monitoring) requests from the other Actors in the application. For example, Actor A makes a request to be updated on some process or subscription. To keep this monitoring/subscription (which is of course blocking) out of the Service Actor, it spins up a minion actor to perform this action and to notify the client Actor on its behalf.

Now the river: sometimes I want that minion actor to be linked to the lifetime of the Client (requester, Caller) actor, not the "service" one. So I have the Service Actor use the "Send Launch Nested Actor" method - but with the Client's enqueuer. This way, the minion notifies the caller on error and shutdown, and has the caller's enqueuer easily accessible. If the Caller shuts down, so does its orbiting minion (hence why I refer to these as satellite minions).
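To make the lifetime argument concrete, here is a loose Python analogy of the "satellite minion" idea. It is not LabVIEW or AF code: the `Actor`, `SubscriptionService`, `launch_nested`, and `subscribe` names are all hypothetical, and a direct method call stands in for sending a launch request to the client's enqueuer.

```python
class Actor:
    """Tiny stand-in for an actor: tracks its caller, nested actors, and run state."""

    def __init__(self, name):
        self.name = name
        self.caller = None
        self.nested = []
        self.running = True

    def launch_nested(self, actor):
        # The launcher becomes the caller, so it owns the nested actor's lifetime.
        actor.caller = self
        self.nested.append(actor)
        return actor

    def stop(self):
        # Stopping an actor stops everything nested under it first.
        for child in self.nested:
            child.stop()
        self.running = False


class SubscriptionService(Actor):
    def subscribe(self, client, topic):
        # Assumption: ask the *client* to launch the minion instead of
        # launching it under the service itself.
        return client.launch_nested(Actor(f"monitor:{topic}"))


service = SubscriptionService("service")
client = Actor("client")
minion = service.subscribe(client, "pressure")

client.stop()  # the minion goes down with the client; the service keeps running
```

In this sketch the service never holds the minion in its own `nested` list, which is the whole point: the minion's errors and shutdown belong to the client.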

And these 2 things are only to get us started...

Finally, I'll explain my motivation a bit: there have already been discussions on a "built in AF framework", quite recently even, and new developments promote the decoupling of actors even further. I guess my point with this is: 1) systems are more and more integrated and distributed, and in a practical sense everything needs data from a lot of disparate systems and 2) the framework and tools we have favor this decoupling, and hopefully will continue to do so in the future.

Thanks all, looking forward to hearing your feedback,

AristosQueue (NI) · ‎03-05-2020

> As routing the message up/down the tree doesn't scale

Have you proven that? Do you have data showing that it doesn't scale in most cases? I suck as a scientist, so I can't give you counter graphs, but the anecdotal evidence of many AF apps that I get invited to look at is that it DOES scale. Many people have asserted a priori that it does not scale because they think it shouldn't... but there are a scattered few that I have backed into a corner and convinced to actually build it. Almost all have found the performance not even a small issue, and the effect of requiring the tree to be a significant reduction in bugs and maintenance. Not all, but most. And the reason for that gain is that "from Actor Zed, all the way across the tree" is almost always wrong. In most cases, Zed should be *pushing* data out to its callers, so anyone making the request is actually requesting from a much nearer point than Zed. And even the request is often a bad idea -- just assuming the data hasn't changed unless a new value is *pushed* to you is generally better design. Not always feasible, but a lot more common than not.
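The "assume nothing changed unless a value is pushed" idea can be sketched in a few lines of Python. This is only an illustration, not AF code: the `Display` class and its `inbox`, `drain`, and `render` names are hypothetical, and a standard-library `queue.Queue` stands in for an actor's message queue.

```python
import queue


class Display:
    """A consumer that never asks for the value; it keeps the last one pushed."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.latest = None  # stays unchanged until a new value is pushed

    def drain(self):
        # Take every pushed value currently waiting; the newest one wins.
        while True:
            try:
                self.latest = self.inbox.get_nowait()
            except queue.Empty:
                return

    def render(self):
        self.drain()
        return f"value={self.latest}"


display = Display()
display.inbox.put(21.5)      # the producer pushes; nobody sends a request
first = display.render()
second = display.render()    # no new push, so the cached value is reused
display.inbox.put(22.0)
third = display.render()
```

The consumer never needs the producer's address, only its own inbox, so there is no cross-tree request to route.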

I really want someone to come up with a way to systemically study the scalability of the message tree. Allen Smith and I both thought the tree would be a performance disaster when we first prototyped AF. I wanted to use the tree as a proof of correctness to validate a direct messaging model (i.e. run both, and one would run slow but it should get the same result as the one running fast). But we never hit the performance bottlenecks that we expected. And eventually, we enshrined the tree as a better way of doing messaging. We left the backdoor open for direct messaging if it was needed. Are you sure you need it?

crandiba · ‎03-06-2020

Hi AQ,

To be clear, I have no concrete example, demonstration or proof prepared. What I can offer though, is a small thought experiment:

It all comes from what I can observe and infer: as the tree grows, the Caller actors are consequently (and exponentially*) burdened with routing messages from their nested actors. And it's not even the performance that I have an issue with, but the added code that needs to be placed in the Callers to accommodate this routing. Sure, I can try to be clever and have some lookup/routing mechanism, but I'd rather not go there.

My own preference is towards a more zen, minimalistic (?) approach to actions, where "movement" or "unnecessary motions" (i.e. the routing of the messages by the callers) is to be avoided. I also realize part of why I'm partial is because of this preference. I did try the tree, and quickly lost enthusiasm when the levels start to grow and I need to have the callers updated/accommodating more and more messages: it almost felt like the Caller was working for its nested actors, at some point...

Your second point "should you be contacting Actor Zed in the first place" is very, very pertinent. To that I can only answer with: in the systems I am used to working with, it will be common that you will need to contact another actor with a request. In fact, it's not only IoT that brings this: I notice the better the actor is decoupled from its surroundings, the more reuse I can extract from him and the more likely I am to plug him in different applications and have him be contacted by other Actors not directly in his proximity.

Your suggestion of the data push/subscribe mechanism is very valid; however, I feel it accounts for (admittedly) most of the use cases of cross-tree communication, but not all (see request-response: calculate-this/do-this-operation-for-me type).

"Are you sure you need it?" seems like a loaded question: strictly speaking of course not, I could go across the tree. But in some use cases, it just doesn't feel right. My goal was to try to collect use cases on how developers have creatively used it, and maybe it can open the way to some new designs or better ways of doing things. And as LabVIEW gets better and supports more features, it just feels more relevant than ever.

Thanks a lot, and sorry for the long read!

AristosQueue (NI) · ‎03-06-2020

I may have hijacked the goal of your thread, but since I've started, I'll keep going for a bit. 🙂

I would like you to, in the words of Obi-Wan Kenobi, "search your feelings" with regards to this sentiment: "But in some use cases, it just doesn't feel right."

Please consider error code 7, File Not Found.

When a user enters a path into a dialog, returning error code 7 to the user is perfectly reasonable. But when the user tries to do something that requires the app to dynamically load a component, but that component is missing, the low-level error code 7 should probably change to something like 5000, Optional Component Not Installed.

We are all very used to the idea that a low-level error needs to be rewritten as it goes up the chain. Maybe we nest the lower-level code inside to aid debugging, but the higher-level error tells the caller what the real mistake is.

My contention is that the rewriting-for-higher-level that we naturally do for error returns should apply to ALL returned values, especially in asynch systems.
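The error-rewriting habit described above looks like this in Python, as a minimal sketch. The `OptionalComponentNotInstalled` class, the `load_component` and `try_load` helpers, and the file name are hypothetical; the code 5000 is taken from the example in the post, and Python's `raise ... from` is used to nest the low-level error inside the higher-level one.

```python
import os
import tempfile


class OptionalComponentNotInstalled(Exception):
    """Higher-level error; the code 5000 mirrors the example in the post."""
    code = 5000


def load_component(path):
    """Load a plug-in file, translating the low-level failure for the caller."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except FileNotFoundError as low_level:
        # Rewrite the error for this layer and nest the original for debugging.
        raise OptionalComponentNotInstalled(
            f"optional component missing: {path}"
        ) from low_level


def try_load(path):
    """Return (error code, nested cause type name), or (0, None) on success."""
    try:
        load_component(path)
        return 0, None
    except OptionalComponentNotInstalled as err:
        return err.code, type(err.__cause__).__name__


# A fresh temporary folder guarantees the component file does not exist.
with tempfile.TemporaryDirectory() as folder:
    result = try_load(os.path.join(folder, "component.bin"))
```

The caller only ever sees the higher-level error; the file-not-found detail survives as the nested cause for whoever is debugging.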

Allen and I have been working on the AF since 2009. We made it part of LV in 2012, and he and I have jointly seen a lot of applications. We think that your feeling of wrongness (shared by many users) is itself wrong, brought on by years of wrong thinking about how asynchronous systems have to work.

When Alpha and Zed are on opposite sides of the tree, Alpha cannot request anything of Zed without knowing about Zed. But knowing about Zed implies a design of Zed's caller, and Zed's caller's caller, and all the way up the tree until the common caller. And those assumptions may need to change such that Zed doesn't even exist. What if Zed's caller chooses not to launch Zed but instead just incorporates Zed's activity into itself? Or splits Zed into two separate actors? Alpha has to be edited to take those changes into account. But the author of Zed's caller may not even know about Alpha, so ne cannot necessarily go edit Alpha's code, even if the author owns that code! Writing Alpha such that it even knows about Zed enough to contact it is, in my observation, the key reason that async apps eventually fail.

At the root level of the tree, all the trees are peers (the "forest of trees" architecture that Allen and I have talked about regularly). They expect each other to come and go, and they expect to have to connect in various complex graph ways. That's designed into that layer. But having that interlocked design at every layer is a bug, in my opinion. The observer pattern is an anti-pattern except at the root of your application.

My working hypothesis: Alpha does NOT need to know anything from Zed directly. Not ever. What it needs to know is some aspect of the environment. It happens that Zed is what is computing that aspect, but Alpha shouldn't know that. The only environment that Alpha knows and can trust is the actor that created it and gave it purpose, its own caller. Passing the message up from Zed and back down through the tree will actually change the message at each step. That is a good and desirable thing. We rewrite the message from Zed for its caller, all the way up the tree, and rewrite it as it goes back down the tree.

My Evaporative Cooler shipping example tries to show this. The message from a temperature sensor is not sent directly to the AC controller. Instead, it goes up to an aggregator that computes a Room Temperature by averaging many sensors together. If the AC controller was directly tied to the temperature sensor, making that change would be a huge refactor. The Room Temperature is then elevated to the AC controller.
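A rough Python sketch of that message rewriting follows. It is not the shipping example itself: the `SensorReading`, `RoomTemperature`, `Aggregator`, and `AcController` names, the setpoint, and the synchronous `handle` calls are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SensorReading:
    """What a single sensor tells its caller."""
    sensor_id: str
    celsius: float


@dataclass(frozen=True)
class RoomTemperature:
    """What the aggregator tells its own caller: no sensor identity left."""
    celsius: float


class Aggregator:
    """Rewrites per-sensor readings into one room-level message."""

    def __init__(self, send_to_caller):
        self._send_to_caller = send_to_caller
        self._latest = {}

    def handle(self, msg):
        self._latest[msg.sensor_id] = msg.celsius
        average = mean(list(self._latest.values()))
        self._send_to_caller(RoomTemperature(average))


class AcController:
    """Knows only about room temperature, never about individual sensors."""

    def __init__(self, setpoint):
        self.setpoint = setpoint
        self.last_room_temp = None
        self.cooling = False

    def handle(self, msg):
        self.last_room_temp = msg.celsius
        self.cooling = msg.celsius > self.setpoint


controller = AcController(setpoint=25.0)
aggregator = Aggregator(send_to_caller=controller.handle)
aggregator.handle(SensorReading("north", 24.0))
aggregator.handle(SensorReading("south", 28.0))
```

Adding or removing sensors changes only the aggregator's inputs; the controller's message type stays the same.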

Think about it. I think you may want to change your habits.

justACS · ‎03-06-2020

Stephen, I'm stealing this.

I've been thinking a lot lately about what defines a feature in an actor system, and the impact interfaces will have on AF, and I'm coming to conclude that we are not paying enough attention to the relationships between our actors - the messaging topology itself. I am starting to see having to move a piece of data from Alpha all the way over to Zed, with little or nothing happening to the data along the way, as a code smell. It is an indication that either the relationships between the actors, or the duties assigned to them, are wrong somehow. (As Stephen says, "What if Zed's caller just incorporates Zed's activity into itself? Or splits Zed into two separate actors?")

Actors are the fundamental unit of computation in an actor system, but they are rather pointless by themselves, in much the same way that individual VIs don't mean much until they are wired together on a block diagram.

Something I've realized recently is that a feature in an actor system is really defined by the chain of messages that implements it. Sure, the work happens inside the actors, but the feature's structure is defined at the topology layer. One of the neat things about interfaces is that they will allow us to look at the messaging topology by itself, completely separate from the actors that will do the work. (One quirk of AF is that messages are announcements from a sender to its environment, but the message artifacts themselves are owned by the receiver; I think this inversion has muddied the waters a bit.)

It's probably an iterative process. We'll have to rough out some behaviors, and initially assign them to some actors. But then we watch the topology, and we make adjustments. There will be jitter - behaviors will move between actors, actors will split, new actors will be created - as we better understand our systems.

I don't have anything concrete yet - my thoughts on the subject are evolving. But, like Stephen says, I have this growing feeling that we've been doing it wrong at some level. There is information, and therefore power, in the topology, and I don't think we spend enough time looking at it. We're watching the actors, not the play.

paul.r · ‎03-06-2020

@AristosQueue (NI) wrote:

The observer pattern is an anti-pattern except at the root of your application.

Can you elaborate on this?

AristosQueue (NI) · ‎03-06-2020

@paul.r : Observer pattern has two problems. The first is the infinite echo chamber. The second is the out-of-order message receipt. When you mention these two problems to most developers who use the Observer pattern, their response is, "Sure, but it is easy to code defensively against those two." Somewhat true (there are some complex cases). But if you ever fail to code defensively against those two, they are two of the hardest problems to debug, and are often extremely hard to refactor against.

Infinite Echo Chamber

This bug happens frequently in UI programming where the Observer pattern is very common. You have an outer control and an inner control (like a cluster around a numeric). The user can change the size of either one. If the user grows the inner, the outer control should grow to match. If the user grows the outer, the inner should grow to keep up (LV doesn't actually do this, but it is common in a lot of UI layout systems... LV NXG has this option).

To make this work, the outer control registers as a listener to the inner to hear about size changes. The inner registers as a listener of the outer to hear about size changes.

When the user grows the inner control, the inner can either send an absolute message, "I grew to be X units" or it can send a relative message, "I grew to be X units bigger." The bug exists with either message, but let's choose the relative message because the problem is easier to see.

The inner control sends a message to all its listeners that says, "I grew by 3 units." The outer is a listener, so it gets the message, and it makes itself bigger in response to the message, and sends a message to its listeners that says, "I grew by 3 units." The inner control is a listener, so it gets the message, and it makes itself bigger in response, and sends a message to its listeners... uh oh.

The defensive programmer knows that the messages have to be absolute messages AND listeners have to remember the old size of the thing they are listening to, so they can compare last known size against the size in the message and choose to do nothing in response to the message.
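The defensive version described above (absolute messages plus remembering the last size heard) might look like this in Python. This is a sketch under assumptions: the `ResizableControl` class, the fixed 2-unit padding between inner and outer, and the synchronous listener calls are all hypothetical simplifications.

```python
class ResizableControl:
    """Listener that sends absolute sizes and remembers what it last heard."""

    def __init__(self, name, size, offset):
        self.name = name
        self.size = size
        self.offset = offset      # my size = size I heard + offset
        self.listeners = []
        self.last_known = {}      # last absolute size heard, per sender
        self.messages_sent = 0

    def set_size(self, new_size):
        if new_size == self.size:
            return                # nothing changed, so announce nothing
        self.size = new_size
        for listener in self.listeners:
            self.messages_sent += 1
            listener.on_size_changed(self.name, self.size)

    def on_size_changed(self, sender, absolute_size):
        if self.last_known.get(sender) == absolute_size:
            return                # already reacted to this exact size
        self.last_known[sender] = absolute_size
        self.set_size(absolute_size + self.offset)


inner = ResizableControl("inner", size=10, offset=-2)
outer = ResizableControl("outer", size=12, offset=+2)
inner.listeners.append(outer)     # each one observes the other
outer.listeners.append(inner)

inner.set_size(13)                # the user grows the inner control
total_messages = inner.messages_sent + outer.messages_sent
```

The echo dies after one round trip because the inner control computes the size it already has. With relative "I grew by 3" messages there is no such fixed point to compare against.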

Out-of-order Message Receipt

This one is really bad because it is a race condition. Very hard to reproduce reliably.

Here's the code that the programmer thinks that they write:

Actor A sends message to its two listeners, actor B and actor C, that says, "I changed my state from X1 to X2."
Actor B gets A's message, changes itself, and sends a message to its one listener, actor C, that says, "I changed my state from Y1 to Y2."
Actor C receives A's message first (because it was sent first and the queues guarantee delivery order), and records the new state of A. Then it receives B's message. C thinks, "Oh, because A is in X2, now that B is in Y2, I should do action Q."

The problem is with A's sending. Remember that independent threads can be interrupted at any moment. That means things can happen like this:

Actor A sends message to its first listener, actor B, that says, "I changed my state from X1 to X2."
Actor B gets A's message, changes itself, and sends a message to its one listener, actor C, that says, "I changed my state from Y1 to Y2."
Actor A sends a message to its second listener, actor C, that says, "I changed my state from X1 to X2."
Actor C receives B's message first (because it was sent first and the queues guarantee delivery order). C thinks, "Oh, because A is in X1, now that B is in Y2, I should do action R." Then it receives A's message, and C is confused because A shouldn't be changing state while R is running. And the programmer is confused because "that can't happen, I'm sure of it!" And the worst part is that when the program runs a second time, the bug goes away.

You can code this defensively in many ways, but it is soooooooooo easy to create this scenario without thinking about it. And most of the defensive tricks involve -- get this -- arranging A to only send to B who only sends to C so that C always gets the state of A and B as a pair. You know what that looks like? That looks like the same "let's organize these actors into a tree" that the AF advises.
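The two interleavings above can be replayed deterministically in Python. This is a single-threaded simulation, not real concurrency: the `simulate` and `simulate_chain` functions are hypothetical, `collections.deque` stands in for each actor's queue, and the preemption of A is modeled by a flag.

```python
from collections import deque


def simulate(a_is_interrupted):
    """Return the action C picks, given whether A is preempted mid-broadcast."""
    inbox_b, inbox_c = deque(), deque()

    # A starts announcing its change; B is first in its listener list.
    inbox_b.append(("A", "X2"))
    if not a_is_interrupted:
        inbox_c.append(("A", "X2"))   # A finishes its broadcast right away

    # B runs: it handles A's message and announces its own change to C.
    inbox_b.popleft()
    inbox_c.append(("B", "Y2"))

    if a_is_interrupted:
        inbox_c.append(("A", "X2"))   # A only resumes now

    # C runs: its queue is strictly FIFO, yet the arrival order differs.
    known = {"A": "X1", "B": "Y1"}
    action = None
    while inbox_c:
        sender, state = inbox_c.popleft()
        known[sender] = state
        if sender == "B" and state == "Y2":
            action = "Q" if known["A"] == "X2" else "R"
    return action


def simulate_chain():
    """A sends only to B; B forwards both states to C as one message."""
    inbox_c = deque()
    inbox_c.append({"A": "X2", "B": "Y2"})
    pair = inbox_c.popleft()
    return "Q" if pair["A"] == "X2" else "R"


expected_order = simulate(a_is_interrupted=False)
raced_order = simulate(a_is_interrupted=True)
chained = simulate_chain()
```

Each queue delivers in order in both runs; only the moment A's second send happens changes, and that is enough to flip C's decision. The chained variant removes the second path into C, so there is nothing left to race.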

Huge amounts of code (and CPU cycles) are spent in observer systems diffing previous state with new state in order to stop echo chamber. And lots of comments exist saying, "Never let G talk directly to H" in order to prevent out-of-order messaging. My theory is that all the supposed performance advantages of the Observer Pattern creating arbitrary connections are eaten up by these defensive programming techniques. My other theory is that the Observer Pattern means a lot more things have to be refactored when something changes because abstractions are leaked all over the place.

I saw a huge amount of Observer Pattern failures when doing UI programming for LabVIEW NXG a few years ago. I saw that same pattern failing in user code from really top-notch architects who were trying to write asynchronous module libraries. Those failures spurred my work on the Actor Framework. As the AF showed itself to be more and more successful, I became more and more doubtful of the Observer Pattern. And for about the last four years, I have considered the Observer Pattern to be an anti-pattern. I have been trying to proselytize against it.

cbutcher · ‎03-06-2020

I'm not sure if this is the kind of response you were looking for, but I'll add that when I started using Actor Framework, I identified lots of processes and created lots of Actors, thinking this gave me wonderful modular code.

This kind of style seems to me to also be what some (mostly detractors of Actor Framework) point to in regards to it "infesting" your entire application, or greedily taking over everything.

I'd like to point out that it doesn't have to be this way - since that time I've found that I have some very nice uses for AF, and Actors, but that not everything needs to be an Actor. They will fairly happily communicate with non-Actor code (especially if they receive information from e.g. their caller) and can be used when you'd like without using them everywhere.

Further, you're not limited to one Root Actor (of course, this is never even really implied, but the naming led me away from the alternative at first) and so having separate groups of Actors (allowing actually modular code, rather than a conglomeration of overly-coupled unrelated Actors) is completely possible.

I'm looking forward to NI Week this year (for a non NDA-covered explanation of what's being discussed above re Interfaces, see the public video of Stephen's presentation at the American CLA summit (slides))

drjdpowell · ‎03-07-2020

As an aside to AQ, as this reminded me of a previous conversation we had: I suspect the root flaw illustrated by all your examples is actually that of circular dependency. A observes B and B observes A is an immediate code smell to me, not because of the observer pattern, but because of the circle.

AristosQueue (NI) · ‎03-07-2020

No cycle in the out-of-order receipt example.

Also, proving that there isn't a cycle is a hard thing. Why not start with an infrastructure that pushes back on that ever existing in the first place?
