Actor Framework Documents


Justifying The Actor Framework

Written by Stephen Loftus-Mercer, creator of the Actor Framework

 

The Actor Framework is predicated on this thesis:

When sending command-like messages between parallel systems, asynchronous messages are superior to both synchronous messages and synchronous function calls.

A thread is any sequence of instructions executing in an application. If you write applications that have two or more parallel threads, you have many models for inter-thread communication, but all models come down to two basic flavors:

  1. Synchronous. A shared resource that the parallel threads take turns accessing. A thread reserves the resource, does some work, then releases the resource. This is functionally equivalent to the thread sending a message to the other thread and waiting for a reply. In both the "make a function call to a locked object" and "send a synchronous message" cases, the sending thread is held while the request is processed.
  2. Asynchronous. An open communications channel where a thread sends messages to the other threads. In this model, instead of issuing instructions, a thread announces status and expects the other threads will act on that announcement in their own time.
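
To make the two flavors concrete, here is a minimal sketch in Python. The Actor Framework itself is a LabVIEW library, so this and the other snippets below are only language-neutral illustrations, and every name in them is made up. The synchronous flavor reserves a shared resource with a lock; the asynchronous flavor drops a message on a queue and keeps going.

```python
import queue
import threading

# Flavor 1: synchronous. The caller reserves the shared resource and is
# held until its work on that resource is done.
shared_counter = 0
counter_lock = threading.Lock()

def synchronous_increment():
    global shared_counter
    with counter_lock:        # reserve the resource; other threads wait here
        shared_counter += 1   # do some work; the lock is released on exit

# Flavor 2: asynchronous. The caller announces a request on a channel and
# returns immediately; the owning thread acts on it in its own time.
inbox = queue.Queue()

def asynchronous_increment():
    inbox.put("increment")    # announce the request and keep working

def owner_loop():
    count = 0                 # data owned exclusively by the owning thread
    while True:
        msg = inbox.get()
        if msg == "increment":
            count += 1
        elif msg == "stop":
            break
```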

The synchronous model of communication is the dominant programming paradigm in use today. Most software involving parallel threads relies on synchronous function calls. For programmers used to writing single-threaded applications, the synchronous model makes sense. You have a pointer to a block of data, you reserve the data, you make some modifications, you release the data so someone else can work on it. Similarly, you have a reference to an object, you call a method on that object, and the system takes care of making sure that every method is handled atomically. In this model, a programmer feels as if he or she knows exactly what will happen at every function call: just read the instructions in order and imagine the other thread injecting other atomic calls along the way.

 

Asynchronous communication, by contrast, feels sloppy. A thread is working on its own, managing its own data, and it announces to other threads, "This is my status" or "I need this data." It doesn't presume that anyone will ever answer those announcements. Instead, it keeps working in whatever state it is in, and it monitors its own communications channel for announcements from other threads. The sequence of steps needed to complete any given task can be hidden because it is divided up into a multitude of functions representing mini-states: here is the work I do at the start, here is the work I do while waiting for data, here is the work I do when I get the data, here is the work I do when I'm done with my task and waiting for a new task, etc.

 

Under the synchronous model, when Thread A needs a value, it calls a function on Thread B to "read the value." Thread B gets to a good sync point, pauses to copy the value out for A, and then both threads continue operation. Under the asynchronous model, when Thread A needs a value, it announces, "I need this value," and then it keeps working without the value. It records in its state, "I am currently waiting for this value," and handles all the rest of its tasks accordingly. Eventually, Thread B, who heard A's need, announces, "I have this value." Thread A hears the announcement and changes its state accordingly.
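
A rough Python sketch of that asynchronous exchange, with hypothetical names: Thread A records "waiting for this value" in its own state and keeps doing other work, and Thread B replies with a copy of the value whenever it gets around to the request.

```python
import queue

a_inbox = queue.Queue()   # channel owned by Thread A
b_inbox = queue.Queue()   # channel owned by Thread B

def thread_a():
    state = {"value": None, "waiting_for_value": False}
    b_inbox.put(("need value", a_inbox))    # announce the need...
    state["waiting_for_value"] = True       # ...and note that we are waiting
    while True:
        try:
            kind, payload = a_inbox.get(timeout=0.1)
        except queue.Empty:
            do_other_work(state)            # keep working without the value
            continue
        if kind == "have value":
            state["value"] = payload        # incorporate the reply on arrival
            state["waiting_for_value"] = False
        elif kind == "stop":
            break

def thread_b():
    current_value = 42                      # data owned exclusively by B
    while True:
        kind, reply_to = b_inbox.get()
        if kind == "need value":
            reply_to.put(("have value", current_value))  # a copy, not a reference
        elif kind == "stop":
            break

def do_other_work(state):
    pass   # placeholder for the rest of A's tasks
```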

 

In my observation, the asynchronous messaging model often feels like a poorer choice for parallel systems because the programmer feels less in control and because the code is divided up and is thus harder to swallow as a chunk. We all know how hard it is to write parallel systems correctly, so programmers tend to want the model that makes them the most confident of writing correct code. Moreover, all that asynchronous messaging means that copies of data get handed around to every running task, leading to concerns about data synchronization and memory bloat. It is no wonder that synchronous communication remains the dominant paradigm.

 

The Actor Framework relies on a theory that our intuition is wrong. When we humans are asked to write parallel code, we feel more in control with the synchronous model, but in practice, we appear to reason more correctly under an asynchronous model. The reason for this is data changes: with asynchronous communication, only one thread is modifying any given piece of data at any given time. Under the synchronous model, a programmer writes a sequence of commands to change a block of data and then has to worry that between every pair of lines, the other threads could inject an infinity of other data changes. Can the programmer really analyze all the permutations? If he or she cannot, then we have bugs called "race conditions," where the random order in which threads get to execute affects whether the application succeeds or fails. Because such bugs do not reproduce easily (you have to get the same random order generated twice at run time), they are among the hardest of all bugs to fix. At some point, a programmer says, "No, I need to lock the data for the duration of this mega-operation." This allows the programmer to make all the data changes without worrying about interference from other threads.
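
A classic instance of such a race, sketched in Python with made-up names, is a check-then-act sequence on shared data: between the check and the act, another thread can slip in its own change.

```python
import threading

balance = 100            # shared data; each single access is fine on its own,
                         # but the two-step sequence below is not atomic

def withdraw(amount):
    global balance
    if balance >= amount:        # step 1: check
        # Another thread may run its own withdraw() right here, so both
        # checks can pass against the same starting balance of 100.
        balance -= amount        # step 2: act -- the balance can go negative

threads = [threading.Thread(target=withdraw, args=(80,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Depending on the order the threads happen to run, balance ends at 20 or -60.
```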

 

But the lock may prevent other threads from making any progress, leading to issues like user interfaces that hang while supposedly-background tasks update. And as locks proliferate on different objects, it is easy to construct scenarios where one thread has locked half the data, another thread has locked the other half, and neither thread can proceed with its work. We call these situations "deadlocks." Finally, there is the lifetime issue. Many synchronous parallel programs have problems shutting down or finishing subroutines because it is unclear which thread actually owns the shared data and who is responsible for freeing the memory. Either the program is too aggressive and releases data before all threads are done, leading to crashes when the other threads access deallocated memory, or the program is too lax, leading to memory leaks where no one frees the memory.
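
The deadlock scenario is easy to write down as a sketch: two threads take the same two locks in opposite order (hypothetical names, illustration only).

```python
import threading

lock_a = threading.Lock()   # guards one half of the data
lock_b = threading.Lock()   # guards the other half

def worker_1():
    with lock_a:            # reserve half the data...
        with lock_b:        # ...then wait for the other half
            pass

def worker_2():
    with lock_b:            # reserve the other half...
        with lock_a:        # ...then wait for the first half
            pass

# If worker_1 and worker_2 run in parallel and each acquires its first lock
# before the other acquires its second, neither thread can ever proceed.
```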

 

Asynchronous messaging avoids all of these issues by giving every thread its own data exclusively. Each thread works on its own data; there are no references to that data for anyone else to use. The only references are the communications channels. If a thread wants to tell another thread about a given value, it does not share a reference to that value. Instead, it gives the other thread its own copy of the value. Each thread knows its own status and does its own work. Ideally, even resources such as hardware are not shared. One and only one thread is responsible for addressing any given piece of hardware at a time. It may decide it is done with the hardware and give up the access, but once it does so, it can only request that access back, not demand it. In a synchronous model, many threads could have a reference to the same hardware at the same time and take turns using it.

 

The term "thread" refers to any parallel process within an application. Once we divide the threads up so that each is working on its own data exclusively, we call the threads "actors." An actor is more coherent than a general thread; it has much better-defined behavior. We can say with confidence what state a given actor is in, which in turn allows us to say how that actor will react if we send it a given message. We never worry about the lifetime of memory because an actor always frees its own data; there is no sharing. And we avoid deadlocks because an actor is always able to keep working: it never waits on another actor. The core tenet of actor-oriented programming is that nothing ever blocks an actor from responding to a message other than its own work. It might not check its inbox while it is working, but eventually, the actor finishes its current work and gets around to checking for new messages. That means that actors are always able to, at the very least, quit eventually. A given actor may take its time responding to a message and may have a deep backlog of messages to work through, but it never has the problem of being unable to quit because it is stuck waiting for information from some other actor instead of hearing the shutdown message.
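
As a rough sketch of that tenet (a hypothetical `Actor` class, not the AF's actual API): the actor's loop does nothing but pull messages from its own inbox and handle them, so the only thing standing between it and the shutdown message is its own current work.

```python
import queue

class Actor:
    """Minimal actor: owns its data and handles one message at a time."""

    def __init__(self):
        self.inbox = queue.Queue()   # the only reference other actors hold
        self.state = {}              # data owned exclusively by this actor

    def send(self, message):
        self.inbox.put(message)      # never blocks the sender

    def run(self):
        while True:
            message = self.inbox.get()   # there may be a backlog...
            if message == "stop":        # ...but "stop" is always reachable
                break
            self.handle(message)         # the actor's own work for this message

    def handle(self, message):
        # Actor-specific work goes here; it waits only on its own
        # computation, never on another actor.
        pass
```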

 

One place where it is easy to see the value of asynchronous messaging is a Mars rover. The robots on that distant planet may need more information from Earth, but the distance means responses from Earth take a long time. We would not want a Mars rover to be unable to deal with a rapidly changing event on Mars while it waits for data from Earth. Instead, it sends its request off to Earth and then keeps working as best it can, adjusting its solar arrays and moving with the terrain. If and when a response from Earth comes back, the rover incorporates the reply into its current state and reacts accordingly. The extreme time delay between Earth and Mars makes the need for asynchronous messaging obvious, but the reasons for continuing to work without waiting for some other system apply to most parallel systems here on Earth. A highly responsive user interface may not be as high profile as a Mars rover but may be just as mission critical in its own domain.

 

The Actor Framework takes this asynchronous communication idea one step further and limits, by default, which actors can communicate with each other. The AF encourages actors to be arranged in a tree -- a single root caller actor launches zero or more nested actors, and each nested actor may launch its own nested actors. Any actor may communicate only with its caller, itself, or its own direct nested actors. If a message needs to go from one actor to its sibling, the actor sends the message up the tree to the common caller actor, and then the caller passes it back down to the sibling.
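
A sketch of that routing rule in the same illustrative style (the parent/child wiring and the "relay" message are assumptions for illustration, not the AF's actual API): a nested actor holds no reference to its sibling, so it asks its caller to pass the message down.

```python
import queue

class TreeActor:
    """Actor that talks only to its caller, itself, or its nested actors."""

    def __init__(self, name, caller_inbox=None):
        self.name = name
        self.caller_inbox = caller_inbox   # inbox of the actor that launched us
        self.inbox = queue.Queue()
        self.nested = {}                   # name -> inbox of each nested actor

    def launch_nested(self, name):
        child = TreeActor(name, caller_inbox=self.inbox)
        self.nested[name] = child.inbox
        return child

    def send_to_sibling(self, sibling_name, payload):
        # No direct reference to the sibling exists; ask the caller to relay.
        self.caller_inbox.put(("relay", sibling_name, payload))

    def handle(self, message):
        kind, target, payload = message
        if kind == "relay" and target in self.nested:
            # The caller passes the message back down to the named nested actor.
            self.nested[target].put(("deliver", target, payload))

root = TreeActor("root")
ui = root.launch_nested("ui")
daq = root.launch_nested("daq")
daq.send_to_sibling("ui", {"reading": 3.14})   # goes up to root...
root.handle(root.inbox.get())                  # ...and root relays it down to ui
```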

 

All this data passing would seem to invite memory bloat. Empirical testing shows that not to be the case. In practice, an actor is given only the small bits of data that it actually needs to do its job. This means that if there is a central database of information, it is not the entire database that is copied per actor but rather just the entries that an actor is given by its caller. Small message objects are created and destroyed at high frequency, so you do need an efficient memory manager, but most modern systems have one. In other words, going through the actor tree is not generally prohibitively expensive. Programmers can create shortcuts across the tree so that siblings can directly message each other, but we advise doing this only with good reason, i.e., when actual testing shows a particular performance bottleneck.

 

Asynchronous communication requires a different way of thinking about your programming tasks. The mental shift is as big as the shift from variable-based programming to dataflow programming or from procedural to object-oriented programming. The Actor Framework exists because the benefits of making that mental shift are as great as the benefits of those two earlier paradigm shifts.

 

Comments
t.kendall (Member):
I found this because of the NI website's horrible search feature (I was coming here to ask you a question about AF error handling for pre-launch init), but I am glad that I did. Linking this in the style guide for sure.

 

We have recently onboarded a fair number of engineers, and a good number of them are very green (anywhere from 0-3 years out of university). I find them clinging to the synchronous, call-response dataflow model of things (or, worse yet, using the JKI state machine). It is easier to envision everything if it is all in a single VI and follows a strict left-to-right dataflow with a case structure in the middle. But I think this tendency is strongest in people without a deepish CS background; I think it is the equivalent of a text language having a massive 500-line mono-function with a 15-case switch statement. Learning how to break a larger application into SOLID blocks is tough when you don't have the foundational concepts and don't understand why it is better to do it that way.

 

Some of the latest actors we have built into our systems are of the 'on the wire' flavor (with a factory-pattern kicker), and I think a lot of the people without AF experience are confused as to how they even work.

 
