Long (sometimes 20s) delay when ending AF-based application (but not permanently locked)

cbutcher · ‎11-10-2016

I have some initial work on an AF-based project. When I hit my stop button, the UI sends a message to the 'Controller' actor, which is the root actor, to shut down.

The nested actors all then shut down, mostly by using booleans stored in private data and set by 'Stop Core.vi' as needed (for example, to signal an event loop to stop).

This was working quite nicely, but now when I hit stop the windows all close while the libraries remain locked for some time, typically between 5 and 20 seconds. (The obvious question is what I recently changed. Unfortunately, I don't know. At first I assumed the delay meant I had broken something in the shutdown, so I tinkered. When I realised it was still finishing eventually, I just waited. Now the delay is a problem: it slows testing of small changes so much that I'm tempted to make larger changes without checking their effects between steps, which I'm sure increases the chance I mess up and cause further problems.)

Reading this post (https://decibel.ni.com/content/thread/16918?start=0&tstart=0) it seems like there might be a few culprits (although the post is 3 years old).

They list as possible suspects:

I didn't actually close all the actors. My guess is that this would lead to the libraries becoming permanently locked until I closed the project, triggering the 'Do you want to abort running VIs?' style dialog (this does appear if I close the project, so presumably something is running, but it does close itself if I wait long enough)
I have one time-delayed message sender. The notifier is closed using the 'Release Notifier' primitive in Stop Core.vi, but the DETT shows that a bunch of these messages continue to be dropped for quite a while, so that makes me think this might be a problem?
I have actors embedded in subpanels (a graph controller in the main UI, with an actual graph in a nested subpanel). Regardless of whether I use the Invoke Node's 'Remove VI' or not, the delay occurs. Maybe this isn't the problem?

Possible additional problems:

I have calls to read a SQLite database from the most nested subactor (the actual graph actor). The Timing profiling shows that these typically use the most time, but calling 'Abort VI' on the reading VI during the graph's Stop Core to try and speed up possible hanging there doesn't help me. In any case, the typical time is far less than the time that the project locks for.
I have somehow stupidly left something tracing execution (although I can't find anything) and this is only called during shutdown, which then drags out the time taken significantly. I can't find any evidence of this, but perhaps? Using DETT might help me find the long running VIs, but nothing was apparent from the profiling with Timing and matching the Calls and Returns for VIs seems tricky (but maybe I'm not using the toolkit effectively).

AristosQueue (NI) · ‎11-10-2016

Temporarily change over to using Emergency Stop instead of Stop... does that fix the problem? If so, the issue is a backlog of messages in someone's message queue.
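A toy model of why this test is diagnostic, written in Python since LabVIEW diagrams can't be pasted as text. It assumes, as a simplification and not as a description of the AF source, that an actor's queue hands out messages by priority first and arrival order second. The class, constants and function names are all hypothetical.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number is handled first.
EMERGENCY, NORMAL = 0, 2


class ToyActorQueue:
    """Priority queue that keeps arrival order within one priority level."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: arrival sequence

    def enqueue(self, name, priority=NORMAL):
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def run_until_stop(self):
        """Handle messages until 'stop' is dequeued; return how many ran first."""
        handled = 0
        while self._heap:
            _, _, name = heapq.heappop(self._heap)
            if name == "stop":
                return handled
            handled += 1
        return handled


def messages_handled_before_stop(backlog, stop_priority):
    """Queue `backlog` normal messages, then a stop at the given priority."""
    q = ToyActorQueue()
    for i in range(backlog):
        q.enqueue(f"update {i}")
    q.enqueue("stop", stop_priority)
    return q.run_until_stop()
```

In this model a normal-priority stop behind 13 queued updates waits for all 13, while an emergency-priority stop is handled before any of them. A shutdown that becomes near-instant after the swap therefore points at a backlog rather than at a hung VI.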

cbutcher · ‎11-10-2016

Yes. I first changed the stop signal sent to the SQLite logger, since I assumed that was where I had messages backing up. This made relatively little difference.

Swapping the Stop signal to emergency from within the UI Actor (which sends the stop signal to the controller manually, and then receives the stop signal from the controller automatically) gave me my libraries back nearly instantaneously.

I guess now it becomes just a matter of working out who's holding messages.

As a follow-up question, does this mean that if an Actor has a series of messages remaining in the queue, and is sent a stop

Whilst typing, this became obvious. An Actor will process all of the messages in the queue in a queued manner. If the stop signal is somewhere far down the line, it won't be handled for some time. (But messages can still be dropped from the queue if they're enqueued after the stop signal, right?)

justACS · ‎11-10-2016

cbutcher wrote:
I guess now it becomes just a matter of working out who's holding messages.

I doubt anyone is holding messages. One actor is receiving messages faster than it can process them. Look for methods that might take a relatively long time to run; our various profiling tools can help with that.
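To make "receiving faster than it can process" concrete, here is a small deterministic sketch in Python. It is a model under stated assumptions, not AF code: one message arrives every `send_interval` seconds starting at t = 0, each takes `handle_time` seconds, the actor handles them strictly in order, and a normal-priority stop is sent at `run_time`. The function name and all parameters are hypothetical.

```python
def stop_delay(send_interval, handle_time, run_time):
    """Seconds between sending a normal stop at `run_time` and the actor
    reaching it, for a strictly first-in, first-out actor."""
    free_at = 0.0  # time at which the actor finishes all work queued so far
    n = 0
    while n * send_interval < run_time:
        arrival = n * send_interval
        start = max(arrival, free_at)  # wait for the message or for the actor
        free_at = start + handle_time
        n += 1
    # The stop is handled once everything queued before it has finished.
    return max(0.0, free_at - run_time)
```

With illustrative numbers, `stop_delay(1.0, 1.5, 60)` gives 30.0: after a minute of one request per second at 1.5 s each, the stop waits half a minute. `stop_delay(1.0, 0.1, 60)` gives 0.0, because the actor keeps up and the queue is empty when the stop arrives.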

cbutcher · ‎11-10-2016

Putting a One Button Dialog in each Stop Core.vi tracked down the culprit. Is my solution at this point just to always send E-Stop to that Actor? Since it's only graphing previously saved data, I don't think I need to worry about it failing to carry out existing messages.

When I create a child class subactor, is there an instance of the parent class actor I should be concerned about? Whilst its Actor Core is called by the Call Parent Method, I never receive an enqueuer for the actor, so I'm guessing not.

In this case, my stop message goes something like:

[Diagram of the actor hierarchy; image not preserved]

with the stop message moving from Main -> Controller, then left to right? (Apologies for the dreadful UML. I've never used it but figured it's supposed to be good for drawing diagrams. I used the black arrows for SubActors, and I think the open arrow is actually the correct UML for a parent class, but that could be nonsense.)

My guess is then that MainUI should be the only one to send an E-Stop, and only to SQLiteGraph, if I later have other nested subactors to MainUI.

Or is there just a much better way to do this and speedily get through the message backlog (beyond sending fewer messages, which I'm going to look into now)?

cbutcher · ‎11-11-2016

Regarding 'holding messages' - my mistake, poor choice of wording. Sorry. As you said, the profiling tools can very clearly indicate where the slow VIs are, so targeting improvements was easy.

Reducing the frequency with which the message requesting an update is sent turned out to be sufficient to avoid the problem (at the moment), but some improvements to my use of SQL cut the execution time of a couple of subVIs by a factor of ~20, which was a rather more useful change. Now my graph update can run in around 100ms rather than ~1.5 seconds.
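As a rough sanity check on those numbers, a short Python calculation. It assumes the whole 5 to 20 second shutdown delay was spent draining queued graph updates at about 1.5 s each, which the thread suggests but does not prove. The helper names are hypothetical.

```python
def queued_updates(observed_delay_s, handle_time_s):
    """Approximate number of messages that were ahead of the stop."""
    return round(observed_delay_s / handle_time_s)


def drain_time_s(n_messages, handle_time_s):
    """Time to work through n_messages at a fixed cost per message."""
    return n_messages * handle_time_s


low = queued_updates(5, 1.5)    # backlog implied by the shortest delay
high = queued_updates(20, 1.5)  # backlog implied by the longest delay
```

Under that assumption the backlog was roughly 3 to 13 update requests. At about 100 ms per update the same 13 messages would drain in about 1.3 s, so the faster SQL helps even if a backlog does build up again.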

Thank you both for the guidance in troubleshooting my Actors and working out where the problem lay.

Actor Framework Discussions
