peculiar behavior from event-driven queued state machine

Specter · ‎11-15-2007

Hello,

I am seeing some mighty peculiar behavior and I hope someone can help.

I am using the standard producer-consumer loop, queued state machine, event-driven architecture. The producer loop contains the event structure...the consumer loop contains the functionality. The two chat with each other via a queue, a notifier, and some user events.

In my consumer loop, I have several states -- Initialize, Get Data, Driver On, Current Change, All Drivers Off, Error, and Stop. The consumer loop spends most of its time in the Get Data state. The Get Data state contains a while loop which collects data from my sensors. It also contains a Get Notifier Status VI, and one or two other odds and ends (mostly processing -- this isn't a high-speed acquisition).

The Get Data loop will stop under 3 conditions: if the interlock circuit indicates the interlock has been opened; if an event fires, thus sending a "True" value to the notifier; or if an error occurs. My code determines which one of these things has occurred and takes the appropriate action. If the interlock opens, the code generates a user event which will queue the Error state. If an error occurs, again the code generates a user event which queues the Error state. If an event occurs, the code exits the state, and whichever event fired (Driver On, Current Change, etc.) is queued. (The notifier value is also set back to "False" before exiting the Get Data state, so that the loop won't terminate immediately the next time I enter it.)

In that case, the consumer loop enters the appropriate state for executing the action indicated by the event (i.e. if the Driver On event fires, the consumer loop enters the Driver On state and executes that code...turns the driver on). After the action has been executed, the consumer loop generates a user event which queues the Get Data state, and the consumer loop re-enters that state and again loops until something happens to stop the loop.

This works perfectly under all conditions except one. As it happens, my engineers have decided that they might possibly want to shut down all of the drivers while the current is in the middle of ramping. The current ramping is done inside a subroutine. I use my notifier to abort the current ramp and exit the subroutine if either the All Drivers Off button, or the Stop button, is pressed. My main VI also monitors the notifier, so it knows that the notifier is returning a "True"...i.e. an event has fired. It takes the same action as above...that is, exits the state it's in and queues the next state (which is always either Stop or All Drivers Off). If the next state is Stop, the code performs the shutdown actions and terminates. If it's All Drivers Off, the code turns all the drivers off, then generates a user event which queues up the Get Data state.

This also works perfectly.

HOWEVER. After this sequence of events has occurred, I see some strange behavior the next time I try to fire an event. I have a probe on the notifier inside the Get Data loop, and I have a probe on the element output of the Dequeue Element VI in my consumer loop, so I can see when the states change. What I see is, the output of the Get Notifier Status flashes "True", then quickly flashes back to "False". The output of the Dequeue Element VI doesn't change, and the code does not exit the "Get Data" loop.

If I then fire another event, the notifier again goes "True", stays that way until the code sets it back to "False", and both events show up in the queue, in the correct order. Both execute. So clearly the first event got into the queue, but for some reason didn't get dequeued promptly at the other end. My feeling is that this is because the Get Data loop did not register the "True" value of the notifier...something set it back to "False" before it could be read...thus the loop just kept looping along happily, with no idea that it was supposed to have stopped. Firing another event causes the notifier to be "True" long enough for the loop to notice, so it quits and then the next event can be queued up and executed...and so on.

Interestingly, the NEXT event I fire under these conditions also exhibits this behavior -- that is, the notifier flashes "True", then sets itself back to "False" before the loop has registered it. And then the event after that one causes them both to fire, and on and on.

I cannot find where that notifier is being reset, and for pete's sake I've been over it with a fine-toothed comb. And why in the world does it only do this under a single condition? None of this happens unless I've used the All Drivers Off button to interrupt a current ramp in the subroutine. If I let the current ramp complete, and then press the All Drivers Off button, the code does not show this behavior. Nor does it show this behavior under any other circumstances.

Help?

Thanks,

d

Bob_Schor · ‎11-16-2007

My suggestion would be to get rid of the ramping subroutine. I recently made a state machine to run an experiment -- as part of this, I generate a low frequency sinusoid, and have a "warm-up" period where I slowly increase the amplitude of the sinusoid until I reach Full Amplitude, and at the end, a similar "cool-off" period where I gradually decrease the amplitude to bring things smoothly to a halt. I have the following states: Initialize, RampUp, WaitForUser, Collect, RampDown, Idle, Abort, Stop. All of the "stimulus" states (everything from RampUp to Idle) have almost-identical code -- they all run once per clock tick, generate a single point of the sinusoid, check for error conditions or a Stop button, etc. [The "Abort" state also works this way -- it is simply a RampDown state with a very quick ramp, which then transitions to the Stop state that ends the State Machine]. The transition from state to state is generally very simple, and based on a single criterion. For example, if I'm ramping up at 1 deg/sec to an amplitude of 10 deg, I can compute that it will take me 10 seconds to do this, so I simply ask "Is t > 10?", and if yes, do the transition. The "WaitForUser" state simply is looking for a button press. If you were to look at the code for the various states, it is almost identical, with slightly different transition rules, and slightly different stimulus generation rules (i.e. ramping up or down). "Exposing" everything this way makes "predictable behavior" more likely, I think.
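For readers who prefer text code, the "one transition rule per state, evaluated once per tick" idea can be sketched roughly as follows. This is Python rather than LabVIEW, and it is only an illustration: the 1 deg/sec rate and 10 deg amplitude come from the example above, while `COLLECT_TIME`, `amplitude`, and `step` are hypothetical names and values made up for the sketch.

```python
RAMP_RATE = 1.0                          # deg/sec, from the example above
FULL_AMPLITUDE = 10.0                    # deg, from the example above
RAMP_TIME = FULL_AMPLITUDE / RAMP_RATE   # 10 seconds
COLLECT_TIME = 5.0                       # hypothetical collection period, seconds


def amplitude(state, t):
    """Stimulus amplitude for one tick, t seconds after entering the state."""
    if state == "RampUp":
        return min(FULL_AMPLITUDE, RAMP_RATE * t)
    if state in ("WaitForUser", "Collect"):
        return FULL_AMPLITUDE
    if state == "RampDown":
        return max(0.0, FULL_AMPLITUDE - RAMP_RATE * t)
    return 0.0  # Initialize, Idle, Stop


def step(state, t, button_pressed=False):
    """Apply the single transition rule for the current state.

    Returns (next_state, time_in_state); time restarts on a transition.
    """
    if state == "RampUp" and t > RAMP_TIME:
        return "WaitForUser", 0.0
    if state == "WaitForUser" and button_pressed:
        return "Collect", 0.0
    if state == "Collect" and t > COLLECT_TIME:
        return "RampDown", 0.0
    if state == "RampDown" and t > RAMP_TIME:
        return "Idle", 0.0
    return state, t
```

Because every state returns to the caller after a single tick, a stop or abort request only ever has to be checked in one place, between ticks.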

Bob Schor

Specter · ‎11-16-2007

Unfortunately, getting rid of the ramping routine isn't an option for me. These are high-current drivers connected to laser diodes which put out a few hundred watts each. Slamming the current "on" can result in failure of the diode -- and they aren't exactly cheap. Like it or not, this is functionality I have to provide.

Besides, I really don't think the problem lies within the ramping routine itself, as it appears to work just fine. The problem appears to be that something resets the notifier before the data loop has a chance to register that it went "True"...but only under the specific circumstance I described. The code works under all other circumstances. Notifiers are intended to provide communication between VIs so I don't think I'm using them improperly here.

Thanks for the reply!

johnsold · ‎11-16-2007

It sounds like you have a race condition. You indicate that the notifier is being monitored in both the consumer loop and in the ramping subVI. Does the subVI reset it or only the state machine?

Perhaps if you change the notifier to have three values rather than two, you can avoid this. The first value corresponds to the False in your boolean notifier: continue with normal operation. The second value represents the transition to True: something has happened so stop what you are doing. The third value is "action pending" or something. It indicates that the notifier has been read, either in the ramp subVI or in the state machine, but that all the necessary actions have not yet been completed. Any part of the program which reads the notifier as "action pending" would determine whether it was in an appropriate state to manage the stop request or whether it had already done so. Resetting to Normal status could only be done by the part of the program which was able to confirm that all parts of the program had completed responding to the notifier.
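A rough sketch of the three-valued idea, written in Python purely for illustration (this is not LabVIEW code). The names `Status`, `TriStateNotifier`, `request_stop`, `read`, and `done` are hypothetical, and the sketch is single-threaded; a real version would need the accesses to be atomic.

```python
from enum import Enum


class Status(Enum):
    NORMAL = 0          # continue with normal operation
    STOP_REQUESTED = 1  # something has happened, stop what you are doing
    ACTION_PENDING = 2  # request has been read, but not everyone has finished


class TriStateNotifier:
    def __init__(self, readers):
        self._readers = set(readers)   # parts of the program that must respond
        self._pending = set()          # readers that have not finished yet
        self.status = Status.NORMAL

    def request_stop(self):
        """Called by the producer when an event fires."""
        self.status = Status.STOP_REQUESTED
        self._pending = set(self._readers)

    def read(self):
        """Return the current status; the first read of a request marks it pending."""
        seen = self.status
        if seen is Status.STOP_REQUESTED:
            self.status = Status.ACTION_PENDING
        return seen

    def done(self, reader):
        """A reader reports that it has finished responding to the request."""
        self._pending.discard(reader)
        if not self._pending:
            # Only now is it safe to go back to normal operation.
            self.status = Status.NORMAL
```

The point of the design is that no single reader can reset the value on its own; the reset happens only once every registered reader has reported in.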

I have not tried this with notifiers. With queues I use a pair for each independent pair of loops or subVIs in a program. One is a command queue and the other is a response queue. The response queue can handle data but also includes a status element which reports things like errors or loop stopped.
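The command/response pair might look something like this in a text language. This is a Python sketch, not LabVIEW; the `worker` and `demo` functions and the `"square"` command are invented for the example, and the response is a simple (status, data) tuple.

```python
import queue
import threading


def worker(commands, responses):
    """Consume (name, arg) commands and answer each with (status, data)."""
    while True:
        name, arg = commands.get()
        if name == "stop":
            responses.put(("loop stopped", None))
            return
        if name == "square":
            responses.put(("ok", arg * arg))
        else:
            responses.put(("error", f"unknown command {name!r}"))


def demo():
    commands, responses = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(commands, responses))
    thread.start()
    for command in [("square", 3), ("bogus", None), ("stop", None)]:
        commands.put(command)
    thread.join()  # the worker exits after answering "stop"
    return [responses.get() for _ in range(3)]
```

Each command gets exactly one response, so the sender can always tell whether the other loop acted on a request, failed, or has shut down.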

Lynn

Specter · ‎11-16-2007

The notifier is only set inside the main VI. The subVI monitors it, but never actually writes to it.

A race condition is possible, of course, but heck if I can find it. Here is the sequence as it happens, maybe you can see it?

1. Current is ramping. Main VI is in the Set Current state. SubVI is executing.

2. User presses All Diodes Off button. "True" notification is sent. All Diodes Off state is queued.

3. SubVI receives "True" notifier. SubVI terminates.

4. Set Current state also receives "True" notifier. Set Current state sets notifier back to "False", then terminates.

5. All Diodes Off state is dequeued. All Diodes Off state executes. All Diodes Off state generates a user event. Nothing is written to the notifier, so it's still returning "False" -- as it should.

6. User event fires. Get Data state is queued. Nothing is written to the notifier.

7. Get Data state is entered and the DAQ loop starts. The loop monitors the notifier, but does not write to it.

8. Another front panel event is fired (i.e. the user has pressed a button). "True" notification is sent.

Now, what is supposed to happen at this point is, the Get Notifier Status VI receives a "True" value and terminates the loop. As I mentioned, this is indeed what happens in all other circumstances except this particular one.

As I understand it -- and maybe this is where the problem lies -- a notifier can only hold one piece of data at a time, and that data is not consumed when it is read. Therefore, if the event structure in the producer loop writes a "True" value to the notifier, that notifier will return a "True" value until something else writes a "False" value to the notifier. Is that right?
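To make that mental model concrete, here is a small Python sketch of a value holder that behaves the way I am describing: one slot, overwritten on each send, not consumed by reading. It is a model of my understanding only, not how LabVIEW implements notifiers, and `LatchedValue`, `send`, `get_status`, and `poll_misses_pulse` are made-up names.

```python
import threading


class LatchedValue:
    """Holds the most recently sent value; reading does not consume it."""

    def __init__(self, initial=False):
        self._lock = threading.Lock()
        self._value = initial

    def send(self, value):
        with self._lock:
            self._value = value

    def get_status(self):
        with self._lock:
            return self._value


def poll_misses_pulse():
    """A True immediately overwritten by False is invisible to a later poll."""
    notifier = LatchedValue()
    notifier.send(True)
    notifier.send(False)   # some other writer resets before the poller looks
    return notifier.get_status()
```

Under this model, a polling loop can only miss a "True" if some writer sends "False" between the "True" and the next poll, which is exactly the writer I cannot find.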

The producer loop ONLY writes "True" values to the notifier. Therefore, if another event were to fire right after the first one, the value written to the notifier would still be "True".

We are inside the DAQ loop, which does not write to the notifier. Therefore it can't be setting that notifier to "False".

We never exit the DAQ loop -- that's the problem here -- so we aren't entering any other state in the consumer loop which could conceivably set the notifier to "False".

I can look into making the notifier data a three-state value instead of a two-state value, but I would still dearly love it if someone can explain what's going on here. Obviously I'm missing something!

johnsold · ‎11-16-2007

Maybe Cancel Notification is what you want rather than setting the notifier to False. Also much easier than setting up the three valued system I described earlier.

Posting your code makes it much easier than trying to follow the verbal descriptions. Plus, if your implementation does not exactly match your intent, someone will be able to spot that as well.

Lynn

Message Edited by johnsold on 11-16-2007 01:41 PM

Specter · ‎11-16-2007

Here is the code.

If I use "Cancel Notification" instead of writing a "False" value to the notifier, what does "Get Notifier Status" return? The default value of the data type (in this case, a "False" boolean)?

johnsold · ‎11-16-2007

If you use Wait on Notifier rather than Get Notifier Status, you could eliminate the waits in the loops where the function is called by using the timeout function. Use zero if no waiting is required. Check the Timed out line. If false, then evaluate the Notifier out. This will result in immediate response to the notifiers (no waiting for the wait functions to complete). I think you would not need to set the Notifier back to false. All parts of the program would get notified exactly once per notification with proper use of the Ignore Previous? input. This might take care of your mystery race condition.
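As a text-language illustration of the "notified exactly once, no reset needed" behavior, here is a Python model. It is an assumption-laden sketch, not the LabVIEW implementation: it assumes each call site remembers the last notification it has seen, and the `Notifier` and `Waiter` classes and the `(timed_out, value)` return shape are invented for the example.

```python
import threading


class Notifier:
    def __init__(self):
        self._cond = threading.Condition()
        self._seq = 0        # counts notifications sent so far
        self._value = None

    def send(self, value):
        with self._cond:
            self._seq += 1
            self._value = value
            self._cond.notify_all()


class Waiter:
    """One per call site; remembers the last notification it has seen."""

    def __init__(self, notifier):
        self._notifier = notifier
        self._seen = 0

    def wait(self, timeout=0.0, ignore_previous=False):
        """Return (timed_out, value). A timeout of 0 just polls."""
        n = self._notifier
        with n._cond:
            if ignore_previous:
                self._seen = n._seq   # skip anything sent before this call
            fresh = n._cond.wait_for(lambda: n._seq > self._seen, timeout)
            if not fresh:
                return True, None
            self._seen = n._seq
            return False, n._value
```

In this model nobody ever writes "False" back: a call site that has already handled a notification simply times out on its next wait, while a call site that has not yet seen it still receives it.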

The notifier references and the DAQ task wires do not need to be passed via shift registers because they never change inside the loops.

I prefer to avoid loops inside loops for things like the DAQ calls. I take one set of readings, leave the state, and if nothing else has changed, come right back to the same state for another DAQ call. I find it easier to manage errors, delays, and commands from the user with just one loop. Of course little loops which calculate data in an array or set properties are OK. Is starting and stopping the DAQ tasks in each iteration of the loops necessary? Could you use continuous acquisition?

Updating the front panel from a subVI via control refs works, but can slow things down because of switching to the UI thread. I am not an expert in this, but if you search the archives, you will find several threads discussing the subject. I use an Action Engine or queue to pass the data from the subVI to the top level VI and update the panel at the top level.

Lynn

Specter · ‎11-16-2007

Well, I gave that a try (using the Wait on Notifier VI as opposed to the Get Notifier Status VI). Now the subroutine doesn't respond to the notifier. If I send the notifier while the subroutine is in the middle of its current ramp, it doesn't register. The subroutine finishes its ramp, then returns control to the main VI, which goes on to execute whichever state was queued up (in this case, All Drivers Off).

This may actually drive me crazy.

Good idea on the queue instead of control refs. That makes sense.

I am using continuous acquisition...I explicitly start the task, enter the loop, collect data, and then stop the task when the loop terminates. I had understood that it was better practice to explicitly start and stop tasks. I can't run the DAQ all the time, in a third loop, because one of the subroutines has to use those tasks to make sure the driver is doing what it's supposed to be doing.

Thanks a lot for your help...I take it by your referral to the "mystery race condition" that it wasn't immediately obvious to you either?

johnsold · ‎11-19-2007

I can't run the software so I really cannot tell what is happening. I would put a probe or a breakpoint into the subVI to see whether the notifier is detected there. If it is, then you need to find out why the subVI does not stop. Oh, the subroutine has two internal loops. The lower one (with the DAQ reads) does not monitor the Stop notifier, so it will continue to run.

Your statement: "I can't run the DAQ all the time, in a third loop, because one of the subroutines has to use those tasks to make sure the driver is doing what it's supposed to be doing." raises another concern. Do you have an effective mechanism to avoid contention for the DAQ resource? Semaphores are designed for this purpose. But more fundamentally, using the same task for different purposes in different parts of the program sounds like it could lead to problems down the road when something needs to be modified. I don't know enough about your setup to recommend an alternative.
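The semaphore idea, sketched in Python for illustration only (not LabVIEW): a single-permit semaphore guards the shared resource so that only one caller uses it at a time. `SharedTask`, `read`, and `demo` are hypothetical names, and the list append plus short sleep stands in for a real DAQ read.

```python
import threading
import time


class SharedTask:
    """Stand-in for a DAQ task that must not be used by two callers at once."""

    def __init__(self):
        self._sem = threading.Semaphore(1)   # one permit: exclusive access
        self.log = []

    def read(self, who, samples=3):
        with self._sem:                      # acquire; released on exit
            for i in range(samples):
                self.log.append((who, i))
                time.sleep(0.001)            # pretend the hardware read takes time


def demo():
    task = SharedTask()
    threads = [
        threading.Thread(target=task.read, args=(name,))
        for name in ("main", "subVI")
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return task.log
```

With the semaphore in place, each caller's samples appear in the log as one uninterrupted run, whichever caller happens to get there first.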

Lynn
