Unlock step sometimes throws -17500 error on terminate

pdobratz · 07-14-2005

TestStand 2.0.1

In our process model we have added some steps to ProcessModelPostResultListEntry (result recording has been disabled for this sequence). The first step of the Setup step group is a Lock step that locks a lock named "Reporting File Write Lock" (creating it if it doesn't exist). A few steps then write information to disk, and the last step of the Setup step group unlocks this lock early. This lock is not referenced anywhere else.

We use the Batch Model and run up to 7 sockets at a time. We have a button labeled "Terminate All" in our Operator Interface that loops through the running test sockets and calls Execution.Terminate on each of them (but keeps the Batch execution running).

Sometimes, when this "Terminate All" button is pressed, we get a -17500 error on the Unlock step: "Attempted to unlock a lock that is not locked by this thread."

The lock lifetime is "Same as Sequence" and no timeouts are enabled. There are also no preconditions on the Lock and Unlock steps. There is a Goto that can cause the Unlock step to be skipped; however, in that case the lock is released automatically after the Main and Cleanup groups of ProcessModelPostResultListEntry execute normally.

We are able to generate the above error by creating a simple sequence that unlocks a named lock that has been created, but not locked. However, we cannot see how terminating a sequence can cause that to happen. There is no execution path that we can see where the Unlock step will get executed without the Lock first being executed. Is there any other way to trigger the above error?
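The reproduction described above (unlocking a named lock that exists but is not held by the calling thread) can be modeled in a few lines. This is a hypothetical Python sketch, not the TestStand API: `NamedLock`, `get_lock`, and `LockError` are illustrative names, and only the -17500 code and its message text come from this thread.

```python
import threading

LOCK_NOT_HELD = -17500  # error code quoted in this thread


class LockError(Exception):
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


class NamedLock:
    """Toy stand-in for a named lock that remembers which thread owns it."""

    def __init__(self, name):
        self.name = name
        self._mutex = threading.Lock()
        self._owner = None

    def lock(self):
        self._mutex.acquire()
        self._owner = threading.get_ident()

    def unlock(self):
        # Refuse to release a lock this thread does not hold (never locked,
        # or held by another thread), mirroring the -17500 error.
        if self._owner != threading.get_ident():
            raise LockError(LOCK_NOT_HELD,
                            "Attempted to unlock a lock that is not locked by this thread.")
        self._owner = None
        self._mutex.release()


_registry = {}
_registry_guard = threading.Lock()


def get_lock(name):
    """Return the named lock, creating it on first use ("create if it doesn't exist")."""
    with _registry_guard:
        if name not in _registry:
            _registry[name] = NamedLock(name)
        return _registry[name]
```

Calling `get_lock("Reporting File Write Lock").unlock()` without a prior `lock()` raises `LockError` with code -17500, which is the same shape of failure as the simple test sequence.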

Thanks,
Peter

nandini · 07-15-2005

I was wondering whether you are using Synchronized sections within the batch steps. If so, there are certain requirements for using the Enter and Exit operations. TestStand generates an error if any of the following requirements is not complied with:
1. Each Exit operation must match the most nested Enter operation
2. A thread cannot reenter a section it is already within
3. You must exit a section in the same sequence in which you enter it
Please see if these guidelines are always followed.
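The three rules can be read as bookkeeping on a per-thread stack of entered sections. The sketch below is a hypothetical Python model of that bookkeeping for a single thread; `SectionTracker` and `SectionError` are illustrative names, and it does not reproduce how TestStand actually implements Synchronization steps.

```python
class SectionError(Exception):
    pass


class SectionTracker:
    """Models one thread's nesting of synchronized sections."""

    def __init__(self):
        self._stack = []  # (section, sequence) pairs, innermost last

    def enter(self, section, sequence):
        # Rule 2: a thread cannot reenter a section it is already within.
        if any(s == section for s, _ in self._stack):
            raise SectionError(f"rule 2: already inside section {section!r}")
        self._stack.append((section, sequence))

    def exit(self, section, sequence):
        # Rule 1: each Exit must match the most nested Enter.
        if not self._stack or self._stack[-1][0] != section:
            raise SectionError(f"rule 1: {section!r} is not the most nested section")
        # Rule 3: exit in the same sequence that entered the section.
        if self._stack[-1][1] != sequence:
            raise SectionError(f"rule 3: {section!r} was entered in another sequence")
        self._stack.pop()
```

For example, entering "A" then "B" and then exiting "A" first violates rule 1; exiting "B" and then "A" from the same sequence succeeds.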

Thank you
Nandini Subramaniam
NI

Scott_Richardson · 07-20-2005

Peter -
I cannot think of why this would occur. It appears that you can ignore this error for now, but it would be nice to know why it happens. When this occurs, can you see the status of the steps above the Unlock in the execution's list view? If so, what is the status of the Lock step, and did any steps after it execute? Can you look at Locals.ResultList to see the step statuses?

Scott Richardson

pdobratz · 07-20-2005

Scott,

We managed to get a screenshot of our OI after this error occurred. The status of the Lock step is "Terminate". However, execution then appears to have simply proceeded to the next step instead of going to Cleanup. All of the other steps in the sequence show their status as executed (Skip or Done), and then the Unlock step shows the error mentioned above. Normally the OI doesn't trace into ProcessModelPostResultListEntry, but it does on an error.

I've attached the mystifying screenshot.

--Peter

Scott_Richardson · 07-20-2005

Peter -

This makes more sense, but I do not know why the thread is still running. I will need to look at this further. For now, the only workaround is to ignore the error. You could always add another step after the Unlock that raises an error if the error is not the one you want to mask.
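The "mask only this error" workaround amounts to: run the Unlock, swallow the -17500 error, and re-raise anything else. The Python below models only that decision logic; `StepError` and `run_masking` are hypothetical names, not TestStand features, and in a real sequence the check would live in a step placed after the Unlock.

```python
LOCK_NOT_HELD = -17500  # the one error code we intend to mask


class StepError(Exception):
    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}")
        self.code = code


def run_masking(step, masked_codes=(LOCK_NOT_HELD,)):
    """Run a step; return the masked error code if one occurred, else None.

    Any error whose code is not in masked_codes is re-raised unchanged.
    """
    try:
        step()
    except StepError as err:
        if err.code not in masked_codes:
            raise
        return err.code
    return None
```

Keeping the masked codes as an explicit tuple makes it harder to hide unrelated failures by accident.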

Scott Richardson

Scott_Richardson · 07-20-2005

Peter -
I have been unable to reproduce the error. I have attached the automated sequence that I used to attempt to reproduce the problem. If you have time, please try it. Do you have any other ideas as to what is unique about your situation?

Scott Richardson

Captain_Jack · 07-21-2005

FWIW - I'm helping Peter attempt to isolate this issue, and here is some additional information that might be affecting it:

1. Our Test Executive "Terminate All" is actually a looped "Terminate" of each socket independently and not the actual Terminate All command. We loop through all available sockets, determine if it is active and terminate it if so. This allows us to quickly terminate all sockets without also terminating the batch process.
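The per-socket loop in item 1 can be sketched as follows. This is a hypothetical Python model: `Socket`, `active`, and `terminate_calls` are illustrative stand-ins for a socket's execution and for Execution.Terminate, used here to count how many terminate requests each socket receives.

```python
from dataclasses import dataclass


@dataclass
class Socket:
    """Hypothetical stand-in for one test socket's execution."""
    index: int
    active: bool = True
    terminate_calls: int = 0

    def terminate(self):
        # Models Execution.Terminate: it only records the request here;
        # the real termination proceeds asynchronously.
        self.terminate_calls += 1


def terminate_all_sockets(sockets):
    """Terminate each active socket individually, leaving the batch running."""
    terminated = []
    for sock in sockets:
        if sock.active:
            sock.terminate()
            terminated.append(sock.index)
    return terminated
```

Nothing in this loop remembers a previous click, so pressing the button twice sends each active socket two terminate requests.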

2. The problem appears to be more reproducible if multiple terminate calls are made in succession. I have gotten a similar error in the Cleanup of a non-Process Model sequence. It appears to be a timing issue, and it seems to happen when several calls to terminate a socket are made in rapid succession, either because the operator accidentally double-clicked the terminate button or because the operator wasn't sure it was properly received.

3. We are using TSSync.dll version 2.0.1.122, which was provided to us by NI when we found another unrelated issue with synchronization. See post <http://forums.ni.com/ni/board/message?board.id=330&message.id=3238#M3238> "Best way to have a Socket wait in Cleanup for all other Sockets to complete before running "Batch Cleanup" steps".

That's all that I know of to add to this issue.

-Jack

Message Edited by Capt. Jack on 07-21-2005 08:02 AM

Scott_Richardson · 07-21-2005

Peter and Jack -
The issue that you are seeing is related to the double termination of an execution. To prevent the double termination, you could call Execution.Terminate only if a call to Execution.GetStates(runState, termState) returns ExecTermState_Normal for the termState parameter.
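The suggested guard (terminate only while the termination state is still "normal") looks roughly like this. It is a hypothetical Python model, not the TestStand API: `TermState`, `Execution`, and `terminate_if_normal` are illustrative names loosely mirroring Execution.GetStates and ExecTermState_Normal. Note that this model updates the state immediately inside `terminate()`, which a real, asynchronous termination may not do.

```python
from enum import Enum


class TermState(Enum):
    # Illustrative members only; NORMAL loosely mirrors ExecTermState_Normal.
    NORMAL = "Normal"
    TERMINATING = "Terminating"


class Execution:
    """Toy execution whose termination state flips as soon as terminate() runs."""

    def __init__(self):
        self.term_state = TermState.NORMAL
        self.terminate_calls = 0

    def get_term_state(self):
        return self.term_state

    def terminate(self):
        self.terminate_calls += 1
        self.term_state = TermState.TERMINATING


def terminate_if_normal(execution):
    """Request termination only if the execution is not already terminating."""
    if execution.get_term_state() is TermState.NORMAL:
        execution.terminate()
        return True
    return False
```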

Now, why are you seeing this behavior with a double terminate? The logic in the engine says that if a terminating sequence calls another sequence, the engine executes all the steps in the called sequence even though the caller is terminating. We do this to support a sequence call step in a Cleanup step group.

In your case, when you terminate the execution, the context for the currently executing step is marked as terminating. The engine then attempts to complete the execution of that step by finalizing its result. Since you are using a PostStepResultCallback, the engine calls the callback sequence and plans to execute all the steps in it. While the Lock step is executing, a second terminate occurs. The logic in the Lock step sees the new termination and does not acquire the lock. The engine disregards the termination because the caller of the callback is already terminating, so it continues to execute all the steps in the callback, including the Unlock step.
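The sequence of events described above can be replayed deterministically. The Python below is a hypothetical model of that explanation, not engine code: `OwnedLock`, `run_callback`, and the step labels are illustrative. The Lock step gives up when it sees the second termination, but because the caller is already terminating, every remaining step in the callback still runs, so the Unlock step hits -17500.

```python
LOCK_NOT_HELD = -17500


class LockError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code


class OwnedLock:
    def __init__(self):
        self.owner = None

    def try_lock(self, who, terminate_requested):
        # Models the Lock step: it abandons the acquire if it sees a termination.
        if terminate_requested:
            return False
        self.owner = who
        return True

    def unlock(self, who):
        if self.owner != who:
            raise LockError(LOCK_NOT_HELD)
        self.owner = None


def run_callback(second_terminate_during_lock):
    """Run Lock -> write -> Unlock the way a callback of an already-terminating
    caller is treated in this model: every step runs, termination is not honored."""
    lock = OwnedLock()
    log = []
    acquired = lock.try_lock("thread-1", second_terminate_during_lock)
    log.append("Lock: " + ("Done" if acquired else "Terminated"))
    log.append("Write report: Done")  # execution continues to the next step
    try:
        lock.unlock("thread-1")
        log.append("Unlock: Done")
    except LockError as err:
        log.append(f"Unlock: Error {err.code}")
    return log
```

With `second_terminate_during_lock=True`, the log matches the screenshot pattern: a terminated Lock step, later steps executed, and an error on the Unlock step.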

We have to explore whether we should honor the second termination in the callback sequence. Some question whether we should support double termination at all, but it is already partially supported by the Execution.InitTerminationMonitor and Execution.GetTerminationMonitorStatus functions. If we change this behavior, the change will have to be in a future version.

Thanks for reporting this behavior to us. Hopefully now that you understand the issue better, you will feel comfortable with the workarounds that I suggested.

Scott Richardson

dug9000 · 07-22-2005

Just want to add that you might not be able to use GetStates to avoid a double termination because termination happens asynchronously so it might not have happened yet when you call GetStates. A more reliable way would be to keep track inside your own code whether or not you have already done a terminate and don't do another one if you have.
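Tracking "already terminated" in your own code, as suggested above, can be as small as a thread-safe set of execution IDs. This is a hypothetical Python sketch: `TerminateOnce`, `request`, `forget`, and the integer IDs are illustrative, and `terminate_fn` stands in for whatever actually calls Execution.Terminate in the operator interface.

```python
import threading


class TerminateOnce:
    """Remembers which executions have already been asked to terminate,
    so repeated clicks never send a second request."""

    def __init__(self, terminate_fn):
        self._terminate_fn = terminate_fn
        self._requested = set()
        self._guard = threading.Lock()

    def request(self, execution_id):
        # Check-and-record under the lock so two concurrent clicks
        # cannot both pass the check.
        with self._guard:
            if execution_id in self._requested:
                return False
            self._requested.add(execution_id)
        self._terminate_fn(execution_id)
        return True

    def forget(self, execution_id):
        """Call when the execution has finished so its ID can be reused."""
        with self._guard:
            self._requested.discard(execution_id)
```

Unlike polling the termination state, this does not depend on how quickly the engine processes the first request.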

-Doug

NI TestStand