NI TestStand

17501

Hi,

 

sometimes, when starting a sequence which creates several local and remote executions and threads and a lot of synchronization objects, I get the following error message:

 

Error message -17501 : Error executing While step.

Falscher Parameter

One reason this can occur is if the interface of your COM server cannot be marshaled.... (quite a bit more OLE and threading apartment stuff).

Error accessing item 'EvaluatedConditionExpr'

 

The second line is German on purpose; it is a German Windows (or rather: an English Windows XP with German user interface), so I assume that this message comes from the operating system. It means "Wrong parameter" in English.

 

My current guess is that I am making some kind of mistake with locks when starting all those executions and threads. The more threads I use, the more frequently it happens.

 

Any ideas would be appreciated.

 

Peter

 

0 Kudos
Message 1 of 26
(5,232 Views)

A couple of possibilities come to mind.

 

1) There might be a race condition in your code. Make sure you are protecting all shared data access with locks, especially where you modify data structures dynamically and everywhere those data structures are accessed, so that one thread never accesses a data structure while another thread is modifying it. TestStand does not automatically synchronize access to data whose structure is changing dynamically, so things can get into a bad state if you modify a data structure (or resize an array) in one thread while accessing it in another, leading to unpredictable errors. Generally, you will need to protect such access with Lock steps or the lock step setting. If you have a specific question about thread synchronization, please let us know.

 

2) This particular error might also be caused by getting a COM reference count into a bad state. For example, if you set a TestStand Object Reference variable to a COM object that has already had its final reference released, you might get an error like this. If you have a code module written in a language where you maintain the reference count yourself, make sure you are adding and releasing references in a consistent and correct way. With multiple threads adding and releasing references to the same objects, the location of such an error might be even more unpredictable. Keep in mind that the code that gives the error might not be the place where the reference count is being incorrectly maintained. The mistake might have happened much earlier and go unnoticed until the object's final reference is prematurely released.
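The locking pattern from point 1 can be sketched outside TestStand. The Python below is only an illustration of the idea, not TestStand code: `camera_handles`, `handles_lock`, `register_cameras` and `snapshot_cameras` are made-up names standing in for a shared array and the Lock steps that guard it. The point is that the thread that resizes the structure and every thread that reads it go through the same lock.

```python
import threading

# Hypothetical stand-in for a shared, dynamically sized structure,
# such as an array kept in a global that several threads use.
camera_handles = []
handles_lock = threading.Lock()  # one lock, shared by every thread


def register_cameras(new_handles):
    """Writer: changes the structure (resizes the list) under the lock."""
    with handles_lock:
        camera_handles.clear()
        camera_handles.extend(new_handles)


def snapshot_cameras():
    """Reader: copies the list under the same lock, then works on the copy."""
    with handles_lock:
        return list(camera_handles)


def worker(n):
    register_cameras(range(n))
    seen = snapshot_cameras()
    # A snapshot is always a complete list written by one writer,
    # never a half-resized one.
    assert seen == list(range(len(seen)))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(1, 9)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Taking a copy under the lock keeps the locked region short, which matters once many threads compete for the same lock.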

 

Hope this helps,

-Doug

Message 2 of 26
(5,223 Views)

Sorry for the late answer, was offline for some time.

We have not yet solved this problem, but I think you put us on the right track, thanks a lot for the information. I did a lot of tests during these days with your comments in mind, and it seems to me that all the effects we notice have a common cause. I would like to clarify some issues on that.

 

Currently, it appears to me as follows:

  1. In our ProcessSetup, there is a loop starting several new executions -- either remote or local. It appears that the problem is at least more severe when the executions are NOT remote. This entire process is controlled by a configuration in a StationGlobal named ExecutionList (which is, in fact, an array describing which sequence to start where).
  2. When such an execution starts, it runs through a Setup sequence which, e.g., initializes hardware. During this time, the next execution may be started by ProcessSetup.
  3. Now we have a .NET module calling external library functions which may (and usually will) return dynamic results, i.e., we do not know the number of results beforehand. Using the hardware initialization example: the function may detect how many cameras are connected and return a structure with handles to the cameras. And here, I think, is the catch: the camera handles are needed in more than one execution, so we keep them in StationGlobals. So at that time, the size of an array in the StationGlobals is changed (while, as I said, ProcessSetup keeps on running, reading StationGlobals.ExecutionList to start the next execution). We observe the following effects (typically in more than 50% of the runs and in an intermittent fashion):
  • The 17501 error with a wrong parameter, either in a "while" or "if" step condition;
  • a .NET exception stating that there was an attempt to read or write protected memory -- which may point to other memory being corrupted;
  • missing StationGlobals (see my other post with that subject); this effect was more frequent when our StationCallback.seq also created some StationGlobals to control the error handling process during its initialization.


From these observations, I assume that all these problems are symptoms of the same cause: the concurrent manipulation and reading of StationGlobals in the main thread (ProcessSetup) and the new execution thread (within the same TestStand process), or in several new execution threads (in that case even in a Remote Engine). This leads me to the following questions:

  1. Does the assumption appear plausible that the observed errors are all due to this kind of race condition involving the dynamic manipulation of StationGlobals?
  2. Does a manipulation inside a particular container in the StationGlobals (say: changing an array size in StationGlobals.TestSocketList) affect other containers in the StationGlobals (say: reading the contents of the StationGlobals.ExecutionList array), since they are basically all subproperties of StationGlobals? This would explain why some obviously existing StationGlobal may not be readable and appear nonexistent at some time during the process.
  3. Does the same hold within the FileGlobals container within its respective scope, i.e., do manipulations of FileGlobals within one thread of an execution affect other threads within the same execution? (I am assuming here that Locals are thread-local so that a sequence can safely manipulate its Locals).
  4. If so, are the Locals, FileGlobals and StationGlobals containers shielded against each other, e.g., can we safely manipulate data structures within FileGlobals while reading data in StationGlobals?
  5. Finally, are "by value" sequence parameter structures safe from these effects, i.e., fixed completely before being passed to the callee and fixed completely before the caller switches to the step following the sequence call?

Thanks again for the valuable information.

 

Peter

 

0 Kudos
Message 3 of 26
(5,164 Views)

Answers to questions:

 

1) Yes, changing the structure of variables in one thread while trying to access them in another can lead to this sort of problem, but it's hard to know for sure that there aren't additional race conditions in your code unrelated to the station globals until you fix the race condition with the station globals. For example, your code modules themselves are being executed in parallel so if you have any global or shared data you are sharing between threads inside of your code modules you will need to synchronize access to that as well. Also if you are calling other code (i.e. drivers, etc.) in multiple threads, calls to that code either need to be synchronized or that code itself has to be threadsafe (designed to be called from multiple threads concurrently).

 

To fix the issue with your access to StationGlobals, make sure that all of your threads acquire a lock before accessing or changing the data structure. Use the same lock name in all threads. If you are using remote executions that access the same globals, you will need to include a machine name in the lock name so that the lock is shared across machines. TestStand locks support machine-name-specific locks that can be shared across multiple machines on your network (this uses DCOM, similarly to remote executions).

 

2) Not necessarily. If you are adding or removing a variable directly from StationGlobals (i.e. you are creating or removing a sibling property of StationGlobals.TestSocketList) then yes, that could require a lock if you want to safely access other child properties of StationGlobals at the same time. But if you are just modifying subproperties of TestSocketList or resizing TestSocketList (assuming it's an array), then that doesn't affect access to other properties of StationGlobals. That said, even if the memory corruption is happening because of access to TestSocketList subproperties only, that memory corruption could still break other parts of the data structure (because memory locations unrelated to the data you are trying to access could be getting corrupted by writes to the wrong addresses). It's hard to know for sure whether there is one problem or multiple problems leading to the data corruption, but they are all likely race conditions. I recommend protecting all access to and modification of your station globals with a lock and seeing whether that fixes the problem. If so, you can then look into ways to minimize the amount of locking if needed.

 

3) All variables that are shared between threads have the same issue. Locals are generally not shared between threads, but if you pass a local by reference to another thread (i.e. by passing one as a by-reference parameter to an asynchronous sequence call), then those variables will have the same sort of issues and require synchronized access if you are modifying their underlying structure.

 

4) Modifying the underlying structure of a property (i.e. adding, removing, or resizing properties) only affects the variable whose set of children is changing (because the variable's list of children gets updated) and, of course, those children themselves. Variables further up the tree are not affected because their internal data is not being modified. It's best to ask: which properties am I changing the children of while also accessing those children in multiple concurrent threads? Those are the accesses and modifications that must be protected with locks. Again, if you aren't changing the structure of the variables, simple property value gets and sets are protected, as are things like StationGlobals.Myint++ in an expression, but if you are changing the structure (i.e. the subproperties or element properties), then you need to protect the access and modification with locks.

 

5) Yes, by-value parameters are copies that are created in the original thread before the new thread is started, and they do not affect the original at all once they are created. Keep in mind, though, that Object Reference variables always pass references, even if you pass them by value, so anything you store in Object Reference variables in your parameters is not a copy.
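The distinction in answer 5 can be shown with a small Python analogy. This is not TestStand code, and `CameraSession`, `callee`, `caller_params` and `caller_session` are invented names: a deep copy made in the calling thread plays the role of a by-value parameter, and an ordinary object plays the role of an Object Reference variable.

```python
import copy
import threading


class CameraSession:
    """Hypothetical object that is only ever held through a reference."""

    def __init__(self):
        self.is_open = True


def callee(params, session):
    params["limits"].append(99)  # changes only the callee's copy
    session.is_open = False      # changes the one shared object


caller_params = {"limits": [1, 2, 3]}
caller_session = CameraSession()

t = threading.Thread(
    target=callee,
    # The copy is made here, in the calling thread, before the new thread starts.
    args=(copy.deepcopy(caller_params), caller_session),
)
t.start()
t.join()

print(caller_params["limits"])  # [1, 2, 3]
print(caller_session.is_open)   # False
```

The caller's data survives untouched, but the object behind the reference was changed by the other thread, so it still needs the same synchronization as any other shared data.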

 

Hope this helps,

-Doug

Message 4 of 26
(5,141 Views)

First of all, thanks a lot for the comprehensive reply. We have used a lot of this information to good effect in our ongoing projects (which have kept us very busy, hence the long delay).

 

There is one thing, though, which we do not quite understand. We think that we have covered the potential race conditions and therefore I would like to know if someone else has ever observed such behavior.

 

We call an external DLL to initialize a hardware device. We close the device first, in case it is already open, then open it again. The first time, this works fine. When we do it a second time, it appears to work as well. At one of the next decision steps, an if or while, TestStand runs into this -17501 error, apparently not knowing the condition variable. When we keep all interfaces identical and replace the device initialization with a corresponding Wait time, no error occurs.

 

So it seems as if the actual execution of the hardware driver in some way affects the TestStand COM server. I would like to know if something like this has been observed before.

 

Thanks

 

Peter

 

0 Kudos
Message 5 of 26
(4,974 Views)

Are you perhaps trying to close a handle that is uninitialized or not an actual handle? If so, that can easily lead to memory corruption, which is possibly the problem you are seeing. It would be better to track exactly whether or not you have an open handle and only close it if you actually have it open. That said, I'm not familiar with the API you are using to open and close your hardware device; with more details about that I could be more sure. If you can change your code to only close handles that you really have open, that might fix the problem though.
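One way to track the handle, sketched in Python with invented names (`DeviceSession`, `FakeDriver`, and a driver object with `open()` and `close(handle)` methods are all assumptions, not the API of any particular device): the wrapper remembers whether a handle is open and never passes anything else to the driver.

```python
class DeviceSession:
    """Tracks whether a handle is really open before closing it.

    `driver` is any object with hypothetical open() and close(handle) methods.
    """

    def __init__(self, driver):
        self._driver = driver
        self._handle = None

    def open(self):
        self.close()  # safe: does nothing unless a handle is open
        self._handle = self._driver.open()
        return self._handle

    def close(self):
        if self._handle is None:
            return  # nothing open, so never touch the driver
        handle, self._handle = self._handle, None
        self._driver.close(handle)


class FakeDriver:
    """Stand-in driver that only records the calls it receives."""

    def __init__(self):
        self.calls = []
        self._next = 1

    def open(self):
        handle = self._next
        self._next += 1
        self.calls.append(("open", handle))
        return handle

    def close(self, handle):
        self.calls.append(("close", handle))


driver = FakeDriver()
session = DeviceSession(driver)
session.close()  # no handle yet: the driver is not called
session.open()
session.open()   # closes handle 1, then opens handle 2
session.close()
session.close()  # already closed: the driver is not called
print(driver.calls)
```

With this in place, "close first, then open" stays safe on the very first call, because the close is skipped when there is nothing to close.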

 

-Doug

0 Kudos
Message 6 of 26
(4,967 Views)

Hi,

 

Just an intermediate update. We have modified our procedure to use a different driver (i.e., not an actual USB camera but a simulation that loads an image file). The interface to TestStand, the input/output variables, the handle management and the timing (by using a Wait function) are identical -- or as close as possible in the case of timing. With the simulation we do not see the error. As soon as a single actual camera is used, the error occurs predictably at the same position in the sequence and immediately.

The exact reproducibility leads us to the conclusion that it is not a race condition issue but really a problem of the driver itself.

So it is not a TestStand problem, it seems, but I will keep you posted anyway because if the driver vendor is able to pinpoint the problem in his code, this may be of help to other users.

 

Thanks

 

Peter

 

0 Kudos
Message 7 of 26
(4,909 Views)

It might be that the driver behaves badly if you close the device when you don't have it open. It might not be validating whether or not the device is actually open and accessing uninitialized data structures. It's probably best to avoid writing code that does things like close devices that aren't open because you never can know for sure whether a driver developer has taken that into account.

 

Anyway, thanks for keeping us up to date.

-Doug

0 Kudos
Message 8 of 26
(4,900 Views)

I have what sounds like a similar problem. The COM server is a TFTP application coded in LabVIEW that has been built into an exe. I have a utility sequence file with sequences to create an instance, update the state of the server (by setting control values), and close the server.

 

I have tried to allow the user to provide a variable to store the Object reference; otherwise the sequence dynamically creates a RunState.Root.Local variable to hold the reference for the current execution. Whichever way the sequence is used (the user creates their own Local variable or uses the default flag), the code will occasionally error when evaluating the precondition of the GetVIRef step (which checks the Object reference against Nothing). This does not occur on the open, but it has been seen in both the update and close subsequences. The application where this is being seen is not running multiple threads, so there should be no race conditions to worry about.

 

Attached is the utility sequence I am using. As the posts above appeared to point to the driver, and in this case the driver is LabVIEW, I do not really know where to go from here.

0 Kudos
Message 9 of 26
(4,630 Views)

I had received this error -17501, with the error message:

 

"The parameter is incorrect.
One reason this can occur is if the interface of your COM server cannot be marshaled. This can happen if your server is not using the default OLE marshaling implementation and does not implement its own proxy and stub code. If you write your server using Visual C++ you can add the oleautomation attribute to your interface in order to use the default OLE marshaling implementation. Alternatively, COM does not require marshaling if the server's threading model is the same as the client thread's apartment. You can try changing your server's threading model or the client thread's apartment to avoid the need to marshal the interface.
Error accessing item 'EvaluatedConditionExpr'."

 

 

My application:

My main TestStand sequence file has two consecutive calls to the "Program Chip" sequence.  Within "Program Chip", there is a series of calls to a DLL that connects to the programming hardware, starts the programming, checks the status repeatedly, checks for any programming error, then closes the port to the programming hardware.   

 

My problem:

One run through the main sequence would cause no problems.  During the second run, after the first call to the "Program Chip" sequence, during the second call to the "Program Chip" sequence I would receive the error 17501 above.  If I ran the main sequence again, I'd see the same behavior.  

 

My solution:

One of the DLL calls was to a function "close_all_ports" which in the manual for the DLL was described as "This call closes all open Cyclone units (if any) and frees all dynamic memory used by the DLL. This function should be called before the user application is closed."  I called this function at the end of each "Program Chip" sequence.  Once I removed this DLL call, I never received error -17501 again.  I won't try and speculate why this function was causing me the problem, but I wanted to share my findings in case anyone else has the same problem that caused me to beat my head against a wall for a good two days...
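Going only by the manual text quoted above, one possible arrangement is to close just the port a run opened at the end of each "Program Chip" call and to leave the global cleanup for the very end of the application. The Python sketch below illustrates that ordering and nothing more. Every name except `close_all_ports` is hypothetical, and it does not explain why the call triggered the error.

```python
class FakeProgrammerDll:
    """Stand-in for the real DLL; every name except close_all_ports is hypothetical."""

    def __init__(self):
        self.calls = []

    def open_port(self):
        self.calls.append("open_port")

    def program(self):
        self.calls.append("program")

    def close_port(self):
        self.calls.append("close_port")

    def close_all_ports(self):
        self.calls.append("close_all_ports")


def program_chip(dll):
    """One "Program Chip" run: releases only what this run opened."""
    dll.open_port()
    try:
        dll.program()
    finally:
        dll.close_port()


def main(dll):
    try:
        program_chip(dll)
        program_chip(dll)
    finally:
        dll.close_all_ports()  # once, just before the application exits


dll = FakeProgrammerDll()
main(dll)
print(dll.calls)
```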

0 Kudos
Message 10 of 26
(4,408 Views)