From Friday, April 19th (11:00 PM CDT) through Saturday, April 20th (2:00 PM CDT), 2024, ni.com will undergo system upgrades that may result in temporary service interruption.
We appreciate your patience as we improve our online experience.
01-30-2010 04:43 AM
Hi,
sometimes, when starting a sequence which creates several local and remote executions and threads and a lot of synchronization objects, I get the following error message:
Error message -17501 : Error executing While step.
Falscher Parameter
One reason this can occur is if the interface of your COM server cannot be marshaled.... (quite a bit more OLE and threading apartment stuff).
Error accessing item 'EvaluatedConditionExpr'
The second line is German on purpose; it is a German Windows (or rather: an English Windows XP with German user interface), so I assume that this message comes from the operating system. It means "Wrong parameter" in English.
My current guess is that I am making some kind of mistake with locks when starting all those executions and threads. The more threads I use, the more frequently it happens.
Any ideas would be appreciated.
Peter
01-30-2010 11:25 AM
A couple of possibilities come to mind.
1) There might be a race condition in your code. Make sure you are protecting all shared data access with locks. Pay particular attention to places in your code where you modify data structures dynamically, and to every place where such data structures are accessed, so that you never access a data structure from one thread while another thread is modifying it. TestStand does not automatically synchronize access to data whose structure is changing dynamically. If you modify a data structure (or resize an array) in one thread while accessing it in another, things can get into a bad state and lead to unpredictable errors. Generally, you will need to protect such access with Lock steps or the lock step setting. If you have a specific question about thread synchronization, please let us know.
2) This particular error might also be caused by a COM reference count getting into a bad state. For example, if you set a TestStand Object Reference variable to a COM object that has already had its final reference released, you might get an error like this. If you have a code module written in a language where you maintain the reference count yourself, make sure you are adding and releasing references consistently and correctly. With multiple threads adding and releasing references to the same objects, the location of such an error can be even less predictable. Keep in mind that the code that reports the error might not be the place where the reference count is maintained incorrectly. The mistake might have happened much earlier and go unnoticed until the object's final reference is prematurely released.
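Point 2 can be pictured with a toy model of reference counting. The class below is a hypothetical Python stand-in for the AddRef/Release bookkeeping of a COM object, not the real COM API. It only shows why a missing AddRef tends to surface later, as a use-after-release that is far away from the actual bug.

```python
class RefCounted:
    """Toy model of a COM-style reference-counted object (not real COM)."""

    def __init__(self, name):
        self.name = name
        self.refs = 1           # the creator holds the first reference
        self.released = False   # becomes True when the count reaches zero

    def add_ref(self):
        if self.released:
            raise RuntimeError(f"{self.name}: AddRef on an already released object")
        self.refs += 1
        return self.refs

    def release(self):
        if self.released:
            raise RuntimeError(f"{self.name}: Release on an already released object")
        self.refs -= 1
        if self.refs == 0:
            self.released = True   # a real COM object would free itself here
        return self.refs

    def use(self):
        # The failure shows up here, not where the count went wrong.
        if self.released:
            raise RuntimeError(f"{self.name}: object used after its final release")
        return f"{self.name} ok"


def store_reference(obj, holders):
    """Balanced pattern: every stored copy of the reference owns one AddRef."""
    obj.add_ref()
    holders.append(obj)


def drop_reference(obj, holders):
    """Balanced pattern: exactly one Release per stored copy."""
    holders.remove(obj)
    obj.release()
```

If a code module appends the object to `holders` without calling `add_ref`, the creator's own `release` drops the count to zero. The next `use` then fails, even though that call is perfectly correct.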
Hope this helps,
-Doug
02-07-2010 11:32 AM
Sorry for the late answer, was offline for some time.
We have not yet solved this problem, but I think you put us on the right track; thanks a lot for the information. I did a lot of tests during these days with your comments in mind, and it seems to me that all the effects we notice have a common cause. I would like to clarify some issues on that.
Currently, it appears to me as follows:
From these observations, I assume that all these problems are symptoms of the same cause: the concurrent manipulation and reading of StationGlobals by the main thread (ProcessSetup) and the new execution thread (within the same TestStand process), or by several new execution threads (in that case even in a Remote Engine). This leads me to the following questions:
Thanks again for the valuable information.
Peter
02-08-2010 11:19 AM
Answers to questions:
1) Yes, changing the structure of variables in one thread while trying to access them in another can lead to this sort of problem. However, until you fix the race condition with the station globals, it's hard to know for sure that there aren't additional race conditions in your code unrelated to them. For example, your code modules themselves are executed in parallel, so if your code modules share any global or shared data between threads, you will need to synchronize access to that as well. Also, if you are calling other code (e.g., drivers) from multiple threads, calls to that code either need to be synchronized, or that code itself has to be thread-safe (designed to be called from multiple threads concurrently).
To fix the issue with your access to StationGlobals, make sure that all of your threads acquire a lock before accessing or changing the data structure. Use the same lock name in all threads. If you are using remote executions that access the same globals, include a machine name in the lock name so that the lock is shared across machines. TestStand locks support machine-name-specific locks that can be shared across multiple machines on your network (they use DCOM, similarly to how remote executions do).
2) Not necessarily. If you are adding or removing a variable directly from StationGlobals (i.e., you are creating or removing a sibling property of StationGlobals.TestSocketList), then yes, that could require a lock if you want to safely access other child properties of StationGlobals at the same time. However, if you are only modifying subproperties of TestSocketList or resizing TestSocketList (assuming it's an array), that doesn't affect access to other properties of StationGlobals. That said, even if the memory corruption is caused only by access to TestSocketList subproperties, it could still break other parts of the data structure, because writes to the wrong addresses can corrupt memory unrelated to the data you are trying to access. It's hard to know for sure whether one problem or several are causing the data corruption, but they are all likely race conditions. I recommend protecting all access to and modification of your station globals with a lock and seeing whether that fixes the problem. If it does, you can then look for ways to minimize the amount of locking, if needed.
3) All variables that are shared between threads have the same issue. Locals are generally not shared between threads, but if you pass a local by reference to another thread (e.g., as a by-reference parameter to an asynchronous sequence call), then that variable has the same sort of issues, and access to it requires synchronization if you are modifying its underlying structure.
4) Modifying the underlying structure of a property (i.e., adding, removing, or resizing properties) only affects the variable whose set of children is changing (because that variable's list of children gets updated) and, of course, those children themselves. Variables further up the tree are not affected, because their internal data is not being modified. It's best to ask: which properties am I changing the children of while also accessing those children from multiple concurrent threads? Those are the accesses and modifications that must be protected with locks. Again, if you aren't changing the structure of the variables, simple property value gets and sets are already protected, as are expressions such as StationGlobals.Myint++. If you are changing the structure (i.e., the subproperties or element properties), you need to protect both the access and the modification with locks.
5) Yes, by-value parameters are copies that are created in the original thread before the new thread is started, and once created they do not affect the original at all. Keep in mind, though, that Object Reference variables always pass references, even when you pass them by value, so anything you store in Object Reference variables in your parameters is not copied.
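The locking advice in points 1 to 4 can be sketched in plain Python. Everything here is hypothetical: `get_named_lock` imitates the idea of a TestStand lock name (every thread that uses the same name gets the same lock), and `station_globals["TestSocketList"]` is only loosely modeled on the array discussed above. Python lists will not corrupt memory the way unsynchronized native data can, so this shows the locking discipline rather than reproducing the crash.

```python
import threading

# Guards the registry itself, so two threads requesting a new name at once still share one lock.
_registry_guard = threading.Lock()
_named_locks = {}


def get_named_lock(name):
    """Return the single lock registered under `name`, creating it on first use.

    Hypothetical stand-in for a TestStand lock name: every thread that asks
    for the same name gets the same lock object.
    """
    with _registry_guard:
        lock = _named_locks.get(name)
        if lock is None:
            lock = threading.Lock()
            _named_locks[name] = lock
        return lock


# Hypothetical shared data, loosely modeled on StationGlobals.TestSocketList.
station_globals = {"TestSocketList": []}

# Hypothetical name; for remote executions it would also need a machine name.
LOCK_NAME = "StationGlobals"


def resize_socket_list(n):
    """Structural change (elements added or removed), so it must hold the lock."""
    with get_named_lock(LOCK_NAME):
        sockets = station_globals["TestSocketList"]
        del sockets[n:]
        sockets.extend({"Status": "idle"} for _ in range(n - len(sockets)))


def read_socket_statuses():
    """Iterates over a list that other threads resize, so it takes the same lock."""
    with get_named_lock(LOCK_NAME):
        return [s["Status"] for s in station_globals["TestSocketList"]]


if __name__ == "__main__":
    threads = [threading.Thread(target=resize_socket_list, args=(n,)) for n in (2, 8, 4, 6)]
    threads.append(threading.Thread(target=read_socket_statuses))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Whichever resize ran last determines the length; no reader sees a half-resized list.
    print(len(read_socket_statuses()))
```

One coarse lock over the whole structure is the "lock everything first" step suggested in point 2. Once the errors disappear, the lock could be narrowed to just the properties whose children change, as point 4 describes.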
Hope this helps,
-Doug
04-28-2010 09:06 AM
First of all, thanks a lot for the comprehensive reply. We have used a lot of this information to good effect in our ongoing projects (which have kept us very busy, hence the long delay).
There is one thing, though, which we do not quite understand. We think that we have covered the potential race conditions and therefore I would like to know if someone else has ever observed such behavior.
We call an external DLL to initialize a hardware device. We close the device first, in case it is already open, then open it again. The first time, this works fine. The second time, it also appears to work. However, at one of the following decision steps (an If or a While), TestStand runs into this -17501 error, apparently not recognizing the condition variable. When we keep all interfaces identical and replace the device initialization with a Wait of corresponding duration, no error occurs.
So it seems as if the actual execution of the hardware driver somehow affects the TestStand COM server, and I would like to know whether anything like this has been observed before.
Thanks
Peter
04-28-2010 11:56 AM
Are you perhaps trying to close a handle that is uninitialized or not an actual handle? If so, that can lead to memory corruption, which could be the problem you are seeing. It would be better to track exactly whether or not you have an open handle and only close it if you actually have it open. That said, I'm not familiar with the API you are using to open and close your hardware device; with more details about it, I could say more definitively. If you can change your code to close only the handles you really have open, that might fix the problem.
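The "only close what you actually opened" advice can be sketched as a small session wrapper. `FakeCameraDriver` and its `open`/`close` methods are hypothetical and are not the vendor's API. The fake driver raises an error on a bad close so that misuse is visible. A real driver might not validate the handle at all, which is exactly the risk described above.

```python
class FakeCameraDriver:
    """Hypothetical driver used only for illustration; not the real camera API."""

    def __init__(self):
        self._next_handle = 1
        self.open_handles = set()

    def open(self):
        handle = self._next_handle
        self._next_handle += 1
        self.open_handles.add(handle)
        return handle

    def close(self, handle):
        # A real driver might silently touch invalid memory here instead of raising.
        if handle not in self.open_handles:
            raise ValueError(f"close() on handle {handle!r} that is not open")
        self.open_handles.remove(handle)


class DeviceSession:
    """Tracks whether we actually own an open handle and only ever closes that one."""

    def __init__(self, driver):
        self._driver = driver
        self._handle = None   # None means "nothing is open"

    @property
    def is_open(self):
        return self._handle is not None

    def reopen(self):
        # "Close first, then open" is safe only because close() checks our own state.
        self.close()
        self._handle = self._driver.open()
        return self._handle

    def close(self):
        if self._handle is not None:
            # Forget the handle before closing, so a failed close is never retried.
            handle, self._handle = self._handle, None
            self._driver.close(handle)
```

Calling `close()` on a session that never opened anything is a no-op, so an initialization routine can call `reopen()` repeatedly without ever passing an uninitialized handle to the driver.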
-Doug
05-03-2010 06:50 AM
Hi,
just an intermediate update. We have modified our procedure to use a different driver (i.e., not an actual USB camera but a simulation that loads an image file). The interface to TestStand, the input/output variables, the handle management, and the timing (using a Wait function) are identical, or as close as possible in the case of timing. With this setup, we do not see the error. As soon as a single actual camera is used, the error occurs immediately and predictably at the same position in the sequence.
The exact reproducibility leads us to the conclusion that it is not a race condition issue but really a problem of the driver itself.
So it is not a TestStand problem, it seems, but I will keep you posted anyway because if the driver vendor is able to pinpoint the problem in his code, this may be of help to other users.
Thanks
Peter
05-03-2010 09:45 AM
It might be that the driver behaves badly if you close the device when you don't have it open. It might not be validating whether or not the device is actually open and accessing uninitialized data structures. It's probably best to avoid writing code that does things like close devices that aren't open because you never can know for sure whether a driver developer has taken that into account.
Anyway, thanks for keeping us up to date.
-Doug
11-01-2010 08:14 AM
I have what sounds like a similar problem. The COM server is a TFTP application written in LabVIEW and built into an executable. I have a utility sequence file with sequences to create an instance, update the state of the server (by setting control values), and close the server.
I have tried to allow the user to provide a variable to store the object reference; otherwise, the sequence dynamically creates a RunState.Root.Local variable to hold the reference for the current execution. Whichever way the sequence is used (the user creates their own Local variable, or uses the default flag), the code occasionally errors while evaluating the precondition of the GetVIRef step, which checks the object reference against Nothing. This does not occur in the open sequence, but it has been seen in both the update and close subsequences. The application where this is seen does not run multiple threads, so there should be no race conditions to worry about.
Attached is the utility sequence I am using. Since the posts above appeared to point to the driver, and in this case the driver is LabVIEW, I do not really know where to go from here.
01-24-2011 07:01 PM
I had received this error -17501, with the error message:
My application:
My main TestStand sequence file has two consecutive calls to the "Program Chip" sequence. Within "Program Chip", there is a series of calls to a DLL that connects to the programming hardware, starts the programming, checks the status repeatedly, checks for any programming error, then closes the port to the programming hardware.
My problem:
One run through the main sequence caused no problems. During the second run, the first call to the "Program Chip" sequence succeeded, but during the second call I received the error -17501 above. If I ran the main sequence again, I saw the same behavior.
My solution:
One of the DLL calls was to a function "close_all_ports", which the DLL's manual describes as follows: "This call closes all open Cyclone units (if any) and frees all dynamic memory used by the DLL. This function should be called before the user application is closed." I called this function at the end of each "Program Chip" sequence. Once I removed this DLL call, I never received error -17501 again. I won't try to speculate why this function was causing the problem, but I wanted to share my findings in case anyone else runs into the same problem, which had me beating my head against a wall for a good two days...
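One way to follow the manual's advice without calling the global cleanup on every pass is to keep per-pass work limited to opening and closing the pass's own port, and to call `close_all_ports` exactly once when the whole application finishes. The sketch below is hypothetical. `FakeProgrammerDll`, `open_port`, `program`, and `close_port` are invented names, and only `close_all_ports` comes from the post. It does not claim to explain why the per-pass call failed. In TestStand terms, the "once at shutdown" place might be the Cleanup group of the top-level sequence, but that depends on how the sequence files are organized.

```python
class FakeProgrammerDll:
    """Hypothetical stand-in for the programmer DLL; only close_all_ports is named in the post."""

    def __init__(self):
        self.open_ports = set()
        self.programmed = 0
        self.cleanup_calls = 0

    def open_port(self, port):
        self.open_ports.add(port)

    def program(self, port):
        if port not in self.open_ports:
            raise RuntimeError(f"port {port} is not open")
        self.programmed += 1

    def close_port(self, port):
        self.open_ports.discard(port)

    def close_all_ports(self):
        # Models the documented role: close every unit and free the DLL's memory.
        self.open_ports.clear()
        self.cleanup_calls += 1


def program_chip(dll, port):
    """One 'Program Chip' pass: opens and closes only its own port, no global teardown."""
    dll.open_port(port)
    try:
        dll.program(port)   # start programming, poll status, check for errors ...
    finally:
        dll.close_port(port)


def run_application(passes=2):
    dll = FakeProgrammerDll()
    try:
        for _ in range(passes):       # e.g. two consecutive "Program Chip" calls
            program_chip(dll, port=1)
    finally:
        dll.close_all_ports()         # global teardown, once, when the application ends
    return dll


if __name__ == "__main__":
    dll = run_application()
    print(dll.programmed, dll.cleanup_calls)
```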