NI TestStand

17500 error, synchronization manager connection lost

Hello,

 

On a system using two PCs, one with the TS Operator Interface and one with the RemoteEngine, I got the following message:

 

    "-17500: Lost connection to the synchronization manager on the remote machine. TestStand will attempt to reconnect ..."

 

Shortly afterwards our sequences stop responding, on both the local and the remote machine. Our log files show nothing unusual before the problem: no DLL crashes, etc.

 

Are there typical situations where this error occurs?

 

Regards

 

Peter

Message 1 of 12

Hello,

 

When does the error occur?

At the beginning of your application or after a few minutes?

 

Greez

Basti

Message 2 of 12

More like hours of operation. And we do not seem to have a memory leak or the like.

 

Peter

Message 3 of 12

Hello,

 

It would be helpful to get some information about your systems.

Please post a MAX report for each of your two PCs and make sure that the firewall is disabled.

(If you do not want to post the MAX reports publicly, you can send them to me privately.)

 

Greez

Basti

--> I work for KUDOS <--
Message 4 of 12

Does it happen consistently and reproducibly, or is it more random? Are there perhaps intermittent network connectivity problems on your local network?

 

-Doug

Message 5 of 12

The error occurred the first time when the system entered full production, i.e. was running continuously under full load (not meaning 100% CPU but the full planned production cycle time). Since then the error occurs frequently but not reproducibly, i.e. something like 1-3 times a day.

 

As for MAX reports: I would not have a problem with making them public, but MAX is not installed on either system, since the only special hardware in use is USB cameras.

 

Doug: I am fairly certain by now (and we will test it in our next project) that the 17501 error we had was caused by the OLE initialization in the driver of the USB camera. Could this be a similar issue?

 

Regards,

 

Peter

Message 6 of 12

Remote synchronization does use DCOM. If a thread was prematurely uninitialized for COM that could cause problems. I'd think it would happen more frequently though. Do you know exactly what the driver is doing incorrectly? If so we can probably come up with a workaround and see if that helps.

 

-Doug

Message 7 of 12

Sorry for the delay, but now I do have some new information.

 

Just for clarification, we have two PCs here which I will call controller (this is where the TS Operator Interface runs, where the remote sequence calls are made and where the queues and notifications for communicating with the remote execution are created) and worker (where only a REngine.exe is running).

 

The error is displayed on the worker, not, as I previously thought, on the controller. Is it possible to deduce from that where the error actually originates?

 

Most important is that the error always occurs after a production pause. When the system is idle (i.e. does not receive a start from the control system) for more than 10 minutes, it will have that error on the next start. During idling the controller polls the control system for a start signal, while the worker stands in a Dequeue step waiting for a command from the controller.
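
The correlation between long pauses and the error could be checked against existing log files with a small script. The following is only a sketch in plain Python, not TestStand code: the timestamps are made-up placeholders, and `find_idle_gaps` is a hypothetical helper name. It lists every pause between two consecutive start signals that exceeds the 10-minute threshold mentioned above, so those starts can be compared with the times at which the -17500 message appeared.

```python
from datetime import datetime, timedelta


def find_idle_gaps(start_times, threshold=timedelta(minutes=10)):
    """Return (previous_start, next_start, gap) for each pause longer than threshold."""
    ordered = sorted(start_times)
    gaps = []
    # Compare each start with the one that follows it.
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt - prev
        if gap > threshold:
            gaps.append((prev, nxt, gap))
    return gaps


# Placeholder timestamps, standing in for values parsed from a real log file.
starts = [
    datetime(2024, 1, 1, 8, 0, 0),
    datetime(2024, 1, 1, 8, 0, 40),
    datetime(2024, 1, 1, 8, 15, 0),
    datetime(2024, 1, 1, 8, 15, 45),
]

for prev, nxt, gap in find_idle_gaps(starts):
    print(f"idle for {gap} before the start at {nxt}")
```

If every logged error falls on a start listed by such a script, and no error falls on any other start, that would support the idle-timeout explanation.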

 

Hope that leads to some ideas.

 

Regards

 

Peter

 

 

Message 8 of 12

What exactly is the controller doing? Is it processing window messages? Is this some sort of custom UI?

 

Can you reproduce the problem consistently if you wait for the 10 minutes?

 

What do you mean by "it will have that error on the next start"? What gets started? I thought you said the worker was waiting on a queue (and was thus already started). Is "start" a command it gets from the queue?

 

It's possible there is a network timeout happening somewhere. This is only a guess, but perhaps making sure your controller is processing window messages would help avoid the problem.

 

-Doug

Message 9 of 12

Sorry for the confusion.

 

The controller is the standard NI MFC Operator Interface. When the station is idle, it runs a sequence which periodically (approx. every 50ms) checks a digital input signal from the PLC.

"on the next start" means exactly this: the digital input is set by the PLC, the controller detects this and enqueues a structure which tells the worker what to do. The worker is already started.

 

Yes, the problem can be reproduced consistently if we wait for 10 minutes.

 

I will check whether window messages are processed. Thank you for the idea.
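
One way to test the timeout guess would be to keep the queue between controller and worker from going silent during a pause. The sketch below models that idea in plain Python with the standard `queue` module. It is not TestStand API, and the names `controller_tick`, `worker_drain`, `HEARTBEAT`, and the 300-second `max_idle` value are all assumptions made for illustration. Whether such a heartbeat actually avoids the -17500 error is untested.

```python
import queue

HEARTBEAT = "NOOP"  # hypothetical no-op command that the worker discards


def controller_tick(now, start_signal, last_sent, cmd_queue, max_idle=300.0):
    """Run one polling pass and return the time of the most recent enqueue."""
    if start_signal:
        # The PLC set the digital input: send the real command.
        cmd_queue.put("START")
        return now
    if now - last_sent >= max_idle:
        # Nothing has been sent for max_idle seconds: send a heartbeat.
        cmd_queue.put(HEARTBEAT)
        return now
    return last_sent


def worker_drain(cmd_queue):
    """Dequeue everything pending and return only the real commands."""
    real = []
    while True:
        try:
            cmd = cmd_queue.get_nowait()
        except queue.Empty:
            return real
        if cmd != HEARTBEAT:
            real.append(cmd)


# Simulate an 11-minute pause, polled once per second for brevity
# (the real station polls about every 50 ms). The start arrives at t = 660 s.
q = queue.Queue()
last = 0.0
for t in range(1, 661):
    last = controller_tick(float(t), t == 660, last, q)
```

In this simulation the queue receives a heartbeat at 300 s and at 600 s, followed by the real command at 660 s, so the link never sits unused for the full 10 minutes. On the real system the same pattern would have to be built from the existing enqueue and dequeue steps.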

 

Peter

Message 10 of 12