LabVIEW

Error 66 with DataSockets

Hello LabVIEW community,
 
I have been getting some strange, unpredictable occurrences of Error 66 (connection reset by peer) in my DataSocket system, and wonder if anyone has any ideas as to a cause/solution.
 
Here is my set-up: I have two computers connected together through a network hub, which is also connected to the Internet. Computer #1 is remotely accessed and controlled via VNC from computer #2. My data collection system consists of about 10 VI's on computer #1 and 4 on computer #2 all running simultaneously and communicating with one another (both computer to computer and within one computer) via DataSocket connections. The DataSocket server is running on computer #1, with computer #2 also allowed read/write permission -- typically, computer #2 is just reading and displaying information that has been written to DataSockets by computer #1; computer #1 handles the processor intensive data acquisition tasks. There are about 400 individual DataSocket data items, some of which are updated as fast as 10-20 times a second. Both machines are running LabVIEW 8 and Windows XP, and computer #1 is additionally connected to a PXI box.
 
This configuration generally performs very well. However, occasionally (on the order of once every few days, sometimes more often, sometimes less often) I'll get a string of Error 66's that ripple through all of the running VI's on computer #1. Computer #2 is often, but not always, unaffected. Error 56 (connection timeout) sometimes occurs on both/either computer as well, but one can usually recover from this by repeatedly pressing "Continue" on the LabVIEW error dialog box. Because I can't seem to reproduce these errors on command, I'm finding it difficult to troubleshoot. I switched all of the data items to "pre-defined", so they are always present on the DataSocket server, rather than being dynamically created and destroyed, but this hasn't solved the problem. I've also disabled programs on computer #1 such as firewall, antivirus scanner, automatic windows update, etc. in case they were conflicting with/slowing down the TCP/IP exchange, but again it has not seemed to help.
 
It's my understanding that error 66 happens when Windows sees that a TCP/IP connection is not being used, closes it, and then LabVIEW tries to send/receive data on the closed connection.
 
This is getting to be quite mysterious and frustrating, so any insights at all would be greatly appreciated.
 
Thank you very much,
 
Patrick
Message 1 of 18

Hi Patrick,

There is a known issue with McAfee where you will intermittently get "Error 66 occurred at Datasocket Write in DS Writer.vi. Network connection was closed by the peer" when using DataSocket to write and read data. This occurs even if the internet connection is set to "allow all."

 I know you mentioned that you had tried turning off the firewall but I wanted to check in case you were using McAfee.

Hope this helps,
Megan B.
National Instruments

Message 2 of 18

Hi Megan,

Thanks for your reply. While I do suspect that something like what you pointed out is the cause of my problems, we don't have McAfee installed. The firewall we were using (now removed) was ZoneAlarm, and the antivirus software (now not resident in memory) was Symantec (Norton) Antivirus. I have now also removed the firewall, antivirus software, etc. from computer #2, but it's hard to tell if this has fixed the problem because of the occasional nature of these errors. There still seem to be periodic slow-downs of the DataSocket transfers within computer #1, so I suspect something on computer #1 is to blame.

Patrick

Message 3 of 18

Hi Patrick

Another possibility is that the random disconnects could be caused by the server dropping the subscription because the client is too busy to respond. The DataSocket server periodically polls its connected clients to see if they're still present and active. Your application uses a large number of items, so it seems likely that your client is doing a lot of work, so much so that the server does not get the acknowledgements back in time.

There is a registry setting that the server uses for this timeout. Here it is:

HKEY_CURRENT_USER\Software\National Instruments\ComponentWorks\DSS\CWDataServer\Options\CheckForTimeoutDivisor

The default value for this is 2. This key is not created by the installer. Increasing this value would increase the timeout value; you may have to experiment a bit to see what value works.

Hope this helps!
Megan B.
National Instruments

Message 4 of 18

Hi Megan,

I've been out of town for a few days; just got back and saw your message. I think your suggestion is a good one.

However, I'm not a Windows registry expert, so I need a little clarification:

-I've looked in the registry, and found "HKEY_CURRENT_USER\Software\National Instruments\ComponentWorks\DSS\CWDataServer\Options\" but it only has two values in it -- "(Default)" (of type REG_SZ, with its value not set) and "ShowAtStartup" (of type REG_DWORD, with a hex value of 1). Nothing about "CheckForTimeoutDivisor" -- is this what you meant by the key not being created by the installer? So, should I create it?

-If I do need to create it, is it a key (i.e. a subfolder within "\Options\") whose "(Default)" value I would set? Or is it simply another value within the "\Options\" key, and if so, which type (String, Binary, DWORD-hexadecimal, DWORD-decimal, etc.)?

-Do you have any details on the units of this setting? I.e. does the default of "2" correspond to a timeout of 2 minutes, or something more complicated (hinted at by the word "divisor")?

Thanks for your help,

Patrick

Message 5 of 18

The new entry would be a value within the Options key, just like ShowAtStartup (not a subfolder): same type (REG_DWORD), with the value 2. You can create it by browsing to HKEY_CURRENT_USER\Software\National Instruments\ComponentWorks\DSS\CWDataServer\Options, right-clicking in the right pane, and selecting New->DWORD Value from the popup menu. Type in the name for the new value, CheckForTimeoutDivisor, and set its data by double-clicking it (decimal 2, or higher if you like).
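
For anyone who would rather not click through regedit, here is a minimal Python sketch that only builds the text of a .reg file for the value described above. The function name `build_reg_file` and the range check are my own assumptions, not anything from NI; the key path and value name are the ones given in this thread. Saving the text to a file and importing it is left to you, and you should back up the registry first.

```python
# Sketch only: builds .reg file text for the DataSocket timeout divisor.
# build_reg_file is a hypothetical helper, not part of any NI tool.

KEY_PATH = (
    r"HKEY_CURRENT_USER\Software\National Instruments"
    r"\ComponentWorks\DSS\CWDataServer\Options"
)
VALUE_NAME = "CheckForTimeoutDivisor"


def build_reg_file(divisor: int = 2) -> str:
    """Return .reg text that sets the divisor as a REG_DWORD value."""
    # Assumption: zero is rejected as a precaution; the thread only
    # mentions values of 2 or higher.
    if not 1 <= divisor <= 0xFFFFFFFF:
        raise ValueError("divisor must fit in a DWORD and be at least 1")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        # DWORD data is written as eight hexadecimal digits.
        f'"{VALUE_NAME}"=dword:{divisor:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(4))
```

The default of 2 matches what the server is said to assume when the value is absent, so importing the default output should not change behavior.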

The divisor is a value used by the server to figure how long the timeout for subscription is. It is not a time value, rather just a factor. If the above key is not present, the server assumes a value of 2. I believe a value of 2 gives you a timeout of 10 seconds.
Message 6 of 18

Thanks for the help. I have added that registry value, and it has succeeded in extending the DataSocket timeout. However, this hasn't corrected the root of the problem -- another one of these errors occurred this morning while I was at the computer (= computer #1, which was the only one affected today), so I can now more fully describe the symptoms.

Here is the general timeline for one such attack:

#1. The computer, with many VI's open and running (and exchanging data via DataSockets), has been working continuously without trouble for hours or days at a time.

#2. Something mysterious happens, causing all of the running LabVIEW VI's to drastically slow down. Loops which normally take ~100ms for one iteration now take on the order of minutes.

#3. After some period of time (now made much longer by adding the above-mentioned registry value), Error 56/66 appears in DataSocket read/write processes of all running VI's.

I had hoped that extending the timeout divisor via the registry would allow LabVIEW to outlast phase 2, but unfortunately the mysterious event seems to last quite a long time. While in phase 2 this morning, I was able to poke around and find out some more information about the mystery event. During this phase the following are true:

-LabVIEW is quite slowed down, but not frozen completely. Loops still complete, but take minutes rather than hundreds of milliseconds.

-The rest of the computer is fine; not crashed or frozen at all. Other programs can be started and used without difficulty.

-LabVIEW's processor usage drops to close to 0%. When running my VI's normally, LabVIEW uses between 50% and 100% of the processor (depending on the specifics of what is being done). The error this morning happened when LabVIEW was using ~50% of the processor (without any other programs running), so I don't think it is necessarily processing-power related.

-No unknown applications/processes are hogging the cpu. "System Idle Process" is using ~98% of the processor, which means the cpu really is free. Nothing is blocking LabVIEW from using it.

-Memory usage is about the same as when things are running normally, and not near the limit of our physical RAM (1 GB). LabVIEW tends to be using about 100-150 MB, and no other processes use anywhere close to that much.

-Network activity is low. In fact it is somewhat lower than during normal operation.

-The other computer on the network is not doing anything out of the ordinary (i.e. it is not running Windows Update or anything like that, although I have purposefully run Windows Update during normal operation before and it did not disrupt my VI's).

-The DataSocket server continues to send and receive data, but at a much slower rate.

Just about all of the above points can be explained as resulting from LabVIEW running much, much slower than usual, but they also rule out many potential root causes of the slowdown.

 

So I am rather bewildered about this. Any suggestions as to a cause, no matter how silly, would be appreciated.

Patrick

Message Edited by PMCR on 06-13-2006 02:58 PM

Message 7 of 18

Hi Patrick,

I'm hoping we haven't quite reached silly yet!  Are you using a dual processor machine?  I have heard that in some instances the DataSocket server can crash on these systems.  If you are using a dual processor machine, you could try disabling this feature (for the LabVIEW process or even in the BIOS).

Let us know,
Megan B.
National Instruments

Message 8 of 18

Hi Megan,

No we're not currently running a dual-processor machine, although we have been thinking of upgrading this computer to a dual-core model soon. As a side note, would you be able to point me to any documentation/details about this DataSocket Server-dual processor bug?

Patrick

Message 9 of 18

Hi Patrick,

We're still trying to sort out what might be happening on the dual processor machines; it may be a separate factor. With most dual-processor systems, you can change the processor affinity for processes and this seems to be a fairly good workaround. If we get some clear answers about what is happening, I will certainly post them.

As for your application, it is very strange that LabVIEW is slowing down so significantly.  I'd like to try and determine if those loops are executing slowly because all LabVIEW processes have slowed down, or because the DataSocket VIs are taking a long time to execute (perhaps because of the increased timeout?).  Perhaps you could put in a second while loop with an indicator on the front panel just so when the slowdown happens, you can take a look and see if this second (empty) while loop is executing quickly or if it has also slowed down.
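
To make the idea concrete, here is a rough sketch in Python (not LabVIEW) of what that second, otherwise empty loop would do: it runs on its own, does no DataSocket work, and records how long each iteration really takes. The class name `LoopHeartbeat` and its fields are made up for illustration. In LabVIEW the equivalent would be a separate while loop with a wait and an indicator showing the measured period.

```python
import threading
import time


class LoopHeartbeat:
    """Hypothetical helper: an empty loop that records its own period."""

    def __init__(self, interval_s: float = 0.1):
        self.interval_s = interval_s
        self.iterations = 0
        self.worst_period_s = 0.0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        last = time.monotonic()
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self.interval_s):
            now = time.monotonic()
            # Track the longest gap seen between two iterations.
            self.worst_period_s = max(self.worst_period_s, now - last)
            last = now
            self.iterations += 1


if __name__ == "__main__":
    hb = LoopHeartbeat(interval_s=0.1)
    hb.start()
    time.sleep(1.0)
    hb.stop()
    print(hb.iterations, hb.worst_period_s)
```

If the worst period stays close to the requested interval while the DataSocket loops take minutes, the slowdown is probably in the DataSocket calls rather than in LabVIEW as a whole.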

Also, I'd like to know if the errors are always originating at the same VI.  To check this, you could put in some additional error checking and perhaps add some popup messages which will alert you if the error chain begins and which VI started the whole thing.
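
As a sketch of that bookkeeping, again in Python rather than LabVIEW: every place that sees an error reports it to one shared record, and only the first report is kept along with its timestamp. `FirstErrorLog` and `report` are hypothetical names; in LabVIEW something like a shared global or functional global variable could play the same role.

```python
import threading
import time
from typing import Optional, Tuple


class FirstErrorLog:
    """Hypothetical helper: remembers which source reported an error first."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # (timestamp, source name, error code) of the first report, if any.
        self.first: Optional[Tuple[float, str, int]] = None
        self.count = 0

    def report(self, source: str, code: int) -> bool:
        """Record an error; return True only for the very first report."""
        with self._lock:
            self.count += 1
            if self.first is None:
                self.first = (time.time(), source, code)
                return True
            return False


if __name__ == "__main__":
    log = FirstErrorLog()
    if log.report("DS Writer.vi", 66):
        print("first error came from", log.first[1])
```

Only the caller that gets True back needs to show a popup, which avoids a flood of dialogs once the error chain is under way.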

Finally, after the slowdown happens, just out of curiosity perhaps you could check and see if a third machine is able to access the DataSocket server or if this machine has similar trouble.

Please let me know how this goes!!
Best Regards,
Megan B.
National Instruments

Message 10 of 18