"TCP Write" Timeout (error 56) seems to do not work properly

Dear folks,
I have set up an STM (TCP/IP based) connection between a PXI controller (LabVIEW Real-Time 2011, version 11.0) and a PC-based supervisor.
For STM see: http://zone.ni.com/devzone/cda/epd/p/id/2739
The PXI controller runs the STM server and the PC runs the STM client.

As soon as the STM client opens the connection to the server, the STM server starts sending data every 100 ms.
The "TCP Write" and "TCP Read" timeouts are set, by default, to 1000 ms (= 1 s).


If I unplug the RJ45 cable on the PC side, the "TCP Write" on the STM server detects the connection timeout (error 56) only after a very long time, about 3-4 minutes.
The "TCP Write" timeout terminal seems to have no effect. Why?
I have set the "TCP Write" timeout to different values, such as 100 ms, 10 ms, etc., without any apparent effect.


To reproduce this issue you can use the "STM Basic Server Example.vi" and "STM Basic Client Example.vi" from the shipped LabVIEW examples.

Is the observed behavior expected, or is it a bug?
And is there a workaround to quickly detect, via the "TCP Write" timeout, that the RJ45 cable has been unplugged?

Asper

0 Kudos
Message 1 of 11
(8,082 Views)

Unfortunately the timeout values only apply to handing the data off to the stack. If you were to use a LAN analyzer and capture the traffic, you would see that the stack is still trying to maintain the connection. The hand-off of the data from LabVIEW to the stack is successful, and the data is buffered at the stack level. Your long timeout is most likely the amount of time it takes you to fill that buffer. If you need to keep things truly synchronized, you will need to include an application-layer ACK in your communications.
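To illustrate the idea (in Python rather than LabVIEW, and with a made-up one-byte ACK convention, since nothing here defines one):

```python
import socket

ACK_TIMEOUT_S = 0.5  # hypothetical value: how long to wait for the peer's acknowledgment

def send_with_ack(sock: socket.socket, payload: bytes) -> None:
    """Send one message and wait until the peer explicitly acknowledges it.

    Unlike the "TCP Write" timeout, which only covers handing the data to the
    local stack, this fails as soon as the remote application stops answering.
    """
    sock.sendall(payload)          # may succeed even if the cable is unplugged
    sock.settimeout(ACK_TIMEOUT_S)
    try:
        ack = sock.recv(1)         # the peer is expected to send one ACK byte back
    except socket.timeout:
        raise ConnectionError("no application-layer ACK: peer unreachable")
    if ack != b"\x06":             # 0x06 = ASCII ACK, a made-up convention here
        raise ConnectionError("unexpected reply instead of ACK")
```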



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 2 of 11
(8,069 Views)

Hi Mark,
thanks for the reply.

I need to quickly detect an unplugged RJ45 cable for safety reasons. If the controller loses the connection with the supervisor, it has to set the alarm status and close the TCP connection.

Do you know whether, with LabVIEW Real-Time, I can modify the size of the TCP write buffer?
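For comparison, on a desktop OS with direct socket access this would be the SO_SNDBUF option (sketched in Python below); I don't know whether LabVIEW RT exposes anything equivalent:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Shrink the send buffer so it fills up (and writes start failing) sooner
# after the link goes down. 4096 bytes is an arbitrary example value.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)

# The OS may round or clamp the requested size; read it back to check.
print("send buffer size:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
```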

Regards,
Asper

0 Kudos
Message 3 of 11
(8,066 Views)

A standard way to handle this would be a watchdog timer with a short timeout.  This would be a simple message sent at a regular interval between the PXI and supervisor.  If no message is received within some amount of time, assume the connection died.
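A rough sketch of the receiving side in Python (the period and timeout values are just examples):

```python
import socket

HEARTBEAT_PERIOD_S = 0.1   # example: supervisor sends a beat every 100 ms
WATCHDOG_TIMEOUT_S = 0.5   # example: declare the link dead after 5 missed beats

def watch(sock: socket.socket) -> None:
    """Raise an alarm if no heartbeat arrives within the watchdog timeout."""
    sock.settimeout(WATCHDOG_TIMEOUT_S)
    while True:
        try:
            beat = sock.recv(64)
        except socket.timeout:
            raise ConnectionError("watchdog expired: no heartbeat received")
        if not beat:               # empty read = orderly close by the peer
            raise ConnectionError("peer closed the connection")
        # A heartbeat arrived in time; loop around and wait for the next one.
```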

0 Kudos
Message 4 of 11
(8,061 Views)

Hi Mark,
while searching for information on TCP timeouts, I found the following old KnowledgeBase article:

"Do LabVIEW TCP Functions Use the Nagle Algorithm?"
http://digital.ni.com/public.nsf/websearch/7EFCA5D83B59DFDC86256D60007F5839?OpenDocument.

The Nagle algorithm can introduce some latency in a handshaking mechanism. Do you know how to disable it on LabVIEW Real-Time?
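For reference, outside LabVIEW the standard way to disable it is the TCP_NODELAY socket option, e.g. in Python:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# TCP_NODELAY = 1 turns the Nagle algorithm off, so small writes are sent
# immediately instead of being coalesced into larger segments.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```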

The "TCP Close Connection" function flush the unsent data renaming in the TCP Write stack? ... Or, there is an explicit function to flush the TCP stack?

Regards,
Asper

0 Kudos
Message 5 of 11
(8,047 Views)

Disabling the Nagle algorithm may help you coordinate the timing, but the stack can still buffer data, and you will not be able to detect this from your application. All data will be sent provided the connection is still valid; the stack only discards the remaining data when the connection becomes invalid.



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
0 Kudos
Message 6 of 11
(8,026 Views)

Hi Mark,

the linked KnowledgeBase article applies only to Windows: it calls into a DLL, and it does not work on the Real-Time OS (Phar Lap).

Do you know how to implement the same functionality for PharLap Real-Time OS?

 

Regards,

Asper

0 Kudos
Message 7 of 11
(8,023 Views)

No, I do not know how to accomplish that on an RT system.



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
0 Kudos
Message 8 of 11
(8,018 Views)

Disabling the Nagle algorithm will not help you detect a dropped connection any faster. In any case, if you search the forum you'll see that many people have asked whether it is possible to disable the Nagle algorithm on RT, and so far no one has provided a way to do so.

0 Kudos
Message 9 of 11
(8,012 Views)

The Nagle algorithm is only about a delay before sending data. This delay is typically 200 ms or so, so it certainly isn't the issue here. Basically, Nagle tries to combine multiple short data packets into a single frame to reduce the network overhead that comes with the IP and TCP headers every frame needs. So if you send 100 packets of 10 bytes each in quick succession, with Nagle enabled this results in one frame with 1000 bytes of data and about 40 bytes of IP and TCP headers. Without Nagle it results in 100 frames of 10 bytes of data and 40 bytes of header information each, for a gross data transfer of around 5000 bytes. Nagle doesn't hold data indefinitely; it sends all pending data after the Nagle interval anyway, even if it is only a single byte.
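Spelled out as a quick calculation:

```python
MESSAGES = 100   # short writes in quick succession
PAYLOAD = 10     # bytes of data per write
HEADERS = 40     # approximate IP + TCP header bytes per frame

with_nagle = MESSAGES * PAYLOAD + HEADERS        # one coalesced frame: 1040 bytes
without_nagle = MESSAGES * (PAYLOAD + HEADERS)   # 100 separate frames: 5000 bytes
print(with_nagle, without_nagle)
```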

 

So in general Nagle is a bandwidth saver. However, if you have for instance a command-response protocol where you send off a short message each time to request an answer, in the worst case it can take two Nagle timeouts before the data arrives: one for sending the command and one for sending the answer. So in those cases where you need a quick succession of command/response messages, it is advisable to disable Nagle. However, this comes at the cost of a large overhead for the frame header information. A better approach would be to allow a combination of multiple commands into a single request, but that makes the protocol handling more involved.

 

For your connection-break detection this only makes a difference if you need to detect the break in less than 200 ms. But first you need to implement some scheme to be able to detect a break at all. One option would be to add a PING command to your protocol that does nothing but cause the receiver to respond with a PING RESP packet. To make it more robust you could add an auto-incrementing identifier to the PING command, and maybe a timestamp. You would send a PING from your RT system to the host and wait for the appropriate response. If the response with the correct identifier isn't received within a reasonable amount of time, you must conclude that you have lost the connection. Set your alarm, disconnect your network connection, and go back into connection-attempt mode. Once you get a successful connection again, you can clear the alarm.
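A minimal sketch of that PING scheme, in Python and with made-up framing (a 4-byte id plus an 8-byte timestamp, echoed verbatim by the peer):

```python
import socket
import struct
import time

PING_TIMEOUT_S = 0.5  # example: how long to wait for PING RESP before alarming

def ping(sock: socket.socket, ping_id: int) -> float:
    """Send a PING with an incrementing id and a timestamp; wait for the echo.

    Returns the round-trip time, or raises if the connection must be
    considered lost.
    """
    sock.sendall(struct.pack("!Id", ping_id, time.time()))
    sock.settimeout(PING_TIMEOUT_S)
    try:
        reply = sock.recv(12)      # peer echoes the 12-byte packet back
    except socket.timeout:
        raise ConnectionError("no PING RESP: connection considered lost")
    # (A real implementation would loop until all 12 bytes have arrived.)
    reply_id, sent_at = struct.unpack("!Id", reply)
    if reply_id != ping_id:
        raise ConnectionError("PING RESP identifier mismatch")
    return time.time() - sent_at
```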

Rolf Kalbermatter
My Blog
Message 10 of 11
(7,999 Views)