modbus tcp timeout

bkinard@kinardcontrolsyst · ‎06-04-2014

Brian T. Kinard
President - SCADA and Security Division
2110 Nance Street
Newberry, SC 29108

Ryan.S · ‎06-05-2014

Can you make another log with skip=1? I want to see if it makes any difference.

Retries happen only on an invalid response, such as a garbled message. A communication failure leads to a "skip" instead.

Ryan Shi
National Instruments

bkinard@kinardcontrolsyst · ‎06-05-2014

New log files attached. All of the retries are set to 10 and all of the timeouts are set to 7500 ms. Changing the timeout period hasn't really affected it, unless it was set too low. The skip retries are set to 1. Although my communication statistics are nearly perfect (98% or better), the attached picture shows what I get because ModbusTCP is not retrying. I also changed the polling of each site to a sequencer to help with traffic, but I get the same results.

Brian T. Kinard
President - SCADA and Security Division
2110 Nance Street
Newberry, SC 29108

Ryan.S · ‎06-20-2014

I'm able to reproduce this behavior on my computer, and I think I know what is happening.

First, this is how the driver works:

1. A retry only happens on a bad response, such as garbled data. A communication failure, however, is an error, and the driver does not retry on errors.

2. When a poll starts, if the socket connection is bad and cannot be reopened, the driver does nothing. This is the communication failure.

3. On a communication failure, the driver tries to reconnect when the next poll happens (if you don't skip it).
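The three rules above can be modeled in a few lines. This is a minimal Python sketch, not Lookout's actual driver code: `run_poll`, `Outcome`, and the `read` callback are hypothetical names, and it assumes a garbled response comes back as a value while a communication failure surfaces as a `ConnectionError`.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"            # valid response
    GARBLED = "garbled"  # invalid/garbled response: eligible for retry


def run_poll(items, read, retries):
    """Poll each item once, retrying only garbled responses.

    Hypothetical model of the behavior described in this thread:
    - a garbled response is retried up to `retries` times;
    - a communication failure (ConnectionError) is an error, so the rest of
      the poll is abandoned with no retry, and the caller is expected to
      reconnect on the next poll.
    Returns (values, comm_failed).
    """
    values = {}
    for item in items:
        for _attempt in range(retries + 1):
            try:
                outcome, value = read(item)
            except ConnectionError:
                return values, True  # no retry on error; wait for next poll
            if outcome is Outcome.OK:
                values[item] = value
                break
            # Outcome.GARBLED: loop around and retry the same item
        # If every attempt was garbled, the item is simply left out here.
    return values, False
```

In this model, even with `retries=10` a socket failure on the second item ends the poll after a single attempt, which matches the observation that changing the retry settings made no difference.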

In your case, there are intermittent communication failures; you can find them in your log. Because these errors do not trigger a retry, it makes no difference what you set the retry options to.

Look at the first log in your last post. It contains these messages:

13:08:05.8 - RTU1 ->
[00]o[00][00][00][06][01][01][00]c[00][02]
13:08:08.5 - RTU1 <-
[00]o[00][00][00][04][01][01][01][00]
13:08:08.5 - RTU1 ->
[00]p[00][00][00][06][01][01][01][F3][00][01]
13:08:29.8 - RTU1 ->
Open socket: Waiting for connection to complete, 0x2733
13:08:30.7 - RTU1 ->
[00]q[00][00][00][06][01][01][00]c[00][02]
13:08:31.5 - RTU1 <-
[00]q[00][00][00][04][01][01][01][00]

I believe this is the beginning of each poll.

[00]q[00][00][00][06][01][01][00]c[00][02]
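Assuming Lookout's log shows each byte either as `[XX]` hex or, when printable, as the ASCII character itself (so `o`, `p`, `q` are 0x6F, 0x70, 0x71), this frame can be decoded with the standard Modbus TCP MBAP header layout. A Python sketch; the helper names are hypothetical:

```python
import re
import struct

# One logged byte: either "[XX]" (hex) or a single literal character.
TOKEN = re.compile(r"\[([0-9A-Fa-f]{2})\]|(.)", re.DOTALL)


def log_to_bytes(text):
    """Convert Lookout-style log text to raw bytes (assumed notation)."""
    out = bytearray()
    for hex_byte, literal in TOKEN.findall(text):
        out.append(int(hex_byte, 16) if hex_byte else ord(literal))
    return bytes(out)


def decode_mbap(frame):
    """Split a Modbus TCP frame into MBAP header fields and PDU data."""
    tid, proto, length, unit, func = struct.unpack(">HHHBB", frame[:8])
    return {
        "transaction_id": tid,
        "protocol_id": proto,
        "length": length,      # counts unit id + PDU bytes
        "unit_id": unit,
        "function": func,      # 0x01 = Read Coils
        "data": frame[8:],
    }


frame = decode_mbap(log_to_bytes("[00]q[00][00][00][06][01][01][00]c[00][02]"))
start, quantity = struct.unpack(">HH", frame["data"])
print(hex(frame["transaction_id"]), frame["function"], start, quantity)
# 0x71 1 99 2
```

Read this way, the poll starts with a Read Coils request for 2 coils at starting address 0x0063 on unit 1, and the transaction ID goes up by one per request (`o`, `p`, `q`).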

I don't know your poll rate, but judging from your log file I guess it is about 12 seconds.

This is the last good poll before the error:

13:08:05.8 - RTU1 ->
[00]o[00][00][00][06][01][01][00]c[00][02]

The next good one after the error is at 13:08:30, two polls later. This means the connection was still bad on the second poll at 13:08:17, and when the third poll came, it found the connection had become good, so the poll iteration ran from 13:08:30.

So if the connection recovers before the next poll, you only lose part of the data in the current poll. If it doesn't recover before the next poll, there will be another 12-second gap; if it doesn't recover before the third poll, another 12-second gap, and so on. It may also recover and then happen to disconnect again on the next poll. It all depends on the connection state when each poll triggers.

Looking at your HyperTrend, the small gaps are about 20 to 30 seconds, which is consistent with the log. Although each gap on the HyperTrend spans 20 to 30 seconds, only 2 points are actually lost.

This is my understanding of your problem, and it seems to be expected behavior. The question is why the disconnection happens so frequently.

Setting "skip poll" to 0 may make the gap smaller.
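A rough way to check the 20 to 30 second figure: if polls run every 12 seconds (the estimate above) and a poll that starts while the link is down produces no point, one missed poll leaves a 24-second hole between good samples. The Python sketch below is a simplified model, not Lookout's scheduler; the outage window is an assumption chosen to match the log (link down from about 2.7 s after the 13:08:05.8 poll until some time before the third poll).

```python
def simulate_polls(period, n_polls, outages):
    """Return (time, ok) for polls at 0, period, 2*period, ...

    A poll fails if it starts inside any half-open outage window
    [start, end). Simplified model: partial data loss inside a poll and
    reconnect delays are ignored.
    """
    results = []
    for k in range(n_polls):
        t = k * period
        down = any(start <= t < end for start, end in outages)
        results.append((t, not down))
    return results


def largest_gap(results):
    """Longest interval between consecutive successful polls."""
    good = [t for t, ok in results if ok]
    return max((b - a for a, b in zip(good, good[1:])), default=0)


# Link drops 2.7 s into the first poll and is back before the third poll.
polls = simulate_polls(period=12, n_polls=4, outages=[(2.7, 20.0)])
print(polls)               # [(0, True), (12, False), (24, True), (36, True)]
print(largest_gap(polls))  # 24
```

If the outage also outlasts the second poll, the same model gives a 36-second gap, which is the "another 12-second gap" case.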

Ryan Shi
National Instruments

bkinard@kinardcontrolsyst · ‎06-23-2014

So what I am seeing is socket errors rather than communication errors? If so, what are the chances of revising the driver to retry on socket errors as well as communication errors? Otherwise everyone will have this same problem. Is there a way to clear the socket error before trying to communicate with the PLC? What causes the socket error?

Brian T. Kinard
President - SCADA and Security Division
2110 Nance Street
Newberry, SC 29108

Ryan.S · ‎06-24-2014

Yes, it's a socket error. The error code is 0x2733 (10035 decimal):

WSAEWOULDBLOCK (10035): Resource temporarily unavailable.

I checked some documentation and found that this error is not severe; the software can retry the same command after a while. One possible cause is that the socket buffer on the other side is full, so the command cannot be sent to it. For example:

13:10:00.0 - RTU1 -> [00][A7][00][00][00][06][01][01][01][F3][00][01]

13:10:18.2 - RTU1 -> Open socket: Waiting for connection to complete, 0x2733

13:10:18.3 - RTU1 -> [00][A8][00][00][00][06][01][01][00]c[00][02]

Something went wrong on the socket at 13:10:00.0, and the send command returned an error.

The Lookout driver skips the current poll and retries on the next poll (or skips that one too, depending on the settings). Almost all of Lookout's built-in TCP drivers behave this way on a socket error.
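The point that WSAEWOULDBLOCK is transient, so the same command can be retried after a short wait, can be sketched in Python as follows. This is not how Lookout's driver is written; `send_with_retry` and its parameters are hypothetical, and `send` stands for something like a non-blocking socket's `send` method. Python raises `BlockingIOError` for would-block conditions; the explicit 10035 check is only a fallback for the raw Winsock code.

```python
import time

WSAEWOULDBLOCK = 0x2733  # Winsock "Resource temporarily unavailable" (10035)


def is_would_block(exc):
    """True if the OSError means 'try again later' rather than a hard failure."""
    if isinstance(exc, BlockingIOError):
        return True
    # Fallback: the Winsock code may show up as errno or winerror (Windows only).
    return WSAEWOULDBLOCK in (getattr(exc, "errno", None),
                              getattr(exc, "winerror", None))


def send_with_retry(send, payload, attempts=5, delay=0.2, sleep=time.sleep):
    """Call send(payload), retrying only would-block errors.

    Any other OSError (reset, refused, ...) is re-raised immediately, as is
    the would-block error once `attempts` is exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send(payload)
        except OSError as exc:
            if not is_would_block(exc) or attempt == attempts:
                raise
            sleep(delay)  # give the peer time to drain its receive buffer
```

A real driver would also have to handle partial sends (`send` returning fewer bytes than requested), which this sketch leaves out.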

I'm wondering whether it's possible to trigger the poll programmatically after a socket error. It would work like the regular trigger, but happen earlier: we could monitor the alarm messages and send a signal to "poll". But I don't know whether it will work as expected.

Is any other software or hardware communicating with the same device over Modbus? I notice that there is always a slow response before the socket error, for example:
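To see what an early re-poll could buy, here is a hypothetical Python model: after a failed poll, the next poll is scheduled `interval` seconds later, repeating until the link is back. With the regular 12-second trigger, the first good poll after a failure at t = 12 s lands at 24 s; re-triggering every 2 seconds (an assumed value, not a Lookout setting) would catch a link that recovers at 14 s right away.

```python
import math


def first_good_poll(fail_time, outage_end, interval):
    """Time of the first poll after `fail_time` that finds the link up.

    Hypothetical model: polls are retried every `interval` seconds after
    a failure, and the link is assumed up for every t >= outage_end.
    """
    k = max(1, math.ceil((outage_end - fail_time) / interval))
    return fail_time + k * interval


regular = first_good_poll(fail_time=12, outage_end=14, interval=12)
early = first_good_poll(fail_time=12, outage_end=14, interval=2)
print(regular, early)  # 24 14
```

The trade-off is extra traffic: if the device is genuinely busy, a fast re-trigger may add requests exactly when it is struggling.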
13:08:05.8 - RTU1 -> [00]o[00][00][00][06][01][01][00]c[00][02]

13:08:08.5 - RTU1 <- [00]o[00][00][00][04][01][01][01][00]

13:08:08.5 - RTU1 -> [00]p[00][00][00][06][01][01][01][F3][00][01]

13:08:29.8 - RTU1 -> Open socket: Waiting for connection to complete, 0x2733

13:08:30.7 - RTU1 -> [00]q[00][00][00][06][01][01][00]c[00][02]

There is a delay of about 3 seconds on the response before the error, and this happens before almost all of the errors. In the other, good polls, each response takes less than 1 second. It looks like the device is busy with something.
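The slow-response pattern can be checked across a whole log by pairing each outgoing line (`->`) with the next incoming one (`<-`). This Python sketch assumes the `HH:MM:SS.f - DEVICE ->/<-` prefix seen in the excerpts above and uses the 1-second threshold from this post; `response_delays` is a hypothetical helper.

```python
import re
from datetime import datetime

# Matches the "13:08:05.8 - RTU1 ->" prefix of a log line.
PREFIX = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+) - (\S+) (->|<-)")


def response_delays(lines):
    """Return (response_time, seconds) for each response in the log.

    Each '<-' line is paired with the most recent '->' line; socket
    messages such as 'Open socket: ...' also count as '->' lines.
    """
    delays = []
    sent = None
    for line in lines:
        m = PREFIX.match(line.strip())
        if not m:
            continue  # data lines of the two-line log format, blanks, etc.
        stamp, _device, direction = m.groups()
        t = datetime.strptime(stamp, "%H:%M:%S.%f")
        if direction == "->":
            sent = t
        elif sent is not None:
            delays.append((stamp, (t - sent).total_seconds()))
            sent = None
    return delays


log = [
    "13:08:05.8 - RTU1 -> [00]o[00][00][00][06][01][01][00]c[00][02]",
    "13:08:08.5 - RTU1 <- [00]o[00][00][00][04][01][01][01][00]",
    "13:08:08.5 - RTU1 -> [00]p[00][00][00][06][01][01][01][F3][00][01]",
    "13:08:29.8 - RTU1 -> Open socket: Waiting for connection to complete, 0x2733",
    "13:08:30.7 - RTU1 -> [00]q[00][00][00][06][01][01][00]c[00][02]",
    "13:08:31.5 - RTU1 <- [00]q[00][00][00][04][01][01][01][00]",
]
slow = [(stamp, d) for stamp, d in response_delays(log) if d > 1.0]
print(slow)  # [('13:08:08.5', 2.7)]
```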

Ryan Shi
National Instruments

bkinard@kinardcontrolsyst · ‎06-24-2014

Nothing else is polling the PLCs; the only thing polling them is Lookout. I'm using a sequencer to poll them one PLC at a time, every 2 seconds.

Brian T. Kinard
President - SCADA and Security Division
2110 Nance Street
Newberry, SC 29108
