I am assuming you are not seeing the problem with that process I uploaded?
I talked to the board president at the ag project and they will allow me to upload their process files if that would help. I looked them over and I have pruned many of the floats in favor of 16-bit registers, using *100 and /100 to get 2 decimal places. You won't be able to see the logic, which is solved in the PLCs, but it has a boatload of I/O.
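For anyone following along, the *100 and /100 trick mentioned above can be sketched roughly as below. This is only an illustration in Python, not the actual PLC or Lookout logic; the function names are made up, and it assumes the registers are treated as unsigned 16-bit values.

```python
SCALE = 100  # two decimal places, as in the *100 and /100 scheme

def to_register(value: float) -> int:
    """Scale an engineering value into an unsigned 16-bit register (assumed unsigned)."""
    raw = round(value * SCALE)
    if not 0 <= raw <= 0xFFFF:
        raise ValueError(f"{value} does not fit in a 16-bit register at scale {SCALE}")
    return raw

def from_register(raw: int) -> float:
    """Recover the engineering value from the scaled register."""
    return raw / SCALE

print(to_register(12.34))
print(from_register(1234))
```

Under that unsigned assumption the largest value that fits is 655.35, so anything larger would need a smaller scale factor or two registers.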
Well, it happened again at my ag project last night even though Lookout had been relaunched the day before.
I want to re-clarify the com fail line in my first post though. In the case of the muni system I had asked if they had com fails to which the operator replied "yes". The comms had failed but they only got 1 alarm which was the site that went into the endless loop, not the ones that had stopped polling.
The failures that happened at the ag project were the same.... The only com fail alarm was the site that was polling endlessly, except last night, when there were no alarms, just a 4,000 gpm pump that could not be accessed through Lookout. Fortunately the overflow from that failure went harmlessly back into the USBR canal.
I got the modbus settings from the ag project and they look like this on the wireless sites:
Timeout: 3000 msec
On the hardwired RS485 site:
Timeout: 3000 msec
So, the fastest polling in this system is 2 seconds, with 5 seconds on the hardwired site.
I am going to upload the lks files from that project. But since you aren't using any wireless modems, I would not be surprised if you can't reproduce the problems, since wireless modems impart a latency that you won't get with copper.
Oh yeah, the serial ports are set:
1000 bytes rx gap and 100 msec on both rts and cts.
The system is set up and running here.
I see that there are four sites, three on COM5 and one on COM4. I guess COM5 goes through the wireless modem and COM4 is hardwired. Was the modbus object on COM4 the only one that worked when the problem happened? I mean, did all the modbus objects going through the wireless modem have the problem while the hardwired one was good?
Could you consider enabling the diagnostic file? In case it happens again (hope not), I can look into the communication before the problem happens, and the RTS/CTS signals. I suspect the problem is related to wireless too. The modbus driver may not properly handle some corner cases that cannot be noticed when hardwired.
"Was the modbus object on COM4 the only one that worked when the problem happened? I mean, did all the modbus objects going through the wireless modem have the problem while the hardwired one was good?"
No, the hardwired connection was failing as well. It is not, again, is not a "wireless problem".
I say this because what you do not see in the files you have is the other 6 wireless networks being controlled by SCADA Packs beyond this HMI layer using identical hardware, and that part has never failed. Only the modbus comms in Lookout fail. We relaunch Lookout and we are good for another 24 hours. If we were having modem problems, which I know is not the case since we helped develop and beta test the firmware for these modems, we would have to reboot the modems. We never have to do that. The only fix we have found is relaunching Lookout. To prove that theory we had the district reboot all modems when the failures happened on one occasion and that did nothing.... Zip. Relaunching Lookout is the only fix we have found.
"Could you consider enabling the diagnostic file? In case it happens again (hope not), I can look into the communication before the problem happens, and the RTS/CTS signals. I suspect the problem is related to wireless too. The modbus driver may not properly handle some corner cases that cannot be noticed when hardwired."
The CTS and RTS signals are only being used for inter-packet delay since we are using RS485 2 wire to the master radio. On the SCADA Pack end of things it is RS485 on one site and RS232 3 wire (Rxd, Txd and signal gnd) so all the diagnostic file will tell you is when the failure happens.
But, I will initiate a diagnostic file next time I am down there. I hope you are seeing this problem clearly now.
I haven't seen the problem yet.
I notice that you said the receive gap is set to 1000 bytes. I'm not sure whether that is necessary for your system, but with a receive gap of 1000 bytes each transaction will take about 1 sec, and the whole polling cycle will then be more than 3 minutes. So, if possible, I suggest reducing the receive gap and increasing the poll rate setting (a longer interval between polls).
For example, if you set 100 bytes and don't get errors such as "response too short", the polling cycle will take about 10 sec, and you can set the poll rate to 15~20 sec. This will be better.
I suspect the problem is in the timer inside Lookout. I suggest you first try the steps above.
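The "1000 bytes is about 1 sec" figure can be checked with a quick calculation. The sketch below assumes the receive gap is counted in byte times at the line speed, and assumes 9600 baud with 8N1 framing (10 bits on the wire per byte). The function name is just for illustration.

```python
def gap_seconds(gap_bytes: int, baud: int = 9600, bits_per_byte: int = 10) -> float:
    """Time to wait out a receive gap, assuming it is measured in byte times.

    bits_per_byte = 10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    return gap_bytes * bits_per_byte / baud

for gap in (100, 240, 1000):
    print(f"{gap:5d} bytes -> {gap_seconds(gap):.3f} s per transaction")
```

Multiplying the per-transaction figure by the number of transactions in one pass gives a rough lower bound on the cycle time; the transaction count depends on how the I/O is grouped and is not something this sketch knows.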
We were at 240 bytes but tried 1000 bytes. Didn't help anyway so we went back to 240. I will have to confirm it but the time through the polling cycle was very long at 1000 bytes.
Even polling at 2 seconds on the wireless network we have about 4 minute cycles so it sounds like you may not be looking at all the I/O in that system.
Are you using 96008N1?
15 second polling is going to be a hard sell I think.
I will try to get specific times after we get back from the jobsite.
Yes, if you set 1000 bytes, the polling cycle will definitely be very long.
If the cycle is 4 minutes, the 2 second poll rate will not take effect. No matter whether you set the poll rate to 2 sec or 100 sec, the same tag will be read or written once every 4 minutes.
If the next poll time arrives while the current poll is not finished, the new poll is pushed into a queue. As soon as the current poll finishes, the ones in the queue are executed. I suspect that there is a problem in it, but I'm not sure. So I suggest increasing the poll rate setting so that it is longer than the cycle. In effect, if the poll rate is shorter than the poll cycle time, the cycle time becomes the poll rate.
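To make the queue behavior described above concrete, here is a toy model. It is not Lookout's actual implementation, which is not visible from outside; it simply assumes that every timer tick requests one full poll cycle and that requests arriving while a cycle is running wait in an unbounded queue. The function and parameter names are made up.

```python
def simulate(poll_interval: int, cycle_time: int, duration: int):
    """Return (effective period between cycle starts, backlog at the end).

    Toy model: a poll request arrives every poll_interval seconds, each
    cycle takes cycle_time seconds, and requests queue while one is running.
    """
    free_at = 0                                   # time the current cycle finishes
    n_requests = duration // poll_interval + 1    # ticks at t = 0, interval, ...
    starts = []
    for i in range(n_requests):
        arrival = i * poll_interval
        start = max(arrival, free_at)             # wait for the running cycle
        free_at = start + cycle_time
        starts.append(start)
    started = [s for s in starts if s <= duration]
    backlog = n_requests - len(started)           # requests still waiting
    period = started[1] - started[0] if len(started) > 1 else None
    return period, backlog

# 2 s poll rate against a 4 minute (240 s) cycle, over one hour
print(simulate(2, 240, 3600))
# poll rate longer than the cycle
print(simulate(300, 240, 3600))
```

In this model the 2 second setting still yields one cycle every 240 seconds, but the backlog of waiting requests keeps growing for as long as the system runs, whereas a poll rate longer than the cycle leaves the queue empty.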
I'm using 96008N1.
That does sound like the issue here. I have other work to do at both of those systems and I will get the polling rate set so there is, say, a 1 second delay after the last transaction from each PLC. I will program the polling failure alarms too.
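A polling failure alarm of the kind mentioned above could follow a simple watchdog pattern: record the time of the last good poll per site and flag any site whose data is older than a limit. The sketch below is only an outline in Python, not Lookout logic; the class name, site names, and the 10 second limit are hypothetical.

```python
import time

class PollWatchdog:
    """Flags sites whose last successful poll is older than max_age_s."""

    def __init__(self, max_age_s: float):
        self.max_age_s = max_age_s
        self.last_ok = {}  # site name -> time of last good poll

    def record_success(self, site: str, now: float = None) -> None:
        self.last_ok[site] = time.monotonic() if now is None else now

    def stale_sites(self, now: float = None) -> list:
        now = time.monotonic() if now is None else now
        return sorted(s for s, t in self.last_ok.items()
                      if now - t > self.max_age_s)

# Hypothetical sites; explicit timestamps are used so the example is repeatable.
wd = PollWatchdog(max_age_s=10)
wd.record_success("pump_site", now=0)
wd.record_success("canal_site", now=95)
print(wd.stale_sites(now=100))
```

The point of the pattern is that a site that silently stops being polled still raises an alarm, which is the case that produced no com fail alarm in the failures described earlier. The age limit would need to be longer than the real polling cycle to avoid nuisance alarms.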
I do agree that we are gaining nothing by piling up "future polls" in a queue.
If the problem comes back we will know whether or not it is a Lookout issue.