Ethernet drops connection.

FrankHS · 04-08-2020

I am using LabVIEW to communicate with an Ethernet IO rack. Occasionally, I briefly lose the Ethernet connection to the IO rack, which shuts down the process I am controlling. If the rack does not receive a message over Ethernet every 100 ms, the watchdog on the rack shuts everything down.

There are only two nodes on the Ethernet cable: the IO rack and the computer running the LabVIEW program. I am using the following NI VIs: UDP Open, UDP Close, UDP Write, and UDP Read. The computer runs Windows 10 and the LabVIEW version is 15. The program is a built EXE and runs 24/7.

At first I opened the port, wrote and read the data, and then closed the port on each iteration. This worked, except that it would occasionally lose the connection. It would run OK for days, then there would be a shutdown.

I know there are things like Windows Update that attempt to use the Ethernet port. Is there any way I can guarantee an uninterrupted connection?

Then I tried opening the port when the program starts, and keeping it open until the program shuts down. There is a bizarre undesirable behavior from time to time that I have yet to understand.

Do you have any ideas of what I can do to solve this problem?

GerdW · 04-08-2020

Hi Frank,

@FrankHS wrote:

This worked except that it would occasionally lose connection.

I know there are things like Windows Update that attempt to use the Ethernet port. Is there any way I can guarantee an uninterrupted connection?

There is no way to guarantee this type of communication within your 100ms watchdog timeout…

Yes, Windows can interfere with using the port for other communication, but in general you can have many open communication connections over the same Ethernet port.

@FrankHS wrote:

Then I tried opening the port when the program starts, and keeping it open until the program shuts down. There is a bizarre undesirable behavior from time to time that I have yet to understand.

What kind of "bizarre"? Do you get any errors from your LabVIEW program?

Best regards,
GerdW

using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019

altenbach · 04-08-2020

Your entire problem description is insufficient to troubleshoot, and some of your comments are "bizarre" in their own right.

Who wrote the code running on each side? Are both written by you in LabVIEW or is the other device a "black box"?

Why is the watchdog so picky? Can you make the watchdog timer much longer for testing, then do a statistical analysis of the actual idle times? If both sides are LabVIEW, can you show us some code? Is there anything that potentially slows down the communication loops? What else are the programs doing?
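For the statistical analysis suggested here, one approach is to timestamp every successful exchange during a long test run with a relaxed watchdog, then summarize the gaps between them. A minimal Python sketch; the function name and the nearest-rank 99th percentile are illustrative choices, not anything taken from LabVIEW:

```python
import math
import statistics


def summarize_gaps(timestamps_s):
    """Summarize gaps between consecutive successful exchanges.

    timestamps_s: increasing monotonic times (seconds) at which a valid reply
    arrived. Returns the gap count plus mean, nearest-rank 99th percentile,
    and maximum gap, all in seconds.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps_s, timestamps_s[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    ordered = sorted(gaps)
    rank = math.ceil(0.99 * len(ordered))  # nearest-rank method, 1-based
    return {
        "count": len(gaps),
        "mean": statistics.mean(gaps),
        "p99": ordered[rank - 1],
        "max": ordered[-1],
    }
```

The maximum gap is the figure to compare against the watchdog; the mean says little about the rare stalls that cause a shutdown every few days.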

Maybe it's a hardware error. Have you tried a different cable? Blown some compressed air into the connectors to clean out dust? Did you try updating the Ethernet drivers for your hardware? Do you connect through a dedicated Ethernet port and crossover cable, or does any of this travel through shared external hardware (routers, switches, etc.)?

If this is a time-critical process, you should not be running on Windows anyway; at the very least, turn off updates, virus scans, etc.

You say you are using UDP read and write. Does that mean that the communication is two-way? Or does one side only write and the other side only read? UDP is connection-less and there is absolutely no guarantee that any particular message makes it to the other side. How big is each transmission in bytes? What is the configured sending rate? Do you use a specific local port or an ephemeral port?
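Transmission size matters mostly for the read side: UDP preserves message boundaries, so each read returns exactly one datagram, and the read buffer only has to be at least as large as the biggest datagram the peer can send. A minimal Python sketch, with Python's `socket` module standing in for the LabVIEW UDP VIs (an analogy, not LabVIEW's implementation); 548 is used as the buffer size and 21/35 bytes as example message sizes:

```python
import socket


def read_one_datagram(sock: socket.socket, max_size: int = 548) -> bytes:
    """Return exactly one received datagram.

    max_size only bounds the receive buffer. A datagram longer than max_size
    is truncated on Linux and raises OSError on Windows, so max_size should be
    at least as large as the biggest datagram the peer can send.
    """
    data, _sender = sock.recvfrom(max_size)
    return data
```

Two datagrams of 21 and 35 bytes sent back to back come out as two separate reads of 21 and 35 bytes, never merged or split.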

LabVIEW Champion.

Bathank · 04-09-2020

Opening and closing the port on each iteration is not good practice.
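The alternative is to open the socket once at program start, reuse it for every write/read, and close it once at shutdown. A minimal Python sketch of that pattern; `RACK_ADDR`, `open_rack_socket`, and `exchange` are hypothetical names, the address is a placeholder from the TEST-NET range, and the 50 ms read timeout is an assumption:

```python
import socket

# Hypothetical rack address (TEST-NET range); substitute the real IP and port.
RACK_ADDR = ("192.0.2.10", 5000)


def open_rack_socket(local_port=0, timeout_s=0.05):
    """Create the UDP socket once, at program start.

    local_port=0 lets the OS choose an ephemeral port; pass a fixed port only
    if the device insists on replying to one. timeout_s bounds each read.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", local_port))
    sock.settimeout(timeout_s)
    return sock


def exchange(sock, addr, request, max_size=548):
    """One write/read on the already-open socket; raises socket.timeout if no reply."""
    sock.sendto(request, addr)
    reply, _sender = sock.recvfrom(max_size)
    return reply
```

With this structure a timeout is an ordinary, expected event to count and handle, not a reason to tear the socket down; `sock.close()` happens once, at shutdown.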

Since you said that opening the port at program start and closing it at shutdown leads to "bizarre undesirable behavior", I think this is the place you should work on. I suspect timeouts are not being handled properly there.

It would be really helpful if you could post your code, so that we can suggest a better solution!

rolfk · 04-10-2020

The choice to use a watchdog in combination with UDP communication is already bizarre in its own right. A 100 ms watchdog would be considered bizarre even with TCP/IP.

A device that shuts down after it hasn't received a network message within 100ms? What sort of atomic nuclear control device is that?

You might be able to guarantee 100 ms repetitive messages from a real-time controlled device, but never from Windows or any other desktop operating system for that matter. If your device can't tolerate message dropouts from the remote side lasting up to several seconds, it is doomed to run into shutdown situations eventually. This is especially true with UDP, since there is no way to guarantee that a UDP message ever arrives, or even to detect that it didn't, other than implementing your own application-level acknowledgment messaging on top of UDP.
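Application-level acknowledgment can be as simple as treating the device's reply as the acknowledgment and resending on timeout. A minimal Python sketch under that assumption; `request_with_retry` and the `is_valid` callback are hypothetical, and the socket is assumed to already have a read timeout set:

```python
import socket


def request_with_retry(sock, addr, request, is_valid, retries=3, max_size=548):
    """Send request until a valid reply (the 'ack') arrives or retries run out.

    Assumes sock already has a timeout set. Returns the first reply for which
    is_valid(reply) is true, or None if every attempt timed out or produced an
    invalid reply. The caller decides what None means (e.g. go fail-safe).
    """
    for _attempt in range(retries):
        sock.sendto(request, addr)
        try:
            reply, _sender = sock.recvfrom(max_size)
        except socket.timeout:
            continue  # no reply within the socket timeout: resend
        if is_valid(reply):
            return reply
    return None
```

Without sequence numbers in the messages, a late reply to an earlier attempt can be mistaken for the reply to a later one; a more robust protocol would tag each request and match replies against that tag.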

Also does your device really need to shutdown completely after a timeout? Wouldn't it be smarter to return to a default operating state that allows it to continue to receive new messages and react to them?

Rolf Kalbermatter My Blog

DEMO, Electronic and Mechanical Support department, room 36.LB00.390

FrankHS · 04-11-2020

Thank you for the reply

I wrote the LabVIEW code, and the other device is a black box. The Ethernet cable runs directly from the computer to the black box, an AutomationDirect Ethernet Base Controller. There are no routers, switches, etc.

I have changed the watchdog timer to 1 second. Do you think that one second would be a good choice for Ethernet? The process is not time critical, though it needs to shut down if there is no communication for more than 5 seconds. I would prefer to keep this down to a second or two.

The communication is two-way. I send a packet of data to the controller and it replies. Each reply is verified with a checksum. The False case is not being used now, but it closes and reopens the Ethernet port. This runs in a loop with a 25 ms period.
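A 25 ms loop can be paced either by sleeping a fixed time after the work or by sleeping until an absolute deadline; only the latter keeps the period at 25 ms when the work time varies. A minimal Python sketch of deadline pacing; `run_periodic` and its parameters are hypothetical, with `clock` and `sleep` injectable so the schedule can be checked without real waiting:

```python
import time


def run_periodic(step, period_s=0.025, iterations=None,
                 clock=time.monotonic, sleep=time.sleep):
    """Call step() once per period using absolute deadlines.

    Returns the number of iterations executed; iterations=None runs forever.
    """
    next_deadline = clock() + period_s
    count = 0
    while iterations is None or count < iterations:
        step()
        count += 1
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)           # wait out the rest of this period
            next_deadline += period_s  # next slot on the fixed grid
        else:
            # Overran the period: re-base the schedule instead of running
            # missed iterations back to back.
            next_deadline = clock() + period_s
    return count
```

Re-basing after an overrun avoids firing a burst of catch-up iterations, which would send a flurry of messages to the rack right after a stall.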

I just added code that tells me when the loop time exceeds 50 ms and that captures all IO errors and logs them to disk, in the hope of identifying this problem.
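Instrumentation like this is simple to sketch. A minimal Python version, assuming a hypothetical `LoopTimer` helper and logger name: it measures the time between successive calls, counts the ones above 50 ms, and keeps the worst case seen, which is the number that matters when choosing a watchdog:

```python
import logging
import time

log = logging.getLogger("daq_loop")  # hypothetical logger name


class LoopTimer:
    """Times each loop iteration and logs the ones that exceed a threshold."""

    def __init__(self, threshold_s=0.050, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock
        self.last = clock()
        self.overruns = 0
        self.worst_s = 0.0

    def tick(self):
        """Call once per iteration; returns the time since the previous call."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.worst_s = max(self.worst_s, elapsed)
        if elapsed > self.threshold_s:
            self.overruns += 1
            log.warning("loop iteration took %.1f ms", elapsed * 1000.0)
        return elapsed
```

A monotonic clock is used so that wall-clock adjustments (for example, a time sync) cannot show up as fake overruns.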

When I said there was bizarre behavior, I meant that with the limited information available I was unable to explain it or even describe it well. It only happened when the port was opened at program start and closed at exit. The cause: when an error occurred, a reset function closed and reopened the port. That reset function added a 1 second delay, which caused the watchdog to trip. It has been removed.
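The arithmetic behind this is worth writing down: while the port is being reset nothing reaches the rack, so every delay in the recovery path adds directly to the gap the watchdog sees. A rough budget in Python under stated assumptions: one exchange per loop iteration, each failed attempt costing a full reply timeout, none of the retried requests reaching the rack (worst case), and an arbitrarily chosen 2x safety margin. Both function names are hypothetical:

```python
def worst_case_gap_s(loop_period_s, reply_timeout_s, retries, reset_delay_s=0.0):
    """Rough upper bound on the time between messages the rack actually receives.

    Assumes every retried request is lost. Ignores OS scheduling stalls,
    which on Windows have no hard bound.
    """
    return loop_period_s + retries * reply_timeout_s + reset_delay_s


def fits_watchdog(gap_s, watchdog_s, margin=2.0):
    """True if the watchdog exceeds the estimated gap by the safety factor."""
    return gap_s * margin <= watchdog_s
```

With a 25 ms loop, three retries at an assumed 50 ms reply timeout, and a 1 s reset delay, the estimate is 1.175 s, far beyond a 100 ms watchdog. Without the reset delay it drops to 0.175 s, which fits a 1 s watchdog with margin.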

The program usually runs for a few days without issues. But every few days there is a shutdown.

altenbach · 04-11-2020

Thanks for the details. We don't work well with pictures, though; there are too many ambiguities. An actual VI would be significantly more helpful.

Do you clear timeout errors somewhere?
Does the remote device require a specific local port (62140) or will it reply to whatever the source port was?
Is your max size (548) based on some logic or just a default? While this prevents fragmentation on very low MTU links (rare these days!), you probably are way above that.
What exactly defines your loop rate?
Why do you have all that duplicate code?
Shouldn't your outer loop be a FOR loop (with conditional terminal)?
Why don't you build the blue array before the loop? (Well, the compiler will most likely fold it, but still...)
The array sizes never change, so why do you need to constantly measure the sizes inside the loop?
...

FrankHS · 04-11-2020

I don't actually need the 100 ms watchdog; I just changed it to 1 second. What do you think is the minimum I can set the watchdog to when using Ethernet with nothing on the cable except the controlling computer and the IO rack? The communication uses a CRC checksum to verify that the data is correct. See my last message for the block diagram and more info.
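The thread does not say which CRC the rack uses or where it sits in the reply. As an illustration only, assuming CRC-16/MODBUS appended low byte first (a hypothetical frame layout), a reply check could look like this in Python. A CRC only detects corruption; it cannot make a lost datagram arrive, so it does not help with the watchdog by itself:

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def reply_is_valid(reply: bytes) -> bool:
    """Assumed layout: payload followed by a 2-byte CRC, low byte first."""
    if len(reply) < 3:
        return False
    expected = int.from_bytes(reply[-2:], "little")
    return crc16_modbus(reply[:-2]) == expected
```

The standard check value for CRC-16/MODBUS over the ASCII string "123456789" is 0x4B37, which is a quick way to confirm an implementation before comparing against the device's documentation.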

The device doesn't really shut down. The IO resets and continues. If there is an IO error that can't be corrected by retrying, the process is stopped until an operator resets it.

FrankHS · 04-12-2020

Do you clear timeout errors somewhere?
Originally the port was opened on each iteration, which cleared the error. That changed in this version. Now, if there is an error, the True case runs and reinitializes the port.
Does the remote device require a specific local port (62140) or will it reply to whatever the source port was?
I’m not sure. I think it was in some documentation I saw. I never tried any other port.
Is your max size (548) based on some logic or just a default? While this prevents fragmentation on very low MTU links (rare these days!), you probably are way above that.
This was the default from the LabVIEW UDP Read VI. From the LabVIEW help: "max size is the maximum number of bytes to read. The default is 548. (Windows) If you wire a value other than 548 to this input, Windows might return an error because the function cannot read fewer bytes than are in a packet." Isn't the maximum size of a packet 96 bytes? The packets I send are much smaller, 21 or 35 bytes.
What exactly defines your loop rate?
The main program runs 4 parallel loops: UI, Control, DAQ, and one for user pop-ups. The IO runs in the DAQ loop. After the IO function runs, the IO is decoded and made available. After that, a timing VI, "wait for loop time", delays until 25 ms is reached. I probably could have accomplished the same thing by dropping a 25 ms Wait into the loop. On Saturday I added code that tells me every time the loop time goes above 50 ms.
Why do you have all that duplicate code?
Do you mean the write/read sequences? There are three of these; each one writes to a module on the IO rack. First we generate the hex string to write, and then write it to the Ethernet port. When we read, there is a reply from the rack controller. It is always a 96-byte string that contains the state of all inputs and outputs. If my string is corrupted, the controller won't send the 96-byte string.
Shouldn't your outer loop be a FOR loop (with conditional terminal)?
Good idea. Done.
Why don't you build the blue array before the loop? (Well, the compiler will most likely fold it, but still...)
Good idea. Done.
The array sizes never change, so why do you need to constantly measure the sizes inside the loop?
Good idea. Done. Also, if the array sizes are not correct, the data should not be sent at all and a shutdown should happen.
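The three duplicated write/read sequences and the size check can be combined: one routine applied to a table of modules, with all sizes validated before anything is sent. A minimal Python sketch; the module table (slots 1 to 3 with 4, 4, and 2 payload bytes), `check_sizes`, `service_modules`, and the `exchange_one` callback are all hypothetical placeholders, not the rack's real layout:

```python
from typing import Callable, List, Sequence

# Hypothetical module table: (module slot, payload bytes it expects).
MODULES = ((1, 4), (2, 4), (3, 2))


def check_sizes(payloads: Sequence[bytes]) -> None:
    """Raise before anything is sent if a payload has the wrong length."""
    if len(payloads) != len(MODULES):
        raise ValueError(f"expected {len(MODULES)} payloads, got {len(payloads)}")
    for (slot, size), payload in zip(MODULES, payloads):
        if len(payload) != size:
            raise ValueError(f"module {slot}: expected {size} bytes, got {len(payload)}")


def service_modules(payloads: Sequence[bytes],
                    exchange_one: Callable[[int, bytes], bytes]) -> List[bytes]:
    """One write/read routine applied to every module instead of three copies."""
    check_sizes(payloads)  # nothing goes on the wire if any size is wrong
    return [exchange_one(slot, payload)
            for (slot, _size), payload in zip(MODULES, payloads)]
```

Because the table is constant, it can be built once outside the loop, matching the suggestion above.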

Originally the loop only retried twice.

Do you know of any rule of thumb for how many seconds the watchdog setting should be?

rolfk · 04-12-2020

@FrankHS wrote:

Do you know of any rule of thumb for how many seconds the watchdog setting should be?

As with most things, the answer is: it depends!

1s is usually enough to let a Windows process respond in time but there is absolutely no guarantee. Windows is not a real-time operating system and therefore you can make absolutely no guarantees about how long a process might get suspended because Windows decides to start a virus scan or something, or the network communication might get suspended because Windows decides the network stack or network interface needs to be reset.

The solution is usually not to depend on the Windows application to never fail to send a message within a specific time, but to configure/program the device to not fall into an unrecoverable state after it detected that there was a certain amount of inactivity. Your device should at most change critical outputs into a failsafe state and then return to an idle state that the remote application can detect, in order to reconfigure the device and return to full operation if that is safe to do.

So you may have to modify your Windows program to periodically query whether the device has gone into failsafe shutdown, and then reconfigure it to return to normal operation. That is robust programming, rather than trying to ensure that the system never fails to send messages to the device within a certain amount of time. Of course, all of this may be different if your device controls something that could cause physical harm to a person or cause your factory to blow up. Then other standards apply, and you should definitely consult an expert in these matters to review your system and probably seek certification for it.
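The query-and-reconfigure approach can be sketched as one supervision step called periodically from the host program. A minimal Python sketch; `RackState` and the three callbacks (`query_state`, `reconfigure`, `operator_ok`) are hypothetical hooks, since the thread does not describe how the rack reports its state. The `operator_ok` gate reflects the manual reset described earlier for errors that retrying cannot fix:

```python
from enum import Enum, auto


class RackState(Enum):
    RUNNING = auto()
    FAILSAFE = auto()     # device timed out and put outputs into a safe state
    UNREACHABLE = auto()  # no reply at all


def supervise_once(query_state, reconfigure, operator_ok) -> RackState:
    """One supervision step; all three callables are hypothetical hooks.

    query_state() -> RackState asks the device what it is doing.
    reconfigure() re-sends whatever the device needs for normal operation.
    operator_ok() -> bool says whether automatic recovery is allowed.
    """
    state = query_state()
    if state is RackState.FAILSAFE and operator_ok():
        reconfigure()
        state = query_state()  # confirm the device actually recovered
    return state
```

Re-querying after `reconfigure()` means the caller sees the device's actual state rather than assuming recovery succeeded.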

