TCP connection break / re-connect issue

CoastalMaineBird · ‎12-01-2014

Yes, and the fact that between the time you check the refnum and the time your case structure executes, the refnum could have gone invalid.

That's true, but I don't see it as relevant to this problem. If that happened, I would then try to read from the invalid refnum, immediately return an error, set the refnum to NaN and then attempt to reconnect.
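As a rough text-language analogue of that loop (Python sockets standing in for the LabVIEW TCP VIs, with `None` playing the role of Not-a-Refnum), the recovery path could be sketched like this. The `Link` class, its `poll` method and the timeout value are hypothetical names and numbers chosen for illustration, not anything from the actual host code.

```python
import socket


class Link:
    """Holds a connected socket, or None (standing in for Not-a-Refnum)."""

    def __init__(self, host, port):
        self.addr = (host, port)
        self.sock = None

    def poll(self, nbytes=1024, timeout_s=0.1):
        """One loop iteration: reconnect if needed, otherwise try to read.

        Returns the received bytes, or None if nothing usable arrived.
        """
        if self.sock is None:
            try:
                # The timeout also applies to later recv() calls on this socket.
                self.sock = socket.create_connection(self.addr, timeout=timeout_s)
            except OSError:
                self.sock = None
            return None
        try:
            data = self.sock.recv(nbytes)
        except socket.timeout:
            return None          # no data yet; connection still presumed alive
        except OSError:
            data = b""           # treat a reset like a closed connection
        if not data:
            # Read failed or peer closed: invalidate the handle so the
            # next iteration attempts a reconnect.
            self.sock.close()
            self.sock = None
            return None
        return data
```

Note that the reconnect in this sketch only proves the remote listener accepted the TCP handshake; it says nothing about whether the application behind it is still running.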

The only "race" problem is apparently in the OS on the other end: whether the ABORT gets thru the system and turns off the listener before the host recognizes the connection died and attempts to reconnect.

Steve Bird
Culverson Software - Elegant software that is a pleasure to use.
Culverson.com

Blog for (mostly LabVIEW) programmers: Tips And Tricks

Oligarlicky · ‎12-01-2014

@CoastalMaineBird wrote:

Not a refnum just tells you if you have a valid refnum, it doesn't tell you you have a valid connection.

Sure, I understand that. What I didn't know was that the test for Not-a-RefNum is NOT a simple test for zero, but a search thru some table somewhere.

It doesn't pertain to my original problem but it's interesting to know, nonetheless.

It's pertinent because you are using the Not A Refnum to tell you connect gave you a usable connection. You should do a read after the connect and wire the error from the read to the case structure that fires off the connected event. Ideally, in the server, you'd fire off a "Welcome" message upon connection so that the client would know the connection is usable.

CoastalMaineBird · ‎12-02-2014

You should do a read after the connect and wire the error from the read to the case structure that fires off the connected event.

I tried exactly that, and nothing changed, nor would I expect it to.

I put a zero-byte read (which still checks the health of the connection) immediately after the connect, and check the ERROR output. No behavioral difference.

Ideally, in the server, you'd fire off a "Welcome" message upon connection so that the client would know the connection is usable.

If I did that, and then only claim connection when the welcome message was received, then it would avoid the issue, because there's no way the other end sends a message after being aborted. However, my architecture is such that the host sends a command, and the PXI sends ONLY in response.

However, the first thing I send is a VERSION? command. I could postpone connection events until after the response is received, I guess. But then I would have to SEND into a connection that I don't know about yet (because it hasn't CONNECTed, officially, yet).

No, it still seems like the best thing is to wait some fixed time. I don't know EXACTLY how long, but it's not important to minimize it.

rolfk · ‎12-02-2014

@CoastalMaineBird wrote:

You should do a read after the connect and wire the error from the read to the case structure that fires off the connected event.

I tried exactly that, and nothing changed, nor would I expect it to.

I put a zero-byte read (which still checks the health of the connection) immediately after the connect, and check the ERROR output. No behavioral difference.

The zero-byte read at best checks the refnum but doesn't involve any socket transaction that would physically check the actual connection. Even if it did, the TCP SYN and ACK could still be handled by a lingering socket on the remote side before it gets properly garbage collected after the process abort.
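A small illustration of why a successful connect proves so little, written in Python rather than LabVIEW: the operating system completes the TCP handshake for any listening socket, even if no application code ever calls accept on it. The function name below is made up for the example, and the loopback listener only stands in for a listener whose owning process is no longer servicing it.

```python
import socket


def connect_without_accept():
    """Return True if a client can connect to a listener that never accepts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)                   # listening, but accept() is never called
    port = listener.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(2.0)
    try:
        # The OS network stack answers the handshake on its own.
        client.connect(("127.0.0.1", port))
        return True
    except OSError:
        return False
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print("connect succeeded:", connect_without_accept())
```

So a connect that returns without error only shows that some listening socket still exists on the remote side, not that user code is behind it.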

Rolf Kalbermatter
My Blog

CoastalMaineBird · ‎12-02-2014

Even if it did, the TCP SYN and ACK could still be handled by a lingering socket on the remote side before it gets properly garbage collected after the process abort.

In which case, the proper thing to do is still to wait for the cleanup crew, isn't it?

The more I think about this, everything I've seen seems to point to the same thing - the ABORT process has TWO things to do:

A... Kill the existing connection.

B... Kill the TCP Listener

And it cannot do them exactly simultaneously.

If it did B before A, then the symptoms I see would not appear.

If it does A before B, then there is a window of opportunity for the host to connect again, and that is seemingly what is happening.

rolfk · ‎12-02-2014

@CoastalMaineBird wrote:

Even if it did, the TCP SYN and ACK could still be handled by a lingering socket on the remote side before it gets properly garbage collected after the process abort.

In which case, the proper thing to do is still to wait for the cleanup crew, isn't it?

The more I think about this, everything I've seen seems to point to the same thing - the ABORT process has TWO things to do:

A... Kill the existing connection.

B... Kill the TCP Listener

And it cannot do them exactly simultaneously.

If it did B before A, then the symptoms I see would not appear.

If it does A before B, then there is a window of opportunity for the host to connect again, and that is seemingly what is happening.

It can't do them simultaneously and has to do umpteen other things too. Even if it did them simultaneously, it wouldn't fix the problem. Shutting down a socket simply tells the socket manager to do this at its earliest convenience. That delay is probably quite a bit shorter on a realtime system than on Windows, where it can take tens of seconds before the socket is fully terminated, but it is likely not 0 s either.

The problem with your approach of waiting for a yet-to-be-determined period is that this period may change suddenly based on seemingly unrelated changes to your system setup. And it may change in ways that even doubling or tripling your determined "time to be safe" could fail to catch.

CoastalMaineBird · ‎12-02-2014

The problem with your approach of waiting for a yet-to-be-determined period is that this period may change suddenly based on seemingly unrelated changes to your system setup. And it may change in ways that even doubling or tripling your determined "time to be safe" could fail to catch.

Well, I understand that, which is why I asked "how long do I wait" in the original question - it might depend on PXI 8196 vs. PXI 8102, maybe LV2013 vs. LV2014, maybe Mondays vs. Saturdays, I don't know.

I don't like the uncertainty of that, which is why I posed the question.

So, what is a safe way? It does seem that a "welcome" message would be bulletproof - if the PXI is aborted, then even if the PXI listener is still open, it cannot respond with a welcome. It seems certain that the user code would be FIRST to be killed in the ABORT sequence.

That means that the PXI has to send the message without being asked for it - the host cannot ask for it because the connection is not established yet.

And I should receive it on the host immediately after connecting - I can't go thru the usual way, because I haven't declared the connection valid.

And once the connection is established, everything else is the same.

I'll give that a spin.

CoastalMaineBird · ‎12-02-2014

OK, the WELCOME message does seem bulletproof.

I changed the PXI code to send a VERSION message immediately upon connection (without being asked).

I changed the HOST code to connect and then wait up to 100 ms for a message. If I get a VERSION message, I declare the connection good and proceed; if not, I declare it dead and proceed.
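For readers who want the same idea in a text language, here is a minimal Python sketch of the host-side check, assuming the greeting is a short line beginning with `VERSION`. The exact message format, the function names, the connect timeout and the little local test listener are all assumptions made for this example; the real implementation is LabVIEW on both ends.

```python
import socket
import threading

GREETING = b"VERSION 1.0\n"   # hypothetical greeting; the real format is not given


def connect_with_welcome(host, port, wait_s=0.1, connect_timeout_s=1.0):
    """Connect, then require an unsolicited greeting within wait_s seconds.

    Returns the connected socket if the greeting arrives, otherwise None.
    """
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout_s)
    except OSError:
        return None
    try:
        sock.settimeout(wait_s)
        # Simplification: assume the short greeting arrives in one recv().
        data = sock.recv(64)
    except OSError:               # timeout or reset: no greeting received
        data = b""
    if data.startswith(b"VERSION"):
        sock.settimeout(None)     # back to blocking mode for normal traffic
        return sock
    sock.close()                  # connected, but nobody is home
    return None


def start_listener(greet):
    """Local stand-in for the remote end; returns (listener, port).

    With greet=False the socket listens but never accepts, which mimics
    a listener left behind after the application was aborted.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    def serve():
        conn, _ = listener.accept()
        with conn:
            conn.sendall(GREETING)   # sent without being asked

    if greet:
        threading.Thread(target=serve, daemon=True).start()
    return listener, listener.getsockname()[1]
```

Calling `connect_with_welcome("127.0.0.1", port)` against `start_listener(greet=True)` should hand back a socket, while the `greet=False` listener still accepts the TCP connect but yields `None`, because no application code is there to send the greeting.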

Thanks to Oligarlicky and RolfK for pushing me toward thinking the right way.
