TCP Connection issues when using Actor Framework

StevenHowell · ‎01-24-2022

I have a really weird one here that I am beating my head against the wall on.

The project is written in Actor Framework and is split into two separate applications: a server side and a client side.

In the application the customer has a need for the client to be able to connect to a server at any one of their sites and retrieve data.

At present I am utilizing the STM toolkit for TCP communications as I like the API and a lot of the legwork for message transfer has been done.

Here is where it gets weird. The client application can connect to a server at a remote site over the network and everything works fine. When the same client attempts to connect to the server locally on site, the connection times out.

I have tested this while using netstat on the server and can see when using the example STM VIs or even just the plain TCP example VIs in LabVIEW that the connection is made and data transfer works.

When I try to run it in the application, no connection attempt is ever seen on the server in netstat.

My architecture is as follows: I have a TCP server actor whose only job is to listen for connections. When a connection is made, the server spins up a TCP handler actor that handles the new TCP connection and data transfer.
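
For readers who think in text-based code, the listener/handler split described above can be sketched in Python. This is only an analogy for the LabVIEW actors, not the poster's code: `start_server`, `serve`, `handle_connection`, the echo behaviour, and the 0.2 s accept timeout are all hypothetical choices made for illustration.

```python
import socket
import threading


def handle_connection(conn):
    """Per-connection handler (stand-in for the TCP handler actor)."""
    with conn:  # the handler owns the connection and closes it when done
        while True:
            data = conn.recv(4096)
            if not data:  # peer closed the connection
                break
            conn.sendall(data)  # echo, as a placeholder for real message handling


def serve(listener, stop_event):
    """Accept loop (stand-in for the TCP server actor): it only listens."""
    listener.settimeout(0.2)  # short timeout so the loop can notice stop_event
    while not stop_event.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue  # nobody connected in this interval; wait again on the same listener
        conn.settimeout(None)  # the handler uses blocking reads
        threading.Thread(target=handle_connection, args=(conn,), daemon=True).start()


def start_server(host="127.0.0.1", port=0):
    """Create the listener once and run the accept loop in a background thread."""
    listener = socket.create_server((host, port))
    stop_event = threading.Event()
    thread = threading.Thread(target=serve, args=(listener, stop_event), daemon=True)
    thread.start()
    return listener, stop_event, thread
```

The point of the sketch is the ownership rule: the accept loop never reads from or closes a connection after handing it off, so each handler's connection lives exactly as long as that handler does.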

The connection never works while using Actor Framework, whether I set a timeout or have it wait forever.

Could there be something lost in the transfer from the server to the handler spinning up? Is it a timing issue where the server expects to see data on the connection and since there is a short pause while the handler spins up, the server is dropping the connection?
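
On the timing question: with ordinary operating-system sockets, a short pause between a client connecting and the server first reading does not by itself lose data, because the OS completes the handshake and buffers incoming bytes until something reads them. A minimal Python sketch of that behaviour on the loopback interface follows. The function name and the delay value are made up for illustration, and the sketch says nothing about what STM or the actors do on top of the socket.

```python
import socket
import time


def late_accept_demo(delay_s=0.3):
    """Client connects and sends before the server accepts; the data still arrives."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: the OS picks a free port
    port = listener.getsockname()[1]
    try:
        client = socket.create_connection(("127.0.0.1", port), timeout=2)
        client.sendall(b"hello")   # sent before the server has called accept()
        time.sleep(delay_s)        # simulated pause while a handler spins up
        conn, _addr = listener.accept()
        conn.settimeout(2)
        with conn:
            data = conn.recv(16)   # reads the bytes the OS buffered in the meantime
        client.close()
        return data
    finally:
        listener.close()
```

If a pause like this does break a connection in practice, the cause is more likely something closing the connection handle than the network stack discarding early data.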

I am not a network expert here and this is driving me nuts.

Any help or suggestions would be greatly appreciated.

Steven Howell
Controls and Instrumentation Engineer
Jacobs Technologies
NASA Johnson Space Center

justACS · ‎01-24-2022

Is your server only offering one connection? If so, you might check this out:

Network Endpoint Actors

The latest version is on the LabVIEW Tools Network, and you can download it via VI Package Manager.

I'm aware that I am being the guy who answers your question with "do something different", and I normally hate that guy. But I wrote the Network Endpoint Actor for exactly this scenario, and folks have been using them for years.

Now, if you want to have several connections served off the same port, you'll need to spin up a handler like you're doing. In that case, I'd have to look at your code to see where the issue may lie.

StevenHowell · ‎01-24-2022

Hey Allen,

This will be a 1:N connection, multiple clients should be able to connect to this server concurrently.

The purpose is for the client application to connect to the server, alter the scaling and other sensor configuration information remotely, and then send a message to the server when ready to reset the DAQ task and load the new settings.

The way I have it set up is to use the Create Listener VI with the port number (one port per site) and then the Wait on Listener. I take the listener out after a connection attempt and write it back to the actor so it will use the same listener reference.

I have tried the wait both with -1 (wait forever) and with various timeouts, in which case the actor sends itself a message every so often to listen for a connection and start the process again.


drjdpowell · ‎01-25-2022

Not AF, but the remote communication in my Messenger Library is very similar to what you are doing: a multi-client TCP Server using Async launched connection-handlers. I have never had any issue like you describe, but one thing I would suggest checking is whether you mistakenly specify a specific Port number and use the same Port for both client and server. That would work on different machines but not on the same machine, I think, as only one EXE can bind to a specific Port.
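
The port-binding point can be checked outside LabVIEW with a few lines of Python. This is a minimal sketch assuming default socket options (no address or port reuse flags); `second_bind_fails` is a hypothetical helper name.

```python
import socket


def second_bind_fails(port=0):
    """Bind a listener, then try to bind a second socket to the same address and port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", port))  # port 0 lets the OS choose a free port
        first.listen()
        taken_port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", taken_port))  # same address and port again
        except OSError:
            return True   # the OS refused the duplicate bind
        return False
    finally:
        first.close()
        second.close()
```

On two different machines each socket would bind on its own host, which is why this kind of mistake only shows up when client and server share a machine.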

One other suggestion, whenever things just don't work, is to check that your error handling is up to scratch, and that perhaps there is an error message that you are not recording that would tell you exactly what is wrong.

StevenHowell · ‎01-25-2022

drjdpowell,

Thanks for your reply. I have downloaded and I am looking at your toolset.

I do have error handling on both ends. I get an error 56 immediately on the client side even though I set a timeout of a second or more. I have confirmed this in the source code as well as timestamps from the error logger.

On the server side, the only error I see periodically is an error 1, which I have seen before. Typically, this has been indicative of a reference being called that is no longer valid.

This would lead me to believe that the call is heard by the listener and that it attempts to open the connection, but when the actor tries to use it, the reference is no longer valid.

However, when I run netstat on the server, I do not see that port number being called.

I have set up a specific port for each site and in a range that is not used by the customer for anything else.

Using 3370-3373. Is that port range used by something else in LabVIEW? I know that the TCP/IP remote stuff is in the 336x range.

I also tried a higher range of port numbers and got the same results. In testing I attempted opening the port by specific port number and by service name, and neither of these works.

The client side does not use the same port number for the call.

One thing I don't recall trying is assigning 0 to the port on the listener side and then calling by service name.
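
As background on the port 0 idea: at the OS socket level, binding to port 0 asks the system to pick any free port, and the listener can then report which one it got. The Python sketch below shows only that part. How a LabVIEW service name would publish the assigned port to clients is not shown, and `listen_on_any_port` is a hypothetical name.

```python
import socket


def listen_on_any_port(host="127.0.0.1"):
    """Bind to port 0 and report which port the OS actually assigned."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # 0 means "pick any free port"
    listener.listen()
    assigned_port = listener.getsockname()[1]
    return listener, assigned_port
```

A client still needs some way to learn the assigned number, which is the role a service-name lookup would play.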


StevenHowell · ‎01-25-2022

I just found an interesting piece of information.

TCP Open Connection Details

When wiring an unused IP address, you may receive an error stating the network operation exceeded the user-specified or system time limit. This error occurs before the default timeout of 60000 ms has occurred. To correct this error, wire an IP address that is running and listening on the port you are trying to use.

This is exactly the behavior I see on the client side.

The customer is using DHCP for the IP addresses and as such I am using a DNS name of the server on the network.

What is weird is that, like I said, the TCP examples in LabVIEW work, and I am using the same server name.
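
One way to separate name-resolution problems from connection problems, and to see how quickly a failure comes back relative to the requested timeout, is a small probe run from the client machine. The following is a Python sketch, not a LabVIEW equivalent: `probe` and its result keys are hypothetical, and the error text Python reports will differ from LabVIEW's error codes.

```python
import socket
import time


def probe(host, port, timeout_s=5.0):
    """Resolve host, attempt one TCP connection, and report what happened."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        # The name could not be resolved, so no connection was attempted.
        return {"stage": "resolve", "ok": False, "error": str(exc),
                "elapsed_s": time.monotonic() - start}
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            outcome = {"stage": "connect", "ok": True, "error": None}
    except OSError as exc:  # covers refusals as well as timeouts
        outcome = {"stage": "connect", "ok": False, "error": repr(exc)}
    outcome["addresses"] = addresses
    outcome["elapsed_s"] = time.monotonic() - start
    return outcome
```

A failure whose `elapsed_s` is far below `timeout_s` suggests the target answered and rejected the attempt (for example, nothing listening on that port) rather than the request going unanswered.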

It almost sounds like the server side is closing the connection immediately after the listener opens it as the handoff to the nested actor is occurring with the newly opened TCP reference.

What network settings could that be on the server side causing that behavior?


Taggart · ‎01-25-2022

Is it possible the server is not listening, or that you have the wrong port?

What happens if you connect to it with netcat?

Sam Taggart
CLA, CPI, CTD, LabVIEW Champion
DQMH Trusted Advisor
Read about my thoughts on Software Development at sasworkshops.com/blog

drjdpowell · ‎01-26-2022

Is it possible the actor doing the listening gets an error and shuts down, thus invalidating the new TCP connection before your Connection actor can use it?
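
The hazard described above can be illustrated with a rough Python analogy: a connection handle that is used after whatever created it has already released it. This is only a sketch of the general failure mode, assuming the connection is released when its creator shuts down. It is not a model of Actor Framework internals, and `handler_sees_closed_connection` is a hypothetical name.

```python
import socket


def handler_sees_closed_connection():
    """Simulate a handler using a connection after its creator released it."""
    server_side, client_side = socket.socketpair()  # stand-in for an accepted connection
    try:
        server_side.close()          # creator shuts down and releases what it opened
        try:
            server_side.recv(1024)   # handler arrives late and tries to read
        except OSError:
            return True              # invalid-handle error, loosely like an invalid reference
        return False
    finally:
        client_side.close()
```

Whether the handler wins or loses this kind of race depends on timing, which would fit a fault that appears in one deployment and not another.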

StevenHowell · ‎01-28-2022

Thanks for the reply again, I think that may be exactly what is happening.

I have put some TCP logs in my code and am doing some testing.


StevenHowell · ‎02-04-2022

Thank you everyone for your replies and suggestions. In the end it was indeed a race condition in the code; I finally found it and corrected the issue.

The communication is stable now.

Thanks again.

