Simple TCP Messaging (STM)

joernheit · 11-18-2010

I have a problem with STM Read / TCP Read. I have multiple (e.g. 4) data servers, each providing up to 3 messages per second. Every data server has its own connection, all using the same port 55555. The client application receiving these messages polls through these connections (e.g. every 50 ms). Under some conditions I see that the TCP Read function needs >2000 ms to return, although its timeout is 50 ms. No error is raised when this happens. Can anyone explain this behaviour?

Christian_L · 11-22-2010

I assume this subVI is running in parallel with a bunch of other code, so you cannot really say that the TCP Read function itself is taking more than 2000 ms to return. In the time between the two Tick Count functions, another subVI may be taking control of the processor and increasing the amount of time measured. At the very least you should place the first Tick Count function right in front of the TCP Read node, rather than outside the Case structure.

You can also try placing these three functions (Tick Count/TCP Read/Tick Count) in a subVI and increasing the priority of the subVI to Time Critical to prevent the LV scheduler from switching to another VI during this sequence.

Christian L, CLA
Systems Engineering Manager - Automotive and Transportation
NI - Austin, TX

Stephen_Moore · 01-19-2011

Hi,

I have just installed STM using stm_202_installer.zip on LabVIEW 2010 and carried out a mass compile.

There appears to be a VI missing.

#### Starting Mass Compile: 19 Jan 2011 09:38:22
Directory: "C:\Program Files\National Instruments\LabVIEW 2010\user.lib\STM"
Search failed to find "SUM Read Msg.vi" previously from "<userlib>:\STM\Examples\Distributed Clients\SUM Read Msg.vi"
### Bad VI: "Host Receive IP Address from Client.vi" Path="C:\Program Files\National Instruments\LabVIEW 2010\user.lib\STM\Examples\Distributed Clients\Host SubVIs\Host Receive IP Address from Client.vi"
#### Finished Mass Compile: 19 Jan 2011 09:38:35

The "Host Receive IP Address from Client.vi" appears to be a standalone subVI that is not called by anything.

I have looked into the cab files of the installer and I have been unable to find "SUM Read Msg.vi" in either stm_202_installer.zip or stm_1.0.32.zip.

I am sure that it is not important, but that is not really the point. Reference Design installers should be complete.

Cheers

Stephen

Ben_Engelen · 03-16-2011

Hi there,

I'm using STM and I must say it works great!

Currently I need a C++ program to send commands with data to a LabVIEW application, and I would prefer to do this the STM way.

I have downloaded the VSC++ example, and it already explains a lot.

But (there is always a but) the data in my LabVIEW application is organised in clusters; a typical cluster is attached.

Creating a flatten/unflatten function in C++ for each of the 25 commands will be a lot of work.

Is there a somewhat generic way to flatten a struct in the C++ code, so I can send it via STM to the LabVIEW application and unflatten it there?

Or is there a better way of implementing this?

Thanks,

Ben Engelen

rolfk · 03-22-2011

As long as you have no variable-sized arrays or strings in there, and you make sure to surround the structure definitions with

#pragma pack(1)

typedef struct {
    /* fixed-size members only */
} your_structure1;

.....

#pragma pack()

the structure layout will match LabVIEW's flatten format exactly and you can simply do a memcpy(). With strings and arrays you won't get around writing a real, recursive flatten function.
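
A minimal sketch of the packed-struct approach, under some assumptions: `MotorCommand` and its fields are hypothetical stand-ins for one of the 25 command clusters, and `flatten_pod` / `unflatten_pod` are illustrative helper names, not part of STM. One caveat the sketch does not solve: `memcpy` copies bytes in host order, while LabVIEW flattens numerics big-endian by default, so on a little-endian PC the multi-byte fields would need byte swapping first, or the LabVIEW side would need to unflatten the string as little-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical command cluster: fixed-size members only, no strings or arrays.
#pragma pack(push, 1)
struct MotorCommand {
    std::uint16_t command_id;
    std::int32_t  position;
    double        speed;
    std::uint8_t  enable;  // a LabVIEW Boolean flattens to one byte
};
#pragma pack(pop)

// With 1-byte packing there is no padding: 2 + 4 + 8 + 1 bytes.
static_assert(sizeof(MotorCommand) == 15, "unexpected padding in MotorCommand");

// Copy any packed, trivially copyable struct into a byte string (host byte order).
template <typename T>
std::string flatten_pod(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    std::string bytes(sizeof(T), '\0');
    std::memcpy(&bytes[0], &value, sizeof(T));
    return bytes;
}

// Reverse operation; returns false if the payload size does not match the struct.
template <typename T>
bool unflatten_pod(const std::string& bytes, T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    if (bytes.size() != sizeof(T)) return false;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return true;
}
```

Because the helpers are templates, the same two functions would serve every command struct that meets the fixed-size condition; only clusters with strings or arrays would still need a hand-written flatten routine.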

Rolf Kalbermatter

mtru · 04-28-2011

This question may be for another forum as it may be associated with TCP instead of just STM, but hopefully somebody here can point me in the right direction.

I use STM to communicate between multiple EXE files on the same computer using localhost addressing. Each EXE has a different port number to either write to (client) or listen to (server), defined in the EXE's INI file. Some are 1-to-1 server/client connections, some are many-to-1 server/client connections (but no more than 4 clients to any one server). An example file is below:

_________________________________________________________

[N6700B_SERVER2]
DebugServerEnabled = False
DebugServerWaitOnLaunch = False
HideRootWindow = True
server.app.propertiesEnabled = True
server.ole.enabled = True
server.tcp.serviceName = "My Computer/VI Server"
server.vi.propertiesEnabled = True
useTaskBar = False
WebServer.TcpAccess = "c+*"
WebServer.ViAccess = "+*"

[N6700B_SERVER.vi]
AutoMeasure = "TRUE"
AutoMeasureDelay(ms) = "50"
CHAN OFF = "FALSE"
CHAN ON = "FALSE"
CHANNEL # = "1"
Close Ref = "FALSE"
EventHandler_Timeout_ms = "1"
Exit Program = "TRUE"
Frame Channels (1:4) = "1:4"
HideFrontPanel = "TRUE"
Init = "FALSE"
Listen_To_Port = "35402"
MEAS_VOLT = "FALSE"
MEAS_VOLT&CURR = "FALSE"
PS_CL = "FALSE"
PS_VL = "FALSE"
Queue_Timeout_ms = "1"
Reset = "FALSE"
State = "CHARGE"
STM_Read_PortListen_Timeout_ms = "1"
Tab Control = "Setup"
Value_IN = "0.000000"
VISA Resource Name = "TCPIP0::169.254.67.71::inst0::INSTR"
WatchdogTimeout = "30000"
ConsecutiveErrorsAllowed = "999"
net address = "localhost"

__________________________________________________________

The EXEs run in "clusters" of 14 executables to accomplish their combined function. Everything works fine, and connections are made and maintained with no issues, when the first 3 clusters of EXEs are running (42 EXEs). I have a total of 5 clusters on the system (70 EXEs). However, when I attempt to run a 4th cluster, the programs in the fourth set INITIALLY connect and start running (I can see data being collected and moved across TCP connections), but within a few seconds the connections get terminated by Error 66, which shuts everything down. I do not have code to attempt reconnects, as I doubt reconnecting will help in this situation: it appears to be some type of TCP connection or stack limit that I don't have visibility into or don't understand. Also, I can start up different clusters out of the 5 in a different order, and they all work fine together as long as no more than 3 of the 5 clusters are running at the same time.

I am running Windows 7 64-bit on a dual-processor quad-core Xeon PC with 12 GB of RAM, and I do not see a hardware resource limit being reached while monitoring system resources. Are there specific LabVIEW INI settings for TCP control that I am not setting correctly to avoid possible collisions or limits here?

ryank · 04-29-2011

The error you are getting doesn't indicate a resource conflict. There is a limit to the number of TCP sockets, but if you reach that limit, TCP Create Listener returns error 61. Also, it's rare to reach this limit on Windows systems, as I think it's 64k sockets for most implementations. A couple of suggestions:

1. You can use netstat (from a command prompt) to check on the status of your connections. With so many connections, it's definitely not impossible that you are running out of sockets if you are leaving them in TIME_WAIT, but I doubt this is the case.

2. Double-check that error 66 is the only error you receive.

3. My best guess as to what is happening is that you are getting a timeout somewhere as you increase the number of connections, and then closing the timed-out connection because of the error. Ensure that you do not close connections on a timeout anywhere in your code (you should generally retry on timeout to avoid orphaning sockets and leaving them in TIME_WAIT).
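
The retry-on-timeout advice in point 3 can be sketched as a small decision table. This is only an illustration in C++ of logic that would really live on the LabVIEW block diagram; `ReadAction` and `action_for_read_error` are hypothetical names, and the sketch assumes that LabVIEW error 56 is the network timeout code and error 66 the "connection closed by the peer" code.

```cpp
#include <cassert>

// LabVIEW error codes assumed by this sketch.
constexpr int kNoError      = 0;
constexpr int kTimeout      = 56;  // network operation timed out
constexpr int kClosedByPeer = 66;  // connection closed by the peer

enum class ReadAction {
    UseData,              // read succeeded
    RetrySameConnection,  // keep the socket open and read again
    CloseAndReport        // the connection is really gone
};

// Decide what to do with the error code returned by a TCP read.
ReadAction action_for_read_error(int code) {
    switch (code) {
        case kNoError:
            return ReadAction::UseData;
        case kTimeout:
            // A timeout only means "nothing arrived yet"; closing here
            // would orphan the socket and leave it in TIME_WAIT.
            return ReadAction::RetrySameConnection;
        default:
            // Error 66 and anything unexpected: close and report.
            return ReadAction::CloseAndReport;
    }
}
```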

Ryan King, CLA
Senior Systems Engineer, Industrial Embedded - National Instruments

mtru · 04-29-2011

Thank you for the suggestions. Some of the things I have done relate somewhat to your suggestions. Specifically:

1. I have used netstat before and also a utility called TCPView to monitor connections. Connections get established fine, but then drop off in LabVIEW but not in TCPView right away. I'm thinking LabVIEW believes that a connection is lost, but is actually just "not available" according to the OS...still looking into this.

2. Error 66 is for sure what is shutting down my program (specifically, the multi-client server), which in turn shuts down every other EXE in the cluster. I get other errors after this, but they are all consequences of the connection loss that starts with Error 66.

3. I've conferred with other software engineers here at my company who use TCP, and they agree that a timeout is very likely the root cause. However, all my "TCP Read" code retries many times after each timeout error to ensure a packet is not lost if it can be captured in the allotted time (within 50 ms). If the expected packet is not retrieved by then, it "gives up" and moves on to the next process. This brings up a good point about my STM driver that reads/writes TCP packets. The way it works is this:

a. A "TCP Write" (using STM) sends a packet along with a unique ID number (8-digit random number) from a client to the server or vice versa (it works the same both ways).

b. A "TCP Read" on the other end runs constantly, looking for incoming messages. It takes the packet, processes it, and sends a response packet back with the same ID number (usually an "ACK" with data).

c. The original sender goes to "TCP Read" and reads the incoming packet. If it has the wrong ID number, the program throws it out and tries again and again until it gets the packet with the correct ID OR times out after ~50 ms.

Now...if the "TCP Read" times out or gets its data, the system then starts sending another "TCP Write" request for the next communication. If there was a timeout before this, the next "TCP Read" will most likely get two messages back...the one that was late (and is now ignored) and the one that has the correct ID.
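
The reply-matching logic of steps a to c can be sketched as follows. This is an illustration only, not the STM API: `Message`, `ReadFn`, and `wait_for_reply` are hypothetical names, and `read_one` stands in for a single read attempt with its own short timeout. In a real program that read would block while waiting, so the loop would not spin.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>

struct Message {
    std::uint32_t id;       // ID copied from the request into the reply
    std::string   payload;  // e.g. "ACK" plus data
};

// Hypothetical read hook: fills `out` and returns true when a message arrived,
// returns false when this single read attempt timed out.
using ReadFn = std::function<bool(Message& out)>;

// Read until the reply with `expected_id` arrives or the time budget is used up.
bool wait_for_reply(const ReadFn& read_one, std::uint32_t expected_id,
                    std::chrono::milliseconds budget, Message& reply) {
    const auto deadline = std::chrono::steady_clock::now() + budget;
    while (std::chrono::steady_clock::now() < deadline) {
        Message candidate{};
        if (!read_one(candidate)) {
            continue;  // this attempt timed out: retry, keep the connection open
        }
        if (candidate.id == expected_id) {
            reply = candidate;
            return true;
        }
        // Stale reply to an earlier request that timed out: discard and read on.
    }
    return false;  // overall budget exhausted, caller moves on
}
```

With this shape, a late reply left over from a timed-out request is dropped on the next call instead of being mistaken for the current answer, and a timeout never closes the connection.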

Is there a better way to send/receive TCP messages using STM?

Also, even though I have a monster system with 8 cores, I am now benchmarking the system with Performance Monitor and seeing the processor hit 100% just before the "crash". I am now adjusting loop times and, at the suggestion of other engineers, adding a relatively long 25 ms delay before each "TCP Write" call to slow down traffic (though I am not sure this is going to help).

One last question...does using the TIMEOUT function in TCP API in LabVIEW take up a lot of the CPU?

ryank · 04-29-2011

It sounds like you are doing a good job of discarding the errors on the read side. How are you handling the timeout on the TCP Write side?

That's a reasonable way of sending TCP data. If you are already including packet numbers, some developers prefer to just use UDP and implement their own acknowledgement.

The timeout on the TCP functions waits for an interrupt. With normal settings it should not poll and affect the processor (especially on Windows). You can of course check this easily by watching the Task Manager while running a TCP Read.

Regards,

Ryan King

mtru · 05-02-2011

I had not thought about timeouts on the TCP Write side of the equation. I did not even know for sure that a variable was available for that in STM. I checked my code and I see a cluster labeled (Options) with a boolean and a numeric for the TCP Write timeout (1000 default). I do not change this so suspect it is 1 second by default. What is the boolean in the cluster for? It does not appear to be used at all. Do you have a recommendation for handling TCP Write timeouts?

Since my last post, I have gone down and back a couple of rabbit holes but ultimately discovered the problem is that I'm starving the processors during EXE initializations and while doing file writes to the HD (up to 54 data files can be written to by the system at any given time). I've added a buffer to the data write portion of the code so that it will only write to the hard drive every X thousand characters (string) and that has significantly improved performance. I can almost run all of my test banks (4 banks for 4 test heads and 1 bank of two test heads), but still am pushing the 8 cores over 95% at times during EXE startups.
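
The buffering idea (touch the disk only once enough characters have accumulated) can be sketched like this. It is shown in C++ purely as an illustration of logic that the post implements in LabVIEW; `BufferedLog`, its threshold parameter, and the file name in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Collects text in memory and writes it out only when the pending
// buffer reaches a size threshold.
class BufferedLog {
public:
    // Opens the file in the default truncate mode for simplicity;
    // pass std::ios::app to std::ofstream to extend an existing file instead.
    BufferedLog(const std::string& path, std::size_t flush_threshold)
        : out_(path), threshold_(flush_threshold) {}

    ~BufferedLog() { flush(); }  // do not lose the tail of the data

    void append(const std::string& text) {
        pending_ += text;
        if (pending_.size() >= threshold_) flush();
    }

    void flush() {
        if (pending_.empty()) return;
        out_ << pending_;
        out_.flush();
        pending_.clear();
        ++flush_count_;
    }

    std::size_t flush_count() const { return flush_count_; }
    std::size_t pending_size() const { return pending_.size(); }

private:
    std::ofstream out_;
    std::string   pending_;
    std::size_t   threshold_;
    std::size_t   flush_count_ = 0;
};
```

For example, `BufferedLog log("head1_data.txt", 4000);` followed by many small `log.append(...)` calls would hit the disk roughly once per 4000 characters rather than once per sample. The trade-off is that up to one buffer of data can be lost if the process dies before a flush.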

Looking at three options now to reduce CPU usage: 1) Add time delays to not allow multiple test heads to start up at same time within the same bank, 2) Stagger startups between banks with some type of supervisory control program (the test banks currently run completely independently of each other), 3) Look into optimizing code to lessen processor requirement at startup (remove local variables?). The third option is a last resort in that I don't know how to manage a LabVIEW executable to minimize CPU usage at startup directly, only indirectly with code optimizations. We are using LabVIEW 2009 SP1 and I have read that LabVIEW 2010 has improvements that make EXE builds run more efficiently. Do you think this would be a good application for that?
