TCP/IP large data packets sending problem


@AndyN wrote:

Rolfk, Nathand,

 

The LabVIEW RT (sending) computer has a static IP address. The other computer has a preassigned IP that does not change.

The broken connections happen every 10-30 min. I don't think it can be related to renewing the IP, but I will check it with fully static addresses on both sides. Thanks for the hint.

 

If the connection is broken (timeout on either side), I reset it and automatically start a new one, and the transmission keeps going like nothing happened. This feature has been built in from the very beginning.

A user who does not pay much attention does not even notice it happening, but the data during the ~1 s of handshaking is lost.

A separate loop handles all the TCP/IP connections/reconnections. It is not shown here, but it has worked flawlessly for a long time.

 

The datagram (the missing one) is getting sent and partially received, i.e. one side sent 11 kB and the other side was expecting 11 kB but got only 5 kB. The sender complains about the lack of confirmation, the receiver about the missing remainder, and both close the connection. As I mentioned before, the missing part looks like the remainder of dividing the 65 kB TCP/IP buffer size by 11 kB (or rather the part that did not fit in the buffer). Wireshark reports the maximum size of the buffer. -> Nathan

 

I did not work on making a more intelligent read/write timeout scheme, and I am not sure it is even worth it. I would have to handle the data piling up on one side and figure out flow control so as not to flood the network with the buffered data and cause more connection problems. Essentially I would have to put a small layer over TCP/IP with exactly the same functionality. If TCP/IP fails at data transmission for some reason, why would I assume I can do it better?

 

Also, the timeout is used here to diagnose a broken connection. It makes things more complicated when a timeout can actually be a sign of other problems. Sending 11 kB at 60 Hz for 10 min, each write takes less than 17 ms. It goes like that for, let's say, 15 min. Then suddenly, after 10,000 cycles, 1500 ms is not enough...

This is where my understanding gets a bit fuzzy. I assume that the LV TCP Write is the blocking type, i.e. it stops execution until it receives confirmation that the datagram was sent?

 

 


No, the TCP Write will return immediately after the OS has accepted the data. The OS has a fairly large buffer, so you can buffer quite a bit of data while it dribbles out onto the network. If you use a LAN analyzer, this is very easy to see: send a large amount of data to a device and watch your code return immediately, then observe on the LAN analyzer the TCP stack actually sending the data out on the network. Unless your application-layer protocol has some sort of ACK/NAK, there is no way for you to block your execution until all of the data has actually been sent.
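
To illustrate this in plain BSD-socket terms (a minimal C sketch, not LabVIEW's actual implementation; send_all and wait_for_ack are illustrative helper names and the ACK byte value is an arbitrary choice): send() returns as soon as the kernel has buffered the data, so confirming delivery requires an application-level acknowledgment.

/* Minimal sketch: send() returns once the OS has buffered the bytes;
 * only an application-level ACK confirms the peer actually got them. */
#include <sys/types.h>
#include <sys/socket.h>

/* Push the whole buffer into the OS, looping over partial writes. */
int send_all(int sock, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(sock, buf + sent, len - sent, 0);
        if (n <= 0)
            return -1;          /* error or connection closed */
        sent += (size_t)n;      /* send() may accept fewer bytes than asked */
    }
    return 0;                   /* buffered by the OS, not yet delivered */
}

/* Block until the peer returns a one-byte application-level ACK (0x06 here). */
int wait_for_ack(int sock)
{
    char ack = 0;
    return (recv(sock, &ack, 1, MSG_WAITALL) == 1 && ack == 0x06) ? 0 : -1;
}

In LabVIEW terms this would mean the receiver sends back a short confirmation message after each frame, and the sender does a TCP Read for it before declaring the frame delivered.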



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 21 of 36

AndyN wrote:

The datagram (the missing one) is getting sent and partially received, i.e. one side sent 11 kB and the other side was expecting 11 kB but got only 5 kB. The sender complains about the lack of confirmation, the receiver about the missing remainder, and both close the connection. As I mentioned before, the missing part looks like the remainder of dividing the 65 kB TCP/IP buffer size by 11 kB (or rather the part that did not fit in the buffer). Wireshark reports the maximum size of the buffer. -> Nathan


I'm not sure what you mean by "Wireshark reports the maximum size of the buffer." Does this mean that Wireshark sees the entire contents of the data packet, but your application doesn't receive it? Is Wireshark running on the sending computer, the receiving computer, or some other machine?

 

How much other network traffic is going to the machine? How fast are you doing reads? I'd think partial packets would show up as an error somewhere, but you might see if the network driver allows you to adjust the number of receive buffers (for example with some Intel network cards, see http://communities.intel.com/community/wired/blog/2011/06/24/parameter-talk-tx-and-rx-descriptors). If there's some other source of large amounts of network traffic destined for the same computer, disable that service temporarily and see if performance improves.
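
The Rx descriptor count mentioned above is a driver setting, configured outside the application. A related knob on the application side is the socket receive buffer; LabVIEW does not expose it, so the sketch below (plain C; the 1 MB value is only an example) would apply to a non-LabVIEW peer or a small test tool.

#include <stdio.h>
#include <sys/socket.h>

/* Ask the OS for a larger socket receive buffer and print what it granted
 * (the OS may clamp or round the requested size). */
void tune_rcvbuf(int sock)
{
    int size = 1 << 20;                     /* request 1 MB */
    socklen_t len = sizeof(size);

    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
    getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &size, &len);
    printf("effective SO_RCVBUF: %d bytes\n", size);
}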

Message 22 of 36

Nathand,

 

The 65 kB is the TCP maximum, and Wireshark shows this to be the buffer size on both sides.

Wireshark runs on the receiving computer. The other one is the LabVIEW RT system.

The one and only traffic is what I am creating by sending the data. The network exists only for this purpose; no other computer can access it.

 

Mark,

So LV "TCP Write" is non-blocking? Did I get that right?

 

Message 23 of 36

@AndyN wrote:

Nathand,

 

The 65 kB is the TCP maximum, and Wireshark shows this to be the buffer size on both sides.

Wireshark runs on the receiving computer. The other one is the LabVIEW RT system.

The one and only traffic is what I am creating by sending the data. The network exists only for this purpose; no other computer can access it.

 

Mark,

So LV "TCP Write" is non-blocking? Did I get that right?

 


It could theoretically block, just as the underlying BSD socket write() could. It usually won't, since BSD socket write() normally just pushes the buffer into the TCP/IP socket and lets it handle the rest. That said, the socket implementation on the VxWorks OS has some simplifications and even quirks that are not always fully spec-conforming. It also had/has a few bugs that are almost unavoidable when a developer has to choose between a perfect implementation and real-time performance.
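
As a rough illustration of that blocking/non-blocking distinction at the BSD level (a C sketch of the general behavior, not how LabVIEW itself is written): a blocking write() stalls once the socket send buffer is full, while a non-blocking socket reports EWOULDBLOCK instead.

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Illustration only: switch the socket to non-blocking mode, then a write()
 * that cannot fit into the send buffer fails with EAGAIN/EWOULDBLOCK
 * instead of stalling the caller. */
ssize_t try_write(int sock, const void *buf, size_t len)
{
    fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);

    ssize_t n = write(sock, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* send buffer full: caller must retry later */
    return n;       /* bytes accepted by the OS, or -1 on a real error */
}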

 

However, if you say that Wireshark on the receiving end indeed sees the packet, then the bug cannot really be in the sender (i.e. the RT system). If Wireshark sees the entire packet, the receiving TCP/IP socket should too. And I would grant you the possibility that the problem is in the LabVIEW layer of the TCP/IP nodes, IF others were seeing the same issue. Unfortunately for you, I and many others have built very demanding TCP/IP communication solutions between LabVIEW and various other systems and haven't come across this.
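
One thing worth ruling out on the receiving side: TCP is a byte stream, so a single 11 kB application message can legitimately arrive split across several reads. A C sketch of the standard mitigation (recv_exact is an illustrative helper; the LabVIEW equivalent is wiring the full expected byte count into TCP Read with a long enough timeout):

#include <sys/socket.h>

/* TCP is a byte stream: an 11 kB "message" may arrive in several pieces,
 * so keep reading until the expected number of bytes is in. */
int recv_exact(int sock, char *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t n = recv(sock, buf + got, want - got, 0);
        if (n <= 0)
            return -1;          /* error or peer closed the connection */
        got += (size_t)n;
    }
    return 0;
}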

Rolf Kalbermatter  My Blog
DEMO, Electronic and Mechanical Support department, room 36.LB00.390
Message 24 of 36

Hey there!

 

Have you solved the problem? Did you push your data (push flag set to 1 in the TCP segments, i.e. PSH = 1 under flags in your captured segments in Wireshark)? Either way, the answer might help me with a similar problem, and maybe here too. I haven't started my own discussion yet because I'm first trying to find out whether it might have something to do with that flag.

 

I know what PSH does (nice article here), but I'm wondering what happens if there is some kind of data flood for a long time and the receiving side is too slow (connection, CPU, etc. are no problem [monitored]; the receiver is LV RT, the sender Win7). In this discussion a connection re-establish was proposed. That's also what I'm testing, but I always get a connection "freeze" after some time (the more I slow down my receiver with a wait in the read loop, the sooner the receiver runs out of time to read the incoming data; I measured the reading time). Wireshark at this point reports a zero window size (Wireshark on the sender side), which is okay if there is too much incoming data, since it tells the sender "stop sending, I'm busy"... but if there is continuously too much data, the connection somehow won't even re-establish.

 

I just stumbled over that, and maybe someone has a suggestion as to what the problem might be. If I flush my receiving queue on the receiving side (right after I see the zero-window info, before the freeze), the receiver works longer... Also interesting: the more often I flush (by reading out data faster than the receiver-slow-down timer), the sooner a zero window occurs, and if this state persists for a while, the freeze comes again. Is it possible that packets with PSH set (=1) bypass a full queue (full if a size is set and the app reads more slowly than the incoming traffic), which might lead to LIFO behavior rather than FIFO?

 

Maybe I'm wrong, but as I'm new to TCP/IP communication in LV, I suspect there is some detail that also led to the problem this discussion already addressed... Is a LIFO/FIFO switch possible when incoming data arrives faster than the receiver can read it and the receive buffer gets full?
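
For what it is worth, TCP itself always delivers bytes strictly in order, so a LIFO effect cannot come from the protocol; a zero window only means the receiving application is not draining the socket fast enough. The usual pattern is to separate reading from processing: one tight loop only reads and enqueues, another loop consumes. A C sketch of the reader side (queue_push is a hypothetical application queue, and the thread would be started with pthread_create):

#include <pthread.h>
#include <sys/socket.h>

/* Hypothetical application queue: replace with your own buffering. */
extern void queue_push(const char *data, size_t len);

/* Drain the socket as fast as possible so the advertised TCP window never
 * drops to zero; processing happens in another thread that pops the queue. */
void *reader_thread(void *arg)
{
    int sock = *(int *)arg;
    char buf[65536];

    for (;;) {
        ssize_t n = recv(sock, buf, sizeof(buf), 0);
        if (n <= 0)
            break;              /* error or connection closed */
        queue_push(buf, (size_t)n);
    }
    return NULL;
}

In LabVIEW this corresponds to the classic producer/consumer pattern, with a queue between the TCP Read loop and the processing loop.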

Message 25 of 36

Hey,

I never found a solution to it. It is possible that if I let the system wait for 5 s or so it would clean up its act and catch up, but I needed reliable real-time communication. Any delay bigger than 1 s was a bust for me.

Some people transfer much bigger chunks of data, but that does not matter for me because I needed thousands of frames arriving without any unnecessary delay. I did not care about the average throughput.

My problems started after some sort of LV update... that is when the hiccups began.

I left the system to automatically close the connection and establish a new one. Most of the time it wasn't even visible to the user. Still, it does not qualify as a solution.

I have switched jobs since then, but I am pretty sure the problem remains. I am sorry I cannot provide any useful update on it.

Message 26 of 36

Hey! Thank you for the pretty quick answer, even after one year! It's sad that you could not find a solution, but hopefully you're happy with your new tasks! I'm also thinking about choosing another way to solve the problem, because there seems to be much more going on at these lower levels that is uncontrolled and out of reach for most of us. Thanks anyway for your answer, and if there is some smart architect out there: I guess both of us would be glad to hear that there is an answer to our question.

 

I also thought about segment-header manipulation (e.g. with ettercap), but like you, I also have thousands of segments coming in as measurements (sometimes in bursts), and I'm not willing to put a switch between the two hosts... or is there another way I don't know of to apply ettercap's etterfilters in this setup (crossover Ethernet connection)? If so, I would try setting the PSH flags to 0 to see whether the traffic could go above a certain threshold (to get the FIFO behavior again, if I'm not totally wrong with my thoughts).

 

Sometimes stuff like that can drive you mad when it doesn't work the way it should, especially when there is no obvious clue. ^^

 

Question for the NI architects: I know that in closed network environments UDP is often preferred, though not by everyone (it is lossy), and that TCP may perform as well as UDP - correct me if I'm wrong. I just wonder how some people handle such large traffic without losses (hints?). Or do they switch between these protocols and accept losses in congested phases/bursts? I read about that in another Stack Overflow discussion with a C++ sender, which is what I have too. For now I'm handling the traffic on the sender side with an intermediate app that redirects the port using LabVIEW, and I'm thinking about filtering some data depending on content... but maybe there is a smarter idea without those losses. Because on the sender side itself I also get that strange freeze behavior! So it might be a TCP problem as described above... maybe really because of the PSH stuff with FIFO/LIFO.

Message 27 of 36
LabVIEW really doesn't do a lot of low-level stuff like changing options on the socket, so modifying the PSH flag or similar things is not something that LabVIEW would do. By setting that flag or others you certainly could cause all kinds of strange errors, but they would not be LabVIEW specific. The most involved part in the LabVIEW layer is the data buffering: LabVIEW maintains its own intermediate buffer to implement the four different modes, and handles it by calling the select() and read() functions internally to implement asynchronous socket handling.

Aside from that, there is nothing very special about how LabVIEW works with the socket.
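
For readers unfamiliar with that mechanism, a timeout-bounded read built on select()/recv() looks roughly like this in C (a sketch of the general technique, not LabVIEW's actual source):

#include <sys/select.h>
#include <sys/socket.h>

/* Wait up to timeout_ms for data to arrive, then read whatever is there.
 * Returns 0 on timeout, -1 on error, otherwise the number of bytes read. */
ssize_t read_with_timeout(int sock, char *buf, size_t len, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };

    FD_ZERO(&rfds);
    FD_SET(sock, &rfds);

    int ready = select(sock + 1, &rfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;           /* 0 = timeout, -1 = error */
    return recv(sock, buf, len, 0);
}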
Rolf Kalbermatter  My Blog
DEMO, Electronic and Mechanical Support department, room 36.LB00.390
Message 28 of 36

Thanks rolfk, but I know that LabVIEW is not the right tool for lower-level TCP/IP control, which is why I'm asking whether someone has experienced these packet-burst problems, perhaps observed them with Wireshark like me, and/or already used ettercap alongside LabVIEW to filter data with etterfilters in advance... and maybe solved it in another way on another level, e.g. with other tools that manipulate the flag or simply filter (drop()) packets that contain specific data.

 

But thanks for the info about the intermediate buffer and the details; that is exactly the information I was looking to get confirmed. Right now I'm writing a simple tool that reads some data in each packet on the server side, which - contrary to my first observation - has no problem listening (on the Win7 side); only the traffic over the Ethernet link fails, maybe still because of inadequate timeouts, or simply because the receiver is too slow and the mentioned buffer problems occur. Message filtering by pre-analysis on one side should be a workaround for my problem.

 

If anyone knows other low-level filtering tools, I'd be glad for any advice, as I'm sure it would also solve the problem described here...

Message 29 of 36
If the behaviour doesn't manifest itself over localhost but only when you go through the Ethernet port, then you can't exclude the possibility of a bug in the hardware driver. The driver may work all right in low- to medium-traffic situations but mangle something in high-traffic situations.
Rolf Kalbermatter  My Blog
DEMO, Electronic and Mechanical Support department, room 36.LB00.390
Message 30 of 36