
LabVIEW


Download with TCP

Hello everyone.

I'm wondering if someone can shed some light on this issue.

 

I have a "TCP Download" VI that downloads data from an Internet server. Think of the Internet Server as a Cloud Storage Facility where I have no control over it. 

 

It is not known how many bytes I should download from the server, but I do know the termination character the server sends when all the data has been transferred.

 

My problem is speed. Currently, the maximum speed I can download at is about 0.9 Mbps, when it should actually be well over 10 Mbps. The attached image shows how my download VI is constructed.

I can reach the correct speed using third-party software, but not with the LabVIEW counterpart.

 

Is there any way I can improve this and get the correct download speed, or is this a native LabVIEW problem?

 

Thanks 

Message 1 of 17

Approximately how large are the files? Do you know how many times this while loop iterates? Using concatenate strings in a loop is inefficient, because each time LabVIEW will need to allocate a new string and copy the existing string into it, which will take an increasing amount of time as the string gets larger. To maximize the transfer rate, read as much data as possible in a single TCP Read. Preallocate a string (or byte array) as large as the file could possibly be, and insert the TCP Read output into it at the correct index. When you finally receive the terminating character, resize the array or string to the correct length. Also consider using Immediate mode for the TCP Read, instead of Standard.
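
Since LabVIEW block diagrams can't be shown as text, here is a minimal Python sketch of the same pattern, assuming a plain TCP stream terminated by a known byte sequence (MAX_SIZE, the host/port, and TERMINATOR are placeholders); the point is the preallocated buffer and the tail-only terminator check, not the exact values:

```python
import socket

MAX_SIZE = 8 * 1024 * 1024    # assumed upper bound on the transfer size
TERMINATOR = b"\r\n.\r\n"     # placeholder end-of-data marker

def download(host: str, port: int) -> bytes:
    buf = bytearray(MAX_SIZE)   # allocated once; no per-iteration copies
    view = memoryview(buf)      # lets recv_into() write at an offset
    received = 0
    with socket.create_connection((host, port), timeout=5.0) as sock:
        while received < MAX_SIZE:
            n = sock.recv_into(view[received:])  # read whatever is available
            if n == 0:
                break                            # server closed the connection
            received += n
            # Check only the buffer's tail for the terminator,
            # never the whole accumulated data.
            if buf[received - len(TERMINATOR):received] == TERMINATOR:
                break
    return bytes(buf[:received])  # trim to the actual length
```

The same shape in LabVIEW would be a preallocated byte array plus Replace Array Subset inside the loop, with the write index carried in a shift register.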

Message 2 of 17

@nathand wrote:

To maximize the transfer rate, read as much data as possible in a single TCP Read.


I have tried many different approaches for a while now, but the problem is the TCP Read itself. No matter how inefficient the rest of the VI is, it is all overshadowed by the fact that TCP Read is not delivering the data as fast as it should; the major delay is happening inside TCP Read.

As you can see, I've allocated 500 ms for 4096 bits, but even that tends to throw timeout errors almost every 2 to 3 iterations. I've tried increasing the timeout considerably, but even then the effect was minimal.

And yes, I've tried the CRLF and Immediate modes as well, with very little if not negligible effect.

Message 3 of 17

@zerotolerance wrote:
As you can see, I've allocated 500 ms for 4096 bits, but even that tends to throw timeout errors almost every 2 to 3 iterations. I've tried increasing the timeout considerably, but even then the effect was minimal.

And yes, I've tried the CRLF and Immediate modes as well, with very little if not negligible effect.


What you describe might be related to the Nagle algorithm. You could try turning it off by using VIs found here:

 

http://digital.ni.com/public.nsf/allkb/7EFCA5D83B59DFDC86256D60007F5839
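
For reference, switching Nagle off in a text language is a single socket option; a hedged Python equivalent (the host is a placeholder, 119 is the standard NNTP port):

```python
import socket

# Connect to the server (placeholder host/port)
sock = socket.create_connection(("news.example.com", 119))

# Disable the Nagle algorithm so small segments are sent without coalescing
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

The VIs at the link above achieve the same effect from within LabVIEW.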


Message 4 of 17

I doubt that this is a Nagle algorithm issue, unless the sending side is poorly written, but it wouldn't hurt to try.

 

Do you have code for the non-LabVIEW application that achieves the speed you expect? If you don't have code for it, can you at least share its name, if it's publicly available? Perhaps the protocol it uses has some trick that is missing from the LabVIEW code. Are you running that application on the same computer, so it's a direct comparison? Is LabVIEW running in a virtual machine or some other unusual environment? Do you have a specification for the format in which the data arrives?

 

Is there any chance you can share your code, instead of an image? What else is happening in your LabVIEW code at the same time? If you have a greedy loop running in the same thread, or at higher execution priority, that could potentially slow down the TCP Read.

 

Again, how large do you expect the transfers to be? 4096 bytes (NOT bits) is pretty small. I currently have an application running on an sbRIO (so not a direct comparison, due to the different operating system) that can saturate a 100 Mbit Ethernet link by reading an 8 MB file in a single TCP Read.

 

I still highly recommend that you not use Concatenate Strings in a loop. Also, what's going on in your "END?" VI? If data transmission always stops after the terminating character, then you can efficiently check for the end by examining only the last character of each received string, which perhaps you're already doing.
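
For illustration, that tail-only check in Python terms (the terminator value is a placeholder), so the whole accumulated string never has to be searched:

```python
def ends_with_terminator(data: bytes, terminator: bytes = b"\r\n.\r\n") -> bool:
    # Cost is O(len(terminator)) per call, regardless of how large data grows
    return data[-len(terminator):] == terminator
```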

Message 5 of 17

Hi

You are right, nathand: the Nagle algorithm has no effect. Actually, it runs a little bit slower (±100 Kbps) when I disable it. Now, answering some of the questions.

 

This project uses the NNTP protocol. The data that comes from the server is encoded using the yEnc encoding mechanism:

http://www.yenc.org/

 

I am using Windows 7 64-bit with LabVIEW 2013, Version 13.0f1 (64-bit). The third-party software that I'm comparing the speed to is NewsBin Pro (http://www.newsbinpro.com/), and I don't have the source code; it's commercial software.

 

The original size of the data is about 6 Mbits. I only know the size of the original decoded data, but the data I'm receiving is in encoded form, so the received data is about 3 to 5% larger than the original size (which I know beforehand).

 

Attached is the code (the one shown in the picture).

 

This particular VI is designed to run in parallel with up to 8 copies of itself (as reentrant). When I increase the number of simultaneous connections to the server, the speed increases as well; the maximum I could get is about 4.5 Mbps. But if I run it on its own, I can only go as high as about 0.95 Mbps.

 

Maybe I'm doing something catastrophically wrong, so please feel free to make any amendments to the VIs and I'll test the results.

 

Regards

 

Kas

 

 

Message 6 of 17

How are you determining the download speed? I'm not convinced that your VI to do this works properly - what's special about the number 35? If you finish downloading before you reach 35 iterations, as far as I can tell the progress won't update.

 

The VI you posted is not reentrant, but your comments make it sound like it is intended to be.

 

I'm not familiar with the details of NNTP and yEnc, and I don't have time to study it. In addition you didn't post the rest of the project that would show how you're implementing the protocol and what else is happening while this VI is running.

 

It sounds like you can calculate the maximum possible size of the download, and it looks like the CRLF mode might be appropriate based on your end-of-article check. I would try doing a single huge TCP Read, large enough to read the maximum possible length, with a long timeout in CRLF mode and see how that performs and whether it works for you (I don't know if CRLF mode will be sufficient).

Message 7 of 17

Hi Nathand. 

The code I've given was made specifically for this post, so I didn't bother making it re-entrant, etc. I cannot post the whole code, because it is part of a bigger project that may end up becoming commercial software.

 

The NNTP protocol is pretty simple; it's basically a command-based protocol layered on top of plain TCP. yEnc is a little bit more involved, but that's taken care of in another part of the software.

 

The main issue that I currently have is the download speed (i.e. getting the data from the server). 

 

"It sounds like you can calculate the maximum possible size of the download,"

This is not possible without having the original data. I only know the size of the original data, and there is more than one way of encoding the data with yEnc; especially if the original data has a lot of CR, LF, and "." characters, the size can change. But the overhead of the encoder is somewhere between 3 and 5%, so I can only get a rough estimate.

 

"and it looks like the CRLF mode might be appropriate based on your end-of-article check"

You may be correct here, but the data coming from the server is divided into lines. This is because of how yEnc works: each line of the data has a fixed size of 128 or 256 bytes, and each line ends with CRLF. So if I do a CRLF read, I get the data in chunks of 128 or 256 bytes. And because of the overall size of the data, I wondered whether this is overkill (i.e. executing TCP Read hundreds of times before I get the whole data).

 

"what's special about the number 35"

Nothing, really; it's just as you said: I wait 35 iterations until I get a speed reading. Since this is still fresh, I just placed this section there to see if I get more stable results.

 

Overall, can I ask whether you think I'm calculating the speed correctly, or whether you know a better way to calculate the download speed, keeping in mind that this code will be running in parallel with itself multiple times?

I'm trying to find free software that can measure the ACTUAL download speed when I run this VI. Maybe I'm calculating the speed wrong and the download itself is actually running correctly.

 

Thanks 

 

Kas

Message 8 of 17

Again, you can calculate the maximum possible size of the download - take the size you know, add 6% or a bit more. A better approach might be to read the size you do know in a single read, then do a second read in immediate mode for whatever is left.
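
In rough, non-LabVIEW terms, the two-phase idea looks like this Python sketch, where known_size is the decoded size known beforehand and the terminator is a placeholder; since the encoded stream is at least known_size bytes, the first read is safe:

```python
import socket

def two_phase_read(sock: socket.socket, known_size: int,
                   terminator: bytes = b"\r\n.\r\n") -> bytes:
    # Phase 1: at least known_size bytes are guaranteed to arrive,
    # so block until exactly that much has been read.
    data = sock.recv(known_size, socket.MSG_WAITALL)
    # Phase 2: the 3-6% encoding overhead is unknown; keep draining
    # the socket (like LabVIEW's Immediate mode) until the terminator arrives.
    while not data.endswith(terminator):
        chunk = sock.recv(65536)
        if not chunk:
            break  # connection closed before the terminator arrived
        data += chunk
    return data
```

Concatenation in phase 2 is harmless here, because it only covers the small leftover rather than the whole transfer.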

 

I cannot tell if you're calculating speed correctly. Your approach looks overly complicated. I would just track the time at which the download started, and continuously sum the number of bytes received and the amount of elapsed time. I suspect you'll get incorrect speed values due to the feedback node not being initialized to the time at which the download starts - instead, it will be initialized to 0. Also, the speed calculation VI isn't re-entrant (at least the one you posted). If you open multiple connections and use that VI without it being reentrant, of course it will look like the speed is faster.

 

Why not do a much simpler speed calculation - get the time before and after the while loop, divide the total string length by the elapsed time after the loop - and see if it matches?
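
That cross-check is language-independent; as a sketch, using the hypothetical download() routine from the earlier example (host and port are placeholders):

```python
import time

start = time.monotonic()
data = download("news.example.com", 119)   # placeholder host/port
elapsed = time.monotonic() - start

# Total payload divided by wall-clock time, converted to megabits per second
mbps = (len(data) * 8) / (elapsed * 1_000_000)
print(f"{len(data)} bytes in {elapsed:.2f} s -> {mbps:.2f} Mbps")
```

If that number disagrees with the per-iteration calculation, the speed VI is the problem rather than the transfer.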

Message 9 of 17

To clarify for me as well as others reading this post - is the problem

 

"LabVIEW takes longer to download the file via TCP than the stand-alone application"

 

OR

 

"How does one accurately calculate the rate / remaining transfer time of a file via TCP"

 

 


Message 10 of 17