LabVIEW

TCP - Allow file descriptors > 1024

Hi,
We have a problem with the TCP/UDP functions on Linux systems:
- reproduced on CentOS 6, CentOS 7 and CentOS 8
- LabVIEW 2020 64-bit (and previous versions)
- open-file limit raised to 9999 (ulimit command)


Although we raised the open-file limit with the ulimit command, LabVIEW seems to hit a limitation as soon as the application has more than 1024 file descriptors open simultaneously.

We reproduce the problem in the attached VI. After opening ~1400 files (of any kind), openUDP, readUDP and closeUDP randomly crash LabVIEW.
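The attached VI is not reproduced here. As a rough stand-in outside LabVIEW, the same situation can be sketched in Python on Linux: raise the soft open-file limit, hold ~1400 ordinary files open, then create a UDP socket and look at the descriptor number it receives. The helper names `ensure_nofile_limit` and `probe` are made up for this illustration, and CPython happens to reject an out-of-range descriptor with a `ValueError` where a C program calling select() directly would have undefined behaviour.

```python
import os
import resource
import select
import socket


def ensure_nofile_limit(wanted):
    """Try to make the soft open-file limit at least `wanted` (what `ulimit -n` does)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= wanted:
        return True
    if hard != unlimited and hard < wanted:
        return False  # cannot go past the hard limit without privileges
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return True


def probe(n_files):
    """Hold n_files ordinary files open, then open a UDP socket.

    Returns (socket fd number, whether select() accepted that fd).
    """
    held = []
    try:
        for _ in range(n_files):
            held.append(os.open(os.devnull, os.O_RDONLY))
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            fd = sock.fileno()
            try:
                select.select([sock], [], [], 0)
                select_ok = True
            except ValueError:
                # CPython refuses fds >= FD_SETSIZE instead of crashing.
                select_ok = False
            return fd, select_ok
    finally:
        for f in held:
            os.close(f)


if __name__ == "__main__":
    if ensure_nofile_limit(2000):
        print(probe(1400))
    else:
        print("hard open-file limit is too low for this experiment")
```

If the limit can be raised, the socket's descriptor number lands above 1400 and select() no longer accepts it, even though only one socket is open.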

 

Our assumption is that the limitation resides inside the low-level LabVIEW functions. Some applications have dealt with a similar problem by replacing select() with poll() in their implementation of networking primitives (e.g. here, here, here, or here)...
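To make the select()-to-poll() idea concrete, here is a minimal sketch in Python, whose `select.poll` wraps the same system call. It says nothing about how LabVIEW is implemented; `wait_readable` and `udp_roundtrip` are hypothetical helper names. poll() takes a list of descriptor numbers rather than a fixed-size bitmap, so the numeric value of the descriptor does not matter.

```python
import select
import socket


def wait_readable(sock, timeout_s):
    """Return True if sock becomes readable within timeout_s seconds, using poll()."""
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLIN)
    events = poller.poll(int(timeout_s * 1000))  # poll() takes milliseconds
    return any(mask & select.POLLIN for _fd, mask in events)


def udp_roundtrip(payload=b"ping"):
    """Send payload to ourselves over loopback UDP; read it once poll() reports readiness."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        tx.sendto(payload, rx.getsockname())
        if not wait_readable(rx, 1.0):
            return None
        data, _addr = rx.recvfrom(4096)
        return data


if __name__ == "__main__":
    print(udp_roundtrip(b"hello"))
```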

 

If you have any ideas that could help us solve or explain this LabVIEW crash…
Thanks!

 

CrashUdpCentOS6.mp4 ‏2112 KB is a screen capture of the crash.

0 Kudos
Message 1 of 11
(2,048 Views)

Yes, LabVIEW uses select() under Linux to handle events on network sockets. So the trouble select() has with fds greater than 1024 is certainly a problem when you already use more than 1000 file descriptors for other things.

And there is no easy way to change that. Changing software to use poll() instead of select() is a serious modification that has potentially far-reaching implications in terms of backwards compatibility, so NI is not going to change that just like that. The LabVIEW source code is simply too complex to make such modifications in low-level components on a whim.

Use of epoll() has even more potential pitfalls and is Linux-only, so it is not a solution for other platforms such as macOS, which currently uses the same network handling code.

Rolf Kalbermatter
My Blog
Message 2 of 11
(2,005 Views)

Thank you very much Rolf for your quick and clear-cut answer.

 

As explained here (https://access.redhat.com/solutions/488623), a possible solution for NI without modifying the low-level functions with poll() would be to modify the value of FD_SETSIZE in the glibc before recompiling LabVIEW.

 

I think it's in NI's interest (and ours 🙂) to fix this bug/crash, because this limitation is a serious obstacle for partners that develop large communication systems. Moreover, Linux seems to be taking on an important role for NI (for instance, NI Linux RT replacing PharLap), and I would not understand widely used communication functions, like the TCP and UDP functions, crashing or behaving differently depending on the OS.

Do you think it's possible to draw their attention to this subject?

0 Kudos
Message 3 of 11
(1,938 Views)

@Ubik) wrote:

Thank you very much Rolf for your quick and clear-cut answer.

 

As explained here (https://access.redhat.com/solutions/488623), a possible solution for NI without modifying the low-level functions with poll() would be to modify the value of FD_SETSIZE in the glibc before recompiling LabVIEW.


Not really. That limits the number of file descriptors one can wait on in a single select() operation. The three fd_set structures passed to this function are limited by the FD_SETSIZE constant and shouldn't even be 1024 by default, as that is going to waste a lot of stack space (those fd_set structures are normally declared as local variables, consuming stack space for each of them).

 

That select() has an internal limit of not working with fd's whose integer value is higher than 1024 is a different issue and built into the glibc or possibly even kernel implementation of that function.

 

LabVIEW adds one fd_set value per currently active connection to this poll mechanism, so unless you try to open more network connections/sockets than the default FD_SETSIZE, this limit doesn't apply.

 

But according to your problem description, what you are doing is opening a zillion file descriptors for real files, and that will eventually push the fd value for network sockets beyond 1024, which is something the select() implementation can't handle internally for whatever strange reason (most likely the usual reason: allowing more would require a completely different implementation that is more complicated and possibly causes performance trouble, so who the heck is ever going to use more than 1024 file descriptors? **bleep** it!)

Unless there is some kernel (or glibc) tweak somewhere to circumvent that limitation you can't get around this limit other than replacing select() calls with poll() calls.
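For readers who want a picture of where the 1024 comes from: on glibc an fd_set is a fixed bitmap with one bit per descriptor number, and POSIX leaves FD_SET() undefined for a descriptor that is negative or not below FD_SETSIZE. The toy model below is Python, not glibc's code; `FdSetModel` is an invented name and 1024 is assumed as the glibc default rather than queried. It only shows that the bound applies to the descriptor's numeric value, not to how many descriptors are being watched.

```python
FD_SETSIZE = 1024  # glibc's usual compile-time default (assumed, not queried)


class FdSetModel:
    """Toy model of an fd_set: a fixed bitmap with one bit per fd number."""

    def __init__(self, size=FD_SETSIZE):
        self.size = size
        self.bits = bytearray(size // 8)

    def set(self, fd):
        if not 0 <= fd < self.size:
            # In C, FD_SET() at this point would write outside the bitmap.
            raise ValueError(f"fd {fd} does not fit in an fd_set of {self.size} bits")
        self.bits[fd // 8] |= 1 << (fd % 8)

    def is_set(self, fd):
        return 0 <= fd < self.size and bool(self.bits[fd // 8] & (1 << (fd % 8)))


if __name__ == "__main__":
    s = FdSetModel()
    s.set(3)            # a low-numbered socket fits
    try:
        s.set(1400)     # a single socket numbered 1400 does not
    except ValueError as exc:
        print(exc)
```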

Rolf Kalbermatter
My Blog
0 Kudos
Message 4 of 11
(1,919 Views)

Thank you, even if you lost me in the explanations. I understand that this is a (relatively) complex problem, but many applications seem to avoid this limitation.

 

@rolfk wrote:

But according to your problem description what you do, is opening a zillion file descriptors for real files [...]

Not a zillion, ~1400 files are enough

 


@rolfk wrote:
(most likely the usual reason, allowing more would require a completely different implementation that is more complicated and possibly performance trouble, so who the heck is ever going to use more than 1024 file descriptors? **bleep** it!)


This consideration is subjective. To say nothing of the big-data trend, systems with networks of thousands of communicating sensors or sensing nodes may be common in industry or other contexts.

 

Ideally, we would like NI to investigate using the attached VI (which is not trivial), to try to avoid this limitation and to understand the "strange reason" for the crash.

0 Kudos
Message 5 of 11
(1,900 Views)

Except that that limitation of not being able to handle file descriptors with a higher value than 1024 in select() is not an NI limitation but either a glibc or kernel limitation. I would tend to believe that it is mostly glibc related, fd's are really a libc thing, the kernel works with different identifiers (inodes).

 

But the glibc folks likely will say that this is a historical limitation that can't be changed without rendering some backwards compatibility invalid, and since poll() is supposed to work fine, they decline to "fix" it.

 

So it's back on NI's plate again, but NI has other, higher-priority items to tackle, and quite a few of them, so I would definitely not hold my breath for a fix. You wouldn't survive that. 😁

Rolf Kalbermatter
My Blog
0 Kudos
Message 6 of 11
(1,884 Views)

You still should submit a bug report though. And no, posting here in the forums is not a bug report. Sometimes someone from NI might see reports here and submit them as internal bug report themselves but only if they feel confident to be able to properly report all the technical details involved. But folks working on that level of LabVIEW code (deep down in the platform specific interface functions) tend not to read these forums, as most of it is just plain boring LabVIEW user problems.

So make sure to report it as a bug here (you can always refer to this thread in it if you want), and if you receive a confirmation with a bug ID number and feel very generous towards the community, you could even report that bug ID in this thread. 😊

Rolf Kalbermatter
My Blog
0 Kudos
Message 7 of 11
(1,855 Views)

Thanks again for being so generous with your explanations. We appreciate it, because this problem is causing us stress and major complications. We will report the bug at the attached link.

0 Kudos
Message 8 of 11
(1,851 Views)

We (LabVIEW R&D) finally banged our skulls together until we concluded that, long story short, rolfk was basically right about everything in this thread. 👍

 

A big complicating factor was that a lot of the underlying code more or less presumes a Winsock-specific state of affairs, i.e. the only things that are file descriptors are TCP/UDP sockets, so that the only thing you need to keep track of, vis-à-vis FD_SETSIZE limitations, is socket counts. So we thought we were already doing these sorts of checks... but that code was lying by omission. Shame on us, I suppose.
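The difference between the two kinds of check can be sketched as follows. This is purely illustrative Python with invented function names, not LabVIEW's source: a count-based check is satisfied by two sockets, while a value-based check notices that their descriptor numbers are already past the assumed FD_SETSIZE of 1024 because ordinary files took the lower numbers.

```python
FD_SETSIZE = 1024  # assumed glibc default


def count_based_check(socket_fds):
    """Winsock-style reasoning: only the number of sockets matters."""
    return len(socket_fds) <= FD_SETSIZE


def value_based_check(socket_fds):
    """POSIX reasoning: every descriptor number must be below FD_SETSIZE."""
    return all(0 <= fd < FD_SETSIZE for fd in socket_fds)


if __name__ == "__main__":
    # Two sockets opened after ~1400 ordinary files are already open.
    fds = [1401, 1402]
    print(count_based_check(fds))  # the count looks harmless
    print(value_based_check(fds))  # the values are out of range for select()
```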

 

That said, rolfk was right about almost everything.


And there is no easy way to change that. Changing software to use poll() instead of select() is a serious modification that has potentially far-reaching implications in terms of backwards compatibility, so NI is not going to change that just like that. The LabVIEW source code is simply too complex to make such modifications in low-level components on a whim.

Use of epoll() has even more potential pitfalls and is Linux-only, so it is not a solution for other platforms such as macOS, which currently uses the same network handling code.


I need to push back very specifically on this point. LabVIEW's architecture lends itself very well to abstracting across other networking APIs; the underlying implementation is fully encapsulated. In fact, there's been a (now-#ifdef'd-out) poll()-based implementation of the network code since 2001. (I am not entertaining turning that on, because I don't think it's been tested since then, and because far better alternatives are now available.) The fact that all OSs currently use the same network implementation is an unusually happy coincidence, and not something that is set in stone.
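As an analogy for what "fully encapsulated" can look like (this is Python's standard library, not LabVIEW's architecture), the `selectors` module hides the readiness API behind one interface and picks epoll, kqueue, poll or select depending on what the platform provides. The helper names below are hypothetical.

```python
import selectors
import socket


def backend_name():
    """Name of the readiness backend chosen for this platform."""
    with selectors.DefaultSelector() as sel:
        return type(sel).__name__


def readable(sock, timeout_s):
    """Wait for sock to become readable through whichever backend is available."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        return bool(sel.select(timeout_s))


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))
        tx.sendto(b"ping", rx.getsockname())
        print(backend_name(), readable(rx, 1.0))
```

The calling code stays the same whichever backend is selected, which is the property that makes swapping the underlying mechanism a contained change.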

 

In other words, if this is important to you, please say so on the Idea Exchange. There are presently people watching that with the power to control feature priorities, and there are absolutely things we could do to improve this.

0 Kudos
Message 9 of 11
(1,329 Views)

Well, if it were that simple, it's of course a big question why it hasn't already been done loooong ago! 🙂

 

My take on this is: yes, it is in principle fairly simple to do, but there is always a very high risk that even such obvious changes have very unexpected side effects on certain platforms. Especially since the platforms are not really using the same network implementation at all. Winsock, while indeed based on the BSD socket library interface that all modern Unix OSes use, is definitely not the same as what you would find on your BSD system, and that in turn is not really the same as what you find in the Linux socket implementation. The interfaces are pretty much the same, but the underlying implementations are absolutely not, and that also has consequences for the semantics of those interfaces, especially when you talk about asynchronous operation, which is exactly what select(), poll(), epoll() etc. are about.

 

As to the Idea Exchange, I was trying to see what idea would be about this, but can't seem to find anything.

Rolf Kalbermatter
My Blog
Message 10 of 11
(1,304 Views)