01-22-2020 03:31 AM - last edited on 12-19-2024 10:07 PM by Content Cleaner
I have some cRIO code that previously used a class I've defined to bundle some metadata and numeric arrays (typically ~1000 elements, sometimes just 1 and sometimes up to maybe 10000, depending on settings whose best values I've yet to determine).
These objects are flattened and sent over a Network Stream to a connected computer - this (seemingly) worked fine.
As part of a refactoring to reduce the number of slightly different implementations of more or less the same code, I created a TCP listener which waits for connection requests and then provides the information needed to open a connection to a dynamically launched Network Stream endpoint, using the Actor Framework.
The flow goes something like:
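Roughly, in place of a diagram, a minimal Python-flavoured sketch of the handshake (the real implementation is LabVIEW; the port number, message, and function names here are placeholders I've made up for illustration):

import socket

LISTEN_PORT = 6342  # placeholder - any known, fixed port

def launch_stream_writer():
    # Stand-in for "dynamically launch a nested actor that creates a
    # Network Stream writer endpoint" - here it just returns a placeholder
    # endpoint name that the client should connect its reader endpoint to.
    return "//crio-hostname/data_stream_1"

def serve_connection_requests():
    # The cRIO side listens on a known TCP port for connection requests.
    with socket.create_server(("", LISTEN_PORT)) as listener:
        while True:
            conn, addr = listener.accept()
            with conn:
                request = conn.recv(1024)          # e.g. b"REQUEST_STREAM"
                endpoint = launch_stream_writer()  # spin up the stream
                conn.sendall(endpoint.encode())    # tell the client where to connect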
When I run this with larger data objects (higher rates etc.) the cRIO eventually throws error 63 (connection refused) - sometimes quickly, if you pick a big enough acquisition rate. According to Flashing Status Light on CompactRIO Controller, the error state which follows (indicated by 4 flashes and a pause) is a common indication of out-of-memory errors, and error 63 occurs because the cRIO restarts.
As soon as the result object has been sent to the connections (if any exist), the object can be discarded on the cRIO, but that doesn't seem to be happening. The VIs (probably) in question are shown below (note that the array of enqueuers can be, and often is, empty when no connections exist):
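For reference, the intent of the distribution step is roughly the following (a hypothetical Python sketch of the logic only - the real VIs are LabVIEW block diagrams, and 'enqueuers' and 'result' are placeholder names):

from queue import Queue

def distribute(result, enqueuers):
    # Send the result object to every active connection's enqueuer, then
    # drop it. 'enqueuers' is often an empty list (no connections).
    for q in enqueuers:
        q.put(result)        # each connection gets the data to stream out
    # Nothing is retained here: once this returns (and the consumers have
    # dequeued it), the result should be unreachable and its memory freed.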
Would placing the Request Deallocation Function in one or both of these VIs be a suitable way to avoid my problem, or do I have larger issues?
Why might this even be necessary? Previously LabVIEW managed without the extra overhead (and in that case the Network Streams were also managed by Actors, although in a different manner) - i.e. what did I do here that prevents memory from being released, or leaks it?
01-22-2020 04:29 AM - last edited on 12-19-2024 10:08 PM by Content Cleaner
@cbutcher wrote: Would placing the Request Deallocation Function in one or both of these VIs be a suitable way to avoid my problem, or do I have larger issues?
I doubt it. AFAIK, the Request Deallocation Function (RDF) only clears a VI's deallocatable memory - memory that could potentially be recycled when the VI is called again - so that memory will not grow anyway. Unless, of course, the VIs are clones that are never closed. But if they are clones, closing the clones would be the solution, and the RDF would only make the memory leak smaller.
I think you have other issues.
Do you have access to the Real Time Execution Trace Toolkit? That might reveal the real issues.
Are you sure you close the TCP/IP listener reference? That one is often overlooked, as you often don't use it.
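To illustrate the point generically (this is just a Python analogy, not the LabVIEW TCP API): the listener reference is a separate resource from the per-connection references, and it needs its own close.

import socket

def handle(conn):
    pass  # placeholder for servicing the connection

listener = socket.create_server(("", 6342))   # placeholder port
try:
    conn, _ = listener.accept()
    try:
        handle(conn)
    finally:
        conn.close()        # the per-connection reference gets closed...
finally:
    listener.close()        # ...but the listener itself is easy to forget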
01-22-2020 07:43 AM
I tried the Real-Time Execution Trace Toolkit earlier this week, and although it gave me some interesting insights regarding CPU usage (I had a random number generation that I could mostly avoid but was calling unconditionally, and it took a surprising amount of time), I didn't see anything regarding memory usage or leaks.
I'll take a look at the Listener but I only have one, so I doubt it's responsible for a crashing cRIO.
I was initially concerned that the problem might lie with allocation for the Network Streams, but I've reduced the buffer size, and I don't think even a full buffer with large elements should exceed the available RAM (though I'm still worried that, although the buffer holding the pointers might be reused, the memory for the objects themselves could be leaking somehow). That being said, I can (I think) crash it with no continuous connections, so I suspect the problem lies in the distribution somehow.
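For what it's worth, the kind of back-of-envelope estimate I'm relying on looks like this (all numbers are assumptions rather than measured values: 10,000 doubles per object, ~1 kB of metadata, and a stream buffer of 100 elements):

ELEMENTS_PER_OBJECT = 10_000   # worst-case array length (assumed)
BYTES_PER_DOUBLE = 8
METADATA_BYTES = 1_000         # rough guess for the bundled metadata
BUFFER_ELEMENTS = 100          # assumed Network Stream buffer size

object_bytes = ELEMENTS_PER_OBJECT * BYTES_PER_DOUBLE + METADATA_BYTES
buffer_bytes = object_bytes * BUFFER_ELEMENTS

print(f"~{object_bytes / 1e3:.0f} kB per object, "
      f"~{buffer_bytes / 1e6:.1f} MB for a full buffer")
# -> ~81 kB per object, ~8.1 MB for a full buffer, which is nowhere near
#    the controller's available RAM.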
Several of the potential problem VIs are clones of one form or another, but all are statically called, so I don't think they should allocate more instances than either the pool size for shared clones or the number of call sites for preallocated clones... Maybe I'm miscounting and this is a bigger issue than I expect.
01-22-2020 08:07 AM
Reading an archived manual, I see that (with detailed logging enabled on the RT system) I should be able to view memory allocations as green flags of some sort.
I'll try again and add custom events for the VIs that seem most likely, along with perhaps new and closed connections.
I found the RTETT (or whatever) quite difficult to read, but it was certainly detailed.
01-22-2020 09:26 AM
There is of course a chance you're running into an obscure bug in LabVIEW.
I guess you could try to execute just parts of the code, but that might be difficult if the parts are highly dependent on each other.
I'd start by keeping the NS endpoint but discarding the results. But perhaps that's all you can do...
01-25-2020 11:32 PM
cRIO embedded systems have a single CPU thread and can thread starve easily.
I've done lots of throughput optimization on cRIO systems. One of the issues is running for or while loops without any delays. When they run, they spike the CPU load, because without a delay the loop effectively runs at high priority.
Class architectures have lots and lots of empty one-run-loop VIs, which, if called recursively, can chew up CPU doing nothing very fast.
Counterintuitively, adding a ~10 msec delay in the while loop of a state machine (or other loops) can relieve the pressure those loops put on the CPU.
As embedded systems are not multi-threaded, you have to design in the priority of your loops; loops also run much faster on the embedded target - unlike Windows, you can run loops in the high kHz range. So add a 100 µsec delay (use the Real-Time timing palette to get the µsec wait) and start peppering your loops with these.
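Not LabVIEW, but a rough Python analogue of the idea (on the RT target you would of course use the Wait primitives from the timing palettes instead of sleep; the function names here are placeholders):

import time

def do_step():
    pass  # placeholder for the loop body / state machine case

def greedy_loop(should_stop):
    # No wait: this consumes an entire core and starves lower-priority work.
    while not should_stop():
        do_step()

def polite_loop(should_stop, delay_s=0.010):
    # The same loop with a small wait (~10 ms here, or ~100 us for a fast
    # acquisition loop) yields the CPU so other loops get scheduled.
    while not should_stop():
        do_step()
        time.sleep(delay_s)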
Good luck.
01-26-2020 12:09 AM - edited 01-26-2020 12:13 AM
Jack,
Thank you for your reply. I'll check over any loops I have again, but I don't believe any should be executing at unconstrained rates.
The cRIO-9045 that I'm using has two physical cores, and I believe the LabVIEW run-time/thread model actually has quite a few separate threads, but the RT Execution Trace Toolkit does clearly show that, of course, only one thread per CPU can be active at a time (so two, in my case). As you say, this means that lower-priority tasks/processes can become starved if higher-priority tasks are using a large fraction of the available time (e.g. a greedy loop).
Following my rewrite, the CPU usage dropped significantly, but clearly I have a problem with my memory usage somewhere.
Since my colleague needs to get on with experiments, I reverted to the previous code using git and have modified it to retain some of the minor improvements I've made as part of this process.
In the coming week, I hope to try and use available downtime around experiments to trial reintroducing various parts of the new, failing design, to attempt to locate the problem so I can fix it. I'll post back if I find something interesting.
If you have any other thoughts, please feel free to suggest them - it may point me in the right direction that bit faster.
Edit: Wiebe, I tried the Execution Trace Toolkit again and was able to observe large numbers of "Waiting for Memory Allocation" flags (or whatever they're called). I'm going to profile the old code next week and see how similar or different it is, but I haven't been able to identify from the profiles what is causing the allocations, or indeed whether it is one allocation with many waiting flags or many separate allocations... Manually allocating large arrays for the relevant VIs before the loops started didn't appear to help, and in fact caused a bunch more problems. I'll have to take another look at my reentrancy settings; maybe I have some problems there with the combination of preallocated and shared clones, given the dynamic dispatch constraint (i.e. dynamic dispatch VIs can't be preallocated).
02-06-2020 09:30 AM
Today I read this post: LV Real time crash on specific map use with objects, and so I'm updating this thread with my findings over the last week.
I profiled my old code and didn't see huge qualitative differences. Of course the results are not the same, but it wasn't vastly different.
Some periodicity can be identified in the logs, so I've tried to chase down some of the memory allocations, but I seem to have had only moderate success; in any case, the number of flags (whilst worrying-looking) doesn't appear to cause any problems in itself.
I have run the code for long periods of time (several days) with the data "streaming" to nowhere - that is, having never had a client connect. Once a client has connected (and even after it has disconnected), the problem occurs.
I'm wondering if this is related: Comparison of Collections - essentially, an empty set is not the same as a set with all of its elements removed.
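I don't know whether LabVIEW's set/map implementation works this way internally, but as an analogy for why that distinction might matter: some containers keep their allocated capacity after elements are removed, so an "emptied" collection can occupy far more memory than a freshly created empty one. A CPython illustration of the general idea (only an analogy, not a statement about LabVIEW):

import sys

fresh = set()
emptied = set(range(100_000))
for x in list(emptied):
    emptied.discard(x)              # remove every element one by one

print(len(fresh), len(emptied))     # 0 0 - both report as empty
print(sys.getsizeof(fresh))         # small (a couple of hundred bytes)
print(sys.getsizeof(emptied))       # still large (a few MB) - capacity retained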