
Disabled property node hangs loop

I've got a parser loop operating on data streamed from a cRIO via UDP.  The loop runs at about 90 Hz.  In response to the user opening a file, the code uses a property node to enable (or disable and gray out) a couple of Boolean front-panel objects.  When these property nodes execute, I see the CPU usage (in Windows Task Manager) go to nearly 100% and, coincidentally, the parser loop hangs.  Other loops within the same VI continue to run.  CPU usage stays at 100% until I force a VI abort.

 

With a small test VI, I've noted that these property nodes require roughly 10 ms to execute.  That seems quite sluggish, but my expectation was that code running at 90 Hz could tolerate a single slip in loop execution time, particularly because the UDP data is queued.
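For reference, the test VI just brackets the property node with timer reads.  In a text language, the equivalent measurement would look roughly like this Python sketch, where disable_and_gray is a hypothetical stand-in for the property node write:

```python
import time

def disable_and_gray(control_name: str) -> None:
    """Stand-in for the LabVIEW 'Disabled' property write (2 = disabled
    and grayed out); here it just simulates ~10 ms of UI-thread latency."""
    time.sleep(0.01)

# Bracket the operation with high-resolution timestamps; keep the worst case.
worst = 0.0
for _ in range(100):
    t0 = time.perf_counter()
    disable_and_gray("Start Button")
    dt_ms = (time.perf_counter() - t0) * 1000.0
    worst = max(worst, dt_ms)
print(f"worst-case property write: {worst:.1f} ms")
```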

 

Any thoughts regarding this property node execution time or suggestions on how to improve the code?  Thanks in advance.

Message 1 of 6

It would be most helpful if you could post at least a stripped-down version of your code.  It is hard to offer suggestions without seeing what you currently have.



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 2 of 6

Mark, thank you for offering to look at a stripped-down version of the code.  I just couldn't spare the time required to simplify our complex application.  However, I've been working on this problem in the couple of weeks since I originally posted the question, and have made some progress toward a solution.  Although I still don't have a conclusive explanation for LabVIEW's behavior, I thought I'd follow up with a list of what appears to have improved the code's stability.  I'll concede that these suggestions are not definitive: the problems are not repeatable, and without any transparency into LabVIEW's internal behavior, my analysis of the problems and my attempts at a fix are admittedly speculative.  Software development shouldn't be magic, but damn, it seems like LabVIEW requires we dance around a black candle.  Frustrating.  OK, exiting rant mode.  Here is a list of what NOT to do if you want LabVIEW to be more stable:

-- Do not use decorative frames around your front-panel objects.  Our main panel has approximately 100 front-panel indicators.  In an attempt to make the interface more intuitive, a recent code revision grouped the objects using frames.  The effect was a sluggish UI and processor loading close to 85%.  I'm aware from posts on this forum that overlapping indicators force LabVIEW to redraw all of them when any one is updated.  That's an understandable coding constraint.  Fine; we weren't overlapping any indicators.  But for Pete's sake, why should the same constraint apply to a purely decorative object like a frame?  This strikes me as a fundamental flaw in LabVIEW's design: group N objects with a nice frame, and you update the objects N-squared times.  If that is the cost of a frame, I would have preferred that NI not offer the option at all.  Bad choice on the part of the LabVIEW developers, and a bad choice on our part for assuming a frame would have zero impact on performance.

-- Do not use property nodes.  We occasionally gray out front-panel objects when appropriate for the state of the software.  This appeared to be contributing to LabVIEW's instability.  I built a diagnostic routine that measures execution time for the "gray-out and disable" property node: generally around 8 ms, but occasionally as high as 16 ms.  Good grief.  I've got a code loop running at 90 Hz; a 16 ms hit isn't easy to tolerate or, frankly, to understand.  Particularly when slow execution is the BEST of the consequences -- the worst is that the property node seemed, on occasion, to precipitate a hung loop.

-- Do not use LabVIEW's built-in queue structure.  Our code originally used queues to hand off packets of data from a UDP-listener loop to a packet parser.  The UDP listener blocks on UDP reception and shoves the packets into the queue; the packet parser blocks on data available in the queue and writes the data to file.  NI would have you believe, and I did believe for a while, that this is an elegant producer/consumer approach to the problem.  When our problem occurred, the UDP listener continued to put data into the queue, but the packet parser would never retrieve it.  It just went off into nowhere, consumed and forgotten by the Great LabVIEW Scheduler in the Sky.  The loop would hang, wouldn't respond to the stop button, and required a forced abort.  Worse, if we simply restarted the code, we couldn't be assured that the packets retrieved from the queue would be in chronological order; they seemed to be retrieved at random.  Apparently the failures had corrupted some of the internal data structures that govern LabVIEW's queue operation, and we couldn't assure proper behavior unless we shut down and restarted LabVIEW each time the error occurred.  The solution was to abandon elegance in favor of sequential operation: get rid of the queue, and parse each UDP packet immediately after receiving it.  No queue handoff, and no further parser lock-ups.  (A rough sketch of both patterns follows below.)
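For anyone thinking in a text language, here is a rough Python sketch of the two structures -- the queue-based producer/consumer we abandoned, and the sequential receive-and-parse we replaced it with.  The port, datagram size, and parse function are illustrative stand-ins, not our actual code:

```python
import queue
import socket

PORT = 55555           # hypothetical; the cRIO stream uses its own port
MAX_DATAGRAM = 2048    # large enough to hold one complete packet

def parse(packet: bytes) -> bytes:
    return packet      # placeholder for the real packet parsing

# --- Pattern 1: producer/consumer with a queue handoff (abandoned) -------
def udp_listener(sock: socket.socket, q: queue.Queue) -> None:
    """Producer: block on UDP reception, enqueue one whole datagram at a time."""
    while True:
        packet, _addr = sock.recvfrom(MAX_DATAGRAM)
        q.put(packet)

def packet_parser(q: queue.Queue, outfile) -> None:
    """Consumer: block until data is available (like Dequeue with timeout -1)."""
    while True:
        packet = q.get()
        outfile.write(parse(packet))

# --- Pattern 2: the sequential replacement, no queue handoff --------------
def listen_and_parse(sock: socket.socket, outfile) -> None:
    while True:
        packet, _addr = sock.recvfrom(MAX_DATAGRAM)
        outfile.write(parse(packet))

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))
    with open("stream.bin", "wb") as outfile:
        listen_and_parse(sock, outfile)   # the pattern we ended up with
```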

 

I'm not sure what other bombs might be lurking in our code.  Our listener and parser code hasn't hung lately, but the problem has started to move on to other loops.  They'll run for hours and then just stop.  Dead.  Frozen.  In the most recent cases, even the abort button won't shut them down; we have to use Windows Task Manager to kill them.  I'll admit to harboring a deepening skepticism about the more clever and powerful "features" NI has added to LabVIEW.  From my perspective, these features cannot come free of cost -- they must impose some computational burden on LabVIEW itself, a burden it seems unable to handle, with unpleasant consequences.  Must we impose a moratorium on Timed Loops?  Event structures?  To what level of simplicity must I drive our code to ensure stability?

 

Thanks, everyone, for tolerating my frustration, and for any guidance you can offer.

 

-dave sprinkle

Message 3 of 6

Dave,

 

As Mark Yedinak said, diagnosing the problem without seeing the code is quite difficult.

 

From some of the things you have said, I have some suspicions.  Any front-panel update requires interaction with the OS.  Property nodes force execution into the UI thread, and many UI operations block until the OS responds; that probably accounts for your 8-16 ms timing.  A human user cannot respond to anything changing on the front panel in less than about 100 ms, so it does not make sense to update the indicators more often than that.  The loop(s) containing the indicator terminals and the property nodes should probably run at no more than 5-10 Hz.
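In text-language terms, the idea is to decouple acquisition from display: the fast loop only overwrites the latest value, and a separate slow loop pushes it to the indicators.  A rough Python sketch, where update_indicator is a hypothetical stand-in for writing a front-panel terminal:

```python
import threading
import time

latest = {"value": 0.0}     # shared slot holding only the newest sample
stop = threading.Event()

def acquisition_loop() -> None:
    """Fast loop (~90 Hz): does the real work; only overwrites the latest value."""
    while not stop.is_set():
        latest["value"] += 1.0          # placeholder for parsing/measurement
        time.sleep(1 / 90)

def ui_loop() -> None:
    """Slow loop (10 Hz): the only place indicators or property nodes are touched."""
    while not stop.is_set():
        update_indicator(latest["value"])
        time.sleep(0.1)                 # 10 Hz is plenty for a human observer

def update_indicator(value: float) -> None:
    print(f"indicator: {value:.0f}")    # stand-in for a front-panel write

threading.Thread(target=acquisition_loop, daemon=True).start()
ui = threading.Thread(target=ui_loop, daemon=True)
ui.start()
time.sleep(2)                           # let it run briefly, then shut down
stop.set()
ui.join()
```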

 

I have never heard of queues creating the kinds of problems you are attributing to them.  "The packet parser blocks on data available in the queue and subsequently writes the data to file." How is it blocking? Is it waiting for data to dequeue or is the parsing blocking after the dequeue? How do you handle partial packets?  How often do you write to the file and how much data do you write each time?  I suspect that something simple could be changed here to improve the performance.  Can you show the packet parser code?

 

Again, I have not heard of frames causing problems like this, but I rarely use them.  Could you make clusters of the groups of indicators?  A cluster would have a similar appearance to the frames without the performance hit.  Of course, you cannot have both controls and indicators in the same cluster.

 

Have you checked for memory leaks? Code that freezes after hours is often a hint that a memory leak is occurring somewhere.

 

What version of LV and what OS are you using?

 

Lynn

Message 4 of 6

Thanks, Lynn, for your insight.  In response:

 

Yes, I understand the human response time implication.  We're running the indicator-update loop at 20 Hz.  When the frames were in place, the numbers would update at roughly 1 to 2 Hz.  The exact frequency is difficult to quantify, but it was clear that the indicators were updating at nowhere near the rate of the update loop.  When we removed the front-panel frames, the numbers flickered faster than I could perceive.  The improvement in front-panel update rate is striking: with frames, indicators would blink, blink, blink; without frames, they were a blur.  Qualitative, but undeniable.

 

We are calling the property nodes only when necessary; tens or hundreds of seconds may pass between calls.  My aversion to them isn't solely their time expense, but that their use seemed to coincide with the loop lock-up behavior.

 

The parser loop blocked on data available in the queue, using the -1 ("wait forever") timeout.  When data was retrieved, it was immediately parsed and written to file.  Data was enqueued and dequeued one complete UDP packet at a time.  Although I wrote code for handling a partial packet, to date it has never been exercised; the code has always retrieved exactly one packet, correctly aligned.  Because the parser loop blocks on available data, it should in theory execute at the UDP frequency, which is 90 Hz.

 

Thanks for the suggestion about memory leaks.  We'll investigate.  However, when watching Windows Task Manager, the memory use trace seems fairly constant.  At the risk of seeming naive, wouldn't a memory leak cause an upward trend in the trace?
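In case it helps, here's the sort of crude monitor we could leave running overnight to capture a trend rather than eyeballing Task Manager -- a minimal Python sketch using the psutil package.  The process name "LabVIEW.exe" is an assumption; check what Task Manager actually shows:

```python
import time
import psutil

def find_labview() -> psutil.Process:
    """Locate the LabVIEW process by executable name (assumed 'LabVIEW.exe')."""
    for p in psutil.process_iter(["name"]):
        if p.info["name"] == "LabVIEW.exe":
            return p
    raise RuntimeError("LabVIEW process not found")

proc = find_labview()
with open("labview_rss.log", "a") as log:
    while True:
        rss_mb = proc.memory_info().rss / (1024 * 1024)
        log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')}\t{rss_mb:.1f} MB\n")
        log.flush()
        time.sleep(60)   # one sample a minute; a leak shows as a steady climb
```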

 

We're running LV11 and Windows 7 64-bit.

 

I'll make one last speculative query: we are running a video card that supports three monitors, and I have noticed some very flaky behavior in this card's software, even outside the LV environment.  (As an example, we needed to uninstall and reinstall the driver just to add or change a monitor.)  On one occasion, while our LabVIEW code was running, one of our monitors was powered down.  The video card detected this and attempted to resize all of the windows to fit onto two screens; everything blinked a couple of times, and when the displays came back up, our LV loop had hung.  Coincidental?

 

Is it possible that a buggy interaction between the OS and the video card could hang our LV loop?

 

Thanks again.

-dave sprinkle

Message 5 of 6

Dave,

 

There have been reports of video drivers misbehaving with LV, so that is certainly worth investigating.

 

How do you know that the UDP listener is actually enqueuing data?  Is it possible that the queue reference was released?  Do you see any errors in either the enqueue or dequeue loops?  If so, what are they?  In the parser loop, try putting a timeout, perhaps several hundred milliseconds, on the Dequeue Element.  If it times out, use Get Queue Status to see whether anything was put into the queue.
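In rough Python terms, the diagnostic looks like this, where get() with a timeout plays the role of Dequeue Element and qsize() the role of Get Queue Status (the 500 ms timeout is just a suggestion):

```python
import queue

def parser_iteration(q: queue.Queue, outfile) -> None:
    try:
        packet = q.get(timeout=0.5)     # Dequeue with a ~500 ms timeout
    except queue.Empty:
        pending = q.qsize()             # like Get Queue Status
        if pending > 0:
            # In LabVIEW, this combination would suggest the dequeue itself
            # is wedged even though the producer is still enqueuing.
            print(f"dequeue timed out with {pending} packets waiting")
        else:
            print("queue is empty -- check the listener loop and its errors")
        return
    outfile.write(packet)
```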

 

How often do the failures occur?

 

Lynn

Message 6 of 6