The SW team says NI Forums can't help - prove them wrong!!

Ruler · ‎08-17-2022

Situation:

Large industrial system using multiple cDAQs with many widgets and doohickies attached. Several digital IO modules, several analog IO modules, RTD modules, etc. Also some networked serial IO (RS232 Modbus and TCP/IP Modbus).

PC running Windows and LabVIEW 2018 or 2020 (same issue with either version).

Experienced SW team: 2 less-senior with a couple of years of LabVIEW each and 2 more-senior with many years of LabVIEW experience. These are smart, hard-working people.

6 loop producer/consumer setup.

Loop 1

gets data from cDAQs and network - runs on a timer - 2 Hz (0.5 seconds)
places data into input queue to loop 2
This loop runs like a champ

Loop 2 - 6

run when data arrives on input queues (nominally 2 Hz)
process data, set outputs, update UI, read UI, log to disk, etc
Each loop feeds queue to next loop which drives execution. 6 loops at 2 Hz means a data point and resultant outputs take 3 seconds to work their way through the whole program.

Program runs well for minutes, hours, days, or weeks. Queues all stay essentially empty. Until it doesn't....

Problem:

On occasion, we see an excessive slowdown in loops 2-6 and the queues begin to accumulate data. Loop 1 runs like a champ but the input queue for loop 2 can grow to thousands of records. Timing on all of loops 2-6 is slow and erratic, even though they should run whenever there is data on their input queues. During this time, labview.exe processor usage is roughly double normal. Overall processor usage is not drastically changed. Memory use increases as you would expect with growing queues, but nothing stark.

Then it self-corrects. The labview.exe processor usage goes even higher as all the loops suddenly get busy processing the old data in the queues, and all the queues slowly get back down to empty. Then it runs normally...until it doesn't.

Timing of these events does not correlate to time since PC reboot, or time since LabVIEW was started, or anything else. They can occur weeks apart or minutes apart. The events can last from 15 seconds to 2 hours (thousands of queued records).

We have a ton of data from logging execution times within the loops, using the NI Trace Tool Kit, using the LabVIEW Performance and Memory profile monitor, logging processor, memory, and network statistics with external programs, etc. Those different sources of data provide conflicting input that prevents meaningful progress. (Example: the NI Performance and Memory profile monitor says Loop 1 runs more quickly during these events, but monitoring of serial traffic from that loop shows timing is actually slightly degraded. Logging of VI processing times shows some subVIs slow down significantly, while the NI Performance and Memory profile monitor shows no change to those VI times.)

Request:

I am not requesting the NI forums to tell me what is wrong (I wouldn't believe you if you did 😊 ). I am requesting that those of you with experience with something similar in scale and complexity assist with suggestions for research and testing that will lead us to what is wrong. What tools (aside from those listed) and methods can be employed to identify the change that creates, then resolves, this issue? I am looking for root cause, not a guess at a solution.

Thank you in advance for any help you can provide.

santo_13 · ‎08-17-2022

Glad to see a post where the OP is willing to get their hands dirty and do the work rather than rely on forum members to do the work for them.

Please provide more information on why there are 5 loops (2-6) for the listed tasks. Is it like one loop per task?

A few things that I can think of:

Some VI used across all 6 loops somehow becomes a deadlock
Do you use Notifiers and wait on notifications?
What kind of data processing do you perform?
Do you log to a network location?
Does your application rely on any internet connection?

Santhosh
Soliton Technologies

New to the forum? Please read community guidelines and how to ask smart questions

Only two ways to appreciate someone who spent their free time to reply/answer your question - give them Kudos or mark their reply as the answer/solution.

Finding it hard to source NI hardware? Try NI Trading Post

Ruler · ‎08-17-2022

Santo_13:

To your questions:

Please provide more information on why there are 5 loops (2-6) for the listed tasks. is it like one loop per task? - Pretty much. Get data, process data, get controls input, generate control outputs, update GUI, logging and alarms

Some VI used across all the 6 loops somehow become a deadlock - interesting idea. will explore
Do you use Notifiers and wait on notifications? - no, queues are primary transport
What kind of data processing do you perform? - algebra and logical operators. We're not doing image processing or factoring primes or anything like that
Do you log to a network location? - no, HDD only, and not a lot of data going to disk, think 1 kB/s
Does your application rely on any internet connection? - nope

SlippinJimmy85 · ‎08-18-2022

Here are some suggestions/questions/considerations:

- Does this behavior happen in the EXE only or also in LabVIEW?

- Try to disable loops 2-6. I mean: disable all of loops 2-6. If the problem disappears, enable only loops 2-4, and so on, until you find whether a specific loop is the root of your issue.

Use disable structures or case structures to disable loops and, of course, be careful not to leave queues that have producers but no consumers.

- Try to disable the content of loops 2-6. I mean remove all the logic and operations. Keep the data transfer only (enqueue and dequeue functions).

- Are you sure that your HDD is OK? Maybe something gets stuck in HDD operations and this makes one of your loops slower. You can try to run your EXE on another PC or simply disable HDD operations.

t.n14 · ‎08-18-2022

I'm not a very experienced LabVIEW programmer, but I am a gamer 😉 So here are some ideas about random performance loss:

Is your Windows set up for performance? For example, by deactivating unnecessary features like weather, news, Cortana, ...

I know Windows updates and antivirus can have a massive impact on performance.

Mark_Yedinak · ‎08-18-2022

I would get network traces of your TCP-based communications to see if you are encountering delays in your communications. As suggested, see if you have common subVIs that are being called in multiple loops. If these VIs are not reentrant, your parallel loops will effectively run synchronously whenever they call them. This could have a big impact if it is something like slow network communications or slow file access. What external factors are occurring when you see these delays? Are there virus scans or other network security tools (for example port scanners) running at the time? Is some type of background maintenance happening on the PC? There is definitely some external force that is causing this. Fundamentally you know the underlying code is sound since it can run for long periods without issue.
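As a rough illustration of the reentrancy point, here is a minimal Python sketch (not LabVIEW, and not the OP's code). A lock stands in for a non-reentrant subVI, and three threads stand in for parallel loops. The class name `SharedSubVI`, the `work_s` delay, and the thread count are all hypothetical.

```python
import threading
import time


class SharedSubVI:
    """Stand-in for a subVI; reentrant=False allows one caller at a time."""

    def __init__(self, reentrant, work_s=0.05):
        # A non-reentrant VI behaves as if every call had to take one shared lock.
        self._call_lock = None if reentrant else threading.Lock()
        self._work_s = work_s
        self._count_lock = threading.Lock()
        self._active = 0
        self.max_concurrent = 0  # highest number of simultaneous callers seen

    def _body(self):
        with self._count_lock:
            self._active += 1
            self.max_concurrent = max(self.max_concurrent, self._active)
        time.sleep(self._work_s)  # pretend this is a slow file or network call
        with self._count_lock:
            self._active -= 1

    def call(self):
        if self._call_lock is None:
            self._body()
        else:
            with self._call_lock:
                self._body()


def run_parallel_loops(subvi, n_loops=3):
    """Call the shared subVI once from each of n_loops threads; return elapsed seconds."""
    threads = [threading.Thread(target=subvi.call) for _ in range(n_loops)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

With `reentrant=False`, `max_concurrent` never exceeds 1 and the elapsed time is roughly the sum of the individual calls, so one slow call stalls every loop that shares the VI.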

I had an issue where I needed to collect processing times for an application I developed, and I implemented a simple process performance class that I used to time my state machines. Something like this may be helpful to you as well. The VI profiler may not count time for a subVI if it is effectively idle and waiting for a timeout. Specifically, I am thinking about a dequeue or a TCP Read operation. For example, does the profiler include the wait period for a dequeue operation when no data is available? I suspect it does not. Implementing your own profiler would include this time. This way you would be able to identify where the delay is occurring and have a better chance at resolving the issue.
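A home-grown timer along these lines might look like the following Python sketch. It is only an illustration of the idea, not the class described above: `LoopTimer`, `run_once`, and the injectable `clock` parameter are hypothetical names. The point is that time spent blocked on the dequeue and time spent processing are recorded separately.

```python
import queue
import time


class LoopTimer:
    """Records, per loop iteration, time blocked on the queue and time spent working."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock   # injectable so the timer can be tested with a fake clock
        self.records = []     # list of (wait_s, work_s) tuples, one per iteration

    def run_once(self, in_queue, process, timeout_s=None):
        t0 = self._clock()
        item = in_queue.get(timeout=timeout_s)  # blocking wait, like a dequeue with timeout
        t1 = self._clock()
        result = process(item)
        t2 = self._clock()
        self.records.append((t1 - t0, t2 - t1))
        return result
```

Logging both numbers for every loop shows whether an iteration was slow because it was waiting for data or because its own processing took longer, which a profiler that skips idle waits may not reveal.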

Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot

santo_13 · ‎08-18-2022

@Mark's input reminded me of a similar scenario where a company had scheduled a centralized backup to run at midnight, which took up the majority of the processing power and slowed down the data acquisition application.

Santhosh
Soliton Technologies


Frozen · ‎08-18-2022

"6 loops at 2 Hz means a data point and resultant outputs take 3 seconds to work their way through the whole program"

Maybe I am not following your logic...

Loop 1 is running at 2Hz
Loop 2 executes on receiving data from loop 1.

Is loop 2 the only recipient of said data?

What is the execution time of loop 2? ms? seconds? minutes?

Are loops 2 to 6 parallel or just serial?

Loops 2 to 6 must be executing faster than 2 Hz or else you would *always* have a piling up queue. Therefore, the "resultant outputs" should be no more than 2Hz.... unless you get into the situation that you are trying to debug.
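A small Python sketch may make this arithmetic concrete. It assumes the loops are strictly serial and each one starts as soon as its input arrives and it is free. The function name and the service times used with it are hypothetical, not measurements from the OP's system.

```python
def pipeline_latencies(period_s, service_s, n_items):
    """Latency of each item through serial, queue-driven stages.

    period_s  -- interval between items produced by loop 1
    service_s -- processing time of each downstream loop, in order
    n_items   -- number of items to push through
    """
    stage_free_at = [0.0] * len(service_s)
    latencies = []
    for k in range(n_items):
        arrival = k * period_s
        t = arrival
        for s, service in enumerate(service_s):
            start = max(t, stage_free_at[s])  # wait if the stage is still busy
            t = start + service
            stage_free_at[s] = t
        latencies.append(t - arrival)
    return latencies
```

For example, `pipeline_latencies(0.5, [0.01] * 5, 4)` gives a constant latency equal to the sum of the five service times, while making one stage slower than the 0.5 s period, as in `pipeline_latencies(0.5, [0.6], 3)`, makes the latency grow with every item, which is the piling-up queue.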

Oops... none of this is helping you debug your program.

---------------------------------------------
Former Certified LabVIEW Developer (CLD)

Kyle97330 · ‎08-18-2022

Is there any chance there's some combination of data that queue 2 gets at one time or in a row that puts it into an odd state?

You could add code to queue 2 that does something like:

* Each time it gets a message, add the message to a rolling log of the last N messages received (maybe 100 or so, whatever seems to make sense).

* Each time it gets a message, it looks at the queue to see how many messages are pending. If it's over some amount that seems unreasonable to stack up, it triggers.

* When it triggers, it saves the messages in the rolling log that it got to disk somewhere, but then doesn't trigger again until the queue drops back to a reasonable level and then goes up again

* At a later time, you can load up these rolling message logs, feed them to an offline version of the code with no hardware that won't send any results to your production alarms or logs or whatever

* See if rerunning that sequence of messages also triggers a slowdown that you can repeat

* If it does, then check your algorithms for what's gone wrong
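The recording half of the steps above could be sketched in Python roughly as follows. This is an illustration only: the class name `BacklogRecorder`, the thresholds, and the in-memory `snapshots` list are assumptions, and a real version would write each snapshot to disk instead.

```python
from collections import deque


class BacklogRecorder:
    """Keeps the last N messages and snapshots them when the queue backs up."""

    def __init__(self, history_len=100, high_water=20, low_water=2):
        self._history = deque(maxlen=history_len)  # rolling log of recent messages
        self._high = high_water   # pending count that counts as "unreasonable"
        self._low = low_water     # pending count at which the trigger re-arms
        self._armed = True
        self.snapshots = []       # stand-in for files written to disk

    def on_message(self, message, pending_count):
        """Call once per dequeued message; returns True if a snapshot was taken."""
        self._history.append(message)
        if self._armed and pending_count > self._high:
            self.snapshots.append(list(self._history))
            self._armed = False   # do not fire again during the same event
            return True
        if not self._armed and pending_count <= self._low:
            self._armed = True    # queue has drained, so allow the next trigger
        return False
```

The gap between `high_water` and `low_water` keeps one long backlog event from producing a snapshot on every message. The saved snapshots can then be replayed into an offline copy of the processing code.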

Ruler · ‎08-19-2022

I've prepped a more detailed description of the events that includes some sanitized data, but I am still working on permission to post it.

There have been some good ideas presented, some of which I have data for already and will provide.

Kyle, I love the idea of recording and playback to see if the data in the queue is somehow creating the issue. I have considered that corrupt data comes in and creates some issue, but I haven't had much traction with getting that analyzed. By the time the data is written to the log, it has been cleaned up. I ran traces on the serial port data to see if any data from serial port devices was corrupt but didn't find anything.
