Unknown lockup cause

Chris.Osiecki · ‎04-29-2008

I have an issue that I would like some help with. I have a status monitoring program that will stop working after random periods of time.
    There are 4 mail parts to the program. There is a DAQ loop which acquires 16 analog channels at 100ks/S as 16bit integers. One channel is a 5V signal that acts as a trigger. The data is collected continuously in 50 ms blocks and passed via a named queue to a data handler program. The data handler uses the trigger signal to build data blocks that contain integer number of triggered blocks. The trigger rate runs from 0 to 1kHz. The data handler will only build blocks up to 1.5 seconds (667 mHz). The triggered data blocks are as long as 1.5 seconds, but only as short as 50ms. When the trigger rate is high enough, the blocks are built to contain multiple triggers in a 50ms block. The data handler also builds 1 second data blocks that are untriggered. The output of the data handler is two queues. One for triggered data and one for 1 second timed data. The DAQ and data handler are built in such a way that they have to run together, but will run indefinitely and use a minimal of processing (~5% on a 2.4GHz Core2Duo). The data handler also send out notifications when it outputs data blocks.
    This data feeds independently operating (individually called sub routines) analysis routines that push their results to a UI loop using references. The structure of the analysis subroutines are all very similar: A combination of Wait on Notification and Wait (ms) controls timing and then Preview Queue Element is used to get data. The data is processed and the results are passed to the UI via a reference. There is one analysis routine that does processing that many other functions use. It also sends out its own notifier to let functions that need its data know there is new data. Since its results are small (5x7 DBL array) and I use notifiers to prevent race conditions, I use a global variable. The UI is very large. It covers a 1920x1200 screen and a 1600x1200 screen. There are 9 analysis processing subroutines that run and one program for monitoring the sub-programs. All total, there are 15 sub-programs running. I was very careful in the construction of my programs, so every array is preallocated and index values are used instead of reallocating space. All buffer allocations occur at the initialization of subroutines instead of happening when sub-vi's are called. When the program is running, utilization runs between 30-45%. Also, memory use is stable and does not increase as the program runs.
    The problem I have is that the program will go from 40% utilization to 100% after some random period of time and this will cause the DAQ buffer to overflow and the loop will crash. Without DAQ, the processing is pointless, though the functions are still running; they are just waiting for new data. This problem will occur anywhere from 2 minutes to 14 hours after the program is started. Sometime the useage jumps to 100% for just a few seconds and drops back before the DAQ buffer overflows. Then things just keep right on going. I have run the profiler and when this happens, it seems to be a different sub vi each time that has an exceptionally long run time. Its something to do with everything running together because when this doesn't happen with just the DAQ and data handler, or the DAQ, data handler and any one or two sub-routines.
    The DAQ and data handler have to ties to any of the other functions. All their output is either queues, notifiers, or globals. The queues are handled internally and sub-routines are not required to dequeue to prevent the queues from filling up. Also, the globals are loop counters and boolean status values, no data arrays are being passed. The notifier is simply a boolean as well.
    Any thoughts will be greatly appreciated. I'd like to get this stable so I can focus on improving the analysis, rather than stability issues. I will be as helpful as I can if you need any more info for troubleshooting. I may have been as clear as mud about how this thing works.
Thanks,
Chris

TWGomez · ‎04-29-2008

Is this part of a Dean Koontz novel???

Posting the vi might help...even though it sounds like a rather large complicated program...

Preview Queue Element won't remove it from the queue, so eventually you will have mem problems.

If the CPU is being pegged then you might be needing (missing) a wait function somewhere.

The Wait(ms) and Wait on Notification in same loop seems redundant...but who knows...maybe there's a good reason for it...

Are there other programs running, such as virus scans???They can always be a problem...

Message Edited by TWGomez on 04-29-2008 02:25 PM

________________________________________________________

Use the rating system, otherwise its useless; and please don't forget to tip your waiters!
using LV 2010 SP 1, Windows 7
________________________________________________________

Chris.Osiecki · ‎04-29-2008

Sorry for the long post. There are tons of posts where people just say they are having a problem, but give no details. To respond though:

There would be a lot of work involved with posting the vi. I can do parts, but not the whole for various reasons.

The data handler dequeues the data from the DAQ queue. It also dequeues elements from its own data queues before enqueueing new ones. I have to do it this way since I have multiple independent functions that need data from the same output queues.

I use both the Wait(ms) and Wait on Notification because the data handler runs at 20Hz, but most of the sub functions run at slower rates (1-5Hz) This way those functions run on the most recent data, but at a slower rate. It usually goes Wait on Notification >> Perform analysis >> Wait for 250ms

There may be a virus scanner of on the system. I will check. Other than that though, LabVIEW is the only thing running. Its a dedicated status monitoring system.

TWGomez · ‎04-29-2008

COsiecki wrote:

I use both the Wait(ms) and Wait on Notification because the data handler runs at 20Hz, but most of the sub functions run at slower rates (1-5Hz) This way those functions run on the most recent data, but at a slower rate. It usually goes Wait on Notification >> Perform analysis >> Wait for 250ms

The Wait on Notification will wait as long as it needs to though....

15 tasks is quiet a few, but I don't think it should really be a problem...

When you first start things going how much CPU time does each one soak up? Does the task that monitors the subtasks have a potential to lock up in any manner? what exactly is it doing? I assume you have made sure any and all loops have at least a small ms wait in them?.?.?.?.?

________________________________________________________

Use the rating system, otherwise its useless; and please don't forget to tip your waiters!
using LV 2010 SP 1, Windows 7
________________________________________________________

johnsold · ‎04-29-2008

If you put a non-negative timeout on the Wait on Notification, then it will not wait forever (could a notification be missing?). Just check the Timed out? output to determine whether a valid notification was sent. It might be possible to use the timeouts for some or all of your delays in those places where you use notifiers or queues to transfer the data.

Lynn

Chris.Osiecki · ‎04-29-2008

There are some functions that don't need to run on every block of data, so I just wait while some of the data goes unused by that particular function. Once the wait is over, it will get the notifier on the next new block of data.

I have some startup delays so that everything isn't starting at once. Useage peaks at 70% or so, but for less than a second. Each of my sub routines writes its iteration counter value to a global. They also set a boolean true when they start and false when they stop. The task that monitors them reads the global counter values and status booleans every 500ms.

I'll verify about the waits, but I am pretty sure they do.

TWGomez · ‎04-29-2008

What does the function do though...It checks the interation globals, but for what purpose? Does it do something if they don't all match?

________________________________________________________

Use the rating system, otherwise its useless; and please don't forget to tip your waiters!
using LV 2010 SP 1, Windows 7
________________________________________________________

Joseph_Loo · ‎04-29-2008

There is a couple of things you might want to check out:

1. Is you machine controlled by an IT organization? If so, they may be sneaking things in the system cause a 100% utilization.

2. I remember somewhere, they mentioned that Labview uses a garbage collector (I could be wrong about this). If this is true, the peaks you are seeing could be a garbage collection phase.

3. You did not mention if the crashes is relatively uniform in time. If it is, you might want to try adding or removing memory on your computer to see if it affects the time between crashes.

Chris.Osiecki · ‎04-29-2008

The function just tells me how many times the loops have run and whether or not they are still running. It doesn't do anything else. I can use the rate of a program and the number of times it ran to tell me how long it had been running when it stopped. Since things run independently, I wouldn't have any other way of knowing what stopped when.

Chris.Osiecki · ‎04-29-2008

This computer is not connected to any network. The crashes are not uniform in time. They happen anywhere from a few minutes to many hours after starting the program. Memory useage reported by Windows never goes above 750MB. The system has 2 GB.

LabVIEW

Unknown lockup cause

Unknown lockup cause

Re: Unknown lockup cause

Re: Unknown lockup cause

Re: Unknown lockup cause

Re: Unknown lockup cause

Re: Unknown lockup cause

Re: Unknown lockup cause

Re: Unknown lockup cause

Re: Unknown lockup cause

Re: Unknown lockup cause