10-24-2019 05:33 AM
I have a large object-oriented application that performs laser machining on multiple machines in a factory. All machines run the same LabVIEW application and have identical software components installed on the PC (Win7, 64-bit LabVIEW, etc.).
All products consist of multiple lines that are machined by a motion system controlled via a .NET interface. Each line that we machine, around 200,000 per day, is read from file in a producer loop and buffered in a queue before being dequeued by a consumer loop and machined. The queue is never bigger than 10 lines because the producer loop queries the current queue size before deciding to read more lines.
This producer-consumer architecture is a basic LV building block and works well. The actual queue reference is initialized and stored in a Functional Global VI. When a line is read from file, the data is used to populate a CLUSTER, which is then passed to Enqueue Element. The reference wire from the Global goes first to GetQueueStatus, then to Enqueue Element. It works perfectly every day, but very occasionally, around once every 2 months, we see this error:
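For readers who do not have the block diagram in front of them, the architecture described above can be sketched in text form. This is only a rough Python analogue of the LabVIEW design, not the poster's code: the names `producer`, `consumer`, `run` and `MAX_BUFFERED` are made up for illustration, and the "machining" step is replaced by appending the line index to a list.

```python
import queue
import threading
import time

MAX_BUFFERED = 10  # assumed limit, taken from "never bigger than 10 lines"


def producer(lines, q, done):
    """Read lines and enqueue them, checking the queue size first."""
    for index, line in enumerate(lines):
        # Query the current queue size before deciding to read more.
        while q.qsize() >= MAX_BUFFERED:
            time.sleep(0.001)
        # The dict stands in for the CLUSTER populated from the file.
        q.put({"index": index, "data": line})
    done.set()  # signal that no more lines will arrive


def consumer(q, done, machined):
    """Dequeue lines and 'machine' them until the producer is finished."""
    while not (done.is_set() and q.empty()):
        try:
            item = q.get(timeout=0.01)
        except queue.Empty:
            continue
        machined.append(item["index"])  # placeholder for the motion call


def run(lines):
    """Run one producer and one consumer and return the machined indices."""
    q = queue.Queue()
    done = threading.Event()
    machined = []
    workers = [
        threading.Thread(target=producer, args=(lines, q, done)),
        threading.Thread(target=consumer, args=(q, done, machined)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return machined
```

Calling `run(["a", "b", "c"])` hands every line to the consumer in order. Note that both loops share one queue object here, which is the role the queue reference from the Functional Global plays in the LabVIEW version.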
"T-1-Get Queue Status in Cut.lvclass:EnqueueLine.vi->Cut.lvclass:RunProcess.vi->Rig_Operation_Main.vi"
Error 1 means an input is invalid. After the error occurs, I'm able to go back and check the text file reads okay and the CLUSTER populates. I'm therefore of the belief that the error occurs because the Queue Reference is invalid.
Can anyone offer suggestions as to why this might happen and how to fix the bug?
My code writes a text log file as it goes to aid debugging. It looks like this when the error happens:
Normal running:
20191021 162903Z EnqueueLine.vi Start
20191021 162903Z Enqueue LineIndex 429 Start Queue Size 18 End Consumer Queue Size 26
20191021 162903Z EnqueueLine.vi Finish
With error:
20190824 110326Z EnqueueLine.vi Start
20190824 110326Z Enqueue LineIndex 1041 Start Queue Size 0 End Consumer Queue Size 0 (T-1-Get Queue Status in Cut.lvclass:EnqueueLine.vi->Cut.lvclass:RunProcess.vi->Rig_Operation_Main.vi)
20190824 110326Z EnqueueLine.vi Finish (T-1-Get Queue Status in Cut.lvclass:EnqueueLine.vi->Cut.lvclass:RunProcess.vi->Rig_Operation_Main.vi)
10-24-2019 06:43 AM
The most likely situation is you have a case structure or Event Structure that is using the "Use Default If Unwired" tunnel option and you failed to wire through your queue reference. That feature should be turned off in 98% of the situations I have seen.
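As a loose text analogy only (LabVIEW is graphical, and the function name `pass_through_case` is hypothetical), an unwired tunnel behaves like a branch that forgets to return the reference it was given:

```python
def pass_through_case(queue_ref, has_more_lines):
    """Mimic a case structure whose output tunnel is wired in only one case."""
    if has_more_lines:
        return queue_ref  # this case wires the reference through
    # The other case leaves the tunnel unwired, so the caller silently
    # receives the default value (None here, an invalid refnum in LabVIEW).
    return None
```

Any code downstream that uses the returned value then fails only on the rare path through the unwired case, which would fit an intermittent error.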
10-24-2019 07:34 AM
The most likely cause is still some simple programmer error. But:
Queues and very rare instances of invalid refnums... I recalled that a former colleague of mine fixed a bug in G# years ago that had these symptoms. The bugfix note claims "LabVIEW queues ref nums seems to be reused every 4096 element."
If you cannot find anything, I can call him and ask him to elaborate on the issue.
10-24-2019 09:51 AM
Thanks for the suggestions. Here is a snippet of code to show how simple it is.
The Producer Loop:
Inside the Enqueue VI:
The queue reference comes from the yellow colored Functional Global Variable VI. At 200,000 lines per day between 9 manufacturing rigs, that's 73 million enqueues a year. We're seeing the fail around 6 times per year.
So far, it has only failed at LineIndex greater than 1000. The largest features machined are around 6000 lines.
You will notice I have put a case structure and a while loop in there. This is a recent attempt at reversion: if the error occurs, the VI will clear the error, re-read the reference and re-attempt the enqueue.
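The clear, re-read and retry logic described here might look roughly like the following in text form. This is a hedged Python sketch rather than the actual VI: `InvalidReferenceError`, `enqueue_with_retry`, `read_reference`, `enqueue` and the `max_attempts` limit are all assumptions made for illustration, and the post does not say how many retries the while loop allows.

```python
class InvalidReferenceError(Exception):
    """Stand-in for LabVIEW error 1 raised on an invalid queue refnum."""


def enqueue_with_retry(read_reference, enqueue, item, max_attempts=3):
    """Enqueue an item; on an invalid reference, re-read it and try again.

    read_reference: callable returning the current queue reference
                    (the role of the Functional Global).
    enqueue:        callable taking (reference, item).
    Returns the number of attempts that were needed.
    """
    ref = read_reference()
    for attempt in range(1, max_attempts + 1):
        try:
            enqueue(ref, item)
            return attempt
        except InvalidReferenceError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Swallow the error and fetch the reference again before retrying.
            ref = read_reference()
```

Returning the attempt count makes it easy to log how often the retry path is taken, which would show whether the underlying fault still occurs once the workaround is in place.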
10-24-2019 10:07 AM
It has also occurred to me that GetQueueStatus is called in the Consumer Loop as well.
If both the Producer and Consumer loops try to call GetQueueStatus at the same instant, I would imagine that both have their own reference to the call and the LabVIEW scheduler handles it all in the background. But maybe the potential exists for a bug.
10-24-2019 10:43 AM
I had the same error, but with a file reference, and it manifested itself in an edge case. It was not a programming error, probably an OS or intrinsic LabVIEW problem. If @rolfk is reading this, he would probably know the answer.
Below is the offending code.
Basically, I would write a special file header, close the file, then open it using an H5 library. Every now and then, I would get the invalid reference error, where my file ID was not valid, after the H5 Open File function. The error was telling me that the file I just closed before that function was not valid. Note the Close file function did not give any errors. According to data flow, everything should be valid.
This was an edge case. When I used the same exact files on an internal disk drive, I never got an error. I only got the error when I was using an external USB drive and had a lot of small files that I was converting, that is, going through the code above. For whatever reason, the latency over the USB bus was not taken into account. My workaround was to catch the error and retry the Open H5 function if there was an error. That is, after a small delay the invalid file reference suddenly became valid. This worked.
This has no relation to your problem, but if you are opening references on external devices, there may be some unaccounted latency. In my case, it was not really a programming error.
mcduff
10-24-2019 10:56 AM
Hi mcduff, thanks for the info. I'm not using any external storage that might cause a delay. However, your experience gives me hope that my reversion code will solve the bug.
10-24-2019 12:45 PM
A far-out idea that is most likely not the case, but just in case...
If an Action Engine (or any VI) is set as "subroutine", instances of the sub-VI will have a call option "Skip if Busy".
If the VI in question is configured to "Skip if Busy", there is a possibility that another thread is using the sub-VI and the configured instance will be skipped, in which case the skipped VI will return the "default-default" of any data returned by the sub-VI.
If the VI is not set as subroutine, forget about everything I just wrote.
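To illustrate the idea in text form, here is a small Python analogy of a call that is skipped when the callee is busy. It is not LabVIEW behaviour reproduced exactly: the `ActionEngine` class, `get_ref` method and `skip_if_busy` flag are hypothetical names, and a non-blocking lock stands in for the subroutine call option.

```python
import threading


class ActionEngine:
    """Toy stand-in for a Functional Global that hands out a queue reference."""

    def __init__(self, queue_ref):
        self._lock = threading.Lock()
        self._queue_ref = queue_ref

    def get_ref(self, skip_if_busy=False, default=None):
        """Return the stored reference, or `default` if skipped while busy."""
        # With skip_if_busy the call does not wait for the other caller.
        acquired = self._lock.acquire(blocking=not skip_if_busy)
        if not acquired:
            # Another caller is inside: the call is skipped and the caller
            # gets the default value instead of the real reference.
            return default
        try:
            return self._queue_ref
        finally:
            self._lock.release()
```

A caller that is skipped this way receives no error at the call site, only a default value, so the failure surfaces later when the invalid reference is first used.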
Ben
10-25-2019 04:36 AM
Thanks Ben. I searched for callers: the Enqueue VI is used in only one place; it's only a sub-VI to keep the diagram tidy. Soon the code with the reversion that re-reads the reference will be deployed to the factory floor. I'll report back if the bug remains.
06-18-2020 05:13 AM
I can confirm the bug has now gone. The reversion routine that re-reads the reference must be effective as we have had no more fails.