09-14-2021 07:12 AM
On behalf of Nightshade42:
Hi all,
Interesting issue when trying to send requests to a DQMH module from TestStand. Part of our software application is a single DQMH module that acts as an OPC client – it uses the OPC UA Toolkit and is compiled into a PPL. In TestStand, we are using the parallel process model as we have 18 custom test devices for testing UUTs simultaneously. The status of these test devices is sent to the OPC client module, which in turn sets tags on the local OPC server for use by the system PLC. We have a separate DQMH module as a driver for the test devices, which we clone for each test device. TestStand is configured to use the LabVIEW Run-Time Engine as the server. Other info: TestStand 2020 (64-bit), LabVIEW 2020 SP1 (64-bit), DQMH 5.
Now if we need to re-initialise all of these test devices (say, in response to a safety stop), they will all want to send an updated status to the OPC client module more or less simultaneously. Each test device DQMH module clone calls a TestStand sub-sequence with the updated test device status, which makes a request to the OPC client module. So if 18 test devices are initialised at the same time, all 18 test device modules call the same sub-sequence, each of which makes a request to the single OPC client, generating multiple events in the EHL of the OPC client module. This works okay for one or two test devices, but with more, the whole LabVIEW/TestStand environment seems to lock up for a few seconds while it sorts itself out. This lock-up behaviour is causing other issues elsewhere in the software, so we want to ensure there are no delays introduced when updating the OPC tags.
What is interesting is that if I skip the OPC client API request step in the TestStand sub-sequence, the whole software works fine (apart from not updating the OPC tags, obviously) and does not lock up. So calling the same sub-sequence multiple times from the test device modules is not the issue. If I enable the request step in TestStand but disable all functions in the EHL of the OPC client module (that is, the request event is generated in the EHL, but it doesn't functionally do anything at all, including enqueuing a message in the MHL), we still see the lock-up behaviour. So it seems that actually calling the DQMH API request multiple times from TestStand is the issue here; see the sketch below.
We have a workaround where we introduce different delays for the different test devices to spread the OPC client API requests out a bit, but we still see the lock-up.
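To make the fan-in clearer, here is a rough Python analogy of the architecture (not LabVIEW/TestStand code; the names and timings are made up for the sketch): 18 parallel callers each post one status update into the single OPC client module's queue, and in this analogy the callers return almost immediately even while the single consumer works through the backlog, which is roughly how I'd expect a plain DQMH request (with no wait for reply) to behave.

import queue
import threading
import time

requests = queue.Queue()            # stands in for the OPC client module's request/message queue

def opc_client_worker():
    """Single consumer: handles one request at a time, like the module's MHL."""
    while True:
        device_id, status = requests.get()
        time.sleep(0.01)            # pretend to write the OPC tag
        requests.task_done()

def test_device(device_id):
    """One of the 18 clones: fires a status update and returns immediately."""
    requests.put((device_id, "INITIALISED"))

threading.Thread(target=opc_client_worker, daemon=True).start()

start = time.time()
callers = [threading.Thread(target=test_device, args=(i,)) for i in range(18)]
for t in callers:
    t.start()
for t in callers:
    t.join()
print(f"all 18 requests posted in {time.time() - start:.3f} s")   # callers are not blocked by the consumer
requests.join()

In our system the callers do appear to get blocked somewhere, which is the part I can't explain.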
Has anyone else seen something similar and have a solution?
09-14-2021 09:42 AM - edited 09-14-2021 09:42 AM
Hi,
From what I understand, things are struggling between the moment the request is fired and the moment it is handled (i.e. the event is dequeued from the event structure's FIFO).
Here are a few questions to help narrow down the problem:
From my understanding, the problem happens on the LV side. Can you modify your tester to reproduce the behaviour? (Add a button that fires your request N times in a row.)
09-14-2021 05:25 PM
Hi Cyril, thanks for the response.
1. The Request VI is re-entrant. I've run the module through the DQMH validation tool and also checked the VI properties myself.
2. Does the deadlock happen on the request or on the event? I don't know; I will look at it today. The Event Inspector would be nice to use, but this behaviour is on the deployed system using compiled PPLs, so I don't think I'll be able to use it.
3. Module EHL only enqueues a message to the MHL - nothing else. Even if the enqueue message is disabled, I see this lock-up issue.
I can reproduce the issue by re-initialising all of the test devices so that they all make the same request to this module, but I haven't tried anything else yet.
09-15-2021 09:45 PM
Hi again,
So I've found that it had nothing to do with the DQMH module or the OPC UA toolkit. It was simply that we were calling a VI too many times - the type of VI didn't matter.
I replaced the DQMH request VI in the TestStand sub-sequence with a very simple stand-alone VI (output = input + 1). This VI was not included in a PPL and was set to re-entrant. I ran it again and still had the issue. All of the test devices were calling this sub-sequence at the same time, which all called this VI at the same time. Too many VI calls and the software locked up. Again, if I skipped the VI it all worked without locking up, so I know it wasn't calling a sub-sequence multiple times that caused the problem; it was calling VIs inside the sub-sequence.
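Purely to illustrate one way that many quick, simultaneous calls can turn into a multi-second stall (a rough Python analogy, not a claim about what TestStand or the run-time engine is actually doing internally): if every call has to pass through some single shared gate, 54 "instant" calls still finish one at a time.

import threading
import time

shared_gate = threading.Lock()      # stands in for whatever single resource serialises the calls

def tiny_vi(x):
    """The output = input + 1 VI; trivially fast on its own."""
    with shared_gate:
        time.sleep(0.05)            # pretend each pass through the gate costs 50 ms
        return x + 1

start = time.time()
callers = [threading.Thread(target=tiny_vi, args=(i,)) for i in range(54)]
for t in callers:
    t.start()
for t in callers:
    t.join()
print(f"54 parallel calls took {time.time() - start:.1f} s")    # roughly 2.7 s even though each call is trivial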
09-15-2021 09:59 PM - edited 09-15-2021 10:00 PM
This is very surprising if the VI is re-entrant!
If so, it seems to be an issue related to TestStand, but I've never experienced that kind of behaviour.
This could be problematic for anyone using the parallel or batch model!
Could it be related to a sequence setting somewhere?
Maybe opening a support request with NI about this would highlight a problem somewhere.
09-15-2021 11:32 PM
Quite possibly. I also looked at all of the sequence properties, but couldn't find anything that would make a difference.
I can't really do anything about so many requests from the test device modules occurring at the same time. I counted them up: there were 54 calls to this VI in this sub-sequence in about 700 ms, plus another 18 calls to a different VI in a different sub-sequence at the same time.
I am developing a workaround now where I just write the values to TestStand StationGlobals. The OPC Client module polls those values regularly, and updates the OPC tags itself. This seems to work well, so I don't think this is a show-stopper issue any more, but it would be good to know how to resolve the original issue.
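To illustrate the shape of the workaround (a rough Python analogy, not TestStand code; report_status, opc_poller and the poll period are invented names for this sketch): the test devices just overwrite a shared status table, standing in for the StationGlobals, and a single poller periodically snapshots it and updates the OPC tags on its own schedule, so a burst of 18 updates is just 18 quick writes.

import threading
import time

latest_status = {}                  # device id -> last reported status (the StationGlobals stand-in)
status_lock = threading.Lock()
poll_period_s = 0.5                 # how often the OPC client module checks for changes

def report_status(device_id, status):
    """Called from each test device's sub-sequence: a quick write, no request round trip."""
    with status_lock:
        latest_status[device_id] = status

def opc_poller(stop_event):
    """Single poller inside the OPC client module: snapshots the table and writes the tags."""
    while not stop_event.is_set():
        with status_lock:
            snapshot = dict(latest_status)
        for device_id, status in snapshot.items():
            pass                    # here the real module writes the corresponding OPC tag
        time.sleep(poll_period_s)

stop = threading.Event()
threading.Thread(target=opc_poller, args=(stop,), daemon=True).start()
for i in range(18):
    report_status(i, "INITIALISED") # 18 near-simultaneous updates are just dictionary writes
time.sleep(1)
stop.set()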
As an aside, the DQMH API request VIs are reentrant, but Obtain Request Events.vi is not. I think I need to review my reentrancy knowledge: if the calling VI is reentrant, does that automatically make its subVIs reentrant as well?
09-16-2021 01:19 AM
I don't know the OPC-UA integration, but there is similar behaviour when working with .NET code. We therefore always use the following settings:
RunState.Engine.DoDotNetGarbageCollection(0)
Engine.DotNetGarbageCollectionInterval = 10000
Maybe you have to experiment a bit with the settings.
09-16-2021 01:25 AM
Thanks for the reply. We know it is definitely not an OPC-UA toolkit or DQMH issue.
I'll keep your settings in mind though.
09-16-2021 02:58 AM
SubVIs of a reentrant VI are not made reentrant.
You can check if this is your problem by modifying your VI to use a feedback node to cache the User Event that it gets from that non-reentrant subVI (since I don't think it ever changes).
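To spell out the caching idea in text form, here is a rough Python analogy (not LabVIEW code; the function names are invented): call the non-reentrant subVI once, keep its result, and hand the cached value back on every later call, which is essentially what the feedback node would do for you in each clone.

_cached_event_refs = None           # plays the role of the feedback node's stored value

def obtain_request_events():
    """Stand-in for the non-reentrant Obtain Request Events.vi (the shared, serialised call)."""
    return {"update_status": object()}   # pretend this is the user event refnum cluster

def get_request_events():
    """Wrapper used by the request VI: only hits the shared subVI on the first call."""
    global _cached_event_refs
    if _cached_event_refs is None:  # 'first call' behaviour of the feedback node
        _cached_event_refs = obtain_request_events()
    return _cached_event_refs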
09-16-2021 06:26 AM
Indeed, reentrancy does not apply to subVIs.
All the subVIs within a VI must also be re-entrant to achieve 'real' reentrancy.
The VI you call from TestStand is a request, which is re-entrant and only contains primitives and re-entrant VIs.
The VI you pointed out is only run once per module life cycle, so I doubt that one is the problem.
If I remember correctly, with shared reentrancy LV creates a pool of clones that are handed out to callers on request. When the end of the pool is reached, LV must reallocate memory to extend the pool for new callers. When a caller finishes, it leaves an empty slot in the pool which can then be taken by a new instance. I'm not sure about this, but I have it in a corner of my head that the pool grows by 4 instances at a time (each time you call a re-entrant VI and the pool is full, LV allocates 4 new slots).
You're definitely exceeding that number. Could the repeated pool allocation in quick succession be what takes the time?
Have you tried changing the reentrancy of the VI to preallocated clones, so each call gets its own memory space (this must be applied to all subVIs as well)?
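To make the pool idea concrete, here is a small Python sketch of how such a shared-clone pool might behave (purely illustrative, not how LabVIEW is actually implemented, and the grow-by-4 figure is only my recollection above): when every clone is busy, the pool has to grow by a chunk, and that allocation work lands on whichever caller arrives at the wrong moment.

class SharedClonePool:
    GROW_BY = 4                     # assumed chunk size, per my recollection above

    def __init__(self):
        self.idle = []              # clones ready for a new caller
        self.total = 0

    def acquire(self):
        if not self.idle:           # pool exhausted: allocate a new chunk before serving the caller
            self.idle.extend(f"clone_{self.total + i}" for i in range(self.GROW_BY))
            self.total += self.GROW_BY
        return self.idle.pop()

    def release(self, clone):
        self.idle.append(clone)     # a finished caller leaves a free slot for the next one

pool = SharedClonePool()
in_use = [pool.acquire() for _ in range(54)]    # the 54 near-simultaneous calls you counted
print(pool.total)                               # the pool had to grow well past its initial size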