11-24-2021 04:37 PM - edited 11-29-2021 11:02 AM
I am running into a strange issue that I am not sure how to troubleshoot or debug. I have a Main VI that has a static VI that launches Dynamic VIs.
I need to run headless, so I need to use the Embedded Runtime Engine.
First I wanted to make sure my code worked properly when compiled as a normal LabVIEW EXE. I wanted to test long-term running of the EXE: no memory leaks, and able to shut down and load the Dynamic VIs as needed. This worked great. I see no memory leaks after running for 24 hours.
The DAQmx based instrument uses a User Event Structure to provide N samples acquired into Buffer for the Read VI to be triggered. The data is then sent to a consumer Queue and if there is a client connection, the data is sent out over TCP. The data is converted from 2D to 1D interleaved in the consumer, then flattened to string with Size = F. The data dequeue and data conversion happens whether there is a Client Connection or not, but the TCP Write VI is not called if there are no clients.
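For anyone writing a client for this stream, here is a minimal Python sketch of the 2D to 1D interleave and flatten step described above. It is not the LabVIEW code itself. It assumes the samples are float64, that the flattened string is big-endian (the LabVIEW default for Flatten To String), that "Size = F" means no array size is prepended, and that interleaving is sample by sample across channels. The function names are hypothetical.

```python
import struct


def interleave(channels):
    """Convert a 2D list (one row per channel) into a 1D interleaved list.

    Assumed order: ch0[0], ch1[0], ..., ch0[1], ch1[1], ...
    """
    n = len(channels[0])
    if any(len(row) != n for row in channels):
        raise ValueError("all channels must have the same number of samples")
    return [row[i] for i in range(n) for row in channels]


def flatten_no_size(samples):
    """Pack samples as big-endian float64 with no length prefix (assumption)."""
    return struct.pack(">%dd" % len(samples), *samples)


def unflatten(payload, num_channels):
    """Client side: rebuild the per-channel rows from the raw payload."""
    count = len(payload) // 8  # 8 bytes per float64
    flat = struct.unpack(">%dd" % count, payload)
    return [list(flat[ch::num_channels]) for ch in range(num_channels)]
```

Because no size is prepended, the client has to know the channel count and packet length from elsewhere; the sketch takes the channel count as an argument for that reason.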
The Dynamic VIs and the support VIs they need (i.e. DAQmx VIs) are located in a Source Distribution. The Dynamic Launcher for both versions of the EXE loads the Dynamic VIs from the Source Distribution.
The issue appears when I compile my Main.vi into a Shared Object and then call that VI from a C executable so that it runs in the Embedded LabVIEW Runtime. The system loads the Dynamic VIs as designed. The DAQ instrument loads and launches properly. The data is passed over the Producer/Consumer Queue just fine. The data connection to the TCP Client works properly.
For both tests I use the same client and just leave it connected to stream the TCP data.
But the C EXE launched version running in the Embedded Runtime Engine crashes after about 3 hours with a "Double Free or Corrupt" memory error. I am not calling any kind of memory release function in my code of course. I am not building an array or concatenating any strings in the DAQmx VI. The Consumer Queue loop is keeping up just fine in both instances. The data conversion from 2D to 1D interleaved and then flatten to string all happen in 1/1000 of the time between data packets incoming from the Producer Loop.
CentOS 7.6 running Gnome
LabVIEW 2019 SP1 (x64 of course)
LabVIEW Runtime (for normally built LabVIEW EXE)
LabVIEW Embedded Runtime (for the Main.vi in the .SO and launched by C exe)
DAQmx is up to date
Normal LabVIEW EXE version:
This version of the EXE requires an X GUI in order to run.
Embedded LabVIEW Runtime version:
I am getting NO LabVIEW generated errors in my code. I am running using Syslog and each VI logs any error that might occur.
In my /var/log/messages file I will see the memory fault called out -
LabVIEW caught an Error
Double free or corrupt
Then the code will be aborted.
A memory dump appears to happen - but I do not know how to interpret this.
NI - what is the difference between the two run-time engines that would cause the same VIs to behave differently?
Is DAQmx fully tested in the Embedded LabVIEW runtime? That's one possible source of the issue I can think of...
Also occasionally in the Embedded Runtime version when I stop the DAQmx VI - I will also get Double Free or Corrupt fasttop error. All my opened references are closed in proper order, and I get no LabVIEW errors in closing my Queue ref or stopping my DAQmx task or releasing the User Event, etc. as the VI shuts down.
I get no unloading error when using the regular LabVIEW EXE/Runtime.
11-30-2021 11:56 AM
OK - there seem to be two issues.
I changed the code to remove a DVR read that was storing the DAQmx task. The read was a parallel read type with the DVR not being written back into the right side node.
The User Event Structure when triggered by the DAQmx Event would write the DAQmx Task and the Number of Elements into the DVR. The State Machine would transition to the Read Data state and the DAQmx task and Num Elements would be read from the DVR (of Variant, used as Attribute LUT).
To test if the read of the DVR of the VAR LUT was the issue, I moved the Read Data state into the User Event Structure.
This removed the Write into the DVR and the Parallel Access style READ from the DVR in the Read case. Now only the DAQmx Read and publish via Queue is in the User Event Case for the DAQmx Event. The read is also a MALLEABLE VI - and I had trouble with a Malleable not loading properly before in the dynamic launched VIs - maybe that contributed.
This is the code that I skipped calling in the Read Data case. It would pull out the Cluster of DAQmx Task and Number of Points. Ran fine for 3 hours, but seems to have been causing the SIGSEGV faults after that time. Only proof I have of that is that the code then ran for 12 hours and no SIGSEGV fault once I skipped this call.
The write that was in the User Event Structure could also have been the issue I suppose, although I added code to skip the write if the DAQmx Ref and sampleInterval were the same (which they should be), so it seems like the read was the more likely issue:
With this change the code ran for 12 hours in the C EXE form, up from 3 hours. That seemingly removed the SIGSEGV fault. I thought to try this since I saw a /var/log/messages entry about a Variant writing empty (or some such verbiage) around the prior crash where there was the SIGSEGV.
Again - none of this happens in the regular LabVIEW Runtime, only the Embedded.
This time the error captured by the system was a SIGBUS fault.
FileManager1: Reason:
FileManager1: LabVIEW caught a fatal signal
FileManager1: 19.0.1f1 - Received SIGBUS
After this the program seemed to still be running looking at the Resource Manager. The memory had not increased in the whole 12 hours - so there isn't a memory leak.
SIGBUS seems like there is a memory addressing issue on a read.
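To line up crashes with other log entries (such as the Variant message mentioned earlier), a small script can pull the fault lines out of the log. This is only a sketch: the two patterns are based on the messages quoted in this thread, the function name is hypothetical, and the real /var/log/messages entries may be worded differently.

```python
import re

# Patterns based on the messages quoted in this thread (assumption:
# the real syslog lines contain the same wording).
PATTERNS = {
    "signal": re.compile(r"Received (SIG[A-Z]+)"),
    "heap": re.compile(r"(double free or corrupt\w*)", re.IGNORECASE),
}


def find_fault_lines(lines):
    """Return (line_number, kind, matched_text) for every fault-like line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((number, kind, match.group(1)))
    return hits
```

Passing an open file object (for example the result of open("/var/log/messages")) as lines works, since the function only iterates line by line; the line numbers then make it easy to look at what was logged just before each fault.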
I'm not sure how to debug whether this is the DAQmx read that is causing the problem or the Queue used to pass from the producer to the consumer. I suppose I can cut out the Queue and not write the data anywhere after reading it from the DAQmx and that might narrow it down more.
Something is different between the normal LabVIEW EXE compiled version and the C EXE compiled into .SO version....as I have had no issue running this in the GUI version of the Runtime.
I suppose one way around this is to run the GUI based EXE Runtime, but then I would need to assign an X Display to the EXE on start-up - which might be possible. https://knowledge.ni.com/KnowledgeArticleDetails?id=kA00Z0000019RYlSAM&l=en-GB
12-02-2021 10:29 AM
More validating experimentation run last night.
I started the regular LabVIEW runtime engine EXE version from a terminal and assigned a display to it using >DISPLAY=:0 ./EXE_name
This version again has run for 24 hours with no issue, no crashes, no memory leaks. I was able to stop and close the dynamic DAQ instrument, and then reload as designed with no crashing. I'd rather not have to install a GUI on the system that this will be deployed onto for production use, but I guess that is one solution if we can't figure out what is happening with the Embedded Runtime.
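If the GUI runtime ends up being the workaround, the terminal launch with DISPLAY set could also be scripted so it does not depend on an interactive shell. A rough Python sketch, assuming an X server is already available on display :0; the function names and the executable path are placeholders, not anything from my actual build.

```python
import os
import subprocess


def build_env(display=":0", base_env=None):
    """Copy the environment and set DISPLAY, leaving the original untouched."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display
    return env


def launch(exe_path, display=":0"):
    """Start the LabVIEW-built EXE with DISPLAY assigned (path is a placeholder)."""
    return subprocess.Popen([exe_path], env=build_env(display))
```

Calling launch("./EXE_name") should then be equivalent to typing DISPLAY=:0 ./EXE_name at the prompt.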
12-22-2021 12:27 PM
I've run the normal LabVIEW EXE runtime for over 4 days and been able to shut down and reload the instrument with no errors.
The C EXE that calls the LabVIEW Main from the .SO to run in Embedded Runtime will only run up to about 24 hours.
I ran it last night and found that this morning it was still running, but when I issued the shutdown command the C EXE crashed with an ‘abrt-hook-ccpp’ SIGABRT crash.
The only thing I really haven't done yet is transition away from LVCLASS in the VI: instead of dynamic LVClass loading, just use regular nodes and forego the HAL aspect of the code.
The other option to try is going to LV2020.