I have a cRIO-based LabVIEW program that is intended to run for a long period of time. I'm monitoring the resident set size of the LabVIEW process using the method outlined here (i.e., cat /proc/$(pidof lvrt)/status | grep RSS). This shows a stable (or possibly very slowly increasing) amount of memory usage at ~300 MB. However, my program mysteriously trips my watchdog and crashes after ~2 days.
While investigating, I monitored the total amount of memory available with the property node method outlined here. The amount of memory remaining continually shrinks from >1.3 GB of memory an hour after my program starts to ~480 MB an hour before it crashes.
Does anyone know what a possible source for this discrepancy could be, or any logical next steps I can take in figuring this out?
I'm using a cRIO-9048 with LabVIEW 2018 and the NICRIO1810 drivers.
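One way to narrow down the discrepancy described above is to log the process RSS and the system-wide numbers side by side, so you can see which one is actually moving. Below is a rough sketch, assuming Python is installed on the target (not something stated in this thread). The function names parse_kb and snapshot are made up for illustration; VmRSS, MemAvailable, Shmem and Slab are standard Linux /proc fields, though MemAvailable needs a reasonably recent kernel.

```python
import re


def parse_kb(text, key):
    """Return the kB value from a 'Key:   12345 kB' line, or None if the key is absent."""
    match = re.search(rf"^{re.escape(key)}:\s+(\d+)\s*kB", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_file(path):
    with open(path) as f:
        return f.read()


def snapshot(pid):
    """Collect the process RSS and a few system-wide memory counters in kB."""
    status = read_file(f"/proc/{pid}/status")
    meminfo = read_file("/proc/meminfo")
    return {
        "vm_rss_kb": parse_kb(status, "VmRSS"),                # memory resident in the process
        "mem_available_kb": parse_kb(meminfo, "MemAvailable"),  # kernel's estimate of usable memory
        "shmem_kb": parse_kb(meminfo, "Shmem"),                # shared memory, includes tmpfs files
        "slab_kb": parse_kb(meminfo, "Slab"),                  # kernel-side allocations
    }


# Example (on the target): print(snapshot(<pid of lvrt>)) once a minute and
# append the result to a log file on persistent storage.
```

If MemAvailable keeps falling while VmRSS stays flat, the memory is going somewhere outside the lvrt process, and a growing Shmem or Slab value would hint at RAM-backed files or kernel/driver allocations respectively.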
Thanks for the ideas, k-waris. I don't get a particular error message; I just see (after the fact) in the logs that my cRIO rebooted itself, presumably after my watchdog timed out. I have some error management/reporting in place, but it is not completely comprehensive. I have some intermittent performance errors (dropping incoming Ethernet packets) that seem to get worse the longer my program runs.
I thought I was fairly diligent about closing all of my references, but it never hurts to double check. If stray references were the cause, do you know why this would reveal itself in decreasing system memory but not in an increasing LabVIEW process RSS? According to this page, "[RSS] memory monitoring is primarily useful for detecting memory leaks", so I would expect a problem to reveal itself here.
Unfortunately I cannot share my code publicly, which I realize limits the amount of help you can offer.
rtollert, I just restarted my program and will monitor /tmp for anything strange.
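For the /tmp check, here is a minimal sketch of how the directory's total size could be tracked over time. It assumes Python is available on the target and that /tmp is a RAM-backed tmpfs on this system (if so, files there consume system memory without showing up in the lvrt RSS). I haven't confirmed either assumption for the cRIO-9048, and dir_size_bytes is just an illustrative name.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files and links found under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so symlinks are counted as links, not followed
                total += os.lstat(path).st_size
            except OSError:
                # File vanished or is unreadable; skip it
                pass
    return total


# Example: log dir_size_bytes("/tmp") every few minutes alongside the
# free-memory reading and see whether the two trends mirror each other.
```

A steadily growing total here would point at something writing temporary files rather than at a leak inside the LabVIEW process.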
To know whether or not your watchdog rebooted your cRIO, it would be useful to know if the watchdog is dependent on certain parts of your code running or just a heartbeat between two devices or something else.
Since you've mentioned network traffic dropping in and out, I'll share a mistake I've made in the past: I put TCP/IP communication inside a timed loop on the RT target. That's a big no-no because of the jitter associated with TCP/IP communication. I was frequently dropping my network connection while it was in a timed loop, and if your watchdog depends on that communication, then maybe that is what's rebooting your cRIO.
If you want to confirm that it is the watchdog that timed out, there is a file you can check. Discussed here:
You may find that the file path given in the solution has changed slightly. Look for i2c.
One other thought: if it isn't already, the watchdog code should be in a loop of its own. I have seen code where the watchdog is serviced in a loop that also performs other functions, and it doesn't get petted in time.
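The idea of a dedicated watchdog loop is language-independent, so here is a text-based sketch of the pattern in Python rather than LabVIEW. Everything in it is hypothetical: start_watchdog_petter is an invented name, and the pet callable stands in for whatever actually services your watchdog. In LabVIEW the equivalent would be a separate while loop that contains nothing but the watchdog call and a wait.

```python
import threading
import time


def start_watchdog_petter(pet, interval_s, stop_event):
    """Run pet() every interval_s seconds in its own thread until stop_event is set."""

    def run():
        while not stop_event.is_set():
            pet()
            # wait() returns early if stop_event is set, so shutdown is prompt
            stop_event.wait(interval_s)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


# Example: the main loop can stall on slow work without delaying the pet calls.
# stop = threading.Event()
# petter = start_watchdog_petter(lambda: print("pet"), 1.0, stop)
# time.sleep(5)   # stands in for long-running application work
# stop.set()
# petter.join()
```

The trade-off is that a watchdog serviced this way only proves that its own loop is alive. If you want it to catch a hung acquisition or communication loop, the pet call needs to be gated on a health flag that those loops update.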