What conditions satisfy the "reboot into safe mode when crashed" behavior?

Daklu · ‎04-19-2017

First, this question actually pertains to a VxWorks board, not a Linux board. I'm posting here because there doesn't seem to be a relevant real-time community for VxWorks (and my question may be OS agnostic... I hope.)

I have an application written in LV2013 running on an sbRIO-9606 with CompactRIO 13.1 installed. Under certain conditions we do an automatic reboot using the RT Restart Target vi from rtutility.llb. Occasionally, the device will reboot into safe mode (improper installation) instead of run mode. (I'm rebooting every 3-4 minutes for the purposes of figuring this out. In production reboots may happen as often as every 20 minutes.)

I know we're not running out of storage space as was the problem with a previous user. I also know about the YouOnlyLiveTwice key, and using that seems to prevent the device from rebooting into safe mode, but I'd rather address the root problem (something unknown is triggering safe mode) rather than mask it (prevent it from entering safe mode when it thinks it should.)

What does the OS look for when deciding whether it should reboot into safe mode?

Daklu · ‎04-20-2017

@Daklu wrote:

Under certain conditions we do an automatic reboot using the RT Restart Target vi from rtutility.llb.

Quick note: We've replaced these VIs with the VIs from the session API. Rebooting into safe mode still occurs at unpredictable intervals.

BradM · ‎04-21-2017

Hi Daklu,

Since I'm less familiar with VxWorks targets, I passed along your question to some of the greybeards who have a bit more experience with it. Since the target is reporting "Improper Installation", the concern is that something is happening to the system that causes the actual runtime (or needed .out libraries) to fail to load. It's important to take note of the reason reported in safe mode to rule out other issues.

Universally, the recommendation was to enable serial console and try to capture what's going on when this issue manifests.

dkfire · ‎07-25-2017

Hey

Have you found any solution to your problem?

We seem to have the same setup: LabVIEW 2013, running an application on a cRIO-9074 that has to reboot once every day. From time to time it enters Safe Mode with a software failure.

We haven't been able to find a reason for it getting into Safe Mode after a reboot.

Daklu · ‎07-25-2017

We found a solution, but we never positively identified the root cause.

In our system a reboot can be triggered by the sysconfig Restart vi (if shutdown occurs smoothly) or by not whacking the watchdog (if shutdown doesn't occur smoothly.) We discovered some of the loops were not shutting down prior to the reboot occurring, leading us to believe the watchdog was causing the reboot.

I suspect when the watchdog reboots the system under certain conditions, the system thinks the software crashed. By design, when the system "crashes" twice in a row it reboots into safe mode.

However, rebooting into safe mode wasn't consistent behavior, implying the conditions for deciding the software crashed consist of more than just having the watchdog trigger a reboot. I don't know what those conditions are, or what conditions cause the crash counter to get reset. (e.g. Does pushing the reset button set the crash counter to zero?)

Ultimately we solved the problem by hardening our shutdown process and making sure all the loops stop execution properly.

-Dak

(We did follow Brad's suggestion and captured console output, but solved the problem before we were able to capture anything interesting.)
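
For readers who want a concrete picture of the shutdown hardening described above, here is a minimal sketch in Python rather than LabVIEW. It is not the code from the actual system: `worker`, `shutdown_then_restart`, and the `restart` callable are hypothetical names, and the timeout value is an arbitrary assumption. The idea is simply to signal every loop to stop, confirm each one has exited, and only then request the restart, so the reboot never depends on the watchdog expiring.

```python
import threading
import time


def worker(stop: threading.Event, period_s: float = 0.01) -> None:
    """Hypothetical application loop: runs until the stop event is set."""
    while not stop.is_set():
        stop.wait(period_s)  # stand-in for one iteration of real loop work


def shutdown_then_restart(stop, threads, restart, timeout_s=2.0):
    """Stop all loops, then restart only if every loop actually exited.

    Returns the names of loops still running after the timeout; an empty
    list means the restart was requested cleanly.
    """
    stop.set()  # ask every loop to finish
    deadline = time.monotonic() + timeout_s
    for t in threads:
        # Share one overall deadline across all joins.
        t.join(max(0.0, deadline - time.monotonic()))
    stragglers = [t.name for t in threads if t.is_alive()]
    if not stragglers:
        restart()  # hypothetical hook for the real restart call
    return stragglers
```

Returning the list of stragglers instead of restarting anyway makes it visible which loop failed to stop, which is the kind of information we were missing while chasing this.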

BradM · ‎07-26-2017

Even though it seems that this topic has reached a sensible conclusion, I just wanted to add information I've gathered to help others who may run into this sort of situation:

Daklu wrote:
...
I have an application written in LV2013 running on an sbRIO-9606 with CompactRIO 13.1 installed. Under certain conditions we do an automatic reboot using the RT Restart Target vi from rtutility.llb. Occasionally, the device will reboot into safe mode (improper installation) instead of run mode. (I'm rebooting every 3-4 minutes for the purposes of figuring this out. In production reboots may happen as often as every 20 minutes.)

I know we're not running out of storage space as was the problem with a previous user. I also know about the YouOnlyLiveTwice key, and using that seems to prevent the device from rebooting into safe mode, but I'd rather address the root problem (something unknown is triggering safe mode) rather than mask it (prevent it from entering safe mode when it thinks it should.)

NI Linux RT is a different beast altogether: if LabVIEW RT crashes (either due to an actual bug in LV or your application, or issues in libraries used by your application), the OS will not reboot. Instead, LabVIEW RT itself will be restarted (one of the benefits of a dual-mode OS). If this happens twice and you have configured a startup application (and have not set the "YouOnlyLiveTwice" token), LabVIEW will not load your startup application, since it's likely that something in the application is causing LVRT to crash.

VxWorks, being a single-mode OS, has no notion of a separate "process". LabVIEW RT is just part of the OS, the single process that is running on the system. The OS has a notion of threads, different sequences of execution with a shared memory space, and this is how LVRT on VxWorks (and Phar Lap ETS) provides multithreaded support. This is also why, if a LabVIEW RT application misbehaves on these targets, it will cause an OS-level exception: the OS can no longer make guarantees about the sanity of the shared memory space (or processor state, for that matter). This exception will reboot the controller to restore a known good state to the OS.

On a normal reboot, a section of non-volatile memory is used to record that the system reset was application-requested and that things shut down cleanly. On an exception, the code that handles the exception uses this same area of non-volatile memory to keep track of how many times the system has restarted due to system crashes. If that count exceeds one, the default action is to reboot into safe mode.

Now, the tricky bit is that, depending on what's going on in your application when you request a reboot of the target, the reboot itself can lead to a crash. This is particularly tricky since, externally, it seems OK: you requested a reboot and the target did, indeed, reboot. The issue only shows up in situations like the one you're seeing: the reboot requests actually triggered crashes, which ultimately led to the target booting into safe mode. The safe-mode fallback exists in the hopes of making sure your target stays remotely accessible in the event of a problematic app (if the OS is continually rebooting, you can't access it from MAX).

Finally, a button-press reboot does not modify the "crash count": it doesn't zero it out and it doesn't increment it, so it doesn't really play into this.
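
To make the bookkeeping above easier to reason about, here is a rough model of it in Python. This is only a sketch, not the actual firmware logic: `NvState`, `request_reboot`, `on_exception`, `button_reset`, and `boot` are hypothetical names, and the rule that a clean, application-requested reboot clears the crash count is an assumption that this thread does not confirm.

```python
from dataclasses import dataclass


@dataclass
class NvState:
    """Hypothetical stand-in for the non-volatile bookkeeping area."""
    clean_shutdown: bool = False  # set when the application requests a reboot
    crash_count: int = 0          # restarts caused by OS-level exceptions


def request_reboot(nv: NvState) -> None:
    """Application-requested reboot: mark the shutdown as clean."""
    nv.clean_shutdown = True


def on_exception(nv: NvState) -> None:
    """OS-level exception: the shutdown is no longer clean; count the crash."""
    nv.clean_shutdown = False
    nv.crash_count += 1


def button_reset(nv: NvState) -> None:
    """Button-press reboot: leaves the crash count untouched."""


def boot(nv: NvState) -> str:
    """Decide the boot mode from the recorded state."""
    if nv.clean_shutdown:
        nv.crash_count = 0  # ASSUMPTION: a clean reboot clears the count
    nv.clean_shutdown = False
    return "safe" if nv.crash_count > 1 else "run"
```

In this model, a reboot request that turns into a crash during shutdown (`request_reboot` followed by `on_exception`) looks fine from the outside the first time and lands in safe mode the second time, which matches the symptom described in this thread.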

BradM · ‎07-26-2017

Also, this is probably the most suitable location for this question: https://forums.ni.com/t5/Real-Time-Measurement-and/bd-p/280
