I apologize in advance if my questions are basic. I'm fairly new to a job that involves numerous CompactRIO 9074s deployed to monitor many scientific instruments. About once a week, one of the 9074s (not the same one each time) reboots itself. The affected cRIO might restart normally or it might enter safe mode, and I have not figured out how to predict which will happen. Sometimes the restart is preceded by an entry in the kernel error log, sometimes it is not. I have been unable to find much information about the error code given, which is Exception code: 0x00000300.
Once a 9074 enters safe mode, all the instruments are effectively offline until I can restart the CompactRIO. If I reboot it from NI MAX, it typically starts normally, which confuses me. If there were an issue with the software, whether FPGA or RT, I would expect it to reboot into safe mode every time. It's the intermittent nature of the problem that makes it difficult to diagnose.
To track this down, I've tried to correlate restart frequency with case ventilation (the 9074s are inside rack-mount boxes) and with the number of shared variables each cRIO hosts. I have not found a convincing correlation with either.
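In case it helps anyone suggest other factors to test, the kind of check I mean is roughly the following. This is only a minimal Python sketch, not my actual analysis: the function name pearson and its two inputs (one restart count per unit and one candidate-factor value per unit, such as the number of hosted shared variables) are illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences.

    Hypothetical usage: xs = restarts per unit, ys = shared variables per unit.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")

    return cov / sqrt(var_x * var_y)
```

A value near +1 or -1 would point to a strong linear relationship and a value near 0 to none. With only a handful of units and roughly one restart per week, even a moderate value could easily be noise.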
I have attached two error logs. Both units spontaneously rebooted last night. The one with a kernel exception in its log restarted normally. The one without a kernel exception in its log entered safe mode, and I had to restart it manually.
How can I get to the bottom of a) why these units are restarting themselves and b) why they occasionally start up in safe mode?