04-13-2015 04:27 AM
Hi
We have a cRIO-9068 which seems to have inexplicably stopped working. The problem first occurred when I was running some C++ code on the device, which then crashed. To restart it quickly, I switched the regulated supply powering it off and then on again (I know I probably should have gone through MAX for this). After this, the cRIO would only boot into safe mode.
We have been in contact with an NI engineer who says the problem is due to a bootline error. The device will only boot into safe mode, with an ‘improper installation’ error in NI MAX, and if you try to format it you get the error message 'Error while reading from the local disk. The file may be corrupt or not present'. From the terminal I’ve tried `nisystemformat -f -t ubifs -c -r -n none`, but although that seemed to work, it didn’t fix the problem.
We’re due to deploy the device very soon, so I don’t want to send it back for repair. Is there any advice on how we can fix a bootline error like this? Can anyone shed some light on how such an error might have occurred?
We’re due to deploy quite a few of these devices and, to say the least, it’s a bit worrying that one suddenly stopped working while on the bench. NI hardware is supposed to be very reliable, so any information on this error would be much appreciated.
Thanks for the help
Jamie
04-13-2015 08:51 AM
There is a recovery mode for this exact situation (restoring the system to its factory configuration without requiring you to send it back). Your support contact should be able to help you with that process. However, that process will wipe out the evidence we would need to figure out how the system got into that state in the first place, and I agree that is important to understand. Let me know which is more important to you: deploying it very soon, or getting to the root cause of the problem. If you have time to get to the root cause, that would be my preference. A first step in that direction would be to record verbose console output of an attempted boot into run mode (both the failed boot into run mode and the failover into safe mode may be informative). To increase the verbosity of the boot output, set a nonzero bootdelay: from the safe mode admin prompt, run `fw_setenv bootdelay 3` (any value other than 0 will do). If you have a chance, capture that output and attach it here.
04-14-2015 05:29 AM
Thanks for the quick response!
I've done what you asked and attached a .txt file with the console output from a restart. I noticed one line that shows an error:
* Stopping Avahi mDNS/DNS-SD Daemon: avahi-daemon [fail]
I don't know whether that's significant, but if it is, it's on line 414.
The engineer we've been speaking to is away at the moment, so if you could give some advice on how to do a factory reset, that would be great. We'll wait until you've had some time to get to the bottom of this, though.
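Spotting a single `[fail]` among hundreds of boot messages by eye is easy to miss, so a short script can list every flagged line with its line number. This is a minimal sketch that assumes the capture is a plain-text file and that failures carry the `[fail]` tag seen above; `find_failures` is a hypothetical helper name.

```python
def find_failures(lines, marker="[fail]"):
    """Return (line_number, text) pairs for lines containing marker.

    Line numbers are 1-based, matching what a text editor shows.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]
```

For example, `with open("console.txt", errors="replace") as capture: print(find_failures(capture))` would print every failed init step in the attached capture.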
Thanks again for the help.
Jamie
04-14-2015 08:33 AM
One serious problem is this line:
The SystemWebServer daemon failed to start ...
I would have expected `nisystemformat -c` to fix the most common causes of that, but maybe (just guessing, so far) the problem is not a bad config partition (the "-c") but a corrupt file on the rootfs that SWS needs (which could be fixed by calling nisystemformat without the "-c"). However, that doesn't explain why you get an error trying to format while already in safe mode, and formatting without "-c" might clobber evidence, so you might not want to try that yet. I'll ask the developers who work on nisystemformat to suggest some diagnostic steps. The Avahi error is interesting and possibly related; good eye. I'll highlight it when I ask around. Meanwhile, can you check that Avahi is running when you are booted into safe mode (from the console, `ps | grep avahi`)?
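`ps | grep avahi` is the quickest check from the console. If you ever want the same check from a program or test script, the information is also under `/proc`. This is a sketch assuming the standard Linux layout where each `/proc/<pid>/comm` holds the kernel's short command name (at most 15 characters); that name should stay `avahi-daemon` even though `ps` may show a rewritten title such as `avahi-daemon: registering [...]`. `find_processes` and its `proc_root` parameter are hypothetical; `proc_root` exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def find_processes(name, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/comm equals name."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited meanwhile, or comm is unreadable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)
```

An empty result from `find_processes("avahi-daemon")` would mean the daemon is not running.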
I'd prefer to have you work with a support engineer for the factory reset ("recovery mode") process, if we get to that point. It's a low-level operation and not very user-friendly.
04-14-2015 09:46 AM
> The SystemWebServer daemon failed to start ...
Before you reset your target, could you send us a copy of SystemWebServer's log files for analysis? The server and its various modules create several log files on disk while running. One is located at `/mnt/userfs/var/local/natinst/log/SystemWebServer.log` and several others are in the `/mnt/userfs/var/local/natinst/tracelogs/` directory.
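To gather those files in one go, something along these lines could bundle them into a single zip for attaching. It's a sketch that assumes Python is available wherever you run it (on the target, or on a PC after copying `/mnt/userfs/var/local/natinst/` off the target); the two locations come from the post above, while `collect_logs` is a hypothetical helper.

```python
import zipfile
from pathlib import Path

# Locations from the post above, relative to the run mode file system
# (mounted at /mnt/userfs when the target is in safe mode).
LOG_FILE = "var/local/natinst/log/SystemWebServer.log"
TRACE_DIR = "var/local/natinst/tracelogs"


def collect_logs(userfs_root, archive_path):
    """Zip SystemWebServer.log plus every file under tracelogs/.

    Missing files are skipped. Returns the archive member names written.
    """
    root = Path(userfs_root)
    candidates = [root / LOG_FILE]
    trace_dir = root / TRACE_DIR
    if trace_dir.is_dir():
        candidates += sorted(p for p in trace_dir.rglob("*") if p.is_file())
    written = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in candidates:
            if path.is_file():
                name = path.relative_to(root).as_posix()
                archive.write(path, arcname=name)
                written.append(name)
    return written
```

On the target in safe mode, `collect_logs("/mnt/userfs", "/tmp/sws_logs.zip")` would produce one file to attach.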
Also, could you tell us which version of safe mode you're running and which version of LabVIEW Real-Time you're attempting to install? You can determine the safe mode version by running `nisafemodeversion` on the console.
04-15-2015 10:49 AM
Hi All
Sorry for the late reply. The response to `ps | grep avahi` was:
1430 avahi avahi-daemon: registering [NI-cRIO-9068-019a8a93.local]
1431 avahi avahi-daemon: chroot helper
1437 admin /usr/sbin/avahi-dnsconfd -D
1895 admin grep avahi
I've attached a zip file with all logs.
Thanks for the help.
Jamie
04-16-2015 02:33 PM
According to the system logs, System Web Server failed to open its own log file (SystemWebServer.log), which is a fatal error that ultimately leads to boot failure.
$ tail -4 log/errlog.txt
04/15/2015 16:34:19 appweb: Error: Cannot open log file /var/local/natinst/log/SystemWebServer.log
04/15/2015 16:34:19 appweb: Error: Cannot write to ErrorLog: /var/local/natinst/log/SystemWebServer.log
04/15/2015 16:34:19 appweb: Error: Error with directive "ErrorLog"
04/15/2015 16:34:19 appweb: Warning: Server "default" cannot be unregistered from the Service Locator: unrecognized name
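`tail -4` works here because the failure sits at the end of the log. When it doesn't, filtering for just the `appweb` errors is quicker than paging through the file. A minimal sketch, assuming every entry follows the `MM/DD/YYYY HH:MM:SS source: Severity: message` layout of the excerpt above (other entries in errlog.txt may differ, and non-matching lines are skipped); `parse_entries` is a hypothetical name.

```python
import re

# Layout inferred from the errlog.txt excerpt above.
ENTRY = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<source>[^:]+): (?P<severity>\w+): (?P<message>.*)$"
)


def parse_entries(lines, source=None, severity=None):
    """Yield a dict per matching log entry, optionally filtered."""
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if not match:
            continue  # not in the expected layout
        entry = match.groupdict()
        if source is not None and entry["source"] != source:
            continue
        if severity is not None and entry["severity"] != severity:
            continue
        yield entry
```

For example, `parse_entries(open("log/errlog.txt"), source="appweb", severity="Error")` would yield the three `Error` lines above.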
Unfortunately, I can't tell why this is happening just by looking at the log. My guess is that one or more of the directories along the path don't grant the `webserv` user the necessary access. On a working NI Linux RT system, ownership and permissions for that path should look like this:
drwxr-xr-x 8 admin administ 872 Apr 13 11:03 /var/
drwxr-xr-x 3 admin administ 224 Apr 10 10:48 /var/local/
drwxrwxr-x 10 lvuser ni 784 Apr 13 11:04 /var/local/natinst/
drwxrwxr-x 2 lvuser ni 1368 Apr 15 07:36 /var/local/natinst/log/
-rw-r--r-- 1 webserv ni 0 Apr 15 07:36 /var/local/natinst/log/SystemWebServer.log
The run mode file system is mounted at /mnt/userfs/ when your target is in safe mode, so you can manually edit ownership and permission settings from the console. For example, run `chmod 644 /mnt/userfs/var/local/natinst/log/SystemWebServer.log` to change the log file's permissions to `-rw-r--r--`, and `chown webserv:ni /mnt/userfs/...` to change its ownership to `webserv:ni`. You can use similar commands for the directories if they're broken.
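If several components turn out to be wrong, the same corrections can be applied in one pass. A sketch, assuming it runs as `admin` from the safe mode console (only the super user can change ownership). The modes and owners come from the listing above; `/var` and `/var/local` are deliberately left out because their group appears only in truncated form (`administ`). `repair` is a hypothetical helper, and `dry_run` defaults to True so it only reports what it would do.

```python
import os
import shutil

# (relative path, mode, owner, group) from the listing above.
FIXES = [
    ("var/local/natinst", 0o775, "lvuser", "ni"),
    ("var/local/natinst/log", 0o775, "lvuser", "ni"),
    ("var/local/natinst/log/SystemWebServer.log", 0o644, "webserv", "ni"),
]


def repair(root="/mnt/userfs", fixes=FIXES, dry_run=True, change_owner=True):
    """Apply each mode (and optionally owner/group) to paths that exist.

    Returns the actions taken, or planned when dry_run is True.
    """
    actions = []
    for relative, mode, owner, group in fixes:
        path = os.path.join(root, relative)
        if not os.path.exists(path):
            actions.append(f"skip {path}: missing")
            continue
        actions.append(f"chmod {mode:o} {path}")
        if change_owner:
            actions.append(f"chown {owner}:{group} {path}")
        if dry_run:
            continue
        os.chmod(path, mode)
        if change_owner:
            shutil.chown(path, user=owner, group=group)
    return actions
```

Running `print(repair())` first and `repair(dry_run=False)` only after reviewing the plan keeps a typo from touching the wrong files.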
All of these settings should reset when you reformat your target from MAX and reinstall software, so manual editing shouldn't be necessary if you're comfortable resetting your entire target. Unfortunately, I can't debug this any further without shell access to your system. I might be able to reproduce it on my end if you tell me which version of LabVIEW RT you're running.
04-17-2015 12:29 PM
Yeah, problem is I can't reformat from MAX - I get the error mentioned in the first post.
OS is NI Linux Real-Time ARMv7-A 3.2.35-rt52-2.0.0f0
I can try another reset from the terminal, as suggested by ScotSalmon, but only if you all have what you need. Do you?
04-17-2015 03:15 PM
> Yeah, problem is I can't reformat from MAX - I get the error mentioned in the first post.
> OS is NI Linux Real-Time ARMv7-A 3.2.35-rt52-2.0.0f0
I'm guessing you're probably using a 2014 safe mode based on this system version string. However, I don't know that for sure unless you send me the output of the `nisafemodeversion` command.
Assuming it is 2014, I managed to reproduce at least one of the issues you're running into. It looks like MAX can't do a remote reformat after the config partition has been manually reformatted using the command you posted earlier, `nisystemformat -f -t ubifs -c -r -n none`. I filed a bug report for the issue. I was able to work around it by running the following commands from the safe mode console and then restarting MAX on my desktop. These commands will reset both the run mode and config partitions on your target, i.e. a full wipe. Afterwards, you should be able to run a remote reformat and software install from MAX and get your target back into a working state.
nisystemformat -f -c -t ubifs
nisystemformat -f -t ubifs
reboot
> I can try another reset from the terminal, as suggested by ScotSalmon, but only if you all have what you need. Do you?
Unfortunately, log files can only get us so far. There's not much else I can do remotely to diagnose your System Web Server failure. As I said in my last post, it could be a permission issue. You mentioned running some C++ code on the target before this happened. Was that program running as `admin` and changing something on the file system? On Linux systems, the super user (i.e. `admin` on the NI Linux RT distribution) can bypass all access restrictions and do just about anything, so even the smallest programming bug can have big system-wide implications. Of course, it could just as easily have been something in our code that broke the system. I wasn't able to reproduce any boot failures with a clean 2014 stack.
04-20-2015 07:29 AM
That worked like a charm. Thanks! It's a relief to know this is indeed a software issue and recoverable without sending the device off.
As for the C++ code, I wasn't changing anything outside the home folder other than the LEDs, i.e. I was writing brightness settings to "/sys/class/leds/nizynqcpld:status:red/brightness" (opened with mode "w"). I'm stumped myself; I have no idea how this could have happened. If you want to have a detailed look at the code, it's open source and located at https://sourceforge.net/p/plabuoy/svn-code/HEAD/tree/cRio_Daq_cpp/
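sysfs is an in-memory view of kernel objects rather than files stored on flash, so writing an LED's `brightness` attribute should not, by itself, touch the run mode file system. The risk raised earlier is a bug that points an open-for-write call at the wrong path while running as `admin`, since mode "w" creates or truncates whatever it is given. A minimal defensive sketch, in Python rather than the original C++: it rejects LED names that could escape the LED class directory and opens the attribute with "r+", which never creates a missing file. `set_led_brightness` and its `leds_root` parameter are hypothetical; `leds_root` exists only for testing.

```python
import os

LEDS_ROOT = "/sys/class/leds"


def set_led_brightness(led_name, value, leds_root=LEDS_ROOT):
    """Write value to <leds_root>/<led_name>/brightness.

    Rejects names that could escape leds_root and refuses to create the
    attribute if it does not already exist.
    """
    if not led_name or led_name in (".", "..") or "/" in led_name or "\0" in led_name:
        raise ValueError(f"invalid LED name: {led_name!r}")
    target = os.path.join(leds_root, led_name, "brightness")
    if not os.path.isfile(target):
        raise FileNotFoundError(target)
    # "r+" (unlike "w") fails instead of creating a new file.
    with open(target, "r+") as attribute:
        attribute.write(f"{int(value)}\n")
```

For example, `set_led_brightness("nizynqcpld:status:red", 1)` would turn on the status LED used in the program above. Symlinks are not resolved, because entries under `/sys/class/leds/` are normally symlinks into `/sys/devices/`.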
Cheers
Jamie