Real-Time Measurement and Control


cRIO-9068 bootline error

Hi

 

We have a cRIO-9068 that has inexplicably stopped working. The problem first occurred when some C++ code I was running on the device crashed. To restart it quickly, I switched the power (from a regulated supply) off and then on again (I know I probably should have gone through MAX for this). After this, the cRIO would only boot into safe mode.

 

We have been in contact with an NI engineer who says the problem is due to a bootline error. The device will only boot into safe mode, with an ‘improper installation’ error in NI MAX, and if you try to format it you get the error message: 'Error while reading from the local disk. The file may be corrupt or not present'. Through the terminal I’ve tried `nisystemformat -f -t ubifs -c -r -n none` but, although that seemed to work, it didn’t fix the problem.

 

We’re due to deploy the device very soon so I don’t want to send it back for repair. Is there any advice on how we can fix a bootline error like this? Can anyone shed some light on how such an error might have occurred?

 

We’re due to deploy quite a few of these devices and to say the least, it’s a bit worrying that one just suddenly stops working whilst on the bench. NI stuff is supposed to be very reliable so some information on this error from someone would be much appreciated.

 

Thanks for the help

 

Jamie

 

Message 1 of 10

There is a recovery mode for this exact situation (restoring the system to factory configuration without requiring you to send it back in). Your support contact should be able to help you with that process. However, that process will wipe out the evidence we would need to figure out how it got into that state in the first place, and I agree that is important to understand. Let me know which is more important to you: deploying it very soon, or getting to the root cause of the problem. If you have time to get to the root cause, that would be my preference.

A first step in that direction would be to record verbose console output of an attempted boot into run mode (both the failed boot into run mode and the failover into safe mode may be informative). To increase verbosity of the boot output, set a nonzero bootdelay: from the safe mode admin prompt, run "fw_setenv bootdelay 3" (or any value other than 0). If you have a chance, capture that and attach it here.

 

Message 2 of 10

Thanks for the quick response!

 

I've done what you asked and attached a .txt file with the console output from a restart. I noticed one line that showed an error.

 

 * Stopping Avahi mDNS/DNS-SD Daemon: avahi-daemon                       [fail]

 

I don't know if that's significant or not but it's line 414 if so.

The engineer we've been speaking to is away at the moment, so if you could give some advice on how to factory reset, that would be great. We'll wait until you've had some time to get to the bottom of this, though.

 

Thanks again for the help.

 

Jamie

Message 3 of 10

One serious problem is this line:

 

The SystemWebServer daemon failed to start ...

 

I would have expected nisystemformat -c to fix the most common causes of that, but maybe (just guessing, so far) it's not a bad config partition (the "-c"), but a corrupt file that SWS needs on the rootfs (that could be fixed by calling nisystemformat without the "-c"). However that doesn't explain why you get an error trying to format while already in safe mode, and formatting without "-c" might clobber evidence, so you might not want to try that yet. I'll ask the developers who work on nisystemformat to suggest some diagnostic steps. The Avahi error is interesting and possibly related, good eye. I'll highlight that when I ask around. Meanwhile can you check that Avahi is running when you are booted to safe mode (from the console, "ps | grep avahi")?

 

I'd prefer to have you work with a support engineer for the factory reset ("recovery mode") process, if we get to that point. It's a low-level operation and not very user-friendly.

Message 4 of 10

 

The SystemWebServer daemon failed to start ...

Before you reset your target, could you send us a copy of SystemWebServer's log files for analysis? The server and its various modules create several log files on the disk while running. One is located at `/mnt/userfs/var/local/natinst/log/SystemWebServer.log` and several others are in the `/mnt/userfs/var/local/natinst/tracelogs/` directory.
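If it helps, here is a minimal sketch of bundling those files into one archive to attach. `collect_sws_logs` is a hypothetical helper name (not an NI tool), and it assumes a POSIX shell with `tar` available on the safe mode console, which may not hold on every image.

```shell
# Bundle SystemWebServer.log and the tracelogs directory into one tar archive.
# Hypothetical helper, not part of NI Linux RT.
# Usage: collect_sws_logs NATINST_DIR OUTPUT_TAR
collect_sws_logs() {
    natinst_dir=$1
    out=$2
    # Build the list of items that actually exist, relative to NATINST_DIR.
    set --
    [ -f "$natinst_dir/log/SystemWebServer.log" ] && set -- "$@" log/SystemWebServer.log
    [ -d "$natinst_dir/tracelogs" ] && set -- "$@" tracelogs
    if [ $# -eq 0 ]; then
        echo "no log files found under $natinst_dir" >&2
        return 1
    fi
    # -C keeps the paths inside the archive relative to NATINST_DIR.
    tar -cf "$out" -C "$natinst_dir" "$@"
}
```

From safe mode this could be called as `collect_sws_logs /mnt/userfs/var/local/natinst /tmp/sws-logs.tar`; the output location is only an example.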

 

Also, could you tell us which version of safe mode you're running and which version of LabVIEW Real-Time you're attempting to install? You can determine the safe mode version by running `nisafemodeversion` on the console.

 

Message 5 of 10

 

Hi All

 

Sorry for the late reply. The response to `ps | grep avahi` was:

 

1430 avahi avahi-daemon: registering [NI-cRIO-9068-019a8a93.local]
1431 avahi avahi-daemon: chroot helper
1437 admin /usr/sbin/avahi-dnsconfd -D
1895 admin grep avahi

 

I've attached a zip file with all logs.

 

Thanks for the help. 

 

Jamie

Message 6 of 10

According to the system logs, System Web Server failed to open its own log file (SystemWebServer.log), which is a fatal error that ultimately leads to boot failure.

 

$ tail -4 log/errlog.txt 
04/15/2015      16:34:19        appweb: Error: Cannot open log file /var/local/natinst/log/SystemWebServer.log
04/15/2015      16:34:19        appweb: Error: Cannot write to ErrorLog: /var/local/natinst/log/SystemWebServer.log
04/15/2015      16:34:19        appweb: Error: Error with directive "ErrorLog"
04/15/2015      16:34:19        appweb: Warning: Server "default" cannot be unregistered from the Service Locator: unrecognized name
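To pull just the error lines out of a longer errlog, something like the sketch below should do. `appweb_errors` is a hypothetical helper name; it only assumes `grep` and that error lines contain the text `appweb: Error`, as in the excerpt above.

```shell
# Print only the appweb error lines from an errlog-style file.
# Hypothetical helper; matches the "appweb: Error" text seen in the log excerpt.
# Usage: appweb_errors ERRLOG_PATH
appweb_errors() {
    grep 'appweb: Error' "$1"
}
```

For example, `appweb_errors log/errlog.txt` run from the directory where the logs were unpacked.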

 

 

Unfortunately, I can't tell why this is happening just by looking at the log. My guess is that one or more of the directories along the path don't grant the `webserv` user the necessary access. On a working NI Linux RT system, ownership and permissions for that path should look like this:

 

drwxr-xr-x    8 admin    administ       872 Apr 13 11:03 /var/
drwxr-xr-x    3 admin    administ       224 Apr 10 10:48 /var/local/
drwxrwxr-x   10 lvuser   ni             784 Apr 13 11:04 /var/local/natinst/
drwxrwxr-x    2 lvuser   ni            1368 Apr 15 07:36 /var/local/natinst/log/
-rw-r--r--    1 webserv  ni               0 Apr 15 07:36 /var/local/natinst/log/SystemWebServer.log

 

The run mode file system is mounted at /mnt/userfs/ when your target is in safe mode, so you can manually edit ownership/permission settings from the console. For example, run `chmod 644 /mnt/userfs/var/local/natinst/log/SystemWebServer.log` to change the log file's permissions to `-rw-r--r--` and `chown webserv:ni /mnt/userfs/...` to change ownership to `webserv:ni`. You can use similar commands for the directories if they're broken.
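Before changing anything, it may be worth listing what is actually there. This is a minimal sketch that prints owner, group and mode for each component along the path so it can be compared with the listing above. `show_path_perms` is a hypothetical helper name, and it assumes a POSIX shell with `ls` on the safe mode console.

```shell
# Print an `ls -ld` line for every component along a path, top-down.
# Hypothetical helper, not part of NI Linux RT.
# Usage: show_path_perms ROOT RELATIVE_PATH
show_path_perms() {
    cur=$1
    rest=${2#/}            # tolerate a leading slash on the relative path
    while [ -n "$rest" ]; do
        part=${rest%%/*}   # next path component
        case $rest in
            */*) rest=${rest#*/} ;;
            *)   rest= ;;
        esac
        [ -n "$part" ] || continue
        cur="$cur/$part"
        if [ -e "$cur" ]; then
            ls -ld "$cur"
        else
            echo "missing: $cur"
        fi
    done
}
```

In safe mode this could be run as `show_path_perms /mnt/userfs var/local/natinst/log/SystemWebServer.log`. Any line whose owner or mode differs from the working-system listing is a candidate for `chown` or `chmod`.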

 

All of these settings should reset when you reformat your target from MAX and reinstall software, so manual editing shouldn't be necessary if you're comfortable resetting your entire target. Unfortunately, I can't debug this any further without shell access to your system. I might be able to reproduce it on my end if you tell me which version of LabVIEW RT you're running.

 

 

Message 7 of 10

Yeah, problem is I can't reformat from MAX - I get the error mentioned in the first post. 

 

OS is NI Linux Real-Time ARMv7-A 3.2.35-rt52-2.0.0f0

 

I can try another reset from terminal as suggested by ScotSalmon but only want to do this if you all have what you need?

Message 8 of 10

Yeah, problem is I can't reformat from MAX - I get the error mentioned in the first post. 

OS is NI Linux Real-Time ARMv7-A 3.2.35-rt52-2.0.0f0



I'm guessing you're probably using a 2014 safe mode based on this system version string. However, I don't know that for sure unless you send me the output of the `nisafemodeversion` command.

Assuming it is 2014, I managed to reproduce at least one of the issues you're running into. It looks like MAX can't do a remote reformat after the config partition is manually reformatted using the command you posted earlier, `nisystemformat -f -t ubifs -c -r -n none`. I filed a bug report for the issue. I was able to work around it by running the following commands from the safe mode console and restarting MAX on my desktop. These commands will reset both the run mode and config partitions on your target, i.e. a full wipe. Afterwards, you should be able to run a remote reformat and software install from MAX and get your target back into a working state.

 

nisystemformat -f -c -t ubifs
nisystemformat -f -t ubifs
reboot

 

 

 


 I can try another reset from terminal as suggested by ScotSalmon but only want to do this if you all have what you need?

Unfortunately, log files can only get us so far. There's not much else that I can do remotely to diagnose your System Web Server failure. As I said in my last post, it could be a permission issue. You mentioned running some C++ code on the target before this happened. Was that program running as `admin` and changing something on the file system? On Linux systems, the super user (i.e. `admin` on the NI Linux RT distribution) can bypass all access restrictions and do just about anything. Even the smallest programming bugs can have big system-wide implications. Of course, it could just as easily be something in our code that broke the system as well. I wasn't able to reproduce any boot failures with a clean 2014 stack.

 

Message 9 of 10

That worked a charm. Thanks! It's a relief to know this is indeed a software issue and recoverable without sending the device off.

 

As for the C code, I wasn't changing anything outside the home folder other than the LEDs, i.e. I was changing brightness settings by writing to "/sys/class/leds/nizynqcpld:status:red/brightness" (opened in "w" mode). I'm stumped myself; I have no idea how this could have happened. If you want to have a detailed look at the code, it's open source and located at https://sourceforge.net/p/plabuoy/svn-code/HEAD/tree/cRio_Daq_cpp/

 

Cheers

 

Jamie

Message 10 of 10