Continuous system crash

peter_smith · ‎07-27-2008

Hi Everybody!

We have installed 10 cFP-2200 RT controllers for a company. All of them run the same software; only the I/O channel names differ. Those names are read from a config file, so the software running on them is exactly the same.

The strange thing is that 2 of them crash repeatedly. Sometimes they become unreachable; sometimes MAX can see them but reports Safe Mode, Software Error.

I have reinstalled the application and the OS (FieldPoint 6.0.1) many times on them, without success.

I searched a lot for the error, but the only thing I found that could be useful is a .txt file on the controllers which is created when the system crashes: rtlog.txt.

This file can be found only on these 2 controllers. So I'm absolutely sure this is a VxWorks / NI problem.

The log files contain the same error entry many times with different timestamps, so the problem is the same on every occasion.

Here is one Error log from the first controller

*** BEGIN SYSTEM EXCEPTION LOG ***

Target type: cFP-2200
Target code: 7351

System time (UTC): 2008-07-27 19:18:22
System tick count: 7897 ms

Exception code: 0x00000700

Register contents:
DAR = 0x00000000 DSISR = 0x00000000
MSR = 0x0008B032 FPCSR = 0x82028000
LR = 0x00333284    CTR = 0x00000050
CR = 0x00000000    XER = 0x00000000

GPR 0 = 0x00000050    GPR 1 = 0x008c0e78
GPR 2 = 0x00000000    GPR 3 = 0x006b9b00
GPR 4 = 0x006B9B60    GPR 5 = 0x00000001
GPR 6 = 0x00000001    GPR 7 = 0x00000002
GPR 8 = 0x00000000    GPR 9 = 0x00000050
GPR 10 = 0x006B9BB4    GPR 11 = 0x008c0e94
GPR 12 = 0x008C0FB8    GPR 13 = 0x00000000
GPR 14 = 0x00000000    GPR 15 = 0x00000000
GPR 16 = 0x00000000    GPR 17 = 0x00000000
GPR 18 = 0x00000000    GPR 19 = 0x00000000
GPR 20 = 0x00000000    GPR 21 = 0x00000000
GPR 22 = 0x00000000    GPR 23 = 0x00000000
GPR 24 = 0x00000000    GPR 25 = 0x00000000
GPR 26 = 0x00000000    GPR 27 = 0x00000000
GPR 28 = 0x00000000    GPR 29 = 0x00000000
GPR 30 = 0x0035A5E0    GPR 31 = 0x008c0e78
PC = 0x00000050 in module 0x0

Thread ID: 0x008C0FB8   Thread name: USB OHCI Interrupt
Thread stack base: 0x008C0FB8 stack size: 8192

Call Stack:

All Loaded Modules:
    MODULE NAME     MODULE ID TEXT START DATA START BSS START
    --------------- ---------- ---------- ---------- ----------
       mxsutils.out 0x00d77798 0x00dc91e8 0x00e3ad28 0x00e3d820
          mxsdb.out 0x00b0aea8 0x00c78a48 0x00d64be0 0x00d6b1f0
         nirpcs.out 0x00b09e40 0x00c27ac0 0x00c3dd20 0x00c3e128
         glogos.out 0x00aed4a8 0x0102eac8 0x01131200 0x01135460
        logosrt.out 0x01f73030 0x00e53b68 0x01015ea0 0x010241d0
         bb_lib.out 0x00aec3e0 0x01f08fe0 0x01f68670 0x01f69f00
           lvrt.out 0x00abca50 0x016bfc98 0x01e9f750 0x01eee188
       libexpat.out 0x00abbe68 0x00ac8d28 0000000000 0x00ae6858
         ni_emb.out 0x00a97ce8 0x00ab44f0 0x00abb160 0x00abb238
       ftpserve.out 0x00a50360 0x00a98c18 0x00aa2c28 0x00aa3218
         target.out 0x00a07300 0x00a51218 0x00a809d8 0x00a816a8
        vx_exec.out 0x00999830 0x00b14218 0x00bd8110 0x00bda4c0

Memory statistics:
Total system memory:           129256448 bytes
Free memory:                   107175232 bytes
Largest free block:            101189392 bytes
Peak usage:                    29261520 bytes

*** END SYSTEM EXCEPTION LOG ***

The other one has a similar error log with Exception code: 0x00000700 and Thread ID: 0x008C0FB8 Thread name: USB OHCI Interrupt.

I can't post it because of space limits and maybe it is also a bit useless...
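For anyone who wants to confirm that every entry in such a file really is the same failure, here is a minimal Python sketch that counts entries by exception code and thread name. It assumes the file consists of entries laid out like the one above, delimited by the BEGIN/END marker lines; the function name `summarize_rtlog` is made up for this illustration and is not an NI tool.

```python
import re
from collections import Counter

# Marker lines as they appear in the log entry posted above.
BEGIN = "*** BEGIN SYSTEM EXCEPTION LOG ***"
END = "*** END SYSTEM EXCEPTION LOG ***"


def summarize_rtlog(text):
    """Count log entries by (exception code, thread name).

    Hypothetical helper: assumes each entry contains an
    'Exception code:' line and a 'Thread name:' field.
    """
    counts = Counter()
    # Everything before the first BEGIN marker is not an entry, so skip it.
    for chunk in text.split(BEGIN)[1:]:
        entry = chunk.split(END)[0]
        code = re.search(r"Exception code:\s*(0x[0-9A-Fa-f]+)", entry)
        thread = re.search(r"Thread name:\s*(.+)", entry)
        key = (
            code.group(1) if code else "unknown",
            thread.group(1).strip() if thread else "unknown",
        )
        counts[key] += 1
    return counts
```

Calling it on the contents of a copy of rtlog.txt fetched from the controller (for example `summarize_rtlog(open("rtlog.txt").read())`) returns one counter entry per distinct code/thread pair, so a single key means all crashes share the same signature.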

One thing I don't understand: both controllers seem to have some USB-related problem, but the cFP-2200 has no USB port!

Could anyone help, please?

Thank you very much in advance!

Péter Kovács

PeterAdelhardt · ‎07-28-2008

Hello Péter,
Thanks for your post. I have done some research on the error you described and have not found anything helpful so far. From what you describe, the erroneous behavior is intermittent and only visible on 2 out of 10 identical controllers, which leads me to believe it's a hardware problem with those two controllers.

I would like to ask you to try to re-format and re-install those controllers one last time. If they still do not run as expected, please contact your local NI office (http://www.ni.com/niglobal/) and initiate an RMA process, so our technicians can analyze and repair the affected devices.

Thanks and best regards,

Peter

--
Peter A.
Field Sales Engineer, NI Germany

Marian_O · ‎08-27-2009

Hi all,

As far as I have heard, the 22x0 series cFP controllers run a VxWorks operating system. You need a more recent version of NI FieldPoint than 6.0.1! Please try the latest version, which is 6.0.5. Does the error still appear?

http://joule.ni.com/nidu/cds/view/p/id/1365/lang/de

Best regards

Marian Vorderer

peter_smith · ‎08-31-2009

Hi MarianMO!

The problem still exists no matter what version of FieldPoint software you use.

We are already in contact with the NI engineering team in Texas through the local NI office in Hungary.

The problem is really caused by the new operating system on the FieldPoints.

As far as I know, the engineering team figured out that the source of the problem is in the new OS.

Namely, if you use TCP/IP communication in timed loops and assign a priority to those loops, then as soon as one communication error occurs (or more, I'm not sure), the OS collapses (??) or at least the Ethernet communication stops.

We have been told that there is no solution for the problem; there will probably be some patches or fixes in the future.

However there is a workaround: don't use TCP/IP communication in timed loops.

(It's a bit interesting, because I had to use them in timed loops as I wanted to set the priority for the communication part higher than some other parts of my code...)

So we had to use the 21xx series of the cFP.

Best regards,

Peter Kovacs

Marian_O · ‎09-05-2009

Hello Péter,

thank you for your reply. Did the colleagues mention any reference number (for example CAR) on this issue?

Best regards

Marian Vorderer

DirkW · ‎09-08-2009

Hi Marian, Peter,

Yes, there is a known issue with timed loops and the TCP open function that causes the controller to hang and to reboot eventually.

The CAR ID is 141011.

Right now the only workaround would be to not use the open function within a timed loop or to ping a target first to make sure it is active and reachable, before you use the TCP open function.

DirkW

peter_smith · ‎09-09-2009

Hi DirkW!

Thank you for the correct ID. The solution of pinging the target is not going to work in all cases. When we discovered the problem, we did not use the ping function; however, we were able to ping all devices on the network. The problem came from cFPs that were overloaded (or had some other problem). Even though we were able to ping those devices, the TCP communication was not successful, or got interrupted.

So pinging the target before the communication is not a 100% reliable solution.
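To illustrate the difference between "answers ping" and "accepts a TCP connection", here is a minimal Python sketch (not LabVIEW, and not an NI-provided fix) that probes the actual TCP port with a bounded timeout. The function name `tcp_reachable`, the default timeout, and the idea of running it outside the time-critical loop are assumptions made for illustration only.

```python
import socket


def tcp_reachable(host, port, timeout_s=1.0):
    """Return True if a TCP connection to (host, port) opens within timeout_s.

    Hypothetical helper: unlike an ICMP ping, this exercises the same
    TCP service the application would talk to.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

Even a successful probe like this only shows that the port accepted a connection at that moment; it cannot rule out the connection failing or being interrupted afterwards, which is the case described above.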

Peter

DirkW · ‎04-12-2010

Hi Peter,

I finally figured out the reason for the intermittent crash of your cFP-2200 controller. I found another controller that had this crash behavior, and the reason is actually a bug within the RT kernel that causes the USB part of the kernel to generate interrupts even though this controller doesn't have a USB interface attached to it.

The rush of interrupts causes the crash and the controller ends up with 4 blinks of the status LED.

The solution would be to manually disable the USB interface for your controller.

Here is how it works:

1. Make sure the Console Out dip switch is enabled
2. Connect to the target with a serial console
3. Reboot the target
4. Within half a second after rebooting, the prompt "Press any key to stop auto-boot..." appears. Press any key.
5. Now you are at a boot prompt.
6. Type 'c', then press ENTER 14 times until you get to prompt "other (o)"
7. Make sure the safe mode dip switch is enabled --- otherwise the change will not stick after a reboot
8. Type 0x4000, then ENTER
9. Now you should be back at the boot prompt
10. Disable the Safe Mode dip switch
11. You can either type '@' and ENTER to continue booting with the new settings, or just reboot... either works

I know you went back to our 21xx series. Perhaps this gives you a new chance with the 22x0 ones.

Let us know what you think.

DirkW
