pcimxi module causing system panic

Randy P. · 09-20-2004

We have an UltraSPARC Solaris 9 system that is panicking fairly often. The panic seems to be caused by a NULL pointer dereference in the pcimxi module.

Sep 20 12:42:41 p02rfu01 panic[cpu0]/thread=2a100045d40:
Sep 20 12:42:41 p02rfu01 unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=2a100045740 addr=8 mmu_fsr=0 occurred in module "pcimxi" due to a NULL pointer dereference

We are running version 2.0.0 of the NICpcimxi package, and I don't see any newer versions or patches available. Has anybody else experienced this problem? If so, do you have any suggestions? Thanks!

JDesRosier · 09-29-2004

Dear Randy-

First and foremost, I apologize for the slow reply. Our email boxes have been swamped for the past two weeks.

Does each of the modules have a specified base address in Resman? Please make sure that you have assigned a base address to each module; not doing so sometimes causes similar symptoms.

The next step is to delete all of the modules in Resman and then rerun Resman. This normally fixes similar problems.

Please keep me updated.

Best Regards,

-Joe Des Rosier-
National Instruments

Randy P. · 09-30-2004

Thank you for responding. I'm not very knowledgeable about this equipment, so please bear with me. Below is the output from resman:

National Instruments VXI Resource Manager version 3.0

Identifying VXI/VME/MXIbus devices
Waiting 1 seconds for SYSFAIL* to be removed from the backplane
Searching for statically configured devices
Configuring Extender 0
Static Message Based device "PCI-MXI-2" found at LA 0.
Static Mainframe Extender device "VXI-MXI-2" found at LA 1.
Verifying self-tests
Configuring Mainframe 1
The Slot 0 device in frame 1 is LA 1.
Card detected in Slot 0
Card detected in Slot 3
Card detected in Slot 4
Card detected in Slot 5
Card detected in Slot 6
Card detected in Slot 9
Card detected in Slot 10
Card detected in Slot 11
Card detected in Slot 12
Static Mainframe Extender device "VXI-MXI-2" found at Logical Address 1 in slot 0.
Static Message Based device "HP E1416" found at Logical Address 24 in slot 3.
Static Register Based device "HP E1411" found at Logical Address 32 in slot 4.
Static Register Based device "E1476" found at Logical Address 40 in slot 5.
Static Register Based device "E1470" found at Logical Address 48 in slot 6.
Static Register Based device "HP E1339" found at Logical Address 72 in slot 9.
Static Register Based device "HP E1339_2" found at Logical Address 80 in slot 10.
Static Register Based device "HP E1339_3" found at Logical Address 88 in slot 11.
Static Message Based device "VX4802" found at Logical Address 96 in slot 12.
No Dynamically Configured devices found.
Verifying self-tests
Mapping Bus Signals Between Frames
Mapping VME Interrupts.
Mapping VXI Triggers.
Mapping Utility Bus signals.
Configuring address map
Configuring A16 address map
Configuring A24 address map
Configuring A32 address map
Configuring commander/servant hierarchy
Finding 'Commander' Message Based Devices
Initializing Commander/Servant Hierarchy
Allocating VXI/VME irq lines
Requesting protocols for LA 24
Requesting protocols for LA 96
Initiating normal operation

Resource Manager succeeded!

Does this look correct? Also, it appears that each run of resman rebuilds the resman.tbl, so I'm not sure what more is needed to delete the modules and rebuild with resman.

Thanks!

Randy

Richard Thrapp · 10-07-2004

Hi, Randy.

I'm one of the developers who has worked on the Solaris NI-VXI driver.

That resman output looks okay and gives us a good starting point for further investigation.

Could you post the section of the log file /var/adm/messages that pertains to the crash? You can search for the string "BAD TRAP" in that file to find the relevant section. Logs get rotated out from time to time, so if the crash isn't in that file, it might be in messages.1, messages.2, etc.
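For anyone who wants to script that search, here is a minimal Python sketch that walks /var/adm/messages and its rotated copies and prints each BAD TRAP line with some surrounding context. The function name find_bad_traps and the default context sizes are assumptions chosen for illustration; they are not part of any NI or Solaris tool.

```python
import glob
import os


def find_bad_traps(log_dir="/var/adm", before=1, after=30):
    """Yield (path, excerpt) for every "BAD TRAP" line in messages*.

    `messages` is the current log; rotated copies (messages.0, messages.1,
    ...) are matched by the same glob pattern.
    """
    for path in sorted(glob.glob(os.path.join(log_dir, "messages*"))):
        # errors="replace" keeps undecodable bytes from aborting the scan.
        with open(path, errors="replace") as f:
            lines = [line.rstrip("\n") for line in f]
        for i, line in enumerate(lines):
            if "BAD TRAP" in line:
                # Keep the preceding "panic[cpu...]" line plus the stack
                # trace that follows the trap line.
                yield path, lines[max(i - before, 0): i + after + 1]


if __name__ == "__main__":
    for path, excerpt in find_bad_traps():
        print(f"==> {path}")
        print("\n".join(excerpt))
```

Reading /var/adm/messages typically requires root, so run it with the same privileges you would use to view the log by hand.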

Also, can you characterize how often your system crashes in this manner? Is it approximately once per hour, day, week, or month?
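One way to answer the frequency question from the logs rather than from memory is to pull the timestamps of the BAD TRAP lines and look at the gaps between them. The following is a minimal Python sketch; panic_intervals is a hypothetical helper, and because classic syslog timestamps carry no year, it assumes every entry falls in a single year passed in as `year`.

```python
from datetime import datetime


def panic_intervals(lines, year=2004):
    """Return the gaps, in hours, between successive "BAD TRAP" syslog lines.

    Assumes the classic syslog prefix "Mon DD HH:MM:SS host ...". That prefix
    has no year, so every entry is treated as falling in `year`.
    """
    stamps = []
    for line in lines:
        if "BAD TRAP" not in line:
            continue
        month, day, clock = line.split()[:3]
        stamps.append(datetime.strptime(f"{year} {month} {day} {clock}",
                                        "%Y %b %d %H:%M:%S"))
    stamps.sort()
    # Pair each panic with the next one and convert the difference to hours.
    return [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(stamps, stamps[1:])]


log = [
    "Sep 20 12:42:41 p02rfu01 unix: [ID 340138 kern.notice] BAD TRAP: type=31 ...",
    "Sep 27 10:04:23 p02rfu01 unix: [ID 340138 kern.notice] BAD TRAP: type=31 ...",
]
print(panic_intervals(log))
```

Fed the two BAD TRAP lines quoted in this thread, it returns a single gap of about 165.4 hours. Those are only the two panics that were posted, so that figure is not the actual crash rate.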

I'll probably have some more questions for you once I can take a look at those logs.

Thank you very much for your patience!

-- Richard

Randy P. · 10-07-2004

I've gotten some more information about the problem from our engineers as well. Evidently the system only panics when they are talking to the spectrum analyzer via a PCI-attached GPIB board. You would think it is a GPIB problem, but the panic comes from the pcimxi module. It doesn't crash every time they connect to the SA, but it can happen frequently. The NIpcigpib software is version 2.3, NICpcimxi is 2.0.0, and NICvisa is 3.1. The crash output is as follows:

Sep 27 10:04:23 p02rfu01 panic[cpu0]/thread=2a100045d40:
Sep 27 10:04:23 p02rfu01 unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=2a100045740 addr=8 mmu_fsr=0 occurred in module "pcimxi" due to a NULL pointer dereference
Sep 27 10:04:23 p02rfu01 unix: [ID 100000 kern.notice]
Sep 27 10:04:24 p02rfu01 unix: [ID 839527 kern.notice] sched:
Sep 27 10:04:24 p02rfu01 unix: [ID 520581 kern.notice] trap type = 0x31
Sep 27 10:04:24 p02rfu01 unix: [ID 381800 kern.notice] addr=0x8
Sep 27 10:04:24 p02rfu01 unix: [ID 101969 kern.notice] pid=0, pc=0x78154524, sp=0x2a100044fe1, tstate=0x4400001604, context=0xc6
Sep 27 10:04:24 p02rfu01 unix: [ID 743441 kern.notice] g1-g7: 780292d8, 1c00, bb, fffffffffffffffe, 3000027e590, 10, 2a100045d40
Sep 27 10:04:24 p02rfu01 unix: [ID 100000 kern.notice]
Sep 27 10:04:24 p02rfu01 genunix: [ID 723222 kern.notice] 000002a100045470 unix:die+80 (31, 2a100045740, 8, 0, 7802bc27, 30004bc0c0b)
Sep 27 10:04:24 p02rfu01 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 00000000014135f0 000002a100045740 000002a100045638
Sep 27 10:04:24 p02rfu01 %l4-7: 0000000000000031 000003000000e898 0000000000000000 0000000001111824
Sep 27 10:04:25 p02rfu01 genunix: [ID 723222 kern.notice] 000002a100045550 unix:trap+874 (2a100045740, 0, 10000, 10200, 0, 1)
Sep 27 10:04:25 p02rfu01 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000000000 0000000001437b28 0000000000000031
Sep 27 10:04:25 p02rfu01 %l4-7: 0000000000000005 0000000000000001 0000000000000000 0000000000000000
Sep 27 10:04:25 p02rfu01 genunix: [ID 723222 kern.notice] 000002a100045690 unix:ktl0+48 (fffffffffffffffe, 3000027e590, 101, 7802bc18, 2a10072ae9c, 2a10072ae98)
Sep 27 10:04:25 p02rfu01 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000005 0000000000001400 0000004400001604 000000000102bf54
Sep 27 10:04:25 p02rfu01 %l4-7: 000000007802bc18 fffffffffffffffe 0000000000000006 000002a100045740
Sep 27 10:04:26 p02rfu01 genunix: [ID 723222 kern.notice] 000002a1000457e0 pcimxi:HandleVIRQLevel+78 (0, ffffff50, 1, 7802bc18, ffffffffffffffff, 2a10072b0d4)
Sep 27 10:04:26 p02rfu01 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000000000000 0000000000000000 0000030004e1a000
Sep 27 10:04:26 p02rfu01 %l4-7: 000003000000b170 000003000000b198 000003000000b1c0 00000300038c4c88
Sep 27 10:04:26 p02rfu01 genunix: [ID 723222 kern.notice] 000002a1000458f0 pcimxi:HandleVSID+140 (0, ffff2900, 20, 0, 100c4f4, 0)
Sep 27 10:04:26 p02rfu01 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000000000280 0000000000000000 0000000001400428
Sep 27 10:04:26 p02rfu01 %l4-7: 0000000001400000 0000000000000016 00000000014007e8 0000030003c231f8
Sep 27 10:04:27 p02rfu01 genunix: [ID 723222 kern.notice] 000002a1000459c0 pcimxi:handleEvent+d0 (ffff2900, 2a100045d40, 20, 0, 29, 0)
Sep 27 10:04:27 p02rfu01 genunix: [ID 179002 kern.notice] %l0-3: 0000000078029800 0000000078154290 0000000000000000 00000000ffff2900
Sep 27 10:04:27 p02rfu01 %l4-7: 00000000000000c2 00000000000019c8 00000000780281e8 0000000000000008
Sep 27 10:04:27 p02rfu01 genunix: [ID 723222 kern.notice] 000002a100045a80 pcimxi:vxi_softintr+74 (30004a01cf8, 85d, 1400000, 2a100045d40, 10ba0, 78142d38)
Sep 27 10:04:27 p02rfu01 genunix: [ID 179002 kern.notice] %l0-3: 00000000ffff2900 0000030004a01cf8 0000000000000001 0000000000000001
Sep 27 10:04:27 p02rfu01 %l4-7: 0000030000265ae8 00000000780281e8 00000000014d2000 00000000014d2000

I can get you a crash dump if it will help any (we have lots of them). Thanks.

Randy

Richard Thrapp · 10-11-2004

Randy,

Thanks for posting the crash log. I am currently looking into it. However, for the time being, we might be able to make the process of debugging this issue more efficient if we switch to direct email. Could you please send an email to support through ni.com/support, and let them know that you're trying to get in touch with me regarding this issue on the discussion forums?

Thanks!

-- Richard Thrapp

fmhess · 05-11-2007

I've had similar problems with NULL dereferences in HandleVIRQLevel on one of my Linux boxes using a driver based on nivxi 2.1. I've worked around it by commenting out the vxi_tertiary_handler_internal() call so vxi_tertiary_handler() does nothing. The driver still seems to work.

May 11 14:44:52 cabana kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
May 11 14:44:52 cabana kernel: printing eip:
May 11 14:44:52 cabana kernel: f8a7ef6f
May 11 14:44:52 cabana kernel: *pde = 00000000
May 11 14:44:52 cabana kernel: Oops: 0000 [#1]
May 11 14:44:52 cabana kernel: SMP
May 11 14:44:52 cabana kernel: Modules linked in: ni_pcimxi appletalk ax25 ipx p8023 nfs lockd nfs_acl sunrpc nvidia agpgart autofs4 ipv6 xt_tcpudp xt_state ip_conntrack nfnetlink iptable_filter ip_tables x_tables sr_mod sbp2 ide_disk i2c_i801 i2c_core snd_hda_intel psmouse serio_raw pcspkr tsdev evdev parport_pc snd_hda_codec eth1394 snd_pcm_oss snd_mixer_oss parport snd_pcm snd_timer snd rtc soundcore snd_page_alloc ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid1 md_mod ide_generic ide_cd cdrom sd_mod usbhid piix ohci1394 ieee1394 ahci libata scsi_mod generic ide_core ehci_hcd tg3 uhci_hcd usbcore thermal processor fan
May 11 14:44:52 cabana kernel: CPU: 0
May 11 14:44:52 cabana kernel: EIP: 0060:[] Tainted: P VLI
May 11 14:44:52 cabana kernel: EFLAGS: 00010246 (2.6.18-4-686 #1)
May 11 14:44:52 cabana kernel: EIP is at HandleVIRQLevel+0x33/0x180 [ni_pcimxi]
May 11 14:44:52 cabana kernel: eax: 00000000 ebx: 00000001 ecx: 00000000 edx: ffffff50
May 11 14:44:52 cabana kernel: esi: 00000000 edi: 00000000 ebp: dfab1ee8 esp: dfab1e74
May 11 14:44:52 cabana kernel: ds: 007b es: 007b ss: 0068
May 11 14:44:52 cabana kernel: Process events/0 (pid: 6, ti=dfab0000 task=dffefaa0 task.ti=dfab0000)
May 11 14:44:52 cabana kernel: Stack: c02ca8b0 c011624d 00000000 00000001 c02ca8ac 00000001 00000000 00000000
May 11 14:44:52 cabana kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
May 11 14:44:52 cabana kernel: 00000000 00000000 00000000 00000000 00000000 00000046 ffffffd2 00000000
May 11 14:44:52 cabana kernel: Call Trace:
May 11 14:44:52 cabana kernel: [] __wake_up_common+0x2f/0x53
May 11 14:44:52 cabana kernel: [] HandleVSID+0x66/0xb0 [ni_pcimxi]
May 11 14:44:52 cabana kernel: [] handleEvent+0x64/0x6c [ni_pcimxi]
May 11 14:44:52 cabana kernel: [] vxi_tertiary_handler_internal+0x11/0x30 [ni_pcimxi]
May 11 14:44:52 cabana kernel: [] printk+0x14/0x18
May 11 14:44:52 cabana kernel: [] vxi_tertiary_handler+0x19/0x1e [ni_pcimxi]
May 11 14:44:52 cabana kernel: [] run_workqueue+0x78/0xb5
May 11 14:44:52 cabana kernel: [] vxi_tertiary_handler+0x0/0x1e [ni_pcimxi]
May 11 14:44:52 cabana kernel: [] worker_thread+0xd9/0x10b
May 11 14:44:52 cabana kernel: [] default_wake_function+0x0/0xc
May 11 14:44:52 cabana kernel: [] worker_thread+0x0/0x10b
May 11 14:44:52 cabana kernel: [] kthread+0xc2/0xef
May 11 14:44:52 cabana kernel: [] kthread+0x0/0xef
May 11 14:44:52 cabana kernel: [] kernel_thread_helper+0x5/0xb
May 11 14:44:52 cabana kernel: Code: 68 8b 45 08 89 45 a4 8b 45 10 89 45 a0 89 c3 fc 31 c0 b9 0d 00 00 00 8d 7d a8 f3 ab 0f bf 7d a4 0f bf cb 8b 04 bd c0 4c a9 f8 49 70 08 be 01 00 00 00 4b d3 e6 e8 f0 4d ff ff 83 c4 10 66 83
May 11 14:44:52 cabana kernel: EIP: [] HandleVIRQLevel+0x33/0x180 [ni_pcimxi] SS:ESP 0068:dfab1e74
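Both reports fault at the same small address inside HandleVIRQLevel: addr=8 on the Solaris box and virtual address 00000008 on the Linux box. An address that small most likely means the code loaded a structure member at byte offset 8 through a pointer that was NULL, rather than dereferencing NULL itself. The Python sketch below only illustrates that arithmetic. HypotheticalHandlerEntry and its fields are invented for illustration, since the real NI-VXI data structures are not shown in this thread. Also, the Solaris kernel here is 64-bit and the Linux kernel is 32-bit, so offset 8 need not be the same member on both.

```python
import ctypes


class HypotheticalHandlerEntry(ctypes.Structure):
    # Invented layout for illustration only; NOT the real NI-VXI structure.
    _fields_ = [
        ("level", ctypes.c_uint32),     # byte offset 0
        ("flags", ctypes.c_uint32),     # byte offset 4
        ("callback", ctypes.c_uint32),  # byte offset 8
    ]


def faulting_address(base, field):
    """Address a load of `field` touches when the struct pointer is `base`."""
    return base + getattr(HypotheticalHandlerEntry, field).offset


NULL = 0
# Reading entry->callback through a NULL entry pointer touches address 0x8,
# the same kind of address both crash logs report.
print(hex(faulting_address(NULL, "callback")))  # prints 0x8
```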
