Linux Users

Random nipalk Oops

I'm running on an unsupported distribution with a 2.6.27.4 SMP 32-bit kernel.

NI-VISA 4.5.1

Randomly, the nipal/nipalk kernel module will crash. The following code is an example of what I'm doing (function return checks have been stripped). The third call, viOpen, is where the kernel module crashes.

viOpenDefaultRM( &defaultRM );

viFindRsrc( defaultRM, "PXI?*INSTR", VI_NULL, &count, visa_handle_string );
viOpen( defaultRM, visa_handle_string, VI_NULL, VI_NULL, &PLX_REGS );

Dmesg follows.

PLX9054 0000:01:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
PLX9054 0000:01:01.0: setting latency timer to 64
BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<e09229b8>] :nipalk:nipalk-unversioned0002049+0x90/0x10c
*pdpt = 000000001c51b001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in: NiViPciK(P) nipxirmk(P) nidimk(P) niorbk(P) nipalk(P) nikal(P) ipv6 snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core ata_piix floppy parport_pc parport ohci1394 ieee1394 ide_scsi loop nfs lockd sunrpc r8169 skge sky2 forcedeth bnx2 tg3 e100 mii e1000e igb e1000 libphy usb_storage ohci_hcd ehci_hcd uhci_hcd BusLogic 3w_xxxx

Pid: 4198, comm: pdaqtest Tainted: P          (2.6.27.4smp #4)
EIP: 0060:[<e09229b8>] EFLAGS: 00010247 CPU: 0
EIP is at nipalk-unversioned0002049+0x90/0x10c [nipalk]
EAX: 00000001 EBX: dd109b80 ECX: 00000002 EDX: 00000000
ESI: dc4e2648 EDI: 00000002 EBP: dd109adc ESP: dd109ac0
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process pdaqtest (pid: 4198, ti=dd109000 task=de032420 task.ti=dd109000)
Stack: 00000000 00000001 00000001 00000000 dd109b80 dd109b38 dc4e2648 dd109b04
       e095ebe5 dc4e2648 00000001 00000002 dd109b80 00000000 00000000 dd109b80
       dd109b84 dd109b4c e0918d94 dd109b38 dc4e2648 00000000 00000001 00000002
Call Trace:
[<e095ebe5>] _ZN10tBusFlavor3getE18tBusAttribute64Bit14tBusWindowTypemPl+0x4d/0xe8 [nipalk]
[<e0918d94>] nipalk-unversioned0001839+0xe0/0x16c [nipalk]
[<e0743d3c>] NiViPciK-unversioned0000126+0xc4/0x20c [NiViPciK]
[<e0904674>] nipalk-unversioned0001347+0x44/0x50 [nipalk]
[<e0743af3>] NiViPciK-unversioned0000127+0x23/0x11c [NiViPciK]
[<e07410e8>] NiViPciK-unversioned0000067+0x2c/0x48 [NiViPciK]
[<e073fe40>] NiViPciK-unversioned0000043+0x1d4/0x4b8 [NiViPciK]
[<e0745fe3>] NiViPciK-unversioned0000184+0x2f/0x38 [NiViPciK]
[<e073d45f>] NiViPciK-unversioned0000014+0xab/0xd94 [NiViPciK]
[<e08dab42>] nipalk-unversioned0000037+0xc6/0x19c [nipalk]
[<e0941a81>] nipalk-unversioned0002572+0x6d/0x94 [nipalk]
[<e096e73c>] _ZNV14tSyncAtomicU32mmEi+0x14/0x30 [nipalk]
[<e09423ea>] nipalk-unversioned0002552+0x4a/0x7c [nipalk]
[<e09423f5>] nipalk-unversioned0002552+0x55/0x7c [nipalk]
[<e08f9dc3>] nipalk-unversioned0001224+0xeb/0x130 [nipalk]
[<e08fa5ee>] nipalk-unversioned0001236+0xfa/0x260 [nipalk]
[<e08eeb60>] nipalk-unversioned0000994+0xcc/0x194 [nipalk]
[<c011d7a1>] __wake_up_sync+0x38/0x4e
[<e096e6f8>] _ZNV14tSyncAtomicU32ppEi+0x14/0x30 [nipalk]
[<e092b01a>] _Z15ioControlHelperPvmS_m+0x3a/0x154 [nipalk]
[<e09423f5>] nipalk-unversioned0002552+0x55/0x7c [nipalk]
[<e092bae1>] nipalk-unversioned0002368+0x29/0x48 [nipalk]
[<e092b2e7>] nipalk-unversioned0002362+0x1b3/0x1fc [nipalk]
[<c018d501>] signalfd_release+0x9/0xb
[<e0717719>] nNIKAL100_ioctl+0x1e/0x30 [nikal]
[<c018d501>] signalfd_release+0x9/0xb
[<c0172228>] vfs_ioctl+0x50/0x60
[<c018d501>] signalfd_release+0x9/0xb
[<c01724aa>] do_vfs_ioctl+0xfc/0x106
[<c01724de>] sys_ioctl+0x2a/0x40

[<c0103762>] syscall_call+0x7/0xb
[<c018d501>] signalfd_release+0x9/0xb
[<c0320000>] init_hwif_ali15x3+0x6a/0x114
=======================

Strace output follows. In a crash scenario the last line is incomplete. I'm guessing the ioctl to /dev/nipalk fails.

stat64("/usr/local/vxipnp/linux/NIvisa/visaconf.ini", {st_mode=S_IFREG|0644, st_size=2361, ...}) = 0
gettimeofday({1265321013, 271111}, NULL) = 0
gettimeofday({1265321013, 271143}, NULL) = 0
gettimeofday({1265321013, 271162}, NULL) = 0
gettimeofday({1265321013, 271194}, NULL) = 0
gettimeofday({1265321013, 271217}, NULL) = 0
semop(458753, 0xbffbf840, 1)      = 0
semop(458753, 0xbffbf830, 1)      = 0
gettimeofday({1265321013, 271297}, NULL) = 0
semop(491522, 0xbffbf7f0, 1)      = 0
gettimeofday({1265321013, 271338}, NULL) = 0
ioctl(6, 0xc018d501, 0xbffbf4c0)  = 0
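The request number in that last ioctl can be unpacked by hand, which may help when replicating the call against /dev/nipalk. The following is a minimal sketch in C, assuming the standard x86 Linux ioctl encoding (as laid out in asm-generic/ioctl.h). The names `struct ioctl_fields` and `decode_ioctl` are hypothetical and introduced here only for illustration; what NI's driver actually does with these fields is not known from this post.

```c
#include <stdint.h>

/* Field layout of a Linux ioctl request number on x86
 * (assumption: the generic encoding from asm-generic/ioctl.h):
 *   bits  0-7   command number
 *   bits  8-15  driver type ("magic") byte
 *   bits 16-29  size of the argument structure in bytes
 *   bits 30-31  direction (0 none, 1 write, 2 read, 3 read and write)
 */
struct ioctl_fields {          /* hypothetical helper type */
    unsigned dir;
    unsigned size;
    unsigned type;
    unsigned nr;
};

/* Split a 32-bit ioctl request number into its four fields. */
struct ioctl_fields decode_ioctl(uint32_t request)
{
    struct ioctl_fields f;
    f.nr   = request & 0xffu;
    f.type = (request >> 8) & 0xffu;
    f.size = (request >> 16) & 0x3fffu;
    f.dir  = (request >> 30) & 0x3u;
    return f;
}
```

For the request above, `decode_ioctl(0xc018d501)` gives dir = 3 (read and write), size = 24, type = 0xd5 and nr = 1, i.e. a read/write command that carries a 24-byte argument. The same value, c018d501, also shows up as the address of the repeated `signalfd_release+0x9/0xb` entries in the call trace, so those entries are probably the request number sitting on the stack rather than real return addresses.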

I can get the module to crash by running the code in a loop and will typically see a failure after 10-20 iterations.  In my testing environment, I reboot the box, run this code and repeat.  The module will randomly crash within 50 iterations.

For fun, I wrote up a quick app to just replicate the ioctl calls to the /dev/nipalk character device but haven't been able to force a crash.

Obviously, the null pointer is causing the module to crash.  Can you tell me what resource is missing (why we have a NULL) or otherwise fix the code to check for a NULL and return an error, etc?

Thanks!

- Wade Brown

Message 1 of 2

Might well be one of the usual subtle incompatibilities that always come with proprietary kernel modules.

https://forums.ni.com/t5/Linux-Users/Summary-of-problems-with-proprietary-drivers/gpm-p/3821192

But might also well be just crappy code.

Linux Embedded / Kernel Hacker / BSP / Driver development / Systems engineering
Message 2 of 2