Linux kernel crashed using MHDDK analog in/out examples with DAQ 6341 PXIe card

kenstern · ‎11-18-2013

Running the MHDDK examples with a PXIe-6341 DAQ card crashes the Linux kernel, whether I use the NI-VISA or the LinuxKernel OSInterface.

The only examples that do not appear to crash the system are aoex1 and aiex1.

Attached are the messages copied from the system log after the reboot, as well as the application messages that appear when the system crashes.

The chassis is a PXIe-1062Q with a PXIe-8115 controller in it, running Fedora 15 (kernel 2.6.43.8-1) in 32-bit mode.

There are other PXI cards in the chassis, supported by the DAQmx 8.0.2f and NIDMM 2.5.2 drivers:

NI PXI-6509: "Dev5"
NI PXI-2503: "Dev4"
NI PXI-2503: "Dev3"
NI PXI-2503: "Dev2"
NI PXI-4070: "Dev1"

Steven_T · ‎11-19-2013

Hello Kenstern,

It appears that you are forcing the application to terminate prematurely. I noticed the "^C" in the output, which usually indicates that Ctrl-C was pressed.

I would expect bad behavior and possible crashes, because the example application configures DMA and maps the board's registers. Terminating the program prematurely stops execution immediately, without cleaning up the DMA or other operations. That said, the examples are meant to be executable documentation of how to program the board's registers; they are not meant to show how to write a proper Linux driver.
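One way to stop a continuously looping example without skipping cleanup is to catch SIGINT instead of letting Ctrl-C kill the process. The C sketch below is an illustration under assumptions, not MHDDK code: install_sigint_handler and run_until_stopped are hypothetical names, and the loop body stands in for whatever the example does with the board. The handler only sets a volatile sig_atomic_t flag; the loop polls that flag and returns normally, so the caller can still run its cleanup (for example releaseBoard) afterward.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the signal handler; polled by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1; /* only async-signal-safe work in a handler */
}

/* Installs on_sigint for SIGINT. Returns 0 on success, -1 on failure. */
int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical acquisition loop: runs until max_iterations or Ctrl-C,
 * then returns so the caller can release DMA and register mappings. */
long run_until_stopped(long max_iterations)
{
    assert(max_iterations >= 0);
    long i = 0;
    while (!stop_requested && i < max_iterations) {
        /* ... service the board / DMA buffer here ... */
        ++i;
    }
    return i;
}
```

In an example's main, install_sigint_handler() would be called once before the loop, and the existing cleanup path would run unconditionally after run_until_stopped returns.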

Do these crashes happen without the premature termination?

Why are you terminating the application?

Thanks,

Steven T.

kenstern · ‎11-19-2013

That particular example, aoex3, looped continuously, so the Ctrl-C was just to stop it.

When I try another example, say aoex5, it also crashes the system.

This was using the VISA examples.

Interestingly enough, I was able to debug the application: it crashed in releaseBoard, on the final viClose of the RM session (line 133 of osiUserCode.cpp).

Steven_T · ‎11-19-2013

Hello Kenstern,

The example should not loop continuously; it should run for a set amount of time and then exit cleanly. We found that on Linux, timing with the time.h library can cause these runs to last significantly longer than the reported time.
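A likely mechanism, assuming the examples measure their run time with clock() from time.h (not confirmed here against the MHDDK source): clock() reports the processor time consumed by the process, not elapsed wall time, so a loop that spends most of its time sleeping or blocked in the driver can run far longer than its nominal duration. The minimal C sketch below bounds a loop by wall time instead, using POSIX clock_gettime(CLOCK_MONOTONIC); run_for_seconds and the 1 ms poll interval are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Elapsed seconds from CLOCK_MONOTONIC, which keeps advancing while the
 * process sleeps or blocks and is unaffected by system time changes
 * (unlike clock(), which measures CPU time used by the process). */
static double monotonic_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs a hypothetical polling loop for roughly run_seconds of wall time
 * and returns the number of iterations performed. */
long run_for_seconds(double run_seconds)
{
    const double deadline = monotonic_seconds() + run_seconds;
    long iterations = 0;
    while (monotonic_seconds() < deadline) {
        /* ... poll the board here ... */
        struct timespec pause = {0, 1000000}; /* 1 ms */
        nanosleep(&pause, NULL);
        ++iterations;
    }
    return iterations;
}
```

Because each iteration sleeps at least 1 ms, the iteration count stays bounded by the requested duration no matter how little CPU time the loop uses.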

Could you run an example, such as aoex5, while capturing with NI I/O Trace? Attaching the raw generated trace would allow us to see the parameters used to call viClose. From your debugging, did the rmSession value being closed seem correct? What versions of VISA, NI-KAL, and NI-PAL are you using? I believe an NI system report would include this information, if you have run that utility before.

Another note is that VISA does not officially support Fedora. This basically means that it has not been tested and may not work as expected.

I'll look into the iotrace when you attach it.

Thanks,

Steven T.

kenstern · ‎11-19-2013

NI I/O Trace (niiotrace) is not installed, and it is not bundled with the DAQmx or DMM drivers (NI-VISA is bundled).

In the debugger, a valid RM session handle is being passed.

The nimhddk_visa/VISA/osiUserCode.cpp file appears to have the correct logic for opening and closing the sessions in acquireBoard and releaseBoard:

acquireBoard excerpt:

status = viOpenDefaultRM(&sesn);

status = viOpen(sesn, brdLocation, accessMode, openTimeout, &vi1);

status = viMapAddress(vi1, VI_PXI_BAR0_SPACE, 0, BAR0sz, 0, 0, &mem0);
tVisaSpecific *specific = new tVisaSpecific;
specific->sesn = sesn;
specific->vi1 = vi1;

releaseBoard excerpt:

viUnmapAddress(specific->vi1);
viClose(specific->vi1);
viClose(specific->sesn);    // <--- crash here

Here is what is installed:

-bash-4.2# rpm -qa | grep ^ni
nimxpi-1.4.1-f0.i386
nidaqmxef-1.4.1-f3.i386
ni653x-1.1.1-f0.i386
nivisa-5.0.4-f0.i386
nidmm-2.5.2-f0.i386
niscarabmm-1.2.1-f0.i386
nidaqmxcapihelp-1.6.1-f0.i386
niscxi-1.5.1-f0.i386
nicvirte-8.0-7.i386
nirpci-4.0.0-f1.i386
nipalki-2.6.3-f0.i386
nimdbgi-1.3.1-f0.i386
nimru2i-2.4.1-f0.i386
nidaqmxcapiexmp-1.6.1-f0.i386
nitimingi-1.5.2-f0.i386
nidaqmxswitch-1.6.1-f0.i386
nivisak-5.0.4-f0.i386
nidimi-1.9.2-f0.i386
nivisa-config-5.0.0-f0.i386
nidaqmxhelp-1.0.2-f0.i386
nipali-2.6.3-f0.i386
nimxdfi-1.4.1-f1.i386
nipxirmi-1.6.0-f0.i386
nimxs-4.0.1-3007.i386
nidaqmxcfgi-1.4.0-f0.i386
nidsai-1.5.1-f0.i386
nimdnsresponder-1.4.0-f2.i386
nivisa-devel-5.0.3-f1.i386
niorbi-1.9.0-f0.i386
nidaqmxinfi-8.0.1-f0.i386
nimioi-1.7.2-f0.i386
nikali-2.3.1-f0.noarch
nistci-1.3.3-f0.i386
nicdigi-1.5.1-f0.i386
nidaqmxcapii-1.6.1-f0.i386
niIviEngine-2.4.0-f3.i386
nivisaserver-5.0.0-f0.i386

I found an article on the NI discussion boards about using nirlp and DAQmx together:

http://forums.ni.com/t5/Driver-Development-Kit-DDK/nirlp-and-daqmx-together/td-p/2427272

Using this technique, I am able to load the kernel module, and still use DAQmx with the other devices.

I would still like to understand why NI VISA crashes the system, however.

Steven_T · ‎11-19-2013

Hello Kenstern,

Thank you for describing this behavior. Do you have information about how the MHDDK Linux kernel operating system interface crashes? So far, the information you have provided has been specific to the VISA operating system interface.

Jumping back to the VISA crash: I'd like to narrow down what it takes to produce the system crash. In our controlled environments, we don't see one.

1. Does this crash happen if only VISA and the MHDDK were installed on the system (no DAQmx)?

2. Does this crash happen with a newer version of VISA? The newest is VISA 5.4.

3. Does this crash happen with one of the NI-supported distributions (Red Hat, Scientific Linux, openSUSE)?

Do you have links to all software on the system (including the OS/distribution and NI driver downloads, so I can attempt to repeat the software setup)?

Thanks,

Steven T.

kenstern · ‎11-20-2013

1. I have not uninstalled DAQmx, but I've limited the Linux kernel driver to only the DAQ 6341 card.

Exercising the aiex3 example, I found that the kernel crashes at line 366:

dma->configure(bus, nNISTC3::kReuseLinkRing, nNISTC3::kIn, dmaSizeInBytes, status);

I turned on kernel debugging (nirlpk.h) and can see that one of the parameters passed to mmap (vma->vm_pgoff = 0x32334) seems off:

Nov 19 15:52:55 localhost kernel: [   14.917120] nirlpk: module license 'Copyright (c) 2012 National Instruments Corporation. All Rights Reserved.' taints kernel.
Nov 19 15:52:55 localhost kernel: [   14.917123] Disabling lock debugging due to kernel taint
Nov 19 15:52:55 localhost kernel: [   14.917266] nirlpk: nNIRLP_init
Nov 19 15:52:55 localhost kernel: [   14.917268] nirlpk: timestamp - Nov 19 2013 15:48:04
Nov 19 15:52:55 localhost kernel: [   14.917269] nirlpk: PAGE_SIZE: 4096
Nov 19 15:52:55 localhost kernel: [   14.917271] nirlpk: registering pci driver (major 250)
Nov 19 15:52:55 localhost kernel: [   14.917292] nirlpk: nNIRLP_pciProbe() device found - id: 0xc4c4
Nov 19 15:52:55 localhost kernel: [   14.917294] nirlpk: Allocated tPCIDevice (df9c2f60)
Nov 19 15:52:55 localhost kernel: [   14.917303] nirlpk:   bar0: 0xdf200000
Nov 19 15:52:55 localhost kernel: [   14.917344] nirlpk: driver initialized sucessfully
...

Nov 19 16:04:45 localhost kernel: [ 725.103726] nirlpk: nNIRLP_open(inode (f53b7df0), file (f58b9380))
Nov 19 16:04:45 localhost kernel: [ 725.103728] nirlpk: Allocated nNIRLP_tDriverContext (f2761860)
Nov 19 16:04:45 localhost kernel: [ 725.103733] nirlpk: minor 0
Nov 19 16:04:45 localhost kernel: [ 725.103737] nirlpk: mmap: vma->vm_start = 0xb76cc000
Nov 19 16:04:45 localhost kernel: [ 725.103738] nirlpk: mmap: vma->vm_end   = 0xb770c000
Nov 19 16:04:45 localhost kernel: [ 725.103738] nirlpk: mmap: vma->vm_pgoff = 0xdf200
Nov 19 16:04:45 localhost kernel: [ 725.109214] nirlpk: nNIRLP_release
Nov 19 16:04:45 localhost kernel: [ 725.109216] nirlpk: free nNIRLP_tDriverContext (f2761860)
Nov 19 16:04:45 localhost kernel: [ 725.109217] nirlpk: exit release
Nov 19 16:07:21 localhost kernel: [ 881.478603] nirlpk: nNIRLP_open(inode (f53b7df0), file (f2175d40))
Nov 19 16:07:21 localhost kernel: [ 881.478606] nirlpk: Allocated nNIRLP_tDriverContext (f2233f40)
Nov 19 16:07:21 localhost kernel: [ 881.478611] nirlpk: minor 0
Nov 19 16:07:21 localhost kernel: [ 881.478615] nirlpk: mmap: vma->vm_start = 0xb7fa8000
Nov 19 16:07:21 localhost kernel: [ 881.478617] nirlpk: mmap: vma->vm_end   = 0xb7fe8000
Nov 19 16:07:21 localhost kernel: [ 881.478618] nirlpk: mmap: vma->vm_pgoff = 0xdf200
Nov 19 16:10:33 localhost kernel: [ 1073.163057] nirlpk: nNIRLP_open(inode (f53b7df0), file (f21bee40))
Nov 19 16:10:33 localhost kernel: [ 1073.163059] nirlpk: Allocated nNIRLP_tDriverContext (f217c760)
Nov 19 16:10:33 localhost kernel: [ 1073.163064] nirlpk: minor 0
Nov 19 16:10:33 localhost kernel: [ 1073.163068] nirlpk: mmap: vma->vm_start = 0xb770f000
Nov 19 16:10:33 localhost kernel: [ 1073.163069] nirlpk: mmap: vma->vm_end   = 0xb774f000
Nov 19 16:10:33 localhost kernel: [ 1073.163070] nirlpk: mmap: vma->vm_pgoff = 0xdf200
Nov 19 16:10:33 localhost kernel: [ 1073.168554] nirlpk: nNIRLP_release
Nov 19 16:10:33 localhost kernel: [ 1073.168556] nirlpk: free nNIRLP_tDriverContext (f217c760)
Nov 19 16:10:33 localhost kernel: [ 1073.168557] nirlpk: exit release
Nov 19 16:14:19 localhost kernel: [ 1298.673089] nirlpk: allocated tDMA (0xf2237260)
Nov 19 16:14:19 localhost kernel: [ 1298.673095] nirlpk: allocated dma buffer 16384 bytes (descriptor: 0xf2237260)
Nov 19 16:14:19 localhost kernel: [ 1298.673096] nirlpk:   bus address 0x32334000
Nov 19 16:14:19 localhost kernel: [ 1298.673098] nirlpk:   cpu address 0xf2334000
Nov 19 16:14:19 localhost kernel: [ 1298.673102] nirlpk: mmap: vma->vm_start = 0xb7ffa000
Nov 19 16:14:19 localhost kernel: [ 1298.673103] nirlpk: mmap: vma->vm_end   = 0xb7ffe000
Nov 19 16:14:19 localhost kernel: [ 1298.673104] nirlpk: mmap: vma->vm_pgoff = 0x32334
Nov 19 16:14:19 localhost kernel: [ 1298.673107] nirlpk: fix page count up
Nov 19 16:14:19 localhost kernel: [ 1298.673148] ------------[ cut here ]------------
Nov 19 16:14:19 localhost kernel: [ 1298.673347] kernel BUG at include/linux/mm.h:402!
Nov 19 16:14:19 localhost kernel: [ 1298.673561] invalid opcode: 0000 [#1] SMP
Nov 19 16:14:19 localhost kernel: [ 1298.673790] Modules linked in: fuse nisdigk(PO) niswdk(PO) niscdk(PO) nisldk(PO) nicdrk(PO) nimxpk(PO) nimru2k(PO) nipxirmk(PO) nfsd nidimk(PO) lockd nfs_acl auth_rpcgss nimsdrk(PO) nidmxfk(PO) nimxdfk(PO) nimstsk(PO) nimdbgk(PO) niorbk(PO) nipalk(PO) nikal(PO) sunrpc snd_hda_codec_hdmi ppdev snd_hda_intel snd_hda_codec snd_hwdep parport_pc parport snd_seq snd_seq_device tnt4882(O) nec7210(O) gpib_common(O) snd_pcm i2c_i801 iTCO_wdt iTCO_vendor_support joydev snd_timer snd e1000e nirlpk(PO) soundcore snd_page_alloc microcode usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Nov 19 16:14:19 localhost kernel: [ 1298.675429]
Nov 19 16:14:19 localhost kernel: [ 1298.675960] Pid: 2697, comm: aiex3 Tainted: P           O 2.6.43.8-1.fc15.i686 #1 National Instruments NI PXIe-8115 Embedded Controller/NI PXIe-8115 Embedded Controller
Nov 19 16:14:19 localhost kernel: [ 1298.677098] EIP: 0060:[<f7fb808f>] EFLAGS: 00210246 CPU: 1
Nov 19 16:14:19 localhost kernel: [ 1298.677685] EIP is at get_page.part.0+0x3/0x53 [nirlpk]
Nov 19 16:14:19 localhost kernel: [ 1298.678370] EAX: 00000000 EBX: f6b236a0 ECX: 000000cb EDX: f2338000
Nov 19 16:14:19 localhost kernel: [ 1298.678989] ESI: f2335000 EDI: 00000001 EBP: f2159eac ESP: f2159eac
Nov 19 16:14:19 localhost kernel: [ 1298.679635] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Nov 19 16:14:19 localhost kernel: [ 1298.680289] Process aiex3 (pid: 2697, ti=f2158000 task=f2100c90 task.ti=f2158000)
Nov 19 16:14:19 localhost kernel: [ 1298.680992] Stack:
Nov 19 16:14:19 localhost kernel: [ 1298.681697] f2159ecc f7fb7c09 f7fb96ac f7fb90f9 f2159ecc dfb69528 f2237260 f2233f40
Nov 19 16:14:19 localhost kernel: [ 1298.682422] f2159efc f7fb7d4a f7fb971c 00032334 f2233f40 dfb69528 b7ffe000 00032334
Nov 19 16:14:19 localhost kernel: [ 1298.683167] b7ffa000 df306a00 ffffffea 00000004 f2159f4c c050db4f b7ffe000 000000fb
Nov 19 16:14:19 localhost kernel: [ 1298.683930] Call Trace:
Nov 19 16:14:19 localhost kernel: [ 1298.684692] [<f7fb7c09>] nNIRLP_tDMA_fixPageCount+0x99/0xb0 [nirlpk]
Nov 19 16:14:19 localhost kernel: [ 1298.685487] [<f7fb7d4a>] nNIRLP_mmap+0x12a/0x140 [nirlpk]
Nov 19 16:14:19 localhost kernel: [ 1298.686299] [<c050db4f>] mmap_region+0x2ef/0x490
Nov 19 16:14:19 localhost kernel: [ 1298.687118] [<c050df1e>] do_mmap_pgoff+0x22e/0x2f0
Nov 19 16:14:19 localhost kernel: [ 1298.687947] [<c050e070>] sys_mmap_pgoff+0x90/0x1c0
Nov 19 16:14:19 localhost kernel: [ 1298.688793] [<c095681f>] sysenter_do_call+0x12/0x28
Nov 19 16:14:19 localhost kernel: [ 1298.689649] Code: 91 fb f7 e8 c4 21 5d c8 8b 15 a8 a2 fb f7 b8 66 91 fb f7 e8 b4 21 5d c8 31 d2 b8 66 91 fb f7 e8 a8 21 5d c8 5d c3 00 00 55 89 e5 <0f> 0b b9 f2 ff ff ff 31 c0 e9 00 f6 ff ff b9 f2 ff ff ff e9 2a
Nov 19 16:14:19 localhost kernel: [ 1298.691660] EIP: [<f7fb808f>] get_page.part.0+0x3/0x53 [nirlpk] SS:ESP 0068:f2159eac
Nov 19 16:14:19 localhost kernel: [ 1298.748735] ---[ end trace 46074bd7ee897d2c ]---

2. and 3. I have not been able to upgrade VISA or to reinstall the operating system with a different distribution.

Fedora Core 15 has been working well with DAQmx up to now.
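The vm_pgoff value may be less odd than it looks. mmap passes the requested file offset to the driver in units of pages, so vm_pgoff is the byte offset shifted right by PAGE_SHIFT (12 for the 4096-byte pages in this log). 0x32334 is exactly the DMA buffer's bus address 0x32334000 >> 12, just as 0xdf200 is bar0 0xdf200000 >> 12. That suggests (an inference from the log, not from the nirlpk source) that the driver uses the bus address as the mmap offset. The small C sketch below checks that arithmetic and the mapping sizes; SKETCH_PAGE_SHIFT, offset_to_pgoff, and mapping_pages are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Matches "PAGE_SIZE: 4096" in the nirlpk log (4096 == 1 << 12). */
#define SKETCH_PAGE_SHIFT 12u

/* mmap hands the driver its byte offset as a page offset (vm_pgoff). */
static uint32_t offset_to_pgoff(uint32_t byte_offset)
{
    return byte_offset >> SKETCH_PAGE_SHIFT;
}

/* Number of pages covered by a user mapping [vm_start, vm_end). */
static uint32_t mapping_pages(uint32_t vm_start, uint32_t vm_end)
{
    assert(vm_end > vm_start);
    return (vm_end - vm_start) >> SKETCH_PAGE_SHIFT;
}
```

With the logged values, the DMA mapping (0xb7ffa000 to 0xb7ffe000) spans 4 pages, matching the 16384-byte buffer, and the BAR0 mapping (0xb76cc000 to 0xb770c000) spans 64 pages (256 KiB).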

kenstern · ‎11-21-2013

What I found was that, when using DMA, the kernel driver crashed the machine in get_page(), because the pages after the first one in the DMA buffer had a page count of zero (I printed out the page count that get_page() checks).

You can see that in the following log (with the kernel debug prints enabled). This was using the aiex3 example.

The program allocated 16384 bytes, which is 4 physical pages (4096 bytes each).

When the program mapped the buffer, nNIRLP_mmap was called in the driver (the earlier call trace shows it being reached from sys_mmap_pgoff).

That function determines that the mapping is a DMA allocation and then calls nNIRLP_tDMA_fixPageCount, which walks through the DMA buffer one page at a time.

It seems that only the first page was accessed, hence the 0 count on the other pages.

There are probably multiple ways to fix this (e.g., accessing data in each page), but all I did for now was add

atomic_inc(&page->_count);

in the fixPageCount routine, inside the 'if (up)' conditional, so that the subsequent call to get_page() doesn't trigger VM_BUG_ON.
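The page-count walk can be modeled in user space to show why only pages 1 to 3 trip the check. This is a simulation, not kernel code: the model_* functions are hypothetical stand-ins, an int array replaces struct page, and model_get_page mimics the get_page() check of this kernel era, which hits VM_BUG_ON when the count is not positive. The starting counts {1, 0, 0, 0} are taken from the debug log. One possible explanation, not confirmed here, is that a multi-page allocation that has not been split only carries a reference count on its first page; the kernel's split_page() helper gives each page its own count and might be worth evaluating as an alternative to incrementing counts in fixPageCount.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_BUF_BYTES 16384u
#define SKETCH_NPAGES (SKETCH_BUF_BYTES / SKETCH_PAGE_SIZE) /* 4 pages */

/* Models get_page(): taking a reference on a page whose count is not
 * positive is a bug. Returns 0 on success, -1 where the kernel would BUG. */
static int model_get_page(int *count)
{
    if (*count <= 0)
        return -1;
    ++*count;
    return 0;
}

/* Walks the buffer one page at a time, like the fixPageCount loop.
 * If pre_increment is nonzero, bumps each count first (the atomic_inc
 * workaround). Returns the index of the first page that would trigger
 * the BUG, or -1 if none does. */
static int model_fix_page_count(int counts[], size_t npages, int pre_increment)
{
    for (size_t i = 0; i < npages; ++i) {
        if (pre_increment)
            ++counts[i];
        if (model_get_page(&counts[i]) != 0)
            return (int)i;
    }
    return -1;
}
```

Without the workaround, the walk fails at page 1, the first page whose logged count is 0; with it, every page passes. The workaround avoids the BUG but does not explain why the counts were zero in the first place.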

---

Nov 21 10:57:33 localhost kernel: [17138.214808] nirlpk: mmap: vma->vm_start = 0xb7773000
Nov 21 10:57:33 localhost kernel: [17138.214809] nirlpk: mmap: vma->vm_end   = 0xb77b3000
Nov 21 10:57:33 localhost kernel: [17138.214810] nirlpk: mmap: vma->vm_pgoff = 0xdf200
Nov 21 10:57:33 localhost kernel: [17138.217382] nirlpk: allocated tDMA (0xf53bbe80)
Nov 21 10:57:33 localhost kernel: [17138.217386] nirlpk: allocated dma buffer 16384 bytes (descriptor: 0xf53bbe80)
Nov 21 10:57:33 localhost kernel: [17138.217387] nirlpk:   bus address 0x32540000
Nov 21 10:57:33 localhost kernel: [17138.217388] nirlpk:   cpu address 0xf2540000
Nov 21 10:57:33 localhost kernel: [17138.217390] nirlpk: mmap: vma->vm_start = 0xb77c5000
Nov 21 10:57:33 localhost kernel: [17138.217391] nirlpk: mmap: vma->vm_end   = 0xb77c9000
Nov 21 10:57:33 localhost kernel: [17138.217392] nirlpk: mmap: vma->vm_pgoff = 0x32540
Nov 21 10:57:33 localhost kernel: [17138.217393] nirlpk: fix page count up
Nov 21 10:57:33 localhost kernel: [17138.217394] nirlpk:   i: 0xf2540000, page count: 1
Nov 21 10:57:33 localhost kernel: [17138.217395] nirlpk:   i: 0xf2541000, page count: 0
Nov 21 10:57:33 localhost kernel: [17138.217396] nirlpk:   i: 0xf2542000, page count: 0
Nov 21 10:57:33 localhost kernel: [17138.217397] nirlpk:   i: 0xf2543000, page count: 0
Nov 21 10:57:33 localhost kernel: [17138.217400] nirlpk: nNIRLP_vmaFault: page count: 3

metux · ‎04-02-2017

Nov 19 15:52:55 localhost kernel: [ 14.917120] nirlpk: module license 'Copyright (c) 2012 National Instruments Corporation. All Rights Reserved.' taints kernel.
Nov 19 15:52:55 localhost kernel: [ 14.917123] Disabling lock debugging due to kernel taint

Proprietary drivers?! Forget it; they will never work reliably (unless you use *EXACTLY* the same kernel).

