
Deterministic segfault in DAQmx libs after 49 days of operation with NI 6110 under Linux


I have a program using an NI 6110 board through DAQmx 8.04 under 32-bit Linux (SUSE 11). After 49 days of operation at 1 MHz (during which the program is sometimes stopped and restarted), a segmentation fault occurs inside one of the DAQmx libraries, as follows:

 

<date> <name of machine> kernel: [4286583.578738] nimxs[940]: segfault at 11 ip 0053af14 sp 017d00a0 error 6 in libnidsadd.so[587000+37000]

 

or

 

 

 

<date> <name of machine> kernel: [4286589.134512] nimxs[1134]: segfault at 11 ip 00561f14 sp 015970a0 error 6 in libmxssvr.so[50d000+99000]

 

This happened 4 times on two different machines, each time with the same machine uptime.

 

After the segfault, the cards can no longer be used and no longer show up in nilsdev.

 

Is this a known issue with DAQmx on Linux, and is there a way to recover from the error other than rebooting the machine?

 

 


Sorry to come back to this problem, which is still nagging me... Right now I "solve" it by rebooting the machines, which is inconvenient to say the least.

 

Can someone from NI tell me whether there is another way to reset the driver (restarting NI services, etc.)?

 

Thanks


I cannot answer your question directly, but 49 days is the time it takes for the tick counter (1 ms resolution, U32) to overflow. It sounds like something is using the tick count and does not handle the overflow well.

 

Lynn


Thanks for this!

 

What is this counter used for? To give a little context: the application driving the NI boards is a C++ program using NI-DAQmx to control an experiment. The experiment is clocked externally at 1 MHz, so no on-board clock is used. Our sample counters therefore increase at the clock rate, 1 MHz.

 

Are there some steps that we should take to reset this counter?


I do not know what the DAQmx functions might be doing internally. A 49-day interval tends to be a red flag for a U32 counter overflowing at 1 ms resolution, which is why I posted. Sorry I cannot help with the details.

 

Lynn


OK, so again, to any NI employee out there: if there is indeed a counter with millisecond resolution that is constantly incremented (at least while the cards are in use) and that causes hell to break loose when it overflows, what can be done about it? Can it be reset? And if nothing can be done *before* some random driver library segfaults, can something be done *afterwards* to restore normal operation?


johnsold wrote:

I cannot answer your question directly but 49 days is the time it takes for the tick counter (1 ms resolution, U32) to overflow. It sounds like something is using the tick count and does not handle the overflow well.

 

Lynn


This is also my first thought...


Yes, but I don't use any of these counters directly: all writes to the output stream and all reads from the input stream are done relative to the current position in that stream, not relative to the beginning of the stream.

Here is a skeleton of the C++ (in fact C) code used:

 

// output task creation
TaskHandle outputTask;
DAQmxCreateTask(OutputTaskName, &outputTask);  // DAQmxCreateTask fills in the handle through a pointer
 // for each output channel:
 DAQmxCreateAOVoltageChan(outputTask, ..., DAQmx_Val_Volts, NULL);
 // external sample clock on PFI0, continuous generation
 DAQmxCfgSampClkTiming(outputTask, "PFI0", frequency, DAQmx_Val_Rising, DAQmx_Val_ContSamps, blockSize);
 // start the output on the AI start trigger so both tasks start together
 DAQmxCfgDigEdgeStartTrig(outputTask, "/Dev1/ai/StartTrigger", DAQmx_Val_Rising);
 // regeneration is an output-task property: every sample must be written by the program
 DAQmxSetWriteRegenMode(outputTask, DAQmx_Val_DoNotAllowRegen);
 DAQmxSetWriteOffset(outputTask, 0);

//input task creation
TaskHandle inputTask;
DAQmxCreateTask(InputTaskName, &inputTask);
 // for each input channel:
 DAQmxCreateAIVoltageChan(inputTask, ..., DAQmx_Val_Volts, NULL);
 // same external clock on PFI0, continuous acquisition
 DAQmxCfgSampClkTiming(inputTask, "PFI0", frequency, DAQmx_Val_Rising, DAQmx_Val_ContSamps, blockSize);
 DAQmxSetReadOffset(inputTask, 0);

//to read samples (called repeatedly):
DAQmxSetReadRelativeTo(inputTask, DAQmx_Val_CurrReadPos);  // read from the current read position
DAQmxSetReadOffset(inputTask, 0);                          // with no offset
DAQmxReadAnalogF64(inputTask, blockSize, ...);

//to write samples (called repeatedly):
DAQmxSetWriteRelativeTo(outputTask, DAQmx_Val_CurrWritePos);  // write at the current write position
DAQmxSetWriteOffset(outputTask, 0);                            // with no offset
DAQmxWriteAnalogF64(outputTask, blockSize, false, ...);        // autoStart = false

 

Anything that looks wrong above?


Some more information about this problem:

 

- This is not related to the running program at all: a machine left idle for 49 days simply no longer sees the NI cards after that delay.

- Running

nilsdev --verbose --diag

then yields

Failed to initialize MHWConfiguration

- Stopping the NI services (which unloads all modules except nikal), unloading nikal with rmmod, and reloading everything does not work;

- Additionally rescanning the PCI devices after unloading the modules, with

echo 1 > /sys/bus/pci/devices/(device number)/remove

and then

echo 1 > /sys/devices/(...)/rescan

does not work either.

 

 

Solution
Accepted by topic author kunzjacq

... and finally:

before doing what I did in my last post (stopping the services and unloading nikal, then restarting the services), restarting the nimxs daemon with

/usr/local/natinst/max/sbin/nimxs /usr/local/natinst/max/libmxssvr.so

makes everything work again.

(If the daemon is restarted after the modules are reloaded, the NI boards are visible with nilsdev but cannot be used.)
