Linux RT - "NOHZ: local_softirq_pending 08"


Hi all,

 

I'm working on narrowing down the 0x661 crashing issue I mentioned previously, and have noticed some odd behaviour and error messages on a cRIO-9067.

 

The latest error/warning message comes from the Linux kernel and reads "NOHZ: local_softirq_pending 08". Is this something to be concerned about?

 

The real-time application appears to keep running after these events without any obvious issue. Poking around some Linux message boards suggests this message is closely associated with networking calls. Coincidentally, one cause of the 0x661 crash seems to be network related, so I'm beginning to wonder whether the two are connected. For what it's worth, I've never seen this message and 0x661 appear in the same log.
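The "08" in the kernel message is a hexadecimal bitmask of pending softirq vectors, where bit n corresponds to vector n. The sketch below decodes such a mask. It assumes the vector order of the mainline Linux 3.x enum in include/linux/interrupt.h also applies to NI's 3.2.35-rt kernel, and `decode_softirq_pending` is a hypothetical helper name. Under that assumption, 0x08 (bit 3 only) is NET_RX_SOFTIRQ, the network receive softirq, which fits the networking association described above.

```python
from typing import List

# Softirq vector names in bit order, as in the mainline Linux 3.x enum
# (ASSUMPTION: NI's 3.2.35-rt kernel uses the same ordering).
SOFTIRQ_NAMES = [
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "BLOCK_IOPOLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
]


def decode_softirq_pending(mask_hex: str) -> List[str]:
    """Return the softirq vectors whose bits are set in a mask such as '08'."""
    mask = int(mask_hex, 16)  # the kernel prints the mask in hex
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]


if __name__ == "__main__":
    print(decode_softirq_pending("08"))  # ['NET_RX']
```

A mask with several bits set decodes to several vectors; for example "0c" would yield both NET_TX and NET_RX.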

 

The cRIO is running Real-Time 14.5, with firmware version 3.5.0f0. See the attached log for more info.




Certified LabVIEW Architect

Re: Linux RT - "NOHZ: local_softirq_pending 08"

Hello Michael,

 

Thank you for linking this update to the previous thread. From the messages you are getting, it sounds like it may be a networking-related issue. In the previous thread it seemed that you were not able to reproduce the error reliably. Are you able to trigger the crash/error consistently now? Can you elaborate a bit more on your work narrowing down the issue? Do you have a smaller project or code that reproduces it?

 

Thank you and looking forward to your update.

Clemens | Applications Engineer | National Instruments

Re: Linux RT - "NOHZ: local_softirq_pending 08"

Hi Clemens, thanks for responding.

 

I can't reproduce the 0x661 error on command. It's more a case of deploying the RT application and letting it run for several days. The error usually presents itself within 24h, but may take longer. The quickest I've seen is within about 15 minutes from a cold boot, and twice within the hour. Other times the same RT app will run for 2-3 days before crashing.

 

I've been narrowing down the cause, or rather eliminating code that isn't the cause, using the Diagram Disable structure and trial and error. Removing certain components seems to stop the error (or at least up-time exceeded 3-4 days), but testing those same components in isolation also failed to cause a crash. So at this stage it looks like no single piece of the code is at fault; the error only seems to occur when multiple code modules are running together.

 

I'm running a few more isolation tests over the coming days. If they don't prove useful, I'll try to pare the code down to something minimal that can still cause the crash and go from there.

 

If it's any help, some typical error messages logged to /var/local/natinst/log/LabVIEW_Failure_Log.lvuser.txt are below. As you can see, each crash is a result of a SIGSEGV signal.

 

####
#Date: Mon, Jun 27, 2016 09:35:58 AM
#Desc: LabVIEW caught fatal signal
14.0.1 - Received SIGSEGV
Reason: address not mapped to object
Attempt to reference address: 0x0x4
#RCS: unspecified
#OSName: Linux
#OSVers: 3.2.35-rt52-2.10.0f0
#OSBuild: 197155
#AppName: lvrt
#Version: 14.0.1
#AppKind: AppLib
#AppModDate: 


####
#Date: Mon, Jun 27, 2016 09:48:38 AM
#Desc: LabVIEW caught fatal signal
14.0.1 - Received SIGSEGV
Reason: address not mapped to object
Attempt to reference address: 0x0xc
#RCS: unspecified
#OSName: Linux
#OSVers: 3.2.35-rt52-2.10.0f0
#OSBuild: 197155
#AppName: lvrt
#Version: 14.0.1
#AppKind: AppLib
#AppModDate: 

 




Certified LabVIEW Architect

Re: Linux RT - "NOHZ: local_softirq_pending 08"

Hello Michael,

 

Thank you for the update and plan of action. If you are getting SIGSEGV, it sounds like something in the code is trying to access an invalid memory address. Do you have a sense of where in the code you are when it crashes? Is there a specific call/function, or series of calls/functions, that may be causing this behavior?

Clemens | Applications Engineer | National Instruments