LabVIEW

PXI Scan Engine occasional 66460 with 9144 EtherCAT

Hello all. We have an application in the field that has been running fine for 8 months or so. This application consists of one PXI-8880RT communicating with two daisy-chained 9144 chassis over EtherCAT. A week ago the system started experiencing random Scan Engine faults.

 

Each time the Scan Engine faults on code -66460 (I/O scan time exceeded). When we reset the fault the scan engine immediately recovers and is no worse for wear until the next fault. The scan engine period is set to 1ms and the -66460 error code uses the default "unconfigured" setting (on Scan Engine page under properties window).

 

Typically the scan engine faults every few hours or so, but it can also go days between faulting. CPU and memory usage on the PXI are reasonable (<20% cpu usage on all cores, most cores at 0%) and no memory leaks are seen. Again this application has been running for 8 months without issues. We also tried to reboot the PXI several times after the problem started but that didn't fix the issue.

 

We have also checked all of the associated EtherCAT cables to verify they are OK.

 

Does anyone have any ideas or insight? Anything I could probe to help find the cause of this issue?

Message 1 of 10
Update: changing the scan engine period from 1ms to 10ms seems to have stopped the occasional 66460 errors. This is a bit puzzling - why would a system that has been running for almost a year suddenly be unable to update the scan engine at 1ms? Again - the cpu usage at 1ms scan time was very mild, so the processor didn't seem to be overloaded or starved for time. Does anyone have any idea what could change within the PXI over time to cause it to be unable to update the scan engine at the same rate?
Message 2 of 10

Hi there!

 

That is indeed a very interesting situation.  May I ask if there were any changes a week (or more) ago in your system?  Perhaps an update in your code, or in the version of software packages on the computer?  If there was any variance in your system (however slight), it may give us something to go off of.  

Trevor H.
Technical Support Engineer
National Instruments
Message 3 of 10

Trevor, thanks for the reply. Having a variance or change in the system would obviously help troubleshoot this problem, but unfortunately we've racked our brains and just can't think of any difference.

 

The hardware is all the same (cables, etc). No new equipment was installed nearby. We even checked the incoming power and couldn't find any issues.

 

The software is also unchanged. The PXI was originally configured with LabVIEW 2017 and associated drivers back when the project was developed and then commissioned, but it hasn't been touched since January 2018. And the system had been running in the 8 months since.

 

One extra piece of info: when I went to the jobsite and debugged the PXI code via "Operate -> Debug Application or shared library" (code was deployed with debugging enabled), navigating and probing the block diagrams was very slow. We tried rebooting the PXI but the slowness remained. For example when trying to probe a wire that was within a timed loop sync'd to the scan engine it would occasionally hang LabVIEW and require a force close on the development computer. Note the PXI cpu usage was very low at this point seen via Distributed System Manager.

 

After updating the scan engine period to 10ms the debugging immediately became fast again and we didn't have any further issues. So something related to the scan engine was hanging or stalling at 1ms, but again the PXI cpu usage didn't seem to be outrageous. We do have timers in our code to measure the execution speed and noticed some jitter when updating at 1ms, but the jitter decreased significantly when the period was changed to 10ms. For reference our main timed loop was taking ~0.1ms to execute on average, so we figured a 1ms update rate wasn't pushing anything.
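For anyone who wants to quantify that kind of jitter offline, here is a minimal Python sketch. It assumes the timed loop's iteration start times have been logged and exported as a list of millisecond timestamps; the function name `jitter_report`, the sample data, and the "two nominal periods" lateness threshold are all hypothetical choices for illustration, not anything taken from the Scan Engine itself.

```python
from statistics import mean

def jitter_report(timestamps_ms, period_ms):
    """Summarize loop timing from a list of iteration start times (ms)."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two timestamps")
    # Actual period of each iteration = difference between consecutive starts.
    periods = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    # Jitter here means the absolute deviation from the nominal period.
    deviations = [abs(p - period_ms) for p in periods]
    # Flag iterations that took at least two nominal periods (assumed threshold).
    late = [i for i, p in enumerate(periods) if p >= 2 * period_ms]
    return {
        "mean_period_ms": mean(periods),
        "max_jitter_ms": max(deviations),
        "late_iterations": late,
    }

# Hypothetical sample: a 1 ms loop where one iteration was delayed.
sample = [0.0, 1.0, 2.0, 4.5, 5.5]
report = jitter_report(sample, period_ms=1.0)
print(report)
```

Comparing the same report for logs taken at 1ms and at 10ms would show whether the worst-case deviation, rather than the average, is what changes between the two settings.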

 

The system has been running for almost two days now since changing to 10ms, so whatever was failing seems to have been fixed by the slower update rate.

Message 4 of 10

After reading all the descriptions, I think the temperature must be the only change to your system. The whole world is having a very hot summer now.

 

Referring to the manual, can you confirm that the PXIe-8880 controller is still working within the expected "Operating Environment" (0 - 50 °C)? High temperature will cause CPU frequency variation, which will increase the jitter of the RT system.

 

My suggestion:

1. Ensure the sponge of the chassis is in a healthy state:

880501699.jpg

2. Ensure the chassis is well sealed in the unused slot:

1963896549.jpg

3. Turn the fans on High.

4. In the BIOS, disable the following CPU settings, which affect the jitter of the RT system: Hyper-Threading, Turbo Boost, C-States, and Intel VT-d.

 

Hope these will help.

Message 5 of 10
Thanks for the feedback. I will definitely check the sponge filter and will also check the BIOS settings and let you know. Unfortunately the PXI is in an air conditioned lab, so it experiences <25 C ambient all of the time. However both 9144 chassis that communicate over EtherCAT are not located in air conditioned areas, and may be subject to higher temps (we've seen up to 35 C ambient recently, and both chassis are within non-ventilated enclosures so they could see 45-50 C ambient max). Is it possible the 9144 could be affected by temperature in a way that causes the EtherCAT ring to slow down or not transmit as well, affecting the PXI? Unfortunately the ports on the 9144s are used for EtherCAT so I can't connect DSM to check the cpu usage or other diagnostics. That being said, this failure has also occurred early in the morning when the temps were <20 deg C ambient.
Message 6 of 10

I can confirm that either the slave devices or the master could be the bottleneck for the required Scan Engine time.  We have a couple of benchmarks for 9144 performance in this regard.

Trevor H.
Technical Support Engineer
National Instruments
Message 7 of 10

Unfortunately the problem re-occurred last night. It had successfully run for over a week with the slower (10ms) update time, but again the EtherCAT link went down.

 

We did check the PXI filter but it is very clean. Again the PXI is in an air-conditioned lab (20C) so it has consistent environmental conditions. When the system went down the entire lab was cool (it has cooled off here in the past week, so ambient temps are <22C).

 

Our EtherCAT cables are wired in the following way:

 

8880RT (local processor Port configured for EtherCAT) -> 9144 #1 -> 9144 #2

 

Cables have been re-crimped and checked for tight bends, but everything looks good. The first 9144 chassis is fully populated, the other is half populated so we shouldn't be pushing I/O rates too hard at 10ms. The PXI shows no memory leak or high cpu usage. And the system did run without any issues for 8-9 months before the problems started to occur.

 

Are there any advanced diagnostics we can examine to figure out why the EtherCAT ring is having problems so intermittently?

Message 8 of 10

@shansen1 wrote:

.... 

Are there any advanced diagnostics we can examine to figure out why the EtherCAT ring is having problems so intermittently?


This KB seems to outline how to use Wireshark to monitor EtherCAT.
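As a rough idea of what filtering such a capture for the failing condition could look like, here is a minimal Python sketch. It assumes the capture was saved in the classic pcap format (not pcapng), that frames are untagged Ethernet, and that one cyclic EtherCAT frame is expected per scan period at the capture point. EtherCAT frames use EtherType 0x88A4; the function names and the gap threshold are hypothetical.

```python
import struct

ETHERCAT_ETHERTYPE = 0x88A4  # EtherType used by EtherCAT frames

# Classic pcap magic bytes -> (struct byte order, timestamp fraction scale)
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e-6),  # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1e-6),  # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1e-9),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1e-9),  # big-endian, nanoseconds
}

def ethercat_timestamps(pcap_bytes):
    """Return capture times (seconds) of EtherCAT frames in a classic pcap."""
    if pcap_bytes[:4] not in _MAGICS:
        raise ValueError("not a classic pcap file")
    endian, frac = _MAGICS[pcap_bytes[:4]]
    times = []
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: seconds, fraction, captured length, original length.
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset)
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        # The EtherType sits at bytes 12-13 of an untagged Ethernet frame.
        if len(frame) >= 14 and struct.unpack_from("!H", frame, 12)[0] == ETHERCAT_ETHERTYPE:
            times.append(ts_sec + ts_frac * frac)
    return times

def find_gaps(times, period_s, factor=2.0):
    """Return (index, gap_s) where the gap after frame `index` exceeds factor * period_s."""
    return [(i, b - a)
            for i, (a, b) in enumerate(zip(times, times[1:]))
            if b - a > factor * period_s]
```

Usage would be along the lines of `find_gaps(ethercat_timestamps(open("capture.pcap", "rb").read()), period_s=0.010)` for a 10ms scan period, where `capture.pcap` is a placeholder file name. Depending on where the capture is taken, each cyclic frame may appear more than once, so the threshold may need adjusting.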

 

Please keep us updated if you go that route and find something. I would be interested in what you find, and what was involved in filtering for the failing condition.

 

May God smile on your efforts (since I can not help you beyond that link)

 

Ben

 

Message 9 of 10

I'm seeing this now with the cRIO-9068 and a string of Beckhoff EtherCAT slaves, with a 10 ms scan engine period. If I just set error -66460 to "ignore" in the scan engine config, will the scan engine recover from these intermittent errors?

Message 10 of 10