I have 4 cFP controllers in the field monitoring a whole bunch of sensors. Each sensor has its own calibration, so I built an OO hierarchy based on Dynamic Dispatch to handle it. There is now one piece of generic software, and all 4 controllers use it to do their tasks (Acquire and Log). The first three controllers work great and run for days; the fourth dies within 1 hour of running. The architecture and code are exactly the same on all the controllers.
I have a separate Network thread that broadcasts some Health data for the device (CPU Usage, HDD Space, Uptime) over a UDP port. On the fourth controller, after about an hour I notice that it's disconnected in MAX, I can't FTP into it, and I can't connect through the LabVIEW project, BUT this Network thread is still alive and broadcasting over the network.
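For readers following along without the LabVIEW project, the health broadcast described above could be sketched roughly like this in Python. This is only an illustration, not the actual RT code: the port number, the JSON payload, and all function names are assumptions. The `is_alive` helper reflects the idea that a rising Uptime value distinguishes a running controller from one that is merely repeating stale data.

```python
import json
import socket
import time

HEALTH_PORT = 50000  # hypothetical port; the post does not state the real one


def encode_health(cpu_pct, disk_free_mb, uptime_s):
    """Pack the three health values into one UDP payload (JSON is an assumption)."""
    msg = {"cpu_pct": cpu_pct, "disk_free_mb": disk_free_mb, "uptime_s": uptime_s}
    return json.dumps(msg).encode("utf-8")


def decode_health(payload):
    """Inverse of encode_health, for the PC that listens to the broadcasts."""
    return json.loads(payload.decode("utf-8"))


def is_alive(previous, current):
    """Treat the sender as running only if its uptime has increased."""
    return current["uptime_s"] > previous["uptime_s"]


def broadcast_forever(read_health, period_s=1.0):
    """Send one health datagram per period to the local broadcast address.

    read_health is a caller-supplied function returning
    (cpu_pct, disk_free_mb, uptime_s).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    while True:
        sock.sendto(encode_health(*read_health()), ("255.255.255.255", HEALTH_PORT))
        time.sleep(period_s)
```

Because UDP is connectionless, a loop like this keeps sending whether or not anything on the TCP side of the device is still accepting connections.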
Can anyone answer the following two questions?
1) Regarding the broadcast still being active: why is that, exactly? I can't get into anything else through MAX or the FTP Server...
2) Regarding the crashing of the fourth controller: the fourth controller does some complex math where it tries to determine the root of a polynomial. The NI toolkit NI_AALPro is used for this, and it is pretty heavy. It even has a DLL call. I suspect that these VIs are causing the RT Controller to crash. I'll double check tomorrow by disabling them in the code, but has anyone done this type of thing before and run into issues?
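To make the "root of a polynomial" step concrete, here is a minimal Python sketch of a bracketed root search. It is not what NI_AALPro does internally (the toolkit's algorithm isn't described here); it swaps in plain bisection with Horner evaluation, and the function names, coefficient order, and tolerance are all assumptions. The iteration cap is there because a bounded loop is easier to reason about on a real-time target.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's rule; coeffs are highest power first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def find_root(coeffs, lo, hi, tol=1e-9, max_iter=200):
    """Bisection on [lo, hi]; needs a sign change across the bracket."""
    f_lo = poly_eval(coeffs, lo)
    f_hi = poly_eval(coeffs, hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = poly_eval(coeffs, mid)
        if f_mid == 0.0 or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid  # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return 0.5 * (lo + hi)
```

For example, `find_root([1, 0, -2], 0.0, 2.0)` searches for the positive root of x^2 - 2 inside the bracket [0, 2].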
Thanks for posting on the NI Discussion Forums for help with your cFP crashes. For your first question, it's definitely odd that you can receive UDP broadcasts but not TCP data through MAX. Does that data change? Or is it possible the cFP isn't broadcasting anything new and just keeps sending out the last value it had? Do the other 3 controllers still send out UDP broadcasts as well?
As for the 4th controller crashing, is the AALPro toolkit usage for the complex math the only difference in the code? Do you have any error handling enabled on the VIs that are deployed?
Lastly, what version of FieldPoint and LVRT are you running on the devices?
Many thanks for responding to my questions.
1) Yes, the data changes. I have two Health parameters that include Device Team and Uptime. Both update on the supposedly disconnected controller even though I can't FTP in and I see it disconnected in MAX.
2) The four controllers are identical in terms of the software deployed on them. I have a polymorphic setup to facilitate this. The fourth controller is the only one that uses the AALPro toolkit, by virtue of having some channels that need it. So to answer your question, this would really be the only 'difference'. The architecture is the same, but because of dynamic dispatch the controllers do not execute identically (it depends on the channels that are assigned to each controller).
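As a rough text-language analogue of that polymorphic setup (the real code is LabVIEW classes with a dynamic dispatch Calibration VI), a Python sketch might look like the following. The class names, constructor parameters, and the `acquire_and_log` helper are hypothetical and only illustrate how one generic loop can run different calibration code depending on the channels it is given.

```python
class Sensor:
    """Base class: the default calibration passes the raw value through."""

    def __init__(self, name):
        self.name = name

    def calibrate(self, raw):
        return raw


class LinearSensor(Sensor):
    """Overrides calibrate with gain * raw + offset."""

    def __init__(self, name, gain, offset):
        super().__init__(name)
        self.gain = gain
        self.offset = offset

    def calibrate(self, raw):
        return self.gain * raw + self.offset


class PolynomialSensor(Sensor):
    """Overrides calibrate with a polynomial (coefficients highest power first)."""

    def __init__(self, name, coeffs):
        super().__init__(name)
        self.coeffs = coeffs

    def calibrate(self, raw):
        result = 0.0
        for c in self.coeffs:
            result = result * raw + c
        return result


def acquire_and_log(sensors, raw_values):
    """Generic loop: the same code runs everywhere, the override is chosen per sensor."""
    return [(s.name, s.calibrate(raw)) for s, raw in zip(sensors, raw_values)]
```

With this shape, a controller whose channel list contains no `PolynomialSensor` never executes the polynomial code at all, even though the deployed software is identical.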
3) FieldPoint Version 13.1.0, LabVIEW Real Time 13.0.1
Tomorrow I'm going to disable those channels and leave the application running to see when/if it crashes. I'll also try at some point to leave it running in dev mode to see if it crashes.
Also, I have another loop that simply blinks the Status light of the cFP at 1-second intervals. That's also still working, yet the cFP is disconnected in MAX.
Based on the other symptoms, it looks like the reason the cFP is failing to communicate is that AAL_Pro code. However, it seems to me that the rest of the RT code is working on that cFP (the UDP broadcasts, the LED blink, etc.). It appears that the code that's 'crashing' the cFP blocks communications on the TCP side for whatever reason.
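One way to confirm that split from a PC is to probe a TCP service on the controller directly while the UDP broadcasts are still arriving. The Python sketch below is a generic TCP connect test, not an NI tool; the function name and timeout are assumptions, and the address in the comment is a placeholder. Port 21 is the standard FTP control port.

```python
import socket


def tcp_port_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Example call (placeholder address, not from this thread):
# tcp_port_open("192.168.1.14", 21)
```

If this returns False for the FTP port while health datagrams with a rising Uptime keep arriving, the application loops are running and only the TCP-based services have stopped responding.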
Did you ever get a chance to test the execution without the advanced math code on the 4th controller?
Yes Austin, I ran some more tests.
1) I ran the 4th controller code in the development environment with the AAL_Pro code enabled. It crashed. This is good news, since it means we can replicate the problem when running from source.
2) I ran the 4th controller code with AAL_Pro disabled. It crashed.
3) I ran a simple loop with the AAL_Pro operating on its own on another controller. It's been running for 20 hours now.
I also added another health parameter to each cFP that tells me the amount of space left on the external USB drive attached to it. So I see that the "disconnected" 4th controller is actually physically logging to an external disk. Of course there's no way to get to it since its FTP server is down.
My suspicion now moves to the controller itself. The next set of tests I'll run is:
1) Run the 4th controller's code on the test controller. If it runs fine, then the 4th controller itself has an issue.
2) To follow up on the suspicion in #1, run a simple program on the 4th controller and see how it behaves over 10+ hours.
I hope these tests give some conclusion. Unfortunately the 4th controller is in a remote area and I'll have to wait over the weekend to physically reset it.
I am very interested in the results from testing the 4th controller’s code on your test controller. If you can run your original code on the test controller for a prolonged period of time, then we can deduce that the issue lies with the controller itself.
Once you have an update on your testing, please let us know!
Just tested the code on the test controller and it exhibits the same behavior: it's logging and the status light is lit, but it's disconnected in MAX.
This means the problem should lie in the code itself. It's polymorphic, so it's the exact same code base. I just override a Calibration VI based on the type of sensor coming in...and the ones that were specific to the 4th controller have been disabled. It still fails!
I'll just have to go into the code and dig manually now. Maybe the first step will be to diagram-disable the Calibration VI as it is and see what happens.
This thing hasn't seen the end of me yet.
Thanks everyone for their support. I'm glad this thing is done and dusted!