03-15-2012 02:04 PM
In short:
cFP-2220 devices (probably all VxWorks-based controllers with DHCP/Link Local mode) crash if they fall back to link local and the router has Proxy ARP enabled. Before using a link local address, the controller checks whether it is free by sending an ARP request for that address, and the router, because of proxy ARP, responds that the address points to the router. The controller then tries another link local address, then another (see picture below)... until it concludes that the interface is impossible to initialize and restarts itself, and finally puts itself in safe mode because of the repeated crashes.
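To illustrate the failure mode described above, here is a minimal simulation sketch in Python. It is not the controller's actual code: `claim_link_local`, `max_attempts` and the two responder functions are hypothetical names, and the give-up limit of 10 is an assumption chosen for the example. The address range 169.254.1.0 to 169.254.254.255 is the one RFC 3927 specifies for candidate selection.

```python
import ipaddress
import random

# Candidate range for link local address selection per RFC 3927.
LINK_LOCAL_FIRST = int(ipaddress.IPv4Address("169.254.1.0"))
LINK_LOCAL_LAST = int(ipaddress.IPv4Address("169.254.254.255"))


def claim_link_local(arp_probe, max_attempts=10, rng=None):
    """Return a free link local address, or None if every probe is answered.

    arp_probe(address) must return True when some host answers the ARP
    request for that address (meaning the address looks taken).
    max_attempts is a hypothetical give-up limit for this sketch.
    """
    rng = rng or random.Random()
    for _ in range(max_attempts):
        candidate = ipaddress.IPv4Address(
            rng.randint(LINK_LOCAL_FIRST, LINK_LOCAL_LAST)
        )
        if not arp_probe(candidate):
            return candidate  # nobody answered, the address is free
    return None  # every candidate appeared to be in use


def proxy_arp_router(address):
    """A router with proxy ARP answers for any address it is asked about."""
    return True


def quiet_network(address):
    """A network where nobody answers the probe."""
    return False


if __name__ == "__main__":
    print("proxy ARP network:", claim_link_local(proxy_arp_router))
    print("quiet network:   ", claim_link_local(quiet_network))
```

With the proxy ARP responder, every candidate looks occupied and the function gives up, which corresponds to the point where the controller decides the interface cannot be initialized.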
Question/issue to be resolved:
Disabling Proxy ARP fixes the issue, but the behaviour is not good in the first place (in fact, fallback to link local is discouraged according to RFC 3927, see section 2.11). Unfortunately, as far as I can see, the cFP-2220 does not have a DHCP-only option, only DHCP/Link Local. Is there a workaround to get DHCP only?
The longer story:
We have spent a lot of time on a very special problem lately: all our cFP-2220s would crash on a customer's network (not on ours) if the controllers were configured to use DHCP on NIC 1 and the DHCP server was offline. We really want the controller to keep its old dynamically received address if the DHCP server is offline, but as far as we know, such functionality is not possible on LV RT (it is on Windows, for example). So we came up with a solution where the RT app stores the last known valid IP and uses the System Configuration API to cycle between DHCP and static addresses based on whether DHCP requests fail or not. However, this whole solution failed miserably on the customer's network, which we, after a long journey, have now traced to having nothing to do with the RT app after all (no app is needed for the crash); it's the link local fallback that causes a crash when combined with the customer's router configuration.
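The idea of remembering the last valid address and cycling between DHCP and static can be sketched roughly as below. This is plain Python for illustration only, not LabVIEW and not the System Configuration API; `NicConfig` and `next_config` are hypothetical names, and how the configuration is actually applied to the NIC is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NicConfig:
    mode: str                      # "dhcp" or "static"
    address: Optional[str] = None  # address in use, if known


def next_config(
    dhcp_address: Optional[str], last_known: Optional[str]
) -> Tuple[NicConfig, Optional[str]]:
    """Decide the next NIC configuration.

    dhcp_address: address obtained from DHCP, or None if the request failed.
    last_known:   last valid address stored by the application, if any.
    Returns (configuration to apply, address to remember for next time).
    """
    if dhcp_address is not None:
        # DHCP works: use it and remember the address.
        return NicConfig("dhcp", dhcp_address), dhcp_address
    if last_known is not None:
        # DHCP failed: fall back to the stored address as a static one.
        return NicConfig("static", last_known), last_known
    # Nothing stored yet: all we can do is keep asking DHCP.
    return NicConfig("dhcp"), None


if __name__ == "__main__":
    print(next_config("10.0.0.17", None))
    print(next_config(None, "10.0.0.17"))
    print(next_config(None, None))
```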
Attached:
The console mode output during a crash. As the console output shows, this controller is running the latest LabVIEW RT 2011 drivers...
http://www.ietf.org/rfc/rfc3927.txt
03-15-2012 03:06 PM
As the long story indicates the ideal solution would really be to replace the fallback to a link local address with a fallback to the configuration last received from the DHCP server...That would eliminate the need for inefficient fiddling via the System Configuration API...
But in this particular case I am assuming that we do not get that, and then we should at least have a link local fallback that does not crash the whole controller if it fails to find a free address, but starts the RT app (so that this in turn can do what it might still need to do - like serving the secondary port, or reconfigure the setup of the primary one(!)).
03-15-2012 04:47 PM
Found a description that matches the problem exactly:
http://www.tcpipguide.com/free/t_DHCPAutoconfigurationAutomaticPrivateIPAddressingA-3.htm
"The 169.254.0.0/16 block is a private IP range and comes with all the limitations of private IP addresses, including inability to use these addresses on the Internet. Also, APIPA cannot provide the other configuration parameters that a client may need to get from a DHCP server. Finally, APIPA will not work properly in conjunction with proxy ARP, because the proxy will respond for any of the private addresses, so they will all appear to be used."
So we need NI to offer a solution that is more suitable for large networks. The page linked above says that the alternative is to stay in DHCP mode until a DHCP server replies, but that's not ideal either because you may want the network to keep functioning without DHCP online. So an option to keep the address should be available (more in line with the earlier referenced RFC). This is coincidentally a behaviour that many members of the SIIS group, founded by the oil and gas industry, want to have and might set as a requirement in their specifications, making it difficult to adhere to them with current NI hardware.
03-16-2012 01:41 AM
I've reported this as a bug with the link local feature as well now, simply because it's not really acceptable that it prevents the RT application from executing in a scenario like this, as that stops other critical functionality that does not need to rely on or relate to connectivity on port 1 at all. If you have not checked the "Halt on TCP error" box, the controller should not halt on any TCP error.
03-19-2012 09:49 AM - edited 03-19-2012 09:55 AM
Hey Mads.
Wow, what a stinker. In RT up to 2011SP1, we have a token we put in there "just in case" so that customers who found a problem with our link local behavior could disable it. Unfortunately since we haven't heard anything about this until now it will most likely not be present in 2012 when it's released (it passed the 3-year "oops" test), but if this is a "deal breaker" we can add it back. Anyway, give this a shot:
Add a token to ni-rt.ini in the [IP_Settings] section as such:
[IP_Settings]
no_LinkLocal=TRUE
Also, ensure that the "Halt on TCP Error" check-box is "checked".
What happens when this is set is that once we attempt DHCP, if the DHCP is not successful we will NOT fall back to Link Local. This will trigger the "Halt on TCP Error" to reboot the system and try DHCP again. In this default configuration, the system will drop into Safe Mode after two unsuccessful boots (the "third boot"). In order to bypass this, and allow the controller to boot "infinitely", also add this token to the ni-rt.ini file:
[Startup]
YouOnlyLiveTwice=FALSE
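If the file needs to be patched on several controllers, the two tokens above could be added with a small script instead of by hand. The following is only a sketch using Python's standard `configparser`, run against a copy of the file on a PC; `set_tokens` is a hypothetical helper, and it assumes ni-rt.ini parses as a regular INI file. Note that `configparser` drops comments when writing, so keep a backup of the original.

```python
import configparser
import io


def set_tokens(ini_text, tokens):
    """Return ini_text with the given (section, key, value) tokens set."""
    parser = configparser.ConfigParser(
        interpolation=None,   # do not treat '%' in values specially
        strict=False,         # tolerate duplicate sections or keys
        allow_no_value=True,  # tolerate keys without '=value'
    )
    parser.optionxform = str  # keep key case, e.g. no_LinkLocal
    parser.read_string(ini_text)
    for section, key, value in tokens:
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# The two tokens described in this post.
TOKENS = [
    ("IP_Settings", "no_LinkLocal", "TRUE"),
    ("Startup", "YouOnlyLiveTwice", "FALSE"),
]

if __name__ == "__main__":
    # "SomeExistingKey" is a made-up placeholder, not a real ni-rt.ini token.
    sample = "[IP_Settings]\nSomeExistingKey=1\n"
    print(set_tokens(sample, TOKENS))
```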
This may provide you with a stop-gap until we are able to "fix" the problem (no guarantees or forward-looking statements here). We will definitely file an internal bug-report (CAR) on this, but I couldn't say where it would fit in the priority tree. Is there any way to detect this erroneous setting on the router (that you know of)?
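On detecting the router setting, one possible heuristic (an assumption, not something NI or the router vendor documents) is to ARP for a handful of addresses that should be unused and see whether they are all answered by the same MAC address. The sketch below only covers the decision step; `looks_like_proxy_arp` and `min_probes` are hypothetical names, and collecting the ARP replies is left to whatever tool is available on the network.

```python
def looks_like_proxy_arp(replies, min_probes=3):
    """Guess whether a proxy ARP device is answering on the segment.

    replies maps each probed (supposedly unused) address to the MAC
    address that answered the ARP request, or to None if nobody answered.
    Returns True when every probe was answered by one and the same MAC.
    """
    if len(replies) < min_probes:
        return False  # too few samples to say anything
    macs = set(replies.values())
    return None not in macs and len(macs) == 1


if __name__ == "__main__":
    # Example data only; the addresses and MACs are made up.
    suspicious = {
        "169.254.10.1": "00:11:22:33:44:55",
        "169.254.77.3": "00:11:22:33:44:55",
        "169.254.200.9": "00:11:22:33:44:55",
    }
    print(looks_like_proxy_arp(suspicious))
```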
-Danny
03-19-2012 11:20 AM
03-19-2012 11:26 AM
03-19-2012 11:35 AM - edited 03-19-2012 11:38 AM
The underlying "gotcha" is that current-generation RT systems can only perform DHCP on startup - they cannot perform DHCP once they've been brought up, except to renew a lease or renegotiate an existing lease (the lease timers trigger this activity). This means that if you want DHCP, it MUST be done at startup or else you're going to have to reboot to try again. If you do not check the "Halt on TCP Error" (which is a misnomer, it really means, "Reboot on Automatic IP (DHCP/LinkLocal) address acquisition error") then the target is SUPPOSED to drop into a 0.0.0.0 IP address state, but apparently that is not the case any more.
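Putting the settings discussed in this thread together, the intended behavior after a failed DHCP attempt at startup can be summarized as a small decision function. This is only one reading of the description here, written in Python for illustration; it is not the RT implementation, and the function name, parameter names and return strings are made up.

```python
def after_dhcp_failure(
    no_link_local,
    halt_on_tcp_error,
    failed_boots,
    you_only_live_twice=True,
):
    """Describe what the target is supposed to do when DHCP fails at startup.

    no_link_local:       value of the no_LinkLocal token
    halt_on_tcp_error:   state of the "Halt on TCP Error" check-box
    failed_boots:        unsuccessful boots so far, including this one
    you_only_live_twice: value of the YouOnlyLiveTwice token (default TRUE)
    """
    if not no_link_local:
        return "fall back to link local"
    if not halt_on_tcp_error:
        return "continue with IP address 0.0.0.0"
    if you_only_live_twice and failed_boots >= 2:
        return "reboot into safe mode"
    return "reboot and retry DHCP"


if __name__ == "__main__":
    print(after_dhcp_failure(False, True, 1))
    print(after_dhcp_failure(True, False, 1))
    print(after_dhcp_failure(True, True, 2))
    print(after_dhcp_failure(True, True, 5, you_only_live_twice=False))
```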
We do not currently have an ability to drop into a static IP address if DHCP fails. This is something you would have to specifically request, and potentially escalate through proper channels.
-Danny
03-19-2012 02:09 PM
03-19-2012 02:12 PM