02-08-2012 05:38 PM
I'm trying to create a robust communication mechanism between a cRIO and a desktop HMI application. I wanted to use Network Streams so I could avoid a lot of the pitfalls associated with managing a TCP connection. My RT code must be written to run forever, and it has to recover gracefully from network disconnections and abrupt power losses. I can simulate these things pretty effectively by yanking the Ethernet cable or hitting the Abort button in LV.
My RT target has a reader with the endpoint name "commlink/reader". My HMI has a writer with the name "commlink/writer". The HMI attempts to establish the connection: it's given the reader URL "//<RT IP>/commlink/reader". This all works fine on the first run, but not after aborting the RT code. On the next run, the RT code frequently receives error -314101 (An endpoint with the same name already exists.) from the Create Network Stream Reader Endpoint VI. I can't figure out how to clear this error so I can recreate my endpoint and reestablish the comm link. Does anyone know how? Calling the Destroy Stream Endpoint VI with the stream's refnum doesn't work, and there are no other VIs or properties that look like they handle connection management.
02-09-2012 12:35 PM
Based on my research so far, if you want to ensure that all data written to the stream has been received by the reader before destroying the writer endpoint, you must first call the Flush Stream function and wait for the appropriate wait condition to be satisfied. Only writer endpoints may call the flush function.
http://zone.ni.com/devzone/cda/tut/p/id/12267#toc3
I would see if that or this article help you get up and running.
02-09-2012 02:35 PM - edited 02-09-2012 02:45 PM
Hi Ben -
My network stream is of type string, and the Flush function can't be used with string-based streams. It breaks the refnum wire.
"Up and running" isn't the issue. It's staying running, or recovering when something goes wrong while running. I'm not really getting error -314101 any more, but now I'm unable to reestablish a connection after error -314220 (remote endpoint destroyed) occurs when the HMI disconnects. The whitepaper doesn't seem to discuss how to keep the link alive in the face of the long list of errors that can be thrown by a Network Stream. That's what I need a tutorial on.
Edit: I did find this section of the whitepaper:
In the event of a disconnection, the active endpoint will continuously try to reestablish communication with the passive endpoint in the background. This background process will continue until the connection is repaired or until the endpoint is destroyed. While in the disconnected state, writes to the writer endpoint will continue to succeed until the writer is full, and reads from the reader endpoint will continue to succeed until the reader is empty. Once the writer is full or the reader is empty, read and write calls will block and return timeout indicators as appropriate. However, no errors will be returned from the read or write call itself.
While the endpoint buffers provide some level of protection from jitter due to unreliable networks, they will eventually fill up or empty if the network remains disconnected for too long. If your application needs to tolerate long network outages, you should either size your endpoint buffers to absorb the largest amount of downtime expected or implement appropriate logic to handle timeout conditions in a disconnected state. Also, although a stream is able to recognize it’s in a disconnected state, it can’t always differentiate whether the disconnection is the result of a network problem or an application crash or hang from the application containing the remote endpoint. In the case of a crash or hang, if the endpoint that crashed was the active endpoint, the passive endpoint that is still running will wait forever to receive a reconnection. Since the remote application is no longer responding, this message will never arrive, and the local application will make no further progress with the stream connection. If your application needs to tolerate and recover from crashes or hangs in the remote application, it’s recommended you implement your own watchdog timer by periodically checking the Connected property and taking appropriate action if you are unable to reestablish communication in a reasonable amount of time.
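The watchdog idea in the quoted passage could be sketched roughly like this. LabVIEW is graphical, so this is only a Python illustration of the logic, not LabVIEW code: `ConnectionWatchdog`, `is_connected`, and `grace_s` are hypothetical names, and `is_connected` stands in for reading the endpoint's Connected property.

```python
import time
from typing import Callable, Optional


class ConnectionWatchdog:
    """Reports expiry once the link has stayed disconnected for `grace_s` seconds.

    `is_connected` is a hypothetical stand-in for polling the stream's
    Connected property; `clock` is injectable so the logic can be tested.
    """

    def __init__(self, is_connected: Callable[[], bool], grace_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._is_connected = is_connected
        self._grace_s = grace_s
        self._clock = clock
        self._down_since: Optional[float] = None  # time the outage was first seen

    def expired(self) -> bool:
        """Poll once; return True when the outage has lasted too long."""
        if self._is_connected():
            self._down_since = None  # link is healthy, reset the timer
            return False
        now = self._clock()
        if self._down_since is None:
            self._down_since = now  # first poll that saw the link down
        return now - self._down_since >= self._grace_s
```

The caller would poll `expired()` alongside its normal reads or writes and, when it returns True, destroy the local endpoint and create a fresh one.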
My HMI is the active link in this system. Right now I have my RT (passive side) waiting with infinite timeout for a connection on its endpoint. Should I change the code to try for 5 sec, then check the Connected property? I guess if Connected = TRUE, then I should destroy my endpoint and loop back to recreate it and wait; and if Connected = FALSE, then what? I've already failed to connect by the time I'm checking the value of the property, so what action should I take in that case?
02-09-2012 03:04 PM
Ignore my comment about the Flush VI above. I'm a moron and tried to connect it to the Reader stream refnum.
02-09-2012 04:08 PM
(I realize that I'm spamming my own thread, and that you only have to reply once a day. I'm just updating information as it comes along, because I have to get this working ASAP or abandon Network Streams in my project altogether.)
At some point I shut down my HMI app, and I'm now getting error -314235: "Remote endpoint was destroyed" on the cRIO. Destroying the local (RT) endpoint and trying to recreate it does not clear this error. I cannot figure out how to clear it and get a connection working again. Please let me know how to do so when this error pops up.
02-10-2012 01:40 PM
Before troubleshooting each individual error as it comes up, I want to be certain you are first using the example code provided with LabVIEW in the example finder under "Networking>>Network Streams". These should be ready to run. If these have issues, we know that there is something deeper going on. Let me know how they work. We'll get this resolved.
02-10-2012 01:48 PM
Hi Ben -
I have my application working now. Some other code (a polling RT FIFO) was consuming 100% of the CPU, which was messing with the network stream's connection manager. No need to worry about debugging the cause of the errors here.
I do still want to know what actions to take when any of the listed errors are thrown, though. A situation may arise when this cRIO is deployed to the field that causes an error to be thrown, and I need to know how to handle it so the cRIO doesn't crash, hang, or become unresponsive to network traffic.
03-19-2012 04:12 PM
I realize I'm coming late to the conversation, but perhaps some of this information will still prove useful. While network streams were designed to seamlessly handle multiple connection/disconnection cycles with the underlying TCP/IP connection over the network, they were not designed to seamlessly handle and survive multiple lifetimes or runs of the application instances owning the endpoints. While this is still possible, it will require more work on the implementer's part. An important thing to remember when using network streams is that once you've successfully connected the two endpoints, the destruction of either one of the endpoints (via normal shutdown or a crash) will require the destruction of the other endpoint. This means you can't simply reuse the endpoint on the RT target to communicate with multiple HMI sessions. Instead, you will need to destroy the endpoint on the RT target and create another endpoint (probably with the same name) for the next session.
To do this, I would recommend a state machine that fundamentally does:
I would avoid going down the road of trapping specific error codes and trying to write conditional logic for each one. The network streams API was designed such that for most/all cases, you shouldn't need to do this. If you have a healthy stream that can still move data, you won't get errors from the read or write call (aside from some corner case stuff like trying to read from a write only endpoint). If you do get an error from the read or write, it generally signifies your stream is dead and that you need to create a new one if you want to continue communicating with the remote application.
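As an illustration only, the destroy-and-recreate approach described in this post (treat any error from create or read as a dead stream, destroy the endpoint, and build a new one) might look like the Python sketch below. It is not the original state machine list and not LabVIEW code: `Endpoint`, `StreamError`, `run_link`, and every parameter are hypothetical stand-ins for the Create, Read, and Destroy VIs.

```python
from typing import Callable, Optional, Protocol


class StreamError(Exception):
    """Hypothetical: any error returned by a create, read, or destroy call."""


class Endpoint(Protocol):
    def read(self, timeout_s: float) -> Optional[str]: ...  # None means timeout
    def destroy(self) -> None: ...


def run_link(create: Callable[[], Endpoint],
             handle: Callable[[str], None],
             keep_running: Callable[[], bool],
             read_timeout_s: float = 1.0) -> int:
    """Create, use, and on any error destroy and recreate; returns session count."""
    sessions = 0
    while keep_running():
        endpoint: Optional[Endpoint] = None
        try:
            endpoint = create()  # assumed to raise StreamError on failure
            sessions += 1
            while keep_running():
                item = endpoint.read(read_timeout_s)
                if item is not None:  # a timeout is not an error, just retry
                    handle(item)
        except StreamError:
            pass  # no per-code handling: the stream is considered dead
        finally:
            if endpoint is not None:
                try:
                    endpoint.destroy()  # always release the endpoint name
                except StreamError:
                    pass
    return sessions
```

The sketch deliberately has no branch per error code; the only distinction it makes is between a read timeout (keep going) and an error (start over).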
If you follow the above, I think you're 90% - 95% of the way there. As the article you linked to above mentions, there are additional considerations to account for if you need to tolerate application hangs/crashes from the remote endpoint. In these scenarios, you basically have one of three outcomes for the application that is still using the endpoint that didn't crash:
I should note that scenarios two and three can be greatly simplified if you don't care about detecting the crash/hang until another application tries to connect to the application that is still running. For example, restarting the crashed/hung application will recreate an endpoint and try to reestablish the stream connection. At this point, the application that is still running will realize the remote endpoint it was communicating with must have crashed and throw an error. From there, the same state machine specified above should be sufficient for reestablishing the connection. I believe this part might have been more difficult in LV 2010 since the first attempt for the newly restarted application would also throw an error when trying to reconnect to the remote application. This error would occur until the endpoint in the live application was also destroyed, which means you would have to loop on the Create call until it succeeded. In LV 2011, we changed the behavior so the Create call from the newly restarted application would cause the remote application to start returning errors, but we now continue to try to establish the connection for up to the timeout limit on the newly restarted application. If you use a state machine similar to the one above, you should be able to reestablish communication without having to write a loop around the Create call in the newly restarted application.
03-19-2012 05:29 PM
reddog, this is EXACTLY what I was looking for! Thank you for taking the time to write it all out! Incidentally, I ended up implementing "scenario 1" in your post (if anything goes wrong, kill the endpoint and recreate it), and it's incredibly robust. We've been deployed in a field application for 3 weeks, and no issue reports have come in yet.
04-03-2012 10:18 PM
Great answer. This is exactly what I needed to implement. Would you create the same state machine for both the writer and the reader endpoints?
Regards!