A while back, one of the forum posters suggested I share some of the techniques I have developed in my years as a LabVIEW programmer. This is the first in an occasional series. Race condition debugging is the topic of the day, since it was the topic that prompted the suggestion.
Anyone who has programmed LabVIEW for any period of time has probably run into a race condition. You used a global variable when you shouldn't have. That local that used to be a simple scalar morphed into a cluster and then started blapping your control booleans. You copied a cluster and didn't reinitialize it correctly. There are lots of ways to do it. But how do you find it?
Race conditions often depend on exact timing and execution order. A good clue that you have a race condition is that your code works correctly when you step through it with execution highlighting. You need a way to see what is happening in real time. I do this by sprinkling my code with the small VI attached below (LabVIEW 7.1 version, Windows only). This VI writes strings to the debug device every time it is run. These strings can be read by anything that reads the debug device, but the easiest way to watch them is with DebugView from the Microsoft Sysinternals suite of applications. DebugView gives you timestamps with processor tick count precision, so you can determine exactly when the code runs. (Note that Linux and Mac users can get much the same functionality by writing a string to stderr. Since I have not actually done this, I cannot give details.)
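For readers who want a feel for what the VI does, here is a rough cross-platform sketch in Python (illustrative only; the attached VI is LabVIEW and Windows-only, and the function name here is mine). On Windows it calls the same `OutputDebugString` API that DebugView listens to; elsewhere it falls back to stderr as suggested above.

```python
import sys
import time

def debug_write(tag, value):
    """Emit one timestamped debug string, roughly what the attached VI does."""
    msg = "%0.6f %s: %r" % (time.monotonic(), tag, value)
    if sys.platform == "win32":
        import ctypes
        # DebugView captures strings sent through OutputDebugString.
        ctypes.windll.kernel32.OutputDebugStringW(msg)
    else:
        # The Linux/Mac fallback mentioned in the post: write to stderr.
        print(msg, file=sys.stderr)
    return msg
```

The identifying tag plus the variable's value is exactly what you want in each call: `debug_write("main loop", current_state)`.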
Where do you put these function calls? If you have no clue where the problem is, start with your top level loop. Make sure you put identifying tags at each location. Include the values of the variables you are trying to find race conditions in. This should allow you to find the problem spots. At this point, start placing more tags in a sort of binary search for the problem. Experience has shown that you can usually find a race condition in under an hour using this technique, provided you can reproduce the race condition in the first place (which may be a real problem).
If you are using LabVIEW 8.0 or later, you can put a Conditional Disable structure around the debugging code and leave it in place, so you don't have to recreate it every time.
There are probably as many methods of debugging race conditions as there are LabVIEW programmers. How do you do it?
I have used an LV2-style global to capture values of the offending variable(s). This works cross-platform. If the main program does not loop too fast, I just build an array; otherwise a circular buffer is less likely to create its own problems. I often find myself tracking the state enum at the input to the case structure to answer the question "How did the state machine get to THAT state?"
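The circular-buffer capture can be sketched outside LabVIEW too. Here is a minimal Python analogue of an LV2-style functional global (the class and method names are mine, not from the post):

```python
from collections import deque

class CaptureGlobal:
    """Fixed-size circular buffer of (tag, value) samples, standing in for
    an LV2-style functional global. Old samples are dropped when the buffer
    is full, so a fast-running loop cannot grow memory without bound."""

    def __init__(self, size=1000):
        self._buf = deque(maxlen=size)    # deque silently drops the oldest

    def write(self, tag, value):          # the 'write' case of the action engine
        self._buf.append((tag, value))

    def read(self):                       # the 'read' case: dump the history
        return list(self._buf)

# Track the state enum at the input to the case structure:
g = CaptureGlobal(size=3)
for s in ["Init", "Idle", "Acquire", "Fault"]:
    g.write("state", s)
```

After a run, `g.read()` shows the last few transitions, which is usually enough to answer "how did it get to THAT state?"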
I like your idea of adding the string tags and using the VI in multiple places. This will definitely speed up the debugging by showing both the writers and the readers.
Thanks for sharing your wisdom.
Logging when things happen is a great way to start to track down race conditions. For logging purposes I use a queue-based logging system. There is a separate log viewer that's essentially a window with a string indicator with a built-in search mechanism, but there is also an option to dump it to file. As with the previous solution, it's cross-platform.
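A queue-based logging scheme like this translates roughly as follows (a Python sketch under my own naming; a plain list stands in for the viewer window):

```python
import queue
import threading

log_q = queue.Queue()
viewer = []   # stands in for the log-viewer window / dump file

def viewer_loop():
    """Consumer loop: drain entries until the None sentinel arrives."""
    while True:
        entry = log_q.get()
        if entry is None:
            break
        viewer.append(entry)

t = threading.Thread(target=viewer_loop)
t.start()

# Producers anywhere in the code just enqueue strings:
log_q.put("state machine -> Acquire")
log_q.put("state machine -> Fault")

log_q.put(None)   # shut down the viewer loop
t.join()
```

The nice property is that the producers never block on disk or display; the consumer decides whether entries go to a window, a file, or both.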
Unfortunately, logging on its own is not enough to track down race conditions caused by local variable abuse, since there is no data dependency. There is no way to know whether the access of the local happened before or after the logging event. In this respect there is little that LabVIEW programmers can do other than artificially create a data dependency, which essentially removes the race condition you were trying to track down in the first place. Given this, it would be great if LabVIEW had a built-in mechanism for logging read/write access to locals and globals, which in most cases are the primary causes of race conditions.
By the way, is "blapping" a technical term? I will let the reader look up that term.
I have used both action-engine and queue-based approaches in the past. What sold me on DebugView was that it is a separate process. It is not unusual for race conditions to cause hangs and crashes; DebugView is still around with a data dump after LabVIEW has gone away or hopelessly hung. The same would be true of a stderr dump in Linux or OS X. I have toyed with the idea of using TCP/IP to create an all-LabVIEW solution with the same advantage that would also be cross-platform, but have never found the time.
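The TCP/IP idea, sketched in Python terms (hypothetical, since the post only describes it as an untried idea): the listener runs as a second process, so it keeps its collected data even if the program under test crashes. For brevity both ends run in one process here.

```python
import socket
import threading

received = []

def log_listener(srv):
    """Stand-in for the separate DebugView-like process: collect lines."""
    conn, _ = srv.accept()
    data = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    received.extend(data.decode().splitlines())
    conn.close()
    srv.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]

t = threading.Thread(target=log_listener, args=(srv,))
t.start()

# The instrumented program sends its debug strings over TCP:
client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(b"iteration 1234: state -> Fault\n")
client.close()
t.join()
```

In a real setup the listener would be a separate executable (LabVIEW or otherwise) that appends each line to a window or file as it arrives.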
The point about the separate process convinces me to use your approach/VI. It is really nasty debugging a program with hardware-related SDKs (mostly delivered as DLLs) that keep crashing LabVIEW at some point, even when race conditions are not the issue but the system simply requires the code to run faster than execution highlighting allows, or the failure only shows up at loop iteration 1234 or 5678.
Logging to a file still seems the better way for a shipping beta version. Or is there a way to get a log file out of DebugView (assuming a user who clicks away any error message but will follow step-by-step instructions to email you the file)?
DebugView does not log to disk by default. However, you can either export to a log file or log continuously, depending on your needs. I usually export to keep the processor/time hit as low as possible.
DebugView also has good filtering, so you can use a lot of debug statements and filter on them so you are not overwhelmed with data.
And just for completeness: Linux/Mac OS X users can easily call into the "syslog" facility. This can be directed to a syslog daemon on a remote host as well. It has a bunch of tools for different logging levels, etc. The log is saved to disk too, and if it goes to a remote system you still have the data in a crash situation. It is a very robust solution with lots of standard tools for configuring where the logging data is sent.
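For reference, the syslog call looks like this from Python on a Unix system (the identifier "lv_debug" and the helper name are mine; where the records end up depends on the local syslogd configuration):

```python
import syslog

# Open a connection to the local syslogd; LOG_USER is the default facility.
syslog.openlog("lv_debug", syslog.LOG_PID, syslog.LOG_USER)

def log_debug(tag, value):
    """Send one debug-level record; syslogd decides where it goes
    (a local file, or a remote host in a crash-prone setup)."""
    msg = "%s: %r" % (tag, value)
    syslog.syslog(syslog.LOG_DEBUG, msg)
    return msg
```

Because syslogd is a separate process (possibly on a separate machine), the records survive the crash of the program that wrote them, the same advantage DebugView gives on Windows.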
If the original debug VI had a Conditional Disable structure to work cross-platform, it would be complete.
We use queue-based logging that writes to a log file. Analysis is normally done afterwards. We typically log commands going through queues and state machine transitions. The logger flushes the file after each entry. In the rare cases where we have a crash, the log points to the critical code, even when the last entry sent to the log queue could not be written to the file.
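The flush after each entry is what preserves the last records through a crash. A minimal sketch of that part (the helper name is mine): `flush()` empties the runtime's buffer, and `fsync()` asks the OS to commit the bytes to disk before the call returns.

```python
import os

def append_log_entry(path, line):
    """Append one entry and force it to disk so it survives a crash."""
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()               # empty the language-level buffer
        os.fsync(f.fileno())    # ask the OS to commit to disk now
```

The cost is one disk sync per entry, which is why this suits occasional state-machine transitions better than a tight acquisition loop.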
In cases where it is necessary to know the sequence of the debugged code relative to the logging, I put a flat sequence structure around the log VI and route the wire through it.
We never thought about a separate logging program. In that case I would put the logging module into a standalone app and make an interface VI.