Start Async Call Brutal Typedef Bug

Solved!

This is a nasty bug that I think is behind a lot of the weird anomalies I'm seeing with user events: some events don't get fired, and if I probe the event refnum in a VI that was launched with the Start Asynchronous Call node, I get a weird value for the reference, like 8450 or 5500, instead of a typical large integer. It also doesn't match the value I get when I initialize the reference. This happens only intermittently, but I can reproduce it on a smaller scale to some extent. The repro is not exactly what I am seeing in my actual project, but I guarantee the two are related. I am also fairly confident this has to do with using lvlibs.

 

So...to reproduce some issues:

 

Unzip the attached code and open the project.

Open Main.vi. It is hard to see because it's pink, but notice the coercion dot on the Start Asynchronous Call node. This is expected at this point because I have a non-typedef cluster on the connector pane, but a typedef (TD) cluster wired into it.

Now, open AsyncCall.vi.

Drag EventCluster.ctl from the project onto the front panel of AsyncCall.vi.

Ctrl+X to cut the typedef cluster that you just placed on the front panel.

Select the non-typedef cluster by clicking on it.

Ctrl+V to replace the non-typedef cluster with the typedef cluster, then save.

Go back to Main.vi and notice the coercion dot didn't go away.

Open Context Help and notice that the control types match, but LabVIEW doesn't seem to recognize this on the Start Asynchronous Call node.

Delete the Start Asynchronous Call node, then replace it. Wire the cluster back up. Voila, no coercion dot.

 

Second issue -- same result but different method to get there.

 

Now that you have typedef connector panes matching and no more coercion dots because you've gone through the first steps of this "exercise", pull the EventCluster.ctl out of the library and save.

 

Whoa, look, the coercion dot is back, because the Start Asynchronous Call node is still referencing the typedef cluster that it thinks should be under the library. This can be seen by deleting the cluster on Main.vi, then right-clicking the connector pane terminal on the Start Asynchronous Call node and creating a new cluster constant.

 

It creates a greyed-out control! Why? Well, let's open Context Help again. Whaddaya know, it's still looking for the control under Bug.lvlib, where it no longer exists.

 

 

Now, the issue in my full project, which I can't post and can't reproduce on a smaller scale, is that simply updating the typedef causes the coercion dot. This means I cannot update the typedef cluster that contains all my events without replacing EVERY SINGLE Start Asynchronous Call node EVERY time I add a new event.

 

Major problem.

 

Please let me know if these steps to reproduce were not clear or you have trouble reproducing the issue. I am using LV2013 SP1. I opened the project in 2014 to see if it was resolved in a later version but saw the same thing.

Message 1 of 7

@GregFreeman wrote:

This is a nasty bug that I think is behind a lot of the weird anomalies I'm seeing with user events: some events don't get fired, and if I probe the event refnum in a VI that was launched with the Start Asynchronous Call node, I get a weird value for the reference, like 8450 or 5500, instead of a typical large integer. It also doesn't match the value I get when I initialize the reference. This happens only intermittently, but I can reproduce it on a smaller scale to some extent. The repro is not exactly what I am seeing in my actual project, but I guarantee the two are related. I am also fairly confident this has to do with using lvlibs.

 


This part sounds eerily familiar. We just ran into a similar problem where registering for events on static control references would occasionally return 0 for a control reference, only for it to change to a proper large integer later. This happened even though we read out the reference only once and put it in a cluster. The value in the cluster changed with time...

 

The rest of your post doesn't ring a bell, but I can certainly relate to the "weird reference values" part of this post. And of course, any attempt to reproduce it in code that would be suitable for posting fails.

Message 2 of 7
Solution
Accepted by topic author GregFreeman

I can repro with @GregFreeman's steps, and can also confirm that I've seen this same issue at least since LV2012, but have not reported it, not having been able to provide a minimal test case (thanks, @GregFreeman!).

 

Anecdotally, it appears the bug here is that type propagation sometimes makes an incorrect assumption/optimization as to whether the conpane of the Start Asynchronous Call node needs to be updated when the source changes.

 

A more obvious change -- say, adding/removing an input, swapping order, or changing datatypes altogether -- seems to always propagate correctly.

 

The incorrect optimization seems to occur when a terminal keeps its same base datatype yet changes type definitions, or when the type definition is re-parented to or de-parented from an owning library.
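To make that hypothesis concrete, here is a toy Python model of it. This is purely illustrative: LabVIEW's type propagation is not public, and every name below (`TerminalType`, `needs_update_structural`, `needs_update_full`) is made up for the sketch. It only shows how a "did the type change?" check that looks at the base datatype alone would miss exactly the cases described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TerminalType:
    """Hypothetical stand-in for a conpane terminal's type."""
    base: str                      # structural datatype, e.g. a cluster layout
    typedef: Optional[str] = None  # qualified name of the type definition, if any


def needs_update_structural(old: TerminalType, new: TerminalType) -> bool:
    """Assumed buggy shortcut: only the base datatype is compared."""
    return old.base != new.base


def needs_update_full(old: TerminalType, new: TerminalType) -> bool:
    """Compares the base datatype and the typedef identity."""
    return old != new


# Same cluster layout, three different typedef situations.
plain = TerminalType("cluster(user event refnum)")
in_lib = TerminalType("cluster(user event refnum)", "Bug.lvlib:EventCluster.ctl")
deparented = TerminalType("cluster(user event refnum)", "EventCluster.ctl")

for old, new in [(plain, in_lib), (in_lib, deparented)]:
    print(needs_update_structural(old, new), needs_update_full(old, new))
```

In this model, both transitions (non-typedef to typedef, and library-owned to de-parented) are skipped by the structural check and caught by the full check, while an outright datatype change would be caught by both.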

 

@GregFreeman demonstrates the bug going from non-typedef to typedef, but it's actually much worse in the other direction -- when a link to an actual missing file is maintained.

 

The Start Asynchronous Call node appears to maintain a link list that's separate from that of the VI, and this separate link list is what appears to not be invalidated properly. For example, in this screenshot, I've illustrated from Greg's example that the node generates no compiler errors even after de-parenting and renaming the Typedef...

 

Screen Shot 2015-07-17 at 5.31.30 PM.png

 

 

... yet when we "Create Constant" on the offending terminal with the stale link list, we get a compiler error, since the greyed-out type highlighted in Context Help cannot be found: `Bug.lvlib:EventCluster.ctl` no longer exists, yet the separate link list of this node was never notified:

 

Screen Shot 2015-07-17 at 5.27.19 PM.png

 

 

It's worth noting that `Bug.lvlib:EventCluster.ctl` does not appear in the link list of the VI at this point.
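A toy Python sketch of what I suspect is happening, under the assumption that the node keeps its own snapshot of links. None of this reflects real LabVIEW internals or APIs; `ToyVI`, `ToyAsyncCallNode`, `rename_link`, and `stale_links` are hypothetical names, and the new name `EventCluster.ctl` just stands in for the de-parented typedef.

```python
class ToyVI:
    """Hypothetical VI holding the link list that the refactor updates."""

    def __init__(self, links):
        self.links = set(links)


class ToyAsyncCallNode:
    """Hypothetical node that copies the VI's links once, when it is dropped."""

    def __init__(self, vi):
        self.cached_links = set(vi.links)


def rename_link(vi, old_name, new_name):
    """Model of the refactor: only the VI's own link list is rewritten.

    Snapshots held by nodes are deliberately left untouched, which is the
    assumed bug.
    """
    vi.links.discard(old_name)
    vi.links.add(new_name)


def stale_links(node, vi):
    """Links the node still believes in that the VI no longer has."""
    return node.cached_links - vi.links


vi = ToyVI({"Bug.lvlib:EventCluster.ctl"})
node = ToyAsyncCallNode(vi)
rename_link(vi, "Bug.lvlib:EventCluster.ctl", "EventCluster.ctl")
print(sorted(vi.links))             # the VI's link list no longer has the old name
print(sorted(stale_links(node, vi)))  # the node's snapshot still does
```

Replacing the node corresponds to constructing a fresh `ToyAsyncCallNode(vi)`, which takes a new snapshot and clears the stale entry. That matches the only workaround found so far.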

 

Oftentimes, no compiler errors are generated after this failure occurs, and as Greg reports you might end up with undefined behavior (such as suspicious-looking Refnums and events that seem to not fire) (and I'll add to this list a hearty helping of DAborts with total red-herring messages).

 

Also, you *might* receive cryptic linker errors during builds, but maybe not (in the screenshot above, you'll notice I've added two builds, neither of which seems to have a problem building). It appears that the stale link does travel with the source distro, even when "Disconnect Type Definitions" is selected during the build process. This is why I anecdotally believe this node maintains a link list separately from the VI link list, and that's perhaps part of the problem.

 

It's worth noting that during this refactor (de-parent and rename), all VIs and the control remained open and in memory, and all files were saved. No funny business where LabVIEW would be unable to update links in a file that was not in memory.

 

Another note -- in the original example, all of the source files were unifiles, and I can anecdotally add to the report that this bug is much more insidious when Separate Compiled Code is active on the source files. In this case, the source can appear to be perfect -- no coercion dots, no stale links -- yet the code that's being executed can be broken. Said another way, what you see is not what you get, making debugging virtually impossible. (This particular bug is one of a few that make "Clear Compiled Object Cache" a normal checkpointed procedure during all application development.)

 

Anyway, I wanted to draw attention to this issue, since this thread is not yet linked with a CAR, and it's a serious bug that yields undefined run-time behavior caused by a pretty normal refactor that now has a well-characterized minimal repro case.

Message 3 of 7

Jack,

 

Thanks for chiming in with a more detailed analysis of what's going on.

Message 4 of 7

This thread....  brings a tear to my eye.  If I had the skillz, I would hack everyone's NI account and heap up the kudos here.  (relax, I don't).

 

We have been suffering from very similar symptoms for at LEAST a year now and I think Jack's synopsis of undefined run-time behavior seems to sum up our experiences perfectly.  It makes pulling out your hair seem like such a sensible course of action at times.  Frustrating just isn't the correct word.

 

We don't actually use async calls directly, but we see VERY similar effects when working with shared VIs across different targets and using separated source code. It has gotten to the stage that we force a recompile of EVERYTHING before even trying to run our code in the development environment. We've seen everything from VIs which we KNOW are running on the remote target not being reserved for running, to just plain unexplainable results like VIs being reserved for running while very clearly older versions are still running on the target. Setting a probe on one of these incorrectly reserved VIs makes LabVIEW go bye-bye rather quickly. It seems that there are some inconsistencies with the compiled object cache in the background, but we've never been able to pin down the exact problem. Our anecdotal evidence suggests that changes made to inlined VIs are more prone to this problem than others (changing the code of an inlined VI on one target will NOT trigger VIs containing that VI on other targets to recompile).

 

I don't know how debugging VIs on remote targets is implemented (perhaps using async calls?), but I just wanted to make sure our data point is included in this "undefined run-time behaviour".

Message 5 of 7

Just to throw another wrench in things, I'm seeing the same issue when the data type of a user event is itself a user event refnum whose data type is a typedef cluster. I have used this setup to fire an event that sends a reply refnum as part of its event data. I then use the event data node on the left side of the event structure to access this reply refnum. If the cluster that the reply refnum represents is moved into or out of a class, the type change doesn't propagate into the event structure. I had to delete and recreate the event case to solve the problem.

 

Yes, you read that correctly -- this happens when moving typedefs in and out of classes. Probably because lvclasses inherit from lvlibs.

 

I'll be here all day, folks.

Message 6 of 7

@JackDunaway wrote:

I can repro with @GregFreeman's steps, and also confirm that I've seen this same issue at least since LV2012,

 

....

....

 

<SuperLongDetailed Post>


Seriously, Jack, how much does NI pay you? I feel like I'm pretty good with user events and debugging, but that was just crazy.

Message 7 of 7