LabVIEW Idea Exchange

TurboPhil

Allow Inlining and Preallocation of Dynamic Dispatch Method VIs in LVOOP

Status: Declined

Any idea that has received less than 5 kudos within 5 years after posting will be automatically declined.

You don't have the option to set a VI to be inlined in the caller if it is called dynamically. Makes sense--the compiler doesn't know about it a priori, so it can't shoehorn it into the block diagram under the hood. This holds true for dynamic dispatch VIs, such as Override methods in LVOOP. In this case, though, it would seem that the compiler does have enough information to do the inlining--it could basically create case structures for each unique child class of the parent class being overridden.


Similarly, if the compiler were smart enough to do the behind-the-scenes case structure implementation, then it should also be able to preallocate clones for each instance of the Dynamic Dispatch VI.
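
In text-based OO terms, what this idea asks for is essentially devirtualization: if the full set of child classes is known when the caller is compiled, the dynamic dispatch can be lowered to a case structure over the concrete types, and each branch can then be inlined. Here is a rough C++ sketch of that transformation (the Warning/Limit/ValueChange names are placeholders loosely based on the example further down, not real project code):

    #include <cassert>

    // Placeholder hierarchy loosely based on the Warning example described below.
    struct Warning {
        virtual ~Warning() = default;
        virtual bool check(double sample) = 0;   // the dynamic dispatch ("override") method
    };

    struct LimitWarning : Warning {
        double limit = 10.0;
        bool check(double sample) override { return sample > limit; }
    };

    struct ValueChangeWarning : Warning {
        double last = 0.0;
        bool check(double sample) override { bool c = (sample != last); last = sample; return c; }
    };

    // What the compiler emits today: an indirect call resolved at run time.
    bool check_dynamic(Warning& w, double sample) { return w.check(sample); }

    // What the idea asks the compiler to generate when every child is known at
    // compile time: an explicit "case structure" whose branches are direct,
    // inlinable calls (the Class::method syntax forces a non-virtual call).
    enum class WarningKind { Limit, ValueChange };

    bool check_devirtualized(WarningKind kind, Warning& w, double sample) {
        switch (kind) {
            case WarningKind::Limit:
                return static_cast<LimitWarning&>(w).LimitWarning::check(sample);
            case WarningKind::ValueChange:
                return static_cast<ValueChangeWarning&>(w).ValueChangeWarning::check(sample);
        }
        return false;
    }

    int main() {
        LimitWarning lw;
        assert(check_dynamic(lw, 12.0) == check_devirtualized(WarningKind::Limit, lw, 12.0));
        return 0;
    }

The switch version is what the compiler could conceptually generate if (and only if) it could assume no further children would ever be loaded.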

 

The reason I bring this up is that I have a situation that screams for an OOP implementation, but the dynamic dispatch portion needs to run in a very tight loop--several tight loops, actually. In parallel. I am trying an OOP implementation of monitoring incoming data for Warning conditions, so I want a generic "Warning" class that has descendant classes for different conditions (e.g. monitoring for Limit violations, value change events, etc.). But because the data throughput in the system is so high, I don't think I can trust OOP with the implementation--I think the subVI call overhead and jitter from sharing clones will be a bottleneck. 😞

10 Comments
tst
Knight of NI

This can't work because child classes can be loaded dynamically. There is no way to guarantee you will know about all the child classes at edit time.

 

It might be possible to make it work if you say "I do know about all the classes at edit time and I'm willing to get an error if I load a new class at run time", but that would presumably require more effort to implement. Do you think it's worth it?


___________________
Try to take over the world!
AristosQueue (NI)
NI Employee (retired)

> I think the subVI call overhead and jitter from sharing clones will be a bottleneck. 😞

 

Do you have any actual evidence that this is a concern for your application? It's generally not a good idea to optimize prematurely at the cost of good software architecture. The share clones setting only causes a UI thread hit if you actually have to allocate more clones; otherwise, getting a clone is pretty lightweight. And I have yet to see any application for which the dynamic dispatch overhead is a performance bottleneck, outside of two of our older cRIO models (and on those models, lots of weird operations are slow). Dynamic dispatching is pretty close to the same as a case structure would be -- it's just an index into an array, no name lookup or pattern matching or anything silly like that (search for information about how virtual tables are implemented in most OO languages for further details).
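
For anyone unfamiliar with the mechanism being described here, a virtual table boils down to the hand-rolled sketch below. This is a generic illustration of how most OO languages implement dispatch, not LabVIEW's actual internals:

    #include <cstdio>

    // A call through dynamic dispatch boils down to: load the object's table
    // pointer, index into the table, call through the function pointer.
    using CheckFn = bool (*)(void* self, double sample);

    struct VTable {
        CheckFn check;               // slot 0: the "check" dynamic dispatch method
    };

    struct Object {
        const VTable* vtable;        // every object carries its class's table
        double state;
    };

    static bool limit_check(void* self, double sample) {
        return sample > static_cast<Object*>(self)->state;
    }

    static const VTable limit_vtable = { limit_check };

    int main() {
        Object warning{ &limit_vtable, 10.0 };

        // The "dynamic dispatch": no name lookup, no pattern matching --
        // just one pointer load and one indirect call.
        bool fired = warning.vtable->check(&warning, 12.5);

        std::printf("fired = %d\n", fired);
        return 0;
    }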

 

tst is right about the reasons why this is impossible with current LabVIEW. Having said that, I have sketched out elsewhere on these forums what would be needed to limit the dynamic loading -- it's far different from just adding the "sealed" keyword that other programming languages have -- sealed prevents any children. This would be a setting on the parent to prevent any *further* children other than those that are in memory when the class starts running. It's viable, just never been a priority to build.
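
For comparison, the nearest C++ analogue to the "sealed" behaviour mentioned above is marking a class final, which forbids any children at all; the setting sketched here would instead freeze the set of children to those in memory when the class starts running. A minimal illustration (placeholder names again):

    struct Warning {
        virtual ~Warning() = default;
        virtual bool check(double sample) = 0;
    };

    // "final" is the sealed-style option: nothing may ever derive from this class.
    struct LimitWarning final : Warning {
        double limit = 10.0;
        bool check(double sample) override { return sample > limit; }
    };

    // struct StricterLimit : LimitWarning {};   // compile error: base is final

    int main() {
        LimitWarning w;
        return w.check(5.0) ? 1 : 0;
    }

The frozen-set variant has no direct keyword equivalent in C++; the closest counterpart is whole-program devirtualization, where the compiler is allowed to assume it has seen every child class.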

TurboPhil
Active Participant

AQ--you're right, I am still in the planning/design phase, so I don't have any hard evidence in this case that the subVI call overhead/sharing clones will be a concern. But I do know from experience in other aspects of our large, distributed system that every little optimization helps. In this particular application, microseconds count; even without the dynamic dispatch, I would be concerned if I were calling just a regular subVI without inlining (it's being called in a tight loop).

 

It's also interesting to hear that sharing clones is handled in the UI thread. I didn't realize that. That is another concern I have now, even before having anything testable.

 

I still plan to proceed with my OO design as-is to actually test it out and see how it compares to my (currently low) expectations. I'll report back if I am pleasantly surprised.

StephenB
Active Participant
I have some benchmarks I could dig up where I measured DD performance overhead. If I remember right, it was always <50 µs a call, and cRIO was about 2x as bad as PXI. Not a big deal unless you're making a bunch of calls at a high loop rate (which unfortunately I was, so I chose not to use DD).
Stephen B
AristosQueue (NI)
NI Employee (retired)

> (which unfortunately I was, so I chose not to use DD)

 

Yeah, I hear that from time to time. What I haven't seen yet is someone who was making a bunch of calls at a high loop rate, who chose TO use dynamic dispatching, AND THEN found that it was a problem for them, with the exception of those two cRIO modules I mentioned. Some of the overhead of a dyn disp call is reduced for multiple dyn disp calls in a row. And if not using dynamic dispatching means you're writing a case structure, then you're paying most of the dyn dispatch overhead cost anyway.

 

This should not be taken as me saying that it is not a problem. It might very well be a problem. But there are really high speed apps written in C++ that use dispatching, and our model, although not as straightforward, is the same basic idea.

TurboPhil
Active Participant

[Coming back to this after a long delay. Sorry.]

 

In my project, I am scanning telemetry data to throw Warnings based on various criteria. I have a parent class called Warning, which has child classes such as Limit, ValueChange, etc. I just made a quick benchmark VI to test the performance of dynamic dispatch versus an inlined subVI:

[Benchmark screenshot: DD vs Inline.PNG]

 

It looks like the overhead of the dynamic dispatch/subVI call makes it ~30-50 times slower than if I just used an inlined VI. Granted, in this example, the inlined VI is only handling one possible case--effectively one child class--so I can't just replace it as-is. Next, I'll try using the old-fashioned [enum + variant] method to achieve the multiple implementations provided by the LVOOP approach... I suspect it will still outperform LVOOP. Yes, I know that it won't have a lot of the benefits of objects (data member protection, dynamic class loading, etc.), but for the purposes of this application, execution speed needs to take priority over those features.
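
For what it's worth, a text-language analogue of the [enum + variant] approach would look something like the C++ sketch below, with std::variant standing in for the LabVIEW variant (the data types are illustrative, not the real project code):

    #include <cstdio>
    #include <variant>

    // Illustrative stand-ins for the per-condition data; not the real project types.
    struct LimitData       { double limit; };
    struct ValueChangeData { double last; };

    // The LabVIEW enum+variant pair collapses into a tagged union here.
    using WarningData = std::variant<LimitData, ValueChangeData>;

    // The "case structure": every branch is visible to the compiler and inlinable,
    // at the cost of editing this function whenever a new condition is added.
    bool check(WarningData& data, double sample) {
        if (auto* d = std::get_if<LimitData>(&data)) {
            return sample > d->limit;
        }
        if (auto* d = std::get_if<ValueChangeData>(&data)) {
            bool changed = (sample != d->last);
            d->last = sample;
            return changed;
        }
        return false;
    }

    int main() {
        WarningData w = LimitData{ 10.0 };
        std::printf("fired = %d\n", check(w, 12.5) ? 1 : 0);
        return 0;
    }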

Intaris
Proven Zealot

Well, I am someone who wanted to use LVOOP for a specific application, but it never made it through testing because the overhead of a single DD call is too expensive for my application (which is almost certainly an extreme case).


Hence my Kudos for this idea even though it is CURRENTLY not possible. Then again, that's what the Idea Exchange is for. I would love a flag to disallow dynamic loading of classes so that this kind of operation could be realised.

 

Shane.

TurboPhil
Active Participant

Coming back to this idea.

 

It seems even AQ has since admitted that dynamic dispatch overhead is a real phenomenon. So I'm wondering how much of that overhead is due to the shared clone lookup, and what performance benefits something like this idea might stand to offer....

Intaris
Proven Zealot

For the record (or for whoever doesn't want to follow the links or doesn't have access to all of the threads referred to):

 

Conventional wisdom tells us that the extra overhead of a DD call should be around the same as a case structure.  This is NOT the case.

 

In the threads referred to above I did a lot of testing of static vs. dynamic vs. standard VI calls (all optimised for benchmarking -- no error in or out, etc.) and found a DD call to cost around 3.5 times that of a standard VI or a static class call. I.e. if on my machine a standard VI call (with essentially only the VI call as overhead) costs 86 ns, a static class method costs 98 ns, whereas a DD call costs 304 ns (a whopping 3.1 times slower than a static LVOOP call, or 3.5 times a standard non-LVOOP call).

 

In order to test the "overhead = case structure" argument I did exactly that: I mimicked dynamic dispatch by creating my own dispatcher (a case structure) around several different static, non-inlined, non-LVOOP calls. Result: 92 ns, i.e. 6 ns of overhead versus the 206 ns of overhead that the actual DD LVOOP call carries over the static LVOOP call. That makes the overhead of a DD call roughly 34 (!!) times that of a simple case structure.
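
For anyone wanting to reproduce the spirit of that comparison outside LabVIEW, a minimal micro-benchmark of an indirect (virtual) call versus a hand-written case-structure dispatcher might look like the C++ sketch below. It is only a sketch of the methodology, not the actual benchmark VIs, and the absolute numbers will vary with machine and compiler:

    #include <chrono>
    #include <cstdio>

    struct Base {
        virtual ~Base() = default;
        virtual int work(int x) = 0;
    };
    struct ChildA final : Base { int work(int x) override { return x + 1; } };
    struct ChildB final : Base { int work(int x) override { return x + 2; } };

    enum class Kind { A, B };

    // Hand-rolled "case structure" dispatcher, standing in for the DIY dispatcher test.
    static int dispatch_switch(Kind k, int x) {
        switch (k) {
            case Kind::A: return x + 1;
            case Kind::B: return x + 2;
        }
        return x;
    }

    // Times f(i) over many iterations and returns nanoseconds per call.
    template <typename F>
    static double time_ns_per_call(F&& f, int iters) {
        volatile int sink = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < iters; ++i) sink = f(i);
        auto t1 = std::chrono::steady_clock::now();
        (void)sink;
        return std::chrono::duration<double, std::nano>(t1 - t0).count() / iters;
    }

    int main() {
        const int iters = 10000000;
        ChildA a;
        ChildB b;
        Base* objs[2] = { &a, &b };

        double virt = time_ns_per_call(
            [&](int i) { return objs[i & 1]->work(i); }, iters);
        double sw = time_ns_per_call(
            [&](int i) { return dispatch_switch((i & 1) ? Kind::B : Kind::A, i); }, iters);

        std::printf("virtual: %.2f ns/call, case structure: %.2f ns/call\n", virt, sw);
        return 0;
    }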

 

I have since (in order to keep my timing budget intact on RT systems) started implementing all kinds of "pseudo" dynamic dispatch with inlined static VIs from otherwise related classes. This way the entire inheritance tree is known at compile time and cannot be expanded, so it can all be inlined, giving some lovely performance.

 

So to summarize: yeah, it's real and it's killing me. My code could be so elegant if DD calls weren't so darn expensive. I don't care about the differences in non-inlined performance if we gain the option to inline (at the cost of dynamic loading of classes).

Darren
Proven Zealot
Status changed to: Declined

Any idea that has received less than 5 kudos within 5 years after posting will be automatically declined.