Big Performance Degradation in LabVIEW 2012

RogerIsaksson wrote:
So when do you head for academia? 😉

Won't be until after USA starts funding our universities again. Until last year, I was one of the industry advisors on the curriculum review board for my alma mater. That made me privy to a whole lot of data. Engineering faculty haven't seen a raise in 12 years because of ongoing state funding cuts, and my school wasn't unique -- most (not all but most) other public universities have felt similar squeezes. And not just engineering... it's across the board into the arts, literature, social sciences, physical sciences, and business.

 

I've got as close to an academic role as I can have in industry. But there's no place for me in academia in the current "we aren't willing to pay for the education and research needed to keep the USA ahead" environment. There are lots of people who see the university system as bloated and needing to have the fat trimmed. That was probably true 15 years ago. We're at the edge of muscle leaning into bone now.

Message 101 of 111

So George Carlin was right after all?

 

https://www.youtube.com/watch?v=4jQT7_rVxAE

 

Br,

 

/Roger

 

Message 102 of 111

A side-note.

 

I just started running our software on a new PXI Real-Time controller (8840) with LV 2015, and I see MUCH worse performance in code which utilises static inlined LVOOP calls on several nested objects. I assume this is the cause of the slowdown, because other code runs at least as well as it did on our old system.

 

Previously the VI required 3-4 µs to execute; now it takes anywhere between 70 µs and 200 µs. Ridiculous. Showing buffer allocations doesn't offer any possible avenues of investigation.

 

I need to try to find out what exactly is causing the extreme slowdown.

Message 103 of 111

Thread necro and a little unrelated:

 

I thought I'd give some feedback, since I actually found out what the problem with my terrible LV 2015 performance was and, more importantly, how I could fix it.

 

I have several aggregated objects (with only static methods) in my RT code. I would typically unbundle them, call a method (typically an accessor) and then re-bundle.

 

Turns out LV 2015 was not optimising this as an in-place operation. Replacing the unbundle-bundle with IPE structures saved the day. Apparently, if the code complexity is large enough, the in-placeness of an unbundle-bundle can get lost in the noise.
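For anyone reading along who wants a textual picture of the difference: the C below is only an analogy (LabVIEW is graphical, so there's no real G source to paste), and the Channel struct and its fields are invented for illustration. Unbundle-bundle is like working on a by-value copy and handing a whole new cluster back, where the compiler has to prove it can elide the copy; the IPE is like explicitly saying "modify this field where it already lives".

```c
typedef struct {
    double gain;
    double samples[4096];
} Channel;   /* stand-in for a cluster / object's private data, made up for illustration */

/* Analogy for unbundle -> modify -> bundle: operate on a by-value copy and
   return it. The compiler must prove the copy can be elided, or the whole
   struct gets hauled around. */
Channel set_gain_by_value(Channel ch, double gain)
{
    ch.gain = gain;
    return ch;
}

/* Analogy for the Inplace Element Structure: the caller explicitly asks for
   the field to be modified where it already lives, so no copy is possible. */
void set_gain_in_place(Channel *ch, double gain)
{
    ch->gain = gain;
}
```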

Message 104 of 111

I'm glad you got to the root of the problem.

 

The magic pattern for loose Unbundle-Bundle nodes is pretty limited in the conditions under which it can work. It does not take much to introduce code that makes LV go, "Nope, I'm just not able to prove that this is a safe share, so I'd better make a data copy." That brittleness is a big part of why we introduced the Inplace Element Structure... the scope that it defines gives us certainty about your intentions as a user and allows us to do a lot more in place than we could otherwise.

Message 105 of 111

I was initially annoyed because I had originally actually REMOVED all IPEs and replaced them with unbundle-bundle because someone said the compiler was now smart enough to optimise this, and that the IPE actually made things worse in some situations.

 

In LV 2012, it worked fine (as either IPE or bundle), but as soon as we moved to 2015, the unbundle-bundle suddenly became much slower (no longer optimised). I could see my benchmarks improve bit by bit for each and every unbundle-bundle I replaced with an IPE.

 

I have started a thread in the main LabVIEW forum asking about such dependencies between complexity and optimisations. As yet, there are no answers. If anyone could stroll by and give some general tips, that might be of help to others who end up in a similar situation to me. At least to me (and I've only been using LV for over 20 years now) this interdependence between optimisations and complexity was not completely clear. Benchmarks for individual components are typically less complex than production code and as such can offer very different results, especially when using lots of inlined VIs.

 

http://forums.ni.com/t5/LabVIEW/Compiler-optimisations-and-IPE/td-p/3302614

Message 106 of 111

> I was initially annoyed because I had originally actually REMOVED all IPEs and replaced them with unbundle-bundle because someone said the compiler was now smart enough to optimise this, and that the IPE actually made things worse in some situations.

 

In some situations, sure. In most, the more info you give the compiler, the better it does, and the IPE is a gigantic hint about your intentions/desires: "I am digging this bit of data out, modifying it, then putting it back." 🙂 Don't try to outsmart the compiler in the general case. If you're trying to tune a specific piece of code, you can in theory benchmark* it both ways and go with whichever wins.

 

The compiler is both dumber and smarter than all of us. It isn't intelligent, but these days its list of rules to apply far exceeds what any of us can really apply to all but the most trivial VIs... and those aren't the ones you're trying to optimize.

 

* Alas, in practice, you can't really benchmark anything on a desktop system. At some point, I want to post to the forums about the latest LV R&D discovery that further drives home just how impossible it is to benchmark anything meaningfully on a modern desktop system. Basically, there's so much caching and indirection going on that even on a stable system with no other applications running, and just running the same VI with identical inputs twice, you can get radically different numbers. Our latest discovery says that averaging N runs is actually not what you want to do because it turns out to be a set of quantized results. What you really want is to do N runs and then select the subset of runs that are all nearest the lowest quantization level and then average only those. And that's assuming you can control all of the other caching effects. This is a case of the operating system trying to be smarter than the compiler. 🙂
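For anyone who wants to try that filtering, here is a rough C sketch of the idea (this is not anything we ship; the 10% band is an arbitrary placeholder, since how wide "nearest the lowest level" should be depends on your timer and system):

```c
#include <stdio.h>

/* Average only the runs that land near the fastest quantization level.
   Timings would come from your own benchmark harness; the tolerance is an
   assumption made for this sketch. */
static double robust_benchmark_average(const double *t_us, int n)
{
    double min = t_us[0];
    for (int i = 1; i < n; i++)
        if (t_us[i] < min) min = t_us[i];

    double sum = 0.0;
    int kept = 0;
    for (int i = 0; i < n; i++) {
        if (t_us[i] <= min * 1.10) {   /* keep only runs near the lowest level */
            sum += t_us[i];
            kept++;
        }
    }
    return kept ? sum / kept : 0.0;
}

int main(void)
{
    /* made-up execution times in microseconds, to show the filtering */
    double runs[] = { 3.4, 3.5, 71.0, 3.6, 180.0, 3.4, 3.5, 95.0 };
    int n = (int)(sizeof runs / sizeof runs[0]);
    printf("filtered average: %.2f us\n", robust_benchmark_average(runs, n));
    return 0;
}
```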

Message 107 of 111

Let us not forget the various processor optimizations such as branch prediction and memory dependence prediction, etc.

Sometimes trying to write "smart" code ends up being counterproductive without taking the workings of a modern CPU into consideration.

Basically, a modern desktop processor dynamically reorders and speculates on the code so that it executes in a more optimal fashion.

 

http://www.agner.org/optimize/#manuals

 

Thus, devising a "smart" local code optimization and benchmarking it in isolation could end up being detrimental to overall performance.

As a simple example: using a memory look-up instead of a mathematical operation. It is all cool as long as it does not end up squeezing something else out of the cache and *boom*, a 10X reduction in overall application performance, since the processor now has to access (slow) RAM instead of executing a (fast) multiplication or division in a performance-critical region of the program. So much for coding "smart". 😖
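A contrived C sketch of that trade-off (the squaring example and the table size are made up purely for illustration): the lookup version looks great in an isolated micro-benchmark where the whole table stays hot in the cache, but in a full application those 512 KB compete with everything else.

```c
#include <stdint.h>

/* "Smart" version: precomputed squares of 16-bit values.
   65536 entries * 8 bytes = 512 KB of table fighting for cache space
   with the rest of the application. */
static uint64_t square_lut[65536];

void init_square_lut(void)
{
    for (uint32_t i = 0; i < 65536; i++)
        square_lut[i] = (uint64_t)i * i;
}

uint64_t square_lookup(uint16_t x)
{
    return square_lut[x];        /* fast only while the line is cached */
}

/* "Dumb" version: one integer multiply, a few cycles, no cache footprint. */
uint64_t square_compute(uint16_t x)
{
    return (uint64_t)x * x;
}
```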

 

Roger

 

 

 

Message 108 of 111

@AristosQueue (NI) wrote:

and just running the same VI with identical inputs twice, you can get radically different numbers.


I have a nagging suspicion that there perhaps might be a more fundamental problem somewhere closer to home? 😄

 

 

Message 109 of 111

There's also a lot of disk caching going on in the background under Windows. This can lead to HUGELY modified benchmark results. Windows basically buffers everything you read from or write to disk (up to many GB) if you have the RAM available.

 

Actually clearing this buffer between runs can greatly help find a correct "baseline".
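If the benchmark itself reads test data from disk, one alternative (just a sketch, and a different approach than clearing the cache by hand) is to bypass the Windows file cache entirely by opening the file with FILE_FLAG_NO_BUFFERING. The path and chunk size below are made up; unbuffered reads have to use a sector-aligned buffer and a multiple of the sector size:

```c
#include <windows.h>
#include <stdio.h>
#include <malloc.h>

int main(void)
{
    const char *path = "C:\\bench\\testdata.bin";  /* hypothetical test file */
    const DWORD chunk = 64 * 1024;                 /* multiple of the sector size */

    /* FILE_FLAG_NO_BUFFERING bypasses the Windows file cache for this handle */
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "open failed: %lu\n", GetLastError());
        return 1;
    }

    void *buf = _aligned_malloc(chunk, 4096);      /* sector-aligned buffer */
    DWORD got = 0;
    unsigned long long total = 0;
    while (ReadFile(h, buf, chunk, &got, NULL) && got > 0)
        total += got;                              /* timed work would go here */

    printf("read %llu bytes without touching the file cache\n", total);
    _aligned_free(buf);
    CloseHandle(h);
    return 0;
}
```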

Message 110 of 111