LVOOP vs. Reentrancy vs. Parallel Execution


I wonder what would happen if you unbundled outside the loop rather than inside it? Perhaps it's the object reading that is not reentrant.

Message 11 of 18

I get the feeling this may have a lot to do with thread swapping.

 

While it's theoretically always faster to process things on multiple cores, the actual implementation is more difficult. Getting software to run well across multiple cores, with all the thread swapping involved, is hard.

 

As such, I would certainly expect a performance increase when using multiple cores (as you have seen); however, I would not expect a four-fold increase (which matches what you're seeing). There is simply a significant amount of overhead in splitting tasks among different physical processors, synchronizing them when needed, and moving data between them as necessary.
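To put rough numbers on that overhead argument, here is a minimal Amdahl-style sketch in plain Python rather than LabVIEW G; the parallel fraction and overhead figures are invented purely for illustration:

```python
# Rough Amdahl-style estimate of why 4 cores rarely deliver a 4x speed-up.
# The fractions below are made-up illustration values, not measurements.

def estimated_speedup(parallel_fraction, cores, overhead_fraction):
    """Serial part runs once, the parallel part is split across the cores,
    and threading overhead (scheduling, sync, data movement) is added on top."""
    serial = 1.0 - parallel_fraction
    parallel = parallel_fraction / cores
    return 1.0 / (serial + parallel + overhead_fraction)

# 90% of the work parallelizable, 5% of the total time lost to overhead:
print(estimated_speedup(0.90, 4, 0.05))   # ~2.7x instead of the naive 4x
```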

Chris
Certified LabVIEW Architect
Certified TestStand Architect
Message 12 of 18

Thanks for all the input. I did some experimenting based on your suggestions:

 

1.) Replaced the square root operator with an increment operator: gain in absolute processing time; savings, non-reentrant → reentrant: 44%
2.) Moved the Wait 0 ms into the SubVIs: processing time increases as expected; savings, non-reentrant → reentrant: 33%
3.) SubVI assigned to a different execution system (standard): gain in absolute processing time; savings, non-reentrant → reentrant: 47%
4.) Used four object instances instead of implicit copies via wire branches, with the execution system set as in 3.): same result as 3.)
5.) Same setup as 4.), but with the In Place Element structure in the SubVI replaced by an ordinary Unbundle/Bundle: increase in execution time; savings, non-reentrant → reentrant: ~42%

 

I did not expect to reach savings of 75%, but I had hoped for something around 65%. By the way, there are no event structures in the code.

 

Any more suggestions? I'll be happy to try them out. :)

 

Cheers

Oli
Message 13 of 18
Solution
Accepted by topic author Oli_Wachno

Put MORE code in the benchmark. What you're essentially doing here (because the actual workload per iteration is almost zero) is measuring the overhead of multithreading, as others have pointed out.

 

Put an interpolation step or something slightly more taxing in the code and you'll likely see a larger gain when the work is spread over 4 cores. This is always an important benchmarking consideration: overhead vs. actual code execution.
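As a rough illustration of that overhead-vs-work trade-off (a sketch in plain Python rather than LabVIEW G; the "tiny" and "heavy" workloads are made up for this example):

```python
# Minimal sketch of "overhead vs. real work" in plain Python (not LabVIEW G).
# The workloads below are invented purely for illustration.
import math
import time
from multiprocessing import Pool

def tiny(_):
    # Almost no work per call, like the original benchmark loop body.
    return math.sqrt(2.0)

def heavy(n):
    # Enough work per call to hide the dispatch/synchronization overhead.
    return sum(math.sqrt(i) for i in range(n))

def bench(func, args):
    t0 = time.perf_counter()
    for a in args:
        func(a)
    t_serial = time.perf_counter() - t0

    with Pool(4) as pool:
        t0 = time.perf_counter()
        pool.map(func, args)
        t_parallel = time.perf_counter() - t0
    return t_serial, t_parallel

if __name__ == "__main__":
    # Tiny body: the parallel version is often no faster (or even slower),
    # because almost all of the measured time is overhead.
    print("tiny :", bench(tiny, list(range(100_000))))
    # Heavy body: the parallel version should come in well under the serial time.
    print("heavy:", bench(heavy, [2_000_000] * 16))
```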

 

Shane.

Message 14 of 18

Shane, you're right!

 

I played around with array-based functions to produce some processor load and found that the gain from using reentrant SubVIs depends strongly on the type of operation you execute. Using a large (initialized) array and Linear Fit.vi, I managed to speed up the reentrant version by ~69% compared to the non-reentrant one.

It really seems to come down to choosing "the right operation" to minimize the overhead.

 

I'm excited to see how it will turn out in the real-world application.

 

Cheers

Oli

Message 15 of 18

Please remember one thing...

 

Overhead stays constant no matter how complex the code in your class is, assuming all other factors remain the same. Increasing the execution time of your class operation will shift the overhead-to-actual-code ratio in your favour. Only then will you see good scalability over multiple cores: the larger the gap between code execution time and overhead, the better your results will be. As for the absolute value of the overhead, I'll let someone better informed on the subject answer that one.
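As back-of-the-envelope arithmetic (a tiny Python sketch; the 50 ms overhead figure is an assumed illustration value, not a measured LabVIEW number):

```python
# speed-up ~= T_work / (T_work / cores + T_overhead), with a fixed overhead term
def speedup(t_work_ms, cores=4, t_overhead_ms=50.0):
    return t_work_ms / (t_work_ms / cores + t_overhead_ms)

print(round(speedup(100), 2))     # ~1.33x -- the overhead dominates a tiny workload
print(round(speedup(10_000), 2))  # ~3.92x -- the same overhead is now negligible
```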

 

Glad to be of help.

 

Shane.

 

PS: Can you post the execution time numbers too? I would actually expect more than 69% for a 4-core difference, given the right environment. I take it you mean that the 4-core example takes only a bit less than a third of the time of the original, right? That would be kind of OK.
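Just to spell out the conversion between "savings" and a speed-up factor implied by that 69% figure (nothing new, only the arithmetic):

```python
savings = 0.69                    # reentrant version saves 69% of the run time
remaining = 1.0 - savings         # so it needs 31% of the original time
print(round(1.0 / remaining, 2))  # ~3.23x, i.e. a bit less than a third of the original time
```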

Message 16 of 18

So here is what I changed the reentrant SubVI to:

[Attached image: Execute.jpg]

For the test I used an array size of 9,000,000.

 

Reentrant took 7797 ms (mean value), non-reentrant 27967 ms. That's a ratio I can live with rather well!

 

Message 17 of 18

Well, that looks kind of good.

 

7797 ms on 4 cores versus 27967 ms / 4 ≈ 6992 ms for perfect scaling. Nice. That's roughly a 3.6x speed-up on 4 cores.
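Checking that against the posted numbers (only the arithmetic, no new measurements):

```python
non_reentrant_ms = 27967
reentrant_ms = 7797
print(round(non_reentrant_ms / reentrant_ms, 2))  # ~3.59x actual speed-up
print(non_reentrant_ms / 4)                       # 6991.75 ms would be perfect 4x scaling
```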

 

Way to go LabVIEW. (And Oli of course).

 

Shane.

Message 18 of 18