LabVIEW


Heavy bug in "exponential fit"??

I got a lot better performance by setting the number of parallel loop instances to 2, even though I have 6 cores. Is the overhead that big, or is something else going on?

 

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems
Certified-LabVIEW-Developer
Message 21 of 34

@Yamaeda wrote:

I got a lot better performance by setting the number of parallel loop instances to 2, even though I have 6 cores. Is the overhead that big, or is something else going on?


How much is "a lot"? Can you give some actual numbers as a function of input size? Also try to force a recompile (ctrl+run) before testing.
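To make "actual numbers as a function of input size" concrete, here is a minimal timing-harness sketch in Python (LabVIEW diagrams are graphical, so this is only an analogue). It reports the median time in milliseconds per input size and makes one untimed warm-up call first, which loosely plays the role of the forced recompile. The function name `benchmark_ms` and the stand-in workload are hypothetical.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def benchmark_ms(fn: Callable[[int], object],
                 sizes: Iterable[int],
                 repeats: int = 5) -> Dict[int, float]:
    """Median wall-clock time of fn(n), in milliseconds, for each input size n."""
    results: Dict[int, float] = {}
    for n in sizes:
        fn(n)  # untimed warm-up call, so one-time costs are excluded
        samples = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(n)
            samples.append((time.perf_counter() - t0) * 1000.0)
        results[n] = statistics.median(samples)  # median is robust to outliers
    return results


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace it with the code under test.
    work = lambda n: sum(i * i for i in range(n))
    for n, ms in benchmark_ms(work, [1_000, 10_000, 100_000]).items():
        print(f"n={n}: {ms:.3f} ms")
```

Running the same harness once per parallel setting gives a table of milliseconds versus input size instead of a single relative factor.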

 

There is definitely a parallelism overhead. This is such fast code that the overhead could be noticeable for small inputs. (Personally, I think it should not be parallelized because, as Jim said, this code is such a small fraction of the overall fitting procedure that it does not really matter. Sometimes other stuff needs to run in parallel too, so it is not such a good idea to concentrate all available resources in one place. Herbert's idea of changing the loop ordering only makes a big difference if we have a small number of parameters, which is typically the case. I am all for that change.)
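A toy cost model shows how fewer instances can win on fast code. This Python sketch assumes, purely for illustration, that the work splits evenly across instances and that each instance adds a fixed overhead; neither the model nor the numbers come from LabVIEW's actual parallel-loop implementation, and the function names are hypothetical.

```python
def modeled_time_ms(work_ms: float, overhead_ms: float, workers: int) -> float:
    """Toy model: work splits evenly; each worker adds a fixed overhead."""
    return work_ms / workers + overhead_ms * workers


def best_worker_count(work_ms: float, overhead_ms: float, max_workers: int) -> int:
    """Worker count in 1..max_workers that minimizes the modeled run time."""
    return min(range(1, max_workers + 1),
               key=lambda p: modeled_time_ms(work_ms, overhead_ms, p))


if __name__ == "__main__":
    # Hypothetical numbers: 1 ms overhead per worker, 6 cores available.
    for work in (1.0, 10.0, 100.0):
        print(work, best_worker_count(work, 1.0, 6))
```

Under these assumptions, `best_worker_count(10, 1, 6)` returns 3 rather than 6, while 100 ms of work is enough to make all 6 instances worthwhile; the smaller the work per call, the fewer instances pay off.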

 

All that said, going from 4 to 2 parallel instances on my 4-core Intel, I go from 11 ms to 19 ms (length=1000, npar=100).
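The 4-core numbers above can be turned into a scaling efficiency. This Python sketch (the function name `scaling_efficiency` is hypothetical) divides the measured speedup from doubling the instance count by the ideal 2x:

```python
from typing import Tuple


def scaling_efficiency(t_few_ms: float, p_few: int,
                       t_many_ms: float, p_many: int) -> Tuple[float, float]:
    """Return (measured speedup, fraction of the ideal linear speedup achieved)."""
    speedup = t_few_ms / t_many_ms  # how much faster the run with more instances is
    ideal = p_many / p_few          # perfect linear scaling
    return speedup, speedup / ideal


# 19 ms with 2 instances vs. 11 ms with 4 instances (length=1000, npar=100).
speedup, eff = scaling_efficiency(19.0, 2, 11.0, 4)
print(f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

This prints `speedup 1.73x, efficiency 86%`, so on this machine the step from 2 to 4 instances recovers most, but not all, of the ideal gain.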

 

It could also have to do with the CPU architecture (e.g. AMD vs. Intel), cache sizes, etc. What processor do you have? I'll do some tests tomorrow on my 16-core rig... 😉 

 

I'll rewrite the benchmark to test for parallel performance...

Message 22 of 34

Here is my result going from 1 to 4 parallel instances (length=1000, npar=100).

 

 

Message 23 of 34

AMD Phenom x6 1090 @ 4GHz

 

Running with length=10k, npar=100:

 

Herbert is 3.87 times faster than original.

 

Rewrite factor vs. parallel setting:

Disabled: 11.1 (!)
2: 5.4
3: 3.5 (!)
4: 2.6 (!)
5: 2 (!)
6: 1.8 (!)

 

It seems the array functions are inherently parallel.

 

/Y

Message 24 of 34

@Yamaeda wrote:

Rewrite factor vs. parallel setting:

Disabled: 11.1 (!)
2: 5.4
3: 3.5 (!)
4: 2.6 (!)
5: 2 (!)
6: 1.8 (!)


Sorry, I don't understand. What are the units? What do the exclamation marks mean?

 

You seem to have a clear inverse relation between the number of cores and the resulting number.


@Yamaeda wrote:

It seems the array functions are inherently parallel.


In what sense? I don't think they should be. For big matrix operations, we can use the MASM toolkit, which e.g. parallelizes matrix operations.

 

Message 25 of 34

The numbers are the speed factors your program shows: the Herbert version is 3.9 times faster than the original code, and your cleanup is 1.8 to 11.1 times faster than the original, depending on the parallel setting. The ! marks something interesting: with the setting disabled, it's actually about 3 times faster than Herbert's, and with 6 it's only about 50% as fast...

 

Did that clear up the confusion?

 

/Y

Message 26 of 34

The factors in my program are "x times slower than the fastest". Can you show actual times in milliseconds instead?

 

So your 1.8 is actually 6.2x faster than the 11.1, in agreement with a 6 core chip.
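Because every factor is measured against the same "fastest" reference, that reference cancels when two factors are divided. This short Python sketch (the function name `speedup_vs_disabled` is hypothetical) applies that to the rewrite factors from the AMD run:

```python
from typing import Dict


def speedup_vs_disabled(factors: Dict[int, float]) -> Dict[int, float]:
    """Convert "x times slower than the fastest" factors into speedups over
    the disabled (1 instance) setting; the shared "fastest" reference cancels."""
    baseline = factors[1]
    return {p: baseline / f for p, f in factors.items()}


# Rewrite factors reported above (key = parallel instances, 1 = disabled).
reported = {1: 11.1, 2: 5.4, 3: 3.5, 4: 2.6, 5: 2.0, 6: 1.8}
for p, s in speedup_vs_disabled(reported).items():
    print(f"{p} instance(s): {s:.1f}x faster than disabled")
```

For 6 instances this gives 11.1 / 1.8 ≈ 6.2, the figure stated above.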

Message 27 of 34

@altenbach wrote:

The factors in my program are "x times slower than the fastest". Can you show actual times in milliseconds instead?

 

So your 1.8 is actually 6.2x faster than the 11.1, in agreement with a 6 core chip.


Right, but that doesn't explain why your cleaned-up code is 11 times slower than the original without parallelization; that makes no sense to me.

1 parallel instance (disabled): [screenshot: parallell1.PNG]

2 parallel instances: [screenshot: parallell2.PNG]

4 parallel instances: [screenshot: parallell4.PNG]

6 parallel instances: [screenshot: parallell6.PNG]

 

/Y

Message 28 of 34

I see. Yes, that's odd. You would think it should be very similar to Herbert's code with parallelization set to 1, and it is on my Intel processors.

 

I've had bad experiences with recent AMD processors, so obviously they react differently to different kinds of code. Do you have any kind of power management enabled?

Message 29 of 34

Yes, C1E and SpeedStep are active in the BIOS. Could it be that the work is being tossed between cores or something? I could try changing the processor affinity tonight and see if that changes things.

 

/Y

Message 30 of 34