LabVIEW


multicore toolkit


@rolfk wrote:

Your code example may actually have a possible flaw.

Yes, I mentioned that. I typically do an array sum or array max (after the timing sequence!) followed by autoindexing of the resulting scalar. It does not seem to make a difference in my code above, so I left it out.
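For readers who think in text code, the idea of reducing the result after the timing sequence can be sketched roughly as follows. This is only an illustration in Python, not the LabVIEW diagram from the thread: `matmul` and `timed_matmul` are hypothetical helper names, and CPython would not drop an unused result anyway, so the sketch shows the structure of the benchmark rather than a compiler effect.

```python
import time

def matmul(a, b):
    """Plain product of two matrices given as lists of rows."""
    cols_b = list(zip(*b))  # columns of b, so each entry is one dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def timed_matmul(a, b):
    """Return (elapsed_seconds, checksum) for one multiplication."""
    start = time.perf_counter()
    c = matmul(a, b)
    elapsed = time.perf_counter() - start
    # Consume the result AFTER the timed region: the product is really used,
    # but the cost of the reduction is not part of the measurement.
    checksum = sum(sum(row) for row in c)
    return elapsed, checksum

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
elapsed, checksum = timed_matmul(a, b)
print(checksum)
```

The checksum plays the role of the array sum wired to an output: it is cheap, it depends on every element, and it sits outside the timed section.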

Message 11 of 24

 


Hi altenbach

I have LabVIEW 2016 :( Could you please convert the attached AxB-Parallel002.vi to 2016?

 

Message 12 of 24

that is great,

Indeed, I optimized my VI to be somewhat like yours ;) but I wired the iteration terminal to a case structure. Of course, your VI is the best.

For the timing results, could I save the result of each execution (for each number of points)?

Message 13 of 24

@ssara wrote:

 

I have LabVIEW 2016 :( Could you please convert the attached AxB-Parallel002.vi to 2016?


Here's a 2015 version that you should be able to open. Make sure you have the MASM toolkit installed. (I also added a dummy output to force the compiler to do all the calculations.)

 

Bench2.png

Message 14 of 24

try downloading again.

Message 15 of 24

@altenbach wrote:

try downloading again.


Thank you, altenbach, for your prompt feedback. It has been extremely useful.

But my original question is: can I improve the speedup further?

Can I use a parallel FOR loop with MASM?

I used it with the plain multiplication, and the execution time is less than with Multicore!

Is there any explanation of the multiplication mechanism?

Message 16 of 24

@ssara wrote:

But my original question is: can I improve the speedup further?

Can I use a parallel FOR loop with MASM?

 


Why are you attaching the same VI again?

 

It seems that even with only one thread, the MASM toolkit uses a slightly more efficient algorithm for the multiplication.

 

If you need to do several different multiplications of this kind in parallel, you might benefit from a parallel FOR loop, but then you should set the thread count to one for each. Combining outer and inner parallelization does not make sense: if your loop uses, for example, four parallel instances and each instance wants to use four cores, you will run into a lot of contention if you only have four cores.
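The arithmetic behind that contention can be written out as a small sketch. The function names are hypothetical and the model is deliberately simplified (it ignores hyperthreading and whatever the scheduler does); it only counts how many threads could want a core at the same time.

```python
def total_threads(outer_instances, inner_threads):
    """Threads that may be runnable at once when both levels are parallel."""
    return outer_instances * inner_threads

def oversubscription(outer_instances, inner_threads, cores):
    """Runnable threads per core; a value above 1.0 means threads compete."""
    return total_threads(outer_instances, inner_threads) / cores

cores = 4
nested = oversubscription(4, 4, cores)      # 16 threads on 4 cores
outer_only = oversubscription(4, 1, cores)  # 4 threads on 4 cores
print(nested, outer_only)
```

With four loop instances and four threads each, the ratio is 4.0; with one thread per instance it is 1.0, which is why parallelizing at only one level is the recommended setup here.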

 

Just repeating the same multiplications N times in parallel seems pointless. Once is enough! The compiler might actually decide to calculate it only once because the result does not change anyway, so maybe you are seeing a false speedup.

 

What is your final goal? What are your speed requirements?

Message 17 of 24

altenbach wrote:

 

Why are you attaching the same VI again?

I added a parallel loop to the plain version; the time increased by about 40 ms instead of decreasing.


altenbach wrote:

 

Just repeating the same multiplications N times in parallel seems pointless. Once is enough! The compiler might actually decide to calculate it only once because the result does not change anyway, so maybe you are seeing a false speedup.

 


OK, once is enough.
 

Message 18 of 24

@ssara wrote:
I added a parallel loop to the plain version; the time increased by about 40 ms instead of decreasing.

Then please give the VI a new name to avoid confusion!

 

Adding a FOR loop and autoindexing on the lower input is a completely different calculation than multiplying two 2D matrices. With the FOR loop you are doing N matrix-vector multiplications, and since the sizes don't match, you are even getting error -20039. You are comparing the speed of the correct operation to the speed of a failed AND incorrect operation. Completely pointless!
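A rough text-code analogue of the difference, under stated assumptions: the shapes below are hypothetical (the actual matrix sizes are not given in the thread), the loop is assumed to hand over one row of the lower input per iteration, and a Python `ValueError` merely stands in for the LabVIEW error.

```python
def matmul(a, b):
    """Matrix x matrix: columns of a must equal rows of b."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("size mismatch in matrix x matrix")
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def matvec(a, v):
    """Matrix x vector: columns of a must equal the length of v."""
    if any(len(row) != len(v) for row in a):
        raise ValueError("size mismatch in matrix x vector")
    return [sum(x * y for x, y in zip(row, v)) for row in a]

a = [[1, 2, 3], [4, 5, 6]]    # 2 x 3 (hypothetical sizes)
b = [[1, 0], [0, 1], [1, 1]]  # 3 x 2

full = matmul(a, b)  # one well-defined 2 x 2 product

# Looping over the rows of b passes a 2-element vector on each iteration,
# which does not fit the 3 columns of a, so every iteration fails.
try:
    per_row = [matvec(a, row) for row in b]
    loop_failed = False
except ValueError:
    loop_failed = True
print(full, loop_failed)
```

Timing the loop version here would measure how fast the size check rejects the inputs, not how fast a multiplication runs.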

 

 

Message 19 of 24

altenbach wrote:

 You are comparing the speed of the correct operations to the speed of a failed AND incorrect operation

Oh, I'm sorry. I noticed that and corrected my VI in "add par for.vi" in the previous post. In this VI, matrix A is divided into 4 vectors, i.e. the 4 rows of the result are produced in parallel (is that right?), but the time required increased to about 120 ms (without the parallel FOR loop it was about 78 ms). I expected a shorter execution time.

Is it overhead, or something else?
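Independent of the timing question, one way to check that splitting A by rows gives the same result as the full product is sketched below. It is not a translation of the VI: `matmul_by_rows` is a hypothetical name, the 4 x 2 and 2 x 3 sizes are made up, and Python threads are used only to mirror the structure. The sketch checks correctness, not speed; each row becomes its own small task, and every task has to be dispatched and collected.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(a, b):
    """Plain product of two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def matmul_by_rows(a, b, workers=4):
    """Compute a x b by handing each row of a to a worker as a 1-row matrix."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so result rows line up with rows of a.
        blocks = list(pool.map(lambda row: matmul([row], b), a))
    return [block[0] for block in blocks]

a = [[1, 2], [3, 4], [5, 6], [7, 8]]  # 4 x 2 (hypothetical sizes)
b = [[1, 0, 2], [0, 1, 3]]            # 2 x 3
same = matmul_by_rows(a, b) == matmul(a, b)
print(same)
```

If the per-row version matches the full product, the remaining question is purely one of cost per task versus work per task, which can then be measured separately.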

   


 

Message 20 of 24