multicore toolkit


@rolfk wrote:

Your code example may actually have a possible flaw.

Yes, I mentioned that. I typically do an array sum or array max (after the timing sequence!) followed by autoindexing of the resulting scalar. It does not seem to make a difference in my code above, so I left it out.

Message 11 of 24

 


Hi altenbach

I have LabVIEW 2016 :( Could you please convert the attached AxB-Parallel002.vi to 2016?

 

Message 12 of 24

That is great.

Indeed, I optimized my VI to something like yours ;) but I wired the iteration terminal to a case structure. Of course, your VI is the best.

In the timing results, could I save the result for each execution (each number of points)?

Message 13 of 24

@ssara wrote:

 

I have LabVIEW 2016 :( Could you please convert the attached AxB-Parallel002.vi to 2016?


Here's a 2015 version that you should be able to open. Make sure you have the MASM toolkit installed. (I also added a dummy output to force the compiler to actually do all the calculations.)

 

Bench2.png

Message 14 of 24

try downloading again.

Message 15 of 24

@altenbach wrote:

try downloading again.


Thank you, altenbach, for your prompt feedback. It has been extremely useful.

But the original question is: can I improve the speedup further?

Can I use a parallel FOR loop with MASM?

I used one with the plain multiplication, and the execution time is less than with Multicore!

Is there any explanation of the multiplication mechanism?

Message 16 of 24

@ssara wrote:

But the original question is: can I improve the speedup further?

Can I use a parallel FOR loop with MASM?

 


Why are you attaching the same VI again?

 

It seems that even with only one thread, the MASM toolkit uses a slightly more efficient algorithm for the multiplication.

 

If you need to do several different multiplications of this kind in parallel, you might benefit from using a parallel FOR loop, but then you should set the number of threads to one for each. Combining outer and inner parallelization does not make any sense. If your loop uses, for example, four parallel instances and each instance wants to use four cores, you would run into a lot of contention if you only have four cores.
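
To put rough numbers on the contention argument, here is a minimal sketch in Python (not LabVIEW). The function name `oversubscription` and its parameters are hypothetical, and the figure of four cores is simply the example used above, not a measurement.

```python
def oversubscription(outer_instances, threads_per_instance, cores):
    """Return requested worker threads divided by available cores.

    A value above 1.0 means there are more runnable threads than cores,
    so the threads have to compete for CPU time.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return (outer_instances * threads_per_instance) / cores


# Hypothetical machine with four cores, as in the example above.
nested = oversubscription(outer_instances=4, threads_per_instance=4, cores=4)
outer_only = oversubscription(outer_instances=4, threads_per_instance=1, cores=4)
print(nested, outer_only)
```

With four outer instances that each ask for four threads, 16 threads compete for four cores. With one thread per instance, the demand matches the hardware.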

 

Just repeating the same multiplications N times in parallel seems pointless. Once is enough! The compiler might actually decide to calculate it only once because the result does not change anyway, so maybe you are seeing a false speedup.
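
One way to guard against that kind of false speedup is to give every timed run fresh input and to consume the result after the timed region. The following is only a sketch of that benchmark structure in Python, not a translation of the VI: `matmul`, `random_matrix` and `benchmark` are hypothetical helper names, and CPython itself does not hoist repeated work the way an optimizing compiler might.

```python
import random
import time


def matmul(a, b):
    """Plain product of two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def random_matrix(n, rng):
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def benchmark(n, repeats, seed=0):
    """Time `repeats` products of fresh n x n matrices.

    Returns the list of per-run timings (seconds) and a checksum that
    depends on every result, so no run can be skipped as unused.
    """
    rng = random.Random(seed)
    timings = []
    checksum = 0.0
    for _ in range(repeats):
        # New data for every run, generated outside the timed region.
        a = random_matrix(n, rng)
        b = random_matrix(n, rng)
        start = time.perf_counter()
        c = matmul(a, b)
        timings.append(time.perf_counter() - start)
        # Consume the result after the timing, like the dummy output.
        checksum += sum(map(sum, c))
    return timings, checksum


timings, checksum = benchmark(n=16, repeats=5)
print(len(timings), checksum > 0)
```

Keeping one timing per run also makes it easy to store the result of each execution rather than only an average.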

 

What is your final goal? What are your speed requirements?

Message 17 of 24

@altenbach wrote:

 

Why are you attaching the same VI again?

I added a parallel loop to the plain version; the execution time increased by about 40 ms instead of decreasing.


@altenbach wrote:

 

Just repeating the same multiplications N times in parallel seems pointless. Once is enough! The compiler might actually decide to calculate it only once because the result does not change anyway, so maybe you are seeing a false speedup.

 


OK, once is enough.
 

Message 18 of 24

@ssara wrote:
I added a parallel loop to the plain version; the execution time increased by about 40 ms instead of decreasing.

Then please give the VI a new name to avoid confusion!

 

Adding a FOR loop and autoindexing on the lower input is a completely different calculation from multiplying two 2D matrices. With the FOR loop you are doing N matrix-vector multiplications, and since the sizes don't match, you are even getting error -20039. You are comparing the speed of the correct operation to the speed of a failed AND incorrect operation. Completely pointless!
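
The difference can be sketched in Python (not LabVIEW). This assumes that autoindexing hands the loop one row of the second matrix per iteration; the exact wiring in the posted VI is not shown here. `matvec` is a hypothetical helper, and its `ValueError` merely stands in for LabVIEW's error -20039.

```python
def matvec(a, v):
    """Multiply matrix a (list of rows) by vector v, checking sizes."""
    if any(len(row) != len(v) for row in a):
        raise ValueError("size mismatch between matrix columns and vector")
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def matmul(a, b):
    """Reference product of two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


a = [[1, 2, 3], [4, 5, 6]]      # 2 x 3
b = [[1, 0], [0, 1], [2, 2]]    # 3 x 2

# Looping over the ROWS of b: each vector has 2 elements, but a has
# 3 columns, so every iteration fails with a size error.
try:
    wrong = [matvec(a, row) for row in b]
    row_loop_failed = False
except ValueError:
    row_loop_failed = True

# Looping over the COLUMNS of b works, and each result is one column
# of the product, so the stacked results must be transposed.
cols = [matvec(a, list(col)) for col in zip(*b)]
c = [list(r) for r in zip(*cols)]
print(row_loop_failed, c == matmul(a, b))
```

Even when the sizes do match, a loop of matrix-vector products only reproduces the matrix product if the vectors are the columns of the second matrix and the results are reassembled as columns.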

 

 

Message 19 of 24

@altenbach wrote:

 You are comparing the speed of the correct operations to the speed of a failed AND incorrect operation

Oh, I am sorry. I noticed that and corrected my VI in add par for.vi in the previous post. In this VI, matrix A is divided into 4 vectors, i.e. the 4 rows of the result are produced in parallel (is that right?). However, the time required increased to about 120 ms (without the parallel FOR loop it was about 78 ms). I expected a shorter execution time.

Is it overhead, or something else?
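
For reference, the row-wise split described above can be sketched in Python (not LabVIEW). Each row of the result depends on only one row of A, so the rows can be computed independently. `rows_times_matrix`, `matmul_serial` and `matmul_row_blocks` are hypothetical names, and the thread pool only illustrates the structure: in CPython, threads do not speed up pure-Python arithmetic, and every task adds scheduling overhead on top of its share of the work.

```python
from concurrent.futures import ThreadPoolExecutor


def rows_times_matrix(rows, b_cols):
    """Multiply a block of rows of A by B (given as a list of columns)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in rows]


def matmul_serial(a, b):
    return rows_times_matrix(a, list(zip(*b)))


def matmul_row_blocks(a, b, workers=4):
    """Split A into at most `workers` row blocks and multiply each one."""
    b_cols = list(zip(*b))
    block = max(1, -(-len(a) // workers))  # ceiling division
    blocks = [a[i:i + block] for i in range(0, len(a), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per block; results come back in block order.
        parts = list(pool.map(
            lambda rows: rows_times_matrix(rows, b_cols), blocks))
    return [row for part in parts for row in part]


a = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 x 2: one row per worker
b = [[1, 0, 2], [0, 1, 3]]             # 2 x 3
result = matmul_row_blocks(a, b)
print(result == matmul_serial(a, b))
```

The split gives the same product as the serial version. Whether it is faster depends on how the per-task overhead compares with the work in each block, which is small when each task handles a single row.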

   


 

Message 20 of 24