12-10-2017 07:25 PM - edited 12-10-2017 07:26 PM
@rolfk wrote:
Your code example may actually have a possible flaw.
Yes, I mentioned that. I typically do an array sum or array max (after the timing sequence!) followed by autoindexing of the resulting scalar. It does not seem to make a difference in my code above, so I left it out.
12-11-2017 04:47 AM
Hi altenbach,
I have LabVIEW 2016. Could you please convert the attached AxB-Parallel002.vi to 2016?
12-11-2017 06:01 AM
That is great.
I had indeed optimized my VI to something like yours, but I wired the iteration to a case structure. Of course, your VI is the best.
In the timing results, could I save the result for each execution (for each number of points)?
12-11-2017 02:08 PM - edited 12-11-2017 02:42 PM
@ssara wrote:
I have LabVIEW 2016. Could you please convert the attached AxB-Parallel002.vi to 2016?
Here's a 2015 version that you should be able to open. Make sure you have the MASM toolkit installed. (I also added a dummy output to force the compiler to do all calculations.)
12-11-2017 02:45 PM
Try downloading again.
12-11-2017 03:21 PM - edited 12-11-2017 03:22 PM
@altenbach wrote:
Try downloading again.
Thank you, altenbach, for your prompt feedback. It has been extremely useful.
But the original question is: can I improve the speedup further?
Can I use a parallel FOR loop with MASM?
I used it with the plain version, and its execution time is less than the Multicore one!
Is there any explanation for the multiplication mechanism?
12-11-2017 03:42 PM
@ssara wrote:
But the original question is: can I improve the speedup further?
Can I use a parallel FOR loop with MASM?
Why are you attaching the same VI again?
It seems that, even when restricted to a single thread, the MASM toolkit uses a slightly more efficient algorithm for the multiplication.
If you need to do several different multiplications of this kind in parallel, you might benefit from using a parallel FOR loop, but then you should set the threads to one for each. Combining outer and inner parallelization does not make sense: if your loop uses, for example, four parallel instances and each instance wants to use four cores, you will run into a lot of contention if you only have four cores.
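The contention argument above can be put into numbers with a minimal sketch. The function names and the "ratio above 1.0" rule of thumb are illustrative assumptions, not anything taken from LabVIEW or the MASM toolkit.

```python
def total_threads(outer_instances: int, inner_threads: int) -> int:
    """Threads requested when every outer loop instance runs its own inner threads."""
    return outer_instances * inner_threads


def oversubscription(outer_instances: int, inner_threads: int, cores: int) -> float:
    """Ratio of requested threads to available cores (hypothetical helper).

    A value above 1.0 means more threads than cores, so threads compete for CPU time.
    """
    return total_threads(outer_instances, inner_threads) / cores


# Four parallel loop instances, each wanting four threads, on a four-core machine.
nested = oversubscription(outer_instances=4, inner_threads=4, cores=4)

# Same outer loop, but each multiplication restricted to one thread.
flat = oversubscription(outer_instances=4, inner_threads=1, cores=4)

print(nested, flat)
```

With the inner thread count set to one, the outer loop alone already keeps all four cores busy.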
Just repeating the same multiplications N times in parallel seems pointless. Once is enough! The compiler might actually decide to calculate it only once because the result does not change anyway, so maybe you are seeing a false speedup.
What is your final goal? What are your speed requirements?
12-11-2017 04:24 PM
@altenbach wrote:
Why are you attaching the same VI again?
I added a parallel loop to the plain version; the execution time increased by about 40 ms instead of decreasing.
@altenbach wrote:
Just repeating the same multiplications N times in parallel seems pointless. Once is enough! The compiler might actually decide to calculate it only once because the result does not change anyway, so maybe you are seeing a false speedup.
OK, once is enough.
12-11-2017 04:47 PM - edited 12-11-2017 07:46 PM
@ssara wrote:
I added a parallel loop to the plain version; the execution time increased by about 40 ms instead of decreasing.
Then please give the VI a new name to avoid confusion!
Adding a FOR loop and autoindexing on the lower input is a completely different calculation than multiplying two 2D matrices. With the FOR loop you are doing N matrix-vector multiplications, and since the sizes don't match correctly, you are even getting error -20039. You are comparing the speed of the correct operations to the speed of a failed AND incorrect operation. Completely pointless!
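To illustrate the difference in text form, here is a minimal pure-Python sketch. It assumes autoindexing hands the loop one row of the lower input per iteration, and it uses a `ValueError` as a stand-in for the LabVIEW error; the matrices and function names are made up for the example.

```python
def matmul(a, b):
    """Multiply two 2D matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, v):
    """Multiply a 2D matrix by a 1D vector."""
    if len(a[0]) != len(v):
        raise ValueError("matrix columns and vector length do not match")
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


a = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
b = [[1, 0],
     [0, 1],
     [1, 1]]             # 3 x 2

# One matrix-matrix multiplication: the intended operation.
product = matmul(a, b)

# Looping over the rows of b instead gives vectors of length 2,
# which cannot be multiplied by a matrix with 3 columns.
failed = 0
for row in b:
    try:
        matvec(a, row)
    except ValueError:
        failed += 1

print(product, failed)
```

Every iteration of the loop fails on the size check, so timing it measures error handling rather than a multiplication.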
12-12-2017 02:43 AM - edited 12-12-2017 02:44 AM
@altenbach wrote:
You are comparing the speed of the correct operations to the speed of a failed AND incorrect operation
Oh, I am sorry. I noticed that and corrected my VI (add par for.vi) in the previous post. In this VI, matrix A is divided into 4 vectors, i.e. the 4 rows of the result are produced in parallel (is that right?). But the time required increased to about 120 ms (without the parallel FOR loop it was about 78 ms); I expected a shorter execution time.
Is it overhead, or something else?
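The row-wise split described above can be sketched in Python as follows. This is only an illustration of the idea, not the VI itself: the matrices, the function names, and the use of `ThreadPoolExecutor` are assumptions, and Python threads will not show a real speedup for this kind of arithmetic. The sketch only shows that each row of the result can be computed as an independent task, and that every task adds some scheduling work on top of the arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def row_times_matrix(row, b):
    """Compute one row of the product: a 1D row vector times a 2D matrix."""
    return [sum(row[k] * b[k][j] for k in range(len(b)))
            for j in range(len(b[0]))]


def matmul_serial(a, b):
    """Compute all rows of the product one after another."""
    return [row_times_matrix(row, b) for row in a]


def matmul_rows_parallel(a, b, workers=4):
    """Submit each row of the product as a separate task to a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: row_times_matrix(row, b), a))


a = [[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]]             # 4 x 2, split into 4 row vectors
b = [[1, 0, 2],
     [0, 1, 3]]          # 2 x 3

serial = matmul_serial(a, b)
parallel = matmul_rows_parallel(a, b)

print(serial == parallel)
```

Both versions produce the same matrix, so the split is a valid way to compute the product. Whether it is faster depends on whether the work per row outweighs the cost of creating and scheduling the tasks.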