LabVIEW


How to choose the best chunk size for parallel iteration of For Loops?

Yes, I have read the For Loop help.

 

"LabVIEW partitions loop iterations into chunks consisting of loop iterations. With parallel iterations enabled, processors execute chunks simultaneously to improve execution speed. By default, LabVIEW schedules chunks by size from larger to smaller. Executing larger chunks first decreases scheduling overhead, while executing smaller chunks last decreases processor idleness. You should programmatically configure chunk size only if the For Loop would benefit from an iteration schedule different from the default, such as a schedule that executes smaller chunks before larger chunks."

 

Yes, I have searched for the keyword on the NI website.

 

http://www.ni.com/tutorial/9393/en/#toc1

"If you choose the Specify partitioning with chunk size (C) terminal schedule, you must wire a chunk size to the (C) terminal. Consider the total number of iterations when selecting the chunk size. If the chunk size is too large, it will limit the amount of parallel work available. If the chunk size is too small, it will increase the amount of overhead incurred by requesting the chunks.

For finer control over the chunk sizes, you can wire an array of chunk sizes to the (C) terminal. For example, if you know that the first iterations of the loop take longer than the last iterations, you may want to create an array with small chunk sizes at the beginning to prevent the first chunks from containing too many long iterations and with large chunk sizes at the end to bundle the short iterations together. If you wire too many chunk sizes, LabVIEW ignores the extra values. If you wire too few chunk sizes, LabVIEW uses the last element in the array to determine the size of the remaining chunks of iterations."
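The partitioning rules in that last paragraph (extra chunk sizes ignored, last size reused for the remainder) can be sketched as plain code. LabVIEW itself is graphical, so this Python fragment is only an illustration of the quoted rules, not LabVIEW code:

```python
def partition(total_iterations, chunk_sizes):
    """Split a loop's iteration count into chunks per the quoted rules:
    extra chunk sizes are ignored, and if the array runs out, the last
    element is reused for the remaining iterations."""
    chunks = []
    remaining = total_iterations
    i = 0
    while remaining > 0:
        # Take the next requested size, or reuse the last one if exhausted.
        size = chunk_sizes[min(i, len(chunk_sizes) - 1)]
        size = min(size, remaining)   # the final chunk may come up short
        chunks.append(size)
        remaining -= size
        i += 1
    return chunks

print(partition(10000, [6000, 3000, 1000]))  # [6000, 3000, 1000]
print(partition(10000, [4000]))              # [4000, 4000, 2000] - last size reused
print(partition(5000, [6000, 3000]))         # [5000] - extra sizes ignored
```

Note this only describes how the iteration space is carved up, not which processor runs which chunk.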

 

However, I am still puzzled by the chunk size definition. For example, I have an array of 10,000 elements, each of which should be multiplied by 5. What is the best chunk size (a single number or an array) to reduce the execution time as much as possible?

 

[Attachment: 1.png]

Message 1 of 14

Pretend you are a scientist and an engineer. Write a small test routine; try chunk sizes of 2, 5, 10, 20, 50, 100, 200, 500, 1000. Time everything. Measure everything. Write it up and post it here, not only answering your own question but also providing useful information to the rest of us.
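The experiment Bob describes can be sketched outside LabVIEW as well. A minimal timing harness in Python, assuming a stand-in workload (the thread's "multiply by 5" example) and a hypothetical 4-worker pool:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def work_chunk(chunk):
    # Stand-in workload: multiply each element by 5, as in the question.
    return [x * 5 for x in chunk]

def split(data, chunk_size):
    # Carve the data into equal-sized chunks (last one may be short).
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def timed_run(data, chunk_size, workers=4):
    """Run the chunked workload in parallel and report the elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunk_results = pool.map(work_chunk, split(data, chunk_size))
        results = [y for chunk in chunk_results for y in chunk]
    elapsed = time.perf_counter() - start
    return elapsed, results

data = list(range(10000))
for size in (2, 5, 10, 20, 50, 100, 200, 500, 1000):
    elapsed, _ = timed_run(data, size)
    print(f"chunk size {size:5d}: {elapsed * 1000:8.3f} ms")
```

The timings you see will depend on machine load and workload cost per iteration; the point is the shape of the experiment, not these particular numbers.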

 

Bob Schor

Message 2 of 14

The example above is very simple; it is only meant to explain what I want to do.

Actually, I have a more complex VI and have already tested it many times, as you suggested. No clear results were found.

Message 3 of 14

Use En Masse operations, rather than indexing, multiplying, and then re-assembling an array.

Speed of En Masse Operations
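LabVIEW's whole-array ("en masse") primitives have a rough analogue in NumPy's vectorized operations. This Python/NumPy sketch is only a stand-in for the comparison Steve is pointing at, since LabVIEW diagrams can't be shown in text:

```python
import time
import numpy as np

data = np.arange(10_000, dtype=np.float64)

# Element-by-element: index, multiply, re-assemble (what a loop does).
start = time.perf_counter()
looped = np.array([data[i] * 5 for i in range(len(data))])
loop_time = time.perf_counter() - start

# Whole-array ("en masse") operation: one vectorized multiply.
start = time.perf_counter()
vectorized = data * 5
vec_time = time.perf_counter() - start

assert np.array_equal(looped, vectorized)
print(f"loop: {loop_time * 1e3:.3f} ms, en masse: {vec_time * 1e3:.3f} ms")
```

For a workload this trivial, the whole-array operation typically beats any hand-tuned loop parallelism, which is Steve's underlying point.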

 

Time things yourself and answer your own questions:

What Time is it?

Steve Bird
Culverson Software - Elegant software that is a pleasure to use.
Culverson.com


Blog for (mostly LabVIEW) programmers: Tips And Tricks

Message 4 of 14

What do you mean, "No clear results are found"?  

  • If you run the same test twice, do you get wildly different numbers?  [May suggest that the test code is seriously flawed].
  • If you run the test with different chunk sizes, are the results largely the same?  [May suggest that either the chunk size is irrelevant, in which case "it doesn't matter" and you can use whatever is convenient, or you aren't testing what you think you are testing].
  • If you run the test with different chunk sizes, are the results different, but with no discernible pattern?  [May suggest that you need to test with other values, or your test code is seriously flawed].
  • Is this an important question, i.e. is it vital that you optimize this code?  If the answer is "No", then don't worry about it, unless you are just curious (not in itself a bad thing).  If the latter, you'll get a lot more out of figuring it out for yourself.
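The first check in that list (do repeated runs agree?) can be made concrete with a small repeatability harness. A sketch, using an arbitrary stand-in workload:

```python
import statistics
import time

def benchmark(func, repeats=10):
    """Time `func` several times. A large spread relative to the mean
    suggests the measurement (or the test code) is unreliable."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

mean, spread = benchmark(lambda: [x * 5 for x in range(10000)])
print(f"mean {mean * 1e3:.3f} ms, stdev {spread * 1e3:.3f} ms")
```

Only once the spread is small relative to the mean is it meaningful to compare timings across chunk sizes at all.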

Bob Schor

Message 5 of 14

I have a general rule that I follow when looking at such questions:

Let LabVIEW decide.

In other words, unless I am sure (and there are cases where I am sure) that I can arrange it better, I give LabVIEW all the freedom that I can, trusting it to do the best job.

A key point in the help text is this:

You should programmatically configure chunk size only if the For Loop would benefit from an iteration schedule different from the default,

 

Unless I have a strong reason to suspect that I can do better than the default, then I wouldn't try.  Just enable parallelism and watch it go.

 

If you DO have a strong reason to suspect that you can, then perhaps explaining it here would help both you and us to arrive at a good answer.

 

If your motivation is just curiosity, nothing wrong with that, but be aware that there is no universal answer.  What improves one condition might degrade another.

Steve Bird

Message 6 of 14

Consider your example:

 

Unless I am mistaken, you are asking processor 0 to do 6000 of these multiplications, processor 1 to do 3000 of them, and processor 2 to do 1000 of them.

Unless your CPU is such that Processor 0 is much faster than the others, then there's no reason to do this.  The default will probably split it 2500 | 2500 | 2500 | 2500 and that's probably as good as you can get.

Steve Bird

Message 7 of 14

@CoastalMaineBird wrote:

Consider your example:

 

Unless I am mistaken, you are asking processor 0 to do 6000 of these multiplications, processor 1 to do 3000 of them, and processor 2 to do 1000 of them.

Unless your CPU is such that Processor 0 is much faster than the others, then there's no reason to do this.  The default will probably split it 2500 | 2500 | 2500 | 2500 and that's probably as good as you can get.


You are slightly mistaken. That code requests that the For Loop operate on a chunk of 6000 iterations, then a chunk of 3000 iterations, and finally the last 1000 iterations; the processor load is divided up automatically.
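That scheduling model (the chunks form a queue, and whichever processor is free pulls the next one, so no chunk is pinned to a particular core) can be sketched as a simulation. This is an illustration of the idea, not LabVIEW's actual implementation:

```python
import queue
import threading

def run_chunks(chunk_sizes, workers=4):
    """Simulate dynamic chunk scheduling: each worker repeatedly pulls
    the next available chunk until the queue is empty."""
    chunks = queue.Queue()
    for size in chunk_sizes:
        chunks.put(size)
    log = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                size = chunks.get_nowait()
            except queue.Empty:
                return
            # ... execute `size` loop iterations here ...
            with lock:
                log.append((worker_id, size))

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

log = run_chunks([6000, 3000, 1000])
print(log)  # which worker claims each chunk varies from run to run
```

All three chunks always get executed, but the worker-to-chunk assignment is whatever the race happens to produce, which is the sense in which the load is divided up "automatically."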

Message 8 of 14

The execution time is a function of the chunk size. Because the chunk size can be either a single number or an array, both the values and the number of variables are uncertain, so we have the relation below:

CT = f(x1, x2, x3, ..., xi)

where CT is the execution time, xi is an element of the chunk size array, and i is the size of the array.

Since NI gives no clear guidance, I would have to test endlessly and still could not find the optimum.

If CT were a function of a single number, like CT = f(x), then I could at least test the most common cases.

 

For example, suppose (100, 200, 300) is the best chunk size found after 1000 tests; maybe the 1001st try would still do better than (100, 200, 300). We don't know.

 

By the way, execution time is vital for me; if the code could run 10-20% faster, that would be very helpful. I'm sorry, but I cannot share the code, because my company forbids it.

Message 9 of 14

That is exactly the question that confuses me.

From the LabVIEW help, we users still do not know how LabVIEW divides up the iterations.

Can an NI engineer make this clear?

If the code is running on a quad-core computer, does a chunk size array of (6000, 3000, 1000) mean:

A: 6000 iterations run on processor 0, 3000 on processor 1, and 1000 on processor 2; OR

B: a chunk of 6000 iterations runs first on processors 0-3, then a chunk of 3000 iterations runs on processors 0-3, and the chunk of 1000 runs last?

Before we can find the best chunk size, we should answer that question first.

Message 10 of 14