LabVIEW Idea Exchange

jspinozzi

Iteration Parallelism NOT tied to processors

Status: Declined

Any idea that has received less than 2 kudos within 2 years after posting will be automatically declined.

I have code that I want to run in parallel, and I've confirmed that if I write the code 3 times to execute in parallel there is no problem doing so. Yet when I put the code once in a For Loop with iteration parallelism enabled and set to 3 parallel instances, it only runs 2 at a time because my target cRIO has only two processors. I suppose this is a requirement if we want the code to be truly parallel. But in my case, I'm satisfied with the instances running pseudo-parallel, using whatever behavior happens when I write the code 3 times in parallel and the copies appear to execute in parallel. So, as with a Timed Loop, I'd like an option to set iteration parallelism to either a) target the available processors OR b) launch the instances in parallel asynchronously regardless of the number of processors, perhaps not truly in parallel (with appropriate warnings). See attached image.

 

Note - Specific to Softmotion: I realize there are theoretically other ways to do this asynchronously built into Softmotion, but they did not execute as expected.  Yet this is just about the loop iteration parallelism.

18 Comments
X.
Trusted Enthusiast

Loop parallelism is the wrong architecture for this kind of hardware control task. You could spawn clones handling one axis each, and provide them with their own parameters when you start a task (that can be done serially very quickly) in one way or another (global variable, queue, etc).
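As a rough text-language analogy for the "one clone per axis" suggestion, here is a minimal Python sketch. It is not LabVIEW code: `axis_worker`, `start_axes`, and the `(axis, target)` parameter tuple are hypothetical names, and the "motion" is only a placeholder. Each worker is started, then handed its own parameters serially through a queue.

```python
import queue
import threading

def axis_worker(commands, results):
    """Hypothetical per-axis clone: wait for parameters, then report back."""
    axis, target = commands.get()   # block until this axis gets its parameters
    results.put((axis, target))     # placeholder for the real motion task

def start_axes(targets):
    """Start one worker per axis and hand each its parameters serially."""
    results = queue.Queue()
    workers = []
    for axis, target in enumerate(targets):
        commands = queue.Queue()
        worker = threading.Thread(target=axis_worker, args=(commands, results))
        worker.start()
        commands.put((axis, target))  # quick serial hand-off of parameters
        workers.append(worker)
    for worker in workers:
        worker.join()
    return sorted(results.get() for _ in workers)

if __name__ == "__main__":
    print(start_axes([10, 20, 30]))
```

The number of workers here follows the number of axes, not the number of cores, which is the property the original poster is asking for.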

jspinozzi
Member

Not sure what you mean by wrong architecture... it seems brilliant to me. Indexing an array, running the same code on each element, and collecting the results is exactly what the For Loop does, and it would correctly run those tasks back-to-back. Then someone had the great idea to run those loop iterations in parallel when possible, handled with just a setting on the loop. I could certainly use Run VI Asynch, but why? I would have to make subVIs I don't otherwise need, plus more code to distribute the settings, wait for completion, and collect the results from each VI. Regardless, ANY code of any nature that can go in a loop can be run in parallel (like calculations or file operations), and I see no reason the instances have to be tied to processors, so I'd like to see a setting to run them pseudo-parallel.

X.
Trusted Enthusiast

You want the "parallelize" option to mean "start one thread per iteration" (with an upper limit on the number of threads).

Reading the "Performance Tips" section of this note: http://www.ni.com/tutorial/9393/en/#toc3, it seems to me that NI makes a distinction between threads for a parallelized loop and "regular" threads. I think the idea is that they will give you ONE thread per core for your parallelized loop, preserving the rest of the available threads (in each core) for the remainder of the execution. Giving you more than one per core would probably be harder to manage. But I am totally out of my element here...

Intaris
Proven Zealot

I'm going to kudos this (in fact I already have - does that make me a liar?).

 

This is the kind of simple, intuitive interface a graphical programming language can give us.  I have also had times where I wanted to do a series of parallel asynchronous tasks (like set an instrument setpoint and wait for completion; that the code can run in parallel at all is a given).  Simply placing this code in a parallelised FOR Loop is a great option.

 

Having said that, if NI can give us a different structure / option to do this as elegantly as the proposed idea, then I'm all for that also.

robdye
NI Employee (retired)

This Parallel For Loop pattern is surprisingly popular given that the ParFor was principally designed for data parallelism. Using it to spawn small numbers of parallel tasks that essentially run their own infinite (or almost) loops turns out to be a handy way to avoid duplicating code and to make your program scale somewhat arbitrarily.

 

And indeed the ParFor does support this, and it works as long as the code inside the loops is completely "cooperative" and doesn't block other instances from making progress. I built a small example that demonstrates that it does work (see attached picture), so there must be something else going on in your code. And we probably should also ask how you determined that only two instances were running simultaneously. It is possible that all three are actually "running" but one is blocked from continuing because of resources the other two instances are not relinquishing.

 

Of course, the subVI within the loop must be reentrant, which I assume you have done otherwise you would not be seeing even two instances run in parallel.

 

Also, the LabVIEW execution system must have enough threads available to run the parallel instances. LabVIEW generally allocates thread pools to be at least twice the number of logical cores (or four threads, whichever is greater). So on your dual-core cRIO, we should have at least 4 threads available to run the code in your program. (Actually, if I recall correctly, LabVIEW RT may be configured to use 4x the number of cores.)
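The sizing rule of thumb in the paragraph above can be written out as a small worked calculation. This is only a restatement of the rule as described in this post, not LabVIEW's actual allocation logic; the function name and its `per_core` and `floor` parameters are made up for illustration.

```python
def min_pool_threads(logical_cores, per_core=2, floor=4):
    """Rule of thumb from the post: per_core threads per logical core,
    but never fewer than `floor` threads (names are hypothetical)."""
    return max(floor, per_core * logical_cores)

if __name__ == "__main__":
    # Dual-core target with the general 2x rule.
    print(min_pool_threads(2))
    # Same target if the 4x setting recalled for LabVIEW RT applies.
    print(min_pool_threads(2, per_core=4))
```

Under either setting, a dual-core target would have more threads than the 3 (or 6) instances requested, which is why thread count alone would not explain a limit of 2.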

 

In addition, everything that you call within the loop must be "cooperative". This includes drivers that you may be calling, and the hardware that they utilize and control. Your diagram indicates that you are passing I/O task IDs into the subVI. It is possible that the driver calls you are making are only capable of driving two pieces of hardware simultaneously. Or perhaps you only have two instances of the hardware being driven and they are not capable of being multi-tasked.

 

Check these things and let us know what you find.

ParalleTasksExample.png

robdye
NI Employee (retired)

My apologies. In my previous post I failed to explain how your original diagram DOES run three instances in parallel, but the ParFor loop does NOT.

And, in fact, I'm not sure I can explain the behavior you describe. If you have the Desktop Execution Trace Toolkit, it may reveal more about what is going on.

I would ask for you to describe the actual behavior of both diagrams in more detail. Does the device under control move smoothly and simultaneously along all three axes in the first diagram, but not in the second? Over what time duration does this occur?

BTW, you should also check to make sure that the Iteration Parallelism configuration of the ParFor loop has at least 3 generated loop instances.

IterationParallelismDialog.PNG

jspinozzi
Member

Sorry guys, maybe I wasn't clear.  The loop parallelism is genius and I use it all the time on Windows, where I run maybe 5 tasks in parallel; there are plenty of cores, they happen in parallel, and I'm happy.  If I were to run 1000 math operations, it would do 8 at a time on an 8-core processor, do that 125 times, and execute faster, and I'm happy.  But on a cRIO, which has only 2 processor cores, I wanted to run 6 instances in parallel (6 threads) so that my 6 motors home simultaneously while I monitor the result of each separately. Instead it homes 2, then 2 more, then 2 more, because the threads are tied to the physical cores and there are only 2.  I assume it's done this way so the instances can be truly parallel.  But I get exactly what I want when I copy the code 6 times with no dataflow dependency between the copies: they execute in parallel (or what appears to be parallel, since I assume they share time on the 2 processors, but as long as they all run at the same time I'm fine with that).  So the feature I'd prefer is to set up a loop with the code in it and request 6 parallel iterations; it would say "but there are only 2 cores" and I would say "that's OK, do the best you can".  The problem is that, being bound to the available processor cores, I cannot run more than 2 parallel For Loop instances at a time on my cRIO. I'd like the option to turn that off and have the magic of LabVIEW parallel execution just happen.
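The difference between "2 at a time, three times" and "all 6 at once" can be sketched in Python with a thread pool. This is only an analogy, not LabVIEW code: `home_motor` is a hypothetical stand-in that sleeps instead of driving hardware, and `max_workers` plays the role of the number of parallel loop instances. Because the tasks mostly wait rather than compute, 6 workers overlap fine even on 2 cores.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def home_motor(axis, duration=0.2):
    """Hypothetical stand-in for a blocking homing call on one axis."""
    start = time.monotonic()
    time.sleep(duration)  # the thread waits; the CPU is free for other work
    return axis, start

def run_homing(num_axes, max_workers, duration=0.2):
    """Home num_axes motors with at most max_workers running at once."""
    t0 = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda a: home_motor(a, duration),
                                range(num_axes)))
    return time.monotonic() - t0, results

if __name__ == "__main__":
    capped, _ = run_homing(6, max_workers=2)    # batches of 2
    uncapped, _ = run_homing(6, max_workers=6)  # all 6 overlap
    print(f"2 workers: {capped:.2f} s, 6 workers: {uncapped:.2f} s")
```

With 2 workers the six tasks finish in roughly three task durations; with 6 workers they finish in roughly one, regardless of core count, which is the "do the best you can" behavior requested here.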

jspinozzi
Member

I re-read the comments above and can clarify further.  I can see two motors move until they're home, after that two more start homing, and after that two more.  It's clear in the Help that this is intended behavior, as it's tied to the number of logical processors (or maybe even physical processors, I can't remember), and for the cRIO that's 2 processors.

 

And the code is cooperative because if I copy the code 6 times and run it non-dataflow dependent then I get exactly the behavior I want.

 

So to be even more clear - the requested feature is an option to request more threads for a Parallel For Loop than the number of processors.  I have no reason to think it's not possible, because the system CAN support the behavior. I consider targeting physical processors a preference that I'd like to be able to select, as we can with Timed Loops.

AristosQueue (NI)
NI Employee (retired)

jspinozzi: maybe the loop frames just aren't yielding to each other. Let's say your frame of your For Loop has two nodes in sequence. Try putting a Wait Milliseconds primitive in between those nodes. It might force the frame to yield and allow the cooperation. Some of our targets are conservative in how they segment clumps (which are the atomic units of operation for cooperative multitasking).

Untitled.png

(Note: I'm strictly guessing here. Robdye is the expert on this stuff; any suggestions put forth by me are from general ideas about the way LV works, not from any specific understanding of the parallel For Loop or our cooperative multitask system.)
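The yielding idea suggested above can be illustrated with Python's `asyncio`, which is also a cooperative scheduler. This is an analogy only and says nothing about how LabVIEW segments clumps: `frame` is a hypothetical loop frame with two steps, and `await asyncio.sleep(0)` stands in for a Wait placed between the two nodes.

```python
import asyncio

async def frame(name, steps, log, yield_between):
    """Hypothetical loop frame: runs `steps` nodes in sequence."""
    for step in range(steps):
        log.append((name, step))
        if yield_between:
            # Explicit yield point, analogous to a wait between nodes.
            await asyncio.sleep(0)

async def run(yield_between):
    """Run three frames concurrently and record the execution order."""
    log = []
    await asyncio.gather(*(frame(n, 2, log, yield_between) for n in "ABC"))
    return log

if __name__ == "__main__":
    print("no yield:  ", asyncio.run(run(False)))
    print("with yield:", asyncio.run(run(True)))
```

Without the yield point each frame runs to completion before the next one starts; with it, the frames take turns step by step.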

robdye
NI Employee (retired)

My long and eloquent response was just evaporated by the forum post snarfer.

jspinozzi: I will message you with more info. I suspect this problem will be one of {bug | user_error | tech_lead_brain_fart}.