LabVIEW Idea Exchange

smmarlow · ‎09-15-2010

After deciding to post an idea for a "parallel" structure, a search revealed the idea for a Parallel Execution Structure has already been proposed by gvholland here.

I gave my kudos to this idea because I believe it would be very useful. To make a parallel structure even more useful, I propose adding some features that would make it more convenient for those of us who might use it in code that must execute in parallel for performance and functional reasons. It has been commented on the other thread that parallel code should be placed in subVIs, and I concur with this view. However, there are instances where this is either inconvenient or impracticable. Consider the following example:

An application needing to perform simultaneous PID control on 32 channels must execute in parallel (only 8 channels shown for clarity):

Now quadruple the number of channels in this scheme, and you can have a pretty big diagram with lots of wires. Also consider the routine task of initializing that “clustosaurus” or “classosaurus” as in this example:

We've all probably tried the scheme wherein we put a case structure inside a FOR loop and wired the iteration terminal to the case selector, as in these examples:

That's clean and easy, and allows the user to create instances of the reentrant VI by duplicating cases. But that architecture forces the VIs or code to execute sequentially. The new parallel FOR loop can boost the performance of these techniques and create parallelism. But I would like a basic parallel structure that cleanly handles some routine tasks by adding some useful I/O nodes, à la the In Place Element Structure.
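Since LabVIEW diagrams have no text form, here is a rough Python analogy of the difference, not LabVIEW code. The names `make_channel`, `run_case_in_for_loop` and `run_parallel_structure` are made up for illustration, and Python threads only approximate true parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the per-channel code placed in each case or frame.
def make_channel(ch):
    def run():
        return f"channel {ch} done"
    return run

frames = [make_channel(ch) for ch in range(8)]

def run_case_in_for_loop(frames):
    # Case structure inside a FOR loop: iteration i selects case i,
    # so the cases execute one after another.
    results = []
    for i in range(len(frames)):
        results.append(frames[i]())
    return results

def run_parallel_structure(frames):
    # Proposed structure: every frame is started without waiting on the others.
    with ThreadPoolExecutor(max_workers=max(1, len(frames))) as pool:
        futures = [pool.submit(frame) for frame in frames]
        return [f.result() for f in futures]
```

Both functions return the same results in frame order; only the scheduling differs.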

I propose the following structure, or something similar:

This structure is drawn here with some proposed I/O nodes and tunnels. This is by no means the complete set of I/O that might exist, but rather a starting point.

Cluster unbundle/bundle node:

This node accepts only “brown” clusters, or clusters of Booleans. The elements are passed to each frame in corresponding index order, element 0 to frame 0, and so on. Once added to the structure, a single unbundle/bundle terminal pair appears in each frame. Much like a bundle function that has its center terminal wired, the bundle terminals may be left unwired. The values of unwired elements remain unchanged. Any cluster wired to this node must have the same number of elements as the parallel structure has frames. If not, the wire is broken.
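As a rough Python analogy of the rules above (not LabVIEW code), the node could behave like the sketch below. The function name `run_cluster_node` is hypothetical, a tuple stands in for the cluster, and `None` stands in for an unwired bundle terminal. The broken wire is an edit-time condition in LabVIEW; the exception here is only the nearest runtime equivalent.

```python
from concurrent.futures import ThreadPoolExecutor

def run_cluster_node(cluster, frames):
    """cluster: tuple of elements. frames: one callable per frame, or None
    where that frame leaves its bundle terminal unwired."""
    if len(cluster) != len(frames):
        # Stand-in for the broken wire described in the proposal.
        raise ValueError("cluster size must equal the number of frames")

    def run_frame(i):
        frame = frames[i]
        # Unwired bundle terminal: the element passes through unchanged.
        return cluster[i] if frame is None else frame(cluster[i])

    with ThreadPoolExecutor(max_workers=max(1, len(frames))) as pool:
        # Element i goes to frame i; results come back in the same order.
        return tuple(pool.map(run_frame, range(len(frames))))

flags = (True, False, True)
updated = run_cluster_node(flags, [lambda b: not b, None, lambda b: b])
```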

Array index/replace node:

This node auto-indexes an incoming array and provides a replace array element node on the right. Note there is no index value I/O as with the IPE, since the parallel structure auto-indexes the array and distributes/replaces the elements across the frames. If an array has fewer elements than the number of frames at run time, the node returns default data for the undefined elements, exactly as an index array function does, but the structure returns a warning or error (I can’t decide which). The output array would always have the same number of elements as the structure has frames, or the same number of elements as the input (can't decide which). The replace element node on the right must be wired in every frame, just as a replace array element structure must have all of its exposed elements wired.
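To make the open questions concrete, here is a Python analogy that picks one option for each of them (not LabVIEW code). It assumes a warning rather than an error, and an output with one element per frame; both choices are arbitrary, as is the name `run_array_node`. The post does not say what happens when the array is longer than the frame count; this sketch simply ignores the extra elements.

```python
import warnings
from concurrent.futures import ThreadPoolExecutor

def run_array_node(array, frames, default=0):
    n = len(frames)
    if len(array) < n:
        # Assumption: warn instead of raising; the proposal leaves this open.
        warnings.warn("array has fewer elements than the structure has frames")
    # Missing elements are filled with default data, as Index Array would do.
    inputs = [array[i] if i < len(array) else default for i in range(n)]
    with ThreadPoolExecutor(max_workers=max(1, n)) as pool:
        futures = [pool.submit(frame, x) for frame, x in zip(frames, inputs)]
        # Assumption: the output always has one element per frame.
        return [f.result() for f in futures]

frames = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 1]
```

With a two-element input, the third frame in this sketch runs on the default value and a warning is issued.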

Cluster unbundle/bundle by name node:

This node is tricky, but I decided to take a stab at it anyway. The node is created and visible on both sides of the structure. However, unlike the IPE, the unbundle/bundle terminals on either side can be of different sizes and element selections, and can optionally be unused on either side, or both sides, within the individual frames. Unused terminals appear with the same symbol as the center terminal of a bundle function, as shown in the proposal drawing. If an element is selected for bundling within a frame, then it is unavailable for bundling in all other frames.

Indexing and non-indexing tunnels:

Non-indexing tunnels function somewhat like they do on a sequence structure. Input tunnels provide data to all frames; non-indexing output tunnels may only be wired in one frame. Unlike sequences, however, the data arriving at output tunnels would be free to flow out of the structure immediately, which will seem weird, and violates the "whole structure must complete" convention. But remember, this is a parallel structure. Like sparks shooting off the bolts in the monster's neck while it's alive, it's gonna be weird by default.

Indexing tunnels are different. Like the auto-indexing node, auto-indexing input tunnels distribute the array elements across the frames. If the array size is smaller than the number of frames, the frames either execute with default data, or the undefined frames don’t execute, and the structure returns an error or warning (help me define this). Auto-indexing output tunnels behave like output tunnels from case structures; either all frames must be wired, or the tunnel must be configured to use default data if unwired. Unlike the non-indexing output tunnel, data from this tunnel is not available until all frames have completed execution.

Error I/O Nodes:

There are error inputs/outputs for the structure as a whole, and for each individual frame. The structure error IO is situated in the lower left and right corners, naturally. The frame input and output terminals can both be optionally hidden or exposed in each frame, and also slide independently of each other up and down the left and right sides of each frame in which they are exposed. The structure distributes the incoming error among the exposed frame error input terminals, and merges the frame output error values to the structure output terminal, along with any messages generated by the structure itself.
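The distribute-and-merge behaviour can be sketched in Python as an analogy (not LabVIEW code). The `Error` class, `merge_errors` and `run_with_errors` are hypothetical names. The merge rule used here, first error wins, otherwise first warning, is an assumption modelled on how I understand LabVIEW's Merge Errors function to behave; the proposal itself does not specify the rule.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class Error:
    status: bool = False  # True means an error occurred
    code: int = 0         # nonzero code with status False is a warning
    source: str = ""

NO_ERROR = Error()

def merge_errors(errors):
    # Assumption: first error wins, else first warning, else no error.
    for e in errors:
        if e.status:
            return e
    for e in errors:
        if e.code != 0:
            return e
    return NO_ERROR

def run_with_errors(error_in, frames, structure_errors=()):
    # Every exposed frame error input receives the same incoming error.
    with ThreadPoolExecutor(max_workers=max(1, len(frames))) as pool:
        frame_outs = list(pool.map(lambda frame: frame(error_in), frames))
    # Frame outputs are merged with any messages from the structure itself.
    return merge_errors(frame_outs + list(structure_errors))

def ok_frame(err):
    return err

def failing_frame(err):
    return Error(True, 5000, "frame 1")
```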

So what do you do with this “Frankenstructure”? Well, here are a couple of the aforementioned examples rewired using this hypothetical beast:

Of course there could be other cool things, like a CPU core selector for the frames, etc. Just let your imagination (or nightmare, depending on how you see it) run wild!

smmarlow · ‎09-15-2010

Sorry for the smushed example images at the end. The preview had them side-by-side.

JackDunaway · ‎09-16-2010

Two questions:

1. If the Process Variables are arrayed, why are parameters of the Process Variables (Setpoints and PID Gains) not also arrayed?

2. What is in cases 1 through 7 in the very bottom PID example?

Jim_Kring · ‎09-17-2010

Sorry for the terse response (to your extremely detailed idea), but what you've described/designed sounds a lot like a Parallel For Loop (which, admittedly, I haven't played around with yet). Cheers -Jim

Let's talk about the future of LabVIEW...

smmarlow · ‎09-22-2010

@JackDunaway Sorry the example is not clearer, but the additional frames contain all the reference initializations in the 'clustosaurus example pic', and there should be frames 0..7 instead of 9 frames total. As for why the parameters are not arrayed, I just wanted to show an example of how the structure could operate on a cluster, and because the original example pic used clusters. They might very well be arrayed. If they were, an indexing tunnel could be used. It's true you could use a parallel FOR loop for the PID example, but you would have to nest a case structure to hold the multiple instances of the reentrant PID VI. I don't like this solution, and believe LabVIEW should have a Parallel Execution Structure dedicated to the task.

@Jim Kring You are correct. It is a lot like the parallel FOR loop, but I believe it is (or would be) much more powerful. Please reconsider the descriptions of the I/O terminals, which borrow some of their functionality from the In Place Element Structure. The parallel loop structure cannot operate on most iteration-dependent data (data passed from one iteration to the next in a shift register), because it cannot guarantee the order of execution of the parallel iterations. Neither would this or any other parallel structure, but the I/O terminals, particularly the cluster unbundle/bundle by name, would help manage the code going in and out, and allow the parallel frames to operate on the same data structure in memory. How would you use a parallel loop structure to update the elements of an incoming cluster? Also, you have to nest a CASE structure in the parallel FOR loop if you want different code at different parallel iterations, a messy solution in my opinion. This proposed structure overcomes some of these problems with the specifically designed I/O terminals I have outlined. I admit they may be far-fetched, but I wanted to make a start and maybe spark a discussion. A parallel programming language without a dedicated parallel execution structure seems incomplete to me. I don't think a modified FOR loop quite makes it.

Thank you both for commenting on this idea. As you are two well-respected developers in the community, I value your input.