
LabVIEW


Replace array subset in parallel for loop without increasing memory

I am working with a large set of data in an array (if I am not careful, the program crashes because the computer can't allocate enough memory to LabVIEW) and would like to loop through and modify portions of the array at a time.

I have implemented the code just fine in a regular for loop, but I can't seem to find a way to parallelize it and still convince the compiler that I am not going to step on myself with other loop iterations, while maintaining my memory footprint.

 

Basically the data is really a bunch of different arrays with a wide range of lengths, but putting it all in a 2D array (padded out to the longest row) would drastically increase the amount of memory required.

So instead I have a 1D array of data (all the sub-arrays concatenated back to back), another array containing the length of each sub-array, and, for ease of use, a third array containing the starting location of each sub-array.

 

In each loop iteration I want to grab a sub-array, do some manipulations to it, and then put it back in the same memory location. I have tried a few different approaches but have not been able to find the right solution. I could use Replace Array Subset and a shift register, but then I get a "dependence between loop iterations" error. I could also just peel off my various sub-arrays and concatenate them back together on the way out of the loop, but I believe this would double the memory required.
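(Since a LabVIEW diagram doesn't paste well as text, here is a rough Python/NumPy sketch of the layout and the sequential version of the loop being described; the lengths and the descending sort are only placeholders.)

```python
import numpy as np

# Stand-in for the layout described above: one flat data array plus
# per-sub-array lengths and starting offsets (all values are illustrative).
lengths = np.array([5, 3, 7, 4])
starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
data = np.arange(lengths.sum(), dtype=np.float64)   # concatenated sub-arrays

# Sequential equivalent of the LabVIEW loop: take a subset, manipulate it,
# and write it back into the same region of the flat array.
for start, length in zip(starts, lengths):
    sub = data[start:start + length]        # "Array Subset"
    sub = np.sort(sub)[::-1]                # placeholder manipulation
    data[start:start + length] = sub        # "Replace Array Subset"
```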

 

I have included sample code of one of my attempts; it is similar in nature to, but vastly simplified from, what I am actually trying to accomplish.

 

Ultimately I am doing the work to make sure there are not actually any dependencies between the loop iterations, but LabVIEW doesn't know that. Is there a better way to do this, or can I tell LabVIEW to ignore this warning and compile anyway (preferably only in this instance)?

 

Any help is greatly appreciated.

Thanks

TestParallel.png

 

 

Message 1 of 53

Hi tshurtz,

 

can I tell LabVIEW to ignore this warning and compile anyway (preferably only in this instance)?

No.

 

but can’t seem to find a way to parallelize it and still convince the compiler I am not going to step on myself

You cannot use shift registers with parallelized loops. Period.

 

It seems your "array in" contains several chunks of different sizes (?) and you want to edit the chunks. For the purpose of parallelizing the loop you could change your data structure to an array of clusters, each containing an array, so you can apply autoindexing…
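(A rough textual analog of that suggestion, sketched in Python with a list of arrays standing in for the array of clusters; the contents and the sort are placeholders.)

```python
import numpy as np

# Ragged equivalent of "array of cluster of array": one inner array per chunk.
chunks = [np.arange(n, dtype=np.float64) for n in (5, 3, 7, 4)]

# Auto-indexing analog: the loop body sees one chunk at a time, so the
# iterations are trivially independent and safe to parallelize.
processed = [np.sort(chunk)[::-1] for chunk in chunks]
```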

Best regards,
GerdW


using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019
Message 2 of 53

Just remove the shift registers and In Place Element structures and let LV handle the memory, i.e. let it autoindex and rebuild the array; the compiler is often smart enough to work in place anyway.

If the array is only that size, memory isn't a problem. If you get problems later, it's probably due to data duplication.

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems
Certified-LabVIEW-Developer
Message 3 of 53

I'm with GerdW here: you should be using an array of clusters, with an array inside each cluster. Then you just index the element you want, unbundle, do your manipulation, bundle, and replace the array subset. You also will not need to maintain the two other arrays, saving you even more memory.
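(A minimal sketch of that index / unbundle / manipulate / bundle / replace cycle, again in Python, with a dict standing in for the cluster; the names and the sort are placeholders.)

```python
import numpy as np

# One dict per "cluster", each holding one inner array.
containers = [{"data": np.arange(n, dtype=np.float64)} for n in (5, 3, 7, 4)]

for i in range(len(containers)):
    element = containers[i]              # Index Array
    inner = element["data"]              # Unbundle
    inner = np.sort(inner)[::-1]         # manipulation
    containers[i] = {"data": inner}      # Bundle + replace the element
```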


GCentral
There are only two ways to tell somebody thanks: Kudos and Marked Solutions
Unofficial Forum Rules and Guidelines
"Not that we are sufficient in ourselves to claim anything as coming from us, but our sufficiency is from God" - 2 Corinthians 3:5
Message 4 of 53

How is an array of clusters of arrays smaller than a single array of values plus an array of lengths? Each cluster's array will need a pointer to its start plus a length, and each cluster in the outer array needs a pointer as well, right? So there is more overhead; not a ton, but I can't see it being better from a memory point of view. I have considered this format in general, but it is less desirable for many other portions of my code.
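(A rough back-of-envelope comparison of the three layouts, assuming 8-byte values and 8-byte lengths/offsets/handles; the counts are purely illustrative.)

```python
n_sub = 10_000          # number of sub-arrays (illustrative)
avg_len = 5_000         # average sub-array length (illustrative)
max_len = 50_000        # longest sub-array (illustrative)

flat   = 8 * n_sub * avg_len + 8 * 2 * n_sub   # data + lengths + starts
ragged = 8 * n_sub * avg_len + 8 * 2 * n_sub   # data + per-array handle and size
padded = 8 * n_sub * max_len                   # 2D array padded to the longest row

print(f"flat: {flat/1e9:.2f} GB, ragged: {ragged/1e9:.2f} GB, padded 2D: {padded/1e9:.2f} GB")
```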

 

Message 5 of 53

I don't know this for sure, but I think LabVIEW will not be able to reuse the memory in this case, since the indexing of the large array cannot be done automatically and the different loop instances each think they need the full array. So, at least during execution, the memory use will double, and I really am pushing that threshold.

Message 6 of 53

I haven't tested this, so I suggest you test it. Below is a hack to get parallelization without changing your code too much. I have not tested the memory footprint or the execution speed.

 

mcduff

 

snip1.png
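(For readers following along in text: the hack roughly corresponds to keeping the flat array behind a reference and giving each iteration exclusive access while it edits its slice. A hedged Python analog is below, with a threading.Lock standing in for the DVR and the sort standing in for the real work.)

```python
import threading
import numpy as np

lengths = np.array([5, 3, 7, 4])
starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
data = np.arange(lengths.sum(), dtype=np.float64)
lock = threading.Lock()   # stands in for the DVR: only one holder at a time

def worker(i):
    start, length = starts[i], lengths[i]
    # IPE(DVR) analog: dereference, modify, re-reference under exclusive access.
    with lock:
        data[start:start + length] = np.sort(data[start:start + length])[::-1]

threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(lengths))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```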

 

 

 

Message 7 of 53

What makes you think that parallelization would give you any performance boost here? How many CPU cores do you have?

What if you split the array into a few large chunks, process them in parallel, and reassemble? You know that you can resize the "array split/replace" nodes and sort several parts in parallel.
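(Sketching that idea in Python below; memory behavior obviously differs from LabVIEW, in the poster's case the split points would have to fall on sub-array boundaries, and the sort is a placeholder for the real work.)

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    # Placeholder for the real per-chunk computation.
    return np.sort(chunk)[::-1]

def process_in_big_chunks(data, n_workers=4):
    chunks = np.array_split(data, n_workers)              # a few large pieces
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(process_chunk, chunks))   # process in parallel
    return np.concatenate(results)                        # reassemble
```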

 

Can you explain the purpose of all this data shuffling? I assume that the real code does more massive computation than just sorting, right?

Message 8 of 53

@mcduff wrote:

I haven't tested this, so I suggest you test it.


The IPE(DVR) will block concurrent access to the DVR data and each parallel instance will need to wait. The IPE(DVR) can only execute serially. Nothing gained. (Haven't tested it. Feel free to verify my claim ;))

 

See point #4 here.

 

QUOTE:  "The In Place Element Structure has a pair of nodes for Data Value Reference Read/Write for dereferencing and rereferencing the data, respectively. This structure blocks the execution of other structures using the same reference until it is finished and the data has been rereferenced."

Message 9 of 53

Here's equivalent code that can be parallelized. It is also arguably easier to read. 😮

 

(I have not benchmarked whether parallelization really gives a speedup. Not sure about the overhead of that concatenating tunnel ;))

 

SubsetProcessing.png
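(For the text-only reader, the shape of that diagram roughly corresponds to: split the flat array into its sub-arrays on the way in via auto-indexing, process each one independently, and let the output tunnel concatenate the results. A hedged Python analog, with the sort as a placeholder:)

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def process_subarrays(data, lengths):
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    subs = [data[s:s + n] for s, n in zip(starts, lengths)]          # auto-indexing input
    with ThreadPoolExecutor() as pool:                               # parallel iterations
        processed = list(pool.map(lambda a: np.sort(a)[::-1], subs)) # placeholder work
    return np.concatenate(processed)                                 # concatenating tunnel
```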

Message 10 of 53