LabVIEW

how to shift elements in an array up or down without changing its size?

Well, yes. Arrays in LabVIEW are contiguous in memory, so having large data structures can lead to out-of-memory errors due to memory fragmentation, especially if memory is constantly reallocated and large arrays are shoveled around. The number of allocation dots (for arrays) when showing buffer allocations is one telltale sign that this is happening, and my code has significantly fewer such dots. This was my only criterion when I wrote it.

 

If operating on large arrays, you have few other choices. How do you want to keep the "data small" if it is not? Everything else being equal, keeping a large contiguous array in a fixed memory position (e.g. a DVR or a simple shift register), never reallocating or copying it, and doing all operations in-place (maybe with a few scratch arrays of much smaller size, e.g. one row or column) will most likely beat anything that has the same data scattered over many smaller data structures. Do you have an example to the contrary?
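
As a rough illustration of the in-place idea above, here is a minimal Python sketch. Python only stands in for the LabVIEW diagram; the function name `rotate_up_in_place` and the flat row-major layout are assumptions made for the example, not the code attached in this thread. The buffer keeps its identity and size, and the only extra storage is one scratch row.

```python
def rotate_up_in_place(buf, rows, cols):
    """Rotate the rows of a flat, row-major rows x cols list up by one.

    Hypothetical helper: nothing is reallocated, and the only extra
    storage is a scratch copy of a single row.
    """
    if rows < 2 or cols == 0:
        return
    scratch = buf[0:cols]  # save the first row (the one scratch array)
    for r in range(1, rows):
        src = r * cols
        dst = src - cols
        for c in range(cols):  # element-wise move of row r into row r-1
            buf[dst + c] = buf[src + c]
    last = (rows - 1) * cols
    for c in range(cols):  # the saved first row becomes the last row
        buf[last + c] = scratch[c]


data = [1, 2,
        3, 4,
        5, 6]
rotate_up_in_place(data, rows=3, cols=2)
print(data)  # prints [3, 4, 5, 6, 1, 2]
```

Rotating down would follow the same pattern with the last row saved and the loop running from the bottom up.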

 

I mentioned that my code can also be fully parallelized, potentially gaining a few factors of two depending on hardware. It is possible that the penalty of the original code is less than expected because of better use of SIMD by the built-in array functions. It might be interesting to try it on a much older LabVIEW version (pre-SSE2) to see how the compiler has changed. 😉

 

Yes, my code probably still has a lot of slack left. Feel free to improve it. As I said it was just a quick draft.

0 Kudos
Message 21 of 41
(1,167 Views)

Parallelizing code to run on several processors creates memory traffic and synchronization overhead, and there is no clear line between when it is beneficial and when it slows you down, especially when you compute your way through the data in tight/fast loops. Finding this crossover point can be tricky. Benchmarking your code helps.

 

This is why many modern games actually run faster on a single core CPU. Rather than trying to parallelize, optimizing the code and data to fit in the CPU cache is often the better way forward, especially on those Xeons with an insane amount of cache. Heck, even this Core i7 has up to 8 MB of cache. Now, that's a bloody large array or picture.

 

Yes, it is contiguous, but then you do a transpose on your array, or iterate on a column basis, and suddenly you find the runtime pointer-bouncing all around memory, eventually thrashing the cache (fetching from RAM), and it runs slow(er).
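
To make the row versus column point concrete, here is a small Python sketch, assuming a row-major layout where element (r, c) sits at flat offset r * cols + c. The helper names are hypothetical. It lists the memory offsets touched by each traversal order and the jump between consecutive accesses.

```python
def row_order(rows, cols):
    """Flat offsets visited when walking a row-major array row by row."""
    return [r * cols + c for r in range(rows) for c in range(cols)]


def column_order(rows, cols):
    """Flat offsets visited when walking the same array column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def strides(offsets):
    """Distance (in elements) between consecutive accesses."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


print(strides(row_order(3, 4)))     # prints [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(strides(column_order(3, 4)))  # prints [4, 4, -7, 4, 4, -7, 4, 4, -7, 4, 4]
```

Row-wise traversal always steps to the neighbouring element, while column-wise traversal jumps a full row length each time. On a large array that jump can exceed a cache line, which is the pointer-bouncing described above.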

 

Fragmenting large 2D/3D arrays into smaller row/column-based chunks isn't what I meant. This is a blindingly obvious case of using multidimensional arrays as DVRs, just as in your example. I was thinking more in terms of general program structure.

 

After a context switch, your fast threads/loops/modules/classes/arrays should be "blitted" into the cache if they have been trashed by previous operations, and then execute there without excessive thrashing, which can be a problem in convoluted/complex programs. Having aligned memory allocations helps the OS/processor determine what to prefetch, etc.

 

Br,

 

/Roger

 

0 Kudos
Message 22 of 41
(1,164 Views)

A good read about cache handling that can certainly be applied to LV programs:

 

http://techpubs.sgi.com/library/dynaweb_docs/0640/SGI_Developer/books/OrOn2_PfTune/sgi_html/ch06.htm...

 

Br,

 

/Roger

 

0 Kudos
Message 23 of 41
(1,159 Views)

Hi all,

Thanks a lot for the answers.

I had already gathered that using empty arrays in the "in-place element" structures was not a good idea because of memory reallocation 😞 At least I've learned something today about inplaceness 😉 As I'm not involved in computational work, I'm not used to huge amounts of data; actually, I've never coded with the "in-place element" structure before...

On my 8-core machine, the best performance is achieved with For loops without parallelization for left & right rotations.
For up & down rotations, even the initial code is faster than the one with non-parallelized For loops! And the gain from parallelization is not clear compared to the in-place element structure (cf. attached test VI). But I'm also not used to parallelization problems...

Best regards,
HL

0 Kudos
Message 24 of 41
(1,131 Views)

Yes, a good general idea is to keep the code as simple as possible and avoid fancy stuff such as inplaceness and parallelism unless you really need them from a CPU performance or memory requirements perspective. These techniques are usually the final touches to "cool the hot spots" once your overall architecture is lean and has optimal dataflow.

 

Br,

 

/Roger

 

0 Kudos
Message 25 of 41
(1,126 Views)

The compiler has become so sophisticated that it is really impossible to tell without doing the actual benchmarking. For example, just changing the array to DBL will change the ranking slightly.
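
The measurement pattern itself is simple. Below is a minimal Python sketch of it, using the standard `timeit` module; the two rotate variants and the `best_time` helper are hypothetical stand-ins, and the Python timings say nothing about how LabVIEW would rank the equivalent diagrams. The point is only to check that both variants agree and then take the best of several repeated runs.

```python
import timeit


def rotate_up_copy(rows_list):
    """Return a new list of rows rotated up by one (allocates a new outer list)."""
    return rows_list[1:] + rows_list[:1]


def rotate_up_loop(rows_list):
    """Rotate the rows up by one in place, moving one row reference at a time."""
    if len(rows_list) < 2:
        return rows_list
    first = rows_list[0]
    for i in range(1, len(rows_list)):
        rows_list[i - 1] = rows_list[i]
    rows_list[-1] = first
    return rows_list


def best_time(func, make_input, repeat=5, number=20):
    """Best-of-`repeat` average time per call, in seconds."""
    data = make_input()
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def make_grid():
    # Assumed test size; change the element type or shape and the ranking may move.
    return [[float(r * 50 + c) for c in range(50)] for r in range(200)]


# Verify that both variants agree before comparing their speed.
assert rotate_up_copy(make_grid()) == rotate_up_loop(make_grid())

for name, func in (("copy", rotate_up_copy), ("loop", rotate_up_loop)):
    print(name, best_time(func, make_grid))
```

Taking the minimum over several repeats reduces noise from other processes; changing `make_grid` (for example to integers instead of floats) is an easy way to see whether the ranking is stable.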

 

That being said, I don't claim that my code is especially optimized. I am sure there are more efficient ways possible. 😄

0 Kudos
Message 26 of 41
(1,126 Views)

altenbach wrote:

That being said, I don't claim that my code is especially optimized. I am sure there are more efficient ways possible. 😄


 

OK, here's an UP version of the "for loop" variety that is about 4x faster. I am sure it can be further optimized.

 

Also the other transformations could probably be rewritten similarly. Try it! 😄

 

 

 

 

 

Message 27 of 41
(1,115 Views)

Is the code even correct? Before we can assess the performance, the code must work correctly.

I didn't get it to work for some cases. Maybe I just didn't do it right?
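
One way to settle the correctness question before timing anything is to compare the fast version against an obviously correct reference over many small shapes, including edge cases such as empty arrays and single rows. The Python sketch below shows the idea; `candidate_rotate_up` is a hypothetical stand-in for the implementation under test, not the VI from this thread.

```python
def reference_rotate_up(grid):
    """Obviously correct reference: result row i is input row (i + 1) mod n."""
    n = len(grid)
    return [list(grid[(i + 1) % n]) for i in range(n)]


def candidate_rotate_up(grid):
    """Hypothetical implementation under test: in-place rotate with one scratch row."""
    if len(grid) > 1:
        scratch = grid[0]
        for i in range(1, len(grid)):
            grid[i - 1] = grid[i]
        grid[-1] = scratch
    return grid


def check(candidate, reference, max_rows=5, max_cols=4):
    """Return the (rows, cols) shapes where candidate and reference disagree."""
    failures = []
    for rows in range(max_rows + 1):
        for cols in range(max_cols + 1):
            grid = [[r * cols + c for c in range(cols)] for r in range(rows)]
            expected = reference(grid)
            actual = candidate([row[:] for row in grid])  # fresh copy per case
            if actual != expected:
                failures.append((rows, cols))
    return failures


print(check(candidate_rotate_up, reference_rotate_up))  # prints []
```

An empty list means the two versions agree on every tested shape; any listed shape is a concrete failing case to report.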

 

I leave it for you to test yourself. Attached is a VI.

 

Br,

 

/Roger

 

0 Kudos
Message 28 of 41
(1,110 Views)

Which code and which cases?

 

(I would eliminate the outer shift register to keep the result static for better comparison).

0 Kudos
Message 29 of 41
(1,107 Views)

@altenbach wrote:

Which code and which cases?

 

(I would eliminate the outer shift register to keep the result static for better comparison).


Based on yours, I presume? I unhid the 2D array and added a loop delay to watch the rotation.

 

The "without structure" rotate left & right cases. Try them; there aren't that many cases.

 

I attached it.

 

Br,

 

/Roger

 

0 Kudos
Message 30 of 41
(1,104 Views)