Auto-indexing is slow for arrays with >1 dimensions

roblab3 · ‎03-13-2014

Hi,

I was looking at the performance of operations on all individual elements in 3D arrays, especially the difference between auto-indexing (left image) and manual-indexing (right image, calling "Index array" and "Replace array subset" in the innermost loop). I'm a bit puzzled by the results and post it here for discussion and hopefully somebody's benefit in the future.

Left: auto-indexing; right: manual-indexing

In the tests I added a different random number to all individual elements in a 3D array. I found that auto-indexing is 1.2 to 1.5 times slower than manual-indexing. I also found that the performance of auto-indexing is much more dependent on the size the dimensions: an array with 1000x200x40 elements is 20% slower than an array with 1000x40x200 elements. For manual-indexing there is hardly any difference. The detailed results can be found at the end of this post.

I was under the impression that auto-indexing was the preferred method for this kind of operation: it achieves exactly the same result and it is much clearer in the block diagram. I also expected that auto-indexing would have been optimized in LabView, but the the tests show this is clearly not the case.

What is auto-indexing doing?

With two tests I looked at how auto-index works.

First, I looked if auto-indexing reorganizes the data in an inefficient way. To do this I made a method "fake-auto-indexing" which calls "Array subset" and "Reshape array" (or "Index array" for a 1D-array) when it enters _every_ loop and calls "Replace array subset" when exiting _every_ loop (as opposed to manual-indexing, where I do this only in the inner loop). I replicated this in a test (calling it fake-auto-indexing) and found that the performance is very similar to auto-indexing, especially looking at the trend for the different array lengths.

Fake-auto-indexing

Second, I looked if Locality of reference (how the 3D array is stored in memory and how efficiently you can iterate over that) may be an issue. Auto-indexing loops over pages-rows-columns (i.e. pages in the outer for-loop, rows in the middle for-loop, columns in the inner for-loop). This can't be changed for auto-indexing, but I did change it for manual and fake-indexing. The performance of manual-indexing is now similar to auto-indexing, except for two cases that I can't explain. Fake-auto-indexing performs way worse in all cases.

It seems that the performance problem with auto-indexing is due to inefficient data organization.

Other tests

I did the same tests for 1D and 2D arrays. For 1D arrays the three methods perform identical. For 2D arrays auto-indexing is 15% slower than manual-indexing, while fake-auto-indexing is 8% slower than manual-indexing. I find it interesting that auto-indexing is the slowest of the three methods.

Finally, I tested the performance of operating on entire columns (instead of every single element) of a 3D array. In all cases it is a lot faster than iterating over individual elements. Auto-indexing is more than 1.8 to 3.4 times slower than manual-indexing, while fake-auto-indexing is about 1.5 to 2.7 times slower. Because of the number of iterations that has to be done, the effect of the size of the column is important: an array with 1000x200x40 elements is in all cases much slower than an array with 1000x40x200 elements.

Discussion & conclusions

In all the cases I tested, auto-indexing is significantly slower than manual-indexing. Because auto-indexing is the standard way of indexing arrays in LabView I expected the method to be highly optimized. Judging by the tests I did, that is not the case. I find this puzzling.

Does anybody know any best practices when it comes to working with >1D arrays? It seems there is a lack of documentation about the performance, surprising given the significant differences I found.

It is of course possible that I made mistakes. I tried to keep the code as straightforward as possible to minimize that risk. Still, I hope somebody can double-check the work I did.

Results

I ran the tests on a computer with a i5-4570 @ 3.20 GHz processor (4 cores, but only 1 is used), 8 GB RAM running Windows 7 64-bit and LabView 2013 32-bit. The tests were averaged 10 times. The results are in milliseconds.

3D-arrays, iterate pages-rows-columns

pages x rows x cols : auto    manual  fake
   40 x  200 x 1000 : 268.9   202.0   268.8 
   40 x 1000 x  200 : 276.9   204.1   263.8 
  200 x   40 x 1000 : 264.6   202.8   260.6 
  200 x 1000 x   40 : 306.9   205.0   300.0 
 1000 x   40 x  200 : 253.7   203.1   259.3 
 1000 x  200 x   40 : 307.2   206.2   288.5 
  100 x  100 x  100 :  36.2    25.7    33.9

3D-arrays, iterate columns-rows-pages

pages x rows x cols : manual  fake
   40 x  200 x 1000 : 277.6   457       
   40 x 1000 x  200 : 291.6   461.5 
  200 x   40 x 1000 : 277.4   431.9 
  200 x 1000 x   40 : 452.5   572.1 
 1000 x   40 x  200 : 298.1   460.4     
 1000 x  200 x   40 : 460.5   583.8 
  100 x  100 x  100 :  31.7    51.9

2D-arrays, iterate rows-columns

rows  x cols  : auto     manual   fake
  200 x 20000 :  123.5    106.1    113.2    
20000 x   200 :  119.5    106.1    115.0    
10000 x 10000 : 3080.25  2659.65  2835.1

1D-arrays, iterate over columns

cols   : auto  manual  fake
500000 : 11.5  11.8    11.6

3D-arrays, iterate pages-rows, operate on columns

pages x rows x cols : auto    manual  fake
   40 x  200 x 1000 :  83.9   23.3     62.9 
   40 x 1000 x  200 :  89.6   31.9     69.0     
  200 x   40 x 1000 :  74.3   27.6     62.2 
  200 x 1000 x   40 : 135.6   76.2    107.1
 1000 x   40 x  200 :  75.3   31.2     68.6 
 1000 x  200 x   40 : 133.6   71.7    108.7     
  100 x  100 x  100 :  13.0    5.4      9.9

VI's

I attached the VI's I used for testing. "ND_add_random_number.vi" (where N is 1, 2 or 3) is where all the action happens, taking a selector with a method and an array with the N dimensions as input. It returns the result and time in milliseconds. Using "ND_add_random_number_automated_test.vi" I run this for a few different situations (auto/manual/fake-indexing, interchanging the dimensions). The VI's starting with "3D_locality_" are used for the locality tests. The VI's starting with "3D_norows_" are used for the iterations over pages and columns only.

Norbert_B · ‎03-13-2014

@roblab3 wrote:

[...] I found that auto-indexing is 1.2 to 1.5 times slower than manual-indexing. I also found that the performance of auto-indexing is much more dependent on the size the dimensions[...]

Without reading all that stuff, that is expected behavior.

Your approach using auto-indexing essentially creates a copy of the complete array. Using shift registers, the in-placeness algorithm can take care that modification of the array elements is done in the array itself without the overhead of allocating and managing a copy.

Therefore i expect a difference which is depending on the size of the array (the bigger, the higher the difference).

Norbert

EDIT: As you simply modify data in an existing array, auto-indexing is the wrong way to approach this task. I am not aware that NI recommends this approach anywhere, but it is for sure stated somewhere as "you can do it like this".

Norbert
----------------------------------------------------------------------------------------------------
CEO: What exactly is stopping us from doing this?
Expert: Geometry
Marketing Manager: Just ignore it.

Darin.K · ‎03-13-2014

Autoindexing is preferred and likely as fast or faster than manual indexing. What you have (re)discovered is the important lesson that preallocating an array and replacing elements is faster than building an array. That part is dominating the time difference I suspect, not the autoindexing.

Edit: on my phone so I missed some of the post, by autoindexing I was referring to reading the elements.

roblab3 · ‎03-13-2014

Hi Norbert,

Thanks for your quick reply. It does make sense that replacing an element in place is faster than managing and allocating copies. I can however not say that I expected this behavior:

I'm an experienced user (but not an expert) and I had the strong impression that auto-indexing is just another way of indexing. Colleages had the same impression. I also couldn't find any documentation about how auto-indexing works behind the scenes or how that impacts the decision when to use auto-indexing.
LabView also strongly encourages auto-indexing by enabling it by default when wiring an array into a for-loop and by making everything involving indexing rather tedious.

Thanks,

Robbert

Norbert_B · ‎03-14-2014

Robert,

the copy-thing is not specific for auto-indexing. It is common for all tunnels. A tunnel is first of all a unique data space.

This sounds hard, but there is an optimization in the compiler trying to reduce the number of copies the VI will create. This optimization is called "in-placeness".

The in-placeness algorithm checks, if the wire passing data to the is connected to anything else ("branch"). If there is no other connection then the tunnel, chance is high that the tunnel won't create an additional copy.

Speaking of loops, tunnels always copies. The in-placeness algorithm cannot opt that out.

You can do a small test to elaborate on this: Simply wire "0" (or anything less than the array sizy of the input array) to the 'N' terminal.......

Norbert

PS: Auto-indexing is perfect for a "by element" operation of analysis without modification of the element in the array. If you want to modify values, use shift registers and Replace Array Subset.

Norbert
----------------------------------------------------------------------------------------------------
CEO: What exactly is stopping us from doing this?
Expert: Geometry
Marketing Manager: Just ignore it.

Yamaeda · ‎03-14-2014

One test iis to autoindex the reads, but use the replace for the writes as an extra comparison.
/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems

LabVIEW

Auto-indexing is slow for arrays with >1 dimensions

Auto-indexing is slow for arrays with >1 dimensions

Re: Auto-indexing is slow for arrays with >1 dimensions

Re: Auto-indexing is slow for arrays with >1 dimensions

Re: Auto-indexing is slow for arrays with >1 dimensions

Re: Auto-indexing is slow for arrays with >1 dimensions

Re: Auto-indexing is slow for arrays with >1 dimensions