Request: Make these VIs faster for large arrays


 


Ben wrote:


psssst.. Christian. I think you can stop now.

Ben


Hehe, I am actually just getting started... just kidding. 😮

 

Fortunately, all the algorithms I used here were already developed and polished over the last 10+ years, so writing these VIs took literally minutes. It was also nice to have known-good working (though slow) code for quick verification. One problem was the fact that the input matrix was square, so I had to delete a few columns to verify that I did not mess up the index order. (Well, there is a 50% chance of getting it right on the first try and a 100% chance of getting it right on the second try 🐵) That's probably the only debugging I did 😉

 

If somebody wants to take a stab at this, here are some ideas for improvements.

 

 

  • Currently, the matrix datatype and output are both I32. The input could possibly be U16 (depending on the upper limit on the number of elements; the sum must not overflow!), and the output could be selected as e.g. I8 or I16, depending on how many addressable positions actually exist. In any case, the memory footprint could be reduced by 50%, possibly with a speedup.
  • The data could even be stored with 8 positions/byte and manipulated using lookup tables. That would probably be slightly slower but would use much less memory.
  • As I said already, the problem could easily be split into independent parts to keep multiple CPU cores busy, with probably a near n-fold speedup (minus some small overhead for splitting and reassembling).
  • I suspect that Lucither's code pays a higher penalty with debugging enabled; with debugging disabled his code might gain more relative to mine. I have not tested this. If several algorithms are within a factor of two in speed, I don't worry and go with the simpler code. 😉
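The second bullet (8 positions per byte, manipulated via lookup tables) can be sketched outside LabVIEW. This is a minimal Python/NumPy illustration of the idea only, not the actual VI — the function name and the popcount use case are assumptions:

```python
# Hypothetical sketch: pack a boolean occupancy array into bits (8 positions
# per byte), then count set bits per byte with a 256-entry lookup table,
# so the data never needs a wider in-memory representation.
import numpy as np

# 256-entry popcount lookup table, built once up front.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def packed_count(mask):
    """Count True entries of a boolean array via bit packing + LUT."""
    packed = np.packbits(mask)          # 8 positions stored per byte
    return int(POPCOUNT[packed].sum())  # sum of per-byte popcounts

mask = np.random.rand(1000) > 0.5
assert packed_count(mask) == int(mask.sum())
```

The packed form uses 1/8 of the memory of a byte-per-position array; the LUT indexing is the extra work that makes it "probably slightly slower".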

 

 

Message 31 of 59

 


@altenbach wrote:

 

One problem was the fact that the input matrix was square, so I had to delete a few columns to verify that I did not mess up the index order.

 


In the dataset I gave, it's square, though in reality it rarely is. Nice check 🙂

 

The input can be changed to U8, as it's just an array of single-digit numbers describing what is going to be done to that location (not probe, probe, ink, smash to bits, etc.). The two outputs should be I16; I can't believe I didn't see that before. I'll change those.

 

Message 32 of 59

 


dthor wrote:

The input can be changed to U8, as it's just an array of single-digit numbers describing what is going to be done to that location (not probe, probe, ink, smash to bits, etc.). The two outputs should be I16; I can't believe I didn't see that before. I'll change those.


 

If you change the input to U8, you need to do the counting differently, because summing in U8 would overflow once the total exceeds 255.

It can be done without converting the array to a larger representation (which would nullify the size advantage of the original U8 input. ;))
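This point can be sketched in Python/NumPy for illustration (the `codes` array and its values are made up, not the actual dataset): widen only the accumulator, or count with a boolean mask, so neither the overflow nor a wide copy of the array happens.

```python
# Sketch of the U8 overflow pitfall and two ways around it.
import numpy as np

rng = np.random.default_rng(0)
codes = rng.integers(0, 10, size=100_000).astype(np.uint8)

# Forcing a U8 accumulator wraps modulo 256 -- the overflow warned about above.
wrapped = codes.sum(dtype=np.uint8)           # wrong for any sizable array

# Keeping only the *accumulator* wide: the array itself stays U8 in memory.
total = int(codes.sum(dtype=np.int64))

# A pure count needs no summing of the values at all:
n_probe = int(np.count_nonzero(codes == 3))   # "how many positions have code 3"
```

Both safe variants preserve the size advantage of the U8 input, since no widened copy of the array is ever materialized.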

Message 33 of 59

Thanks for the extra advice, Altenbach. Had another look at my VI this morning with fresh eyes. I managed to incorporate the relative calculation into my main loop:

 

Complete Array 2.png

 

After benchmarking again I get an average time of 80ms!!!! compared to the 306ms I was getting and the 255ms of yours (that's with the same data input as shown above). I was so shocked that I immediately thought I had made a mistake. To verify, I quickly inserted your VI next to mine and 'Equalled' the outputs. We both had the same output, which put my mind at ease (of course, we could both have wrong outputs 😛)
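The verification step described here — timing a candidate against a known-good reference and requiring identical outputs before trusting the numbers — can be sketched like this. The function bodies are placeholders, not the actual VIs:

```python
# Sketch: benchmark a fast candidate against a slow-but-trusted reference
# and insist on equal outputs before believing the timing.
import time
import numpy as np

def reference_impl(a):   # slow but known-good (placeholder body)
    return np.cumsum(a)

def fast_impl(a):        # candidate optimization (placeholder body)
    return a.cumsum()

data = np.arange(100_000, dtype=np.int64)

t0 = time.perf_counter()
ref = reference_impl(data)
t1 = time.perf_counter()
fast = fast_impl(data)
t2 = time.perf_counter()

# Same caveat as the post: both could still be wrong in the same way.
assert np.array_equal(ref, fast)
print(f"reference {t1 - t0:.4f}s, candidate {t2 - t1:.4f}s")
```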

 

Next step for me is to look into CPU management, as you touched on earlier. Have never done that, so there is another thing to learn. Thanks again for your input, have learnt a lot about array management.

 

Rgs,

 

Lucither.

------------------------------------------------------------------------------------------------------
"Everything should be made as simple as possible but no simpler"
Message 34 of 59

Yes, looks pretty clean. Good job. 🙂

 

There are still a few extra operations in the inner loop that are not needed. For example, since we look for relative movements, the two "+1" operations in the inner loop can be eliminated: instead, do a "-1" on the column size before the loop and on the relative seed, also before the loops. Same result.
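This is classic loop-invariant hoisting, which can be sketched in text-based code. The Python below is illustrative only — the index formula and function names are made up, not the actual VI's:

```python
# Sketch of the "+1 hoisting" suggestion: fold constants added inside the
# inner loop into values computed once before the loops. Same result,
# fewer per-iteration operations.
def relative_positions_naive(rows, cols, seed):
    out = []
    for r in range(rows):
        for c in range(cols):
            out.append(r * (cols + 1) + (c + 1) - seed)  # two "+1" per pass
    return out

def relative_positions_hoisted(rows, cols, seed):
    stride = cols + 1      # "+1" folded into the column size, once
    base = 1 - seed        # "+1" folded into the relative seed, once
    out = []
    for r in range(rows):
        row_off = r * stride + base   # per-row, not per-element
        for c in range(cols):
            out.append(row_off + c)   # inner loop: one add only
    return out

assert relative_positions_naive(3, 4, 2) == relative_positions_hoisted(3, 4, 2)
```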

 

 

Message 35 of 59

 


Lucither wrote:

After benchmarking again I get an average time of 80ms!!!! compared to the 306ms I was getting and the 255ms of yours (that's with the same data input as shown above).


 

Ok, we seem to have similar hardware; your code also gives me 80ms (mine is a Core 2 Duo T7600). However, don't give up yet.

 

I made a few minor changes to your code and I got it down to <10ms (another 8+ times faster!!).

 

Beat that! 😄

 

(Since I really (really!) want to spoil your weekend, I'll wait a little bit before posting the code... ;))

Message 36 of 59

**bleep** you!!! Just got back in. Give me at least another hour 😠

Message 37 of 59

Right, I haven't given in just yet. Have got mine down to 55ms:

 

Complete Array 3.png

 

The main thing I did to lower the time was overwriting the 'Last Value' with the latest value rather than using 'Build Array'. This gave me a vast drop. All the other changes I have made have had little impact. Am really struggling to see how

 

I made a few minor changes to your code and I got it down to <10ms (another 8+ times faster!!).

 

is possible. How minor are these changes? My head is hurting. Don't reveal the answer just yet, am still mulling over it. Maybe a pointer wouldn't go amiss though 🙂
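For readers following along outside LabVIEW: the "overwrite the last value instead of Build Array" change corresponds to preallocating a buffer and writing in place rather than growing an array every iteration. A hedged Python sketch of that contrast (the loop body is illustrative, not the actual algorithm):

```python
# Sketch: per-iteration array growth reallocates and copies; a preallocated
# buffer written in place does not. This mirrors Build Array vs
# Replace Array Subset in LabVIEW.
import numpy as np

n = 5_000

def grow_each_step():
    out = np.empty(0, dtype=np.int32)
    for i in range(n):
        out = np.append(out, i)        # reallocates + copies every pass: O(n^2)
    return out

def preallocate_and_replace():
    out = np.empty(n, dtype=np.int32)  # allocate once, outside the loop
    for i in range(n):
        out[i] = i                     # in-place write, no reallocation: O(n)
    return out

assert np.array_equal(grow_each_step(), preallocate_and_replace())
```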

 

Rgs,

 

Lucither

Message 38 of 59

Well, you must be in Hawaii, because I am in California and it's past 1am. 😮

 

It is quite a minor change and I do not understand the reason for the speed gain, but you might be on the right trail. Focus on the inner loop, of course.

 

(Also note that I am using the improved LabVIEW 2010 compiler, which could also make a difference. If I down-convert my fastest (9ms) version from LabVIEW 2010 to LabVIEW 8.2, it takes about 20ms, or about half the speed. I have not tested how LabVIEW 2009 stacks up, but it could be slightly slower than 2010. So, as a first step, try to go below 20ms or so... 😉)

 

(Doing a very "similar" adjustment to my original code version brings that one down to ~24ms on the same (100x loop) input data. That's another obscure hint....)

Message 39 of 59

Well, you have a good night. I actually live in Thailand; it's only 5 in the evening here. I'm about to walk the dog (hopefully a flash of inspiration will come to me during that), and I will then reassess.

 

Rgs,

 

Lucither.

Message 40 of 59