I want to convert my large array-multiply logic into a faster called library, need some advice on the C code.

mk@opc · ‎07-31-2015

Hello everybody,

In my VI I have some code that looks like this --

Each array is 2D, sized 512 by 640. The time it takes to run this entire chain is around 28 milliseconds. Seems fast, but I need to cut it down to around 10ms! I have tried to make them 1D arrays, but that only saves around 2ms. I even tried to chop the 1D array in half, duplicate the code, and then combine at the end (a half-hearted attempt at multi-core optimization) and that showed zero improvement...

I suppose my last option (please let me know if there are other options) is to compile a DLL that will do this for me.

I am a bit confused on how I am supposed to implement the array multiplies in C, however.

Do I have to run through for loops and multiply each element one-by-one??? It seems like that would run much slower than whatever LabVIEW is doing internally to do this element-wise multiplication.

Thanks!!

MK

nathand · ‎07-31-2015

If you're not sure how you'd do it in C, then I'm not sure why you think that moving it to a DLL will make it faster. In C, you'll either need to do the operations sequentially in a for loop, as you mention, or find a library for parallelizing that operation. There's no built-in function in C for multiplying arrays.
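For reference, the sequential loop described above can be sketched as follows. This is only a minimal illustration, not the code from the thread: the function name `multiply_elementwise` and the choice of double-precision elements are assumptions, and a 512 by 640 array is treated as one contiguous block of 512 * 640 values.

```c
#include <stddef.h>

/* Hypothetical helper (name and element type are assumptions).
   Element-wise product: out[i] = a[i] * b[i] for every i in [0, n).
   A 2D 512 x 640 array stored contiguously is just n = 512 * 640 elements. */
void multiply_elementwise(const double *a, const double *b,
                          double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

A loop this simple is the kind an optimizing compiler can often vectorize when optimizations are enabled, but whether it beats the LabVIEW primitive has to be measured rather than assumed.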

One thing you might try in your LabVIEW code is to do the multiplies in a for loop, where you do one addition on each iteration and store the result in a shift register. Also store the current power of the original array in a shift register. That should cut down on the number of copies of the array that you need, which may in turn speed up the code. So, you'd initialize the "sum" shift register with the first array, and the current array power shift register with the results of the subtraction (or, alternatively, with an array of the same size, filled with 1's, depending on the order in which you want to do things). Multiply that array by the first multiplier, add it to the sum, and multiply the power array by the original (subtraction), storing it back to the power shift register. Repeat until you're done.
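Written out in C rather than as a block diagram, the shift-register idea above might look roughly like this. It is a sketch under assumptions, since the original diagram is not shown here: it assumes the chain computes a sum of terms of the form multiplier array times a power of the subtraction result, and the names `accumulate_powers`, `sum`, `power` and `coeff` are hypothetical. The two buffers `sum` and `power` play the role of the two shift registers and are updated in place, so no extra array copies are made per term.

```c
#include <stddef.h>

/* Hypothetical helper: sum[i] += coeff[k][i] * x[i]^(k+1), k = 0..n_terms-1.
   sum   : in/out, pre-loaded with the array that is added unmultiplied
   x     : the result of the subtraction
   power : scratch buffer of n elements holding the running power of x
   coeff : n_terms pointers, each to an n-element multiplier array */
void accumulate_powers(double *sum, const double *x, double *power,
                       const double *const *coeff,
                       size_t n_terms, size_t n)
{
    /* The "power" shift register starts at x^1. */
    for (size_t i = 0; i < n; ++i) {
        power[i] = x[i];
    }

    for (size_t k = 0; k < n_terms; ++k) {
        for (size_t i = 0; i < n; ++i) {
            sum[i]   += coeff[k][i] * power[i]; /* add this term in place */
            power[i] *= x[i];                   /* advance to the next power */
        }
    }
}
```

Here the power buffer is initialized with the subtraction result; initializing it with ones instead, as mentioned above, only changes which power the first multiplier is paired with.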

mk@opc · ‎07-31-2015

Do you think the for loop will be faster or not? That's a good idea, to find a parallelized option, maybe on my GPU...

I will give the shift register method a try, that actually sounds like it might work well.

Thanks for the tips!!! Have a great weekend 🙂

mk@opc · ‎07-31-2015

Well, I got it to work with the for loop in the DLL, but there is no noticeable improvement in runtime... darn

mk@opc · ‎07-31-2015

@nathanD,

Is this the configuration you had in mind? I must be missing something, because my output is wrong... and it also runs almost twice as slow!!!

Thanks again for any help 🙂

nathand · ‎07-31-2015

Yes, that's what I had in mind, with a couple of changes:

1) in place of the "zeros" array, use the array that gets added at the end (outside the loop) as the initial value of the shift register

2) convert the result of the subtraction to double-precision outside the loop, rather than allowing the automatic coercion to do it inside the loop (possibly this gets optimized out, I don't know).

3) don't use a case structure inside the for loop. Use either a 3-D array, auto-indexed on the for loop, or an array of clusters where each cluster contains a single element, that element being a 2-D array.

4) this last one is a bit obscure, but swap the position of the inputs to the "add" node inside the for loop. I seem to recall reading that math nodes will reuse the upper input as the output when possible, and in this case you want that to be the shift register.

I also wouldn't bother with the "ones" array as a control; use the "initialize array" function instead. You might change the other arrays to constants, if they won't change (or at least if the user won't change them from the front panel).

I don't actually know that this will be faster, it was just a theory worth testing. It would be interesting to take a look at the buffer allocations in both versions. Can you upload your code, preferably for LabVIEW 2012?

Another thing you can do to make it run faster: in the VI properties, under the Execution category, disable debugging or set to subroutine priority (which also disables debugging).

mk@opc · ‎08-03-2015

Hi Nathan,

With your tips, I was able to get it down to 12ms execution time!! Superb!! For other people reading this, the biggest improvement came from changing the arrays to constants.

Thanks again 🙂

MK

nathand · ‎08-03-2015

Well, now you've got me curious - what happens if you replace the array controls in your original version with constants? (If you have time to test it.)

I suppose for testing purposes I could duplicate your code fairly quickly since the actual array values should be irrelevant. I'm always interested in how to speed up code, and which tricks work well.

mk@opc · ‎08-03-2015

The original code dropped down to 19ms from 22ms. The shift register and auto-indexed array certainly helped a lot too.

The final code looks like this:

By the way, if I make it a subroutine I can't run it!! Do I have to call the .vi from another .vi? That makes it run a lot slower (took like 80ms...) so I just set it to "high priority" and disabled debugging too.

nathand · ‎08-03-2015

mk@opc wrote:

By the way, if I make it a subroutine I can't run it!! Do I have to call the .vi from another .vi? That makes it run a lot slower (took like 80ms...) so I just set it to "high priority" and disabled debugging too.

Thanks for the timing numbers. Can you show the larger code that shows how you were measuring the performance?

Yes, a subroutine can only be called from another VI. Running it as a subVI should introduce minimal overhead if you're doing it properly. One thing that will help is to set all the input terminals to "required". It seems like a good candidate to make into a subVI.
