odd memory/performance bug

Dave_Thomson · ‎01-20-2010

I ran across an odd performance bug. I have submitted it to NI Tech Support and it has been assigned to CAR 202394, but I post it here as a warning and curiosity. It appears in at least 8.6.1 and 2009. Example VIs are attached.

Briefly, the VI in question looks at a large data set of 4 channels, 1,000,000 points in each channel. It looks for events in Ch 0, and optionally, in Ch 1, where an event is a signal that exceeds a threshold. This VI is a state machine, so in the init case, I index out Ch 0, and optionally Ch 1, so that they can be used in further states without having to index them again each time. I put them on their own shift registers. To save memory, if Ch 1 triggers are not used, I put an empty array on that shift register. The performance issue appears depending on how I select between the Ch 1 array and the empty array.
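For readers who want a text-language picture of the init case, here is a minimal Python sketch of the logic described above. It is not the attached VI: the function names `select_channels` and `find_events` are hypothetical, and treating an "event" as the first sample of each excursion above the threshold is an assumption. It also says nothing about LabVIEW's memory behavior, since indexing a Python list does not copy the channel data.

```python
from array import array


def select_channels(data, secondary_trigger=-1):
    """Pick Ch 0 and, optionally, a secondary trigger channel.

    A negative secondary_trigger means "not used", in which case an
    empty array stands in for the second channel (as in the post).
    """
    ch0 = data[0]
    ch1 = data[secondary_trigger] if secondary_trigger >= 0 else array("d")
    return ch0, ch1


def find_events(channel, threshold):
    """Return indices where the signal first rises above threshold.

    Assumption: one event per excursion above the threshold.
    """
    events = []
    above = False
    for i, value in enumerate(channel):
        if value > threshold and not above:
            events.append(i)
        above = value > threshold
    return events


# Tiny stand-in for the 4 x 1,000,000 data set.
data = [
    array("d", [0, 2, 3, 0, 5]),
    array("d", [0, 0, 4, 4, 0]),
    array("d", [0, 0, 0, 0, 0]),
    array("d", [0, 0, 0, 0, 0]),
]

ch0, ch1 = select_channels(data, secondary_trigger=1)
print(find_events(ch0, 1.0))
print(find_events(ch1, 1.0))
```

With `secondary_trigger=-1`, `ch1` comes back empty and `find_events` simply returns an empty list for it.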

Open SP2 Scan Data.vi. Run it. (Required inputs are saved as defaults.) It's pretty quick. Change the "Secondary Trigger" parameter from -1 (not used) to 1. Run it. Even without a timing loop, you can see it takes a lot longer. Open SP2 Scan Data good.vi and run it. Change the "Secondary Trigger" parameter from -1 (not used) to 1 and run it again. Still pretty quick. Using the Profiler, you can see that the "good" version is 10 times faster than the other, when Ch 1 triggering is enabled.

The difference in the code is that the "good" version indexes Ch 1 from the big array outside of the case structure. The other has the index inside the case structure. The "bad" version should, at first glance, be superior, since when Ch1 triggering is not used, the extra copy of data is never generated.

In any case, it is interesting that both versions work. The "good" version demonstrates that it is possible to put the array on the shift register and use it efficiently. The performance degradation happens in later states, not in the state that puts the array on the shift register. But HOW it is put on the shift register makes a huge performance difference. Once a copy of the data has been made and has been assigned to the shift register, why should it matter how that copy was generated?

Here's a thread that has some similarities, but I can't quite connect the dots:

http://forums.ni.com/ni/board/message?board.id=170&thread.id=191622&view=by_date_ascending&page=1

Regards,

DaveT

-------------------------------------------------------------
David Thomson Original Code Consulting
www.originalcode.com
National Instruments Alliance Program Member
Certified LabVIEW Architect
Certified Embedded Systems Developer
-------------------------------------------------------------
There are 10 kinds of people: those who understand binary, and those who don't.

Ben · ‎01-21-2010

That is a bug in the inplaceness algorithm alright!

If you wire the array THROUGH the case structure, it will recognize that it can get a copy from the buffer in the SR.

Without wiring it through, LV is making a copy and THEN taking the subset (speculation).

Ben

Retired Senior Automation Systems Architect with Data Science Automation, LabVIEW Champion, Knight of NI and Prepper

Dave_Thomson · ‎01-21-2010

Ben,

Thanks for pointing that out! I tried your suggestion (as you probably already had) and found it to be quite correct. That's a variation I hadn't tried before.

Apparently, I don't understand in-placeness well enough yet. The thing that really bothers me at this point is that the VI slows down in all the subsequent state machine cases, after the 1D sub-array has been placed on the shift register. For example, there are three or four different ways to organize the code in the "Initialize" case to index the 2D array and put the 1D array conditionally on the shift register. All of them succeed in putting a copy of the 1D data on the shift register. But the rest of the state machine suffers AFTER the data is put on the shift register. Once the shift register has been loaded, a naive assumption would be that the rest of the code would run the same, regardless of how the shift register was loaded. There must be some very subtle memory management going on. Perhaps the shift register isn't really loaded; instead, some memory is reserved and an in-placeness algorithm is invoked to use that memory, but then when execution gets to the later state machine cases, it can't use that inplaceness, so it makes more copies... Something like that?

DaveT


Ben · ‎01-21-2010

Hi Dave,

I stopped looking* after I tried the change I shared.

I don't think anybody outside the Ivory Tower fully understands the inplaceness algorithm, mainly because it is undocumented outside the tower and the tower reserves the right to change it any time it wants.

There is another case where you are passing the 1-d arrays to a sub-VI. I think the wire-through of the arrays feeding it is worth trying.

Aside from that, look for the classic "Data copies on wire branches".

Ben

* I gave up looking for things after I found them years ago. It was anticlimactic.


Brian_Powell · ‎05-21-2010

I would be curious to know whether this change (adding a "copy dot" (officially, "Always Copy") primitive) improves performance as well.

If you bring up the help window and look at the data types of the wires before and after the copy dot (or between your original VI and Ben's modification), you will see that one is a subarray, and one is a real array.

A subarray can't be inplace to a real array (which is what the constant is).

When you fork the wire inside the case structure (Ben's modification), the Array Index can't create a subarray output (since someone else is consuming the original 2D array). This is effectively the same as what my version with the copy dot is doing; it's just explicit in the conversion from subarray to real array.
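The subarray versus real array distinction can be pictured, loosely, with a view and a copy in Python. This is only an analogy under stated assumptions: `memoryview` stands in for a subarray that refers to its parent's memory, and a fresh `array` stands in for a real array that owns its buffer. It does not model LabVIEW's actual inplaceness rules.

```python
from array import array

# Stand-in for the buffer behind the large array.
parent = array("d", [float(i) for i in range(12)])

# "Subarray": a window onto the parent's memory, no data copied.
view = memoryview(parent)[4:8]

# "Real array": an independent buffer holding a copy of the same values.
real = array("d", view.tolist())

# Writing to the parent shows which one shares memory with it.
parent[4] = 99.0
print(view[0])  # 99.0, the view sees the parent's change
print(real[0])  # 4.0, the copy is unaffected
```

The view cannot simply be handed to something that expects to own its storage; a copy has to be made at some point, and where that copy happens is the kind of detail that affects performance.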

I didn't do any actual performance tests to compare Ben's version to mine. I'd be interested if one way is faster.

Why you would think this is obscure is beyond me. 😉

Brian

Message Edited by Brian Powell on 05-21-2010 02:42 PM
