LabVIEW Idea Exchange

spatry

Inplace Array Size

Status: Declined
Array Size is not copying the array in the stated problem. The LabVIEW compiler treats this situation correctly. Also, creating pass-through wires for by-value terminals on LabVIEW functions is not generally seen as a usability improvement.

I've been working with large arrays, and I've found that wire branches are killing my performance. To alleviate this I've scattered inplace structures all over; however, the only way I have to access the array size without incurring a copy of the array is to track the size separately and access that value instead, which seems pretty wasteful.

 

ArraySize.png

 

I can think of two ways to implement this. The first is to add an array size block to the inplace element structure, but that would be awkward to use. So instead I suggest that the array size node be made inplace, as mocked up below.

 

ArraySizeInplace.png

 

 

Thanks

 

 

30 Comments
AristosQueue (NI)
NI Employee (retired)

> AQ: Ok, I hear you. That just makes it quite unpredictable when a data copy will actually occur.

 

The data copy occurs before the VI even begins running. The Replace is run during constant folding. Thereafter, the values of the buffers are available every time the VI executes without needing to make additional copies.
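
To make the constant folding idea concrete, here is a minimal C analogy (C only because a G diagram can't be shown in text; the example is illustrative, not how LabVIEW implements it): when a computation's inputs are all constants, the compiler may evaluate it ahead of time, so at run time only the finished value is read.

    #include <stdio.h>

    int main(void)
    {
        int arr[4] = {0, 0, 0, 0};
        arr[2] = 42;            /* analogous to a Replace wired only to constants */

        /* An optimizing compiler is free to fold everything above into the
         * single value 42, so no computation or copy happens while the
         * program (or, in LabVIEW's case, the VI) is actually running.      */
        printf("%d\n", arr[2]);
        return 0;
    }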

SteenSchmidt
Trusted Enthusiast

> The data copy occurs before the VI even begins running. The Replace is run during constant folding.

> Thereafter, the values of the buffers are available every time the VI executes without needing to make additional copies.

 

Ok, so it is a static malloc then. No problem for me in that case. My concerns are runtime performance, not absolute memory footprint.

 

1) Do the "black squares" always mean a static malloc, or will it sometimes be a dynamic one at runtime?

2) Will this memory be allocated when you load the VI, or when you invoke it to run? I'd guess the latter to not waste memory when you invoke reentrant clones and never get to use the master reflection...

3) Will each and every "black square" result in a definite malloc, while it's the actual data copy (usage of that buffer) that may or may not happen?

4) You have spoken about "half black squares" earlier (3x6 pixels if I remember correctly) - those are definite static data copies, right?

 

/Steen

CLA, CTA, CLED & LabVIEW Champion
AristosQueue (NI)
NI Employee (retired)

1) They mean space has been reserved for a piece of LabVIEW data. This could be the value of a control/indicator, or a copy of an upstream value, or an allocation for some sort of pointer-sharing scheme (e.g. a subarray, which avoids copying a whole array by allocating a small bit of data that says, "that's me, over there, from index a to index b"). When that data copy is made (and how many times that copy is made during a program's execution) is at LabVIEW's discretion.
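
To show the subarray idea in text, here is a minimal C sketch (hypothetical type and function names, not LabVIEW's actual internal layout): the "small bit of data" is just a descriptor pointing into the existing buffer, so creating it costs a few bytes regardless of how large the array is.

    #include <stddef.h>

    /* Hypothetical descriptor: "that's me, over there, from index a for n elements" */
    typedef struct {
        const double *data;    /* shared pointer into the original array's elements */
        size_t        length;  /* number of elements visible through the view       */
    } SubArrayView;

    /* Creating the view allocates nothing proportional to the array size;
     * only this small struct exists, and no elements are copied.           */
    static SubArrayView make_view(const double *array, size_t start, size_t length)
    {
        SubArrayView v = { array + start, length };
        return v;
    }

The trade-off is that such a view is only valid while the original buffer stays alive and unmodified, which is exactly the kind of bookkeeping the compiler weighs when deciding whether to share or copy.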

 

2) The space for the top-level data size will be allocated as part of load. So an int32 will allocate 4 bytes... a string will allocate space for a pointer. Any hair hanging off of those points (i.e. a non-empty string value) will be allocated as needed by the compiler, perhaps at load, perhaps at constant folding (when run begins) and perhaps during the execution of the VI. The buffer allocation shows where space exists for any value of the type. The specific value that exists there is updated as the program loads and executes. Buffer allocation dots do not say *anything* about when the value in that buffer allocation changes.
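
A rough C sketch of that split between the fixed-size slot reserved at load and the variable-size payload allocated later (field names and layout are purely illustrative, not LabVIEW's real handle format):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        int32_t size;     /* element count, known only once a value exists   */
        char    data[];   /* the "hair hanging off": allocated when needed   */
    } StrBuf;

    typedef StrBuf *StrHandle;   /* the pointer-sized "top-level" slot reserved at load */

    /* At load time only the pointer-sized slot exists (a null handle); the
     * payload gets allocated or grown when a concrete value is produced.    */
    static StrHandle set_value(StrHandle h, const char *s)
    {
        int32_t n = (int32_t)strlen(s);
        StrHandle grown = realloc(h, sizeof(StrBuf) + (size_t)n);
        if (grown == NULL)
            return h;             /* error handling kept minimal for the sketch */
        grown->size = n;
        memcpy(grown->data, s, (size_t)n);
        return grown;
    }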

 

3) No, each black square will not be a malloc. For one thing, there are black squares for int32s and doubles. For another, there are places where buffers are available in case they are needed to compensate for parallelism or dynamic invocations (think Call By Reference).

 

4) No. The half-black squares are the pointer spaces, generally left null but reserved in case we need to make a copy for a dynamic dispatch VI when the actual VI we dispatch into does not match the memory layout of the parent VI -- the caller is optimized to the layout used by the parent VI. Most override VIs end up matching the parent. Occasionally -- bordering on rarely -- an override will do something exotic and we have to make an extra data copy of something that is considered read-only in the caller in order to give the override VI a memory address it can modify freely.

RyanLV
NI Employee (retired)

Ok, as a developer on the compiler, I'm here to clear up a lot of the confusion in this thread and back it up with what's actually going on in these situations, not just speculation.

  • Comments on spatry's original diagram:
    • AristosQueue and Darren are correct with their assessment of the original diagram. The array is not being copied.
    • AristosQueue is also correct in saying that "the semantics of a wire branch are that it copies" is not quite correct. That is an appropriate way of thinking about diagrams from a semantic point of view, but the compiler is free to eliminate copies as long as it preserves the values nodes will see. Nodes such as Array Size do not modify the array, which is why no copy needs to be made in that example. Some nodes, such as Replace Array Subset, do modify the array, which may cause a similar branch to make a copy. So the lesson is that branches sometimes cause copies. I would love to be more specific about when a branch will cause a copy, but there are too many factors that go into that decision. My advice would be to use the Show Buffer Allocations feature when you suspect a branch is causing a copy resulting in a performance problem.
  • D_Smith's diagrams are a bit trickier... and harder to explain if you don't know some further details and some somewhat arbitrary decisions the compiler makes.
    • First off, the example with the sequence structure:
      • Altenbach is correct that the copy is because of constant folding. The Init Node is constant folded (it has its input and output wires partially surrounded with the grey insulation when Tools | Options... | Block Diagram | Constant Folding | Show constant folding of wires/structures are selected). There are often copies made on the border between constant folded and regular code. The computed values on this border must be cached in a buffer so that the constant folded code doesn't have to be rerun. The reason I say "often" is that as long as the regular code doesn't modify the value, no copy needs to be made.
      • The comment "Remember that the dots merely mark possible buffer allocations" is misleading. The dots always show buffers. However, there are a few nodes where the buffers are only conditionally used (for example, Index Array has a buffer that's only used when the index is out of bounds). Most of the dots, however, represent buffers that are always used. Unfortunately we also show some false positives. For example, the buffer after a Reverse 1D Array represents an actual buffer, but it is a constant-sized buffer, not proportional to the size of the array.
    • D_Smith's example without the sequence structure is much more interesting. It highlights some common problems experienced with the Show Buffer Allocations feature.
      • Without the sequence structure, all three nodes are constant folded. The border of the constant folded code is at the Array Indicator. How many copies of the array are there? That depends on a couple of factors:
        • Is debugging enabled? If so, LabVIEW will hold onto a copy of all of the intermediate steps. This is solely to show those values on the off chance that the wires are probed. Debugging does not play well with a few optimizations that LabVIEW tries to perform, and the interplay between debugging and constant folding is one example. SteenSchmidt figured this out indirectly by changing constants to controls, which removed constant folding from the equation. This brings me to a great guideline: turn off debugging to improve performance, with the caveat that debugging is really useful while you're still developing and debugging issues. That being said, when investigating performance issues by profiling and/or using Show Buffer Allocations, make sure the VIs are configured correctly.
        • Getting past that hurdle, there's also the question of whether the Array Indicator is on the connector pane. If it is, then a copy will be made because of LabVIEW's subVI calling mechanism. If it's not on the connector pane, then the array is only shown on the VI's Front Panel, which doesn't cause any allocations on the Block Diagram. There are also a couple of copies of the array made for UI reasons that are beyond the scope of this post.
        • It wasn't in any of these diagrams, but I just wanted to point out another optimization that can also cause confusing results. Dead Code Elimination will cause the compiler to generate no code or buffers for nodes whose outputs aren't used (and that don't cause side effects). So if you were to delete the Indicators, then (again, as long as debugging is off) all the dots will go away.
  • SteenSchmidt asked a great question: "How does LabVIEW select between making an extra data copy and serialization when solving a race condition of parallel operations? Say an Index Array and a Replace Array Subset in parallel operating on the same input array, they could use the same instance of the array if the Index Array was performed before the Replace, but if the Replace happened before the Index, then you'd need a data copy at the wire branch before these two parallel ops. Is this hardcoded or will it depend on some parameters at runtime?"
    • In the situation where the array is being modified by one node and it could possibly be read by another node in parallel, LabVIEW will always make a copy. Depending on the situation, this is sometimes advantageous. The loss in time of copying the array is gained by doing possibly expensive things in parallel. Sometimes the copy is more expensive. LabVIEW's compiler tries to guess right, but will sometimes be wrong.
    • I should clarify what I mean when I say 'could possibly be read in parallel.' LabVIEW does not necessarily execute nodes in parallel even if there are no data dependencies. In D_Smith's example, LabVIEW will always execute the Array Size and Replace Array Subset in sequence. This is because it knows that they are relatively cheap operations and the overhead of doing context switches would hurt performance. In general, LabVIEW uses a heuristic to figure out which nodes to execute in parallel. When deciding buffer allocations, the compiler takes this heuristic into account and will only cause the copy when it says the nodes may be run in parallel.
  • In response to SteenSchmidt's last questions, AristosQueue's answers were generally on the mark. But here's a little added info. LabVIEW's memory model is a fickle thing: each VI allocates all buffers at once as it loads. But the "buffer" for an array contains a pointer. The data that pointer points to may be dynamically allocated each time the node is executed. Usually the first time the node is executed the dynamic allocation is made. Then additional executions of the node can reuse that first dynamic allocation as long as the array's size doesn't change (see the sketch below). This is what I imagine is happening when you say that your application has reached "equilibrium." However, if a node sees alternately big and little arrays, then each execution will require a trip to the memory manager, which will cause jitter for the application. While Show Buffer Allocations can be helpful, there is a much better tool for debugging this kind of performance problem: the Real-Time Execution Trace Toolkit. http://sine.ni.com/nips/cds/view/p/lang/en/nid/209041 The Resources tab on that page has more information.
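
Here is a small C sketch of that reuse behaviour (illustrative only, not LabVIEW's actual allocator): the buffer from the first execution is kept and handed back while the requested size stays the same, and only a size change forces another trip to the memory manager, which is where the jitter comes from.

    #include <stdlib.h>

    static double *buffer   = NULL;   /* persists across "executions" of the node */
    static size_t  capacity = 0;

    double *get_output_buffer(size_t needed)
    {
        if (needed != capacity) {     /* first call, or the array size changed      */
            double *grown = realloc(buffer, needed * sizeof *grown);
            if (grown == NULL)
                return NULL;          /* error handling kept minimal for the sketch */
            buffer   = grown;         /* memory-manager hit: this is the jitter     */
            capacity = needed;
        }
        return buffer;                /* same size as last time: reuse, no malloc   */
    }

An array whose size alternates every iteration hits the realloc branch every time, which is exactly the pattern the Real-Time Execution Trace Toolkit is good at spotting.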
Dragis
Active Participant

This is a great set of information RyanLV. The next question is, how hard would it be for LabVIEW to show a different version (color, pattern, etc.) for buffers that are only needed because of debugging, to make it easier to separate the two cases?

AristosQueue (NI)
NI Employee (retired)

Dragis: I suspect that would be quite hard because, in the simplest case, we would have to dual compile every VI, once with debug on and once with it off -- a full compile would be the only way to see the results of all the optimizations and simplifications that LV can apply. And even doing that dual work would not necessarily help you because in many cases, the dots would not disappear but would move around instead. The simplest case is a VI with no subVIs. When we turn off debugging on a subVI, it might change the inplaceness of the subVI's connector pane, which would impact a caller. When we compile a caller, would we show its diagram using the current debugging state of the subVI or would we show it assuming the whole hierarchy was compiled with debugging off?

smithd
Active Participant

Stephen--Thanks for the clarification.

 

Edit: just saw the 3rd page--thanks to everyone for the discussion 🙂 Huge huge thanks to Ryan!

 

Now...I want to ask the hard question...How much of this is documented? If it's documented, where? To my mind, this is the sort of information that any user should be able to find, and yet four decently advanced LV users were not quite right about everything. Is this our (the users') failing, or is it our (NI's) failing for not providing the documentation?

AristosQueue (NI)
NI Employee (retired)

TL;DR summary: I don't think we have a problem of documentation. Nor do I see this discussion as any sort of failure on anyone's part.

 

Details: We do document the basics of buffer allocations, and we have a LV course that users can take if they want more. When we talk about buffer allocations in general, we can provide information. But the reasons for any specific buffer allocation require a full knowledge of the LabVIEW compiler, every optimization, every calling circumstance, and unless you're a trained member of the compiler team, you're unlikely to work it out from inspection of code and a list of rules. There are heuristics we can provide for those users who need to tweak their code's performance, but nothing algorithmic. And the details shift -- often radically -- from one LV version to the next.

 

In this thread, we're talking about some pretty bleeding-edge details of a particular buffer allocation. It is not a subject to be documented in the same way as other features because it isn't meant to be human-comprehensible -- it's a deep analysis that the compiler does, weighing many, many factors. I have no problem walking customers who ask through it when the topic comes up, but without a full debug rig set up to delve into the compiler for each and every VI, it is hard for mere humans to say why any given buffer dot exists in any particular place. And the complexity will continue to rise in future LV versions. Ultimately, memory optimization is not a task for human beings to perform, and as time goes by, fewer and fewer customers have reasons to care. It's always (at least until AIs exist) going to be important to be able to do the deep spelunking, but more and more of those cases will be done hand-in-glove with a member of LV R&D on behalf of customers with particularly severe performance problems in edge and corner cases. This same situation exists in every other programming language with managed memory that I can name. For example, the tome for the .NET CLR is about three inches thick, much of which is spent on the details of memory management. LabVIEW's situation is a bit easier since we use a pre-planned allocation/deallocation system instead of a dynamic garbage collector, but it still doesn't fall to the level of "human can do this."

SteenSchmidt
Trusted Enthusiast

My 2 cents:

 

I find the current level of compiler documentation satisfactory, for the exact same reasons that Stephen points out. The compiler changes and must be allowed to do so. The resources that would be tied up in maintaining a public document describing each and every corner case (if such a beast would even be manageable) are much better spent on development, in my opinion. And when something turns out to be a problem, or just works unexpectedly, in recent years it has never been a problem for me to get a satisfactory answer from you guys at NI. This time the knots just happened to get sorted out in the Idea Exchange 🙂

 

My philosophy is really that when I find a gold ore I use it sparingly. The gold ore in this case is the deep level at which NI is prepared to inform us, the users, when we ask nicely. Demanding everything documented would kill that ore in a second.

 

The candidness of NI has actually made me much more accepting of the flaws and limitations of LabVIEW, as I understand much better the prioritizations behind the design. To the extent that it probably annoys some of my customers - I often tell them something to the effect of "Don't worry, it'll get sorted eventually, let's work around it for now." 🙂 Me being a more and more experienced software developer myself also helps. So all in all, yes, someone might be furious because some information wasn't available in written form, but those people would in all probability not have read that information in the first place anyway, had it existed. That's my experience.

 

/Steen

CLA, CTA, CLED & LabVIEW Champion
smithd
Active Participant

Sorry, looking back my question was just slightly more inflammatory than I meant it to be. I guess a better way of putting it would be: "Is this something NI should document so that users can better understand it, or are we crazy for looking this closely at the optimization, and should we, as the end users, generally leave LabVIEW be?"...or something like that.

 

Anyway, I think you answered my actual question, even if I didn't quite ask it 😉

 

For my part, I generally agree with all of your points; I just wanted to get everyone else's take on it.