02-22-2007 10:10 AM - edited 02-22-2007 10:10 AM
Message Edited by shoneill on 02-22-2007 05:11 PM
02-22-2007 10:43 AM - edited 02-22-2007 10:43 AM
Interesting. I never ran across this because my sanity always prevented me from resizing arrays inside clusters. 😉
What is surprising is the fact that LabVIEW 8.20 claims that the entire inner loops are "folded" (see image).
Message Edited by altenbach on 02-22-2007 08:43 AM
02-23-2007 01:45 AM
02-23-2007 06:15 AM - edited 02-23-2007 06:15 AM
Hi Shane,
I spent about 2 hours looking at your example and could not figure it all out.
I cannot address the difference between 6.1 and later versions, since I no longer have 6.1 at home.
To get a proper understanding I will have to compare the performance with the "show buffer allocations" display and work up individual tests where we can compare various methods and get some numbers on each variation.
I will try to return to this Q this weekend if my schedule permits.
This is what I can say now.
We are measuring too many variables in your examples.
In the attached 7.1 VI I have moved the indicator updates outside the time structure, since I cannot rule out LV attempting to update the GUI for test 1 while test 2 is running.
After doing that I get these two rather dramatic effects.
The red circles mark the buffer allocations.
To continue, I would also like to test the cluster performance using in-place operations. The Build Array and other non-in-place operators are forcing us to measure the time LV needs to allocate larger buffers, and this is blurring our ability to measure the cluster work alone. I don't even know if the inplaceness algorithm is involved.
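For readers who do not have LabVIEW handy, here is a rough text-language analogy of the concern above, written in Python purely as an illustration. It is not a translation of the attached VI, and the function names are made up for this sketch. One version grows its container on every pass (roughly what Build Array in a loop does), the other allocates once and overwrites slots (roughly what Replace Array Element does). Timing the first mostly measures allocation, which is exactly the blur described above.

```python
import time


def grow_by_append(n):
    """Hypothetical analog of Build Array in a loop: a new, larger list is built each pass."""
    data = []
    for i in range(n):
        data = data + [i]  # allocates a fresh list and copies the old contents
    return data


def replace_in_place(n):
    """Hypothetical analog of preallocating and then replacing elements."""
    data = [0] * n  # single allocation up front
    for i in range(n):
        data[i] = i  # overwrites an existing slot, no reallocation
    return data


def time_it(func, n):
    """Return (elapsed seconds, result) for one call of func(n)."""
    start = time.perf_counter()
    result = func(n)
    elapsed = time.perf_counter() - start
    return elapsed, result
```

Both functions return the same data, so any timing difference between `time_it(grow_by_append, n)` and `time_it(replace_in_place, n)` comes from the allocation and copying work rather than from the values being written.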
Those are my thoughts for now. I'll post more if I run across any other discoveries.
Just as perplexed as you are,
Ben
Message Edited by Ben on 02-23-2007 06:16 AM
02-23-2007 08:47 AM - edited 02-23-2007 08:47 AM
Message Edited by shoneill on 02-23-2007 03:48 PM
02-23-2007 11:14 AM
Hi Shane,
For the naked cluster, try wiring the cluster around the For Loop to the Bundle so the same buffer can be re-used.
I also think that if LV 8 sees a constant wired to the replace, it may fold the code (see Christian's observation). Replace the array elements with the index instead (just to defeat constant folding).
I am not going to be able to turn my attention to this riddle for quite a while.
My sister-in-law went on to meet the "Supreme Wire-Worker" yesterday, so my attention will be demanded elsewhere.
Please share what you find,
Your brother in wire,
Ben
PS The answer is probably staring us in the face when we show buffer allocations.
02-23-2007 12:16 PM
02-24-2007 12:01 PM - edited 02-24-2007 12:01 PM
Hi Shane,
Attached is a revised version of your “Cluster pointers 8 6.1.vi” saved as 7.1.
The changes I made were:
1) The GUI updates could happen while other tests were running, so I moved them to happen after all testing was done.
2) Added a default case so case “0” does not get special treatment.
3) Remembered the result (data) from each method so that the output buffer work is the same for all. Note: I believe that if indexing is not enabled, LV will skip transferring data to an output tunnel data buffer until the last iteration.
4) Used the index value as the replacement element to prevent constant folding from clouding the measurements.
5) Wired the cluster around on method 1 to tell LV it was OK to re-use the input buffer as our output.
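The list above is really a set of benchmarking habits, so here is a minimal sketch of the same ideas in Python for anyone who wants to see them in text form. This is an analogy only, not the attached VI: `method_noop`, `method_replace`, `METHODS` and `run_benchmark` are names invented for this example. It times every method first and reports afterwards, includes a do-nothing baseline, keeps each method's result, and writes the loop index instead of a constant.

```python
import time


def method_noop(data, i):
    """Baseline: does no work, so its time is loop and selection overhead."""
    return data


def method_replace(data, i):
    """Writes the loop index (not a constant) into one slot of the buffer."""
    data[i % len(data)] = i
    return data


# Hypothetical method table; the real test VI selects methods with a case structure.
METHODS = {"noop": method_noop, "replace": method_replace}


def run_benchmark(methods, size=1000, iterations=10000):
    """Time every method, keeping all results; nothing is displayed in here."""
    timings = {}
    results = {}
    for name, method in methods.items():
        data = [0] * size
        start = time.perf_counter()
        for i in range(iterations):
            data = method(data, i)
        timings[name] = time.perf_counter() - start
        results[name] = data  # keep the output so every method does the same output work
    return timings, results


# Reporting happens only after all timing is finished.
timings, results = run_benchmark(METHODS)
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.3f} ms")
```

If the `noop` time comes out close to the other methods, the harness overhead is dominating and the parameters need adjusting before the numbers mean much.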
After a few runs I noticed my No-op was taking about as much time as the best of my other methods. This implied that the control logic (selecting which method) and overhead (filling input and output buffers) were dominating the measurements. I tweaked the test parameters to invoke the control logic less often and to beat harder on the code we are trying to characterize. I saved my defaults (warning: due to method 4, a test run takes forever).
This is how I read it.
All methods use an input Buffer “A” and an output buffer “B”. This includes the default method. The default method required about 300 ms on my machine to fill the input buffer and transfer it to the output buffer.
All methods that only required an input and output buffer ran at about the same speed. I suspect that under my default settings, the measured times are indicative of the time required to fill the input buffer and fill the output buffer. To get a better measurement of the time required to execute each method I will have to tweak my measurement parameters again. Since method 4 is so inefficient I will stop using it in my tests. Before I forget about this method I will venture some guesses about why it is so bad.
The SR (shift register) is realized by working in the input buffer. Each iteration copies the contents of all of the buffers in “A” to “C” and back to “A” again. No wonder this takes so long!
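To put a number on that guess, here is a small Python sketch that counts element copies instead of measuring time. It only models the guess above (whole buffer copied out and back on every iteration) and says nothing about what LabVIEW actually does internally. The function names are hypothetical.

```python
def rebuild_each_iteration(buffer, iterations):
    """Model of the guessed method 4 behavior: copy A to C, modify, copy C back to A."""
    elements_copied = 0
    for i in range(iterations):
        scratch = list(buffer)  # "A" -> "C": full copy
        scratch[i % len(scratch)] = i
        buffer[:] = scratch  # "C" -> "A": full copy back
        elements_copied += 2 * len(buffer)
    return elements_copied


def update_in_place(buffer, iterations):
    """Model of an in-place update: only the changed element is written."""
    elements_copied = 0
    for i in range(iterations):
        buffer[i % len(buffer)] = i
        elements_copied += 1
    return elements_copied
```

With a 1000-element buffer and 100 iterations, the first function moves 200000 elements and the second moves 100, while both leave the buffer with identical contents. The copy cost scales with buffer size times iteration count, which would be consistent with method 4 taking forever on large arrays.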
Now for a surprising issue.
Compare your method #2 and #5
And then #3 and #6
The difference appears to be Unbundle vs. Unbundle By Name. In the case of Unbundle By Name we pick up an extra buffer copy to fill the buffer that is allocated for the SR.
Q : Why isn’t buffer “A” used to support the SR for the “by name” version?
Summary:
Building a cluster with large arrays (method 4) is costly.
Something weird is happening with Unbundle/Bundle By Name.
Further study will be required to measure the performance of the buffer A-B versions.
I’ll post more when I know more.
Ben
Message Edited by Ben on 02-24-2007 12:01 PM
02-24-2007 12:14 PM - edited 02-24-2007 12:14 PM
And if you delete method 4 and wire the cluster through, you eliminate the need for the output buffer!
And of course the performance jumps due to less buffer copying.
Ben
Message Edited by Ben on 02-24-2007 12:14 PM
02-25-2007 08:42 AM