I'm validating Ben's finding on our grid* right now... may take overnight depending upon where I end up in the request queue (nightly run already kicked off). It doesn't surprise me, though. I can easily imagine no one having gone back to optimize the loop code when the feedback nodes were introduced. That's a low-level enough operation that it would be hard for it to show up in a hot-spot analysis, even if it is being pounded on, just because the window is small and you'd have to know that all the bits of assembly code are generated by the same source. Those are notoriously hard to identify with anything other than a person going, "You know, I wonder if we thought of this..." So, thanks, Ben, for being the wonderer. 🙂
* the grid is a bank of 40 different machines that we load the same code on that allows us to take the results in aggregate to rule out spurious timings caused by hiccups in an OS or system configuration options; also allows us to retest LV every night to make sure no changes injected performance slowdowns unexpectedly in key benchmarks.
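As a rough illustration of why aggregating over many machines helps (this is not the grid's actual code; the function name and the timing values are made up for the example), a single hiccuping machine shifts the mean noticeably but leaves the median untouched:

```python
from statistics import mean, median

def aggregate(per_machine_ns):
    """Summarize one per-iteration timing (in ns) from each machine."""
    return {
        "mean": mean(per_machine_ns),
        "median": median(per_machine_ns),
        "worst": max(per_machine_ns),
    }

# Hypothetical data: 39 well-behaved machines and one that hiccuped.
timings = [113.0] * 39 + [400.0]
summary = aggregate(timings)
print(summary)
```

With only one machine, there would be no way to tell the 400 ns reading from a genuine slowdown; with 40, the outlier stands out against the rest.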
So, the first batch of data is in. Definitely NOT seeing any sort of 10x time difference in LabVIEW 2015. Yes, the Feedback Node comes in fastest, but that's 1 nanosecond faster on average out of 100+ nanosecond operation. I've attached my benchmark VIs. (VIs are saved back to LV 2012.)
a) what version of LV are you testing?
b) is there something particular you're doing with the shifted/fedback data that might be relevant?
| Test name | Metric Type | Build: 15.0b62 (13226) | Iterations per run | Total test time |
|---|---|---|---|---|
| Feedback Node | Time | 112.85ns +/-0.000s | 1,048,576 | 1.31m |
| For Loop | Time | 114.63ns +/-0.000s | 1,048,576 | 1.34m |
| While Loop | Time | 113.52ns +/-0.000s | 1,048,576 | 1.32m |
Second (and final) batch of data is in... this is from our higher performance machines. Still a very tiny advantage to the Feedback Node (0.3 nanoseconds at most), but at this point, not something I think we'd spend much time rooting through the compiler to ferret out. Still nothing like a 10x difference.
| Test name | Metric Type | Build: 15.0b62 (13227) | Iterations per run | Total test time |
|---|---|---|---|---|
| While Loop | Time | 70.48ns +/-0.000s | 1,048,576 | 43.89s |
| Feedback Node | Time | 70.31ns +/-0.000s | 1,048,576 | 44.27s |
| For Loop | Time | 70.60ns +/-0.000s | 1,048,576 | 43.88s |
FYI, just to avoid questions:
The last column is "total test time". I probably should not have included that here. You'll notice that in the second run, the Feedback Node test had a longer total running time than the other two, despite having the shortest time per iteration. That's the jitter I was talking about earlier when I described our grid -- something hiccuped on one or more machines, giving us a total test time for that one that was longer. The total time is the sum of how long every run took. The instance time (column 3) is the average of each... we have +/-0.00 because we had enough iterations to remove the error bars on that averaging computation (the computation was correct for more than the two significant digits reported).
If I ran these tests repeatedly, I would expect that which of the three had the longest total test time would be random. If after many repetitions on the grid, one of them was consistently the worst, that would indicate that something in the feature itself is introducing the jitter -- i.e., it has the best average run time, but the worst worst-case time. In such a case, any given run has a small but non-zero chance of showing the jittery one as the worst average time just because it happened to jitter a lot. This is why we monitor performance night after night, even when there are no edits made to the code base.
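The "best average, worst worst-case" scenario can be sketched with a small simulation. Every number here is an assumption chosen for illustration (the hiccup size, its probability, and the two base timings are not measurements from the grid): one construct is steady at 70.6 ns, the other is usually 70.3 ns but occasionally pays a 20 ns penalty on a machine.

```python
import random

STEADY_NS = 70.6     # hypothetical construct with no jitter
JITTERY_NS = 70.3    # hypothetical construct: faster on a typical run...
HICCUP_NS = 20.0     # ...but sometimes pays this penalty (assumed size)
HICCUP_PROB = 0.005  # assumed chance of a hiccup per machine per night
MACHINES = 40

def nightly_mean(rng):
    """Average the jittery construct's timing across all machines for one night."""
    times = [
        JITTERY_NS + (HICCUP_NS if rng.random() < HICCUP_PROB else 0.0)
        for _ in range(MACHINES)
    ]
    return sum(times) / MACHINES

def simulate(nights=1000, seed=0):
    """Return (long-run mean of the jittery construct, fraction of nights it looked slower)."""
    rng = random.Random(seed)
    means = [nightly_mean(rng) for _ in range(nights)]
    nights_lost = sum(1 for m in means if m > STEADY_NS)
    return sum(means) / nights, nights_lost / nights

long_run_mean, fraction_lost = simulate()
print(f"long-run mean: {long_run_mean:.2f} ns, looked slower on {fraction_lost:.0%} of nights")
```

Under these assumptions the jittery construct wins over the long run yet still looks like the slowest on a noticeable fraction of individual nights, which is why a single night's result says little on its own.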
It is subtleties like this that are the reason I don't trust benchmarks not done on our grid or one of the very few customers that I believe can take these things into account. This stuff is hard unless you have lots of training and/or the right tools. 🙂
Thanks for coming around and clearing that up, leaving no misconception about the relative performance of the SR/FBN.
I asked for the numbers to be re-run to check LV 2014. Same results as 2015. I figured we should rule out that this was something new.
When the FN came out it was faster. Then the claim is that the compiler team went in and specifically optimized the SR case for the LV2 global construct and the speed was a tossup from there out. Combined with the subsequent compiler optimizations like loop unrolling, it seems like it would take a pretty big blunder to have a speed mismatch between the two constructs.
Speaking of things that are likely to get me to distrust benchmarking results, was debugging really enabled during all of the tests?
The VIs have debugging enabled. It is optionally switched off by the grid.