LabVIEW Idea Exchange

usrfrnly

Show average or relative execution time of canned sub vi's in menu

Status: Declined

Any idea that has not received any kudos within a year after posting will be automatically declined.

I only mean that this should apply to the sub vi's that come with LabVIEW. I was putting together a vi that is execution-time sensitive. I had a choice between IMAQ Histogram and IMAQ Histograph. I could get the result I needed from either one, but I was forced to try each, run it a few times, and clock it. There are many such "which of these two similar options is fastest" choices in every program, and knowing the answer up front would be very helpful.
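
Since LabVIEW diagrams can't be pasted as text, here is a rough Python stand-in for the kind of manual comparison described above. The functions histogram_loop and histogram_count are hypothetical placeholders for two interchangeable subVIs (e.g. IMAQ Histogram vs. IMAQ Histograph), not their actual implementations.

import random
import time

def time_call(fn, data, runs=10):
    # Call fn(data) several times and collect the elapsed wall-clock times.
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(data)
        times.append(time.perf_counter() - start)
    return times

# Hypothetical stand-ins for two subVIs that produce the same result.
def histogram_loop(data):
    counts = [0] * 256
    for x in data:
        counts[x] += 1
    return counts

def histogram_count(data):
    return [data.count(v) for v in range(256)]

data = [random.randrange(256) for _ in range(50_000)]
for name, fn in [("loop", histogram_loop), ("count", histogram_count)]:
    t = time_call(fn, data)
    print(f"{name}: min {min(t)*1000:.2f} ms, max {max(t)*1000:.2f} ms")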

7 Comments
AristosQueue (NI)
NI Employee (retired)

usrfrnly:

I'm going to lay out a bit of backstory and then ask you some questions to see if you have ideas on how best to present this kind of information.

 

Benchmarking is hard. Any absolute number of "this function will run X milliseconds faster than this other function" is impossible because of the differences in CPUs, so any discussion of benchmarks gets into relative performance. But even those relative numbers are becoming less reliable. The classic example of relative performance is the algorithms for sorting arrays. Suppose n is the number of elements that need to be sorted. Merge Sort has both worst-case and best-case performance of n log n. How does that compare against Insertion Sort, whose worst case is n*n but whose best case is n? It turns out that if your data is already sorted and you're just adding one new element, Insertion Sort wins, but if your data is mostly unsorted, Merge Sort wins. Generally. But there are other algorithms with other tradeoffs. So when someone asks, "What is the best sort algorithm that I should use?" they have to consult a table like this one (search for "Array Sorting Algorithms" in this page).
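
To make that tradeoff concrete, here is a minimal sketch, with Python standing in for a G diagram, that times Insertion Sort against the language's built-in sort (a Merge Sort derivative). The sizes and data sets are arbitrary and only illustrate why "which is faster" depends on the input.

import random
import time

def insertion_sort(values):
    # O(n) when the data is already nearly sorted, O(n*n) in the worst case.
    a = list(values)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def timed(fn, data):
    start = time.perf_counter()
    fn(data)
    return time.perf_counter() - start

n = 5_000
nearly_sorted = list(range(n)) + [0]          # sorted data plus one new element
shuffled = random.sample(range(n), n)

for label, data in [("nearly sorted", nearly_sorted), ("shuffled", shuffled)]:
    print(label,
          "insertion:", f"{timed(insertion_sort, data):.4f}s",
          "built-in:", f"{timed(sorted, data):.4f}s")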

 

All of that is just for the basic mathematics of an algorithm on an ideal Turing Machine. Once real-world hardware and operating systems get involved, the problem is harder. Here at NI, we have our performance grid. It is a bank of 30 identical machines, absolutely as identical as we can get the hardware and OS to be. We do performance testing of LV every night in case any changes have occurred. But between any two machines, we can see wildly different performance characteristics depending on things that aren't under LV's control: which sector of a spinning hard drive LV gets installed on can have a huge impact on load time -- 80% swings! Strong argument for SSD drives! How fragmented memory becomes depends on OS scheduling, and that changes performance profiles. So we take averages across our 30 machines to be able to say that, in general, a given change has hurt or helped performance.

 

I do not know the particulars of the Histograph and the Histogram. I presume there is a reason that both are included in the IMAQ tool suite. My guess is that they have different configuration options, which probably means their performance characteristics differ depending on how they are configured and the type of data going into them.

 

Given all of that, I hope you can understand why stating "this one will have the best performance" is very hard to do in any general way. That does not mean we should not try, but it does mean that any guidance we give may be wrong for your application.

 

So, my questions for you:

 

1. If we had some sort of lookup table for this kind of question, what kind of input could you supply? For example, do you know the memory fragmentation that your application is causing? Could you quantify that enough to use it as one axis of a table? Can you think of other application aspects that could impact system performance that you would be able to specify?

 

2. Suppose we published a table that said something like, "This is generally the best one but not on all targets/systems." How useful would that be to you? For the really intensive systems, you would still have to do all the same benchmarking. Would such a wishy-washy table save you time in less intensive situations?

 

3. What format would you want for that information? I linked to the big-O algorithm page earlier. That notation is very clear to me, but that's because I've had years of computer science courses explaining what goes into a big-O computation. Is that notation useful to you or is there some other notation that you'd prefer? How far would we have to go with getting the computer science out of that sort of table to make it useful to our users?

 

These aren't easy questions to answer.

usrfrnly
Member

Thank you for taking the time to make such a deeply reasoned and well explained reply. Perhaps I shouldn't have posted the idea without fleshing it out or at least explaining it a little more, but I thought that, by putting it in the wild, it would end up better vetted by the wisdom of the masses than by the wisdom – such as it is – of yours truly, and I think your reply proves that theory. You raise very good questions that I honestly cannot answer, but please allow me to redirect the conversation slightly in the hope of arriving at a good, implementable idea for the exchange here.

 

First, I only intend for this to apply to the functions and canned sub vi’s that NI provides with LabVIEW – the ones that you can place on the block diagram from day 1.

 

What would be the best figure of merit for comparing vi execution cost/complexity?

The above reply points out how misleading execution time can be because of variables outside the LabVIEW environment. Does this make execution time a bad metric, or does its inclusion of so many variables make it more relevant? I would argue that listing a range of observed execution times on some base platform would be very informative. Maybe you list two ranges, one for SSD and one for HDD. This might be of interest for programs that are more sensitive to consistency, and there's probably a wealth of information for NI in that as well. No one will expect these to be exact numbers, and I believe the value to the developer isn't in the reported number itself but in the relative magnitudes.
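
As an illustration of what such a published range might look like, here is a hedged sketch with Python standing in for a VI: repeated runs of one hypothetical function are reduced to a min/median/max spread, which is the kind of relative magnitude being asked for rather than an exact number.

import statistics
import time

def timing_range(fn, arg, runs=20):
    # Run fn(arg) repeatedly and summarize the spread of execution times.
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(arg)
        times.append(time.perf_counter() - start)
    return min(times), statistics.median(times), max(times)

# Hypothetical table entry; sorted() merely stands in for some shipped subVI.
lo, mid, hi = timing_range(sorted, list(range(50_000, 0, -1)))
print(f"example entry: {lo*1000:.2f} ms .. {mid*1000:.2f} ms .. {hi*1000:.2f} ms")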

 

Alternatively, CPU cycles might be a better metric for comparing vi execution cost/complexity. Honestly, I didn't use that in my original post because, in my ignorance, I thought it would be harder to measure and more wildly variable than execution time.

 

Ultimately, I do not know what the best metric is, but when there are four ways to accomplish the same task, it would be nice to have some idea of which has the highest cost.

      

Intaris
Proven Zealot

Would a middle-way be an option?

 

What I find myself doing for much of our time-critical RT code is creating a benchmark utility that lets me switch on/off certain parts of the code and investigate the effects of small differences in the code.
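
For illustration only, here is a minimal sketch of that pattern in Python (LabVIEW code doesn't paste as text); the section names and bodies are hypothetical, and the point is simply the on/off switches around independently timed chunks of code.

import time

# Hypothetical switchboard: each named section can be enabled or disabled
# so the cost of small code differences can be measured in isolation.
SECTIONS = {
    "scale": True,
    "smooth": False,   # switched off for this run
    "reduce": True,
}

def scale(data):
    return [x * 2 for x in data]

def smooth(data):
    return [(a + b) / 2 for a, b in zip(data, data[1:])]

def reduce_(data):
    return sum(data)

IMPLEMENTATIONS = {"scale": scale, "smooth": smooth, "reduce": reduce_}

def run_benchmark(data, repeats=10):
    for name, enabled in SECTIONS.items():
        if not enabled:
            continue
        fn = IMPLEMENTATIONS[name]
        start = time.perf_counter()
        for _ in range(repeats):
            fn(data)
        elapsed = (time.perf_counter() - start) / repeats
        print(f"{name}: {elapsed * 1000:.3f} ms per call")

run_benchmark(list(range(100_000)))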

 

NI already does a whole ream of benchmarking on their "standardised" machines. How about giving access to some "standard" benchmarks with a caveat that results may vary greatly from machine to machine. Let's face it, a lot of LabVIEW users run their own programs from within the IDE, so machine variability is not a huge problem for those users.

 

Sounds like it would be opening a huge can of worms.

usrfrnly
Member

Anything would be an option. What we got now is bupkis so anything is a big step forward. To me, access to the "standard" benchmarks with a big YMMV would be AWESOME!

Intaris
Proven Zealot

It would be cool if NI could then benefit from X thousand users running the benchmark VIs, but I fear the effort of qualifying the results may significantly outweigh the benefits.

AristosQueue (NI)
NI Employee (retired)

> How about giving access to some "standard" benchmarks with a caveat that results may vary greatly from machine to machine.

 

I think we'd be fine with that; I just would not have guessed that this was useful information to publish.

 

> What we got now is bupkis so anything is a big step forward. 

 

If you're sure... I would have assumed (prior to this conversation) that bad data (by which I mean data that doesn't directly apply to your timing situation) would be anti-helpful by making you think you could skip your own performance testing.

 

Note that wholesale publishing such numbers isn't something NI could do quickly. We don't benchmark most individual VIs, just the key ones that we know are used frequently and whose code tends to shift. We also benchmark several large applications to check for overall system performance slips. Individual VIs generally get tested on the grid only if an individual developer has a reason to want to tune something. We would have to proactively produce numbers for relevant functions.

 

It might be better to narrow the scope of this request to specific functions that have alternate implementations, like "Please provide benchmarks for Histogram and Histograph specifically." Are there other pairs of nodes that are reasonable substitutes for each other? Of course, as soon as I say that, I realize we're getting dangerously close to the "which of the 30+ communication technologies should I use for my VI?" question that has been the subject of much discussion over the last five years. For those who haven't heard, some years ago an NI developer cataloged all the different LV APIs for sending data from one place to another and tried to provide some guidance on when to use one versus the others. At the time, he came up with 22 APIs. Since then, more have been found and added to the list. Each has a reason for existing. Whether those reasons are good reasons is an open question. 🙂

 

> It would be cool if NI could then benefit from X thousand users running the benchmark VIs, but

> I fear the effort of qualifying the results may significantly outweigh the benefits.

 

I've wondered about that myself, but, as I said, the more I've learned about performance testing, the more it seems we can only tune meaningfully for the broad general cases, and any app that needs specific tuning for extreme high speed has to do its own evaluation anyway.

Darren
Proven Zealot
Status changed to: Declined

Any idea that has not received any kudos within a year after posting will be automatically declined.