
General Histogram.vi is way too slow

I had noticed this years ago and had written my own CIN to speed things up.

Recently, I migrated some of my code from a 32-bit machine to a 64-bit one, and of course that broke my CINs. Rather than reinstalling a compiler on the new machine and recompiling things (a high-maintenance project anyhow), I decided to check out the dreaded General Histogram.vi, hoping that by now (LV 2012) it might have been optimized.

Well, it has not, and it is in fact about two orders of magnitude (100 times) slower than my basic C code. Sure, I am histogramming large arrays (1–10 M elements), but still...

I transcribed my basic C code into G as follows:

 

[Image: Double Histogram (G).png]

 

and was pleasantly surprised to find that the G code is just as fast as, if not faster than, my old CIN.

Note that I do not pretend that the code above does exactly what General Histogram.vi is doing, or that it is optimized, fool-proof, or a model of G programming. I was in a rush and may get back to it later.
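For reference, the algorithm in the diagram above (and in my old CIN) boils down to the classic fixed-width binning loop. Here is a minimal C sketch of it; the names and signature are mine, for illustration only, not the actual CIN interface:

```c
#include <stdint.h>
#include <string.h>

/* Minimal sketch of fixed-width binning: one subtraction, one division
   and one truncation per element, so the per-element cost is O(1)
   regardless of the number of bins. Illustrative only. */
void histogram_fixed(const double *data, size_t n,
                     double min, double max, double bin_size,
                     int64_t *counts, size_t n_bins)
{
    memset(counts, 0, n_bins * sizeof *counts);
    for (size_t i = 0; i < n; i++) {
        double x = data[i];
        if (x < min || x >= max)
            continue;                      /* drop out-of-range values */
        size_t bin = (size_t)((x - min) / bin_size);
        if (bin >= n_bins)                 /* guard against rounding near max */
            bin = n_bins - 1;
        counts[bin]++;
    }
}
```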

Needless to say, I have my solution for the 32-bit to 64-bit migration, but I wonder why General Histogram.vi is so inefficient.

 

Please NI, have a look into it.

Message 1 of 2

This is open to discussion, but the attached (LV 2012) VIs show, I think, how bad the situation is.

 

Again, my main complaint is that General Histogram.vi (GH) is not efficient at performing a simple (and, IMO, quite general) type of histogramming task, namely histogramming data for which the min and max values of the histogram and the bin size are provided (pretty much what I do exclusively).

 

To do that with the GH, one has to compute the number of bins N = (max − min)/bin size, which is an acceptable additional step. The code then builds an array of bin boundaries (which is another, optional, input) and then proceeds by TESTING, FOR EACH INPUT ARRAY VALUE, which bin it belongs to (at least this is my interpretation of the steps described in the help). That is way too inefficient and comes at a humongous cost (see the numbers and the sketch below).
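To make that cost concrete, here is roughly what such a per-element boundary search would look like in C; this is a sketch of my interpretation of the help, not NI's actual code:

```c
#include <stddef.h>

/* boundaries[] holds n_bins + 1 ascending bin edges. Returns the bin
   index for x, or n_bins if x lies outside [boundaries[0], boundaries[n_bins]).
   Even done intelligently with a binary search, this costs O(log n_bins)
   PER ELEMENT, so the total is O(N log B) instead of the O(N) of
   direct index computation. */
static size_t find_bin(double x, const double *boundaries, size_t n_bins)
{
    if (x < boundaries[0] || x >= boundaries[n_bins])
        return n_bins;                     /* out of range */
    size_t lo = 0, hi = n_bins;
    while (hi - lo > 1) {                  /* invariant: b[lo] <= x < b[hi] */
        size_t mid = (lo + hi) / 2;
        if (x < boundaries[mid])
            hi = mid;
        else
            lo = mid;
    }
    return lo;
}
```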

The workaround is simple enough (use something like the Double Histogram (G).vi I am including in the project, or equivalent), but the inefficiency might affect unsuspecting new LabVIEW users in a very detrimental way (and add to the misperception that LV is slow).

 

Benchmark results (total execution time for the stated number of loops on a Dell Precision PWS 380, 3.4 GHz dual-core Pentium, Windows XP):

 

# Elements   Bin Size   # Bins   # Loops   GH        Simple G   Histogram.vi
1E5          0.1        10       1E3       25 s      4.5 s      5 s
1E5          0.01       100      1E3       175 s     4.2 s      4.9 s
1E6          0.1        10       1E3       252 s     51 s       51 s
1E6          0.01       100      1E3       1,727 s   48 s       49.9 s

 

As can be seen, the Simple G code's performance depends only on the number of elements, not on the number of bins (as it should, since the number of operations per element is independent of the number or size of the bins), whereas General Histogram.vi's performance does depend on the number of bins (as it would if it does what I suspect it does, even if done intelligently, that is, by sorting the bins and doing an optimized search).
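To put these numbers in perspective: 1E6 elements × 1E3 loops is 1E9 bin assignments, so the ~50 s Simple G totals work out to roughly 50 ns per element regardless of bin count, whereas GH goes from about 250 ns per element with 10 bins to about 1.7 µs per element with 100 bins (the 1E5 rows give the same per-element figures).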

Note that the simpler "Histogram.vi" is not satisfactory (although it performs about as fast as the Simple G VI in this benchmark, it calls a DLL), as it does not allow providing min and max values and a bin size (those are determined automatically from the min and max values of the input array and the number of requested bins; I am doing just that for the "Simple G" benchmark, but in general you want more flexibility).

 

My suggestion to NI would be to modify the GH so that if the user doesn't provide a "Bins" array or "inclusion" input, it switches to an optimized version (see the sketch below). Alternatively, modify "Histogram.vi" to accept min and max values and a bin size as inputs (there is no way to work around the limitation of the current "Histogram.vi", so there is no option other than using the GH). At the very least, mention in the GH help that the computation cost scales something like logarithmically with the number of bins (and, of course, linearly with the number of input elements)!
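To illustrate the first suggestion, the dispatch could be as simple as the following sketch (hypothetical, reusing histogram_fixed and find_bin from the sketches above; obviously not NI's actual code):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical dispatch: fall back to direct indexing when no explicit
   "Bins" array is wired, otherwise keep the general boundary search. */
void general_histogram(const double *data, size_t n,
                       const double *boundaries,  /* NULL if "Bins" unwired */
                       double min, double max, double bin_size,
                       int64_t *counts, size_t n_bins)
{
    if (boundaries == NULL) {
        /* fast path: uniform bins, O(1) per element */
        histogram_fixed(data, n, min, max, bin_size, counts, n_bins);
        return;
    }
    /* general path: O(log n_bins) per element */
    memset(counts, 0, n_bins * sizeof *counts);
    for (size_t i = 0; i < n; i++) {
        size_t b = find_bin(data[i], boundaries, n_bins);
        if (b < n_bins)
            counts[b]++;
    }
}
```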

 

If I may add one thing: I would suggest making the Array input polymorphic (as I was doing with my CIN versions), since users' arrays can be pretty large (and thus casting them might be costly and time-consuming).

 

 

Note: I am using maximum optimization by default.

Message 2 of 2