Statistics on the fly

CoastalMaineBird · ‎06-20-2014

I have 200+ channels coming in continuously, at a frame rate of 10 Hz.

Client requires an AVERAGE value of each channel, between a START time and a STOP time, possibly several minutes apart.

What I do is set a COUNT to 0, and clear a 200-chan SUM[ ] buffer at START time.

For each sample, if the averaging is now in progress, I add the current sample[ ] to the SUM[ ] buffer and increment the count.

Sometime after the STOP time the average is required.

So I take the SUM[ ] and divide by COUNT, and that's the average for each of the 200 channels.

The only memory required is for 200 channels, regardless of the duration. I don't need to keep every sample around.

That works just fine.
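
A minimal sketch of the accumulator described above, written in Python rather than LabVIEW so it can be shown as text. The class and method names (`RunningAverage`, `start`, `stop`, `add_frame`) are made up for illustration; only the COUNT and SUM[ ] bookkeeping comes from the description.

```python
class RunningAverage:
    """Per-channel running sum and sample count; no sample history is kept."""

    def __init__(self, n_channels):
        self.n_channels = n_channels
        self.active = False
        self.count = 0
        self.sums = [0.0] * n_channels

    def start(self):
        # START time: zero the count and clear the SUM[ ] buffer.
        self.count = 0
        self.sums = [0.0] * self.n_channels
        self.active = True

    def stop(self):
        # STOP time: later frames are ignored, totals stay available.
        self.active = False

    def add_frame(self, frame):
        # Called for every incoming frame (one value per channel).
        if not self.active:
            return
        for ch, value in enumerate(frame):
            self.sums[ch] += value
        self.count += 1

    def averages(self):
        if self.count == 0:
            raise ValueError("no samples accumulated")
        return [s / self.count for s in self.sums]
```

Memory use depends only on the channel count, not on how long the averaging window stays open.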

Now the client wants to add MIN, MAX, StdDev, and Variance to the list of stats needed.

MIN and MAX are easy: I just compare each sample[ ] to the existing MIN[ ] and MAX[ ] arrays and keep the smaller and larger.
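
The same comparison, sketched in Python for illustration. The function names are hypothetical, and starting the arrays at +infinity and -infinity is an assumption about how MIN[ ] and MAX[ ] would be cleared at START time.

```python
import math


def new_min_max(n_channels):
    # Start so that the first real sample always replaces the initial value.
    mins = [math.inf] * n_channels
    maxs = [-math.inf] * n_channels
    return mins, maxs


def update_min_max(mins, maxs, frame):
    # Compare each channel's sample with the stored extremes, in place.
    for ch, value in enumerate(frame):
        if value < mins[ch]:
            mins[ch] = value
        if value > maxs[ch]:
            maxs[ch] = value
```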

But the definition of variance is SUM((Xi - Mean)^2) / N. (StdDev is the square root of that.)

Doesn't that mean that I have to have every single sample in hand when it's time to compute it?

I can't process X(i) - Mean until I know what the MEAN is, but I don't know that until the end.

Any way to avoid storing every single sample?

Steve Bird
Culverson Software - Elegant software that is a pleasure to use.
Culverson.com

Blog for (mostly LabVIEW) programmers: Tips And Tricks

aputman · ‎06-20-2014

Look at the point by point functions in the Signal Processing palette. That should give you what you need.

aputman
------------------
Heads up! NI has moved LabVIEW to a mandatory SaaS subscription policy, along with a big price increase. Make your voice heard.

CoastalMaineBird · ‎06-20-2014

I saw those, but it looks to me like:

1... The Std Dev one needs a SAMPLE LENGTH input. I don't know the length until I'm done.

2... It appends each sample to an internal array, which I would like to avoid.

Am I missing something?

Steve Bird

aputman · ‎06-20-2014

Use 0 for the sample length and it runs until you click stop.

You have to maintain the values somewhere if you want to obtain stats over your whole data set.

aputman

JÞB · ‎06-20-2014

Try Variance PtByPt.vi with the sample length set to 0 (no internal array). The mean shows up on an output too, and it's only a sqrt to get from variance to StdDev. Of course, StdDev PtByPt adds the sqrt for you, still with no internal array when the sample length is 0.

"Should be" isn't "Is" -Jay

aputman · ‎06-20-2014

Here is a reentrant VI that I made real quick. I don't know if reentrancy is the best way to handle 200 channels (probably not).

aputman

DSPGuy · ‎06-20-2014

Here is an alternate algorithm for mean and variance that does not keep any sample history. It is the Knuth (from Welford) algorithm from https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance.
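
For readers without LabVIEW at hand, here is a rough Python sketch of the Welford update for a single channel. It is not a transcription of the attached VI; the class name `Welford` and its method names are made up for illustration. The variance divides by N, matching the definition used earlier in the thread.

```python
import math


class Welford:
    """Running mean and variance for one channel, with no sample history."""

    def __init__(self):
        self.n = 0       # samples seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean               # deviation from the old mean
        self.mean += delta / self.n         # update the mean
        self.m2 += delta * (x - self.mean)  # old deviation times new deviation

    def variance(self):
        # Population variance (divide by N); use n - 1 for the sample variance.
        if self.n == 0:
            raise ValueError("no samples accumulated")
        return self.m2 / self.n

    def std_dev(self):
        return math.sqrt(self.variance())
```

Only three numbers are stored per channel, and the mean and variance can be read at any time, not just after the STOP time.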

-Jim

altenbach · ‎06-20-2014

These discussions always remind me of something entirely different:

(well, it's late Friday afternoon..... :D)

LabVIEW Champion.

altenbach · ‎06-20-2014

@DSPGuy wrote:

Here is an alternate algorithm for mean and variance that does not keep any sample history. It is the Knuth (from Welford) algorithm from https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance.

Here is the corrected link (removed the trailing decimal point).

This is probably similar to the algorithm implemented in the PtByPt variance under the condition of "infinite horizon" (set sample length = 0, as already mentioned by others). No programming needed.

(In any case, I would do a custom reentrant subVI based on Jim's demo that also includes min/max and all the other metrics you want).

If the 200+ channels come in as an array, you could get away with a single subVI that maintains an array of 200+ means, variances, etc., one for each channel. Probably more efficient and more scalable. Just use Jim's, but replace all scalars with arrays. Good luck!
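
As a text-only illustration of the "replace all scalars with arrays" idea, here is a Python sketch that keeps one Welford state per channel and tracks min and max in the same pass. The class name `ChannelStats` and its field names are hypothetical, and a real LabVIEW version would look different.

```python
import math


class ChannelStats:
    """Running mean, variance, min and max for every channel of a frame."""

    def __init__(self, n_channels):
        self.n = 0  # frames seen; shared by all channels
        self.means = [0.0] * n_channels
        self.m2s = [0.0] * n_channels  # sums of squared deviations
        self.mins = [math.inf] * n_channels
        self.maxs = [-math.inf] * n_channels

    def add_frame(self, frame):
        # One call per incoming frame; frame holds one value per channel.
        self.n += 1
        for ch, x in enumerate(frame):
            delta = x - self.means[ch]
            self.means[ch] += delta / self.n
            self.m2s[ch] += delta * (x - self.means[ch])
            self.mins[ch] = min(self.mins[ch], x)
            self.maxs[ch] = max(self.maxs[ch], x)

    def variances(self):
        # Population variance (divide by N) for each channel.
        if self.n == 0:
            raise ValueError("no frames accumulated")
        return [m2 / self.n for m2 in self.m2s]

    def std_devs(self):
        return [math.sqrt(v) for v in self.variances()]
```

Storage is four arrays of channel-count length plus one counter, regardless of how many frames arrive.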

Intaris · ‎06-23-2014

Just adding my voice to the discussion:

Under this link (search for the "online algorithm" part) there are some neat ways to calculate the standard deviation of essentially unknown sample sizes online (with minimal memory footprint).

There are several methods provided, each with slightly different characteristics when it comes to rounding errors, but I've used these to great effect in the past. You can also generate a running standard deviation with only a little effort and use this as a system variable.
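
To make the rounding-error point concrete, here is a small Python comparison of two online methods: the naive "sum and sum of squares" formula and the Welford update. The function names and the test data (four small values riding on a 1e9 offset) are chosen purely for illustration and are not taken from anyone's application.

```python
def naive_variance(samples):
    # Keeps only N, SUM(x) and SUM(x^2); subtracts two huge numbers at the end.
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in samples:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    return total_sq / n - mean * mean


def welford_variance(samples):
    # Keeps N, the running mean and the running sum of squared deviations.
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / n


offset = 1e9
data = [offset + v for v in (4.0, 7.0, 13.0, 16.0)]
print("naive:  ", naive_variance(data))
print("welford:", welford_variance(data))
```

The true population variance of this data is 22.5. With a large offset the naive formula can land far from that, or even go negative, because the two terms it subtracts are nearly equal; the Welford form avoids that subtraction.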

Shane

D'OH. I see I just replicated a link Altenbach already mentioned. Still, there are worse things in the world than copying the great wizard.
