On the "Std Deviation and Variance" VI

Overview

There is a potential trap in using the "Std Deviation and Variance.vi" of the Mathematics >> Probability & Statistics Palette.

Description

I have argued elsewhere that, along with other Math VIs, Std Deviation and Variance.vi has a default setting that is not necessarily the one the user wants. The "Weighting" input is not required, so the VI can be used without wiring it, in which case it defaults to the "Sample" definition.

As illustrated in this example, the only useful option is "Sample", as the "Population" option does not really compute the "Population Variance", which formally would require knowledge of (and therefore inputting) the population mean.

The example generates N (default = 10) Poisson random variables with mean Lambda (default = 1) and computes the Variance of that sample using 3 methods:

- "Sample" option of the Std Deviation and Variance.vi

- "Population" option of the same VI

- True "Population" variance formula, using the known population mean (= lambda)

The difference between the mean and each of these values is stored, and the process repeats until you stop the VI.

The idea is that for a Poisson variable, "Variance = Mean", loosely speaking, therefore any difference is of interest.

Each set of values is histogrammed and the overall mean of each of these distributions is calculated (the <Mean - Variance> indicators at the bottom right).

In other words, unless for some reason you are interested in a biased result, you do not want to use the "Population" option (luckily, this is not the default). However, if you want the true population variance, you have to compute it yourself (for instance as shown in the attached VI), because Std Deviation and Variance.vi will be of no help to you.

Steps to Implement or Execute Code

  1. Run the VI
  2. Observe how the <Mean - Variance> of the "Sample" calculation is unbiased, as is the "True Population" calculation, while the "Population" option of the Std Deviation and Variance.vi gives a biased result (bias = lambda/n).
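
For readers without LabVIEW, the experiment can be approximated in a few lines of Python. This is a rough sketch, not a translation of the attached VI: the function names `poisson` and `mean_minus_variance`, the fixed seed, and the number of repetitions are assumptions made for illustration, and Knuth's multiplication method stands in for whatever Poisson generator the VI uses.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def mean_minus_variance(n=10, lam=1.0, trials=20000, seed=0):
    """Average (lambda - variance estimate) over many samples of size n."""
    rng = random.Random(seed)
    totals = {"sample": 0.0, "hybrid": 0.0, "true_population": 0.0}
    for _ in range(trials):
        x = [poisson(lam, rng) for _ in range(n)]
        xbar = sum(x) / n
        ss_about_xbar = sum((v - xbar) ** 2 for v in x)
        # "Sample" option: n - 1 denominator, sample mean.
        totals["sample"] += lam - ss_about_xbar / (n - 1)
        # "Population" option as described in this document: n denominator,
        # but still the sample mean.
        totals["hybrid"] += lam - ss_about_xbar / n
        # True population variance: n denominator, known population mean.
        totals["true_population"] += lam - sum((v - lam) ** 2 for v in x) / n
    return {name: total / trials for name, total in totals.items()}


result = mean_minus_variance()
for name, value in result.items():
    print(f"<Mean - Variance> ({name}): {value:+.4f}")
```

With the defaults, the "hybrid" average should settle near lambda/n = 0.1, while the other two should hover around zero.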

Requirements

Software

LabVIEW 2013

Hardware

None

Additional Images or Video



Comments
X.
Trusted Enthusiast

To make this example self-contained (and clarify some of the arguments of the NI discussion forum thread), I will briefly summarize the situation.

As of LabVIEW 2013, the "Std Deviation and Variance.vi" VI in the Probability & Statistics palette takes two inputs (neither of which is required):

Variance VI.png

An array X (DBL) and an option telling the VI how to compute the variance. The VI is unlocked and pure G, so it is easy to verify what it does. Unfortunately, it DOES NOT do what it claims to do in the "code comment" pasted on the diagram.

Here is a copy of the definitions supposedly implemented in the code (only the standard deviation definitions are provided, but the variance is simply the square of that):

1. Sample std deviation:

Sample Std Deviation definition.png

2. Population std deviation:

Population Std Deviation Definition.png
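
The definition images do not survive in this text. Written out in the usual notation, the two definitions presumably read as follows (a reconstruction from the description in this comment, with \bar{x} denoting the sample mean and \mu the population mean):

```latex
\[
  s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
  \qquad\text{(sample)}
\]
\[
  \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2}
  \qquad\text{(population)}
\]
```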

Note the n - 1 in the first definition and the n in the second.

But note that this is not the only difference!

The sample definition uses the sample mean (x), while the population definition uses the population mean (μ).

The sample mean is what you can compute given a sample:

    Sample Mean.gif

    While the population mean... well, you need to provide it, as there is no way to compute it from a sample. Problem is, there is no "Population Mean" input.

    So what does NI do to compute what they call "Population Variance"?

    They use this formula:

    NI Population Variance.gif

    Note that it is neither the definition of the population variance shown in 2. above (it uses x, not μ) nor, of course, that of the sample variance (shown in 1. above), which has a denominator of n - 1 (not n).
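
Written out, the formula described above and its expectation look as follows. This is a reconstruction from the text, since the image is not reproduced here; the subscript "NI" is only a label chosen for this sketch, and the x_i are assumed to be independent draws with mean \mu and variance \sigma^2.

```latex
\[
  \sigma_{\mathrm{NI}}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
  \qquad
  \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\]
\[
  \sum_{i=1}^{n}\left(x_i - \mu\right)^2
  = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 + n\left(\bar{x} - \mu\right)^2
\]
\[
  E\!\left[\sigma_{\mathrm{NI}}^2\right]
  = \frac{1}{n}\left(n\sigma^2 - \sigma^2\right)
  = \frac{n-1}{n}\,\sigma^2
  = \sigma^2 - \frac{\sigma^2}{n}
\]
```

For a Poisson variable, where \sigma^2 = \lambda, the shortfall is \lambda/n, which matches the bias quoted in the main document.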

    In short, it is a "hybrid" which, as shown in the Wikipedia page on Variance (scroll down to the section that discusses the difference between population and sample variance) and as illustrated in the example VI I have provided, results in a biased value.

    And as such is useless.

    And misleading.

    A sensible VI would provide an (optional) additional input called "Population Mean" (μ) that would be used when the "Weighting (Population)" option is chosen. If it is left unwired AND "Weighting (Population)" is chosen, there would be two options: either return an error (not very graceful, but acceptable) or default to the Sample Variance calculation.
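
As a text-language illustration of that proposal, here is a minimal Python sketch. The function name, the `weighting` strings, and the `population_mean` parameter are hypothetical and do not correspond to any NI interface; of the two options mentioned, this sketch picks the error.

```python
import math


def std_deviation_and_variance(x, weighting="sample", population_mean=None):
    """Return (standard deviation, variance) of the sequence x.

    Hypothetical interface: 'population' weighting requires the caller to
    supply the population mean, since it cannot be computed from the sample.
    """
    n = len(x)
    if weighting == "sample":
        if n < 2:
            raise ValueError("the sample variance needs at least two values")
        xbar = sum(x) / n
        variance = sum((v - xbar) ** 2 for v in x) / (n - 1)
    elif weighting == "population":
        if population_mean is None:
            # The alternative suggested above would be to fall back to the
            # sample calculation instead of raising.
            raise ValueError("'population' weighting requires population_mean")
        if n < 1:
            raise ValueError("the population variance needs at least one value")
        variance = sum((v - population_mean) ** 2 for v in x) / n
    else:
        raise ValueError(f"unknown weighting: {weighting!r}")
    return math.sqrt(variance), variance


data = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]  # made-up sample
print(std_deviation_and_variance(data))
print(std_deviation_and_variance(data, "population", population_mean=1.0))
```

Raising an error keeps the caller from silently getting a different estimator than the one requested.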

  • Obviously the bar should be above the x, not below... But that's all I am able to come up with using the limited editor provided.
  • X.
    Trusted Enthusiast

    I recently noticed the presence of two "standard deviations" in Microsoft's Statistics Mode calculator:

    Screen Shot 2015-04-01 at 11.29.26.png

    The Help describes them as follows:

    Screen Shot 2015-04-01 at 11.30.34.png

    Based on the discussion above, you would expect that the sigma with an "n-1" index would be the sample standard deviation, not the population one (which should be the sigma with an "n" index).

    And in fact, the two results support these definitions. The sigma_n key returns the "population" standard deviation as "interpreted" by NI in their own implementation: it uses the sample mean in the computation, WHICH IS NOT THE CORRECT WAY OF CALCULATING the population standard deviation (see above). The sigma_n-1 key returns the sample standard deviation.

    So the Help file names those functions in reverse and, incidentally, one computation is incorrect.
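
The two computations being compared can be sketched in Python as follows. The data set is made up for illustration and is not taken from the screenshots; the variable names simply mirror the key labels.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary example values
n = len(data)
xbar = sum(data) / n                      # sample mean
ss = sum((v - xbar) ** 2 for v in data)   # sum of squared deviations from xbar

sigma_n = math.sqrt(ss / n)                # n denominator, sample mean
sigma_n_minus_1 = math.sqrt(ss / (n - 1))  # n - 1 denominator: sample std deviation

print(sigma_n, sigma_n_minus_1)
```

Both values are built from the sample mean; only the denominator differs.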

    X.
    Trusted Enthusiast

    Note added in proof:

    I recently started using the Skewness and Kurtosis.vi from the same palette.

    This VI also offers an option to compute the sample or population values of these statistics (of which there are, in addition, multiple definitions in the literature, which does not simplify matters).

    Since it is written in pure G, it is easy (though painstaking) to figure out what it does.

    And of course, in the "Population" calculation, the mean is calculated using the standard estimator (used for the sample definition).

    In other words, here again, the notion that the population mean NEEDS to be provided by the user in order for the computed quantity to have the correct statistical properties is missed.

    If it is any consolation, the same confusion is shared by other programming languages as well...

    Sigh!
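
As one point of comparison, Python's standard `statistics` module falls back to the mean of the data when `pvariance` is called without a mean, but it does accept the population mean through its optional `mu` argument. A short sketch, using an invented data set and an assumed population mean of 1:

```python
import statistics

data = [0.0, 1.0, 1.0, 2.0, 0.0, 3.0, 1.0, 0.0, 2.0, 1.0]  # made-up sample
assumed_population_mean = 1.0  # assumption for illustration only

# n - 1 denominator, sample mean.
sample_var = statistics.variance(data)
# n denominator, mean taken from the data (the "hybrid" discussed above).
hybrid_var = statistics.pvariance(data)
# n denominator, population mean supplied by the caller.
true_pop_var = statistics.pvariance(data, mu=assumed_population_mean)

print(sample_var, hybrid_var, true_pop_var)
```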
