at the end i will explain what the ellipse means.
recipe 1
start with two columns, x and y, which are the coordinates of the points in your scatterplot. from those columns, calculate the following variables:
sumx = sum(x)
sumy = sum(y)
sumxx = sum(x*x)
sumyy = sum(y*y)
sumxy = sum(x*y)
n = number of points
then calculate the mean (xbar, ybar) and the variance and covariance (uncentred, ie before the mean is subtracted):
xbar = sumx/n
ybar = sumy/n
varx = sumxx/n
vary = sumyy/n
covarxy = sumxy/n
next, calculate two new columns, dx and dy, where:
dx = x-xbar
dy = y-ybar
sumdxdx = sum(dx*dx)
sumdydy = sum(dy*dy)
sumdxdy = sum(dx*dy)
theta = 0.5 * arctan(2*sumdxdy / (sumdydy - sumdxdx))
which is the angle that the ellipse is "rotated" from the horizontal, and also:
c = cos(theta)
s = sin(theta)
now calculate two more columns, X and Y (if you can't use capitals change the names!), which should be the same scatterplot, but with the rotation removed:
X = c*dx - s*dy
Y = s*dx + c*dy
sumXX = sum(X*X)
sumYY = sum(Y*Y)
varX = sumXX/n
varY = sumYY/n
a = sqrt(varX)
b = sqrt(varY)
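the steps so far can be sketched in python. this is only an illustration, not part of the original recipe: the function name fit_ellipse and the handling of the case sumdydy = sumdxdx (where the division would fail) are my own assumptions.

```python
import math

def fit_ellipse(x, y):
    """Recipe 1 (hypothetical helper): centre the points, find the rotation
    angle, then measure the spread along each de-rotated axis."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    dy = [yi - ybar for yi in y]
    sumdxdx = sum(u * u for u in dx)
    sumdydy = sum(v * v for v in dy)
    sumdxdy = sum(u * v for u, v in zip(dx, dy))

    denom = sumdydy - sumdxdx
    if denom == 0:
        # Assumption: arctan(+/-infinity) = +/-pi/2, so theta is +/-pi/4.
        # With no correlation at all any angle works, so 0 is used.
        theta = 0.0 if sumdxdy == 0 else math.copysign(math.pi / 4, sumdxdy)
    else:
        theta = 0.5 * math.atan(2 * sumdxdy / denom)

    c, s = math.cos(theta), math.sin(theta)
    # Same scatterplot with the rotation removed.
    X = [c * u - s * v for u, v in zip(dx, dy)]
    Y = [s * u + c * v for u, v in zip(dx, dy)]
    a = math.sqrt(sum(p * p for p in X) / n)
    b = math.sqrt(sum(q * q for q in Y) / n)
    return xbar, ybar, theta, a, b, X, Y

# Made-up example data, just to show the call.
xbar, ybar, theta, a, b, X, Y = fit_ellipse([0, 1, 2, 3, 4], [0, 1, 1, 3, 2])
```

a quick sanity check is that sum(X*Y) comes out as (nearly) zero: that is what "rotation removed" means here.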
the ellipse has semi-axes a and b, is centred at (xbar, ybar), and is rotated by the angle theta. see the explanation below for what this means and how to generate other (larger) ellipses.
using just the x and y columns, and appropriate formulae above, you can calculate everything using:
vardx = varx - xbar*xbar
vardy = vary - ybar*ybar
covardxdy = covarxy - xbar*ybar
varX = c*c*vardx - 2*c*s*covardxdy + s*s*vardy
varY = s*s*vardx + 2*c*s*covardxdy + c*c*vardy
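here is a rough python sketch of this shortcut, which never builds the dx, dy, X or Y columns. the function name fit_ellipse_from_sums is made up for illustration, and the angle is taken from the variances rather than the sums (the 1/n factors cancel in the ratio). the cross terms are written with a factor of 2, which is what expanding the variance of c*dx - s*dy gives.

```python
import math

def fit_ellipse_from_sums(x, y):
    """Shortcut (hypothetical helper): everything from means of x, y, x*x, y*y, x*y."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    varx = sum(xi * xi for xi in x) / n        # mean of x*x, not yet centred
    vary = sum(yi * yi for yi in y) / n
    covarxy = sum(xi * yi for xi, yi in zip(x, y)) / n

    vardx = varx - xbar * xbar
    vardy = vary - ybar * ybar
    covardxdy = covarxy - xbar * ybar

    denom = vardy - vardx
    if denom == 0:
        # Assumption: same fallback as arctan(+/-infinity), or 0 if uncorrelated.
        theta = 0.0 if covardxdy == 0 else math.copysign(math.pi / 4, covardxdy)
    else:
        theta = 0.5 * math.atan(2 * covardxdy / denom)
    c, s = math.cos(theta), math.sin(theta)

    varX = c * c * vardx - 2 * c * s * covardxdy + s * s * vardy
    varY = s * s * vardx + 2 * c * s * covardxdy + c * c * vardy
    # Rounding can push a tiny variance slightly below zero, so clamp it.
    a = math.sqrt(max(varX, 0.0))
    b = math.sqrt(max(varY, 0.0))
    return xbar, ybar, theta, a, b
```

one caveat: subtracting xbar*xbar from the mean of x*x loses precision when the points sit far from the origin compared with their spread, so the column-by-column version is the safer choice for such data.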
traditional statistics often assumes that noisy data is distributed as a "gaussian" or "normal" distribution (this is justified by the famous "central limit theorem" that says you get this distribution when life is complicated). the process above is equivalent to fitting a model for that distribution when it describes two, correlated variables (x and y). the final values a and b are the "standard deviations" of the underlying, uncorrelated distribution (ie with the rotation removed).
chapter 15 of numerical recipes describes things in more detail (especially section 15.6).
if your data really do follow this distribution then a and b give the relative sizes of your ellipse. you can scale that ellipse to include any given fraction of the distribution.
disclaimer - i am not 100% certain about this next bit: in particular, the usual ellipse plotted is one that contains 68% of the data (it's a convention), which you would get by multiplying a and b by the square root of 2.3 (see the reference above to numerical recipes).
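as a hedged sketch of the scaling: for an ideal two-dimensional gaussian, the ellipse with a and b multiplied by k encloses a fraction 1 - exp(-k*k/2) of the distribution, so k = sqrt(-2*ln(1-p)) for a fraction p. for p = 0.683 that gives k*k of about 2.3, in line with the number quoted above. the function names and the default of 100 steps below are illustrative assumptions, and real data will only follow this as far as it is actually gaussian.

```python
import math

def scale_for_fraction(p):
    """Factor for a and b so the ellipse encloses a fraction p (0 <= p < 1)
    of an ideal two-dimensional gaussian."""
    return math.sqrt(-2.0 * math.log(1.0 - p))

def ellipse_outline(xbar, ybar, theta, a, b, p=0.683, steps=100):
    """Points on the scaled ellipse, in the original x, y coordinates."""
    k = scale_for_fraction(p)
    c, s = math.cos(theta), math.sin(theta)
    points = []
    for i in range(steps + 1):
        t = 2.0 * math.pi * i / steps
        # Point on the un-rotated ellipse.
        X = k * a * math.cos(t)
        Y = k * b * math.sin(t)
        # Undo X = c*dx - s*dy, Y = s*dx + c*dy, then add the mean back.
        dx = c * X + s * Y
        dy = -s * X + c * Y
        points.append((xbar + dx, ybar + dy))
    return points
```

the returned list of (x, y) pairs can be drawn as a closed line over the scatterplot in whatever plotting tool you use.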
posted by andrew cooke at 6:44 AM on April 14, 2006