# VARIANCE in IDL

• Subject: VARIANCE in IDL
• From: ashmall(at)my-dejanews.com (Justin Ashmall)
• Date: Tue, 23 Feb 1999 12:11:54 GMT
• Newsgroups: comp.lang.idl-pvwave
• Organization: Imperial College
• Xref: news.doit.wisc.edu comp.lang.idl-pvwave:13705

```Dear All,

I have a question regarding the variance as calculated by IDL - I expect to
get thoroughly flamed by some statistician types but I'm keen to know if I'm
wrong!

I always thought the definition of variance was the mean of the squares of the
differences from the mean, i.e.:

VARIANCE = { SUM [ (x - mean_x)^2 ] } / N

and this is what I *thought* I was getting from IDL - it wasn't until I was
testing a prog to calculate the means and variances of rows and columns of an
array that I spotted that IDL's variance has N-1 as the denominator:

VARIANCE = { SUM [ (x - mean_x)^2 ] } / (N-1)

Now I realise the latter ( let's call it Var(n-1) ) is the best estimate of
the variance of the overall population, if my data is a sample from that
population, but that's not what I want (or expect) from the variance function.

More worrying is the fact that this isn't mentioned in any way in the on-line
help for the VARIANCE function (although the equation does appear in the help
on the MOMENT function). Perhaps a keyword to the function would be in order
so you could select if you wanted "population estimate" or "sample" variance
at the very least.

As a simple example, take Var(n) and Var(n-1) of the numbers 1, 2, 3, 4, 5.
The mean is obviously 3, but I would say the variance is 2.0 (Var(n)), not
2.5 as given by IDL (Var(n-1)).

I'd be interested to hear if my definition of variance is correct and whether
other people made the same assumption regarding variance as myself.
Incidentally, I use IDL 5.1.1.

Thanks,

Justin

```
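The arithmetic in the post can be checked with a short sketch. This is written in Python rather than IDL, and the function names `variance_n` and `variance_n_minus_1` are illustrative only; they are not IDL routines. It assumes nothing beyond the two formulas quoted in the message.

```python
def variance_n(values):
    """Mean of the squared deviations from the mean (denominator N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def variance_n_minus_1(values):
    """Sum of squared deviations divided by N-1 (the form the post reports for IDL)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(data)

print(variance_n(data))          # 2.0
print(variance_n_minus_1(data))  # 2.5

# The two differ only by a scale factor, so an N-1 result can be
# converted to the N form by multiplying by (N-1)/N.
print(variance_n_minus_1(data) * (n - 1) / n)  # 2.0
```

Since the two definitions differ only by the factor (N-1)/N, a result computed with the N-1 denominator can be rescaled to the N form without recomputing the sum of squares.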