Wednesday, March 13, 2013

Covariance of Samples and Populations

Recently I was asked by a customer why the covariance values obtained by the function CalcCovar of the SDL Suite deviate from the covariance values calculated by the function COVAR of Microsofts Excel(tm).

The explanation for the difference is quite simple: Excel assumes that you are dealing with populations, while the SDL Suite assumes that you are working with samples (which is much more realistic).

In order to convert the covariance of a population to the covariance of a sample you have to multiply the covariance by a factor of n/(n-1), with n being the number of values used for the calculation.

Let me add a personal remark: I never use the statistics package of Excel as it has many built-in flaws and peculiarities which guide the inexperienced or unwary user into the wrong direction.

To be specific: in many cases Excel assumes that a set of, for example, five measurements describes a population and not a sample - an assumption which is rather daring and will mislead millions of world-wide users of this package. However, Microsoft is working to improve on these issues (see the Office blog)....

No comments:

Post a Comment