When analyzing a sample drawn from a population, the goal is often to infer the variability of the underlying distribution. The standard deviation is the primary measure of this dispersion, yet its estimation from limited data is frequently misunderstood. An estimator is unbiased when its expected value equals the true parameter, a property that many inference procedures rely on. For the standard deviation, unbiasedness is harder to achieve than it is for the variance: it requires an additional correction that depends on the sample size.
Understanding the Core Challenge
The most common source of confusion is the difference between estimating the variance and estimating the standard deviation. The sample variance \(s^2\), calculated by dividing the sum of squared deviations from the sample mean by \(N-1\), is an unbiased estimator of the population variance. This adjustment, known as Bessel's correction, compensates for the fact that deviations are measured from the sample mean, which lies closer to the data than the true population mean does and so makes the raw sum of squares too small on average. Unbiasedness holds only for the squared quantity, however: taking the square root to return to the original units of measurement introduces a downward bias.
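The effect of Bessel's correction can be checked with a small simulation. The sketch below is illustrative only: the function names, the sample size of 5, the trial count, and the seed are assumptions chosen for the example, and the data are drawn from a standard normal distribution so that the true variance is 1.

```python
import random

def variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by len(xs) - ddof."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

def average_variance_estimates(n=5, trials=50_000, seed=0):
    """Average both variance estimators over many standard-normal samples."""
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total_div_n += variance(xs, ddof=0)          # divide by N
        total_div_n_minus_1 += variance(xs, ddof=1)  # divide by N - 1 (Bessel)
    return total_div_n / trials, total_div_n_minus_1 / trials

if __name__ == "__main__":
    div_n, div_n_minus_1 = average_variance_estimates()
    print(f"divide by N:     {div_n:.3f}")
    print(f"divide by N - 1: {div_n_minus_1:.3f}")
```

With the true variance equal to 1, the average of the divide-by-N estimator should settle near \((N-1)/N\), while the Bessel-corrected average should settle near 1, up to simulation noise.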
The Jensen's Inequality Explanation
The theoretical foundation for this downward shift is Jensen's inequality, a fundamental result in convex analysis. Because the square root function is strictly concave, the expected value of the square root of a random variable is less than the square root of its expected value whenever the variable is not constant. Applied to the sample variance, this gives \(E[\sqrt{s^2}] < \sqrt{E[s^2]}\). Since \(s^2\) is unbiased for the population variance \(\sigma^2\), the right-hand side equals \(\sigma\), so \(s = \sqrt{s^2}\) underestimates the true standard deviation on average. The naive estimator is therefore biased for every finite sample size, although the size of the bias shrinks as the sample grows.
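The inequality can be observed numerically. The following is a minimal sketch, assuming standard-normal data (so \(\sigma = 1\)); the function name, trial count, and seed are hypothetical choices for illustration. It compares the average of \(s\) with the square root of the average of \(s^2\) over many samples.

```python
import math
import random

def mean_sd_vs_root_mean_variance(n, trials=50_000, seed=0):
    """Return (average of s, sqrt of average of s^2) over simulated samples."""
    rng = random.Random(seed)
    sum_s = 0.0
    sum_s2 = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # unbiased sample variance
        sum_s2 += s2
        sum_s += math.sqrt(s2)                        # naive standard deviation
    return sum_s / trials, math.sqrt(sum_s2 / trials)

if __name__ == "__main__":
    for n in (3, 10, 30):
        mean_s, root_mean_s2 = mean_sd_vs_root_mean_variance(n)
        print(f"N={n:3d}  E[s] ~ {mean_s:.4f}  sqrt(E[s^2]) ~ {root_mean_s2:.4f}")
```

The second column should stay close to 1 for every sample size, while the first should fall below it, with the gap narrowing as the sample size increases.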
Correction Factors for Unbiased Estimation
To obtain an unbiased estimate of the standard deviation, statisticians divide the sample standard deviation \(s\) by a correction factor, usually denoted \(c_4(N)\). For normally distributed data, \(s\sqrt{N-1}/\sigma\) follows a chi distribution with \(N-1\) degrees of freedom, and taking its expected value gives \(E[s] = c_4(N)\,\sigma\) with \(c_4(N) = \sqrt{2/(N-1)}\;\Gamma(N/2)/\Gamma((N-1)/2)\). Dividing \(s\) by \(c_4(N)\) therefore yields an estimate whose expected value equals \(\sigma\). For small samples the correction is substantial: with \(N=3\) the factor is approximately 0.886, so the uncorrected standard deviation is on average about 11% too low. As the sample size grows, the factor approaches 1 and the bias becomes negligible.
Sample size (N) | c4(N) | Approximate bias of s
3 | 0.8862 | -11.4%
10 | 0.9727 | -2.7%
30 | 0.9914 | -0.9%
100 | 0.9975 | -0.25%
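The correction factor is straightforward to compute. The sketch below assumes normally distributed data and uses the standard closed form \(c_4(N) = \sqrt{2/(N-1)}\,\Gamma(N/2)/\Gamma((N-1)/2)\); the function names `c4` and `unbiased_sd` are chosen for this example and do not refer to any particular library.

```python
import math

def c4(n):
    """Correction factor c4(n) for normal data, so that E[s] = c4(n) * sigma."""
    if n < 2:
        raise ValueError("c4 needs at least two observations")
    # Work with log-gamma so the gamma ratio does not overflow for large n.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

def unbiased_sd(xs):
    """Sample standard deviation (N - 1 divisor) divided by c4(N)."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return s / c4(n)

if __name__ == "__main__":
    for n in (3, 10, 30, 100):
        factor = c4(n)
        print(f"N={n:3d}  c4={factor:.6f}  bias of s={100 * (factor - 1):+.2f}%")
```

Because the factor depends only on the sample size, it can be tabulated once and reused. The correction is exact only under normality; for other distributions the same division is at best an approximation.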