Beware the F-test (or, how to compare variances).
The authors thank Van Valen and Miller for their inspirational previous work on this topic, and the referees, who helped us clarify the submission significantly. DHod is supported by NERC standard grant NE/L007770/1 and by NERC International Opportunities Fund NE/N006798/1, and DHos by the Leverhulme Trust (RF-2015-001).
Reason for embargo
This is the author accepted manuscript. It is currently under an indefinite embargo pending publication by Elsevier Masson.
Biologists commonly compare variances among samples to test whether the underlying populations have equal spread. However, despite warnings from statisticians, incorrect testing is rife. Here we show that one of the most commonly employed of these tests, the F-test, is extremely sensitive to deviations from Normality. The F-test suffers greatly elevated false-positive rates when the underlying distributions are heavy-tailed, a feature of a distribution that is very hard to detect using standard Normality tests. We highlight and assess a selection of parametric, jackknife and permutation tests, consider their performance in terms of both false positives and power to detect a signal when one exists, and then show correct methods for comparing measures of variation among samples. Based on these assessments, we recommend using Levene’s test, the Box-Anderson test, jackknifing or permutation tests to compare variances when Normality is in doubt. Levene’s and Box-Anderson tests are the most powerful at small sample sizes, but the Box-Anderson test may not control Type I error for extremely heavy-tailed distributions. As has been noted before, do not use F-tests to compare variances.
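As a rough illustration of the false-positive problem described above, the following sketch draws two samples from the same heavy-tailed population and counts how often each test wrongly rejects equal variances. It is not the authors' simulation code. The Student-t population with 3 degrees of freedom, the sample size of 20 per group, the number of replicates, the seed, the median-centred (Brown-Forsythe) form of Levene's test, and every function name (`reg_inc_beta`, `f_cdf`, `f_test_p`, `levene_p`, `heavy_tailed_sample`, `false_positive_rates`) are hypothetical choices made for this example. It uses only the Python standard library, so F-distribution probabilities come from a hand-written regularized incomplete beta function evaluated by a standard continued fraction.

```python
import math
import random
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued-fraction part of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """CDF of the F distribution with (d1, d2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_test_p(x, y):
    """Two-sided variance-ratio F-test; only valid if both samples are Normal."""
    ratio = statistics.variance(x) / statistics.variance(y)
    cdf = f_cdf(ratio, len(x) - 1, len(y) - 1)
    return min(1.0, 2.0 * min(cdf, 1.0 - cdf))


def levene_p(x, y):
    """Levene's test, median-centred (Brown-Forsythe) form, for two samples."""
    groups = []
    for g in (x, y):
        med = statistics.median(g)
        groups.append([abs(v - med) for v in g])
    n_total = len(x) + len(y)
    grand = sum(sum(g) for g in groups) / n_total
    means = [statistics.fmean(g) for g in groups]
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    w = (n_total - 2) * between / within  # one-way ANOVA F on absolute deviations, k = 2
    return 1.0 - f_cdf(w, 1, n_total - 2)


def heavy_tailed_sample(rng, n, df=3):
    """Student-t draws: standard Normal divided by sqrt(chi-square(df) / df)."""
    return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
            for _ in range(n)]


def false_positive_rates(n=20, n_sims=2000, alpha=0.05, seed=1):
    """Fraction of simulations in which each test rejects equal variances."""
    rng = random.Random(seed)
    f_hits = lev_hits = 0
    for _ in range(n_sims):
        # Both samples come from the same population, so any rejection is a false positive.
        x = heavy_tailed_sample(rng, n)
        y = heavy_tailed_sample(rng, n)
        f_hits += f_test_p(x, y) < alpha
        lev_hits += levene_p(x, y) < alpha
    return f_hits / n_sims, lev_hits / n_sims


if __name__ == "__main__":
    f_rate, lev_rate = false_positive_rates()
    print(f"F-test false-positive rate: {f_rate:.3f}")
    print(f"Levene false-positive rate: {lev_rate:.3f}")
```

Because both samples always come from one population, every rejection is a false positive. Under these assumptions the F-test's rate typically lands well above the nominal 0.05, while the median-centred Levene rate stays close to it; replacing `heavy_tailed_sample` with plain `rng.gauss` draws should bring the F-test back near 0.05, which is the Normal case it was designed for.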
Awaiting citation and DOI
- Biosciences