- Not merely making the commonplace observation that any particular threshold is arbitrary
- For example, only a small change is required to move an estimate from a 5.1% significance level to 4.9%
- Three familiar cautions: statistical significance is not the same as practical importance; dichotomizing results into significant and nonsignificant encourages dismissing observed differences in favor of the usually less interesting null hypothesis of no difference; and any particular threshold for declaring significance is arbitrary
- Bring attention to this additional error of interpretation
- Students and practitioners should be made more aware that the difference between “significant” and “not significant” is not itself statistically significant
- Changes in statistical significance are not themselves significant
- Introductory courses regularly warn students about the perils of strict adherence to a particular threshold such as the 5% significance level
- Automatic use of a binary significant/nonsignificant decision rule encourages practitioners to ignore potentially important observed differences
- Focus only on the less widely known but equally important error of comparing two or more results by comparing their degree of statistical significance
- We might think that “everybody knows” that comparing significance levels is inappropriate, but we have seen this mistake all the time in practice
- In making a comparison between two treatments, one should look at the statistical significance of the difference rather than the difference between their significance levels
- Comparisons of the sort “X is statistically significant but Y is not” can be misleading
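The advice to test the difference itself, rather than compare significance levels, can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the two estimates are treated as independent and approximately normal, the numbers (25 and 10, each with standard error 10) are hypothetical values chosen for the example, and the helper names `two_sided_p` and `compare_estimates` are made up here rather than taken from any library.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z statistic, using a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2))


def compare_estimates(est_x, se_x, est_y, se_y):
    """Compare two estimates directly.

    Assumption: the estimates are independent, so the variance of their
    difference is the sum of their variances.
    """
    diff = est_x - est_y
    se_diff = math.sqrt(se_x ** 2 + se_y ** 2)
    return {
        "z_x": est_x / se_x,
        "z_y": est_y / se_y,
        "diff": diff,
        "se_diff": se_diff,
        "z_diff": diff / se_diff,
    }


# Hypothetical inputs: X = 25 (SE 10), Y = 10 (SE 10).
result = compare_estimates(25.0, 10.0, 10.0, 10.0)
p_x = two_sided_p(result["z_x"])
p_y = two_sided_p(result["z_y"])
p_diff = two_sided_p(result["z_diff"])

print(f"X alone:    z = {result['z_x']:.2f}, p = {p_x:.3f}")
print(f"Y alone:    z = {result['z_y']:.2f}, p = {p_y:.3f}")
print(
    f"Difference: {result['diff']:.1f} (SE {result['se_diff']:.1f}), "
    f"z = {result['z_diff']:.2f}, p = {p_diff:.3f}"
)
```

With these inputs, X clears the 5% threshold and Y does not, yet the difference between them is 15 with a standard error of about 14, which is well short of significance. Reporting "X is significant but Y is not" would therefore overstate how different the two results are.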
Two examples of errors
Homosexuality and the number of older brothers and sisters
- In these data, homosexuality is more strongly associated with number of older brothers than with number of older sisters. However, no evidence is presented that would indicate that this difference is statistically significant
Health effects of low frequency electromagnetic fields
- The researchers in the chick-brain experiment made the common mistake of using statistical significance as a criterion for separating the estimates of different effects, an approach that does not make sense. At the very least, it is more informative to show the estimated treatment effect and standard error at each frequency