normality - Global Statistical Consult

What does the normality assumption mean?

The normality assumption states that the random component of a statistical model follows a normal distribution.
In most applied analyses, this refers to the distribution of model residuals rather than the observed outcome itself.
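
The distinction between outcome and residuals can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library; the two-group design, the group means, and the helper name excess_kurtosis are assumptions made for illustration and do not come from any real study.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis: close to 0 for normally distributed data."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    fourth = statistics.fmean((v - mean) ** 4 for v in values)
    return fourth / var ** 2 - 3.0


rng = random.Random(1)

# Hypothetical two-group study: normal errors around very different group means.
group_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
group_b = [rng.gauss(6.0, 1.0) for _ in range(2000)]

# The pooled outcome is bimodal, so it is clearly not normal.
outcome = group_a + group_b

# Residuals from the simplest model for these data: outcome minus its group mean.
mean_a = statistics.fmean(group_a)
mean_b = statistics.fmean(group_b)
residuals = [y - mean_a for y in group_a] + [y - mean_b for y in group_b]

raw_kurtosis = excess_kurtosis(outcome)
residual_kurtosis = excess_kurtosis(residuals)
print(f"excess kurtosis, raw outcome: {raw_kurtosis:.2f}")
print(f"excess kurtosis, residuals:   {residual_kurtosis:.2f}")
```

In this simulated setting the raw outcome departs strongly from normality while the residuals do not, so a check applied to the raw outcome would wrongly suggest that the model assumption fails.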

A normal distribution is symmetric and bell-shaped, with most observations concentrated around the mean and progressively fewer observations in the tails.

Statistical methods that rely on the normality assumption

Many commonly used statistical methods rely on normality, primarily through assumptions about model errors or joint distributions.

Regression models

Linear regression
Analysis of covariance models
Linear mixed-effects models

Mean comparison tests

One-sample, two-sample, and paired t-tests
Analysis of variance models
Repeated measures analysis of variance

Multivariate methods

Multivariate analysis of variance
Discriminant analysis
Canonical correlation analysis

Other parametric methods

Principal component analysis under normal theory inference
Factor analysis using maximum likelihood estimation
Structural equation modelling

Univariate and multivariate normality

Univariate normality concerns the distribution of a single variable or residual.
This is the form of normality most commonly assessed in practice.

Multivariate normality concerns the joint distribution of two or more variables.
Many multivariate methods depend on this stronger assumption, and it can fail even when each variable appears approximately normal on its own.

Univariate normality of every variable does not guarantee multivariate normality, and failure to recognise this distinction
can invalidate multivariate inference.
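
A standard textbook construction shows how two variables can each be normal without being jointly normal. The sketch below is illustrative only: it uses simulated data and the Python standard library, and the variable names are assumptions.

```python
import random
import statistics

rng = random.Random(7)
n = 5000

x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Flip the sign of each x with probability 0.5.
# By symmetry of the normal distribution, y is also N(0, 1).
y = [xi if rng.random() < 0.5 else -xi for xi in x]

# If (x, y) were bivariate normal, x + y would itself be normal (or constant).
# Here x + y is exactly zero whenever the sign was flipped and 2x otherwise,
# so about half of the sums pile up on a single point.
sums = [xi + yi for xi, yi in zip(x, y)]
share_zero = sum(1 for s in sums if s == 0.0) / n

print(f"sd of x: {statistics.pstdev(x):.2f}, sd of y: {statistics.pstdev(y):.2f}")
print(f"share of x + y exactly equal to 0: {share_zero:.2f}")
```

Separate checks of x and y would find nothing unusual here, which is why a joint assessment is needed before relying on methods that assume multivariate normality.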

How to assess normality

Normality assessment should prioritise graphical diagnostics over formal hypothesis tests.
This approach is discussed in detail in Oppong and Agbedra (2016),
Assessing univariate and multivariate normality. A guide for non-statisticians.

Graphical methods

Histograms with normal overlays
Quantile-quantile (Q-Q) plots
Residual diagnostics
Scatter and contour plots for multivariate assessment
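
The coordinates behind a normal quantile-quantile plot are simple to compute. The following is a minimal sketch, assuming simulated residuals and the plotting positions (i - 0.5) / n, which is one common convention among several; the helper names qq_points and pearson_r are hypothetical. Plotting the returned pairs with any charting tool gives the usual diagnostic.

```python
import random
from statistics import NormalDist, fmean, pstdev


def qq_points(values):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered


def pearson_r(a, b):
    """Pearson correlation, used here as a rough measure of straightness."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = fmean((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    return cov / (pstdev(a) * pstdev(b))


rng = random.Random(3)
# Simulated residuals: one set normal, one set right-skewed.
normal_residuals = [rng.gauss(0.0, 2.0) for _ in range(500)]
skewed_residuals = [rng.expovariate(1.0) - 1.0 for _ in range(500)]

r_normal = pearson_r(*qq_points(normal_residuals))
r_skewed = pearson_r(*qq_points(skewed_residuals))
print(f"Q-Q correlation, normal residuals: {r_normal:.3f}")
print(f"Q-Q correlation, skewed residuals: {r_skewed:.3f}")
```

Points from normal residuals fall close to a straight line, so the correlation is near 1. The correlation is only a numerical summary; the shape of the departure (curvature, heavy tails, outliers) is best judged from the plot itself.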

Formal statistical tests

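Shapiro-Wilk test
Kolmogorov-Smirnov test (with the Lilliefors correction when the mean and variance are estimated from the data)
Anderson-Darling test

Formal tests should be interpreted cautiously because their behaviour depends strongly on sample size: in small samples they have little power to detect even substantial departures from normality, while in large samples they flag departures too small to affect the analysis.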
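
The effect of sample size on a formal test can be seen in a short simulation. The tests listed above need a statistics library (for example, SciPy provides scipy.stats.shapiro), so this sketch substitutes the Jarque-Bera test, which is not one of the tests named here but can be written with the standard library alone. The mildly skewed gamma distribution, the sample sizes, and the function names are assumptions chosen for illustration, and the chi-square approximation used for the p-value is rough in small samples.

```python
import math
import random
from statistics import fmean


def jarque_bera_p(values):
    """Asymptotic p-value of the Jarque-Bera normality test."""
    n = len(values)
    mean = fmean(values)
    m2 = fmean((v - mean) ** 2 for v in values)
    m3 = fmean((v - mean) ** 3 for v in values)
    m4 = fmean((v - mean) ** 4 for v in values)
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Under normality JB is approximately chi-square with 2 degrees of
    # freedom, whose upper-tail probability is exp(-jb / 2).
    return math.exp(-jb / 2.0)


def rejection_rate(sample_size, replicates, rng, alpha=0.05):
    """Share of simulated samples in which the test rejects normality."""
    rejections = 0
    for _ in range(replicates):
        # Gamma with shape 50 is only mildly skewed (skewness about 0.28).
        sample = [rng.gammavariate(50.0, 1.0) for _ in range(sample_size)]
        if jarque_bera_p(sample) < alpha:
            rejections += 1
    return rejections / replicates


rng = random.Random(11)
rate_small = rejection_rate(30, 200, rng)
rate_large = rejection_rate(20000, 20, rng)
print(f"rejection rate at n = 30:    {rate_small:.2f}")
print(f"rejection rate at n = 20000: {rate_large:.2f}")
```

The same mild departure from normality is rarely detected in the small samples and detected in every large sample, even though it matters least for inference when the sample is large.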

Consequences of violating normality

Incorrect type I error rates
Reduced statistical power
Misleading confidence intervals

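The severity of these consequences depends on the size and type of the departure from normality, the sample size, and the analysis method.
Methods based on means are often robust to moderate departures when samples are large, whereas small samples and strongly skewed or heavy-tailed errors cause more serious problems.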
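
A simulation can make the first consequence concrete. The sketch below estimates the type I error rate of a two-sided one-sample t-test at the nominal 5% level with n = 10, once for normal data and once for exponential data. The distributions, the sample size, and the function name are assumptions for illustration; the critical value 2.262 for 9 degrees of freedom is hard-coded because the standard library has no t distribution.

```python
import math
import random

N = 10
T_CRIT_DF9 = 2.262  # two-sided 5% critical value of Student's t, 9 df


def type_one_error(draw, true_mean, rng, replicates=10000):
    """Share of one-sample t-tests that reject a true null hypothesis."""
    rejections = 0
    for _ in range(replicates):
        sample = [draw(rng) for _ in range(N)]
        mean = sum(sample) / N
        sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (N - 1))
        t_stat = (mean - true_mean) / (sd / math.sqrt(N))
        if abs(t_stat) > T_CRIT_DF9:
            rejections += 1
    return rejections / replicates


rng = random.Random(5)
# The null hypothesis is true in both cases: the tested mean is the real mean.
rate_normal = type_one_error(lambda r: r.gauss(0.0, 1.0), 0.0, rng)
rate_skewed = type_one_error(lambda r: r.expovariate(1.0), 1.0, rng)
print(f"type I error, normal data:      {rate_normal:.3f}")
print(f"type I error, exponential data: {rate_skewed:.3f}")
```

With normal data the rejection rate stays close to the nominal 5%, while the strongly skewed data push it above that level at this small sample size. Repeating the exercise with a larger n (and the matching critical value) would show the gap narrowing.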

Alternatives when normality is violated

Data transformation

Log transformation
Square root transformation
Box-Cox transformation
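
The three transformations are closely related: the Box-Cox family contains the log transformation (lambda = 0) and, up to a linear rescaling, the square root transformation (lambda = 0.5). The sketch below is a minimal illustration on simulated lognormal data using the standard library; the grid search over lambda and the helper names are assumptions, and a numerical optimiser could be used instead.

```python
import math
import random
from statistics import fmean, pvariance


def skewness(values):
    """Sample skewness: close to 0 for symmetric data."""
    mean = fmean(values)
    m2 = fmean((v - mean) ** 2 for v in values)
    m3 = fmean((v - mean) ** 3 for v in values)
    return m3 / m2 ** 1.5


def box_cox(values, lam):
    """Box-Cox transform for strictly positive data; lam = 0 is the log."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_loglik(values, lam):
    """Profile log-likelihood of lam, up to an additive constant."""
    n = len(values)
    transformed = box_cox(values, lam)
    return (-0.5 * n * math.log(pvariance(transformed))
            + (lam - 1.0) * sum(math.log(v) for v in values))


rng = random.Random(2)
# Simulated right-skewed, strictly positive data (lognormal by construction).
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]

skew_raw = skewness(raw)
skew_log = skewness(box_cox(raw, 0.0))

# Coarse grid search for the lambda that maximises the profile likelihood.
grid = [i / 10 for i in range(-20, 21)]
best_lam = max(grid, key=lambda lam: box_cox_loglik(raw, lam))

print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
print(f"Box-Cox lambda chosen on the grid: {best_lam:.1f}")
```

Because the data were simulated as lognormal, the selected lambda should fall near 0, which corresponds to the log transformation. With real data the transformation should be applied within the model and the residuals rechecked afterwards.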

Nonparametric methods

Wilcoxon signed-rank test
Mann-Whitney U test
Kruskal-Wallis test
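
Rank-based tests replace the observed values with their ranks, which is why they do not depend on normality. As an illustration, the Mann-Whitney U statistic can be sketched as follows. This is a simplified version, assuming no tied values, no continuity correction, and a normal approximation for the p-value (exact tables are preferable for samples this small); the data and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p-value by normal approximation).

    Assumes no tied values; ties would need mid-ranks and a variance
    correction, which this sketch omits.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    # Ranks run from 1 to n_a + n_b over the pooled, sorted values.
    rank_sum_a = sum(rank for rank, (_, label) in enumerate(pooled, start=1)
                     if label == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Hypothetical skewed measurements, for example lengths of stay in days.
control = [1.1, 1.9, 2.4, 3.0, 3.7, 4.2, 5.8, 9.5]
treated = [4.9, 6.3, 7.7, 8.1, 10.4, 12.6, 15.2, 21.0]

u_stat, p_value = mann_whitney_u(control, treated)
print(f"U = {u_stat:.0f}, approximate two-sided p = {p_value:.4f}")
```

For real analyses an established implementation that handles ties and small samples is the safer choice.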

Model-based alternatives

Generalized linear models
Robust regression
Bootstrap-based inference
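
Of these alternatives, the bootstrap is the simplest to sketch in a few lines. The example below computes a percentile bootstrap confidence interval for a mean, which is only one of several bootstrap interval types. The simulated sample, the number of replicates, and the function name are assumptions for illustration.

```python
import random
from statistics import fmean


def bootstrap_mean_ci(values, rng, replicates=2000, level=0.95):
    """Percentile bootstrap confidence interval for the mean."""
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(
        fmean(rng.choices(values, k=n)) for _ in range(replicates)
    )
    tail = (1.0 - level) / 2.0
    lower = boot_means[int(tail * replicates)]
    upper = boot_means[int((1.0 - tail) * replicates) - 1]
    return lower, upper


rng = random.Random(13)
# Simulated right-skewed sample; real data would replace this list.
sample = [rng.expovariate(0.5) for _ in range(60)]

lower, upper = bootstrap_mean_ci(sample, rng)
print(f"sample mean: {fmean(sample):.2f}")
print(f"95% percentile bootstrap interval: ({lower:.2f}, {upper:.2f})")
```

The interval is built from the resampled means rather than from a normal-theory formula, so it can be asymmetric around the sample mean when the data are skewed. Very small samples limit how well any bootstrap interval performs.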

Practical guidance

Focus on residuals rather than raw data
Use graphical diagnostics first
Consider sensitivity analyses

Key references

Oppong FB, Agbedra SY. Assessing univariate and multivariate normality. A guide for non-statisticians. Mathematical Theory and Modeling. 2016;6(2):26–33.
Altman DG. Practical Statistics for Medical Research. London: Chapman and Hall; 1991.
Harrell FE. Regression Modeling Strategies. 2nd ed. Cham: Springer; 2015.

Summary

Normality is a model-based assumption that applies to residuals and joint distributions rather than raw outcomes.
Understanding which methods rely on this assumption and how to assess it appropriately is essential for valid statistical inference.