What does the normality assumption mean?

The normality assumption states that the random component of a statistical model follows a normal distribution.
In most applied analyses, this refers to the distribution of model residuals rather than the observed outcome itself.
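The distinction between outcome and residuals can be illustrated with a small simulation. The sketch below assumes a hypothetical two-group design (all values are simulated, not taken from any study): the pooled outcome is bimodal and clearly non-normal, yet the residuals from the group-means model are approximately normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical two-group design: group means differ, errors are normal.
group = np.repeat([0, 1], 500)
outcome = np.where(group == 0, 10.0, 20.0) + rng.normal(0.0, 1.0, size=group.size)

# Residuals from the group-means model (the fit used by a two-group ANOVA).
group_means = np.array([outcome[group == g].mean() for g in (0, 1)])
residuals = outcome - group_means[group]

# Excess kurtosis is about 0 for a normal distribution; the pooled outcome
# is bimodal, which makes its excess kurtosis strongly negative.
outcome_kurtosis = stats.kurtosis(outcome)
residual_kurtosis = stats.kurtosis(residuals)
print(f"excess kurtosis, raw outcome: {outcome_kurtosis:.2f}")
print(f"excess kurtosis, residuals:   {residual_kurtosis:.2f}")
```

A normality check applied to the raw outcome here would wrongly suggest a violation, even though the model's error term is normal by construction.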

A normal distribution is symmetric and bell-shaped, with most observations concentrated around the mean and progressively fewer observations in the tails.
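To make "concentrated around the mean" concrete, the following minimal sketch computes the share of a normal distribution lying within one, two, and three standard deviations of the mean, using the standard normal cumulative distribution function from SciPy.

```python
from scipy.stats import norm

# Probability mass of a normal distribution within k standard deviations of the mean.
within = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
```

The three shares are approximately 0.68, 0.95, and 0.997.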

Statistical methods that rely on the normality assumption

Many commonly used statistical methods rely on normality, primarily through assumptions about model errors or joint distributions.

Regression models

  • Linear regression
  • Analysis of covariance models
  • Linear mixed-effects models

Mean comparison tests

  • One-sample, two-sample, and paired t tests
  • Analysis of variance models
  • Repeated-measures analysis of variance

Multivariate methods

  • Multivariate analysis of variance
  • Discriminant analysis
  • Canonical correlation analysis

Other parametric methods

  • Principal component analysis under normal-theory inference
  • Factor analysis using maximum likelihood estimation
  • Structural equation modelling

Univariate and multivariate normality

Univariate normality concerns the distribution of a single variable or residual.
This is the form of normality most commonly assessed in practice.

Multivariate normality concerns the joint distribution of two or more variables.
Many multivariate methods depend on this assumption, and it can fail even when each individual variable appears approximately normal.

Apparent univariate normality does not guarantee multivariate normality, and failure to recognise this distinction
can invalidate multivariate inference.
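A standard textbook counterexample shows why. In the sketch below (simulated data; the variable names are illustrative), x is standard normal and y equals x multiplied by an independent random sign. Each variable is normal on its own, but the pair is not jointly normal, because the sum x + y is exactly zero about half the time, which is impossible for a normally distributed sum with non-zero variance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 10_000

x = rng.normal(size=n)
sign = rng.choice([-1.0, 1.0], size=n)  # random sign, independent of x
y = sign * x                            # y is also standard normal

# The margin of y looks normal: skewness and excess kurtosis are near 0.
y_skew = stats.skew(y)
y_kurtosis = stats.kurtosis(y)
print(f"y skewness: {y_skew:.3f}, y excess kurtosis: {y_kurtosis:.3f}")

# If (x, y) were jointly normal, x + y would be normal too. Here it is
# exactly 0 whenever sign == -1, so roughly half of its values are zero.
share_zero = np.mean(x + y == 0.0)
print(f"share of x + y equal to zero: {share_zero:.2f}")
```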

How to assess normality

Normality assessment should prioritise graphical diagnostics over formal hypothesis tests.
This approach is discussed in detail by Oppong and Agbedra (2016) in
"Assessing univariate and multivariate normality. A guide for non-statisticians" (see Key references).

Graphical methods

  • Histograms with normal overlays
  • Quantile-quantile (Q-Q) plots
  • Residual diagnostics
  • Scatter and contour plots for multivariate assessment
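As one way to put the residual and Q-Q diagnostics above into practice, the sketch below fits a straight line to simulated data, extracts the residuals, and computes the coordinates of a normal Q-Q plot with scipy.stats.probplot. The data and coefficients are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated data for illustration: linear trend plus normal noise.
x = np.linspace(0.0, 10.0, 200)
y = 1.5 + 0.8 * x + rng.normal(0.0, 1.0, size=x.size)

# Fit a straight line by least squares and compute the residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# probplot returns the Q-Q coordinates and a least-squares line through them.
(theoretical_q, ordered_resid), (qq_slope, qq_intercept, qq_r) = stats.probplot(
    residuals, dist="norm"
)
print(f"Q-Q correlation: {qq_r:.3f}")
```

Points that fall close to a straight line (a Q-Q correlation near 1) are consistent with normal residuals. Passing plot=matplotlib.pyplot to probplot draws the figure directly, which is usually more informative than the correlation alone.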

Formal statistical tests

  • Shapiro-Wilk test
  • Kolmogorov-Smirnov test
  • Anderson-Darling test

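Formal tests should be interpreted cautiously because their behaviour depends strongly on sample size: in small samples they have little power to detect even substantial departures from normality, while in large samples they flag deviations too small to matter in practice.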
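The effect of sample size can be seen in a short simulation. The sketch below is an illustration under assumed settings (a t distribution with 10 degrees of freedom as a mildly non-normal population, 200 replications per sample size); it estimates how often the Shapiro-Wilk test rejects normality at n = 30 and at n = 3000 for the same underlying distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def rejection_rate(n, reps=200, alpha=0.05):
    """Share of simulated samples of size n for which Shapiro-Wilk rejects normality."""
    rejections = 0
    for _ in range(reps):
        sample = rng.standard_t(df=10, size=n)  # mildly heavy-tailed, not normal
        _, p_value = stats.shapiro(sample)
        if p_value < alpha:
            rejections += 1
    return rejections / reps

rate_small = rejection_rate(30)
rate_large = rejection_rate(3000)
print(f"rejection rate at n=30:   {rate_small:.2f}")
print(f"rejection rate at n=3000: {rate_large:.2f}")
```

The population is identical in both cases, so any difference in rejection rates reflects sample size alone rather than a change in how non-normal the data are.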

Consequences of violating normality

  • Incorrect type I error rates
  • Reduced statistical power
  • Misleading confidence intervals

The severity of these consequences depends on the magnitude of deviation, sample size, and analysis method.
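A simulation makes the dependence on sample size tangible. The sketch below assumes exponentially distributed data (strongly right-skewed) with a true mean of 1 and repeatedly applies a one-sample t test of the true null hypothesis; the share of rejections estimates the actual type I error at a nominal 5% level. The distribution, sample sizes, and replication count are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def type_one_error(n, reps=5000, alpha=0.05):
    """Empirical type I error of a one-sample t test on exponential data."""
    # Exponential data with true mean 1, so the null hypothesis (mean = 1) holds.
    samples = rng.exponential(scale=1.0, size=(reps, n))
    _, p_values = stats.ttest_1samp(samples, popmean=1.0, axis=1)
    return float(np.mean(p_values < alpha))

error_small_n = type_one_error(10)
error_large_n = type_one_error(500)
print(f"type I error at n=10:  {error_small_n:.3f}")
print(f"type I error at n=500: {error_large_n:.3f}")
```

With skewed data the small-sample error rate tends to exceed the nominal level, while the large-sample rate moves back towards it.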

Alternatives when normality is violated

Data transformation

  • Log transformation
  • Square root transformation
  • Box-Cox transformation
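As a minimal sketch of the transformations listed above, the example below applies scipy.stats.boxcox to simulated log-normal data. The data are invented for illustration; Box-Cox requires strictly positive values, and an estimated lambda near 0 corresponds to a log transformation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Simulated right-skewed, strictly positive data (Box-Cox requires positive values).
raw = rng.lognormal(mean=0.0, sigma=0.8, size=1000)

# With lmbda left unset, boxcox estimates lambda by maximum likelihood.
transformed, fitted_lambda = stats.boxcox(raw)

skew_raw = stats.skew(raw)
skew_transformed = stats.skew(transformed)
print(f"estimated lambda: {fitted_lambda:.2f}")
print(f"skewness before: {skew_raw:.2f}, after: {skew_transformed:.2f}")
```

Results after transformation are on the transformed scale, so estimates and confidence intervals need to be interpreted or back-transformed accordingly.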

Nonparametric methods

  • Wilcoxon signed-rank test
  • Mann-Whitney U test
  • Kruskal-Wallis test
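For two independent groups, a rank-based comparison might look like the following sketch, which uses scipy.stats.mannwhitneyu on small hypothetical samples (the numbers are made up and include one large value in each group).

```python
from scipy import stats

# Hypothetical skewed measurements from two independent groups.
group_a = [1.1, 1.4, 1.9, 2.3, 2.8, 3.5, 4.1, 9.7]
group_b = [3.9, 4.6, 5.2, 6.0, 7.4, 8.8, 12.5, 30.2]

# The test uses only the ranks of the pooled observations, so the
# extreme values carry no more weight than their rank position.
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```

The analogous SciPy functions for the other tests in the list are scipy.stats.wilcoxon (paired data) and scipy.stats.kruskal (three or more groups).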

Model-based alternatives

  • Generalized linear models
  • Robust regression
  • Bootstrap-based inference
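Of the alternatives above, bootstrap-based inference is the simplest to sketch. The example below computes a percentile bootstrap confidence interval for a mean using only NumPy; the simulated exponential sample and the choice of 5000 resamples are assumptions for illustration, and other bootstrap variants (for example bias-corrected intervals) may be preferable in practice.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated skewed sample standing in for real data.
sample = rng.exponential(scale=2.0, size=80)

# Resample with replacement and record the mean of each bootstrap sample.
n_boot = 5000
idx = rng.integers(0, sample.size, size=(n_boot, sample.size))
boot_means = sample[idx].mean(axis=1)

# Percentile bootstrap 95% confidence interval for the mean.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean: {sample.mean():.2f}")
print(f"95% bootstrap CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval is built from the empirical distribution of resampled means, so it does not rely on the residuals being normal.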

Practical guidance

  • Focus on residuals rather than raw data
  • Use graphical diagnostics first
  • Consider sensitivity analyses

Key references

  • Oppong FB, Agbedra SY. Assessing univariate and multivariate normality. A guide for non-statisticians. Mathematical Theory and Modeling. 2016;6(2):26–33.
  • Altman DG. Practical Statistics for Medical Research.
  • Harrell FE. Regression Modeling Strategies.

Summary

Normality is a model-based assumption that applies to residuals and joint distributions rather than raw outcomes.
Understanding which methods rely on this assumption and how to assess it appropriately is essential for valid statistical inference.