Missing Data Mechanisms

Understanding the mechanism behind missing data is crucial for selecting appropriate analytical methods.

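Missing Completely at Random (MCAR)

The probability that a value is missing is unrelated to both the observed and the unobserved data.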

Missing at Random (MAR)

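The probability that a value is missing depends on observed data but, given those data, not on the missing values themselves. Valid analysis therefore has to account for the observed variables that predict missingness.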


Example: In a study measuring cholesterol levels, older patients are less likely to attend follow-up blood tests. Whether a measurement is missing depends on the patient's age, which is observed, but among patients of the same age it does not depend on the cholesterol value itself.
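
The cholesterol example can be mimicked with a small simulation. This is only a sketch in Python using the standard library (the language, the cohort size, the age cut-off of 60 and the missingness probabilities are all assumptions made for illustration, not values from any study). It shows that under this kind of mechanism the missingness rate differs by age, while the rule that creates the gaps never looks at the cholesterol value.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical cohort: age is always observed, cholesterol may be missing.
patients = []
for _ in range(2000):
    age = random.randint(30, 80)
    cholesterol = random.gauss(5.0, 1.0)  # illustrative values only
    # MAR-style rule: the chance of a missing test depends on age alone,
    # never on the cholesterol value itself.
    p_missing = 0.1 if age < 60 else 0.5
    value = None if random.random() < p_missing else cholesterol
    patients.append({"age": age, "cholesterol": value})

def missing_rate(rows):
    """Share of rows whose cholesterol value is missing."""
    return sum(r["cholesterol"] is None for r in rows) / len(rows)

rate_younger = missing_rate([p for p in patients if p["age"] < 60])
rate_older = missing_rate([p for p in patients if p["age"] >= 60])
```

Comparing `rate_younger` with `rate_older` is the kind of check that can reveal dependence on an observed variable. It cannot show whether missingness also depends on the unobserved values.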

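Missing Not at Random (MNAR)

The probability that a value is missing depends on the unobserved value itself, even after accounting for the observed data.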

Statistical Analysis Methods

Choose the right method based on your missing data mechanism and research context.

Listwise Deletion (Complete Case Analysis)

Listwise deletion, also known as complete case analysis, involves analysing only observations that have complete data on all variables included in the analysis. Any case with one or more missing values is excluded entirely.

Applicability by Missing Data Mechanism

  • ✔ Missing Completely at Random (MCAR)
    🟢 Recommended
    Produces unbiased estimates, but reduces sample size and statistical power.
  • ⚠ Missing at Random (MAR)
    🟡 Use with caution
    May introduce bias. Valid only in special cases, for example a regression in which missingness depends on covariates in the model but not on the outcome.
  • ✖ Missing Not at Random (MNAR)
    🔴 Not recommended
    Likely to produce biased estimates and should generally be avoided unless missingness is minimal.

Advantages

  • ✔ Simple to implement and available in all statistical software
  • ✔ Preserves observed relationships without introducing imputed values
  • ✔ Appropriate when data are MCAR and sample size is sufficiently large
  • ✔ No additional assumptions beyond the MCAR mechanism

Limitations

  • ⚠ Can substantially reduce sample size, leading to loss of statistical power
  • ⚠ Can produce biased estimates when data are MAR or MNAR
  • ⚠ May result in unrepresentative samples if missingness is systematic
  • ⚠ Inefficient use of available information

Implementation Tips

ℹ Most statistical software applies listwise deletion by default. Analysts should verify whether complete case analysis is being used implicitly. Monitor the number of excluded cases to assess the impact on sample size and potential bias.
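
The advice to monitor excluded cases can be sketched as follows. This is a minimal plain-Python illustration. The dataset, the variable names and the helper `listwise_delete` are hypothetical, and `None` is assumed to mark a missing value.

```python
# Toy dataset; None marks a missing value. All names and values are illustrative.
rows = [
    {"age": 34, "bmi": 22.1, "cholesterol": 4.8},
    {"age": 51, "bmi": None, "cholesterol": 5.6},
    {"age": 67, "bmi": 27.3, "cholesterol": None},
    {"age": 45, "bmi": 24.9, "cholesterol": 5.1},
    {"age": 72, "bmi": None, "cholesterol": None},
]

def listwise_delete(data, variables):
    """Keep only rows with observed values on every analysis variable."""
    return [r for r in data if all(r[v] is not None for v in variables)]

complete = listwise_delete(rows, ["age", "bmi", "cholesterol"])

# Report how much data the deletion cost before running any analysis.
n_excluded = len(rows) - len(complete)
share_excluded = n_excluded / len(rows)
```

Here three of the five cases are dropped, so `share_excluded` is 0.6. Reporting that figure next to the results makes the loss of information visible.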

Pairwise Deletion (Available Case Analysis)

Pairwise deletion, also known as available case analysis, uses all available data for each individual analysis. Statistics are computed using all cases that have observed values for the specific pair of variables involved, rather than requiring complete data across all variables.

Applicability by Missing Data Mechanism

  • ✔ Missing Completely at Random (MCAR)
    🟢 Acceptable
    Retains more data than listwise deletion and produces unbiased estimates.
  • ⚠ Missing at Random (MAR)
    🟡 Use with caution
    May introduce inconsistencies. Bias depends on the pattern of missingness.
  • ✖ Missing Not at Random (MNAR)
    🔴 Not recommended
    Likely to produce biased and inconsistent estimates.

Advantages

  • ✔ Uses more available data than listwise deletion
  • ✔ Maintains larger sample sizes for individual analyses
  • ✔ Can be more efficient when data are MCAR
  • ✔ Simple to implement in many statistical software packages

Limitations

  • ⚠ May produce correlation or covariance matrices that are not positive definite
  • ⚠ Different analyses may rely on different subsets of data
  • ⚠ Sample sizes vary across statistics, complicating interpretation
  • ⚠ Can lead to logically inconsistent results
  • ⚠ Standard errors are difficult to estimate correctly

Implementation Tips

ℹ Pairwise deletion must be explicitly specified in many software packages. For example, in R's cor() function it is requested with use = "pairwise.complete.obs". Use caution when applying this approach in multivariate models, and clearly document which observations contribute to each analysis.
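
To make the varying sample sizes concrete, here is a small plain-Python sketch that computes each correlation from the cases observed on that pair and returns the pair-specific n alongside it. The data and the helper `pairwise_corr` are hypothetical. This is not how any particular package implements the option.

```python
from math import sqrt

# Illustrative data; None marks a missing value.
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, None],
    "y": [2.0, 4.1, None, 8.2, 9.9, 12.0],
    "z": [5.0, None, 3.5, 3.0, None, 1.0],
}

def pairwise_corr(a, b):
    """Pearson correlation over cases where both a and b are observed.

    Returns (r, n) so the pair-specific sample size stays visible.
    """
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)
    mean_a = sum(u for u, _ in pairs) / n
    mean_b = sum(v for _, v in pairs) / n
    sxy = sum((u - mean_a) * (v - mean_b) for u, v in pairs)
    sxx = sum((u - mean_a) ** 2 for u, _ in pairs)
    syy = sum((v - mean_b) ** 2 for _, v in pairs)
    return sxy / sqrt(sxx * syy), n

names = list(data)
results = {
    (p, q): pairwise_corr(data[p], data[q])
    for i, p in enumerate(names)
    for q in names[i + 1:]
}
```

In this toy example the x–y correlation rests on four cases while the other two rest on three, so the entries of the resulting correlation matrix describe different subsets of the sample.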

Mean or Median Imputation

Mean or median imputation replaces missing values with the mean or median of the observed values for that variable. It is a simple single imputation approach often used for exploratory analyses.

Applicability by Missing Data Mechanism

  • ✔ Missing Completely at Random (MCAR)
    🟡 Use with caution
    Mean imputation preserves the observed mean, but both variants understate variability and distort relationships.
  • ⚠ Missing at Random (MAR)
    🔴 Not recommended
    Produces biased estimates and attenuated associations.
  • ✖ Missing Not at Random (MNAR)
    🔴 Not recommended
    Likely to produce misleading results.

Advantages

  • ✔ Very easy to implement
  • ✔ Retains full sample size
  • ✔ Useful for quick descriptive summaries

Limitations

  • ⚠ Underestimates variability
  • ⚠ Distorts correlations and regression coefficients
  • ⚠ Treats imputed values as if they were observed
  • ⚠ Generally unsuitable for inferential analysis

Implementation Tips

ℹ This method should be limited to exploratory analyses. Avoid using it for hypothesis testing or modelling unless clearly justified.
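
The loss of variability is easy to see on a toy variable. The sketch below uses made-up values and only Python's standard `statistics` module. It fills the gaps with the observed mean and compares the sample variance before and after.

```python
from statistics import mean, variance

# Illustrative variable; None marks a missing value.
values = [4.8, None, 5.6, 6.1, None, 5.0, 4.5, None]

observed = [v for v in values if v is not None]
fill = mean(observed)  # statistics.median(observed) would give median imputation

imputed = [fill if v is None else v for v in values]

var_observed = variance(observed)  # sample variance of the 5 observed values
var_imputed = variance(imputed)    # sample variance after filling the 3 gaps
```

The mean is unchanged (5.2), but the variance falls from 0.415 to about 0.237, because the three filled-in values add cases without adding any spread.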

Regression Imputation

Regression imputation replaces missing values using predictions from a regression model fitted to observed data. Each missing value is replaced with a single predicted value.

Applicability by Missing Data Mechanism

  • ✔ Missing Completely at Random (MCAR)
    🟢 Acceptable
    Can recover mean structure but still underestimates uncertainty.
  • ⚠ Missing at Random (MAR)
    🟡 Use with caution
    Depends strongly on correct model specification.
  • ✖ Missing Not at Random (MNAR)
    🔴 Not recommended
    Likely to produce biased results.

Advantages

  • ✔ Uses relationships between variables
  • ✔ Retains full sample size
  • ✔ Simple extension of standard regression models

Limitations

  • ⚠ Underestimates variability and standard errors
  • ⚠ Overstates precision of estimates
  • ⚠ Does not reflect uncertainty in the imputed values

Implementation Tips

ℹ Regression imputation should generally be avoided for final inference. If used, clearly acknowledge its limitations and consider stochastic or multiple imputation alternatives.
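
A minimal sketch of the difference between deterministic and stochastic regression imputation, assuming a single fully observed predictor and a simple linear model. The data, the helper `impute` and the normal-noise assumption are all illustrative choices rather than a prescribed procedure.

```python
import random

# Illustrative data: x fully observed, y missing for some cases (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 11.8]

# Fit y = a + b * x by ordinary least squares on the complete pairs.
pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
n = len(pairs)
mx = sum(xi for xi, _ in pairs) / n
my = sum(yi for _, yi in pairs) / n
b = sum((xi - mx) * (yi - my) for xi, yi in pairs) / sum(
    (xi - mx) ** 2 for xi, _ in pairs
)
a = my - b * mx

# Residual standard deviation, used only by the stochastic variant.
resid_sd = (sum((yi - (a + b * xi)) ** 2 for xi, yi in pairs) / (n - 2)) ** 0.5

def impute(stochastic=False, rng=None):
    """Fill missing y with the fitted value, optionally plus random noise."""
    rng = rng or random.Random(0)
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            yi = a + b * xi
            if stochastic:
                yi += rng.gauss(0.0, resid_sd)  # restore residual scatter
        filled.append(yi)
    return filled

y_deterministic = impute()
y_stochastic = impute(stochastic=True)
```

The deterministic values fall exactly on the fitted line, which is why they overstate the strength of the relationship. The stochastic variant adds residual noise back, but a single stochastic imputation still ignores uncertainty in the fitted coefficients.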

Multiple Imputation (MI)

Multiple imputation replaces each missing value with several plausible values drawn from a predictive distribution. Analyses are performed on each imputed dataset and results are combined to reflect uncertainty due to missing data.

Applicability by Missing Data Mechanism

  • ✔ Missing Completely at Random (MCAR)
    🟢 Recommended
    Produces unbiased estimates and valid inference.
  • ✔ Missing at Random (MAR)
    🟢 Recommended
    Widely accepted as the preferred approach.
  • ⚠ Missing Not at Random (MNAR)
    🟡 Use with caution
    Requires additional assumptions or sensitivity analyses.

Advantages

  • ✔ Accounts for uncertainty in missing values
  • ✔ Produces valid standard errors and confidence intervals
  • ✔ Flexible and widely supported
  • ✔ Suitable for complex models

Limitations

  • ⚠ Requires careful model specification
  • ⚠ Computationally more intensive
  • ⚠ Can be challenging to implement correctly

Implementation Tips

ℹ Include all variables related to missingness and the analysis model in the imputation process. Always check convergence and perform diagnostics.
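
The combining step is usually done with Rubin's rules. The text above does not name them, so treating them as the intended method is an assumption. The sketch below pools a point estimate and its variance across m imputed datasets. The function name `pool_rubin` and the input numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical results from m = 5 imputed datasets.
estimates = [1.0, 1.2, 0.9, 1.1, 0.8]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]

pooled, total_var = pool_rubin(estimates, variances)
pooled_se = total_var ** 0.5
```

With these inputs the pooled estimate is 1.0 and the total variance is 0.074, which is larger than the average within-imputation variance of 0.044. The difference is the extra uncertainty attributable to the missing data.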

Maximum Likelihood (ML) and Full Information Maximum Likelihood (FIML)

Maximum likelihood based approaches estimate model parameters directly using all available data without explicitly imputing missing values. FIML is commonly used in structural equation and longitudinal models.

Applicability by Missing Data Mechanism

  • ✔ Missing Completely at Random (MCAR)
    🟢 Recommended
    Produces unbiased and efficient estimates.
  • ✔ Missing at Random (MAR)
    🟢 Recommended
    Performs well when model assumptions are met.
  • ⚠ Missing Not at Random (MNAR)
    🟡 Use with caution
    Requires explicit modelling of missingness.

Advantages

  • ✔ Uses all available information
  • ✔ No need to create imputed datasets
  • ✔ Statistically efficient

Limitations

  • ⚠ Relies on correct model specification
  • ⚠ Less flexible than multiple imputation in some settings
  • ⚠ Not available for all types of analyses

Implementation Tips

ℹ Ensure the analysis model is correctly specified and assess model fit carefully. Report assumptions clearly.
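
As an illustration of what "using all available data" means, the casewise log-likelihood that FIML maximises can be written out for one common special case, a multivariate normal model with mean vector \mu and covariance matrix \Sigma. The normality assumption and the notation are introduced here for illustration. Other models have analogous forms.

```latex
\ell(\mu, \Sigma) = \sum_{i=1}^{n} \left[
  -\frac{p_i}{2}\log(2\pi)
  - \frac{1}{2}\log\lvert\Sigma_i\rvert
  - \frac{1}{2}\,(y_i - \mu_i)^{\top}\,\Sigma_i^{-1}\,(y_i - \mu_i)
\right]
```

Here y_i holds the p_i values actually observed for case i, and \mu_i and \Sigma_i are the corresponding sub-vector of \mu and sub-matrix of \Sigma. Each case contributes through whatever variables it has, so no case is discarded and no value is imputed.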

Sensitivity Analysis

Sensitivity analysis evaluates how results change under different assumptions about the missing data mechanism. It is particularly important when MNAR cannot be ruled out.

Applicability by Missing Data Mechanism

  • ⚠ Missing Completely at Random (MCAR)
    🟡 Optional
    Not typically necessary under MCAR, but can provide additional confidence.
  • ✔ Missing at Random (MAR)
    🟢 Recommended
    Good practice to verify that conclusions are robust to departures from MAR.
  • ⬛ Missing Not at Random (MNAR)
    🔵 Essential
    Critical when MNAR is suspected. Helps quantify the impact of untestable assumptions.

Advantages

  • ✔ Makes assumptions explicit
  • ✔ Improves transparency and credibility
  • ✔ Helps decision makers interpret uncertainty

Limitations

  • ⚠ Does not identify the true missing data mechanism
  • ⚠ Results depend on chosen scenarios

Implementation Tips

ℹ Predefine sensitivity scenarios where possible and report results alongside primary analyses rather than as an afterthought.
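
One simple way to set up predefined scenarios is a delta adjustment: imputed values are shifted by a fixed amount and the estimate is recomputed for each shift. The text above does not prescribe a technique, so this choice is an assumption. In the sketch, the data, the delta grid and the use of the observed mean as a stand-in for a MAR-based imputation are all illustrative. In practice the baseline would more plausibly come from multiple imputation.

```python
from statistics import mean

# Illustrative outcome with missing values (None).
outcome = [5.1, None, 4.7, 6.0, None, 5.4, None, 4.9]

observed = [v for v in outcome if v is not None]
base_fill = mean(observed)  # stand-in for an imputation made under MAR

def estimate_under_delta(delta):
    """Mean outcome when every imputed value is shifted by delta."""
    filled = [base_fill + delta if v is None else v for v in outcome]
    return mean(filled)

# Pre-specified scenarios; delta = 0 reproduces the MAR-style analysis.
scenarios = [-1.0, -0.5, 0.0, 0.5, 1.0]
sensitivity = {d: estimate_under_delta(d) for d in scenarios}
```

Reporting the whole `sensitivity` table, rather than only the delta = 0 row, shows how far the unobserved values would need to differ from the MAR prediction before the conclusion changes.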

Method                    | MCAR           | MAR            | MNAR               | Key Notes
--------------------------|----------------|----------------|--------------------|--------------------------------------------------------
Listwise Deletion         | 🟢 Recommended | 🟡 Caution     | 🔴 Not recommended | Simple but can greatly reduce sample size
Pairwise Deletion         | 🟢 Acceptable  | 🟡 Caution     | 🔴 Not recommended | Uses more data but may give inconsistent results
Mean or Median Imputation | 🟡 Caution     | 🔴 Not recommended | 🔴 Not recommended | Distorts variability and relationships
Regression Imputation     | 🟢 Acceptable  | 🟡 Caution     | 🔴 Not recommended | Underestimates uncertainty
Multiple Imputation       | 🟢 Recommended | 🟢 Recommended | 🟡 Caution         | Preferred general purpose approach
Maximum Likelihood / FIML | 🟢 Recommended | 🟢 Recommended | 🟡 Caution         | Efficient if model is correctly specified
Sensitivity Analysis      | 🟡 Optional    | 🟢 Recommended | 🔵 Essential       | Assesses robustness to assumptions; critical under MNAR