Mendelian Randomization

What is Mendelian Randomization

= a statistical method, based on instrumental variables (IV) from causal inference, that uses genetic variants to assess the causal relationship between a risk factor (exposure) and an outcome
It is often used in biomedical and social science research
It can be univariable (one exposure) or multivariable (several exposures estimated jointly)

How does it work

Intuition (why "Mendelian")

Because alleles are randomly allocated from parents to offspring at conception (following Mendel's laws), genetic variants are generally independent of confounders.
-> If the genetic variants associated with the exposure are also associated with the outcome, this provides evidence for a causal effect.

The logic is similar to a “natural randomized controlled trial”.

Step by step

Identify instrumental variables (IVs)

select genetic variants (usually SNPs) that are strongly associated with the exposure.
mathematically, for each SNP $G$ and exposure $X$ :

$$X = α + γ G + ϵ_{X}$$

where $γ$ represents the effect of the SNP on the exposure.
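
As a minimal sketch of this first-stage regression, the Python snippet below simulates one SNP coded as an allele count (0, 1 or 2) and recovers $γ$ by ordinary least squares. The sample size, allele frequency, true coefficients, and function names are illustrative assumptions, not values from a real study. It also reports the first-stage F-statistic, a common gauge of how strongly the SNP is associated with the exposure.

```python
import random


def simulate_exposure(n=5000, maf=0.3, alpha=1.0, gamma=0.5, seed=0):
    """Simulate genotypes G and exposure X = alpha + gamma * G + noise (assumed values)."""
    rng = random.Random(seed)
    # Allele count 0/1/2: two independent draws with minor allele frequency `maf`.
    g = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
    x = [alpha + gamma * gi + rng.gauss(0.0, 1.0) for gi in g]
    return g, x


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def first_stage(g, x):
    """OLS of X on G: returns (alpha_hat, gamma_hat, F-statistic)."""
    gamma_hat = cov(g, x) / cov(g, g)
    alpha_hat = sum(x) / len(x) - gamma_hat * sum(g) / len(g)
    r2 = cov(g, x) ** 2 / (cov(g, g) * cov(x, x))
    f_stat = r2 * (len(g) - 2) / (1.0 - r2)  # F-statistic for a single regressor
    return alpha_hat, gamma_hat, f_stat


g, x = simulate_exposure()
alpha_hat, gamma_hat, f_stat = first_stage(g, x)
print(f"alpha_hat={alpha_hat:.3f}  gamma_hat={gamma_hat:.3f}  F={f_stat:.1f}")
```

With these assumed settings the estimate of $γ$ should land close to the simulated value of 0.5, and the F-statistic should be far above the usual rule-of-thumb threshold of 10.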

Check independence from confounders

ensure that $G$ is not associated with confounders $C$ of the exposure-outcome relationship:

$$\mathrm{Cov}(G, C) = 0$$
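
Unmeasured confounders cannot be tested directly, so in practice this condition can only be probed against covariates that were actually measured. As a hedged sketch, the snippet below computes the correlation between a simulated genotype and a simulated covariate that are independent by construction, with a large-sample normal approximation for the p-value. The data, the covariate, and the function names are illustrative assumptions.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def association_test(g, c):
    """Return (r, two-sided p-value) for association between genotype g and covariate c."""
    n = len(g)
    r = pearson_r(g, c)
    # t-statistic for a correlation; treated as standard normal, which assumes large n.
    z = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return r, p


rng = random.Random(1)
n = 5000
g = [int(rng.random() < 0.3) + int(rng.random() < 0.3) for _ in range(n)]
c = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent of g by construction
r, p = association_test(g, c)
print(f"r={r:.4f}  p={p:.3f}")
```

A correlation near zero here is consistent with the independence condition for that one covariate; it says nothing about confounders that were not measured.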

Estimate the effect of exposure on outcome

use the genetic instrument to estimate the causal effect $β$ of the exposure $X$ on outcome $Y$ :

$$Y = β X + ϵ_{Y}$$

since $X$ is partially determined by $G$ , the instrumental variable (IV) estimate of $β$ can be computed as:

$$\hat{β}_{IV} = \frac{\mathrm{Cov}(G, Y)}{\mathrm{Cov}(G, X)}$$
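
To see why the ratio estimator is useful, the sketch below simulates data in which an unobserved confounder $U$ affects both $X$ and $Y$, then compares a naive regression of $Y$ on $X$ with the ratio of covariances above. All parameter values (true $β$ = 0.3, $γ$ = 0.5, allele frequency 0.3) and the names `simulate` and `cov` are assumptions made for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, beta=0.3, gamma=0.5, maf=0.3, seed=2):
    """Simulate G, X, Y where an unobserved confounder U drives both X and Y."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = int(rng.random() < maf) + int(rng.random() < maf)  # allele count 0/1/2
        u = rng.gauss(0.0, 1.0)                                 # unobserved confounder
        xi = gamma * gi + u + rng.gauss(0.0, 1.0)
        yi = beta * xi + u + rng.gauss(0.0, 1.0)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()
beta_ols = cov(x, y) / cov(x, x)  # naive slope of Y on X, distorted by U
beta_iv = cov(g, y) / cov(g, x)   # ratio (Wald) estimator using G as instrument
print(f"naive OLS: {beta_ols:.3f}   IV ratio: {beta_iv:.3f}   true beta: 0.3")
```

Under this simulated setup the naive slope is pulled well above 0.3 by the confounder, while the ratio estimate stays close to the true value because $G$ is independent of $U$.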

Interpret causal effect
- if $\hat{β}_{IV}$ is significantly different from zero, it suggests a causal effect of the exposure on the outcome, provided the key assumptions hold.
- this estimate is less likely than a conventional observational estimate to be biased by confounding or reverse causation, because genotypes are allocated at random and fixed at conception.
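
One way to make "significantly different from zero" concrete is a z-test on the ratio estimate. The sketch below assumes summary statistics for a single SNP: its regression coefficient on the exposure (`beta_gx`), its coefficient on the outcome (`beta_gy`), and the standard error of the latter (`se_gy`). It uses a first-order delta-method approximation that ignores uncertainty in `beta_gx`, and the input numbers are made up for illustration.

```python
import math


def wald_ratio_test(beta_gx, beta_gy, se_gy):
    """Ratio estimate with an approximate standard error, z-score and two-sided p-value.

    Assumes a single instrument and treats beta_gx as known (first-order approximation).
    """
    estimate = beta_gy / beta_gx
    se = se_gy / abs(beta_gx)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return estimate, se, z, p_value


# Hypothetical summary statistics, not taken from any real study.
estimate, se, z, p_value = wald_ratio_test(beta_gx=0.5, beta_gy=0.15, se_gy=0.03)
print(f"estimate={estimate:.3f}  se={se:.3f}  z={z:.2f}  p={p_value:.2e}")
```

Dividing the SNP-outcome coefficient by the SNP-exposure coefficient gives the same quantity as the ratio of covariances, since both coefficients share the denominator $\mathrm{Var}(G)$.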

Key Assumptions

Relevance: genetic variants are associated with the exposure
Independence: genetic variants are independent of confounders
Exclusion restriction: genetic variants affect the outcome only through the exposure

Advantages

reduces confounding compared to observational studies
can help distinguish correlation from causation
useful for exposures that cannot be randomized experimentally

Limitations

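weak instruments (variants only weakly associated with the exposure) can bias results; a first-stage F-statistic below about 10 is a common warning sign
horizontal pleiotropy (genetic variants affecting the outcome through pathways other than the exposure) violates the exclusion restriction
requires large sample sizes for sufficient statistical power, because individual variants usually explain only a small share of the variance in the exposure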
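
To illustrate the pleiotropy limitation, the sketch below adds a direct effect `delta` of the variant on the outcome that bypasses the exposure. In that case the ratio estimate converges to $β + δ/γ$ rather than $β$. The parameter values and the function name `iv_estimate` are illustrative assumptions.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def iv_estimate(n=20000, beta=0.3, gamma=0.5, delta=0.0, maf=0.3, seed=3):
    """Ratio estimate when G also affects Y directly with effect `delta` (pleiotropy)."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = int(rng.random() < maf) + int(rng.random() < maf)  # allele count 0/1/2
        xi = gamma * gi + rng.gauss(0.0, 1.0)
        yi = beta * xi + delta * gi + rng.gauss(0.0, 1.0)  # delta * gi bypasses X
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return cov(g, y) / cov(g, x)


valid = iv_estimate(delta=0.0)   # exclusion restriction holds
biased = iv_estimate(delta=0.1)  # exclusion restriction violated
print(f"no pleiotropy: {valid:.3f}   with pleiotropy: {biased:.3f}   true beta: 0.3")
```

With the assumed values the biased estimate sits near 0.3 + 0.1/0.5 = 0.5, which shows how even a small direct effect is amplified when the variant's effect on the exposure is modest.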