Julia Piaskowski
February 24, 2026
“My data is not normal. I did a Shapiro-Wilk test on the data and it failed that test. What transformation should I use?”
\[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]
\(Y_i\) = dependent variable (there is only 1)
\(X_i\) = independent variable(s) (there may be many)
\(\beta_0\) = model intercept, the overall mean of \(Y\)
\(\beta_1\) = how \(Y\) changes with \(X\)
\(\epsilon_i\) = model residual, the gap between the predicted value for \(Y_i\) and its observed value
\[ \epsilon_i \sim N(0, \sigma)\]
The residuals are normally distributed with a mean of zero and some standard deviation that we will estimate during the model-fitting process.
All model residuals share this same distribution; that is, they are identically distributed.
The residuals are independent of each other (i.e. they are uncorrelated with each other)
The residuals should have no correlation with other model parameters (e.g. treatment effects)
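A minimal sketch of checking these assumptions visually in base R, using the built-in mtcars data as a stand-in for a real experiment (the model and data are my choice of example, not from the original):

```r
m <- lm(mpg ~ wt, data = mtcars)   # stand-in model

# Normality of residuals: points should fall near the reference line
qqnorm(resid(m)); qqline(resid(m))

# Independence/homogeneity: no pattern against the fitted values
plot(fitted(m), resid(m)); abline(h = 0)
```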
Technically in a linear model, \(\bar{y_{i.}}\), the mean value for a population, follows a normal distribution for a set of known parameters:
\[ \bar{y_{i.}} \sim N(x_i \beta, \sigma^2/r_i) \]
We often cannot observe this distribution well with our limited number of reps, but if we repeated a study many times, a normal distribution for \(\bar{y_{i.}}\) would emerge.
Often, the residuals are normally distributed, and hence model expectations are met!
Linear models are considered robust to modest departures from normality; the parameter estimates will still be reliable, and the results from ANOVA will be approximately correct.
The Shapiro-Wilk test is not great for assessing normality because it is very sensitive to sample size: large data sets may fail the test purely from sampling variation, while small data sets pass because there are too few observations to detect a departure.
The function rnorm() can be used to generate random values that follow a normal distribution, that we can test for normality using shapiro.test().
Shapiro-Wilk normality test
data: rnorm(60)
W = 0.98675, p-value = 0.7603
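To see the sample-size sensitivity directly, here is a small demonstration (mine, not from the original slides): the same mildly heavy-tailed t-distribution tends to pass the test at n = 20 and to fail it at n = 5000.

```r
set.seed(42)
# Mildly heavy-tailed data: a t-distribution with 10 degrees of freedom
shapiro.test(rt(20, df = 10))     # small n: usually p > 0.05
shapiro.test(rt(5000, df = 10))   # large n: often p < 0.05
```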
In probability theory, the central limit theorem states that, under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution.
–Wikipedia
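The theorem can be seen in a short simulation (my own sketch, not from the original): the mean of even strongly skewed draws settles into a bell shape.

```r
set.seed(1)
# 1000 sample means, each computed from 30 draws of a right-skewed
# exponential distribution (true mean = 1)
means <- replicate(1000, mean(rexp(30, rate = 1)))
hist(means, main = "Sampling distribution of the mean")  # roughly bell-shaped
```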
Non-parametric hypothesis tests (No distributional assumptions)
wilcox.test() (two-sample Wilcoxon/Mann-Whitney)
kruskal.test() (Kruskal-Wallis)
wilcox.test(..., paired = TRUE) (Wilcoxon signed-rank)
Permutation tests are another distribution-free option:
Calculate the test statistic from your data (e.g. difference in treatment means).
Simulate data in which the test statistic is null (e.g. the difference in treatment means is zero).
Find out how often your test statistic occurs in the simulated data; the probability of this occurring is your p-value.
library(nlme)  # for lme()

sim_func <- function(){
  N <- nrow(Alfalfa)
  # Permute the yields, breaking any association with the treatments
  Alfalfa$yield2 <- sample(Alfalfa$Yield, size = N, replace = FALSE)
  # Fit the mixed model to the permuted data; wrap in try() in case a
  # permutation fails to converge
  perm_mod <- try(lme(yield2 ~ Variety + Date,
                      random = ~ 1|Block/Date,
                      data = Alfalfa))
  if(!inherits(perm_mod, "try-error")) return(perm_mod)
}
perms <- replicate(500, sim_func(), simplify = FALSE)
perm_anova <- lapply(perms, anova, type = "marginal") |>     # conduct ANOVA on each permutation
  dplyr::bind_rows() |>                                      # some data formatting
  tibble::rownames_to_column(var = "variable") |>
  dplyr::mutate(variable = stringr::str_extract(variable, "[A-Za-z]+")) |>
  dplyr::filter(variable != "Intercept")                     # remove the intercept (optional step)

  variable numDF denDF  F-value   p-value
1  Variety     2    46 1.076281 0.3492884
2     Date     3    15 1.607674 0.2294616
3  Variety     2    46 1.369906 0.2643028
4     Date     3    15 1.373157 0.2891074
My cut-off is usually an absolute value of 4 for studentized residuals (99.99% of data falls within 4 units of the mean). This removes very large values, and usually not too many. Only very large data sets need a larger cut-off.
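That rule of thumb can be applied with base R's rstudent(). A sketch with a stand-in lm() model on mtcars (my choice of example data); substitute your own fitted model and data frame:

```r
m <- lm(mpg ~ wt, data = mtcars)            # stand-in model
out_idx <- which(abs(rstudent(m)) > 4)      # rows beyond +/- 4
dat_clean <- if (length(out_idx) > 0) mtcars[-out_idx, ] else mtcars
# refit the model on dat_clean if any rows were dropped
```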
Zero Inflated Models and Generalized Linear Mixed Models with R (2012) by Alain Zuur, Anatoly Saveliev, and Elena Ieno
GLMM FAQs by Ben Bolker (more resources on mixed models from Dr. Bolker)