Applied ANOVA in R

Navigating the R linear model wilderness

Last updated on Oct 14, 2022 35 min read R, ANOVA, linear models

xkcd commentary on p-values

Introduction

ANOVA in R is a unfortunately a bit complicated. Unlike SAS, ANOVA functions in R lack a consistent structure, consistent output and the accessory packages for ANOVA display a patchwork of compatibility. The result is that it is easy to misspecify a model or make other mistakes. The information below is intended to serve as a guide through the R ANOVA wilderness.

Packages Needed

There are many packages to load. Here is a (very) brief summary of what each package does.

Package	Purpose
car	Anova() function to extract type III & II sums of squares
lme4	mixed models
nlme	mixed models, non-linear models, alternative covariance structures
emmeans	for extracting least squares means and contrasts
lmer test	improved summary functions of lmer objects
dplyr	data organization
forcats	for managing categorical data
agridat	has many agricultural data sets
agricolae	has options for many common agricultural experimental designs

library(car)
library(lme4)
library(nlme)
library(emmeans) #in older version of R, you may need to install "multcompView" separately to access full functionality of the emmeans package
library(lmerTest)
library(dplyr)
library(forcats)
library(agridat)

Formula Notation

There are some consistent features across ANOVA methods in R. Formula notation is often used in the R syntax for ANOVA functions. It looks like this: $Y ~ X, where Y is the dependent variable (the response) and X is/are the independent variable(s) (e.g. the experimental treatments).

my_formula <- formula(Y ~ treatment1 + treatment2)
class(my_formula)

## [1] "formula"

my_formula

## Y ~ treatment1 + treatment2

Often the independent variables (i,e, the treatments or the x variables) are expected to be factors, another type of R object:

my_var <- c(rep("low",5), rep("high", 5))
class(my_var) #check what variable type it is

## [1] "character"

Although “my_var” is not type factor, it is type “character” which is automatically converted to a factor in lm(), lmer(), lme() and many other linear modeling functions. There are some packages that do not follow this convention, so it’s helpful to read function documentation, especially if you get unexpected results.

Variables like year, which are often imported as a number or integer, do need to be converted to a factor or a character variable prior to analysis. Otherwise, they will be interpreted as a number in linear modelling and treated as a covariate, e.g, 2020 would be 2,020. Here is one way to do this conversion:

my_factor <- as.character(my_var) # convert to a character
class(my_factor) # check variable type to confirm

## [1] "character"

my_factor <- as.factor(my_var) # convert to a factor
class(my_factor) # check variable type again to confirm

## [1] "factor"

The choice of whether to convert a categorical variable to a character or factor depends on the comfort of the user with these structures and package requirements.

Sometimes, there is a need to alter the order of treatment levels (that is, how R sees those levels). The default behavior of R is to order categorical levels alphanumerically. However, sometimes there are reasons you may not want this (for example, you want to set a particular reference level as the first factor level).

Below is one example of how to reorder factor levels in a variable. The first step is to see which levels are present in the variable and how they are ordered:

levels(my_factor)

## [1] "high" "low"

Once that is known, you can use that information to manually set the levels and their order. Note that spelling of each level much match what is actually present in the variable. Unmatched levels in the variable will be set to NA automatically by R in the following step.

my_factor <- factor(my_factor, levels = c("low", "high")) 
levels(my_factor) # check the new ordering

## [1] "low"  "high"

Knowing the level order is important because in the implementation of ANOVA in R, the first level is treated as the reference level. Manipulating factors is a challenging task in R. The package forcats contains a collection of accessory functions for managing factors (“forcats” = for categories). The tutorial uses the forcats function fct_drop().

More on formulas:

The formula first shown, Y ~ treatment1 + treatment2, includes main effects only. Other formula notation includes the symbols : and *, indicating notation for interaction only and main effects plus the interaction term, respectively.

formula(Y ~ treatment1:treatment2) # interaction only

## Y ~ treatment1:treatment2

formula(Y ~ treatment1*treatment2) # interaction plus main effects

## Y ~ treatment1 * treatment2

These two formulas are equivalent:

formula(Y ~ treatment1 + treatment2 + treatment1:treatment2) 
formula(Y ~ treatment1*treatment2)

Perhaps you can see from these examples that formulas are a really just a collections of characters (that is, a string) and exist independent of any data set. Later, we will need to link these formulas to a data set to actually construct a linear model and conduct statistical analysis.

ANOVA for fixed effects models

Here is a function for reporting the number of missing data in each column. There are other ways to do this, but I find this function easy enough to write and use.

count_na <- function(df) {
  apply(df, 2, function(x) sum(is.na(x))) 
}

Completely Randomised design

First, load the data set “warpbreaks” (a data set from base R). This is an old data set with variables for wool type (A and B) and tension on the loom (L, M or H). The response variable is “breaks”, the number of times the wool thread breaks on industrial looms.

I always like to have a quick look at the data before running any statistical tests. So, here we go:

data(warpbreaks)
count_na(warpbreaks)

##  breaks    wool tension 
##       0       0       0

str(warpbreaks)

## 'data.frame':	54 obs. of  3 variables:
##  $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
##  $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
##  $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...

warpbreaks$wool <- factor(warpbreaks$wool, levels = c("A", "B", "C"))

table(warpbreaks$wool, warpbreaks$tension)

##    
##     L M H
##   A 9 9 9
##   B 9 9 9
##   C 0 0 0

hist(warpbreaks$breaks, col = "gold")

boxplot(breaks ~ wool, data = warpbreaks, col = "orangered")

boxplot(breaks ~ tension, data = warpbreaks, col = "chartreuse") #why not have colorful plots?

This data set has 2 treatments. We don’t know if there is an interaction between the variables, yet. A good start is to run a linear model using lm() function, the linear regression function. As a reminder, ANOVA is a special case of the linear regression model where the predictors (the experimental treatments) are categories rather than a continuous variable.

# run standard linear model for main effects only
lm.mod1 <- lm(breaks ~ wool + tension, data = warpbreaks)

# extract type III sums of squares from that model
Anova(lm.mod1, type = "3")

## Anova Table (Type III tests)
## 
## Response: breaks
##              Sum Sq Df  F value    Pr(>F)    
## (Intercept) 20827.0  1 154.3226 < 2.2e-16 ***
## wool          450.7  1   3.3393  0.073614 .  
## tension      2034.3  2   7.5367  0.001378 ** 
## Residuals    6747.9 50                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# run a linear model with main effects and interactions
lm.mod2 <- lm(breaks ~ wool*tension, data = warpbreaks)

# ...and type III sums of squares 
Anova(lm.mod2, type = "III")

## Anova Table (Type III tests)
## 
## Response: breaks
##               Sum Sq Df  F value    Pr(>F)    
## (Intercept)  17866.8  1 149.2757 2.426e-16 ***
## wool          1200.5  1  10.0301 0.0026768 ** 
## tension       2468.5  2  10.3121 0.0001881 ***
## wool:tension  1002.8  2   4.1891 0.0210442 *  
## Residuals     5745.1 48                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

FYI

functions only shown as an example and not actually run.

# this function runs type II sums of squares: 
Anova(lm.mod2, type = "II")
# this function runs type I sums of squares: 
anova(lm.mod2)

A few comments on types of sums of squares:

As a reminder, the type of sums of squares used in statistical tests can impact the results and subsequent interpretation. Type I, sums of squares tests for statistical significance by adding one variable to the model at time (and hence is also called “sequential”). If there is any unbalance in the treatments, the type I sums of squares are dependent on the order variables are added to the model and hence is often not the best choice for many agricultural experiment. Type III sums of squares (also called “partial” or “marginal”) evaluates the statistical significance of variable or interaction, assuming that the other variables are in the model. This is a decent default option. The last option is Type II sums of squares, which is the best option when you are sure there are no interactions between variables. If there is complete balance among treatments (each treatment is observed the same number of times with no missing data), then there is no need to concern yourself with these different types of sums of squares.

Compare Models

# conduct an F test comparing the models
anova(lm.mod1, lm.mod2)

## Analysis of Variance Table
## 
## Model 1: breaks ~ wool + tension
## Model 2: breaks ~ wool * tension
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     50 6747.9                              
## 2     48 5745.1  2    1002.8 4.1891 0.02104 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# also, consider doing a stepwise approach for finding the best model:
step(lm.mod2)

## Start:  AIC=264.02
## breaks ~ wool * tension
## 
##                Df Sum of Sq    RSS    AIC
## <none>                      5745.1 264.02
## - wool:tension  2    1002.8 6747.9 268.71

## 
## Call:
## lm(formula = breaks ~ wool * tension, data = warpbreaks)
## 
## Coefficients:
##    (Intercept)           woolB        tensionM        tensionH  woolB:tensionM  
##          44.56          -16.33          -20.56          -20.00           21.11  
## woolB:tensionH  
##          10.56

Model diagnostics

plot(lm.mod2) #this will produce 4 plots of residuals

shapiro.test(resid(lm.mod2)) #standard shapiro-wilk test.

## 
## 	Shapiro-Wilk normality test
## 
## data:  resid(lm.mod2)
## W = 0.98686, p-value = 0.8162

# this variable could be analyzed with a log-normal model instead

Least squares means & contrasts

The emmeans package is a flexible package for extracting the estimated marginal means (in SAS, the “least squares means”) from different linear models. It is compatible with a large number of R linear modelling packages.

Here is some code for extracting the marginal means and conducting contrasts.

# extract least squares means for 'tension'
(lsm <- emmeans(lm.mod2, ~ tension))

## NOTE: Results may be misleading due to involvement in interactions

##  tension emmean   SE df lower.CL upper.CL
##  L         36.4 2.58 48     31.2     41.6
##  M         26.4 2.58 48     21.2     31.6
##  H         21.7 2.58 48     16.5     26.9
## 
## Results are averaged over the levels of: wool 
## Confidence level used: 0.95

emmeans(lm.mod2, "wool")

## NOTE: Results may be misleading due to involvement in interactions

##  wool emmean   SE df lower.CL upper.CL
##  A      31.0 2.11 48     26.8     35.3
##  B      25.3 2.11 48     21.0     29.5
## 
## Results are averaged over the levels of: tension 
## Confidence level used: 0.95

All pairwise comparisons within each level of tension:

contrast(lsm, "pairwise")

##  contrast estimate   SE df t.ratio p.value
##  L - M       10.00 3.65 48   2.742  0.0229
##  L - H       14.72 3.65 48   4.037  0.0006
##  M - H        4.72 3.65 48   1.295  0.4049
## 
## Results are averaged over the levels of: wool 
## P value adjustment: tukey method for comparing a family of 3 estimates

Conduct custom contrasts comparing ‘Low’ tension versus ‘Medium’ and ‘High’ and ‘High’ versus ‘Medium’ and ‘Low’.

# see the order of each level in a factor
levels(warpbreaks$tension)

## [1] "L" "M" "H"

# construct a list of constructs 
# each item must be same length as the the number of levels present in the variable tension
# use numbers and fracions to indicate the contrasting levels
# the numbers must sum to zero 
cList <- list(LvMH = c(1, -0.5, -0.5), # low vs high + medium
              HvLM = c(0.5, 0.5, -1))  # high vs low + medium

# check that each contrast sums to zero
lapply(cList, sum)

## $LvMH
## [1] 0
## 
## $HvLM
## [1] 0

# perform custom contrast and include a Bonferroni adjustment
summary(contrast(lsm, cList, adjust = "bonferroni"))

##  contrast estimate   SE df t.ratio p.value
##  LvMH        12.36 3.16 48   3.914  0.0006
##  HvLM         9.72 3.16 48   3.078  0.0069
## 
## Results are averaged over the levels of: wool 
## P value adjustment: bonferroni method for 2 tests

Randomised Complete Block Design (RCBD) - fixed effects model

This example uses rapeseed yield data from multiple locations, years and cultivars. Within a single location or year, the replication is often balanced.

Load Data and examine:

data(shafii.rapeseed) # from the 'agridat' package

rapeseed1987 <- shafii.rapeseed %>% filter(year == 87) %>% 
  mutate(block = fct_drop(rep), Cv = fct_drop(gen), loc = fct_drop(loc))

str(rapeseed1987)

## 'data.frame':	216 obs. of  7 variables:
##  $ year : int  87 87 87 87 87 87 87 87 87 87 ...
##  $ loc  : Factor w/ 9 levels "GGA","ID","MT",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ rep  : Factor w/ 4 levels "R1","R2","R3",..: 1 2 3 4 1 2 3 4 1 2 ...
##  $ gen  : Factor w/ 6 levels "Bienvenu","Bridger",..: 1 1 1 1 2 2 2 2 3 3 ...
##  $ yield: num  961 1329 1781 1698 1605 ...
##  $ block: Factor w/ 4 levels "R1","R2","R3",..: 1 2 3 4 1 2 3 4 1 2 ...
##  $ Cv   : Factor w/ 6 levels "Bienvenu","Bridger",..: 1 1 1 1 2 2 2 2 3 3 ...

count_na(rapeseed1987)

##  year   loc   rep   gen yield block    Cv 
##     0     0     0     0     0     0     0

table(rapeseed1987$Cv, rapeseed1987$loc) #experiment has 1 rep per block

##           
##            GGA ID MT NC OR SC TGA TX WA
##   Bienvenu   4  4  4  4  4  4   4  4  4
##   Bridger    4  4  4  4  4  4   4  4  4
##   Cascade    4  4  4  4  4  4   4  4  4
##   Dwarf      4  4  4  4  4  4   4  4  4
##   Glacier    4  4  4  4  4  4   4  4  4
##   Jet        4  4  4  4  4  4   4  4  4

hist(rapeseed1987$yield, col = "gold")

boxplot(yield ~ Cv, data = rapeseed1987, col = "orangered")

boxplot(yield ~ loc, data = rapeseed1987, col = "chartreuse")

Analyse experiment:

# for this example, the analysis will only be done for a single year
# block is nested within location
# if each block had a unique name, 'Error(block)' would suffce
shaf.aov <- aov(yield ~ Cv*loc + Error(block), data = rapeseed1987)

summary(shaf.aov)

## 
## Error: block
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  3 336565  112188               
## 
## Error: Within
##            Df    Sum Sq  Mean Sq F value   Pr(>F)    
## Cv          5   3203992   640798   2.645 0.025111 *  
## loc         8 318197192 39774649 164.165  < 2e-16 ***
## Cv:loc     40  22707425   567686   2.343 0.000103 ***
## Residuals 159  38523267   242285                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

emmeans(shaf.aov, ~ Cv | loc)

## Note: re-fitting model with sum-to-zero contrasts

## loc = GGA:
##  Cv       emmean  SE  df lower.CL upper.CL
##  Bienvenu   1442 245 161      959     1926
##  Bridger    1363 245 161      880     1847
##  Cascade    1505 245 161     1021     1988
##  Dwarf      1295 245 161      811     1779
##  Glacier    1681 245 161     1197     2164
##  Jet        1091 245 161      607     1575
## 
## loc = ID:
##  Cv       emmean  SE  df lower.CL upper.CL
##  Bienvenu   1242 245 161      759     1726
##  Bridger     947 245 161      463     1430
##  Cascade     773 245 161      290     1257
##  Dwarf       932 245 161      448     1415
##  Glacier    1111 245 161      627     1595
##  Jet        1064 245 161      580     1548
## 
## loc = MT:
##  Cv       emmean  SE  df lower.CL upper.CL
##  Bienvenu   2616 245 161     2132     3100
##  Bridger    2828 245 161     2345     3312
##  Cascade    2916 245 161     2433     3400
##  Dwarf      3452 245 161     2968     3935
##  Glacier    3307 245 161     2823     3790
##  Jet        3660 245 161     3177     4144
## 
## loc = NC:
##  Cv       emmean  SE  df lower.CL upper.CL
##  Bienvenu   1001 245 161      517     1485
##  Bridger    1064 245 161      581     1548
##  Cascade     745 245 161      262     1229
##  Dwarf      1014 245 161      530     1497
##  Glacier    1229 245 161      746     1713
##  Jet        1674 245 161     1190     2157
## 
## loc = OR:
##  Cv       emmean  SE  df lower.CL upper.CL
##  Bienvenu   4556 245 161     4072     5039
##  Bridger    2530 245 161     2046     3013
##  Cascade    3336 245 161     2852     3819
##  Dwarf      3932 245 161     3448     4415
##  Glacier    4185 245 161     3702     4669
##  Jet        3220 245 161     2736     3703
## 
## loc = SC:
##  Cv       emmean  SE  df lower.CL upper.CL
##  Bienvenu   2500 245 161     2016     2983
##  Bridger    2705 245 161     2221     3189
##  Cascade    2119 245 161     1635     2602
##  Dwarf      1894 245 161     1410     2377
##  Glacier    2717 245 161     2234     3201
##  Jet        2833 245 161     2349     3316
## 
## loc = TGA:
##  Cv       emmean  SE  df lower.CL upper.CL
##  Bienvenu   1258 245 161      774     1741
##  Bridger    1868 245 161     1384     2351
##  Cascade    1708 245 161     1224     2191
##  Dwarf       873 245 161      389     1356
##  Glacier    1453 245 161      970     1937
##  Jet         954 245 161      470     1438
## 
## loc = TX:
##  Cv       emmean  SE  df lower.CL upper.CL
##  Bienvenu    838 245 161      354     1322
##  Bridger    1069 245 161      585     1553
##  Cascade     735 245 161      251     1218
##  Dwarf       988 245 161      505     1472
##  Glacier     952 245 161      468     1435
##  Jet        1408 245 161      925     1892
## 
## loc = WA:
##  Cv       emmean  SE  df lower.CL upper.CL
##  Bienvenu   4375 245 161     3891     4859
##  Bridger    4604 245 161     4120     5087
##  Cascade    4464 245 161     3981     4948
##  Dwarf      3974 245 161     3490     4458
##  Glacier    4740 245 161     4256     5224
##  Jet        4344 245 161     3861     4828
## 
## Warning: EMMs are biased unless design is perfectly balanced 
## Confidence level used: 0.95

ANOVA for mixed models

(models with random and fixed effects)

Random effects are often those treatments levels drawn from a large population of possible treatment levels and there is interest in understanding the distribution and variance of that population. This in contrast to fixed effects, where the inferences are restricted to the treatment levels tested.

Blocking factors and Year are often considered random factors because a researcher is not interested in particular years or a blocking factor. When there is unbalanced replication, the variance components should be estimated with maximum likelihood or REML, which implies use of the packages “lmer” and/or “nlme”.

Randomised Complete Block Design (RCBD) - mixed effects

The “shafii.rapeseed” data set will be used for this section.

Analyse experiment using a mixed model:

This uses the function lme() from the package “nlme”. Functionally, it is very similar to calling lme4::lmer(). The degrees of freedom are different (lmer() is using Satterthwaite’s approximation), but the p-values are the same.

# turn year into the factor "Year"
shafii.rapeseed$Year <- as.factor(shafii.rapeseed$year)
# create a blocking variable that is unique for each location-by-year combination
# so R doesn't conflate "R1" from one location/year with another location/year
shafii.rapeseed$Rep <- as.factor(paste(shafii.rapeseed$loc, shafii.rapeseed$year, shafii.rapeseed$rep, sep = "_"))

shaf.lme <- lme(fixed = yield ~ gen*loc + Year,
                  random = ~ 1|Rep,
                  data = shafii.rapeseed, method = "REML")

# view sum of squares table 
# when anova() is called for an lme object, the function called is actually anova.lme()
anova(shaf.lme, type = "marginal") # "marginal" is equivalent to type III sums of squares

##             numDF denDF   F-value p-value
## (Intercept)     1   470 16.204597  0.0001
## gen             5   470  1.092341  0.3637
## loc            13    92 13.074492  <.0001
## Year            2    92  2.035054  0.1365
## gen:loc        65   470  2.575753  <.0001

Anova(shaf.lme, type = "3")

## Analysis of Deviance Table (Type III tests)
## 
## Response: yield
##                Chisq Df Pr(>Chisq)    
## (Intercept)  16.2046  1  5.686e-05 ***
## gen           5.4617  5     0.3622    
## loc         169.9684 13  < 2.2e-16 ***
## Year          4.0701  2     0.1307    
## gen:loc     167.4239 65  5.579e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# FYI: use "anova(model.lme)" for type I sums of squares

# lmer notation
shaf.lmer <- lmer(yield ~ gen*loc + Year + (1|Rep),
                  data = shafii.rapeseed, REML = T)
anova(shaf.lmer, type = "marginal")

## Marginal Analysis of Variance Table with Satterthwaite's method
##           Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## gen      1860586  372117     5 470.00  1.0923    0.3637    
## loc     57901484 4453960    13 159.37 13.0745 < 2.2e-16 ***
## Year     1386524  693262     2  92.00  2.0351    0.1365    
## gen:loc 57034691  877457    65 470.00  2.5758 5.499e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova(shaf.lmer, type = "3")

## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: yield
##                Chisq Df Pr(>Chisq)    
## (Intercept)  16.2046  1  5.686e-05 ***
## gen           5.4617  5     0.3622    
## loc         169.9684 13  < 2.2e-16 ***
## Year          4.0701  2     0.1307    
## gen:loc     167.4239 65  5.579e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Diagnostics, model building

plot(shaf.lme)

qqnorm(shaf.lme, abline = c(0, 1))

Least squares means

# for cultivar 
(lme.means.cv <- emmeans(shaf.lme, "gen"))

## NOTE: Results may be misleading due to involvement in interactions

##  gen      emmean  SE df lower.CL upper.CL
##  Bienvenu   2432 112 92     2211     2654
##  Bridger    2314 112 92     2092     2536
##  Cascade    2184 112 92     1962     2406
##  Dwarf      2308 112 92     2087     2530
##  Glacier    2463 112 92     2242     2685
##  Jet        2304 112 92     2082     2525
## 
## Results are averaged over the levels of: loc, Year 
## Degrees-of-freedom method: containment 
## Confidence level used: 0.95

# for location
(lme.means.loc <- emmeans(shaf.lme, "loc"))

## NOTE: Results may be misleading due to involvement in interactions

##  loc emmean  SE df lower.CL upper.CL
##  GGA   1682 329 92     1030     2335
##  ID    4217 261 92     3698     4736
##  KS    1120 476 92      174     2066
##  MS    2204 476 92     1258     3150
##  MT    3339 474 92     2398     4280
##  NC    1328 329 92      676     1981
##  NY    3139 476 92     2193     4085
##  OR    3292 329 92     2640     3945
##  SC    1819 261 92     1300     2338
##  TGA   1028 261 92      509     1547
##  TN    2543 476 92     1597     3490
##  TX     827 329 92      174     1479
##  VA    2282 328 92     1631     2932
##  WA    3861 261 92     3342     4380
## 
## Results are averaged over the levels of: gen, Year 
## Degrees-of-freedom method: containment 
## Confidence level used: 0.95

# for cultivar means within each location
lme.means.int <- emmeans(shaf.lme, ~ gen | loc + Year)

# this code would produce location means within each cultivar 
# emmeans(model.lme, ~ loc | gen))
# also: 
# emmeans(model.lme, ~ loc | gen)) provides the same estimates as 'emmeans(model.lme, ~ gen | loc))'

Pairwise Contrasts:

# all pairwise
pairs(lme.means.cv)

##  contrast           estimate   SE  df t.ratio p.value
##  Bienvenu - Bridger   118.57 87.6 470   1.353  0.7548
##  Bienvenu - Cascade   248.34 87.6 470   2.834  0.0539
##  Bienvenu - Dwarf     124.11 87.6 470   1.417  0.7170
##  Bienvenu - Glacier   -31.00 87.6 470  -0.354  0.9993
##  Bienvenu - Jet       128.70 87.6 470   1.469  0.6843
##  Bridger - Cascade    129.77 87.6 470   1.481  0.6765
##  Bridger - Dwarf        5.54 87.6 470   0.063  1.0000
##  Bridger - Glacier   -149.57 87.6 470  -1.707  0.5277
##  Bridger - Jet         10.13 87.6 470   0.116  1.0000
##  Cascade - Dwarf     -124.23 87.6 470  -1.418  0.7161
##  Cascade - Glacier   -279.34 87.6 470  -3.188  0.0190
##  Cascade - Jet       -119.64 87.6 470  -1.366  0.7477
##  Dwarf - Glacier     -155.10 87.6 470  -1.770  0.4861
##  Dwarf - Jet            4.59 87.6 470   0.052  1.0000
##  Glacier - Jet        159.70 87.6 470   1.823  0.4521
## 
## Results are averaged over the levels of: loc, Year 
## Degrees-of-freedom method: containment 
## P value adjustment: tukey method for comparing a family of 6 estimates

# plot results
plot(lme.means.cv, comparison = T)

plot(lme.means.loc, comparison = T, horizontal = F) # rotate plots to vertical position

## Warning: Comparison discrepancy in group "1", GGA - OR:
##     Target overlap = 0.0083, overlap on graph = -0.0111

# blue bars = lsmeans confidence 95% confidence intervals
# red arrows. pairwise differences (overlapping arrows = not significantly different)

For those who want the letters assigned to treatments based on all pairwise comparisons, it’s an unwieldy road:

library(multcomp) # this will need to be installed if you do not already have it
tukey <- glht(shaf.lme, linfct = mcp(loc = "Tukey"))
### extract information
cld_tukey <- cld(tukey)
print(cld_tukey)

##  GGA   ID   KS   MS   MT   NC   NY   OR   SC  TGA   TN   TX   VA   WA 
##  "a"  "b"  "a" "ac" "ab"  "a" "ab" "bc"  "a"  "a" "ab"  "a"  "a" "bc"

Interaction plots can also be done:

(but, it gets unwieldy)

plot(lme.means.int, comparison = T, adjust = "tukey")

Other pre-set contrasts

# compare to a control, e.g. "Bridger"
levels(shafii.rapeseed$gen)

## [1] "Bienvenu" "Bridger"  "Cascade"  "Dwarf"    "Glacier"  "Jet"

# Bridger is listed in position 2 of the factor 'shafii.rapeseed$gen'
# so '2' is set as the reference level in the following contrast statement: 

# "trt.vs.ctrlk" (treatment versus control treatment k) is a specific option to compare all treatment levels to a user-defined level
# by default, it will use the last level as the reference level

contrast(lme.means.cv, "trt.vs.ctrlk", ref = 2)

##  contrast           estimate   SE  df t.ratio p.value
##  Bienvenu - Bridger   118.57 87.6 470   1.353  0.5118
##  Cascade - Bridger   -129.77 87.6 470  -1.481  0.4315
##  Dwarf - Bridger       -5.54 87.6 470  -0.063  0.9998
##  Glacier - Bridger    149.57 87.6 470   1.707  0.3034
##  Jet - Bridger        -10.13 87.6 470  -0.116  0.9990
## 
## Results are averaged over the levels of: loc, Year 
## Degrees-of-freedom method: containment 
## P value adjustment: dunnettx method for 5 tests

Search ?contrast.emmGrid to see full list of options for preset contrasts.

Custom contrasts

# example: contrast Western locations versus Southern locations

# first, find out what levels are present
unique(shafii.rapeseed$loc)

##  [1] GGA ID  KS  MS  MT  NC  NY  OR  SC  TGA TN  TX  VA  WA 
## Levels: GGA ID KS MS MT NC NY OR SC TGA TN TX VA WA

# next create a contrast list 
# this is a list of coefficients as long your list of treatment levels
# indicating what coefficients to give each treatment level

# in this example, levels "ID", "MT", "OR", and "WA" are contrasted versus
# "NC", "SC", "MS", "TN", "TX" and "VA"

cList <- list(West_V_South = c(0, 1/4, 0, -1/6, 1/4, -1/6, 0, 1/4, -1/6, 0, -1/6, -1/6, -1/6, 1/4))

# check that each contrast sums to zero:
lapply(cList, sum)

## $West_V_South
## [1] 5.551115e-17

lme.means.loc2 <- emmeans(shaf.lme, "loc", contr = cList)

## NOTE: Results may be misleading due to involvement in interactions

summary(lme.means.loc2)

## $emmeans
##  loc emmean  SE df lower.CL upper.CL
##  GGA   1682 329 92     1030     2335
##  ID    4217 261 92     3698     4736
##  KS    1120 476 92      174     2066
##  MS    2204 476 92     1258     3150
##  MT    3339 474 92     2398     4280
##  NC    1328 329 92      676     1981
##  NY    3139 476 92     2193     4085
##  OR    3292 329 92     2640     3945
##  SC    1819 261 92     1300     2338
##  TGA   1028 261 92      509     1547
##  TN    2543 476 92     1597     3490
##  TX     827 329 92      174     1479
##  VA    2282 328 92     1631     2932
##  WA    3861 261 92     3342     4380
## 
## Results are averaged over the levels of: gen, Year 
## Degrees-of-freedom method: containment 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast     estimate  SE df t.ratio p.value
##  West_V_South     1843 233 92   7.910  <.0001
## 
## Results are averaged over the levels of: gen, Year 
## Degrees-of-freedom method: containment

# same contrast can also be done within each level of 'gen':
emmeans(shaf.lme, ~ loc | gen, contr = cList)

## $emmeans
## gen = Bienvenu:
##  loc emmean  SE df lower.CL upper.CL
##  GGA   1785 379 92  1032.31     2537
##  ID    4742 303 92  4140.13     5345
##  KS    1179 546 92    94.60     2263
##  MS    2455 546 92  1371.47     3539
##  MT    2825 544 92  1745.38     3904
##  NC    1330 379 92   577.36     2082
##  NY    2934 546 92  1849.69     4018
##  OR    4118 379 92  3365.98     4870
##  SC    1844 303 92  1241.42     2446
##  TGA    893 303 92   290.99     1496
##  TN    2965 546 92  1880.59     4049
##  TX     919 379 92   167.04     1671
##  VA    2124 378 92  1373.34     2875
##  WA    3943 303 92  3340.44     4545
## 
## gen = Bridger:
##  loc emmean  SE df lower.CL upper.CL
##  GGA   1470 379 92   718.17     2223
##  ID    3591 303 92  2989.15     4194
##  KS    1091 546 92     7.35     2175
##  MS    2478 546 92  1393.89     3562
##  MT    3037 544 92  1957.63     4117
##  NC    1479 379 92   727.28     2232
##  NY    3130 546 92  2045.60     4214
##  OR    2564 379 92  1811.99     3316
##  SC    2282 303 92  1679.58     2884
##  TGA   1603 303 92  1000.66     2205
##  TN    2485 546 92  1401.33     3569
##  TX     851 379 92    99.08     1604
##  VA    2397 378 92  1646.76     3148
##  WA    3935 303 92  3332.27     4537
## 
## gen = Cascade:
##  loc emmean  SE df lower.CL upper.CL
##  GGA   1758 379 92  1006.25     2511
##  ID    4081 303 92  3479.04     4684
##  KS     891 546 92  -193.40     1975
##  MS    1598 546 92   514.04     2682
##  MT    3125 544 92  2045.63     4205
##  NC    1062 379 92   309.61     1814
##  NY    2586 546 92  1502.21     3670
##  OR    2806 379 92  2053.82     3558
##  SC    1982 303 92  1379.70     2584
##  TGA   1492 303 92   889.83     2094
##  TN    2006 546 92   922.37     3090
##  TX     796 379 92    43.59     1548
##  VA    2191 378 92  1440.56     2942
##  WA    4203 303 92  3600.69     4805
## 
## gen = Dwarf:
##  loc emmean  SE df lower.CL upper.CL
##  GGA   1538 379 92   785.71     2290
##  ID    4326 303 92  3723.81     4928
##  KS    1208 546 92   123.85     2292
##  MS    1966 546 92   881.69     3050
##  MT    3661 544 92  2581.14     4740
##  NC    1321 379 92   568.53     2073
##  NY    3645 546 92  2561.26     4729
##  OR    3594 379 92  2841.40     4346
##  SC    1292 303 92   690.10     1895
##  TGA    451 303 92  -151.81     1053
##  TN    2688 546 92  1603.57     3771
##  TX     654 379 92   -98.64     1406
##  VA    2250 378 92  1499.12     3000
##  WA    3726 303 92  3123.52     4328
## 
## gen = Glacier:
##  loc emmean  SE df lower.CL upper.CL
##  GGA   2031 379 92  1278.35     2783
##  ID    4299 303 92  3696.61     4901
##  KS    1268 546 92   183.85     2352
##  MS    2861 546 92  1776.82     3945
##  MT    3516 544 92  2436.14     4595
##  NC    1452 379 92   699.82     2204
##  NY    3301 546 92  2217.49     4385
##  OR    3472 379 92  2719.36     4224
##  SC    2025 303 92  1422.97     2628
##  TGA   1109 303 92   506.90     1712
##  TN    2265 546 92  1180.58     3348
##  TX     720 379 92   -31.85     1473
##  VA    2363 378 92  1612.64     3114
##  WA    3807 303 92  3205.02     4410
## 
## gen = Jet:
##  loc emmean  SE df lower.CL upper.CL
##  GGA   1511 379 92   758.95     2263
##  ID    4262 303 92  3659.68     4864
##  KS    1082 546 92    -2.40     2166
##  MS    1866 546 92   781.68     2950
##  MT    3869 544 92  2789.89     4949
##  NC    1326 379 92   573.58     2078
##  NY    3237 546 92  2152.80     4321
##  OR    3199 379 92  2446.70     3951
##  SC    1488 303 92   886.13     2091
##  TGA    622 303 92    19.27     1224
##  TN    2853 546 92  1768.63     3937
##  TX    1020 379 92   267.99     1772
##  VA    2364 378 92  1613.85     3115
##  WA    3554 303 92  2952.19     4157
## 
## Results are averaged over the levels of: Year 
## Degrees-of-freedom method: containment 
## Confidence level used: 0.95 
## 
## $contrasts
## gen = Bienvenu:
##  contrast     estimate  SE df t.ratio p.value
##  West_V_South     1968 267 92   7.359  <.0001
## 
## gen = Bridger:
##  contrast     estimate  SE df t.ratio p.value
##  West_V_South     1286 267 92   4.811  <.0001
## 
## gen = Cascade:
##  contrast     estimate  SE df t.ratio p.value
##  West_V_South     1948 267 92   7.286  <.0001
## 
## gen = Dwarf:
##  contrast     estimate  SE df t.ratio p.value
##  West_V_South     2132 267 92   7.972  <.0001
## 
## gen = Glacier:
##  contrast     estimate  SE df t.ratio p.value
##  West_V_South     1826 267 92   6.828  <.0001
## 
## gen = Jet:
##  contrast     estimate  SE df t.ratio p.value
##  West_V_South     1902 267 92   7.112  <.0001
## 
## Results are averaged over the levels of: Year 
## Degrees-of-freedom method: containment

To perform custom contrasts on a another variable, a cList and emmeans call for that variable is required.

ANCOVA

(analysis of covariance) From a R programming perspective, this is no different than running a standard linear model. A data set from agridat, “theobald.covariate” comparing corn silage yields across multiple years, locations and cultivars. The data set includes a covariate, “chu” (corn heat units, a bit like growing degree days).

Load data and examine:

data(theobald.covariate)
str(theobald.covariate)

## 'data.frame':	256 obs. of  5 variables:
##  $ year : int  1990 1990 1990 1990 1990 1991 1991 1991 1991 1991 ...
##  $ env  : Factor w/ 7 levels "E1","E2","E3",..: 1 2 3 4 7 1 2 3 4 5 ...
##  $ gen  : Factor w/ 10 levels "G01","G02","G03",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ yield: num  6.27 5.57 8.45 7.35 6.5 6.71 5.59 8.36 7.25 8.09 ...
##  $ chu  : num  2.57 2.53 2.72 2.72 2.48 2.44 2.55 2.75 2.75 2.61 ...

count_na(theobald.covariate)

##  year   env   gen yield   chu 
##     0     0     0     0     0

Exploratory plots:

# distributions of continuous variables
hist(theobald.covariate$yield, col = "gold")

hist(theobald.covariate$chu, col = "gray70")

# relationship between reponse variable and covariate:
with(theobald.covariate, plot(chu, yield))

length(unique(theobald.covariate$chu))

## [1] 21

# the usual boxplots: 
boxplot(yield ~ env, data = theobald.covariate, col = "orangered")

boxplot(yield ~ year, data = theobald.covariate, col = "chartreuse")

boxplot(yield ~ gen, data = theobald.covariate, col = "darkcyan")

Check the extent of replication:

theobald.covariate$Year <- as.factor(theobald.covariate$year)
replications(yield ~ Year + env + gen, data = theobald.covariate)

## $Year
## Year
## 1990 1991 1992 1993 1994 
##   40   63   60   45   48 
## 
## $env
## env
## E1 E2 E3 E4 E5 E6 E7 
## 35 35 44 36 36 36 34 
## 
## $gen
## gen
## G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 
##  29  29  29  29  22  29  23  18  24  24

# with(theobald.covariate, table(gen, env, Year)) # lots of useful output

The treatments are not fully crossed, so a fully specified model of the form yield ~ Year*env*gen*chu cannot be tested. The treatments and interactions were tested in reduced models and compared (not shown). The final “best” model is shown below.

# the covariate, chu, is added in like any other effect. 
theobald.lm2 <- lm(yield ~  Year + env*chu, data = theobald.covariate)
Anova(theobald.lm2, type = "III")

## Anova Table (Type III tests)
## 
## Response: yield
##              Sum Sq  Df F value    Pr(>F)    
## (Intercept)   4.309   1  6.8321  0.009524 ** 
## Year         76.589   4 30.3607 < 2.2e-16 ***
## env          13.473   6  3.5607  0.002138 ** 
## chu          11.831   1 18.7596 2.187e-05 ***
## env:chu      13.376   6  3.5350  0.002268 ** 
## Residuals   150.096 238                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# how to extract the covariate slope(s): 
emtrends(theobald.lm2, ~ env, "chu")

##  env chu.trend   SE  df lower.CL upper.CL
##  E1      7.015 1.62 238     3.82    10.21
##  E2      0.979 4.44 238    -7.76     9.72
##  E3      4.099 3.15 238    -2.11    10.31
##  E4     -2.884 3.54 238    -9.87     4.10
##  E5      8.222 2.70 238     2.90    13.54
##  E6      3.425 2.72 238    -1.93     8.78
##  E7     -0.359 2.55 238    -5.38     4.66
## 
## Results are averaged over the levels of: Year 
## Confidence level used: 0.95

# emmeans extracted as usual:
emmeans(theobald.lm2, ~ env)

## NOTE: Results may be misleading due to involvement in interactions

##  env emmean    SE  df lower.CL upper.CL
##  E1    6.67 0.175 238     6.32     7.01
##  E2    5.13 0.256 238     4.63     5.64
##  E3    6.66 0.482 238     5.71     7.61
##  E4    7.22 0.508 238     6.22     8.22
##  E5    6.61 0.138 238     6.34     6.88
##  E6    6.43 0.236 238     5.97     6.90
##  E7    6.32 0.397 238     5.54     7.10
## 
## Results are averaged over the levels of: Year 
## Confidence level used: 0.95

emmeans(theobald.lm2, ~ Year)

##  Year emmean    SE  df lower.CL upper.CL
##  1990   6.97 0.189 238     6.60     7.34
##  1991   6.75 0.170 238     6.41     7.08
##  1992   7.07 0.187 238     6.70     7.44
##  1993   5.39 0.208 238     4.98     5.80
##  1994   6.00 0.218 238     5.57     6.43
## 
## Results are averaged over the levels of: env 
## Confidence level used: 0.95

Split-plot

Load “Oats” from nlme. Nitrogen level (“nitro”) is the main plot, cultivar (“Variety”) is the sub-plot and “Block” describes the blocking layout.

data(Oats) 
str(Oats)

## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':	72 obs. of  4 variables:
##  $ Block  : Ord.factor w/ 6 levels "VI"<"V"<"III"<..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ Variety: Factor w/ 3 levels "Golden Rain",..: 3 3 3 3 1 1 1 1 2 2 ...
##  $ nitro  : num  0 0.2 0.4 0.6 0 0.2 0.4 0.6 0 0.2 ...
##  $ yield  : num  111 130 157 174 117 114 161 141 105 140 ...
##  - attr(*, "formula")=Class 'formula'  language yield ~ nitro | Block
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##  - attr(*, "labels")=List of 2
##   ..$ y: chr "Yield"
##   ..$ x: chr "Nitrogen concentration"
##  - attr(*, "units")=List of 2
##   ..$ y: chr "(bushels/acre)"
##   ..$ x: chr "(cwt/acre)"
##  - attr(*, "inner")=Class 'formula'  language ~Variety
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>

count_na(Oats)

##   Block Variety   nitro   yield 
##       0       0       0       0

Oats$N <- as.factor(Oats$nitro)
replications(yield ~ Variety*N*Block, data = Oats)

##         Variety               N           Block       Variety:N   Variety:Block 
##              24              18              12               6               4 
##         N:Block Variety:N:Block 
##               3               1

table(Oats$Variety, Oats$N)

##              
##               0 0.2 0.4 0.6
##   Golden Rain 6   6   6   6
##   Marvellous  6   6   6   6
##   Victory     6   6   6   6

hist(Oats$yield, col = "gold")

boxplot(yield ~ N, data = Oats, col = "dodgerblue1")

boxplot(yield ~ Variety, data = Oats, col = "red3")

Balanced Trial Analysis

The format for specifying split-plot error terms is Error(blocking factor/main plot).

#contrasts("contr.sum")
spl.oats <- aov(yield ~ Variety*N + Error(Block:N), data = Oats) 
summary(spl.oats)

## 
## Error: Block:N
##           Df Sum Sq Mean Sq F value  Pr(>F)   
## N          3  20020    6673   7.556 0.00143 **
## Residuals 20  17663     883                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)  
## Variety    2   1786   893.2   2.930 0.0649 .
## Variety:N  6    322    53.6   0.176 0.9818  
## Residuals 40  12194   304.8                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

emmeans(spl.oats, "N")

## Note: re-fitting model with sum-to-zero contrasts

## NOTE: Results may be misleading due to involvement in interactions

##  N   emmean SE df lower.CL upper.CL
##  0     79.4  7 20     64.8       94
##  0.2   98.9  7 20     84.3      114
##  0.4  114.2  7 20     99.6      129
##  0.6  123.4  7 20    108.8      138
## 
## Results are averaged over the levels of: Variety 
## Warning: EMMs are biased unless design is perfectly balanced 
## Confidence level used: 0.95

emmeans(spl.oats, ~ Variety)

## Note: re-fitting model with sum-to-zero contrasts
## NOTE: Results may be misleading due to involvement in interactions

##  Variety     emmean   SE   df lower.CL upper.CL
##  Golden Rain  104.5 4.55 46.1     95.3      114
##  Marvellous   109.8 4.55 46.1    100.6      119
##  Victory       97.6 4.55 46.1     88.5      107
## 
## Results are averaged over the levels of: N 
## Warning: EMMs are biased unless design is perfectly balanced 
## Confidence level used: 0.95

Unbalanced Trial Analysis

spl.oats2 <- lmer(yield ~ N*Variety + (1|Block:N), data = Oats) 
Anova(spl.oats2, type = "3")

## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: yield
##               Chisq Df Pr(>Chisq)    
## (Intercept) 77.1670  1  < 2.2e-16 ***
## N           13.9028  3   0.003041 ** 
## Variety      2.2747  2   0.320663    
## N:Variety    1.0554  6   0.983423    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

emmeans(spl.oats2, "N")

## NOTE: Results may be misleading due to involvement in interactions

##  N   emmean SE df lower.CL upper.CL
##  0     79.4  7 20     64.8       94
##  0.2   98.9  7 20     84.3      114
##  0.4  114.2  7 20     99.6      129
##  0.6  123.4  7 20    108.8      138
## 
## Results are averaged over the levels of: Variety 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95

emmeans(spl.oats2, ~ Variety)

## NOTE: Results may be misleading due to involvement in interactions

##  Variety     emmean   SE   df lower.CL upper.CL
##  Golden Rain  104.5 4.55 46.1     95.3      114
##  Marvellous   109.8 4.55 46.1    100.6      119
##  Victory       97.6 4.55 46.1     88.5      107
## 
## Results are averaged over the levels of: N 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95

Other Designs

There are many other experimental designs commonly used in agricultural trials (split-split plot, split-block, alpha lattice, etc). We have written an online resource for routine incorporation of spatial covariates into field trial analysis that includes information on how to analyze different designs. You could also consider using the agricolae package.

Extra Functions

for extracting model parameters, diagnostics and other model information

These work differently with different R object types. That is, different output will result depending on if a “lm”, “lme” or “merMod” (lmer) object is used in the function call.

# extract model summary
summary()

#extract coefficients:
coef()

#extract residuals
resid()
rstudent()
residuals()

# extract predicted values
fits()

# make diagnostic plots
plot()

# extract influence measures:
influence.measures()

#other fir diagnostics:
cooks.distance()
dffits()
dfbeta()
hat()

To see the all functions available for a particular type of linear model object, use:

methods(class = "lm") # for lm objects

##  [1] add1                addterm             alias              
##  [4] anova               Anova               attrassign         
##  [7] avPlot              Boot                bootCase           
## [10] boxcox              boxCox              brief              
## [13] case.names          ceresPlot           coerce             
## [16] concordance         confidenceEllipse   confint            
## [19] Confint             cooks.distance      crPlot             
## [22] deltaMethod         deviance            dfbeta             
## [25] dfbetaPlots         dfbetas             dfbetasPlots       
## [28] drop1               dropterm            dummy.coef         
## [31] durbinWatsonTest    effects             emm_basis          
## [34] extractAIC          family              formula            
## [37] hatvalues           hccm                infIndexPlot       
## [40] influence           influencePlot       initialize         
## [43] inverseResponsePlot kappa               labels             
## [46] leveneTest          leveragePlot        linearHypothesis   
## [49] logLik              logtrans            mcPlot             
## [52] mmp                 model.frame         model.matrix       
## [55] ncvTest             nextBoot            nobs               
## [58] outlierTest         plot                powerTransform     
## [61] predict             Predict             print              
## [64] proj                qqnorm              qqPlot             
## [67] qr                  recover_data        residualPlot       
## [70] residualPlots       residuals           rstandard          
## [73] rstudent            S                   show               
## [76] sigmaHat            simulate            slotsFromS3        
## [79] spreadLevelPlot     summary             symbox             
## [82] variable.names      vcov               
## see '?methods' for accessing help and source code

methods(class = "lme") # for lme4 objects

##  [1] ACF              anova            Anova            augPred         
##  [5] coef             comparePred      confint          Confint         
##  [9] deltaMethod      deviance         emm_basis        extractAIC      
## [13] fitted           fixef            formula          getData         
## [17] getGroups        getGroupsFormula getResponse      getVarCov       
## [21] influence        intervals        linearHypothesis logLik          
## [25] matchCoefs       nobs             pairs            plot            
## [29] predict          print            qqnorm           ranef           
## [33] recover_data     residuals        S                sigma           
## [37] simulate         summary          update           VarCorr         
## [41] Variogram        vcov            
## see '?methods' for accessing help and source code

methods(class = "merMod") # for nlme objects

##  [1] anova            Anova            as.function      coef            
##  [5] confint          cooks.distance   deltaMethod      deviance        
##  [9] df.residual      drop1            emm_basis        extractAIC      
## [13] family           fitted           fixef            formula         
## [17] getData          getL             getME            hatvalues       
## [21] influence        isGLMM           isLMM            isNLMM          
## [25] isREML           linearHypothesis logLik           matchCoefs      
## [29] model.frame      model.matrix     na.action        ngrps           
## [33] nobs             plot             predict          print           
## [37] profile          ranef            recover_data     refit           
## [41] refitML          rePCA            residuals        rstudent        
## [45] show             sigma            simulate         summary         
## [49] terms            update           VarCorr          vcov            
## [53] vif              weights         
## see '?methods' for accessing help and source code

The package emmeans also supports a large number of models.