Making Sense of Your Estimates

Julia Piaskowski

March 24, 2026

“I want to know if these things are different.”

Linear Model Review

\[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]

\(Y_i\) = dependent variable (there is only 1)

\(X_i\) = independent variable(s) (there may be many)

\(B_0\) = model intercept, the overall mean of \(Y\)

\(B_1\) = how \(Y\) changes with \(X\)

\(\epsilon_i\) = model residual, the gap between the predicted value for \(Y_i\) and its observed value

The Expected Value, \(E(Y)\)

Synonyms: “least squares means”, “estimated marginal means”, “best linear unbiased linear estimates” (BLUEs)
Estimating these is often a goal of an experiment and analysis
It is a function of the model parameters

\[\hat{Y_{i}} = \beta_0 + \beta_i X\]

\[SE(Y_i) = \mathbf{X \Sigma X^T}\]

\(\mathbf{X}\) = design matrix for X.

\(\mathbf{\Sigma}\) = variance-covariance matrix for fixed effects, \(\mathbf{\beta}\)

Estimates & Hypothesis Statements

It helps an analysis to have specific questions or goals.
Examples:
- How much does a type of cattle feed increase weight gain among calves < 1 year old?
- How many nematodes eggs are produced in each of these crop varieties and which ones differ from the control variety?
- Does grain yield increase by at least 2 units as a result of 10 years of reduced tillage?
- How far do I need to sample in a field to achieve spatial independence?

Confidence Interval

[A confidence interval percentage] is the frequency with which other unobserved intervals will contain the true effect….if all the assumptions used to compute the intervals were correct.

– Greenland et al, 2016

The confidence level instead reflects the long-run reliability of the method used to generate the interval….if the same sampling procedure were repeated 100 times from the same population, approximately 95 of the resulting intervals would be expected to contain the true population mean. The frequentist approach sees the true population mean as a fixed unknown constant, while the confidence interval is calculated using data from a random sample.

– Wikipedia

P-value

…a p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.

– American Statistical Association

Estimates

Estimates Example: White pine

White pine study (from homoscedasticity lecture), 4 pollen parents crossed with 7 female trees (4 reps), epicotyl length was assessed
What pollen and egg parent have the longest epicotyl length? Which parental combination results in the longest epicotyl length?

library(lme4); library(emmeans); library(lmerTest)
data("hanover.whitepine", package = "agridat")
m1 <- lmer(length ~ male*female + (1|rep), data = hanover.whitepine)

Note: for this and all other analyses in this presentation, the data and model inspected in previous lectures and found to meet linear model assumptions of normality, homoscedasticity and independence.

Estimates Example: white pine

ANOVA

anova(m1)

Type III Analysis of Variance Table with Satterthwaite's method
             Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)    
male        12.3538  4.1179     3    81 20.5460 5.393e-10 ***
female       6.4977  1.0829     6    81  5.4032 9.740e-05 ***
male:female 13.4609  0.7478    18    81  3.7312 2.266e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimates Example: white pine

Main effects

(wp_emm1 <- emmeans(m1, ~ male)) # print out, same as summary()

 male emmean    SE   df lower.CL upper.CL
 M17    3.80 0.104 11.5     3.58     4.03
 M19    2.91 0.104 11.5     2.68     3.14
 M22    3.57 0.104 11.5     3.34     3.80
 M58    3.31 0.104 11.5     3.08     3.54

Results are averaged over the levels of: female 
Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95

(wp_emm2 <- emmeans(m1, ~ female, level = 0.90)) #90% confidence interval

 female emmean    SE   df lower.CL upper.CL
 F193     3.74 0.127 23.3     3.52     3.96
 F195     3.28 0.127 23.3     3.06     3.50
 F197     3.58 0.127 23.3     3.36     3.80
 F201     3.15 0.127 23.3     2.93     3.37
 F203     3.57 0.127 23.3     3.35     3.78
 F204     3.01 0.127 23.3     2.79     3.23
 F208     3.46 0.127 23.3     3.24     3.68

Results are averaged over the levels of: male 
Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.9

Notes

If you get the following message, install the ‘pbrktest’ package:

Cannot use mode = "kenward-roger" because *pbkrtest* package is not installed

This message is automatically printed when main effects are requested from a model with interactions:

NOTE: Results may be misleading due to involvement in interactions

Interaction types

Estimates Example: white pine

Interactions

wp_emm3 <- emmeans(m1, ~female*male) 
wp_emm3 |> as.data.frame() |> dplyr::arrange(desc(emmean))

 female male emmean        SE    df lower.CL upper.CL
 F197   M17  4.7225 0.2318164 74.77 4.260675 5.184325
 F193   M22  4.2350 0.2318164 74.77 3.773175 4.696825
 F203   M17  4.1525 0.2318164 74.77 3.690675 4.614325
 F208   M17  3.9775 0.2318164 74.77 3.515675 4.439325
 F203   M19  3.9750 0.2318164 74.77 3.513175 4.436825
 F193   M17  3.9675 0.2318164 74.77 3.505675 4.429325
 F201   M22  3.7525 0.2318164 74.77 3.290675 4.214325
 F208   M22  3.6450 0.2318164 74.77 3.183175 4.106825
 F193   M58  3.6300 0.2318164 74.77 3.168175 4.091825
 F195   M17  3.6225 0.2318164 74.77 3.160675 4.084325
 F195   M58  3.5700 0.2318164 74.77 3.108175 4.031825
 F208   M58  3.5175 0.2318164 74.77 3.055675 3.979325
 F197   M22  3.5150 0.2318164 74.77 3.053175 3.976825
 F197   M58  3.4850 0.2318164 74.77 3.023175 3.946825
 F204   M22  3.4125 0.2318164 74.77 2.950675 3.874325
 F203   M22  3.2300 0.2318164 74.77 2.768175 3.691825
 F204   M17  3.2050 0.2318164 74.77 2.743175 3.666825
 F195   M22  3.2000 0.2318164 74.77 2.738175 3.661825
 F193   M19  3.1225 0.2318164 74.77 2.660675 3.584325
 F201   M58  3.1025 0.2318164 74.77 2.640675 3.564325
 F201   M17  2.9825 0.2318164 74.77 2.520675 3.444325
 F204   M58  2.9450 0.2318164 74.77 2.483175 3.406825
 F203   M58  2.9075 0.2318164 74.77 2.445675 3.369325
 F201   M19  2.7550 0.2318164 74.77 2.293175 3.216825
 F195   M19  2.7375 0.2318164 74.77 2.275675 3.199325
 F208   M19  2.6950 0.2318164 74.77 2.233175 3.156825
 F197   M19  2.5975 0.2318164 74.77 2.135675 3.059325
 F204   M19  2.4850 0.2318164 74.77 2.023175 2.946825

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95

Estimates Example: white pine

Plotting interactions helps

emmip(m1, male ~ female)

emmip() documentation

Estimates Example: white pine

Contrasts

Pairwise is possible, but with 28 levels (7 female * 4 male parents) that will be very messy since that means 378 total contrasts. Note that functions for compact letter display, e.g. multcomp::cld(), are doing this in order to generate letters.
For the white pine example, there may be no need to conduct hypothesis tests, getting the estimates may be enough for parental selection.

Contrasts

Major Contrast Types

Pairwise
Consecutively pairwise between ordered levels
To a reference level
To a constant
Custom

In most cases, the only way to know if two things are different is to conduct a contrast.

Pairwise Contrasts

Nitrogen fertility experiment: 9 N levels, 2 zones, 4 reps, completely randomized design
Dependent variable: crop yield

data("bachmaier.nitrogen", package = "agridat")
bachmaier.nitrogen$nitro <- as.factor(bachmaier.nitrogen$nitro)
m2 <- lm(yield ~ nitro*zone, data = bachmaier.nitrogen)
(bach_emm1 <- emmeans(m2, ~ zone | nitro, at = list(nitro = c("0", "100", "200"))))

nitro = 0:
 zone emmean    SE df lower.CL upper.CL
 high   5.35 0.354 66     4.64     6.06
 low    3.47 0.354 66     2.76     4.17

nitro = 100:
 zone emmean    SE df lower.CL upper.CL
 high   8.68 0.354 66     7.97     9.39
 low    6.91 0.354 66     6.20     7.62

nitro = 200:
 zone emmean    SE df lower.CL upper.CL
 high  10.22 0.354 66     9.52    10.93
 low    8.06 0.354 66     7.35     8.77

Confidence level used: 0.95

Pairwise contrasts

class(bach_emm1)

[1] "emmGrid"
attr(,"package")
[1] "emmeans"

pairs(bach_emm1)

nitro = 0:
 contrast   estimate    SE df t.ratio p.value
 high - low     1.88 0.501 66   3.753  0.0004

nitro = 100:
 contrast   estimate    SE df t.ratio p.value
 high - low     1.77 0.501 66   3.531  0.0008

nitro = 200:
 contrast   estimate    SE df t.ratio p.value
 high - low     2.16 0.501 66   4.312 <0.0001

Pairwise Contrasts

The difference between | and *

# pairs(emmeans(m2, ~ zone | nitro, at = list(nitro = c("0", "100", "200"))))
pairs(emmeans(m2, ~ zone*nitro, at = list(nitro = c("0", "100", "200"))))

 contrast                      estimate    SE df t.ratio p.value
 high nitro0 - low nitro0         1.882 0.501 66   3.753  0.0048
 high nitro0 - high nitro100     -3.335 0.501 66  -6.652 <0.0001
 high nitro0 - low nitro100      -1.565 0.501 66  -3.121  0.0307
 high nitro0 - high nitro200     -4.876 0.501 66  -9.726 <0.0001
 high nitro0 - low nitro200      -2.714 0.501 66  -5.414 <0.0001
 low nitro0 - high nitro100      -5.216 0.501 66 -10.405 <0.0001
 low nitro0 - low nitro100       -3.446 0.501 66  -6.875 <0.0001
 low nitro0 - high nitro200      -6.757 0.501 66 -13.479 <0.0001
 low nitro0 - low nitro200       -4.596 0.501 66  -9.167 <0.0001
 high nitro100 - low nitro100     1.770 0.501 66   3.531  0.0095
 high nitro100 - high nitro200   -1.541 0.501 66  -3.074  0.0348
 high nitro100 - low nitro200     0.621 0.501 66   1.238  0.8164
 low nitro100 - high nitro200    -3.311 0.501 66  -6.605 <0.0001
 low nitro100 - low nitro200     -1.149 0.501 66  -2.292  0.2119
 high nitro200 - low nitro200     2.162 0.501 66   4.312  0.0008

P value adjustment: tukey method for comparing a family of 6 estimates

Consecutive Contrasts

We can look at these same estimates in another way

(bach_emm2 <- emmeans(m2, ~ nitro | zone , at = list(zone = "low")))

zone = low:
 nitro emmean    SE df lower.CL upper.CL
 0       3.47 0.354 66     2.76     4.17
 80      6.27 0.354 66     5.57     6.98
 100     6.91 0.354 66     6.20     7.62
 120     7.37 0.354 66     6.66     8.08
 140     6.93 0.354 66     6.23     7.64
 160     7.81 0.354 66     7.10     8.52
 180     7.13 0.354 66     6.42     7.84
 200     8.06 0.354 66     7.35     8.77
 220     7.92 0.354 66     7.21     8.62
 240     8.28 0.354 66     7.57     8.99
 260     8.06 0.354 66     7.35     8.77

Confidence level used: 0.95

Consecutive Contrasts

We can conduct pairwise contrasts of an ordered list

contrast(bach_emm2, "consec")

zone = low:
 contrast            estimate    SE df t.ratio p.value
 nitro80 - nitro0       2.809 0.501 66   5.602 <0.0001
 nitro100 - nitro80     0.638 0.501 66   1.272  0.8445
 nitro120 - nitro100    0.457 0.501 66   0.911  0.9734
 nitro140 - nitro120   -0.435 0.501 66  -0.867  0.9805
 nitro160 - nitro140    0.877 0.501 66   1.749  0.5192
 nitro180 - nitro160   -0.683 0.501 66  -1.362  0.7923
 nitro200 - nitro180    0.933 0.501 66   1.861  0.4393
 nitro220 - nitro200   -0.145 0.501 66  -0.289  1.0000
 nitro240 - nitro220    0.362 0.501 66   0.722  0.9942
 nitro260 - nitro240   -0.219 0.501 66  -0.436  0.9999

P value adjustment: mvt method for 10 tests

Stratification

What if there are 3 (or more) factors?
Three ways to look at all three at once:

emmeans(m0, ~ A|B*C)
emmeans(m0, ~ A*B|C)
emmeans(m0, ~ A|B|C)
emmeans(m0, ~ A*B*C)

There are decisions to make!
In all but the last example, the order of the factors matter.

Contrast to a Control

Fertilizer study: 7 levels (None, Spring or Fall applied at 300, 600 and 1200 lbs/acre of sulfur), completely randomized design
Dependent variable: percent scab infection on potato skins

data(cochran.crd, package= "agridat")
cochran.crd$trt <- factor(cochran.crd$trt, levels = c("O", "F3", "F6", "F12", "S3", "S6", "S12"))
m3 <- lm(inf ~ trt, data = cochran.crd)
(inf_emm1 <-emmeans(m3, "trt"))

 trt emmean   SE df lower.CL upper.CL
 O    22.62 2.37 25    17.74     27.5
 F3    9.50 3.35 25     2.60     16.4
 F6   15.50 3.35 25     8.60     22.4
 F12   5.75 3.35 25    -1.15     12.7
 S3   16.75 3.35 25     9.85     23.7
 S6   18.25 3.35 25    11.35     25.2
 S12  14.25 3.35 25     7.35     21.2

Confidence level used: 0.95

Contrast to a Control

contrast(inf_emm1, "trt.vs.ctrl", ref = "O")

 contrast estimate  SE df t.ratio p.value
 F3 - O     -13.12 4.1 25  -3.198  0.0190
 F6 - O      -7.12 4.1 25  -1.736  0.3475
 F12 - O    -16.88 4.1 25  -4.112  0.0020
 S3 - O      -5.88 4.1 25  -1.432  0.5203
 S6 - O      -4.38 4.1 25  -1.066  0.7405
 S12 - O     -8.38 4.1 25  -2.041  0.2127

P value adjustment: dunnettx method for 6 tests

contrast(inf_emm1, "trt.vs.ctrl", ref = c("O", "S6")) # for multiple reference levels

 contrast        estimate   SE df t.ratio p.value
 F3 - avg(O,S6)    -10.94 3.93 25  -2.784  0.0421
 F6 - avg(O,S6)     -4.94 3.93 25  -1.257  0.5853
 F12 - avg(O,S6)   -14.69 3.93 25  -3.738  0.0044
 S3 - avg(O,S6)     -3.69 3.93 25  -0.938  0.7737
 S12 - avg(O,S6)    -6.19 3.93 25  -1.575  0.3967

P value adjustment: dunnettx method for 5 tests

Contrast to the Mean

Useful when there is no natural ordering

contrast(inf_emm1, "eff")

 contrast   estimate   SE df t.ratio p.value
 O effect      7.964 2.35 25   3.396  0.0160
 F3 effect    -5.161 3.08 25  -1.673  0.2490
 F6 effect     0.839 3.08 25   0.272  0.8951
 F12 effect   -8.911 3.08 25  -2.889  0.0275
 S3 effect     2.089 3.08 25   0.677  0.7060
 S6 effect     3.589 3.08 25   1.164  0.4470
 S12 effect   -0.411 3.08 25  -0.133  0.8951

P value adjustment: fdr method for 7 tests

Contrasts to a Constant

Example: are the treatments able to keep the infection rate below 10 percent?

summary(inf_emm1, null = 10, infer = TRUE, side = "<")

 trt emmean   SE df lower.CL upper.CL null t.ratio p.value
 O    22.62 2.37 25     -Inf     26.7   10   5.328  1.0000
 F3    9.50 3.35 25     -Inf     15.2   10  -0.149  0.4413
 F6   15.50 3.35 25     -Inf     21.2   10   1.641  0.9434
 F12   5.75 3.35 25     -Inf     11.5   10  -1.268  0.1082
 S3   16.75 3.35 25     -Inf     22.5   10   2.014  0.9726
 S6   18.25 3.35 25     -Inf     24.0   10   2.462  0.9895
 S12  14.25 3.35 25     -Inf     20.0   10   1.268  0.8918

Confidence level used: 0.95 
P values are left-tailed

#summary(inf_emm1, null = 10, infer = TRUE) # two-sided test (above or below 10)

Custom Contrasts

When there are special contrasts to do not captured by standard contrast options
Use coefficients for each level of the independent variable, they must sum to zero.
Use fractions that sum to |1| to calculate the average differences

levels(cochran.crd$Trt)

NULL

clist = list("Spring.v.Fall" = c(0, -1/3, -1/3, -1/3, 1/3, 1/3, 1/3),
              "F3.v.S3" = c(0, 1, 0, 0, -1, 0, 0),
             "F12.v.Spr" = c(0, 0, 0, 1, -1/3, -1/3, -1/3))
contrast(inf_emm1, clist)

 contrast      estimate   SE df t.ratio p.value
 Spring.v.Fall     6.17 2.74 25   2.254  0.0332
 F3.v.S3          -7.25 4.74 25  -1.530  0.1386
 F12.v.Spr       -10.67 3.87 25  -2.757  0.0107

Back to the White Pine Epicotyl Study

4x7 design
No natural ordering among the treatments

What contrasts might be beneficial for this study?

Compact Letter Display

Most often, the least powerful way to evaluate treatment difference
Very susceptible to misinterpretation

library(multcomp)
cld(bach_emm2, Letters = LETTERS)

zone = low:
 nitro emmean    SE df lower.CL upper.CL .group
 0       3.47 0.354 66     2.76     4.17  A    
 80      6.27 0.354 66     5.57     6.98   B   
 100     6.91 0.354 66     6.20     7.62   BC  
 140     6.93 0.354 66     6.23     7.64   BC  
 180     7.13 0.354 66     6.42     7.84   BC  
 120     7.37 0.354 66     6.66     8.08   BC  
 160     7.81 0.354 66     7.10     8.52   BC  
 220     7.92 0.354 66     7.21     8.62   BC  
 260     8.06 0.354 66     7.35     8.77    C  
 200     8.06 0.354 66     7.35     8.77    C  
 240     8.28 0.354 66     7.57     8.99    C  

Confidence level used: 0.95 
P value adjustment: tukey method for comparing a family of 11 estimates 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same.

Equivalence Testing

Are these the same?

A user sets a threshold, a quantitative difference between two treatment levels, where a difference equal to that vaue or less is scientifically not meaninful. This is a value judgement informed by domain knowledge.
A hypothesis test is conducted to evaluate if two things differ by more than the predetermined threshold.
The null hypothesis is that two means are equivalent and the alternative hypothesis is that they are not.
This test provides some answers to the long-desired and frequently unanswered question “are these things the same”?

Equivalence Testing

Pairs example:

test(pairs(wp_emm1), delta = 0.5, side = 2)

 contrast  estimate   SE df t.ratio p.value
 M17 - M19    0.895 0.12 81   3.298  1.0000
 M17 - M22    0.234 0.12 81  -2.221  0.0844
 M17 - M58    0.496 0.12 81  -0.033  0.9818
 M19 - M22   -0.660 0.12 81   1.340  1.0000
 M19 - M58   -0.399 0.12 81  -0.848  0.7370
 M22 - M58    0.262 0.12 81  -1.991  0.1406

Results are averaged over the levels of: female 
Degrees-of-freedom method: kenward-roger 
P value adjustment: sidak method for 6 tests 
Statistics are tests of equivalence with a threshold of 0.5 
P values are left-tailed

High p-values favor differences that are greater than 0.5

Equivalence Testing

Compare to mean example

The means are compared to the overall mean and differences less than 0.5 are tested.
High p-values favor observations that are not equivalent to the overall mean (within 0.5 units of it).

test(contrast(wp_emm3, "eff"), delta = -0.5, side = "<") |> arrange(desc(estimate))

 contrast        estimate   SE df t.ratio p.value
 F197 M17 effect   1.3245 0.22 81   3.751  0.9998
 F193 M22 effect   0.8370 0.22 81   1.533  0.9701
 F203 M17 effect   0.7545 0.22 81   1.158  0.9421
 F208 M17 effect   0.5795 0.22 81   0.362  0.7176
 F203 M19 effect   0.5770 0.22 81   0.350  0.7176
 F193 M17 effect   0.5695 0.22 81   0.316  0.7176
 F201 M22 effect   0.3545 0.22 81  -0.662  0.3244
 F208 M22 effect   0.2470 0.22 81  -1.151  0.1687
 F193 M58 effect   0.2320 0.22 81  -1.219  0.1584
 F195 M17 effect   0.2245 0.22 81  -1.254  0.1574
 F195 M58 effect   0.1720 0.22 81  -1.492  0.1085
 F208 M58 effect   0.1195 0.22 81  -1.731  0.0718
 F197 M22 effect   0.1170 0.22 81  -1.743  0.0718
 F197 M58 effect   0.0870 0.22 81  -1.879  0.0596
 F204 M22 effect   0.0145 0.22 81  -2.209  0.0300
 F203 M22 effect  -0.1680 0.22 81  -3.039  0.0034
 F204 M17 effect  -0.1930 0.22 81  -3.153  0.0026
 F195 M22 effect  -0.1980 0.22 81  -3.176  0.0026
 F193 M19 effect  -0.2755 0.22 81  -3.528  0.0010
 F201 M58 effect  -0.2955 0.22 81  -3.619  0.0008
 F201 M17 effect  -0.4155 0.22 81  -4.165  0.0001
 F204 M58 effect  -0.4530 0.22 81  -4.336 <0.0001
 F203 M58 effect  -0.4905 0.22 81  -4.506 <0.0001
 F201 M19 effect  -0.6430 0.22 81  -5.200 <0.0001
 F195 M19 effect  -0.6605 0.22 81  -5.280 <0.0001
 F208 M19 effect  -0.7030 0.22 81  -5.473 <0.0001
 F197 M19 effect  -0.8005 0.22 81  -5.917 <0.0001
 F204 M19 effect  -0.9130 0.22 81  -6.428 <0.0001

Degrees-of-freedom method: kenward-roger 
P value adjustment: fdr method for 28 tests 
Statistics are tests of nonsuperiority with a threshold of 0.5 
P values are left-tailed

When a Covariate Is Present

Slopes

Corn yield study: 10 varieties (“gen”), 5 years, a continuous variable “chu” (corn heat units)

data(theobald.covariate, package = "agridat")
m4 <- lmer(yield ~  gen + chu:gen + (1|gen:year), data = theobald.covariate)

l1 <- emtrends(m4, var = "chu")
summary(l1)

 gen  chu chu.trend   SE  df
 G01 2.55      5.65 1.46 225
 G02 2.55      3.92 1.46 225
 G03 2.55      3.73 1.46 225
 G04 2.55      3.38 1.46 225
 G05 2.55      3.64 1.76 231
 G06 2.55      4.83 1.46 225
 G07 2.55      4.17 1.58 222
 G08 2.55      9.42 1.82 220
 G09 2.55      3.22 1.57 224
 G10 2.55      3.46 1.57 224

Degrees-of-freedom method: kenward-roger

mean(theobald.covariate$chu)

[1] 2.552773

When a Covariate Is Present

Main effects

Calculated at average value for the covariate

emmeans(m4, ~ gen)

 gen emmean    SE   df lower.CL upper.CL
 G01   6.48 0.270 32.7     5.93     7.03
 G02   6.62 0.270 32.7     6.07     7.17
 G03   5.70 0.270 32.7     5.15     6.25
 G04   6.79 0.270 32.7     6.24     7.34
 G05   6.30 0.305 34.1     5.68     6.92
 G06   6.26 0.270 32.7     5.71     6.81
 G07   6.62 0.303 33.1     6.01     7.24
 G08   6.97 0.347 31.9     6.26     7.68
 G09   6.08 0.300 31.9     5.46     6.69
 G10   6.70 0.300 31.9     6.09     7.31

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95

When a Covariate Is Present

Main effects

If desired, set a range of values for estimation:

emmeans(m4, ~ gen*chu, 
        at = list(chu = c(2.3, 2.4, 2.5, 2.6, 2.7, 2.8), gen = "G08"))

 gen chu emmean    SE    df lower.CL upper.CL
 G08 2.3   4.59 0.582 152.1     3.44     5.74
 G08 2.4   5.53 0.449  78.5     4.64     6.42
 G08 2.5   6.47 0.362  37.3     5.74     7.21
 G08 2.6   7.41 0.355  35.1     6.69     8.14
 G08 2.7   8.36 0.433  71.4     7.49     9.22
 G08 2.8   9.30 0.561 144.7     8.19    10.41

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95

Plotting & Saving to File

Convert to a data frame or tibble first
Use ggplot or use emmip()

bach_emm3_df <- emmeans(m2, ~ nitro*zone) |> as.data.frame() 
head(bach_emm3_df, n = 4)

 nitro zone  emmean        SE df lower.CL upper.CL
 0     high 5.34725 0.3544908 66 4.639486 6.055014
 80    high 7.98450 0.3544908 66 7.276736 8.692264
 100   high 8.68200 0.3544908 66 7.974236 9.389764
 120   high 8.37225 0.3544908 66 7.664486 9.080014

Confidence level used: 0.95

write.csv(bach_emm3_df, "bachmaier_nitrogen_estimates.csv", row.names = FALSE)

Plotting & Saving to File

ggplot(bach_emm3_df, aes(x = nitro, y = emmean, group = zone, color = zone)) +
  geom_line(linewidth = 0.7) + 
  geom_errorbar(aes(ymin = emmean-SE, ymax = emmean+SE), width = 0.15, linewidth = 1) +
  ylab("yield") + xlab("nitrogen applied") + 
  scale_color_manual(values = c("brown", "cyan3")) + theme_bw(base_size = 18)

Final Thoughts

The estimated marginal means are the mathematical products of the coefficients determined during model fitting; they are a summary term for each level of a fixed effect.
Think carefully about contrasts to conduct. What questions do you want answered?
Avoid using CLD if you can because it substantially reduces statistical power.
Consider other options to CLD: compare to a reference, compare to the mean, compare to a value, conduct equivalence tests.
When an interaction is present, explore that first before looking at main effects.
Explore ‘emmeans’ documentation (help files, vignettes) to see the full suite of functionality.

Additional Resources

Emmeans package documentation: comprehensive guide
Use ?summary.emmGrid for extensive information on emmeans objects including p-value adjustments
Statistical Programs’ philosophy on analysis
Data Analysis Using Regression and Multilevel/Hierarchical Models (2007) by Andrew Gelman and Jennifer Hill. This is a comprehensive and readable explanation of linear mixed models (available in the UI library).

The scientific method is the most rigorous path to knowledge, but it’s also messy and tough. ~~Science~~ [Statistics] deserves respect exactly because it is difficult — not because it gets everything correct on the first try. The uncertainty inherent in science doesn’t mean that we can’t use it to make important policies or decisions. It just means that we should remain cautious and adopt a mindset that’s open to changing course if new data arises. We should make the best decisions we can with the current evidence and take care not to lose sight of its strength and degree of certainty. It’s no accident that every good paper includes the phrase “more study is needed” — there is always more to learn.

– Christie Aschwanden/538, Science Isn’t Broken