This tutorial will cover Analysis of Variance using mixed model methodology in SAS. Analysis of Variance (ANOVA) is a class of linear models where all the factors entering the model (on the “righthand side”) are discrete and categorial. These factors typically represent levels of an applied treatment, replication, or other qualitative information such as location, year, or trial number.

The “mixed model” terminology refers to how the factors are interpreted and estimated as either “Fixed” or “Random” effects. A fixed model effect is a factor that we are interested in and is repeatable. For example, levels of a treatment, say a dose of drug or concentration of herbicide, might picked because they are of interest and, when applied at any time or location, will be considered to be the same. A random model effect is one that is not constant or is considered arbitrary, such as the coding for replication, or choice of locations or subjects. There are no steadfast rules about what must be fixed or random, and the assumptions on which to use will often depend on the research objectives and the desired domain of inference. As an example, we typically consider treatments to be fixed effects. This is because we want to make inferences about those specific treatments and they were not chosen to be a random sample of all possible treatments. In contrast, replications are usually considered random effects. That is, we have no real interest in making specific inferences about “Rep 1” vs “Rep 2”. The replications were chosen only to provide better representation of the fixed effect treatments. Factors such as year or location, however, can often be more nebulous. Locations, for example, may have been chosen because we are truly interested in those specific locations (fixed effect) or alternatively, they may have been chosen only as representative of all possible locations (random effect).

What difference does it make? In a mixed model, fixed effects will be estimated as means, while random effects will be estimated as variances or “variance components”. Hence, as model output, we will often get means and mean inferences for fixed treatment effects, and variance estimates for random effects such as replication.

The application of mixed model methodology is relatively recent in statistical history. Hence, many software packages, including SAS and R, still have older legacy routines that fit only fixed effects, i.e. Proc ANOVA and Proc GLM in SAS. It is recommended, however, that researchers use mixed models when feasible as they more accurately represent the models under consideration, and they provide more flexibility in estimation. It should be also noted that many publication outlets, granting agencies, and reviewers will now require the use of mixed modeling approaches for statistical analysis. For more detailed information on mixed models in SAS, the readers are referred to Claassen et al. (2018).

The data used in the examples below are a hypothetical variety and fertilizer trial measuring “yield” over five sampling times. Specifically, there are two varieties (Var1 and Var2) evaluated under 3 fertilizer levels (1, 2, 3) in a 2x3 factorial treatment design. Each combination of variety and fertilizer is replicated 4 times. The response to analyze is yield. There is no explicit indication of the experimental design as this data could fit many scenarios which will be demonstrated below. A CSV file for this data can be found here and example code to read in and print a section of the data is below. For more information on reading data into SAS, please see the tutorial on the SAS Data Step.

```
proc import out= work.yield
datafile= ".\data\Yield.csv"
dbms=csv replace;
run;
proc print data=Yield(obs=12);
run;
```

Obs | Sample_time | Variety | Fertilizer | Rep | pH | Yield |
---|---|---|---|---|---|---|

1 | 1 | VAR1 | 1 | 1 | 7.07 | 0.604220697 |

2 | 1 | VAR1 | 1 | 2 | 7.06 | 0.5946271 |

3 | 1 | VAR1 | 1 | 3 | 7.08 | 3.145058879 |

4 | 1 | VAR1 | 1 | 4 | 7.09 | 3.090934575 |

5 | 1 | VAR1 | 2 | 1 | 7.13 | 2.415133188 |

6 | 1 | VAR1 | 2 | 2 | 7.12 | 2.615854276 |

7 | 1 | VAR1 | 2 | 3 | 7.15 | 4.009176701 |

8 | 1 | VAR1 | 2 | 4 | 7.14 | 3.460704548 |

9 | 1 | VAR1 | 3 | 1 | 7.18 | 3.276358733 |

10 | 1 | VAR1 | 3 | 2 | 7.18 | 3.131134036 |

11 | 1 | VAR1 | 3 | 3 | 7.14 | 4.145610152 |

12 | 1 | VAR1 | 3 | 4 | 7.15 | 5.157147204 |

There are multiple procedures in SAS that can estimate mixed models.
This tutorial will focus on the most common procedure, **Proc
Mixed**. This procedure makes a strong assumption of normally
distributed continuous response data. If you are dealing with
non-normal, discrete, or skewed data, you may want to consider an
alternative procedure for generalized mixed models, such as Proc Glimmix.

Proc Mixed has three statement types that must be specified for
ANOVA. The first is the **CLASS** statement which is used
to define variables which represent factor groupings or classifications.
Examples would be treatment and replication ID numbers or letters. Since
the values of these variables represent the levels of an effect, SAS
will accept either numeric or alphabetic data in the CLASS statement.
Note: Data do not have to be sorted according to the CLASS
variables.

The second required statement is **MODEL**. The model
statement specifies the response variable and and how the independent
fixed effect factors of the **CLASS** statement enter the
model. The general form is:

*MODEL Response = Fixed factors;*,

where Fixed factors is a list of all or some of the variables given
in the **CLASS** statement. Proc Mixed also provides the
ability to easily state interaction as well as nested terms in the
**MODEL** statement. Interaction effects represent
combinations of factors and can be specified by using the * symbol
between two or more terms, for example: A*B. This notation can become
cumbersome if many interactions are present so a shorthand version
exists. The vertical bar between effects signifies all possible
interactions. An example would be: A|B|C which produces the effects A,
B, A*B, C, A*C, B*C, and A*B*C, in that order. Nesting occurs when one
effect exists completely within another effect either in time or space.
These effects are denoted with parentheses such as B(A) which is read as
‘Factor B within factor A’. The *, |, and () can be used alone or in
combination to specify the exact model required. In a Proc Mixed call,
only *one* **MODEL** statement can be given.

The last required statement for a mixed model is the
**Random** statement. This statement tells SAS what factors
to treat as random effects. These factors *must* be listed in the
**CLASS** statement along with the fixed effects. As with
fixed effects, random factors can also be specified as interaction or
nested terms in the same manner as the **MODEL** statement.
Unlike the **MODEL** statement, multiple
**RANDOM** statements can be used within one Proc Mixed
call.

There are two methods of specifying **RANDOM**
statements: Direct and Indirect. Each method can achieve the same goal,
but the method of estimation for the random effects may differ between
the two. In some cases where SAS is having trouble estimating the model,
the indirect method may help. Examples of each are generally shown
as:

**Direct Form**

*Random replication;*

**Indirect Form**

*Random intercept/subject=replication;*

In the indirect form, **intercept** and
**subject** are key words that must be used. Here,
**subject** specifies the basic experimental or
observational unit that the variance components will be computed from.
If multiple random effects are necessary and the indirect form is used,
then a **RANDOM** statement must be specified for each
random effect. Alternatively, in the direct form, multiple random
effects can be listed in a single **RANDOM** statement.

Both the direct and indirect forms can also specify options. Most
commonly, these options relate to how the variance-covariance model
matrix should be set up using a **Type=** option. By
default, Proc Mixed assumes a “variance component” structure where a
single variance is estimated for each random effect. This can also be
specified with a **Type=VC** option. Other available types
allow for the estimation of additional covariances between observations
or subjects. The choice of a covariance structure is dependent on the
experimental set up and SAS provides for many different types. In using
them, however, care should be taken to use the simplest structure
possible while maintaining good representation and estimation of the
data. Specifying a variance-covariance structure that is too complex
will inevitably lead to estimation problems.

**REPEATED**: One of the flexabilities of mixed models
is their ability to incorporate correlation structure. Repeated measures
designs are an example of this and are accommodated into Proc Mixed
through the **REPEATED** statement. The syntax and options
are similar to the **RANDOM** statement above. The random
effect variable in this case may be optional in some cases, however,
specifying the effect can avoid problems when missing values are
present. As with the **RANDOM** statement, the
**SUBJECT** and **TYPE** options may be used
to specify the basic unit at which the repeated effect occurs,
e.g. individual plots, animals, etc., as well as the correlation
structure associated with them. The correlation structures available are
numerous and cover both time as well as spatial effects. Typical
examples of correlational structures for time effects would be
*UN* for unstructured, *CS* for compound symmetry, and
*AR(1)* for autoregressive correlations. Spatial correlation can
also be specified for most common semivariogram models such as
*SP(POW)*, *SP(GAUS)*, and *SP(EXP)* for power,
Gaussian, and exponential models, respectively. As before, care should
be taken when specifying correlational structures with large numbers of
components to avoid computational problems in estimation.

**LSMEANS**: Although Proc Mixed estimates models
utilizing the technique of Maximum Likelihood, SAS has retained the
nomenclature LSMEANS or Least Squares Means for estimating means for
fixed effects. SAS provides for comparison of LSMEANS by the
**DIFF** or **PDIFF** options which produce a
table of all possible pair-wise comparisons and their associated
t-tests. It is important to remember that the probabilities associated
with these comparisons are applicable to a limited number of preplanned
tests only. If a multiple comparison adjustment to these tests is
needed, the **Adjust=** option can be used with any of
several adjustment types such as *Tukeys*, *Bonferroni*,
etc. The default is equivalent to Fishers LSD. In addition to the mean
estimates, standard errors will also be produced for each mean.
**LSMEANS** can also be used for interaction and nested
effects, when applicable. Note, however, that **LSMEANS**
can only be estimated for fixed effects listed in the
**CLASS** statement.

**CONTRAST**: Single degree of freedom contrasts which
test specific hypotheses of interest are an alternative to pair-wise
comparisons. SAS provides for these with the **CONTRAST**
statement. A single df contrast has the general form of:

CONTRAST ‘any label’ Factor name {coefficients};

The label portion allows you to give a name to the hypothesis being tested so it can be identified in the output. The Factor name identifies what fixed effect the contrast is working on and the coefficients specify the contrast itself. An example might be:

CONTRAST ’ Trts vs Ctrl’ A 2 -1 -1;

Here the average of the **LSMEANS** for the last 2
levels of fixed effect A are contrasted to the first level. Some
contrasts test more than one hypothesis at a time (multiple DF
contrasts) and these are separated by a comma in the
**CONTRAST** statement. For example:

CONTRAST ‘Testit’ A 2 -1 -1 , A 0 1 -1;

would be a 2 degree of freedom contrast that jointly tests the average of levels 2 and 3 to level 1 of factor A, simultaneously with a comparison of level 2 to level 3 of factor A.

**ODS**: ODS is an abbreviation for Output Delivery
System. In SAS, the ODS is very generalized and has many functions. In
this case, **ODS** will allow us to output and save to a
SAS data set many of the internal statistical values involved with
ANOVA. Most commonly, this will be model estimates, and specifically for
ANOVA, **LSMEANS**. This gives the user the ability to
manipulate, plot, or output from SAS the estimated means. The general
format for this would be:

ODS output LSMeans=<name>;

Here, “LSMeans” is a keyword indicating the statistic we want to
save. Note the spelling and capitalization as it must be written exactly
in this form. The *<name>* portion is what you want to call
the data set containing the means. This data set will have predefined
variables indicating the fixed effect, it’s level, the mean value
(Estimate), its standard error (StdErr), and the df, t-value and p-value
for testing whether the mean is equal to zero. If confidence limits (CL)
were requested with the LSMeans, then they will be output as well as
“Lower” and “Upper”. When multiple fixed effects are listed in the
LSMEANS statement, then all the means for all listed effects will be
present in the saved data set. Note: To use the **ODS**
output statement, the requested statistics, such as LSmeans, must be
used as part of the Proc Mixed call.

In what follows, several experimental designs and options for ANOVA will be shown using the hypothetical data described above. For further details on these and other potential designs and models, the reader is referred to Claassen et al. (2018).

In a CRD design, all treatments and their replicates are randomized
without regard to grouping or structure. In this most basic design, Proc
Mixed will *not* require a **RANDOM** statement, as
the residual error will be the only random effect, and this is assumed
by default in SAS.

```
PROC MIXED Data=Yield;
WHERE Sample_Time = 1;
CLASS Variety Fertilizer;
MODEL YIELD = Variety Fertilizer Variety*Fertilizer ;
LSMEANS Variety Fertilizer Variety*Fertilizer /cl;
run;
```

Model Information | |
---|---|

Data Set | WORK.YIELD |

Dependent Variable | Yield |

Covariance Structure | Diagonal |

Estimation Method | REML |

Residual Variance Method | Profile |

Fixed Effects SE Method | Model-Based |

Degrees of Freedom Method | Residual |

Class Level Information | ||
---|---|---|

Class | Levels | Values |

Variety | 2 | VAR1 VAR2 |

Fertilizer | 3 | 1 2 3 |

Dimensions | |
---|---|

Covariance Parameters | 1 |

Columns in X | 12 |

Columns in Z | 0 |

Subjects | 1 |

Max Obs per Subject | 24 |

Number of Observations | |
---|---|

Number of Observations Read | 24 |

Number of Observations Used | 24 |

Number of Observations Not Used | 0 |

Covariance Parameter Estimates | |
---|---|

Cov Parm | Estimate |

Residual | 1.1534 |

Fit Statistics | |
---|---|

-2 Res Log Likelihood | 62.0 |

AIC (Smaller is Better) | 64.0 |

AICC (Smaller is Better) | 64.2 |

BIC (Smaller is Better) | 64.9 |

Type 3 Tests of Fixed Effects | ||||
---|---|---|---|---|

Effect | Num DF | Den DF | F Value | Pr > F |

Variety | 1 | 18 | 2.16 | 0.1591 |

Fertilizer | 2 | 18 | 9.09 | 0.0019 |

Variety*Fertilizer | 2 | 18 | 0.31 | 0.7408 |

Least Squares Means | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|

Effect | Variety | Fertilizer | Estimate |
Standard Error |
DF | t Value | Pr > |t| | Alpha | Lower | Upper |

Variety | VAR1 | 2.9705 | 0.3100 | 18 | 9.58 | <.0001 | 0.05 | 2.3192 | 3.6218 | |

Variety | VAR2 | 2.3265 | 0.3100 | 18 | 7.50 | <.0001 | 0.05 | 1.6751 | 2.9778 | |

Fertilizer | 1 | 1.3606 | 0.3797 | 18 | 3.58 | 0.0021 | 0.05 | 0.5629 | 2.1584 | |

Fertilizer | 2 | 3.0353 | 0.3797 | 18 | 7.99 | <.0001 | 0.05 | 2.2375 | 3.8330 | |

Fertilizer | 3 | 3.5495 | 0.3797 | 18 | 9.35 | <.0001 | 0.05 | 2.7518 | 4.3473 | |

Variety*Fertilizer | VAR1 | 1 | 1.8587 | 0.5370 | 18 | 3.46 | 0.0028 | 0.05 | 0.7305 | 2.9869 |

Variety*Fertilizer | VAR1 | 2 | 3.1252 | 0.5370 | 18 | 5.82 | <.0001 | 0.05 | 1.9971 | 4.2534 |

Variety*Fertilizer | VAR1 | 3 | 3.9276 | 0.5370 | 18 | 7.31 | <.0001 | 0.05 | 2.7994 | 5.0557 |

Variety*Fertilizer | VAR2 | 1 | 0.8626 | 0.5370 | 18 | 1.61 | 0.1256 | 0.05 | -0.2656 | 1.9907 |

Variety*Fertilizer | VAR2 | 2 | 2.9453 | 0.5370 | 18 | 5.48 | <.0001 | 0.05 | 1.8171 | 4.0734 |

Variety*Fertilizer | VAR2 | 3 | 3.1715 | 0.5370 | 18 | 5.91 | <.0001 | 0.05 | 2.0434 | 4.2997 |

Because the data contains five sampling times (Sample_Time), and we
are not considering the repeated measures nature of the data yet, a
**WHERE** statement has been used and the analysis has been
limited to the first Sample_Time. The initial portion of the output
indicates the levels of the CLASS statement variables. Whenever Proc
Mixed is first run, users should carefully check this to make sure it is
correct. It is easy to overlook data entry errors when coding
classification variables. In this example, things appear correct with
Variety having two levels (VAR1 and VAR2) and Fertilizer having three
levels (1, 2, and 3). Users should also check the number of observations
used. Here, in the first sample time, there are 4 replications, 2
varieties, and 3 fertilizer levels, or 4x2x3 = 24 observations.

The section labeled “Covariance Parameter Estimates” gives the estimates for the random effects, which in this case is simply the residual variance. This statstic is analogous to the Mean Squared Error (MSE) found in traditional least squares anova. The next output table “Fit Statistics” which are useful for comparison of multiple model forms.

Last are the “Type 3 Tests of Fixed Effects” and the output from the
**LSMEANS** statement. In the Type 3 Tests section, it
would appear from the p-values that Fertilizer has some effect on yield
(small p-value), while Variety and the interaction do not (larger
p-values). For more information and cautions on using and interpreting
p-values and test statistics, see the blog post here. The means table shows
the estimated means, standard errors and associated 95% confidence
intervals. Note the tests and p-values here **only** relate
to whether these means are different from zero. They **do
not** relate to any comparisons of means because we did not use
the **DIFF** or **PDIFF** options in the
**LSMEANS** statement.

The randomized complete block design is undoubtedly the most common
design in randomized experiments. Unlike the CRD, it allows for
isolation of extraneous variation unrelated to treatment effects through
blocking. In the RCB design, a block is a full set of all treatments,
usually aggregated together either physically, spatially, or temporally.
Treatments are randomly distributed within each block. Replication then
occurs as multiple blocks, that is, multiple contiguous, complete sets
of treatments. Treatments *may* appear more than once within a
block as well, a process referred to as sub-replication, however, that
is beyond this discussion. For more information on the uses and
advantages of blocking, please see the discussion located here.

The following example re-envisions the yield data as a RCB design.
Here “Rep” acts as blocks and a **RANDOM** statement is
used to indicate it as a random effect. A new option of the Proc Mixed
call, *plots=(studentpanel)*, is used to produce residual
diagnostics. This example also uses the **PDIFF** option to
compare means, as well as **CONTRAST** statement to test
for linear and quadratic trends in fertilizer rates. Lastly, an
**ODS** statement is used to output the estimated means and
plot the interaction using Proc Sgplot.

```
PROC MIXED plots=(studentpanel) DATA=YIELD;
WHERE SAMPLE_TIME=1;
CLASS Variety Fertilizer Rep Sample_Time;
MODEL YIELD = Variety Fertilizer Variety*Fertilizer ;
RANDOM Rep;
LSMEANS Variety Fertilizer Variety*Fertilizer /cl pdiff;
CONTRAST 'Linear trend' Fertilizer -1 0 1;
CONTRAST 'Quadratic trend' Fertilizer 1 -2 1;
ods output LSMeans=means;
run;
PROC SGPLOT DATA=MEANS;
WHERE EFFECT='Variety*Fertilizer';
SERIES X=FERTILIZER Y=ESTIMATE/GROUP=VARIETY;
highlow x=FERTILIZER high=upper low=lower/GROUP=VARIETY type=line highcap=serif lowcap=serif;
RUN;
```

Model Information | |
---|---|

Data Set | WORK.YIELD |

Dependent Variable | Yield |

Covariance Structure | Variance Components |

Estimation Method | REML |

Residual Variance Method | Profile |

Fixed Effects SE Method | Model-Based |

Degrees of Freedom Method | Containment |

Class Level Information | ||
---|---|---|

Class | Levels | Values |

Variety | 2 | VAR1 VAR2 |

Fertilizer | 3 | 1 2 3 |

Rep | 4 | 1 2 3 4 |

Sample_time | 1 | 1 |

Dimensions | |
---|---|

Covariance Parameters | 2 |

Columns in X | 12 |

Columns in Z | 4 |

Subjects | 1 |

Max Obs per Subject | 24 |

Number of Observations | |
---|---|

Number of Observations Read | 24 |

Number of Observations Used | 24 |

Number of Observations Not Used | 0 |

Iteration History | |||
---|---|---|---|

Iteration | Evaluations | -2 Res Log Like | Criterion |

0 | 1 | 61.96850573 | |

1 | 1 | 55.99848538 | 0.00000000 |

Convergence criteria met. |

Covariance Parameter Estimates | |
---|---|

Cov Parm | Estimate |

Rep | 0.5457 |

Residual | 0.6077 |

Fit Statistics | |
---|---|

-2 Res Log Likelihood | 56.0 |

AIC (Smaller is Better) | 60.0 |

AICC (Smaller is Better) | 60.8 |

BIC (Smaller is Better) | 58.8 |

Type 3 Tests of Fixed Effects | ||||
---|---|---|---|---|

Effect | Num DF | Den DF | F Value | Pr > F |

Variety | 1 | 15 | 4.09 | 0.0612 |

Fertilizer | 2 | 15 | 17.24 | 0.0001 |

Variety*Fertilizer | 2 | 15 | 0.58 | 0.5725 |

Contrasts | ||||
---|---|---|---|---|

Label | Num DF | Den DF | F Value | Pr > F |

Linear trend | 1 | 15 | 31.53 | <.0001 |

Quadratic trend | 1 | 15 | 2.95 | 0.1062 |

Least Squares Means | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|

Effect | Variety | Fertilizer | Estimate |
Standard Error |
DF | t Value | Pr > |t| | Alpha | Lower | Upper |

Variety | VAR1 | 2.9705 | 0.4325 | 15 | 6.87 | <.0001 | 0.05 | 2.0486 | 3.8924 | |

Variety | VAR2 | 2.3265 | 0.4325 | 15 | 5.38 | <.0001 | 0.05 | 1.4046 | 3.2483 | |

Fertilizer | 1 | 1.3606 | 0.4609 | 15 | 2.95 | 0.0099 | 0.05 | 0.3784 | 2.3429 | |

Fertilizer | 2 | 3.0353 | 0.4609 | 15 | 6.59 | <.0001 | 0.05 | 2.0530 | 4.0175 | |

Fertilizer | 3 | 3.5495 | 0.4609 | 15 | 7.70 | <.0001 | 0.05 | 2.5673 | 4.5318 | |

Variety*Fertilizer | VAR1 | 1 | 1.8587 | 0.5370 | 15 | 3.46 | 0.0035 | 0.05 | 0.7142 | 3.0033 |

Variety*Fertilizer | VAR1 | 2 | 3.1252 | 0.5370 | 15 | 5.82 | <.0001 | 0.05 | 1.9807 | 4.2698 |

Variety*Fertilizer | VAR1 | 3 | 3.9276 | 0.5370 | 15 | 7.31 | <.0001 | 0.05 | 2.7830 | 5.0721 |

Variety*Fertilizer | VAR2 | 1 | 0.8626 | 0.5370 | 15 | 1.61 | 0.1290 | 0.05 | -0.2820 | 2.0071 |

Variety*Fertilizer | VAR2 | 2 | 2.9453 | 0.5370 | 15 | 5.48 | <.0001 | 0.05 | 1.8007 | 4.0898 |

Variety*Fertilizer | VAR2 | 3 | 3.1715 | 0.5370 | 15 | 5.91 | <.0001 | 0.05 | 2.0270 | 4.3161 |

Differences of Least Squares Means | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

Effect | Variety | Fertilizer | _Variety | _Fertilizer | Estimate |
Standard Error |
DF | t Value | Pr > |t| | Alpha | Lower | Upper |

Variety | VAR1 | VAR2 | 0.6440 | 0.3183 | 15 | 2.02 | 0.0612 | 0.05 | -0.03433 | 1.3224 | ||

Fertilizer | 1 | 2 | -1.6746 | 0.3898 | 15 | -4.30 | 0.0006 | 0.05 | -2.5054 | -0.8438 | ||

Fertilizer | 1 | 3 | -2.1889 | 0.3898 | 15 | -5.62 | <.0001 | 0.05 | -3.0197 | -1.3581 | ||

Fertilizer | 2 | 3 | -0.5143 | 0.3898 | 15 | -1.32 | 0.2068 | 0.05 | -1.3451 | 0.3165 | ||

Variety*Fertilizer | VAR1 | 1 | VAR1 | 2 | -1.2665 | 0.5512 | 15 | -2.30 | 0.0364 | 0.05 | -2.4415 | -0.09155 |

Variety*Fertilizer | VAR1 | 1 | VAR1 | 3 | -2.0689 | 0.5512 | 15 | -3.75 | 0.0019 | 0.05 | -3.2438 | -0.8939 |

Variety*Fertilizer | VAR1 | 1 | VAR2 | 1 | 0.9961 | 0.5512 | 15 | 1.81 | 0.0909 | 0.05 | -0.1788 | 2.1711 |

Variety*Fertilizer | VAR1 | 1 | VAR2 | 2 | -1.0866 | 0.5512 | 15 | -1.97 | 0.0674 | 0.05 | -2.2615 | 0.08838 |

Variety*Fertilizer | VAR1 | 1 | VAR2 | 3 | -1.3128 | 0.5512 | 15 | -2.38 | 0.0309 | 0.05 | -2.4878 | -0.1378 |

Variety*Fertilizer | VAR1 | 2 | VAR1 | 3 | -0.8023 | 0.5512 | 15 | -1.46 | 0.1661 | 0.05 | -1.9773 | 0.3726 |

Variety*Fertilizer | VAR1 | 2 | VAR2 | 1 | 2.2626 | 0.5512 | 15 | 4.10 | 0.0009 | 0.05 | 1.0877 | 3.4376 |

Variety*Fertilizer | VAR1 | 2 | VAR2 | 2 | 0.1799 | 0.5512 | 15 | 0.33 | 0.7486 | 0.05 | -0.9950 | 1.3549 |

Variety*Fertilizer | VAR1 | 2 | VAR2 | 3 | -0.04630 | 0.5512 | 15 | -0.08 | 0.9342 | 0.05 | -1.2213 | 1.1287 |

Variety*Fertilizer | VAR1 | 3 | VAR2 | 1 | 3.0650 | 0.5512 | 15 | 5.56 | <.0001 | 0.05 | 1.8900 | 4.2399 |

Variety*Fertilizer | VAR1 | 3 | VAR2 | 2 | 0.9823 | 0.5512 | 15 | 1.78 | 0.0950 | 0.05 | -0.1927 | 2.1572 |

Variety*Fertilizer | VAR1 | 3 | VAR2 | 3 | 0.7560 | 0.5512 | 15 | 1.37 | 0.1904 | 0.05 | -0.4189 | 1.9310 |

Variety*Fertilizer | VAR2 | 1 | VAR2 | 2 | -2.0827 | 0.5512 | 15 | -3.78 | 0.0018 | 0.05 | -3.2577 | -0.9077 |

Variety*Fertilizer | VAR2 | 1 | VAR2 | 3 | -2.3089 | 0.5512 | 15 | -4.19 | 0.0008 | 0.05 | -3.4839 | -1.1340 |

Variety*Fertilizer | VAR2 | 2 | VAR2 | 3 | -0.2262 | 0.5512 | 15 | -0.41 | 0.6873 | 0.05 | -1.4012 | 0.9487 |

In the RCB output, the initial information is the same or similar to
what was shown above. There is an additional variance component
estimated for Rep in the table for *Covariance Parameter
Estimates*. The Type 3 Tests of Fixed Effects has changed, with the
denominator degrees of freedom for the F-test now being 15, reduced by 3
due to the specification of Rep (3 df) as a random effect. This 15 df is
composed of all block or “Rep” interactions: Rep*Variety=3x1 = 3,
Rep*Fertilizer=2x3 = 6, and Rep*Variety*Fertilizer = 3x1x2 = 6 ==> 3
+ 6 + 6 = 15.

The **CONTRAST** statements evaluate linear and
quadratic trends across the Fertilizer rates. The coefficients for trend
analyses are specialized and called Orthogonal Polynomial Coefficients.
These vary depending on how many levels of an effect there are and can
be found for a various number of levels here. Note that such contrasts assume the levels of
the effect are evenly spaced. For uneven spacing of levels, the
coefficients must be computed individually (for more information see here). In this case, a linear trend is indicated,
but a quadratic trend is not.

The table “Differences of Least Squares Means” results from the
**PDIFF** option and is a table of all possible pair-wise
comparisons of means for all effects listed in the
**LSMEANS** statement. Columns indicate which levels are
under consideration and what the comparison is. For example, under
Fertilizer and _Fertilizer, we can see comparisons for 1 vs 2, 1 vs 3,
and 2 vs 3. The “Estimate” and “Standard Error” columns are for the
actual mean differences and the p-values relate to tests of mean
equivalence.

The first set of plots are generated from the Proc Mixed option
*plots=(studentpanel)*. This requests plots and diagnostics for
studentized residuals. The residual vs Predicted plot shows randomly
scattered points of reasonable magnitude (within [-3, +3]) with no
trending or patterning. The histogram is reasonably symmetric and bell
shaped matching a theoretical Normal distribution well, and the quantile
plot shows data points very close to the diagonal line representing a
Normal distribution. All these indicate the residuals match the
normaility assumption of the ANOVA well and are not problematic.

The last portion of the output are the plotted interaction means.
Because the ODS data set “*means*” contains all effects in the
**LSMEANS** statement, we must use the
**WHERE** statement to select for only the interaction
effect, Variety*Fertilizer. This plot shows an increasing trend for each
Variety and their associated confidence intervals. If an interaction was
indicated (not the case here), it would imply that the lines were not
parallel within the observed error. While these lines are not exactly
parallel, the confidence intervals are wide enough that a statistical
test implies no interaction is present.

The term “Split” in a design name refers to a restriction in
randomization. That is, an additional step of randomizing the
treatments. In the example here, we will retain the blocked design
structure, however, we now take two steps in randomizing the two
factors, Variety and Fertilizer, when assigning the place in each block.
For example we can first randomize varieties to the blocks, say
splitting the block into two halves, one for each Variety. Then we take
the second randomization step and randomize the placement of fertilizers
within each Variety. This process is then repeated for each block with
new randomizations. The first randomized factor, Variety, will be
referred to as the “Main Plot”, while the second randomized factor will
be the “Sub-plot”. We can induce this structure into the model with the
addition of an additional random effect in the **RANDOM**
statement that will be the interaction between block and the Main Plot
factor. This subtracts more df from the denominator df of the F-test and
gives different precision between Main Plots and Sub-plots. This
differential precision between Main and Sub-plots is often cited as a
rational for using split designs. In reality for field work, however,
these designs were implemented to accommodate logistical constraints in
applying treatments. For example, in the Variety data, it would not be
practical to change varieties for every plot, as required by the CRD. We
can, however, plant large swaths of each variety and then apply varying
Fertilizer levels over the top of that. The bottom line is to choose the
experimental design based on logistical concerns, not some academic
exercise in potential precision of testing.

The example below is shown without output as it is similar in nature
and interpretation to the RCB above. Note, however, that the
**RANDOM** statement now contains a term for Rep*Variety,
signifying the split-plot design. In addition, the
**LSMEANS** and **CONTRAST** statements will
inherently incorporate this design change into their computations. this
differs from previous SAS procedures such as Proc GLM where users had to
specify correct error terms for each hypothisis in Split designs.

```
PROC MIXED plots=(studentpanel) DATA=YIELD;
WHERE SAMPLE_TIME=1;
CLASS Variety Fertilizer Rep Sample_Time;
MODEL YIELD = Variety Fertilizer Variety*Fertilizer ;
RANDOM Rep Rep*Variety;
LSMEANS Variety Fertilizer Variety*Fertilizer /cl pdiff;
CONTRAST 'Linear trend' Fertilizer -1 0 1;
CONTRAST 'Quadratic trend' Fertilizer 1 -2 1;
ods output LSMeans=means;
run;
```

As with the Split-plot design, the Split-block impliments another level of randomization. This time both Variety and Fertilizer levels will be separately randomized within each block. The easiest way to conceptualize this is to think of each block as rows and columns. In the block we first randomize two “rows”, one for each variety. We then randomize three columns for three fertilizer levels across the variety rows. Thus, each intersection of row-column represents a combination of Variety and Fertilizer. Again, the rationale here should be logistics. With some treatment types, it is easier to apply them in this manner. For example, flood irrigation treatments are practically impossible to randomize from plot-to-plot. They can, however, be randomized by row or column. Additional treatments, such as herbicdes could then be applied across/perpedicular to the irrigation treatments.

The code below adds an additional term to the **RANDOM**
statement (Rep*Fertilizer), further dividing the denominator degrees of
freedom. Similar to the Split-plot analysis, Proc Mixed will inherently
account for this change in df in all relevant tests and estimations.

```
PROC MIXED plots=(studentpanel) DATA=YIELD;
WHERE SAMPLE_TIME=1;
CLASS Variety Fertilizer Rep Sample_Time;
MODEL YIELD = Variety Fertilizer Variety*Fertilizer ;
RANDOM Rep Rep*Variety Rep*Fertilizer;
LSMEANS Variety Fertilizer Variety*Fertilizer /cl pdiff;
CONTRAST 'Linear trend' Fertilizer -1 0 1;
CONTRAST 'Quadratic trend' Fertilizer 1 -2 1;
ods output LSMeans=means;
run;
```

Thus far, we have ignored the five sampling times in the data. In
order to incorporate sampling time into the model, however, we need to
consider its characteristics. In particular, time is a sequential
quantity. that is time 1 must come before time 2 and that before time 3,
etc. We cannot randomize time. Spatially arranged data can also exhibit
a similar type of correlation (blocking described above is one
rudimentary means of accounting for spatial correlation). Because of
this lack of randomization, when we measure a response on the same
subject over time, the observations through time can be correlated. One
of the base assumptions of linear modeling is that the residuals or
errors are independently distributed. Correlated observations, however,
violate this assumption. Fortunately, mixed modeling methods give us a
mechanism for dealing with this and can generate independent,
uncorrelated errors when properly used. This is done by modeling the
correlation(s) within the variance-covariance matrix. Proc Mixed
impliments this through the **REPEATED** statement where
the repeated factor, such as Sample_Time, is given, as well as the
correlation model or type to use. Common structure types are Compound
Symmetry,CS, where equal correlation is assumed between time points, or
Autoregressive Correlation, AR(1) where adjacent time points have the
highest correlation they become less correlated the further apart in
time they become. Many other correlation types exist as well and which
to utilize can be problem dependent.

The code below illustrates the repeated measures analysis. We are now
using the full data set with all times, so the **WHERE**
statement has been removed. The repeated factor is Sample_Time, which
must be listed in the **CLASS** statement. The subject
option tells SAS what base experimental unit is measured each time
(Rep*variety*Fertilizer) and the autoregressive AR(1) structure has been
specified. The resulting output can then be considered to be “adjusted”
for time correlation effects.

```
PROC MIXED plots=(studentpanel) DATA=YIELD;
CLASS Variety Fertilizer Rep Sample_Time;
MODEL YIELD = Variety Fertilizer Variety*Fertilizer ;
RANDOM Rep;
REPEATED Sample_Time / SUBJECT=Rep*Variety*Fertilizer TYPE=ar(1);
LSMEANS Variety Fertilizer Variety*Fertilizer /cl pdiff;
CONTRAST 'Linear trend' Fertilizer -1 0 1;
CONTRAST 'Quadratic trend' Fertilizer 1 -2 1;
ods output LSMeans=means;
run;
```

Claassen, E. A., R. D. Wolfinger, G. A. Milliken, and W. W. Stroup.
2018. *SAS for Mixed Models: Introduction and Basic
Applications.* United States: SAS Institute.