This tutorial is an introduction to nonlinear regression models in SAS. Some additional knowledge of SAS may be useful for understanding certain aspects of this material. For more information on these requirements, the reader is referred to the tutorials on the Data Step and Graphics and Plotting.
Nonlinear models are often associated with functions that curve or are not straight lines. That, however, is not a precise definition. In statistical models, “linear” and “nonlinear” refer to how the model parameters enter the model function. Some models that graphically appear curved are, in fact, linear models. For example, a quadratic model traces a concave or convex shape, yet the model is linear:
\[y_{i} = \beta_0 + \beta_1 \cdot x_{i} + \beta_2 \cdot x_{i}^2 + \epsilon_{i}\] In this model, the parameters \(\beta_0\), \(\beta_1\), and \(\beta_2\) are additive relative to one another, making this a linear model. An example of a nonlinear model would be:
\[y_{i} = \beta_0\cdot x_{i}^{\beta_1} + \epsilon_{i}\]
where the parameters \(\beta_0\) and \(\beta_1\) are multiplicative relative to each other. In terms of estimation, this makes a difference. In linear models, solutions for parameter estimates can be derived algebraically, while no closed form solutions or equations can be derived for nonlinear models. Estimation in nonlinear models is obtained through an iterative “trial-and-error” process, the details of which can vary and are only covered briefly here. For more technical material on nonlinear estimation methods, see texts such as Nonlinear Regression Analysis and Its Applications.
Regardless of the methodology, the iterative estimation process
typically requires the user to specify starting values for each
parameter. These are basically guesses at what the parameters might be
to start the process. The user may have some clues from the data, or
they may be completely unknown to the user. Such details will depend on
the model used, knowledge of the process being modeled, and the data
itself. Examples for these aspects will be demonstrated
below.
The ability to estimate nonlinear models is, computationally, relatively recent. Before this ability was available, nonlinear model forms were often transformed or linearized such that they became linear with additive parameters. This made estimation through standard linear regression possible. An example would be the power model above:
\[y_{i} = \beta_0\cdot x_{i}^{\beta_1} + \epsilon_{i}\]
By taking the logarithm of the model, we get a linear form:
\[ln(y_{i}) = ln(\beta_0) + \beta_1\cdot x_{i} + \phi_{i} \] \[ y_{i}^*= \beta_0^* + \beta_1\cdot x_{i} + \phi_{i}\] where \(y_{i}^*\) is the logarithm of the response, \(y_{i}\), and the intercept of this model, \(\beta_0^*\), is the logarithm of the original parameter \(\beta_0\). The exponential function could then be used to back-transform the predicted values and intercept parameter. Note, however, that the error term \(\phi_{i}\) is not equivalent to the additive error \(\epsilon_{i}\). Thus, linearization provides only an approximation to the desired nonlinear form. For these reasons, it is now preferable to use true nonlinear estimation techniques rather than linearization. In some cases, however, linearization can be useful for deriving the starting values needed by the nonlinear estimation process.
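As an illustration of that last point, the sketch below linearizes the power model to obtain rough starting values. It assumes a hypothetical data set work.power containing variables x and y; exponentiating the fitted intercept from Proc Reg gives a starting value for \(\beta_0\), and the slope serves directly as one for \(\beta_1\).

data power_ln;
	set power;              /* hypothetical data set with variables x and y */
	if x > 0 and y > 0;     /* logarithms are undefined at or below zero */
	ln_y = log(y);          /* log() in SAS is the natural logarithm */
	ln_x = log(x);
run;
proc reg data=power_ln;
	model ln_y = ln_x;      /* intercept estimates ln(b0); slope estimates b1 */
run;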
There is an infinite number of nonlinear forms that could be selected for representing data. The selection process, however, should follow some basic guidelines. First and foremost, model selection, and the use of nonlinear modeling in general, should meet the research objectives. What is the purpose of modeling, and what does the researcher hope to gain from it? A second and related factor is model interpretation. Are the parameters of the model interpretable in biological terms? Do they have biological relevance? This is often determined from theoretical considerations or previous research. Given these decisions, the data themselves are an important consideration. What pattern do the data generally have? Are they increasing, decreasing, curved, etc.? Are there sufficient data points to support estimation of a model, and do they fall in appropriate places to provide good estimation? As an example, many models attempt to estimate minimum or maximum theoretical values (asymptotes) at very large values of dose or time. Even though such doses or times are not practically relevant, they are important for model estimation. If the data set lacks points at large dose/time values, the estimates will be poor, or estimation may fail altogether. Lastly, the complexity of model forms and their parameterization should be considered. Models containing many parameters (often more than 3) or parameters that involve exponents, ratios, or other multiplicative relationships can be difficult to estimate. Models should be as concise as possible, with as few parameters as possible (parsimony). In many cases, parsimony and complexity issues can be resolved by reparameterizing the model, or by selecting a different model with a simpler form for estimation.
Exponential forms are often used to describe uniformly decreasing or increasing curved trends found in response decay or growth. They are typically two- to three-parameter models. Two common forms used in agricultural research are:
\[ y = a*exp(-b*x) + c \tag*{(Decreasing)} \] \[y = M - a*exp(-b*x) \tag*{(Increasing)} \] In these models, the parameter \(a\) is related to an intercept at \(x=0\), \(b\) is a rate-related parameter, \(M\) is an upper asymptote, and \(c\) is a lower asymptote, the asymptotes being approached as \(x\) grows large. In many cases, \(c\) is assumed to be zero, and \(M\) may be set to 1.0 or 100 for proportional or percentage data, respectively.
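As a brief fitting sketch for the increasing form (illustrative only, and separate from the worked examples later in this tutorial), assume a hypothetical data set work.growth with variables x and y on a percentage scale, so that M can be fixed at 100. The starting values are illustrative guesses.

proc nlin data=growth method=gauss;
	parms a=90 b=0.2;   /* a: initial distance below the asymptote; b: rate */
	M = 100;            /* upper asymptote fixed at 100 for percentage data */
	model y = M - a*exp(-b*x);
run;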
Similar to the exponential forms, S-curves may be either increasing or decreasing. A key difference, however, is that these forms have an inflection point midway, where the rate of change in the curve slows and changes direction, giving the curves their characteristic “S” shape. They are commonly used to describe dose-response relationships or growth over time. Many are taken directly from the functional forms of cumulative probability distributions for theoretical reasons (for example, probit and logit models). Others are derived mathematically from descriptions of growth or decay. Some specific types of S-curves are outlined below.
The logistic curve (not to be confused with logistic regression) is often seen in dose-response work with a probabilistic interpretation, or from differential equations describing growth. Its parameterization can vary widely. The form below is often seen in agricultural dose-response work.
\[ y = \frac{M}{1 + exp(K*(x - L))}\] Here, \(M\) is an upper asymptote representing a theoretical maximum of the response, \(y\). \(K\) is a rate parameter. When \(K\) is positive, the curve begins close to \(M\) and decreases. When negative, the curve starts close to zero and increases to \(M\). The parameter \(L\) is, by default, the inflection point at 50% of \(M\). That is, when the variable \(x\) equals \(L\), the response \(y\) equals \(\frac{M}{2}\). In dose-response work, this 50th percentile is commonly known as the \(LD_{50}\) or \(ED_{50}\). In growth models, \(L\) represents a quantification of lag time for the start of growth. The quantity \((x - L)\) often appears in logarithmic form, as this transformation makes the data pattern symmetric about the inflection point (as in the data examples below). Hence, the form may appear as \((ln(x) - L)\), \(ln(\frac{x}{L})\), or an equivalent. It is also possible to have \(L\) estimate other percentiles by reparameterizing the function (see, for example, Price et al., 2017).
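As a sketch of one such reparameterization (one possible form, not necessarily the published one, assuming a hypothetical data set work.dr with positive doses in dose and response y): substituting \(ln(L) = ln(ED_{p}) - \frac{1}{K}ln(\frac{1-p}{p})\) into the log-dose logistic gives a model whose location parameter is the dose producing the response fraction \(p\). With \(p = 0.10\), that parameter is the \(ED_{90}\), the dose reducing the response to 10% of \(M\).

proc nlin data=dr;
	parms M=100 K=2 ED90=10;   /* illustrative starting values */
	p = 0.10;                  /* response equals p*M at dose = ED90 */
	model y = M/(1 + exp(K*(log(dose) - log(ED90)) + log((1-p)/p)));
run;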
Another group of S-curves comes directly from cumulative probability distributions and encompasses a wide variety of models traditionally used for dose-response modeling. Examples include the cumulative distribution functions of the normal (probit) and logistic (logit) distributions. Because these functions are complex and involve integration to compute, they are not readily written out in equation form; instead, they are specified through predefined probability functions in SAS or through dedicated SAS procedures (Proc Probit).
All the curves described above exhibit symmetry about the inflection point. As mentioned, asymmetry in the data is sometimes accommodated through logarithmic transformations in the model. There are some models, however, that directly accommodate asymmetry. Two examples are the Weibull and the Gompertz.
\[y = M*(1-exp(-((K*x)^C)))\tag*{Weibull}\]
\[y = M*exp(-exp(-K*(x - L)))\tag*{Gompertz}\] In these forms, \(M\), \(K\), and \(L\) play roles similar to those in the logistic model above, although \(L\) no longer corresponds to the 50th percentile. In the Weibull, the exponent \(C\) is a shape parameter that controls the asymmetry of the curve.
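A minimal fitting sketch for the Gompertz form, assuming a hypothetical data set work.growth2 with variables x and y and illustrative starting values (the Weibull form would be fit the same way, swapping in its equation):

proc nlin data=growth2;
	parms M=100 K=0.3 L=5;   /* illustrative starting values */
	model y = M*exp(-exp(-K*(x - L)));
run;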
There is a special class of dose-response models for data that generally follow the decreasing S-curve shape but also exhibit a small initial increase at very low doses. This behavior, for example, is often observed with herbicides that mimic growth hormones, where small doses actually induce growth rather than plant injury. This process is referred to as hormesis, and one model to describe this phenomenon was given by Brain and Cousens, 1989:
\[y = \frac{(M + f*x)}{1 + exp(B*(log(x)-log(L)))}\] This model is similar to the logistic model above, with the addition of the parameter \(f\), which creates the bump in the curve at small doses, \(x\). An example of fitting this curve can be found in Price et al., 2017.
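A fitting sketch for the Brain-Cousens model, assuming a hypothetical data set work.horm with response y and strictly positive doses in dose (a zero dose would require an offset or an If statement, as in the ALS example below). Starting values are illustrative; M, B, and L might first be taken from a standard logistic fit.

proc nlin data=horm;
	parms M=100 f=2 B=2 L=5;   /* f > 0 produces the low-dose "bump" */
	model y = (M + f*dose)/(1 + exp(B*(log(dose) - log(L))));
run;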
Nonlinear estimation is an iterative process. That is, there are no closed form equations to compute parameter values. Because of this, the computer algorithms that estimate parameters use a trial-and-error process to judge whether the current estimates are “good enough”, which usually means they have met some predetermined numeric condition (referred to as convergence). This requires that the algorithm have a place to start, i.e., starting parameter values, to begin the estimation. The algorithm then adjusts the starting values to improve the model fit and continues doing so until the model converges. Conceptually, this process is analogous to letting a marble drop into a bowl from the edge. The marble will oscillate back and forth and around until it settles at the bottom or a low spot.
Traditionally, the trial and error iterative process has been based on the concept of least squares where the objective is to find parameters such that the distance between the predicted curve and the observed points is minimal. In SAS, the least squares estimation process is carried out through the procedure Proc Nlin. While Proc Nlin is still useful, it does have limitations because, for statistical inference, it assumes the data are normally distributed in addition to assuming that all parameters in the model are fixed effects. Its benefits, however, include ease of use and good convergence on parameter solutions, which can be useful for deriving starting values to use in more sophisticated nonlinear modeling procedures.
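In symbols (a standard definition, stated here for reference), least squares seeks the parameter vector \(\theta\) that minimizes the error sum of squares:
\[SSE(\theta) = \sum_{i=1}^{n}\left(y_{i} - f(x_{i}, \theta)\right)^2\]
where \(f(x_{i}, \theta)\) is the nonlinear model function.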
A more modern alternative to least squares estimation is the process of maximum likelihood estimation. In maximum likelihood, the objective of the parameter iterations and adjustments is to maximize the likelihood or probability of observing the data at hand. This has the obvious advantage that the likelihood function does not have to be a normal distribution, but could, alternatively, come from a wide array of potential distributions (generalized nonlinear models). The procedure available in SAS, Proc Nlmixed, also has the ability to consider parameters as random effects, thereby giving these models more flexibility to accurately represent the data. The drawbacks here are a more sensitive and complex estimation algorithm that can show difficulty in converging on a solution. Obtaining good starting values close to the final estimates can be essential when using Proc Nlmixed. The least squares procedure, Proc Nlin above, can be a good way of deriving these.
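Analogously (again a standard definition, stated for reference), maximum likelihood seeks the \(\theta\) that maximizes the log likelihood of the observed data:
\[ln\,L(\theta) = \sum_{i=1}^{n} ln\, p(y_{i} \mid \theta)\]
where \(p\) is the assumed probability density or mass function, e.g., normal or binomial.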
Both least squares (Proc Nlin) and maximum likelihood (Proc Nlmixed) will be demonstrated below.
These data are a simple simulated set of variables for demonstrating the exponential model and basic nonlinear estimation and results.
data exp;
	/* Simulate 4 reps at each x of exponential decay with N(0, 5^2) noise */
	do x = 0 to 20;
		do rep = 1 to 4;
			y = 100*exp(-.2*x) + rannor(0934)*5;
			output;
		end;
	end;
run;
proc sgplot data=exp;
scatter x=x y=y/filledoutlinedmarkers markeroutlineattrs=(color=black) markerattrs=(symbol=circlefilled size=12) MARKERFILLATTRS=(color=green);
xaxis LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
yaxis LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
run;
These data are from an experiment assessing the germination of onion seeds under varying osmotic pressure (available moisture). In the experiment, there were two petri dishes of 100 seeds each. This was replicated four times. The cumulative number of seeds germinated (an emerged radicle of 1 mm) was recorded for 20 days.
PROC IMPORT OUT= WORK.onion
DATAFILE= "./data/onion.xlsx"
DBMS=XLSX REPLACE;
sheet="Onion";
RUN;
proc sort data=onion;
by trt day;
run;
proc sgplot data=onion;
styleattrs datacolors=(cxFD07FA cx0752FD);
scatter x=day y=germ/group=trt filledoutlinedmarkers markeroutlineattrs=(color=black) markerattrs=(symbol=circlefilled size=12);
xaxis type=log logbase=e logstyle=linear label='Day' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
yaxis label='Germination (%)' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
keylegend /AUTOITEMSIZE valueattrs=(size=14) TITLEATTRS=(weight=bold size=12) Title = 'Osmotic Pressure (Pa)';
run;
These data record the activity of acetolactate synthase (ALS), measured as % absorbance, in winter wheat under increasing doses of an ALS-inhibiting herbicide (Rainbolt et al., 2005). The study was carried out over eight replications (plates).
PROC IMPORT OUT= WORK.ALS
DATAFILE= "./data/ALS.xlsx"
DBMS=XLSX REPLACE;
sheet="ALS";
RUN;
proc sgplot data=als;
scatter x=dose y=percent/FILLEDOUTLINEDMARKERS markerattrs=(symbol=circlefilled size=14) MARKERFILLATTRS=(color=tan) MARKEROUTLINEATTRS=(color=black);
xaxis type=log logbase=e logstyle=linear label='ALS Herbicide Dose (ug/ml)' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
yaxis label="ALS Activity (% Absorbance)" LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
run;
The following code applies least squares estimation to an exponential model with the procedure Proc Nlin. The predicted curve and studentized model residuals are output and plotted to visually assess the model fit. The Parms statement specifies the starting values for the parameters a and b. The method=gauss option applies one of several available least squares estimation algorithms.
proc nlin data=exp method=gauss;
parms a=100 b=-.2;
model y = a*exp(-b*x);
output out=nl_pred p=pr student=resid;
run;
proc sgplot data=nl_pred;
scatter x=x y=y;
series x=x y=pr;
run;
proc sgplot data=nl_pred;
scatter x=x y=resid;
refline 0/axis=y;
run;
Iterative Phase

Iter | a | b | Sum of Squares
---|---|---|---
0 | 100.0 | -0.2000 | 3.6017E8
1 | 5.6513 | -0.1970 | 1070170
2 | 5.9832 | -0.1402 | 222659
3 | 15.8768 | 0.00579 | 72281.7
4 | 68.0750 | 0.2716 | 20700.8
5 | 97.3920 | 0.1391 | 8512.4
6 | 96.9721 | 0.1850 | 1701.4
7 | 98.8214 | 0.2010 | 1453.3
8 | 98.8811 | 0.2017 | 1452.9
9 | 98.8788 | 0.2016 | 1452.9

NOTE: Convergence criterion met.
Estimation Summary | 
---|---
Method | Gauss-Newton
Iterations | 9
R | 4.364E-6
PPC(b) | 9.834E-7
RPC(b) | 0.000053
Object | 5.532E-8
Objective | 1452.874
Observations Read | 84
Observations Used | 84
Observations Missing | 0

Note: An intercept was not specified for this model.
Source | DF | Sum of Squares | Mean Square | F Value | Approx Pr > F
---|---|---|---|---|---
Model | 2 | 117815 | 58907.5 | 3324.73 | <.0001
Error | 82 | 1452.9 | 17.7180 | | 
Uncorrected Total | 84 | 119268 | | | 
Parameter | Estimate | Approx Std Error | Approx 95% Lower | Approx 95% Upper
---|---|---|---|---
a | 98.8788 | 1.5696 | 95.7564 | 102.0
b | 0.2016 | 0.00502 | 0.1917 | 0.2116
Approximate Correlation Matrix

 | a | b
---|---|---
a | 1.0000000 | 0.6349554
b | 0.6349554 | 1.0000000
In this output, the first table lists the changes made to the parameters over the iterations required to converge on a solution (9). Also shown is the error sum of squares, which decreases until it stabilizes at 1452.9. The note following the table is critical, stating that the convergence criterion was met. When using any nonlinear procedure, this statement or its equivalent is essential. Results from analyses that do not meet convergence should not be used.
An “ANOVA”-like table lists significance for the model, p < 0.0001, suggesting the exponential model fits the data. A parameter table with estimates and confidence intervals is also shown, where a = 98.88 and b = 0.20. The last table lists correlations between the parameters. These correlations become important when they reach high levels, say larger than 0.9, which may indicate that the model has more parameters than the data can support. In such circumstances, reducing the number of parameters through respecification of the model, or simply selecting a simpler alternative model, may be desirable. Note: In nonlinear estimation, all inferential tests and statistics are approximate and, hence, should be interpreted cautiously and in conjunction with other information, such as the size, sign, and relevance of the parameter estimates, as well as visual confirmation that the estimated curve approximates the observed data well.
To that end, the final plots show that the model runs through the observed data and that the residuals are randomly scattered with no trending and are of reasonable magnitude (between -3 and 3 for studentized residuals). Taken together, this information indicates the exponential model is an appropriate description of these data.
In this example we have two treatments. We would like to fit germination curves to each treatment and then compare them. The data, however, are binomial counts, not continuous normal data; therefore, we need to use a generalized model (Proc Nlmixed) for the final estimation. Before doing so, however, it is a good idea to see whether we can generally fit a curve (logistic in this case) to the data. To do so, Proc Nlin is used, ignoring the count nature of the data. This provides us with estimated starting values for the actual generalized model later. We will ignore all inferential statistics here and concentrate only on the estimated parameters and the visual fit of the curves. Once satisfied with these, we can move on to the curve comparisons.
The code below sorts the data by treatment, fits the logistic model separately for each, and then plots the predicted curves with the observed data. As noted above, when the data pattern is asymmetric, as in this case, a logarithmic transformation is useful. This is done for the variable Day and the parameter \(L\).
proc sort data=onion;
by trt day;
run;
proc nlin data=onion;
parms M=80 K=1 L=4;
model germ = M/(1 + exp(-K*(log(day) - log(L))));
by trt;
output out=pred p=pr;
run;
proc sgplot data=pred;
styleattrs datacolors=(cxFD07FA cx0752FD);
scatter x=day y=germ/group=trt FILLEDOUTLINEDMARKERS markerattrs=(symbol=circlefilled size=14) MARKEROUTLINEATTRS=(color=black);
series x=day y=pr/group=trt;
xaxis label='Day' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
yaxis label="Germinated" LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
title1 'Logistic Model';
run;
Parameter | Estimate: Trt = 0 | Estimate: Trt = -0.662
---|---|---
M | 79.8596 | 81.1036
K | 4.7118 | 5.4270
L | 2.6412 | 4.2480
The estimated curves seem to fit the data well. The parameter estimates will be used in the following section to run the curve comparison, while accounting for the binomial nature of the data.
Curve comparison is done using Proc Nlmixed. Unlike Proc Nlin, Nlmixed allows Contrast statements on estimated parameters. Hence, even when the data at hand are normally distributed, curve comparison will require Proc Nlmixed. Additionally, Proc Nlmixed carries out estimation through maximum likelihood. Because of this, the model is specified in terms of the distribution and its parameters. Lastly, we will use a dummy variable regression specification to build a full model that simultaneously fits the two treatments. This process is similar to that covered for linear regression here. While this example uses two treatments, multiple treatments can be fit as well by extending the framework shown here. Keep in mind, however, that fitting too many treatments at once will reduce statistical power and lacks parsimony.
We start by specifying parameters M, K, and L for each treatment using the starting values above (approximation is fine here). The numeric indices 1 and 2 refer to treatments 0 and -0.662, respectively. Of note, the parameter M is now expressed as a proportion, not the percentage used in Proc Nlin, because the binomial distribution expects values between 0.0 and 1.0. The binomial specification requires two parts: the total number of trials per observation, N, and an expected value or mean. The expected value will be the logistic model, defined here as mu. In the experiment, each observation was based on the number of seeds germinated out of an initial 100 seeds, so the number of trials is 100, which is recorded in the variable total. The Model statement then declares germ to be distributed as (~) binomial with the number of trials total and success probability mu, a function of our parameters.
Contrast statements here are written differently than in most SAS modeling procedures and are expressed simply as equations, such as \(M1 - M2\) or similar. Note, however, that contrasts don’t just involve treatments; they must also specify which part or parameter of the model to test. Below, tests are done to see whether treatments differ in the maximum extent of germination, the rate of germination, or the lag time in germination. A Predict statement is also used to output the predicted values to a data set Pred. In this statement, mu is multiplied by 100 to convert the predicted values back to a percentage scale.
proc nlmixed data=onion;
parms M1 = .80 K1 = 4.7 L1 = 2.64
M2 = .81 K2 = 5.42 L2 = 4.25;
/** If an observation is from trt=0, then use trt=0 parameters. Otherwise, use trt=-0.662 parameters **/
if trt=0 then do;
M = M1;
K = K1;
L = L1;
end;
else if trt=-0.662 then do;
M = M2;
K = K2;
L = L2;
end;
mu = M/(1 + exp(-K*(log(day) - log(L))));
model germ~binomial(total, mu);
contrast 'Parameters M' M1-M2;
contrast 'Parameters K' K1-K2;
contrast 'Parameters L' L1-L2;
predict 100*mu out = pred;
title1 'Logistic Model Comparisons: Full Model';
run;
Logistic Model Comparisons: Full Model

Specifications | 
---|---
Data Set | WORK.ONION
Dependent Variable | Germ
Distribution for Dependent Variable | Binomial
Optimization Technique | Dual Quasi-Newton
Integration Method | None
Dimensions | 
---|---
Observations Used | 240
Observations Not Used | 0
Total Observations | 240
Parameters | 6
Initial Parameters

M1 | K1 | L1 | M2 | K2 | L2 | Negative Log Likelihood
---|---|---|---|---|---|---
0.8 | 4.7 | 2.64 | 0.81 | 5.42 | 4.25 | 726.462991
Iteration History

Iteration | Calls | Negative Log Likelihood | Difference | Maximum Gradient | Slope
---|---|---|---|---|---
1 | 7 | 726.3097 | 0.153291 | 40.0155 | -733.277
2 | 10 | 724.8178 | 1.491938 | 228.173 | -246.853
3 | 14 | 724.0280 | 0.789744 | 150.754 | -3825.47
4 | 16 | 723.7816 | 0.246454 | 124.159 | -13.6850
5 | 20 | 719.3527 | 4.428818 | 18.1069 | -21.7328
6 | 23 | 719.2679 | 0.084847 | 5.23785 | -0.15551
7 | 26 | 719.2621 | 0.005768 | 1.00919 | -0.00816
8 | 29 | 719.2621 | 0.000036 | 0.057938 | -0.00007
9 | 32 | 719.2621 | 4.484E-7 | 0.006386 | -8.07E-7

NOTE: GCONV convergence criterion satisfied.
Fit Statistics | 
---|---
-2 Log Likelihood | 1438.5
AIC (smaller is better) | 1450.5
AICC (smaller is better) | 1450.9
BIC (smaller is better) | 1471.4
Parameter Estimates

Parameter | Estimate | Standard Error | DF | t Value | Pr > \|t\| | 95% Lower | 95% Upper | Gradient
---|---|---|---|---|---|---|---|---
M1 | 0.7972 | 0.004992 | 240 | 159.70 | <.0001 | 0.7874 | 0.8070 | 0.000919
K1 | 5.1117 | 0.3128 | 240 | 16.34 | <.0001 | 4.4954 | 5.7280 | 0.000029
L1 | 2.6797 | 0.03938 | 240 | 68.04 | <.0001 | 2.6021 | 2.7573 | 0.000031
M2 | 0.8067 | 0.005434 | 240 | 148.45 | <.0001 | 0.7960 | 0.8174 | 0.006386
K2 | 6.1220 | 0.2806 | 240 | 21.82 | <.0001 | 5.5693 | 6.6747 | 0.000026
L2 | 4.2794 | 0.04517 | 240 | 94.74 | <.0001 | 4.1905 | 4.3684 | -0.00094
Contrasts

Label | Num DF | Den DF | F Value | Pr > F
---|---|---|---|---
Parameters M | 1 | 240 | 1.64 | 0.2012
Parameters K | 1 | 240 | 5.78 | 0.0170
Parameters L | 1 | 240 | 712.55 | <.0001
The output here is similar to Proc Nlin with iterations, etc. The parameter estimates are very similar to the least squares estimates. Due to the different estimation method, however, they will not be exactly the same. An advantage of the generalized model assuming a binomial here is that the estimates will stay within the theoretical bounds, unlike least squares assuming normality. For example, the maximum germination estimates cannot exceed 100% or a proportion of 1.0 in the generalized model, while they potentially could under least squares.
The table of contrasts indicates that the maximum germination of the two treatments, \(M\), is similar at about 80%, while the rate of germination, \(K\), is higher for the drier treatment (-0.662) than for the control, and the drier treatment also has a longer lag time in germination, as estimated by the parameter \(L\).
Mixed effects in nonlinear models may be considered when there are additional known factors that could affect the variability in estimation. Examples might be data sets representing multiple studies, experiments, locations, seed lots, animal barns, etc. As in the case of curve comparison, we must consider where that variation might occur among the model parameters. In a growth curve, would it affect the parameter(s) for rate, or the maximum growth, or both? This can only be assessed by fitting models both with and without random effects. Caution is advised, however, as adding multiple random effects can make model estimation difficult, and it also introduces additional variance parameters, reducing parsimony and requiring further guesses at starting values.
The initial step is to fit a fixed model assuming no random effects. In the code below, a fixed logistic model is fit to the absorbance data. In this case, the logistic model is a decreasing function starting close to 100% (1.0). This model assumes normality, where the mean or expected value of the distribution is defined in mu as the logistic model, using logarithmic transformations on ALS dose and the parameter \(L\). The variance of the normal distribution also needs to be specified as a parameter, sigma2. Starting values were determined from Proc Nlin (not shown); the starting value for sigma2 was obtained from the MSE in that estimation. Because the data have observations at a dose of 0.0, and the model attempts to use a log(dose) function, which would be undefined at dose=0.0, an additional If statement reduces the model to mu=M under this condition.
proc nlmixed data=als corr;
parms M=.94 b=1 L=2.748 sigma2=38;
if Dose=0 then mu=M; /** Steps to avoid log(0) **/
else mu = M/(1 + exp(B*(log(Dose)-log(L))));
model percent ~normal(mu, sigma2);
title1 'Winter Wheat, Continuous Response: Fixed Model';
run;
Winter Wheat, Continuous Response: Fixed Model

Specifications | 
---|---
Data Set | WORK.ALS
Dependent Variable | Percent
Distribution for Dependent Variable | Normal
Optimization Technique | Dual Quasi-Newton
Integration Method | None
Dimensions | 
---|---
Observations Used | 56
Observations Not Used | 0
Total Observations | 56
Parameters | 4
Initial Parameters

M | b | L | sigma2 | Negative Log Likelihood
---|---|---|---|---
0.94 | 1 | 2.748 | 38 | 2351.22301
Iteration History

Iteration | Calls | Negative Log Likelihood | Difference | Maximum Gradient | Slope
---|---|---|---|---|---
1 | 8 | 884.6597 | 1466.563 | 24.6599 | -476.477
2 | 12 | 333.8276 | 550.8321 | 5.20083 | -580.170
3 | 16 | 309.2824 | 24.54522 | 11.5213 | -26.9999
4 | 20 | 289.3418 | 19.94064 | 2.41527 | -11.4335
5 | 24 | 286.4067 | 2.93508 | 3.37836 | -3.19529
6 | 38 | 226.8640 | 59.54271 | 2.91944 | -1.50236
7 | 43 | 226.7298 | 0.134124 | 3.74046 | -55.7129
8 | 49 | 202.9336 | 23.79621 | 10.3992 | -9.54469
9 | 52 | 200.4843 | 2.449369 | 5.28665 | -16.9803
10 | 54 | 197.3912 | 3.093109 | 3.07122 | -3.58802
11 | 57 | 197.0765 | 0.314667 | 2.95303 | -0.87153
12 | 60 | 196.9340 | 0.142524 | 1.00865 | -0.24142
13 | 63 | 196.9129 | 0.021055 | 0.31438 | -0.06472
14 | 66 | 196.9024 | 0.010551 | 0.65743 | -0.00694
15 | 74 | 193.9927 | 2.909663 | 10.6529 | -0.01512
16 | 81 | 185.4806 | 8.512097 | 20.0505 | -6.21263
17 | 84 | 180.6497 | 4.830874 | 8.37607 | -26.2304
18 | 88 | 180.1312 | 0.518539 | 4.46786 | -6.66086
19 | 91 | 179.9314 | 0.199792 | 3.10243 | -1.84137
20 | 94 | 179.8563 | 0.075064 | 0.68888 | -0.16647
21 | 98 | 179.8537 | 0.002641 | 0.21606 | -0.02095
22 | 101 | 179.8536 | 0.000116 | 0.017616 | -0.00025
23 | 104 | 179.8536 | 1.114E-6 | 0.000244 | -2.28E-6
24 | 107 | 179.8536 | 5.291E-9 | 0.000081 | -1.08E-8

NOTE: GCONV convergence criterion satisfied.
Fit Statistics | 
---|---
-2 Log Likelihood | 359.7
AIC (smaller is better) | 367.7
AICC (smaller is better) | 368.5
BIC (smaller is better) | 375.8
Parameter Estimates

Parameter | Estimate | Standard Error | DF | t Value | Pr > \|t\| | 95% Lower | 95% Upper | Gradient
---|---|---|---|---|---|---|---|---
M | 94.1121 | 2.0799 | 56 | 45.25 | <.0001 | 89.9456 | 98.2786 | 8.353E-6
b | 1.0348 | 0.05868 | 56 | 17.63 | <.0001 | 0.9172 | 1.1524 | 0.000047
L | 2.7483 | 0.1937 | 56 | 14.19 | <.0001 | 2.3603 | 3.1363 | 0.000081
sigma2 | 36.0701 | 6.8166 | 56 | 5.29 | <.0001 | 22.4147 | 49.7255 | -5.55E-7
Correlation Matrix of Parameter Estimates

 | M | b | L | sigma2
---|---|---|---|---
M | 1.0000 | -0.4063 | -0.7374 | 0.0000
b | -0.4063 | 1.0000 | 0.3850 | 0.0000
L | -0.7374 | 0.3850 | 1.0000 | -0.0000
sigma2 | 0.0000 | 0.0000 | -0.0000 | 1.0000
What is important in this output are the fit statistics. The AIC here is 367.7, which will serve as the baseline for comparison with the random-effects model below.
Next, the logistic is fit allowing the \(ED_{50}\) estimate, \(L\), to have a random effect associated with experiment (plate). This enters the model in three steps: 1) defining a standard deviation parameter for \(L\), sig_l; 2) adding a statement defining the random parameter as L1 = L + u, i.e., L1 is the original parameter plus a random component; and 3) adding a Random statement that defines the random component u as a normal variable with mean 0 and variance \(sig\_l^2\). The Subject option denotes where the random effect comes from, plate. Note also that the logistic model now uses the newly defined L1 parameter. The Predict statement for predicted values, however, uses the original \(L\) (along with a small offset, Dose+0.1, so the prediction is defined at dose 0). If we did not do this, the predicted line would reflect the random plate-to-plate fluctuation rather than a smooth predicted curve.
proc nlmixed data=als ;
parms M=.94 b=1 L=2.748 sigma2=38 sig_l = 1;
L1 = L + u;
if Dose=0 then mu=M; /** Steps to avoid log(0) **/
else mu = M/(1 + exp(B*(log(Dose)-log(L1))));
model percent ~normal(mu, sigma2);
random u ~ normal([0], [sig_l**2]) subject=plate;
predict M/(1 + exp(B*(log(Dose+0.1)-log(L)))) out = pred;
title1 'Winter Wheat, Continuous Response: Random Model (L)';
run;
proc sort data=pred;
by dose;
run;
proc sgplot data=pred;
scatter x=dose y=percent/FILLEDOUTLINEDMARKERS markerattrs=(symbol=circlefilled size=14) MARKERFILLATTRS=(color=tan) MARKEROUTLINEATTRS=(color=black);
series x=dose y=upper;
series x=dose y=lower;
series x=dose y=pred;
xaxis label='ALS Herbicide Dose' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
yaxis label="ALS Activity (% Absorbance)" LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
run;
Winter Wheat, Continuous Response: Random Model (L)

Specifications | 
---|---
Data Set | WORK.ALS
Dependent Variable | Percent
Distribution for Dependent Variable | Normal
Random Effects | u
Distribution for Random Effects | Normal
Subject Variable | Plate
Optimization Technique | Dual Quasi-Newton
Integration Method | Adaptive Gaussian Quadrature
Dimensions | 
---|---
Observations Used | 56
Observations Not Used | 0
Total Observations | 56
Subjects | 8
Max Obs per Subject | 7
Parameters | 5
Quadrature Points | 1
Initial Parameters

M | b | L | sigma2 | sig_l | Negative Log Likelihood
---|---|---|---|---|---
0.94 | 1 | 2.748 | 38 | 1 | 2351.1212
Iteration History

Iteration | Calls | Negative Log Likelihood | Difference | Maximum Gradient | Slope
---|---|---|---|---|---
1 | 8 | 831.9076 | 1519.214 | 23.5213 | -483.043
2 | 12 | 334.7991 | 497.1085 | 5.97950 | -525.548
3 | 16 | 278.9919 | 55.80715 | 2.79165 | -38.3667
4 | 18 | 275.8544 | 3.137487 | 2.79277 | -4.76914
5 | 22 | 273.0654 | 2.78899 | 4.88799 | -1.64060
6 | 291 | 219.9774 | 53.08808 | 17.6545 | -7.02670
7* | 297 | 216.0941 | 3.883216 | 0.59118 | -219.933
8 | 299 | 215.7704 | 0.323706 | 4.12978 | -0.06219
9 | 301 | 215.2915 | 0.478987 | 0.54512 | -0.55451
10 | 303 | 214.5412 | 0.750236 | 8.21685 | -0.26351
11 | 310 | 211.7060 | 2.835235 | 15.7943 | -1.58736
12 | 317 | 209.9275 | 1.778473 | 14.5760 | -5.43158
13 | 319 | 207.2618 | 2.665749 | 7.28591 | -10.1042
14 | 322 | 206.7364 | 0.525331 | 3.62850 | -1.10840
15 | 324 | 205.9773 | 0.759154 | 2.87652 | -0.30012
16 | 329 | 203.4997 | 2.477613 | 15.4830 | -1.31747
17 | 331 | 199.9381 | 3.561604 | 11.7881 | -27.0544
18 | 338 | 196.1037 | 3.834337 | 9.77497 | -30.7343
19 | 342 | 195.3312 | 0.77251 | 3.93638 | -3.89766
20 | 344 | 195.1109 | 0.220328 | 3.89521 | -2.93744
21 | 346 | 194.7588 | 0.352069 | 0.87214 | -0.64024
22 | 349 | 194.7172 | 0.041581 | 0.30391 | -0.08144
23 | 352 | 194.6977 | 0.019565 | 0.35845 | -0.01055
24 | 356 | 194.6377 | 0.059999 | 0.99235 | -0.01591
25 | 361 | 194.4437 | 0.193935 | 1.86660 | -0.11263
26 | 363 | 194.2791 | 0.164639 | 0.57421 | -0.65523
27 | 365 | 194.0198 | 0.259332 | 1.02787 | -0.19506
28 | 369 | 192.9211 | 1.098699 | 2.54044 | -0.22950
29 | 376 | 191.8747 | 1.046347 | 35.4228 | -1.92433
30 | 378 | 185.1822 | 6.692553 | 20.7788 | -122.228
31 | 381 | 179.5439 | 5.638304 | 6.44752 | -37.3200
32 | 384 | 178.2532 | 1.290613 | 17.0467 | -2.25730
33 | 387 | 177.5889 | 0.664355 | 13.1564 | -4.97466
34 | 391 | 175.8719 | 1.717037 | 3.06094 | -2.24523
35 | 393 | 174.9813 | 0.890545 | 8.61356 | -1.92998
36 | 397 | 172.5265 | 2.454811 | 3.24251 | -15.6953
37 | 404 | 170.9078 | 1.61868 | 2.90202 | -4.93989
38 | 407 | 170.2334 | 0.674382 | 4.42552 | -0.85054
39 | 412 | 169.9614 | 0.271993 | 6.42306 | -0.64752
40 | 415 | 169.8310 | 0.13044 | 0.31305 | -0.27047
41 | 418 | 169.8293 | 0.0017 | 0.027929 | -0.00324
42 | 421 | 169.8293 | 0.000027 | 0.002526 | -0.00005
43 | 424 | 169.8293 | 1.281E-7 | 0.000260 | -2.59E-7

NOTE: GCONV convergence criterion satisfied.
Fit Statistics | 
---|---
-2 Log Likelihood | 339.7
AIC (smaller is better) | 349.7
AICC (smaller is better) | 350.9
BIC (smaller is better) | 350.1
Parameter Estimates

Parameter | Estimate | Standard Error | DF | t Value | Pr > \|t\| | 95% Lower | 95% Upper | Gradient
---|---|---|---|---|---|---|---|---
M | 94.5055 | 1.4931 | 7 | 63.29 | <.0001 | 90.9749 | 98.0361 | 3.723E-6
b | 1.0409 | 0.04254 | 7 | 24.47 | <.0001 | 0.9403 | 1.1415 | -0.00026
L | 2.7794 | 0.2716 | 7 | 10.23 | <.0001 | 2.1371 | 3.4217 | 0.000031
sigma2 | 18.8887 | 3.8684 | 7 | 4.88 | 0.0018 | 9.7414 | 28.0360 | -0.00001
sig_l | -0.6573 | 0.1969 | 7 | -3.34 | 0.0124 | -1.1228 | -0.1918 | 0.000257
The fit statistics here give an AIC value of 349.7, which is lower than that of the fixed model (367.7). Hence, the addition of the random effect improves the model fit. (Note that the estimate of sig_l is negative; because sig_l enters the model only through its square, its sign is immaterial.) The final plot shows the fitted curve with 95% confidence bounds.
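Because the fixed model is nested within the random model (setting sig_l to zero recovers it), the AIC comparison can be supplemented with an approximate likelihood ratio test on the difference of the -2 log likelihood values reported above. A minimal sketch of that computation follows; note that tests of a variance component on the boundary of its parameter space are conservative, so the resulting p-value is only a rough guide.

data lrt;
	chisq = 359.7 - 339.7;            /* difference in -2 log likelihoods */
	df    = 1;                        /* one added parameter (sig_l) */
	p_lrt = 1 - probchi(chisq, df);   /* approximate chi-square p-value */
run;
proc print data=lrt;
run;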