1 Introduction

This tutorial is an introduction to nonlinear regression models in SAS. Some additional knowledge of SAS is useful for following certain aspects of this material; for that background, the reader is referred to the tutorials on the Data Step and Graphics and Plotting.

Nonlinear models are often associated with functions that curve or are not straight lines. This, however, is not a precise definition. In statistical models, linear or nonlinear refers to how the model parameters enter the model function, not to the shape of the curve. Some models that graphically appear “curved” are, in fact, linear models. For example, a quadratic model traces a concave or convex shape, yet the model is linear:

\[y_{i} = \beta_0 + \beta_1 \cdot x_{i} + \beta_2 \cdot x_{i}^2 + \epsilon_{i}\] In this model, the parameters \(\beta_0\), \(\beta_1\), and \(\beta_2\) enter additively relative to one another, making this a linear model. An example of a nonlinear model would be:

\[y_{i} = \beta_0\cdot x_{i}^{\beta_1} + \epsilon_{i}\]

where the parameters \(\beta_0\) and \(\beta_1\) enter multiplicatively relative to each other. In terms of estimation, this makes a difference. In linear models, closed form solutions for parameter estimates can be derived algebraically, while no closed form solutions can be derived for nonlinear models. Estimation in nonlinear models is instead obtained through an iterative “trial-and-error” process, the details of which can vary and are only covered briefly here. For more technical material on nonlinear estimation methods, see texts such as Nonlinear Regression Analysis and Its Applications (Bates and Watts, 1988).

Regardless of the methodology, the iterative estimation process typically requires the user to specify starting values for each parameter. These are essentially guesses at what the parameter values might be, used to start the process. The user may have some clues from the data, or the values may be completely unknown. Such details depend on the model used, knowledge of the process being modeled, and the data itself. These aspects are demonstrated in the examples below.

Note: Linearization

The ability to estimate nonlinear models is, computationally, relatively recent. Before this ability was available, nonlinear model forms were often transformed or linearized such that they became linear with additive parameters. This made estimation through standard linear regression possible. An example would be the power model above:

\[y_{i} = \beta_0\cdot x_{i}^{\beta_1} + \epsilon_{i}\]

By taking the logarithm of the model, we get a linear form:

\[ln(y_{i}) = ln(\beta_0) + \beta_1\cdot ln(x_{i}) + \phi_{i} \] \[ y_{i}^*= \beta_0^* + \beta_1\cdot x_{i}^* + \phi_{i}\] where \(y_{i}^*\) and \(x_{i}^*\) are the logarithms of the response and regressor, and the intercept of this model, \(\beta_0^*\), is the logarithm of the original parameter \(\beta_0\). The exponential function can then be used to back-transform the predicted values and the intercept parameter. Note, however, that the error term \(\phi_{i}\) is not equivalent to the additive error \(\epsilon_{i}\). Thus, linearization provides only an approximation to the desired nonlinear form. For these reasons, it is now preferable to use true nonlinear estimation techniques rather than linearization. In some cases, however, linearization can be useful for deriving the starting values needed for the nonlinear estimation process, as sketched below.
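As a minimal sketch of this idea (not part of the original examples), suppose a hypothetical data set power contains positive variables x and y following the power model. A linearized fit with Proc Reg on the log-transformed data can supply rough starting values:

data power_log;
    set power;              /* hypothetical data set with positive x and y */
    ystar = log(y);         /* y* = ln(y) */
    xstar = log(x);         /* x* = ln(x) */
run;

proc reg data=power_log;
    model ystar = xstar;    /* intercept estimates ln(b0); slope estimates b1 */
run;

The starting value for \(\beta_0\) would then be the exponential of the fitted intercept, while the fitted slope can be used directly as the starting value for \(\beta_1\).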


2 Model Selection

There are infinitely many nonlinear forms that could be selected to represent data. The selection process, however, should follow some basic guidelines. First and foremost, model selection, and the use of nonlinear modeling in general, should meet the research objectives: What is the purpose of modeling, and what does the researcher hope to gain from it? A second and related factor is model interpretation. Are the parameters of the model interpretable in biological terms? Do they have biological relevance? This is often determined from theoretical considerations or previous research.

Given these decisions, the data themselves are an important consideration. What pattern do the data generally show? Are they increasing, decreasing, curved, etc.? Are there sufficient data points to support estimation of a model, and do they fall in appropriate places to provide good estimation? As an example, many models attempt to estimate minimum or maximum theoretical values (asymptotes) approached when the values of dose or time are very large. Even though such doses or times are not practically relevant, they are important for model estimation. If the data set lacks points at large dose/time values, the estimates will be poor, or estimation may fail altogether.

Lastly, the complexity of model forms and their parameterization should be considered. Models containing many parameters (often more than 3) or parameters that involve exponents, ratios, or other multiplicative relationships can be difficult to estimate. Models should be as concise as possible, with as few parameters as possible (parsimony). In many cases, parsimony and complexity issues can be resolved by reparameterizing the model, or by selecting a different model with a simpler form for estimation.

3 Common Nonlinear Models

3.1 Exponential Forms

Exponential forms are often used to describe uniformly decreasing or increasing curved trends found in response decay or growth. They are typically two to three parameter models. Two common forms used in agricultural research are:

\[ y = a*exp(-b*x) + c \tag*{(Decreasing)} \] \[y = M - a*exp(-b*x) \tag*{(Increasing)} \] In these models, the parameter \(a\) is related to the intercept at \(x=0\), \(b\) is a rate-related parameter, \(M\) is an upper asymptote, and \(c\) is a lower asymptote, the asymptotes being approached as \(x\) grows large. In many cases, \(c\) is assumed to be zero, while \(M\) may be set to 1.0 or 100 for proportional or percentage data, respectively.

[Figures: decreasing and increasing exponential curves]
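Curves like those shown can be generated directly in a data step. The parameter values below (a, b, M, c) are arbitrary choices for illustration only:

data exp_forms;
    a = 100; b = 0.2; M = 100; c = 0;    /* arbitrary illustrative values */
    do x = 0 to 20 by 0.1;
        y_dec = a*exp(-b*x) + c;         /* decreasing form */
        y_inc = M - a*exp(-b*x);         /* increasing form */
        output;
    end;
run;

proc sgplot data=exp_forms;
    series x=x y=y_dec;
    series x=x y=y_inc;
run;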

3.2 S-Curve Forms

Similar to the exponential forms, S-curves may be either increasing or decreasing. A key difference, however, is that these forms have an inflection point midway, where the rate of change in the curve is greatest and then slows, giving the curves their characteristic “S” shape. They are commonly used to describe dose-response relationships or growth over time. Many are taken directly from the functional forms of cumulative probability distributions for theoretical reasons (for example, probit, tobit, and logit models). Others are derived mathematically from models of growth or decay. Some specific types of S-curves are outlined below.

[Figures: increasing and decreasing S-curves]

3.2.1 Logistic Curve

The logistic curve (not to be confused with logistic regression) is often seen in dose-response work with a probabilistic interpretation, or from differential equations describing growth. Its parameterization can vary widely. The form below is often seen in agricultural dose-response work.

\[ y = \frac{M}{1 + exp(K*(x - L))}\] Here, \(M\) is an upper asymptote representing a theoretical maximum of the response, \(y\), and \(K\) is a rate parameter. When \(K\) is positive, the curve begins close to \(M\) and decreases; when negative, the curve starts close to zero and increases toward \(M\). The parameter \(L\) is, by default, the inflection point at 50% of \(M\). That is, when the variable \(x\) equals \(L\), the response \(y\) equals \(\frac{M}{2}\). In dose-response work, this 50th percentile is commonly known as the \(LD_{50}\) or \(ED_{50}\). In growth models, \(L\) represents a quantification of the lag time before growth begins. The quantity \((x - L)\) often appears in logarithmic form, as this transformation makes the data pattern symmetric about the inflection point (as in the data examples below). Hence, the form may appear as \((ln(x) - L)\), \(ln(\frac{x}{L})\), or an equivalent. It is also possible to have \(L\) estimate other percentiles by reparameterizing the function (see, for example, Price et al., 2017).
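The role of the sign of \(K\) is easy to visualize with a short sketch; the values M = 100, L = 10, and K = ±0.8 below are arbitrary illustration choices.

data logistic_demo;
    M = 100; L = 10;                          /* arbitrary illustrative values */
    do x = 0 to 20 by 0.1;
        y_dec = M/(1 + exp( 0.8*(x - L)));    /* K > 0: decreasing */
        y_inc = M/(1 + exp(-0.8*(x - L)));    /* K < 0: increasing */
        output;
    end;
run;

proc sgplot data=logistic_demo;
    series x=x y=y_dec;
    series x=x y=y_inc;
    refline 50/axis=y;    /* both curves pass through M/2 = 50 at x = L */
run;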

3.2.2 Probability Based Models

These encompass a wide variety of models that were traditionally used for dose-response modeling. Examples include the cumulative distribution functions of the normal (probit), t (tobit), and logistic (logit) distributions. Because these functions are complex and involve integration to compute, they are not readily written out in equation form; instead, one either uses predefined probability functions in SAS or dedicated SAS procedures (Proc Probit).
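As a minimal, hypothetical sketch of the dedicated-procedure route, the code below assumes a data set assay with an event count dead out of n subjects at each dose; the data set and variable names are illustrative only, not taken from the examples in this tutorial.

proc probit data=assay log10;
    /* events/trials syntax; d=normal gives a probit fit, d=logistic a logit fit */
    model dead/n = dose / d=normal inversecl;
run;

Here the log10 option analyzes log dose, and inversecl requests confidence limits on the dose scale (e.g., for the \(LD_{50}\)).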

3.2.3 Asymmetric S-Curves

All the curves described above exhibit symmetry about the inflection point. As mentioned, asymmetry in the data is sometimes accommodated through logarithmic transformations in the model. There are some models, however, that directly incorporate asymmetry. Two examples are the Weibull and the Gompertz.

\[y = M*(1-exp(-((K*x)^C)))\tag*{Weibull}\]

\[y = M*exp(-exp(-K*(x - L)))\tag*{Gompertz}\] In these forms, \(M\), \(K\), and \(L\) play roles similar to those in the logistic model above, although \(L\) no longer corresponds to the 50th percentile.
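The asymmetry is easiest to see by plotting the curves. The parameter values in this sketch (M = 100, K = 0.25, L = 10, C = 1.5) are arbitrary illustration choices.

data scurves;
    M = 100; K = 0.25; L = 10; C = 1.5;     /* arbitrary illustrative values */
    do x = 0 to 30 by 0.1;
        y_weib = M*(1 - exp(-((K*x)**C)));  /* Weibull */
        y_gomp = M*exp(-exp(-K*(x - L)));   /* Gompertz */
        output;
    end;
run;

proc sgplot data=scurves;
    series x=x y=y_weib;
    series x=x y=y_gomp;
run;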

3.2.4 Hormesis Models

There is a special class of dose-response models where the data generally follow the decreasing S-curve shape but also exhibit a small initial increase at very low doses. This behavior, for example, is often observed with some herbicides that act similarly to growth hormones: small doses actually induce growth rather than plant injury. This process is referred to as hormesis, and one model describing this phenomenon was given by Brain and Cousens, 1989:

\[y = \frac{(M + f*x)}{1 + exp(B*(log(x)-log(L)))}\] This model is similar to the logistic model above, with the addition of the parameter \(f\), which creates the bump in the curve at small doses, \(x\). An example of fitting this curve can be found in Price et al., 2017.
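A minimal sketch of fitting this model with Proc Nlin is given below. The data set horm, its variables dose and y, and the starting values are all hypothetical; in practice, starting values would come from inspecting the data.

proc nlin data=horm;
    parms M=100 f=5 B=2 L=1;    /* rough, hypothetical starting values */
    if dose=0 then mu = M;      /* the curve approaches M as dose -> 0; avoids log(0) */
    else mu = (M + f*dose)/(1 + exp(B*(log(dose) - log(L))));
    model y = mu;
run;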

4 Model Estimation

Nonlinear estimation is an iterative process; there are no closed form equations for computing parameter values. Because of this, the computer algorithms that estimate parameters use a trial-and-error process to judge whether the current estimates are “good enough”, which usually means they have met some predetermined numeric condition (referred to as convergence). This requires that the algorithm has a place to start, or starting parameter values, to begin the estimation. The algorithm then adjusts the starting values to improve the model fit and continues doing so until the model converges. Conceptually, this process is analogous to letting a marble drop into a bowl from the edge. The marble will oscillate back and forth and around until it settles at the bottom or a low spot.

Traditionally, the trial and error iterative process has been based on the concept of least squares where the objective is to find parameters such that the distance between the predicted curve and the observed points is minimal. In SAS, the least squares estimation process is carried out through the procedure Proc Nlin. While Proc Nlin is still useful, it does have limitations because, for statistical inference, it assumes the data are normally distributed in addition to assuming that all parameters in the model are fixed effects. Its benefits, however, include ease of use and good convergence on parameter solutions, which can be useful for deriving starting values to use in more sophisticated nonlinear modeling procedures.

A more modern alternative to least squares estimation is the process of maximum likelihood estimation. In maximum likelihood, the objective of the parameter iterations and adjustments is to maximize the likelihood or probability of observing the data at hand. This has the obvious advantage that the likelihood function does not have to be a normal distribution, but could, alternatively, come from a wide array of potential distributions (generalized nonlinear models). The procedure available in SAS, Proc Nlmixed, also has the ability to consider parameters as random effects, thereby giving these models more flexibility to accurately represent the data. The drawbacks here are a more sensitive and complex estimation algorithm that can show difficulty in converging on a solution. Obtaining good starting values close to the final estimates can be essential when using Proc Nlmixed. The least squares procedure, Proc Nlin above, can be a good way of deriving these.

Both least squares Proc Nlin and maximum likelihood Proc Nlmixed will be demonstrated below.

5 Data used in examples

5.1 Exponential Data

These data are a simple simulated set of variables for demonstrating the exponential model and basic nonlinear estimation and results.


data exp;
    do x = 0 to 20;
        do rep = 1 to 4;
            y = 100*exp(-.2*x) + rannor(0934)*5;
            output;
        end;
    end;
run;

proc sgplot data=exp;
    scatter x=x y=y/filledoutlinedmarkers markeroutlineattrs=(color=black) markerattrs=(symbol=circlefilled size=12) markerfillattrs=(color=green);
    xaxis labelattrs=(Family=Arial Size=15 Weight=Bold) valueattrs=(Family=Arial Size=12 Weight=Bold);
    yaxis labelattrs=(Family=Arial Size=15 Weight=Bold) valueattrs=(Family=Arial Size=12 Weight=Bold);
run;

[Figure: scatter plot of the simulated exponential data]


5.2 Onion Data

These data are from an experiment assessing the germination of onion seeds under varying osmotic pressures (available moisture). In the experiment, there were two petri dishes of 100 seeds each, and this was replicated four times. The cumulative number of seeds germinated (radicle emerged 1 mm) was recorded for 20 days.


PROC IMPORT OUT= WORK.onion
    DATAFILE= "./data/onion.xlsx"
    DBMS=XLSX REPLACE;
    sheet="Onion";
RUN;

proc sort data=onion;
    by trt day;
run;

proc sgplot data=onion;
    styleattrs datacolors=(cxFD07FA cx0752FD);
    scatter x=day y=germ/group=trt filledoutlinedmarkers markeroutlineattrs=(color=black) markerattrs=(symbol=circlefilled size=12);
    xaxis type=log logbase=e logstyle=linear label='Day' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    yaxis label='Germination (%)' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    keylegend /AUTOITEMSIZE valueattrs=(size=14) TITLEATTRS=(weight=bold size=12) Title = 'Osmotic Pressure (Pa)'; 
run;

[Figure: cumulative onion germination versus day, by osmotic pressure treatment]


5.3 ALS Activity in Winter Wheat

These data record the activity of acetolactate synthase (ALS), as % absorbance, in winter wheat under increasing doses of a herbicide that inhibits ALS (Rainbolt, C. R., et al., 2005). The study was carried out over eight replications (plates).


PROC IMPORT OUT= WORK.ALS
    DATAFILE= "./data/ALS.xlsx"
    DBMS=XLSX REPLACE;
    sheet="ALS";
RUN;
    
proc sgplot data=ALS;
    scatter x=dose y=percent/filledoutlinedmarkers markerattrs=(symbol=circlefilled size=14) markerfillattrs=(color=tan) markeroutlineattrs=(color=black);
    xaxis type=log logbase=e logstyle=linear label='ALS Herbicide Dose (ug/ml)' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    yaxis label="ALS Activity (% Absorbance)" LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
run;

[Figure: ALS activity versus herbicide dose]


6 Least Squares Estimation

The following code applies least squares estimation to an exponential model with the procedure Proc Nlin. The predicted curve and studentized model residuals are output and plotted to visually assess the model fit. The Parms statement is used to specify the starting values for the parameters a and b. The method=gauss option applies one of several available least squares estimation algorithms (here, Gauss-Newton).


proc nlin data=exp method=gauss;
    parms a=100 b=-.2;
    model y = a*exp(-b*x);
    output out=nl_pred p=pr student=resid;
run;

proc sgplot data=nl_pred;
    scatter x=x y=y;
    series x=x y=pr;
run;

proc sgplot data=nl_pred;
    scatter x=x y=resid;
    refline 0/axis=y;
run;
The NLIN Procedure
Dependent Variable y
Method: Gauss-Newton

Iterative Phase
Iter a b Sum of Squares
0 100.0 -0.2000 3.6017E8
1 5.6513 -0.1970 1070170
2 5.9832 -0.1402 222659
3 15.8768 0.00579 72281.7
4 68.0750 0.2716 20700.8
5 97.3920 0.1391 8512.4
6 96.9721 0.1850 1701.4
7 98.8214 0.2010 1453.3
8 98.8811 0.2017 1452.9
9 98.8788 0.2016 1452.9

NOTE: Convergence criterion met.

Estimation Summary
Method Gauss-Newton
Iterations 9
R 4.364E-6
PPC(b) 9.834E-7
RPC(b) 0.000053
Object 5.532E-8
Objective 1452.874
Observations Read 84
Observations Used 84
Observations Missing 0


Note: An intercept was not specified for this model.

Source              DF   Sum of Squares   Mean Square   F Value   Approx Pr > F
Model                2   117815           58907.5       3324.73   <.0001
Error               82   1452.9           17.7180
Uncorrected Total   84   119268

Parameter   Estimate   Approx Std Error   Approximate 95% Confidence Limits
a           98.8788    1.5696             95.7564   102.0
b           0.2016     0.00502            0.1917    0.2116

Approximate Correlation Matrix
a b
a 1.0000000 0.6349554
b 0.6349554 1.0000000



[Figure: observed data with fitted exponential curve]




[Figure: studentized residuals versus x]


In this output, the first table lists the changes made to the parameters over the iterations required to converge on an answer (9 here). Also listed is the error sum of squares, which decreases until it stabilizes near 1452.9. The note following the table is critical: it states that the convergence criterion was met. When using any nonlinear procedure, this statement or its equivalent is essential; results from analyses that did not meet convergence should not be used.

An “ANOVA”-like table lists significance for the model, p < 0.0001, suggesting the exponential model fits the data. A parameter table with estimates and confidence intervals is also shown, where a = 98.88 and b = 0.20. The last table lists the correlations between parameters. These correlations become important when they reach high levels, say larger than 0.9, which may indicate that the model has more parameters than the data can support. In such circumstances, reducing the number of parameters through respecification of the model, or simply selecting a simpler alternative model, may be desirable. Note: In nonlinear estimation, all inferential tests and statistics are approximate; hence, they should be interpreted cautiously and in conjunction with other information such as the size, sign, and relevance of parameter estimates, as well as visual confirmation that the estimated curve approximates the observed data well.

To that end, the final plots show that the fitted curve runs through the observed data and that the residuals are randomly scattered with no trending and are of reasonable magnitude (between -3 and 3 for studentized residuals). Taken together, this information indicates the exponential model is an appropriate description of these data.

7 Maximum Likelihood Estimation and Curve Comparison

7.1 Initial Model Fitting

In this example, we have two treatments. We would like to fit germination curves to each treatment and then compare them. The data, however, are binomial counts, not continuous normal data; therefore, we need a generalized model (Proc Nlmixed) for the final estimation. Before doing so, however, it is a good idea to check whether we can fit a curve (logistic in this case) to the data at all. To do so, Proc Nlin is used, ignoring the count nature of the data. This provides estimated starting values for the actual generalized model later. We will ignore all inferential statistics here and concentrate only on the estimated parameters and the visual fit of the curves. Once satisfied with these, we can move on to the curve comparisons.

The code below sorts the data by treatment, fits the logistic model separately for each treatment, and then plots the predicted curves with the observed data. As noted above, when the data pattern is asymmetric, as it is here, a logarithmic transformation is useful. This is applied to the variable Day and the parameter \(L\).

proc sort data=onion;
    by trt day;
run;

proc nlin data=onion;
    parms M=80 K=1 L=4;
    model germ = M/(1 + exp(-K*(log(day) - log(L))));
    by trt;
    output out=pred p=pr;
run;

proc sgplot data=pred;
    styleattrs datacolors=(cxFD07FA cx0752FD);
    scatter x=day y=germ/group=trt  FILLEDOUTLINEDMARKERS markerattrs=(symbol=circlefilled size=14) MARKEROUTLINEATTRS=(color=black);
    series x=day y=pr/group=trt;
    xaxis label='Day' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    yaxis label="Germinated" LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    title1 'Logistic Model';
run;
Parameter   Estimate (Trt = 0)   Estimate (Trt = -0.662)
M           79.8596              81.1036
K           4.7118               5.4270
L           2.6412               4.2480


Logistic Fit to Onion Data

The estimated curves seem to fit the data well. The parameter estimates will be used in the following section to run the curve comparison, while accounting for the binomial nature of the data.

7.2 Curve Comparison

Curve comparison is done using Proc Nlmixed. Unlike Proc Nlin, Proc Nlmixed allows Contrast statements on estimated parameters; hence, even when the data at hand are normally distributed, curve comparison requires Proc Nlmixed. Additionally, Proc Nlmixed carries out estimation through maximum likelihood. Because of this, the model is specified in terms of the distribution and its parameters. Lastly, we will use a dummy variable regression specification to build a full model that simultaneously fits the two treatments. This process is similar to that covered for linear regression here. While this example uses two treatments, more treatments can be fit by extending the framework shown here. Keep in mind, however, that fitting too many treatments at once reduces statistical power and parsimony.

We start by specifying the parameters M, K, and L for each treatment using the starting values above (approximation is fine here). The numeric indices 1 and 2 refer to treatments 0 and -0.662, respectively. Of note, the parameter M is now expressed as a proportion, not the percentage used in Proc Nlin, because the binomial distribution expects values between 0.0 and 1.0. The binomial specification requires two parts: an expected value or mean, and the total number of trials per observation, N. The expected value is the logistic model, defined here as mu. In the experiment, each observation was the number of seeds germinated out of an initial 100 seeds, so the number of trials is 100, recorded in the variable total. The Model statement is then read as: germ is distributed as (~) a binomial with the number of trials, total, and expected value mu, a function of our parameters.

Contrast statements here are written differently than in most SAS modeling procedures and are expressed more simply as equations such as \(M1 - M2\). Note, however, that contrasts do not just name treatments; they must also specify which parameter in the model to test. Below, tests are made of whether the treatments differ in the maximum extent of germination, the rate of germination, or the lag time to germination. A Predict statement is also used to output the predicted values to a data set Pred. In this statement, mu is multiplied by 100 to convert the predicted values back to a percentage scale.


proc nlmixed data=onion;
    parms M1 = .80 K1 = 4.7  L1 = 2.64 
          M2 = .81 K2 = 5.42 L2 = 4.25;

    /**  If an observation is from trt=0, then use trt=0 parameters. Otherwise, use trt=-0.662 parameters  **/
    if trt=0 then do;
        M = M1;
        K = K1;
        L = L1;
    end;
    else if trt=-0.662 then do;
        M = M2;
        K = K2;
        L = L2;
    end;

    mu = M/(1 + exp(-K*(log(day) - log(L))));
    model germ~binomial(total, mu);

    contrast 'Parameters M' M1-M2;
    contrast 'Parameters K' K1-K2;
    contrast 'Parameters L' L1-L2;

    predict 100*mu out = pred;
    title1 'Logistic Model Comparisons: Full Model';
run;
Logistic Model Comparisons: Full Model

Specifications
Data Set WORK.ONION
Dependent Variable Germ
Distribution for Dependent Variable Binomial
Optimization Technique Dual Quasi-Newton
Integration Method None

Dimensions
Observations Used 240
Observations Not Used 0
Total Observations 240
Parameters 6

Initial Parameters
M1    K1    L1     M2     K2     L2     Negative Log Likelihood
0.8   4.7   2.64   0.81   5.42   4.25   726.462991

Iteration History
Iteration   Calls   Negative Log Likelihood   Difference   Maximum Gradient   Slope
1 7 726.3097 0.153291 40.0155 -733.277
2 10 724.8178 1.491938 228.173 -246.853
3 14 724.0280 0.789744 150.754 -3825.47
4 16 723.7816 0.246454 124.159 -13.6850
5 20 719.3527 4.428818 18.1069 -21.7328
6 23 719.2679 0.084847 5.23785 -0.15551
7 26 719.2621 0.005768 1.00919 -0.00816
8 29 719.2621 0.000036 0.057938 -0.00007
9 32 719.2621 4.484E-7 0.006386 -8.07E-7

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood 1438.5
AIC (smaller is better) 1450.5
AICC (smaller is better) 1450.9
BIC (smaller is better) 1471.4

Parameter Estimates
Parameter   Estimate   Standard Error   DF   t Value   Pr > |t|   95% Confidence Limits   Gradient
M1 0.7972 0.004992 240 159.70 <.0001 0.7874 0.8070 0.000919
K1 5.1117 0.3128 240 16.34 <.0001 4.4954 5.7280 0.000029
L1 2.6797 0.03938 240 68.04 <.0001 2.6021 2.7573 0.000031
M2 0.8067 0.005434 240 148.45 <.0001 0.7960 0.8174 0.006386
K2 6.1220 0.2806 240 21.82 <.0001 5.5693 6.6747 0.000026
L2 4.2794 0.04517 240 94.74 <.0001 4.1905 4.3684 -0.00094

Contrasts
Label Num DF Den DF F Value Pr > F
Parameters M 1 240 1.64 0.2012
Parameters K 1 240 5.78 0.0170
Parameters L 1 240 712.55 <.0001

The output here is similar to that of Proc Nlin, with iterations, etc. The parameter estimates are very similar to the least squares estimates; due to the different estimation method, however, they will not be exactly the same. An advantage of the generalized model assuming a binomial here is that the estimates stay within their theoretical bounds, unlike those from least squares assuming normality. For example, the maximum germination estimates cannot exceed 100%, or a proportion of 1.0, in the generalized model, while they potentially could under least squares.

The table of contrasts indicates that the maximum germination of the two treatments, \(M\), is similar at about 80%, while the drier treatment, -0.662, has a higher rate of germination, \(K\), than the control, as well as a longer lag time to germination, estimated by the parameter \(L\).

8 Mixed Model Estimation

Mixed effects in nonlinear models may be considered when there are additional known factors that could affect the variability in estimation. Examples might be data sets that represent multiple studies, experiments, locations, seed lots, animal barns, etc. As in the case of curve comparison, we must consider where that variation might occur among the model parameters. In a growth curve, would it affect the rate parameter(s), the maximum growth, or both? This can only be assessed by fitting models both with and without random effects. Caution is advised, however, as adding multiple random effects can make model estimation difficult, and each random effect adds a variance parameter, reducing parsimony and requiring further guesses at starting values.

The initial step is to fit a fixed model assuming no random effects. In the code below, a fixed logistic model is fit to the absorbance data. In this case, the logistic model is a decreasing function starting close to 100% (1.0). The model assumes normality, where the mean or expected value of the distribution is defined in mu as the logistic model, using logarithmic transformations on ALS dose and the parameter \(L\). The variance of the normal distribution also needs to be specified as a parameter, sigma2. Starting values were determined from Proc Nlin (not shown); the starting value for sigma2 was obtained from the MSE of that estimation. Because the data have observations at a dose of 0.0, and the model uses a log(dose) function, which would be undefined at dose = 0.0, an additional If statement reduces the model to mu = M under this condition.


proc nlmixed data=als corr;  
    parms  M=.94 b=1 L=2.748 sigma2=38;

    if Dose=0 then mu=M;    /**  Steps to avoid log(0)  **/
    else mu = M/(1 + exp(B*(log(Dose)-log(L))));

    model percent ~normal(mu, sigma2);

title1 'Winter Wheat, Continuous Response: Fixed Model';
run;
Winter Wheat, Continuous Response: Fixed Model

Specifications
Data Set WORK.ALS
Dependent Variable Percent
Distribution for Dependent Variable Normal
Optimization Technique Dual Quasi-Newton
Integration Method None

Dimensions
Observations Used 56
Observations Not Used 0
Total Observations 56
Parameters 4

Initial Parameters
M      b   L       sigma2   Negative Log Likelihood
0.94   1   2.748   38       2351.22301

Iteration History
Iteration   Calls   Negative Log Likelihood   Difference   Maximum Gradient   Slope
1 8 884.6597 1466.563 24.6599 -476.477
2 12 333.8276 550.8321 5.20083 -580.170
3 16 309.2824 24.54522 11.5213 -26.9999
4 20 289.3418 19.94064 2.41527 -11.4335
5 24 286.4067 2.93508 3.37836 -3.19529
6 38 226.8640 59.54271 2.91944 -1.50236
7 43 226.7298 0.134124 3.74046 -55.7129
8 49 202.9336 23.79621 10.3992 -9.54469
9 52 200.4843 2.449369 5.28665 -16.9803
10 54 197.3912 3.093109 3.07122 -3.58802
11 57 197.0765 0.314667 2.95303 -0.87153
12 60 196.9340 0.142524 1.00865 -0.24142
13 63 196.9129 0.021055 0.31438 -0.06472
14 66 196.9024 0.010551 0.65743 -0.00694
15 74 193.9927 2.909663 10.6529 -0.01512
16 81 185.4806 8.512097 20.0505 -6.21263
17 84 180.6497 4.830874 8.37607 -26.2304
18 88 180.1312 0.518539 4.46786 -6.66086
19 91 179.9314 0.199792 3.10243 -1.84137
20 94 179.8563 0.075064 0.68888 -0.16647
21 98 179.8537 0.002641 0.21606 -0.02095
22 101 179.8536 0.000116 0.017616 -0.00025
23 104 179.8536 1.114E-6 0.000244 -2.28E-6
24 107 179.8536 5.291E-9 0.000081 -1.08E-8

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood 359.7
AIC (smaller is better) 367.7
AICC (smaller is better) 368.5
BIC (smaller is better) 375.8

Parameter Estimates
Parameter   Estimate   Standard Error   DF   t Value   Pr > |t|   95% Confidence Limits   Gradient
M 94.1121 2.0799 56 45.25 <.0001 89.9456 98.2786 8.353E-6
b 1.0348 0.05868 56 17.63 <.0001 0.9172 1.1524 0.000047
L 2.7483 0.1937 56 14.19 <.0001 2.3603 3.1363 0.000081
sigma2 36.0701 6.8166 56 5.29 <.0001 22.4147 49.7255 -5.55E-7

Correlation Matrix of Parameter Estimates
M b L sigma2
M 1.0000 -0.4063 -0.7374 0.0000
b -0.4063 1.0000 0.3850 0.0000
L -0.7374 0.3850 1.0000 -0.0000
sigma2 0.0000 0.0000 -0.0000 1.0000

What is important in this output is the table of fit statistics. The AIC here is 367.7; it will serve as the baseline for judging the random effects model below.

Next, the logistic model is fit allowing the \(ED_{50}\) estimate, \(L\), to have a random effect from experiment (plate). This enters the model in three steps: 1) defining a standard deviation parameter for \(L\), sig_l; 2) adding a statement defining the random \(L\) parameter as L1 = L + u, i.e., L1 is the original parameter plus a random component; and 3) adding a Random statement that defines the random component u as a normal variable with mean 0 and variance sig_l**2. The Subject option denotes where the random effect comes from, plate. Note also that the logistic model now uses the newly defined L1 parameter. The Predict statement for predicted values, however, uses the original \(L\); if we did not do this, the predicted line would reflect the random fluctuations, not a smooth predicted curve. (A small offset, 0.1, is also added to dose within the Predict statement so that log(0) is avoided at dose = 0.)


proc nlmixed data=als ;  
    parms  M=.94 b=1 L=2.748 sigma2=38  sig_l = 1;

    L1 = L + u;
    if Dose=0 then mu=M;    /**  Steps to avoid log(0)  **/
    else mu = M/(1 + exp(B*(log(Dose)-log(L1))));

    model percent ~normal(mu, sigma2);
    random u ~ normal([0], [sig_l**2]) subject=plate;
    predict M/(1 + exp(B*(log(Dose+0.1)-log(L)))) out = pred;

title1 'Winter Wheat, Continuous Response: Random Model (L)';
run;

proc sort data=pred;
    by dose;
run;

proc sgplot data=pred;
    scatter x=dose y=percent/FILLEDOUTLINEDMARKERS markerattrs=(symbol=circlefilled size=14) MARKERFILLATTRS=(color=tan) MARKEROUTLINEATTRS=(color=black);
    series x=dose y=upper;
    series x=dose y=lower;
    series x=dose y=pred;
    xaxis label='ALS Herbicide Dose' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    yaxis label="ALS Activity (% Absorbance)" LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
run;
Winter Wheat, Continuous Response: Random Model (L)

Specifications
Data Set WORK.ALS
Dependent Variable Percent
Distribution for Dependent Variable Normal
Random Effects u
Distribution for Random Effects Normal
Subject Variable Plate
Optimization Technique Dual Quasi-Newton
Integration Method Adaptive Gaussian Quadrature

Dimensions
Observations Used 56
Observations Not Used 0
Total Observations 56
Subjects 8
Max Obs per Subject 7
Parameters 5
Quadrature Points 1

Initial Parameters
M      b   L       sigma2   sig_l   Negative Log Likelihood
0.94   1   2.748   38       1       2351.1212

Iteration History
Iteration   Calls   Negative Log Likelihood   Difference   Maximum Gradient   Slope
1 8 831.9076 1519.214 23.5213 -483.043
2 12 334.7991 497.1085 5.97950 -525.548
3 16 278.9919 55.80715 2.79165 -38.3667
4 18 275.8544 3.137487 2.79277 -4.76914
5 22 273.0654 2.78899 4.88799 -1.64060
6 291 219.9774 53.08808 17.6545 -7.02670
7* 297 216.0941 3.883216 0.59118 -219.933
8 299 215.7704 0.323706 4.12978 -0.06219
9 301 215.2915 0.478987 0.54512 -0.55451
10 303 214.5412 0.750236 8.21685 -0.26351
11 310 211.7060 2.835235 15.7943 -1.58736
12 317 209.9275 1.778473 14.5760 -5.43158
13 319 207.2618 2.665749 7.28591 -10.1042
14 322 206.7364 0.525331 3.62850 -1.10840
15 324 205.9773 0.759154 2.87652 -0.30012
16 329 203.4997 2.477613 15.4830 -1.31747
17 331 199.9381 3.561604 11.7881 -27.0544
18 338 196.1037 3.834337 9.77497 -30.7343
19 342 195.3312 0.77251 3.93638 -3.89766
20 344 195.1109 0.220328 3.89521 -2.93744
21 346 194.7588 0.352069 0.87214 -0.64024
22 349 194.7172 0.041581 0.30391 -0.08144
23 352 194.6977 0.019565 0.35845 -0.01055
24 356 194.6377 0.059999 0.99235 -0.01591
25 361 194.4437 0.193935 1.86660 -0.11263
26 363 194.2791 0.164639 0.57421 -0.65523
27 365 194.0198 0.259332 1.02787 -0.19506
28 369 192.9211 1.098699 2.54044 -0.22950
29 376 191.8747 1.046347 35.4228 -1.92433
30 378 185.1822 6.692553 20.7788 -122.228
31 381 179.5439 5.638304 6.44752 -37.3200
32 384 178.2532 1.290613 17.0467 -2.25730
33 387 177.5889 0.664355 13.1564 -4.97466
34 391 175.8719 1.717037 3.06094 -2.24523
35 393 174.9813 0.890545 8.61356 -1.92998
36 397 172.5265 2.454811 3.24251 -15.6953
37 404 170.9078 1.61868 2.90202 -4.93989
38 407 170.2334 0.674382 4.42552 -0.85054
39 412 169.9614 0.271993 6.42306 -0.64752
40 415 169.8310 0.13044 0.31305 -0.27047
41 418 169.8293 0.0017 0.027929 -0.00324
42 421 169.8293 0.000027 0.002526 -0.00005
43 424 169.8293 1.281E-7 0.000260 -2.59E-7

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood 339.7
AIC (smaller is better) 349.7
AICC (smaller is better) 350.9
BIC (smaller is better) 350.1

Parameter Estimates
Parameter   Estimate   Standard Error   DF   t Value   Pr > |t|   95% Confidence Limits   Gradient
M 94.5055 1.4931 7 63.29 <.0001 90.9749 98.0361 3.723E-6
b 1.0409 0.04254 7 24.47 <.0001 0.9403 1.1415 -0.00026
L 2.7794 0.2716 7 10.23 <.0001 2.1371 3.4217 0.000031
sigma2 18.8887 3.8684 7 4.88 0.0018 9.7414 28.0360 -0.00001
sig_l -0.6573 0.1969 7 -3.34 0.0124 -1.1228 -0.1918 0.000257



[Figure: ALS activity data with fitted logistic curve and 95% confidence bounds]


The fit statistics here give an AIC value of 349.7, which is lower than that of the fixed model (367.7). This suggests that adding the random effect improves the model fit. The final plot shows the fitted curve with 95% confidence bounds.