1 Introduction

This tutorial is an introduction to nonlinear regression models in SAS. Some additional knowledge of SAS may be useful for understanding certain aspects of this material. For more information on these requirements, the reader is referred to the tutorials on the Data Step and Graphics and Plotting.

Nonlinear models are often associated with functions that curve or are not straight lines. This, however, is a not a precise definition. Linear or nonlinear in statistical models refers to how the model parameters enter the model function. Some models that graphically appear “curved” are, in fact, linear models. For example, a quadratic model is graphically a concave or convex shape, but the model is linear:

\[y_{i} = \beta_0 + \beta_1 \cdot x_{1} + \beta_2 \cdot x_{1}^2 + \epsilon_{i}\] In this model, the parameters \(\beta_0\), \(\beta_1\), and \(\beta_2\) are additive relative to one another making this a linear model. An example of a nonlinear model would be:

\[y_{i} = \beta_0\cdot x_{i}^{\beta_1} + \epsilon_{i}\]

where the parameters \(\beta_0\) and \(\beta_1\) are multiplicative relative to each other. In terms of estimation, this makes a difference. In linear models, solutions for parameter estimates can be derived algebraically, while no closed form solutions or equations can be derived for nonlinear models. Estimation in nonlinear models is obtained through an iterative “trial-and-error” process, the details of which can vary and are only covered briefly here. For more technical material on nonlinear estimation methods, see texts such as Nonlinear Regression Analysis and Its Applications.

Regardless of the methodology, the iterative estimation process typically requires the user to specify starting values for each parameter. These are basically guesses at what the parameters might be to start the process. The user may have some clues from the data, or they may be completely unknown to the user. Such details will depend on the model used, knowledge of the process being modeled, and the data itself. Examples for these aspects will be demonstrated below.

1.0.0.0.1 Note: Linearization

The ability to estimate nonlinear models is, computationally, relatively recent. Before this ability was available, nonlinear model forms were often transformed or linearized such that they became linear with additive parameters. This made estimation through standard linear regression possible. An example would be the power model above:

\[y_{i} = \beta_0\cdot x_{i}^{\beta_1} + \epsilon_{i}\]

By taking the logarithm of the model, we get a linear form:

\[ln(y_{i}) = ln(\beta_0) + \beta_1\cdot x_{i}^{} + \phi_{i} \] \[ y_{i}^*= \beta_0^* + \beta_1\cdot x_{i} + \phi_{i}\] where \(y_{i}^*\) is the logarithm of the response, \(y_{i}\), and the intercept of this model, \(\beta_0^*\), is the logarithm of the original parameter \(\beta_0\). The exponential function could then be used to untransform the predicted values and intercept parameter. Note, however, that the error term \(\phi_{i}\) is not equivalent to the additive error \(\epsilon_{i}\). Thus, linearization provides only an approximation to the desired nonlinear form. For these reasons, it is now more desirable to use true nonlinear estimation techniques rather than linearization. In some cases, however, linearization may be useful for deriving the starting values necessary for the nonlinear estimation process.

2 Model Selection

There are an infinite number of nonlinear forms that can be selected for representing data. The selection process, however, should follow some basic guidelines. First and foremost, model selection, and the use of nonlinear modeling in general, should meet the research objectives. What is the purpose of modeling and what does the researcher hope to gain from it? A second and related factor is model interpretation. Are the parameters of the model interpretable in biological terms? Do they have biological relevance? This is often determined from theoretical considerations or previous research. Given these decisions, the data itself are an important consideration. What pattern does the data generally have? Is it increasing, decreasing, curved, etc.? Are there sufficient data points to support estimation of a model and do they fall in appropriate places to provide good estimation? As an example, many models attempt to estimate minimum or maximum theoretical values (asymptotes) when the values of dose or time are very large. Even though such doses or times are not practically relevant, they are important for model estimation. If the data set lacks points at large dose/time values, then the estimations will be poor, or even fail. Lastly, the complexity of model forms and their parameterization should be considered. Models containing many parameters (often more than 3) or parameters that involve exponents, ratios, or other multiplicative relationships can be difficult to estimate. Models should be as concise as possible with the fewest parameters as possible (parsimony). In many cases, parsimony and complexity issues can be resolved by reparameterizing the model or selecting a different model to make a simpler form for estimation.

3 Common Nonlinear Models

3.1 Exponential Forms

Exponential forms are often used to describe uniformly decreasing or increasing curved trends found in response decay or growth. They are typically two to three parameter models. Two common forms used in agricultural research are:

\[ y = a*exp(-b*x) + c \tag*{(Decreasing)} \] \[y = M - a*exp(-b*x) \tag*{(Increasing)} \] In these models, the parameter \(a\) is releated to an intercept at \(x=0\), \(b\) is a rate related parameter, \(M\) is an upper asymptote, and \(c\) is a lower asymptote, both approached as \(x\) grows larger. In many cases, \(c\) is often assumed to be zero and \(M\) may be set to 1.0 or 100 for proportional or percentage data, respectively.

Decreasing Exponential Increasing Exponential

3.2 S-Curve Forms

Similar to the exponential forms, S-curves may be either increasing or decreasing. A key difference, however, is that these forms have an inflection point midway where the rate of change in the curve slows and changes direction giving the curves their characteristic “S” shape. They are commonly used to describe dose-response relationships or growth over time. Many are taken directly from the functional forms of cumulative probability distributions for theoretical reasons (for example probit, tobit and logit models). Others are derived from mathematical derivations of growth or decay. Some specific types of S-curves are outlined below.

Increasing S-Curve Decreasing S-Curve

3.2.1 Logistic Curve

The logistic curve (not to be confused with logistic regression) is often seen in dose-response work with a probabilistic interpretation, or from differential equations describing growth. Its parameterization can vary widely. The form below is often seen in agricultural dose-response work.

\[ y = \frac{M}{1 + exp(K*(x - L))}\] Here, \(M\) is an upper asymptote representing a theoretical maximum of the response, \(y\). \(K\) is a rate parameter. When \(K\) is positive, the curve begins close to \(M\) and decreases. When negative, the curve starts close to zero and increases to \(M\). The parameter \(L\) is, by default, the infection point at 50% of \(M\). That is, when the variable \(x\) equals \(L\) the response \(y\) equals \(\frac{M}{2}\). In dose-response work, this 50th percentile is commonly known as the \(LD_{50}\) or \(ED_{50}\). In growth models, \(L\) represents a quantification of lag time for the start of growth. The quantity \((x - L)\) often appears in logrithmic form as this transformation makes the data pattern symmetric about the inflection point (as in the data examples below). Hence, the form may appear as \((ln(x) - L)\), \(ln(\frac{x}{L})\), or an equivalent. It is also possible to have \(L\) estimate other percentiles by reparameterizing the function (see for example Price, et. al, 2017.

3.2.2 Probability Based Models

These encompass a wide variety of models that were traditionally used for dose-response modeling. Examples include cumulative distributions for the Normal (probit), t (tobit), and logistic (logit). Because these functions are complex and involve integration to compute, they are not readily written out in equation form, but instead either use predefined probability functions in SAS, or dedicated SAS procedures (Proc Probit).

3.2.3 Asymmetric S-Curves

All the curves described above exhibit symmetry about the inflection point. As mentioned, sometimes asymmetry in the data is accommodated through logrithmic transformations in the model. There are some models, however, that can directly assume asymmetry. Two examples of this are the Weibull and Gompherz.

\[y = M*(1-exp(-((K*x)^C)))\tag*{Weibull}\]

\[y = = M*exp(−exp(−K*(x − L)))\tag*{Gomphertz}\] In these forms, \(M\), \(K\), and \(L\) play similar roles as those in the logistic model above, although \(L\) does not relate to the 50th percentile.

3.2.4 Hormesis Models

There are a special class of dose-response models where the data generally follow the decreasing S-Curve shape, but they also exhibit a small initial increase at very low doses. This behavior, for example, is often observed in some herbicides that are similar to growth hormones and small doses actually induce growth, rather than plant injury. This process is referred to as hormesis and one model to describe this phenomenon was given by Brain and Cousens, 1989:

\[y = \frac{(M + f*x)}{1 + exp(B*(log(x)-log(L)))}\] This model is similar to the logistic model above with the addition of the parameter \(f\) which creates the bump in the curve at small doses, \(x\). An example of fitting this curve can be found at Price, et. al, 2017

4 Model Estimation

Nonlinear estimation is an iterative process. That is, there are no closed form equations to compute parameter values. Because of this, the computer algorithms that estimate parameters will use a trial and error process to judge whether the estimates it has are “good enough”, which usually means they have met some predetermined numeric condition (referred to as convergence). This requires that the algorithm has a place to start, or starting parameter values, to begin the estimation. The algorithm will then adjust the starting values to improve to model fit and continue doing so until the model converges. Conceptually, this process is analogous to letting a marble drop into a bowl from the edge. The marble will oscillate back and forth and around until it settles at the bottom or a low spot.

Traditionally, the trial and error iterative process has been based on the concept of least squares where the objective is to find parameters such that the distance between the predicted curve and the observed points is minimal. In SAS, the least squares estimation process is carried out through the procedure Proc Nlin. While Proc Nlin is still useful, it does have limitations because, for statistical inference, it assumes the data are normally distributed in addition to assuming that all parameters in the model are fixed effects. Its benefits, however, include ease of use and good convergence on parameter solutions, which can be useful for deriving starting values to use in more sophisticated nonlinear modeling procedures.

A more modern alternative to least squares estimation is the process of maximum likelihood estimation. In maximum likelihood, the objective of the parameter iterations and adjustments is to maximize the likelihood or probability of observing the data at hand. This has the obvious advantage that the likelihood function does not have to be a normal distribution, but could, alternatively, come from a wide array of potential distributions (generalized nonlinear models). The procedure available in SAS, Proc Nlmixed, also has the ability to consider parameters as random effects, thereby giving these models more flexibility to accurately represent the data. The drawbacks here are a more sensitive and complex estimation algorithm that can show difficulty in converging on a solution. Obtaining good starting values close to the final estimates can be essential when using Proc Nlmixed. The least squares procedure, Proc Nlin above, can be a good way of deriving these.

Both least squares Proc Nlin and maximum likelihood Proc Nlmixed will be demonstrated below.

5 Data used in examples

5.1 Exponential Data

These data are a simple simulated set of variables for demonstrating the exponential model and basic nonlinear estimation and results.


data exp;
    do x = 0 to 20;
        do rep = 1 to 4;
            y = 100*exp(-.2*x) + rannor(0934)*5;
            output;
        end;
    end;
;

proc sgplot;
        scatter x=x y=y/filledoutlinedmarkers markeroutlineattrs=(color=black) markerattrs=(symbol=circlefilled size=12) MARKERFILLATTRS=(color=green);
    xaxis LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    yaxis LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
run;

The SGPlot Procedure

5.2 Onion Data

These data are from an experiment assessing the germination of onion seeds under varying osmotic pressure (available moisture). In the experiemnt, there were two petri dishes of 100 seeds each. This was replicated four times. The cumulative number of seeds germinated (emerged radical of 1 mm) were recorded for 20 days.


PROC IMPORT OUT= WORK.onion
    DATAFILE= "./data/onion.xlsx"
    DBMS=XLSX REPLACE;
    sheet="Onion";
RUN;

proc sort data=onion;
    by trt day;
run;

proc sgplot data=onion;
    styleattrs datacolors=(cxFD07FA cx0752FD);
    scatter x=day y=germ/group=trt filledoutlinedmarkers markeroutlineattrs=(color=black) markerattrs=(symbol=circlefilled size=12);;
    xaxis type=log logbase=e logstyle=linear label='Day' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    yaxis label='Germination (%)' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    keylegend /AUTOITEMSIZE valueattrs=(size=14) TITLEATTRS=(weight=bold size=12) Title = 'Osmotic Pressure (Pa)'; 
run;

The SGPlot Procedure

5.3 ALS Activity in Winter Wheat

These data record the activity of Acetolactate Synthase (ALS) as % absorbtion in winter wheat under increasing doses of a herbicide that inhibits ALS (Rainbolt, C. R., et al., 2005. The study was carried out over eight replications (plates).


PROC IMPORT OUT= WORK.ALS
    DATAFILE= "./data/ALS.xlsx"
    DBMS=XLSX REPLACE;
    sheet="ALS";
RUN;
    
proc sgplot;
    scatter x=dose y=percent/FILLEDOUTLINEDMARKERS markerattrs=(symbol=circlefilled size=14) MARKERFILLATTRS=(color=tan) MARKEROUTLINEATTRS=(color=black);;
    xaxis type=log logbase=e logstyle=linear label='ALS Herbicide Dose (ug/ml)' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    yaxis label="ALS Activity (% Absorbance)" LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
run;

The SGPlot Procedure

6 Least Squares Estimation

The following code applies least squares estimation to an exponential model with the procedure Proc Nlin. The predicted curve and studentized model residuals are output and plotted to visually assess the model fit. The Parms statement is used to specify the starting values for the parameters a and b. The method=guass option applies one of several available least squares estimation algorithms.


proc nlin data=exp method=gauss;
    parms a=100 b=-.2;
    model y = a*exp(-b*x);
    output out=nl_pred p=pr student=resid;
run;

proc sgplot;
    scatter x=x y=y;
        series x=x y=pr;
        
run;

proc sgplot;
    scatter x=x y=resid;
        refline 0/axis=y;
run;

The NLIN Procedure

Dependent Variable y

Method: Gauss-Newton

Iterative Phase
Iter	a	b	Sum of Squares
0	100.0	-0.2000	3.6017E8
1	5.6513	-0.1970	1070170
2	5.9832	-0.1402	222659
3	15.8768	0.00579	72281.7
4	68.0750	0.2716	20700.8
5	97.3920	0.1391	8512.4
6	96.9721	0.1850	1701.4
7	98.8214	0.2010	1453.3
8	98.8811	0.2017	1452.9
9	98.8788	0.2016	1452.9

NOTE: Convergence criterion met.

Estimation Summary
Method	Gauss-Newton
Iterations	9
R	4.364E-6
PPC(b)	9.834E-7
RPC(b)	0.000053
Object	5.532E-8
Objective	1452.874
Observations Read	84
Observations Used	84
Observations Missing	0

Note:

An intercept was not specified for this model.

Source	DF	Sum of Squares	Mean Square	F Value	Approx Pr > F
Model	2	117815	58907.5	3324.73	<.0001
Error	82	1452.9	17.7180
Uncorrected Total	84	119268

Parameter	Estimate	Approx Std Error	Approximate 95% Confidence Limits
a	98.8788	1.5696	95.7564	102.0
b	0.2016	0.00502	0.1917	0.2116

Approximate Correlation Matrix
	a	b
a	1.0000000	0.6349554
b	0.6349554	1.0000000

The SGPlot Procedure

In this output, the first table lists the changes made to the parameters over the iterations required to converge on an answer (9). Also listed are the sum of squares error which decreases until it stabilizes at 1452. The note following the table is critical, stating the convergence was met. When using any nonlinear procedure, this statement or equivalent is essential. Results from analyses not meeting convergence should not be used.

An “ANOVA” like table lists significance for the model, p<0.0001, suggesting the exponential model fit the data. A parameter table with estimates and confidence intervals is also shown where a = 98.87 and b = 0.20. The last table lists correlation between the parameters. These correlations are important when they reach high levels, say larger than 0.9, which may indicate that the model has too many parameters for this data to support. In such circumstances, reducing the number of parameters through respecification of the model, or simply selecting a simpler alternative model, may be desirable. Note: In nonlinear estimation, all inferential tests and statistics are approximate and, hence, they should be interpreted cautiously and in conjunction with other information such as the size, sign and relevance of parameter estimates, as well as visual confirmation that the estimated curve approximates the observed data well.

To, that end, the final plots show the model runs through the observed data and the residuals are randomly scattered with no trending and they are of reasonable magnitude (between -3 and 3 for studentized residuals). All this information together indicates the exponential model is an appropriate description of this data.

7 Maximum Likelihood Estimation and Curve Comparison

7.1 Initial Model Fitting

In this example we have two treatments. We would like to fit germination curves to each treatment and then compare them. The data, however, are binomial counts, not continuous normal data, therefore, we need to use a generalized model (Proc Nlmixed) for the final estimation. Before doing so, however, it is a good idea to see if we can generally fit a curve (logistic in this case) to the data. To do so, Proc Nlin is used, ignoring the count nature of the data. This provides us with estimated starting values for the actual generalized model later. We will ignore all inferential statistics here and only concentrate on the estimated parameters and the visual fit of the curves. Once satisfied with these we can move on to the curve comparisons.

The code below sorts the data by treatment, fits the logistic model separately for each, and then plots the predicted curves with the observed data. As noted above, when the data pattern is asymmetric, as in this case, a logrithmic transformation is useful. This is done for the variable Day and the parameter \(L\).

proc sort data=onion;
    by trt day;
run;

proc nlin data=onion;
    parms M=80 K=1 L=4;
    model germ = M/(1 + exp(-K*(log(day) - log(L))));
    by trt;
    output out=pred p=pr;
run;

proc sgplot data=pred;
    styleattrs datacolors=(cxFD07FA cx0752FD);
    scatter x=day y=germ/group=trt  FILLEDOUTLINEDMARKERS markerattrs=(symbol=circlefilled size=14) MARKEROUTLINEATTRS=(color=black);
    series x=day y=pr/group=trt;
    xaxis label='Day' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    yaxis label="Germinated" LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    title1 'Logistic Model';
run;

Parameter	Estimate: Trt = 0	Estimate: Trt = -0.662
M	79.8596	81.1036
K	4.7118	5.4270
L	2.6412	4.2480

Logistic Fit to Onion Data

The estimated curves seem to fit the data well. The parameter estimates will be used in the following section to run the curve comparison, while accounting for the binomial nature of the data.

7.2 Curve Comparison

Curve comparison is done using Proc Nlmixed. Unlike Proc Nlin, Nlmixed allows for Contrast statements on estimated parameters. Hence, even when the data at hand are normally distributed, curve comparison will require Proc Nlmixed. Additionally, Proc Nlmixed carries out estimation through maximum likelihood. Because of this, the model is specified in terms of the distribution and it’s parameters. Lastly, we will use a Dummy Variable Regression specification to build a full model that simultaneously fits the two treatments. This process is similar to that covered for linear regression here. While this example uses two treatments, multiple treatments can be fit as well by extending the framework shown here. Keep in mind, however, that fitting too many treatments at once will reduce statistical power and is lacking in parsimony.

We start by specifying parameters M, K, and L for each treatment using the starting values above (approximation is fine here). The numeric indices 1 and 2 refer to treatments 0 and -0.662, respectively. Of note, parameter M is now expressed as a proportion, not the percentage that was used in Proc Nlin because the binomial distribution expects values between 0.0 and 1.0. The binomial distribution requires two parts: An expected value or mean, and the total number of trials per observations, N. The expected value will be the logistic model, defined here as mu. In the experiment, each observation was based on the number of seeds germinated out of an initial 100 seeds, so the number of trials is 100, which is recorded in variable total. The model statement then is expressed as germ distributed as (~) a binomial with expected value mu, a function of our parameters, and the total.

Contrast statements are written differently than most SAS modeling procedures and are more simply expressed as equations such as \(M1 - M2\) or similar. Note, however, that contrasts don’t just involve treatments, but must also specify what part or parameter in the model to test. Below, tests are done to see whether treatments differ in the maximum extent of germination, the rate of germination, or the lag time in germination. A Predict statement is also used to output the predicted values to a data set Pred. In this statement, mu is multiplied by 100 to convert the predicted values back to a percentage scale.


proc nlmixed data=onion;
    parms M1 = .80 K1 = 4.7  L1 = 2.64 
          M2 = .81 K2 = 5.42 L2 = 4.25;

    /**  If an observation is from trt=0, then use trt=0 parameters. Otherwise, use trt=-0.662 parameters  **/
    if trt=0 then do;
        M = M1;
        K = K1;
        L = L1;
    end;
    else if trt=-0.662 then do;
        M = M2;
        K = K2;
        L = L2;
    end;

    mu = M/(1 + exp(-K*(log(day) - log(L))));
    model germ~binomial(total, mu);

    contrast 'Parameters M' M1-M2;
    contrast 'Parameters K' K1-K2;
    contrast 'Parameters L' L1-L2;

    predict 100*mu out = pred;
    title1 'Logistic Model Comparisons: Full Model';
run;

Logistic Model Comparisons: Full Model

Specifications
Data Set	WORK.ONION
Dependent Variable	Germ
Distribution for Dependent Variable	Binomial
Optimization Technique	Dual Quasi-Newton
Integration Method	None

Dimensions
Observations Used	240
Observations Not Used	0
Total Observations	240
Parameters	6

Initial Parameters
M1	K1	L1	M2	K2	L2	Negative Log Likelihood
0.8	4.7	2.64	0.81	5.42	4.25	726.462991

Iteration History
Iteration	Calls	Negative Log Likelihood	Difference	Maximum Gradient	Slope
1	7	726.3097	0.153291	40.0155	-733.277
2	10	724.8178	1.491938	228.173	-246.853
3	14	724.0280	0.789744	150.754	-3825.47
4	16	723.7816	0.246454	124.159	-13.6850
5	20	719.3527	4.428818	18.1069	-21.7328
6	23	719.2679	0.084847	5.23785	-0.15551
7	26	719.2621	0.005768	1.00919	-0.00816
8	29	719.2621	0.000036	0.057938	-0.00007
9	32	719.2621	4.484E-7	0.006386	-8.07E-7

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood	1438.5
AIC (smaller is better)	1450.5
AICC (smaller is better)	1450.9
BIC (smaller is better)	1471.4

Parameter Estimates
Parameter	Estimate	Standard Error	DF	t Value	Pr > \|t\|	95% Confidence Limits		Gradient
M1	0.7972	0.004992	240	159.70	<.0001	0.7874	0.8070	0.000919
K1	5.1117	0.3128	240	16.34	<.0001	4.4954	5.7280	0.000029
L1	2.6797	0.03938	240	68.04	<.0001	2.6021	2.7573	0.000031
M2	0.8067	0.005434	240	148.45	<.0001	0.7960	0.8174	0.006386
K2	6.1220	0.2806	240	21.82	<.0001	5.5693	6.6747	0.000026
L2	4.2794	0.04517	240	94.74	<.0001	4.1905	4.3684	-0.00094

Contrasts
Label	Num DF	Den DF	F Value	Pr > F
Parameters M	1	240	1.64	0.2012
Parameters K	1	240	5.78	0.0170
Parameters L	1	240	712.55	<.0001

The output here is similar to Proc Nlin with iterations, etc. The parameter estimates are very similar to the least squares estimates. Due to the different estimation method, however, they will not be exactly the same. An advantage of the generalized model assuming a binomial here is that the estimates will stay within the theoretical bounds, unlike least squares assuming normality. For example, the maximum germination estimates cannot exceed 100% or a proportion of 1.0 in the generalized model, while they potentially could under least squares.

The table of contrasts indicates that the maximum germination of the two treatments, \(M\), is similar at about 80%, while the rate of germination, \(K\), for the drier treatment, -0.662, is higher than the control and also has a longer lag time in germination, estimated by parameter \(L\).

8 Mixed Model Estimation

Mixed effects in nonlinear models may be considered when there are additional known factors which could affect the variability in estimation. Examples might be data sets that represent multiple studies, experiments, locations, seedlots, animal barns, etc. Like the case of curve comparison, we must consider where that variation might occur among the model parameters. In a growth curve, would it affect the parameter(s) for rate, or the maximum growth, or both? This can only be assessed by fitting models both with and without random effects. Caution is advised, however, as adding multiple random effects can make model estimation difficult and it also adds additional variance parameters, reducing parsimony and requiring further guessing about starting values.

The initial step is to fit a fixed model assuming no random effects. In the code below, a fixed logistic model is fit to absorbence data. In this case, the logistic model is a decreasing function starting close to 100% (1.0). This model is assuming normality where the mean or expected value of the distribution is defined in mu as the logistic model using logrithmic transformations on ALS dose and the parameter \(L\). The variance of the normal distribution also needs to be specified as a parameter, sigma2. Starting values were determined from Proc Nlin (not shown). The starting value for sigma2 was obtained from the MSE value in that estimation. Because the data have observations at a dose of 0.0, and the model attempts to use a log(dose) function, which would be undefined at dose=0.0, there is an additional If statement to reduce the model under this condition to mu=M.


proc nlmixed data=als corr;  
    parms  M=.94 b=1 L=2.748 sigma2=38;

    if Dose=0 then mu=M;    /**  Steps to avoid log(0)  **/
    else mu = M/(1 + exp(B*(log(Dose)-log(L))));

    model percent ~normal(mu, sigma2);

title1 'Winter Wheat, Continuous Response: Fixed Model';
run;

Winter Wheat, Continuous Response: Fixed Model

Specifications
Data Set	WORK.ALS
Dependent Variable	Percent
Distribution for Dependent Variable	Normal
Optimization Technique	Dual Quasi-Newton
Integration Method	None

Dimensions
Observations Used	56
Observations Not Used	0
Total Observations	56
Parameters	4

Initial Parameters
M	b	L	sigma2	Negative Log Likelihood
0.94	1	2.748	38	2351.22301

Iteration History
Iteration	Calls	Negative Log Likelihood	Difference	Maximum Gradient	Slope
1	8	884.6597	1466.563	24.6599	-476.477
2	12	333.8276	550.8321	5.20083	-580.170
3	16	309.2824	24.54522	11.5213	-26.9999
4	20	289.3418	19.94064	2.41527	-11.4335
5	24	286.4067	2.93508	3.37836	-3.19529
6	38	226.8640	59.54271	2.91944	-1.50236
7	43	226.7298	0.134124	3.74046	-55.7129
8	49	202.9336	23.79621	10.3992	-9.54469
9	52	200.4843	2.449369	5.28665	-16.9803
10	54	197.3912	3.093109	3.07122	-3.58802
11	57	197.0765	0.314667	2.95303	-0.87153
12	60	196.9340	0.142524	1.00865	-0.24142
13	63	196.9129	0.021055	0.31438	-0.06472
14	66	196.9024	0.010551	0.65743	-0.00694
15	74	193.9927	2.909663	10.6529	-0.01512
16	81	185.4806	8.512097	20.0505	-6.21263
17	84	180.6497	4.830874	8.37607	-26.2304
18	88	180.1312	0.518539	4.46786	-6.66086
19	91	179.9314	0.199792	3.10243	-1.84137
20	94	179.8563	0.075064	0.68888	-0.16647
21	98	179.8537	0.002641	0.21606	-0.02095
22	101	179.8536	0.000116	0.017616	-0.00025
23	104	179.8536	1.114E-6	0.000244	-2.28E-6
24	107	179.8536	5.291E-9	0.000081	-1.08E-8

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood	359.7
AIC (smaller is better)	367.7
AICC (smaller is better)	368.5
BIC (smaller is better)	375.8

Parameter Estimates
Parameter	Estimate	Standard Error	DF	t Value	Pr > \|t\|	95% Confidence Limits		Gradient
M	94.1121	2.0799	56	45.25	<.0001	89.9456	98.2786	8.353E-6
b	1.0348	0.05868	56	17.63	<.0001	0.9172	1.1524	0.000047
L	2.7483	0.1937	56	14.19	<.0001	2.3603	3.1363	0.000081
sigma2	36.0701	6.8166	56	5.29	<.0001	22.4147	49.7255	-5.55E-7

Correlation Matrix of Parameter Estimates
	M	b	L	sigma2
M	1.0000	-0.4063	-0.7374	0.0000
b	-0.4063	1.0000	0.3850	0.0000
L	-0.7374	0.3850	1.0000	-0.0000
sigma2	0.0000	0.0000	-0.0000	1.0000

What is important in this output are the fit statistics. The AIC here is 367.7.

Next, the logistic is fit allowing for the \(ED_{50}\) estimate, \(L\) to have a random effect from experiment (plate). This is entered into the model in three steps: 1) by defining a variance for \(L\) as sig_l, and 2) adding statement to define the random \(L\) parameter as L1 = L + u, e.g. L1 is the original parameter plus a random component, and lastly, 3) adding a random statement that defines the random component u as a normal variable with mean=0 and variance of \(sig\_l^2\). A subject option denotes where the random effect comes from, plate. Note also that the logistic model now uses the newly defined L1 parameter. The Predict statements for predicted values, however, use the original \(L\). If we did not do this, the predicted line would represent random fluctuation, not a smooth predicted curve.


proc nlmixed data=als ;  
    parms  M=.94 b=1 L=2.748 sigma2=38  sig_l = 1;

    L1 = L + u;
    if Dose=0 then mu=M;    /**  Steps to avoid log(0)  **/
    else mu = M/(1 + exp(B*(log(Dose)-log(L1))));

    model percent ~normal(mu, sigma2);
    random u ~ normal([0], [sig_l**2]) subject=plate;
    predict M/(1 + exp(B*(log(Dose+0.1)-log(L)))) out = pred;

title1 'Winter Wheat, Continuous Response: Random Model (L)';
run;

proc sort data=pred;
    by dose;
run;

proc sgplot data=pred;
    scatter x=dose y=percent/FILLEDOUTLINEDMARKERS markerattrs=(symbol=circlefilled size=14) MARKERFILLATTRS=(color=tan) MARKEROUTLINEATTRS=(color=black);
    series x=dose y=upper;
    series x=dose y=lower;
    series x=dose y=pred;
    xaxis label='ALS Herbicide Dose' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
    yaxis label="ALS Activity (% Absorbance)" LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
run;

Winter Wheat, Continuous Response: Random Model (L)

Specifications
Data Set	WORK.ALS
Dependent Variable	Percent
Distribution for Dependent Variable	Normal
Random Effects	u
Distribution for Random Effects	Normal
Subject Variable	Plate
Optimization Technique	Dual Quasi-Newton
Integration Method	Adaptive Gaussian Quadrature

Dimensions
Observations Used	56
Observations Not Used	0
Total Observations	56
Subjects	8
Max Obs per Subject	7
Parameters	5
Quadrature Points	1

Initial Parameters
M	b	L	sigma2	sig_l	Negative Log Likelihood
0.94	1	2.748	38	1	2351.1212

Iteration History
Iteration	Calls	Negative Log Likelihood	Difference	Maximum Gradient	Slope
1	8	831.9076	1519.214	23.5213	-483.043
2	12	334.7991	497.1085	5.97950	-525.548
3	16	278.9919	55.80715	2.79165	-38.3667
4	18	275.8544	3.137487	2.79277	-4.76914
5	22	273.0654	2.78899	4.88799	-1.64060
6	291	219.9774	53.08808	17.6545	-7.02670
7*	297	216.0941	3.883216	0.59118	-219.933
8	299	215.7704	0.323706	4.12978	-0.06219
9	301	215.2915	0.478987	0.54512	-0.55451
10	303	214.5412	0.750236	8.21685	-0.26351
11	310	211.7060	2.835235	15.7943	-1.58736
12	317	209.9275	1.778473	14.5760	-5.43158
13	319	207.2618	2.665749	7.28591	-10.1042
14	322	206.7364	0.525331	3.62850	-1.10840
15	324	205.9773	0.759154	2.87652	-0.30012
16	329	203.4997	2.477613	15.4830	-1.31747
17	331	199.9381	3.561604	11.7881	-27.0544
18	338	196.1037	3.834337	9.77497	-30.7343
19	342	195.3312	0.77251	3.93638	-3.89766
20	344	195.1109	0.220328	3.89521	-2.93744
21	346	194.7588	0.352069	0.87214	-0.64024
22	349	194.7172	0.041581	0.30391	-0.08144
23	352	194.6977	0.019565	0.35845	-0.01055
24	356	194.6377	0.059999	0.99235	-0.01591
25	361	194.4437	0.193935	1.86660	-0.11263
26	363	194.2791	0.164639	0.57421	-0.65523
27	365	194.0198	0.259332	1.02787	-0.19506
28	369	192.9211	1.098699	2.54044	-0.22950
29	376	191.8747	1.046347	35.4228	-1.92433
30	378	185.1822	6.692553	20.7788	-122.228
31	381	179.5439	5.638304	6.44752	-37.3200
32	384	178.2532	1.290613	17.0467	-2.25730
33	387	177.5889	0.664355	13.1564	-4.97466
34	391	175.8719	1.717037	3.06094	-2.24523
35	393	174.9813	0.890545	8.61356	-1.92998
36	397	172.5265	2.454811	3.24251	-15.6953
37	404	170.9078	1.61868	2.90202	-4.93989
38	407	170.2334	0.674382	4.42552	-0.85054
39	412	169.9614	0.271993	6.42306	-0.64752
40	415	169.8310	0.13044	0.31305	-0.27047
41	418	169.8293	0.0017	0.027929	-0.00324
42	421	169.8293	0.000027	0.002526	-0.00005
43	424	169.8293	1.281E-7	0.000260	-2.59E-7

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood	339.7
AIC (smaller is better)	349.7
AICC (smaller is better)	350.9
BIC (smaller is better)	350.1

Parameter Estimates
Parameter	Estimate	Standard Error	DF	t Value	Pr > \|t\|	95% Confidence Limits		Gradient
M	94.5055	1.4931	7	63.29	<.0001	90.9749	98.0361	3.723E-6
b	1.0409	0.04254	7	24.47	<.0001	0.9403	1.1415	-0.00026
L	2.7794	0.2716	7	10.23	<.0001	2.1371	3.4217	0.000031
sigma2	18.8887	3.8684	7	4.88	0.0018	9.7414	28.0360	-0.00001
sig_l	-0.6573	0.1969	7	-3.34	0.0124	-1.1228	-0.1918	0.000257

The SGPlot Procedure

The fit statistics here give an AIC value of 349.7 which is lower than the fixed model. Hence, this suggests that the addition of the random effect improves the model fit. The final plot shows the fitted curve with 95% confidence bounds.

Nonlinear Regression in SAS

Statistical Programs, University of Idaho

2022-11-09

1 Introduction

1.0.0.0.1 Note: Linearization

2 Model Selection

3 Common Nonlinear Models

3.1 Exponential Forms

3.2 S-Curve Forms

3.2.1 Logistic Curve

3.2.2 Probability Based Models

3.2.3 Asymmetric S-Curves

3.2.4 Hormesis Models

4 Model Estimation

5 Data used in examples

5.1 Exponential Data

5.2 Onion Data

5.3 ALS Activity in Winter Wheat

6 Least Squares Estimation

7 Maximum Likelihood Estimation and Curve Comparison

7.1 Initial Model Fitting

7.2 Curve Comparison

8 Mixed Model Estimation