Categorical responses are data that are intrinsically qualitative or non-numerical. As such, they present some unique characteristics and issues for analysis. SAS provides several procedures for working with such data, and this tutorial covers common ones used to carry out a variety of analysis types.
Examples of categorical data would be responses such as gender, level of education, Yes-No answers, etc. A characteristic of such data is that the categories are non-overlapping or mutually exclusive. Because of this, when the data are summarized to numbers like percentages, the values must add to 100%, a condition known as “Sum-to-One”. A consequence of this is that if we know all but one category, we automatically know the missing category: if there are 60% Yes responses, then we know there are 40% No. If plants are recorded in three categories as 20% short and 30% medium, then we know there are also 50% tall. In analyses, this affects the number of parameters estimated. Generally, if there are C categories, then the analysis will estimate C-1 parameters.
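Written out, for C categories with probabilities \(p_1, \dots, p_C\), the Sum-to-One condition and the resulting C-1 free parameters are:
\[ \sum_{i=1}^{C} p_i = 1 \quad\Longrightarrow\quad p_C = 1 - \sum_{i=1}^{C-1} p_i \]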
Another characteristic of some categorical data is order. For example, in the short, medium, tall example above, there is a natural progression or order to the categories. Occasionally, we might also categorize truly continuous data such as income or age into ordered categories, e.g. age: 20-35, 36-50, and greater than 50. These ordinal data are often treated differently than non-ordered (nominal) data by looking at the cumulative responses across the ordered groups, measuring the incremental change between levels rather than the absolute percentage at each level.
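Using the plant height example above, one way to express this is through cumulative proportions that accumulate across the ordered levels:
\[ F_j = \sum_{k=1}^{j} p_k : \qquad F_{\text{short}} = 0.20, \quad F_{\text{medium}} = 0.20 + 0.30 = 0.50, \quad F_{\text{tall}} = 1.00 \]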
Individual categorical responses are referred to as Multinomials, as in a multinomial distribution (Multinomials with two categories are a special class referred to as Binomials). The individual categories or cells of the multinomial are characterized by the probability of that category occurring or being observed, and the sum of all probabilities across the multinomial is 1.0. Categorical data can also occur in combinations or cross classifications of two or more multinomials, and these are known as Contingency Tables. Like their singular counterparts, each combination or table cell is represented by the probability of that cell combination, and these also sum to 1.0. Contingency tables can also be summarized by row or column totals, and these are referred to as marginal distributions. All of these terms and structures can come into play during an analysis, as will be demonstrated below.
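For a two-way contingency table with cell probabilities \(\pi_{ij}\), these statements can be written as:
\[ \sum_{i}\sum_{j} \pi_{ij} = 1, \qquad \pi_{i\cdot} = \sum_{j} \pi_{ij}, \qquad \pi_{\cdot j} = \sum_{i} \pi_{ij} \]
where \(\pi_{i\cdot}\) and \(\pi_{\cdot j}\) are the row and column marginal distributions.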
These data come from a survey of participants in several online pesticide safety workshops and are described in Innovative Virtual Pesticide Recertification Webinar Series Achieves Success during the COVID-19 Pandemic, Himyck, R. et al. 2022, Journal of Pesticide Safety Education. While this survey covered many topics, the data used here are a subset relating to questions on participant age, size of community, and the online device used to access the workshops. Age is categorized into 5 levels, Community Size has 5 levels, and two types of Devices (Smart phone and Tablet) are considered. The distributions of Age by Device type are plotted below.
PROC IMPORT OUT= WORK.survey
DATAFILE= ".\Data\survey.xlsx"
DBMS=XLSX REPLACE;
sheet="Survey";
RUN;
proc print data=survey;
run;
proc sgplot data=survey;
styleattrs datacolors=(cx1805A7 cx8805A9 cxF717BB cxF71731 cxF78B17) datacontrastcolors=(black black black);
vbarparm category=Device response=Count / LIMITATTRS=(color=black)
group=Age grouporder=data OUTLINEATTRS=(color=black) groupdisplay=cluster;
xaxis label='Device' TYPE=discrete DISCRETEORDER=formatted LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
yaxis label="Count" LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
title1 ' ';
keylegend /AUTOITEMSIZE valueattrs=(size=14) TITLEATTRS=(weight=bold size=12);
run;
Obs | Age | Size | Device | Count |
---|---|---|---|---|
1 | 18-35 | Large City | Smart phone | 18 |
2 | 36-45 | Large City | Smart phone | 16 |
3 | 46-55 | Large City | Smart phone | 14 |
4 | 56-65 | Large City | Smart phone | 6 |
5 | 18-35 | Large City | Tablet | 1 |
6 | 36-45 | Large City | Tablet | 13 |
7 | 46-55 | Large City | Tablet | 2 |
8 | 56-65 | Large City | Tablet | 10 |
9 | Over 65 | Large City | Tablet | 1 |
10 | 18-35 | Medium City | Smart phone | 8 |
11 | 36-45 | Medium City | Smart phone | 4 |
12 | 46-55 | Medium City | Smart phone | 10 |
13 | 56-65 | Medium City | Smart phone | 6 |
14 | Over 65 | Medium City | Smart phone | 1 |
15 | 18-35 | Medium City | Tablet | 1 |
16 | 56-65 | Medium City | Tablet | 7 |
17 | Over 65 | Medium City | Tablet | 5 |
18 | 18-35 | Small City | Smart phone | 1 |
19 | 36-45 | Small City | Smart phone | 7 |
20 | 46-55 | Small City | Smart phone | 3 |
21 | 56-65 | Small City | Smart phone | 10 |
22 | Over 65 | Small City | Smart phone | 2 |
23 | 18-35 | Small City | Tablet | 5 |
24 | 36-45 | Small City | Tablet | 5 |
25 | 46-55 | Small City | Tablet | 4 |
26 | 56-65 | Small City | Tablet | 10 |
27 | Over 65 | Small City | Tablet | 2 |
28 | 18-35 | Town | Smart phone | 6 |
29 | 36-45 | Town | Smart phone | 26 |
30 | 46-55 | Town | Smart phone | 6 |
31 | 56-65 | Town | Smart phone | 3 |
32 | Over 65 | Town | Smart phone | 2 |
33 | 18-35 | Town | Tablet | 8 |
34 | 46-55 | Town | Tablet | 4 |
35 | 56-65 | Town | Tablet | 1 |
36 | Over 65 | Town | Tablet | 13 |
37 | 18-35 | Rural | Smart phone | 17 |
38 | 36-45 | Rural | Smart phone | 17 |
39 | 46-55 | Rural | Smart phone | 11 |
40 | 56-65 | Rural | Smart phone | 18 |
41 | Over 65 | Rural | Smart phone | 15 |
42 | 18-35 | Rural | Tablet | 12 |
43 | 46-55 | Rural | Tablet | 14 |
44 | 56-65 | Rural | Tablet | 15 |
45 | Over 65 | Rural | Tablet | 12 |
The most basic way to look at these data is to summarize the one-way marginal totals. In this example, we look at Age and Device separately, ignoring Community Size for now. The procedure used is Proc Freq, which is a tabulation procedure. A Weight statement is used to indicate the data are already summarized into counts. The Tables statement implements the tabulation request. Both Age and Device are specified in the statement, generating two analyses; we could also issue two Tables statements and generate equivalent results (a sketch of that form follows the code below). The option chisq requests a test to assess whether all the categories within each factor have an equal probability of occurring. The plots option asks for frequency plots of each marginal distribution.
proc freq data=survey;
weight count;
tables age device/chisq plots(only)=freqplot;
title1 'Age and Device Marginals';
run;
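As noted above, two separate Tables statements give equivalent results; a minimal sketch of that form, using the same data set and options:
proc freq data=survey;
weight count;
tables age /chisq plots(only)=freqplot;
tables device/chisq plots(only)=freqplot;
run;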
Age and Device Marginals |
Age | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
18-35 | 77 | 20.70 | 77 | 20.70 |
36-45 | 88 | 23.66 | 165 | 44.35 |
46-55 | 68 | 18.28 | 233 | 62.63 |
56-65 | 86 | 23.12 | 319 | 85.75 |
Over 65 | 53 | 14.25 | 372 | 100.00 |
Chi-Square Test for Equal Proportions | |
---|---|
Chi-Square | 11.0914 |
DF | 4 |
Pr > ChiSq | 0.0256 |
Sample Size = 372 |
Device | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
Smart phone | 227 | 61.02 | 227 | 61.02 |
Tablet | 145 | 38.98 | 372 | 100.00 |
Chi-Square Test for Equal Proportions | |
---|---|
Chi-Square | 18.0753 |
DF | 1 |
Pr > ChiSq | <.0001 |
Sample Size = 372 |
The output here gives the one-way tabulation for frequency (counts), percent, and the cumulative frequencies and percentages. The Chi-square test indicates that the probabilities of the categories within each factor are not equal. Note that the degrees of freedom for each factor are one less than the number of factor levels.
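For the Device factor, for example, the expected count under equal proportions is 372/2 = 186 per category, and the reported chi-square statistic can be reconstructed as:
\[ \chi^2 = \sum_{i=1}^{C} \frac{(O_i - E_i)^2}{E_i} = \frac{(227-186)^2}{186} + \frac{(145-186)^2}{186} \approx 18.08, \qquad df = C-1 = 1 \]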
Many times we want to assess the association between two or more categorical factors. In this example, a two-way contingency table is set up to look at the combination of Age and Device. A chi-square test (option chisq) is requested to examine the potential association or independence of the two.
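Under the hypothesis of independence, the expected count for each cell of the table is the product of its row and column totals divided by the overall total:
\[ E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n} \]
For example, summing the data listing over Community Size, the 18-35/Smart phone cell has 50 observations, while its expected count under independence is \(77 \times 227 / 372 \approx 47.0\).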
options nonotes;
proc freq data=survey;
weight count;
tables age*device/chisq;
title1 'Age by Device Contingency Table';
run;
Age by Device Contingency Table |
[Crosstabulation table of Age by Device: each cell lists Frequency, Percent, Row Pct, and Col Pct]
Statistics for Table of Age by Device |
Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 4 | 30.0534 | <.0001 |
Likelihood Ratio Chi-Square | 4 | 30.7686 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 19.4680 | <.0001 |
Phi Coefficient | | 0.2842 | |
Contingency Coefficient | | 0.2734 | |
Cramer's V | | 0.2842 | |
Sample Size = 372 |
In the resulting table, four numbers are given in each cell. From top to bottom they are: the number of observations for that cell, the corresponding percentage of that cell in the whole table, the row percentage of the cell, and finally, the column percentage. These last two sets of percentages, the distributions within rows and within columns, are often useful for thinking about possible associations. For example, we could look at the row percentages in this table, representing the distribution of device types within each age group. If there were no association, the percentages of smart phones and tablets would be similar for every age class. Looking at the table, however, we see that the percentage of smart phone use in younger groups is higher than in older age groups. The percentages for tablet use follow the reverse trend. This is also evident in the initial plot of the data given above.
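For example, summing over Community Size in the data listing, the 18-35 row contains 50 Smart phone and 27 Tablet responses, while the Over 65 row contains 20 and 33, giving row percentages of:
\[ \frac{50}{77} \approx 65\% \text{ vs. } \frac{27}{77} \approx 35\% \;\;\text{(18-35)}, \qquad \frac{20}{53} \approx 38\% \text{ vs. } \frac{33}{53} \approx 62\% \;\;\text{(Over 65)} \]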
The chi-square test confirms this: the p-value is < 0.0001, indicating an association was detected. Several tests are reported in this table; only the first two, Chi-Square and Likelihood Ratio Chi-Square, are relevant here, and either of these can be reported. Note that, like correlations, this does not imply causality. It just indicates that the distribution of Device types changes with Age class, and the two Devices tend to trend in opposite directions as Age increases.
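The two statistics are computed from the observed and expected cell counts as:
\[ \chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad G^2 = 2 \sum_{i}\sum_{j} O_{ij} \ln\!\left(\frac{O_{ij}}{E_{ij}}\right) \]
each compared to a chi-square distribution with (R-1)(C-1) degrees of freedom, here (5-1)(2-1) = 4.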
Sometimes it is of interest to more directly model the relationships between two or more categorical factors. One common means for doing this is logistic regression. In logistic regression, a binary categorical “response” is modeled as a function of other factors, which can be either categorical or continuous. A key here is that the response is a binary factor. While modeling can be done with more than two categories in the response, the interpretation becomes much more complex. In logistic regression, we indirectly model the proportions of the binary responses. This is done by selecting one of the categories, often referred to as a “success”, and representing its proportion as p. The proportion of “failures” is then 1-p because the proportions of “success” and “failure” must add to 1.0. Which category we assign as a “success” or “failure” is immaterial, but SAS will choose the lowest ordered (formatted) value of the binary response as the “success” (in SAS, this can be reversed with the descending option placed after the response in the model statement). Once it is defined, however, the proportion is transformed logarithmically to:
\[ \text{transformed proportion} = \ln\left( \frac{p}{1-p}\right) = \text{logit} \] The ratio of success to failure inside the log function, p/(1-p), is referred to as the odds of success, and the entire term as the log odds or logit. When logistic regression is run with a categorical factor on the right hand side of the model, the procedure will form one logit or log odds for every level of that factor. Results are then usually displayed or reported as the ratio of the odds (odds ratio = OR) of each level relative to one selected reference level. By default, SAS takes this reference to be the last level in alphabetical (sort) order. This can be changed if needed (e.g. the ref= option in the Class statement). An alternative is to also report and interpret the proportions themselves along with the odds.
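For two levels with success probabilities \(p_1\) and \(p_2\), the odds ratio is simply the ratio of their odds:
\[ OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} \]
so an OR of 1.0 indicates the two levels have identical odds of success.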
SAS has several procedures that can run logistic regression. In the example below, Proc Glimmix is used, as logistic regression is actually a generalized linear model. For more information on generalized linear models, see the tutorial here. In this case the factor Device is the binary response. In order to get SAS to implement logistic regression, however, we need Glimmix to view this response as a numeric binary variable. We could recode the “Smart phone” : “Tablet” character values in the data, but here Proc Format is used to simply coerce the values to 0 and 1, respectively, without manipulating the data. In the model, a binary distribution is specified, for which the logit is the default link function. Age class is the factor on the right hand side of the model. The model will then assess the odds and odds ratios of Smart phone usage for each Age class. The Lsmeans statement, with the ilink and odds options, is used to display the predicted proportions of “success” (Smart phone) and the respective odds for each age class. These are also output to a separate data set and then plotted with Proc Sgplot.
proc format;
value $dvf 'Smart phone' = '0'
'Tablet' = '1';
run;
ods graphics;
proc glimmix data=survey method=quad;
weight count;
class age;
model device = age/dist=binary oddsratio;
format device $dvf.;
lsmeans age/cl ilink odds;
ods output LSMeans=odds;
run;
proc sgplot data=odds noautolegend;
styleattrs datacolors=(cx1805A7 cx8805A9 cxF717BB cxF71731 cxF78B17) DATACONTRASTCOLORS=(cx1805A7 cx8805A9 cxF717BB cxF71731 cxF78B17) ;
highlow y=age high=upperodds low=lowerodds/group=age type=line lineattrs=(pattern=solid) highcap=serif lowcap=serif LINEATTRS=(thickness=2);
scatter y=age x=odds/group=age FILLEDOUTLINEDMARKERS markerattrs=(symbol=circlefilled size=14) MARKEROUTLINEATTRS=(color=black) datalabel=age DATALABELATTRS=(Color=black Family=Arial Style=Italic Weight=Bold size=10);
refline 1 /axis=x lineattrs=(pattern=shortdash color=black) ;
yaxis label='Age' TYPE=discrete DISCRETEORDER=data LABELATTRS=( Family=Arial Size=15 Weight=Bold) display=(NOVALUES NOTICKS);
xaxis label='Odds of Smart Phone Use' LABELATTRS=( Family=Arial Size=15 Weight=Bold) VALUEATTRS=(Family=Arial Size=12 Weight=Bold);
run;
Model Information | |
---|---|
Data Set | WORK.SURVEY |
Response Variable | Device |
Response Distribution | Binary |
Link Function | Logit |
Variance Function | Default |
Weight Variable | Count |
Variance Matrix | Diagonal |
Estimation Technique | Maximum Likelihood |
Degrees of Freedom Method | Residual |
Class Level Information |
Class | Levels | Values |
---|---|---|
Age | 5 | 18-35 36-45 46-55 56-65 Over 65 |
Number of Observations Read | 45 |
---|---|
Number of Observations Used | 45 |
Response Profile |
Ordered Value | Device | Total Frequency |
---|---|---|
1 | 0 | 24 |
2 | 1 | 21 |
The GLIMMIX procedure is modeling the probability that Device='0'. |
Dimensions | |
---|---|
Columns in X | 6 |
Columns in Z | 0 |
Subjects (Blocks in V) | 1 |
Max Obs per Subject | 45 |
Optimization Information | |
---|---|
Optimization Technique | Newton-Raphson |
Parameters in Optimization | 5 |
Lower Boundaries | 0 |
Upper Boundaries | 0 |
Fixed Effects | Not Profiled |
Iteration History |
Iteration | Restarts | Evaluations | Objective Function | Change | Max Gradient |
---|---|---|---|---|---|
0 | 0 | 4 | 233.65149305 | . | 3.492468 |
1 | 0 | 3 | 233.35427822 | 0.29721483 | 0.078116 |
2 | 0 | 3 | 233.354175 | 0.00010322 | 0.000033 |
3 | 0 | 3 | 233.354175 | 0.00000000 | 6.73E-12 |
Convergence criterion (GCONV=1E-8) satisfied. |
Fit Statistics | |
---|---|
-2 Log Likelihood | 466.71 |
AIC (smaller is better) | 476.71 |
AICC (smaller is better) | 478.25 |
BIC (smaller is better) | 485.74 |
CAIC (smaller is better) | 490.74 |
HQIC (smaller is better) | 480.08 |
Pearson Chi-Square | 372.00 |
Pearson Chi-Square / DF | 8.27 |
Type III Tests of Fixed Effects | ||||
---|---|---|---|---|
Effect | Num DF | Den DF | F Value | Pr > F |
Age | 4 | 40 | 7.04 | 0.0002 |
Odds Ratio Estimates | |||||
---|---|---|---|---|---|
Age | Age | Estimate | DF | 95% Confidence Limits | |
18-35 | Over 65 | 3.056 | 40 | 1.445 | 6.462 |
36-45 | Over 65 | 6.417 | 40 | 2.932 | 14.042 |
46-55 | Over 65 | 3.025 | 40 | 1.402 | 6.525 |
56-65 | Over 65 | 1.650 | 40 | 0.803 | 3.389 |
Age Least Squares Means |
Age | Estimate | Standard Error | DF | t Value | Pr > |t| | Alpha | Lower | Upper | Mean | Standard Error Mean | Lower Mean | Upper Mean | Odds | Lower Odds | Upper Odds |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
18-35 | 0.6162 | 0.2388 | 40 | 2.58 | 0.0137 | 0.05 | 0.1335 | 1.0989 | 0.6494 | 0.05438 | 0.5333 | 0.7500 | 1.8519 | 1.1428 | 3.0008 |
36-45 | 1.3581 | 0.2643 | 40 | 5.14 | <.0001 | 0.05 | 0.8240 | 1.8922 | 0.7955 | 0.04300 | 0.6951 | 0.8690 | 3.8889 | 2.2796 | 6.6342 |
46-55 | 0.6061 | 0.2538 | 40 | 2.39 | 0.0217 | 0.05 | 0.09327 | 1.1190 | 0.6471 | 0.05795 | 0.5233 | 0.7538 | 1.8333 | 1.0978 | 3.0618 |
56-65 | 1.11E-16 | 0.2157 | 40 | 0.00 | 1.0000 | 0.05 | -0.4359 | 0.4359 | 0.5000 | 0.05392 | 0.3927 | 0.6073 | 1.0000 | 0.6467 | 1.5463 |
Over 65 | -0.5008 | 0.2834 | 40 | -1.77 | 0.0848 | 0.05 | -1.0735 | 0.07195 | 0.3774 | 0.06658 | 0.2547 | 0.5180 | 0.6061 | 0.3418 | 1.0746 |
Some notes on the output: First, in the Response Profile, Device type is listed as 0 or 1, reflecting the format from the preceding Proc Format definition. Also note that the output states the procedure is modeling the probability that Device=‘0’, which was defined to be “Smart phone”. This will be the “p” in the logit function.
Further down is the Type III Tests of Fixed Effects table, where the test for differences among the Age categories is given. In this case, Age has a low p-value, suggesting the probabilities of Smart phone use differ across the Age categories. This is followed by the Odds Ratio table resulting from the oddsratio option in the model statement. Here four values are listed. The last category, “Over 65”, is the reference level (it was last alphabetically), so the other categories are compared to it. The odds ratios indicate the relative size of each category’s odds compared to this reference level. The odds of Smart phone usage in the “18-35” and “46-55” groups are about 3 times those of the “Over 65” age group, while the odds for the “36-45” group are about 6.4 times as large. Although odds ratios are the commonly reported statistic for group comparisons in logistic regression, they do not indicate the size of the underlying probabilities.
The LSmeans table completes the information. Here the Estimate column values are the actual logit-transformed values. These are not of much use for interpretation. The ilink and odds options, however, provide the estimated probabilities (Mean column) of Smart phone usage and the respective odds. The “36-45” age group has the highest probability at 0.7955, or about 80%, while the “56-65” group sits at 0.50 (equal probability) and the “Over 65” group falls below that at 0.38. In the Odds column, the ratio of success = “Smart phone” to failure = “Tablet” is shown for each group. These indicate how likely Smart phone use is relative to Tablet use within each group.
Both Odds Ratios and Odds are compared to 1.0. When the probability of success equals that of failure, the odds are 1.0, indicating no preference for either. Likewise, if two groups have equal odds, their odds ratio will be 1.0. Whether a value differs from 1.0 can be judged from its confidence interval. In this example, we see that the “56-65” category had a probability of 0.50, resulting in odds of 1.0. In the odds ratio table above, this category compared to the “Over 65” group had an odds ratio of 1.65 with a confidence interval covering 1.0, indicating the odds of Smart phone use in these two groups were not clearly different.
Important: When reporting logistic regression results, it is not sufficient to report only odds ratios. These are relative measurements and do not indicate the magnitude or size of the responses. Always present either the underlying probabilities, the odds, or both.
From the Odds Ratio and LSmeans tables, we can reconstruct the values and see the relationships among them. If we take the estimated probability for a category and compute p/(1-p), we get its odds. For example, in the “18-35” group, the estimated probability of Smart phone use is 0.6494, and computing 0.6494/(1-0.6494) gives 1.85, the odds for that category. If we further divide the odds for “18-35” by the odds for “Over 65”, 1.8519/0.6061, we get the odds ratio for “18-35” versus “Over 65” = 3.056. As noted above, we can reconstruct the underlying probabilities of categories from their odds, but we cannot do the same with only odds ratios. This is why it is important to present complete results when reporting logistic regression.
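For comparison, the same model could also be fit with Proc Logistic. The sketch below is illustrative rather than part of the original analysis: the event= response option names “Smart phone” as the modeled event directly (so no format is needed), and param=glm gives reference coding with the last Age level (“Over 65”) as the reference for the default odds ratio table. Because a Freq statement (rather than a Weight statement) is used for the counts, the standard errors and degrees of freedom may differ slightly from the Glimmix fit above.
proc logistic data=survey;
freq count; * data are pre-summarized counts ;
class age / param=glm; * reference coding - last level is the reference ;
model device(event='Smart phone') = age; * model the probability of Smart phone use ;
lsmeans age / ilink cl; * estimated probabilities on the data scale ;
run;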
There are times, especially with survey data, when there are no obvious “independent/dependent” variable relationships. For example, in these data, the relationship between Age and Community Size may not be obvious in terms of one influencing the other. Regardless, we would still like to evaluate the effects of both or their combination. In these cases, a class of models called loglinear models can be useful. The procedure for these in SAS is Proc Catmod. Note that Catmod can fit other model types, and this example is not meant to cover all those cases. Also, Proc Catmod cannot address or account for random model effects, so it is limited in that respect.
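For a two-way table of Size by Age, the loglinear model with main effects and their interaction (the full, saturated model requested below) models the logarithm of each expected cell count \(m_{ij}\):
\[ \ln(m_{ij}) = \mu + \lambda_i^{Size} + \lambda_j^{Age} + \lambda_{ij}^{Size \times Age} \]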
Below, Catmod is used to assess the combined effect of Age and Size in the data (this ignores Device effects). The syntax for loglinear models in Catmod is unique compared to other SAS procedures. Here we use the special keyword _response_ as a placeholder in the model because both Age and Size could appear on either the left or right hand side of the model. The loglin statement following the model tells SAS what effects to evaluate on the right hand side. Here, the full model of main effects and their interaction is requested. Catmod can also carry out contrasts. Because categorical models drop the last level of an effect for estimation, the contrasts may look different from those in other modeling procedures. In Catmod, if the contrast does not involve the last level of an effect, the contrasts are similar to other procedures and the coefficients add to zero. Note, however, that the number of coefficients is one less than the number of levels for a factor. Size, for example, has 5 levels, so there are 4 coefficients in the contrast statements. When a contrast involves the last level (Rural in this case), special care needs to be taken in forming contrast coefficient values, following this rule: the last effect level is set equal to the negative sum of all other coefficients for that effect. To illustrate, let the 5 levels of Size be represented by the Greek letters \(\alpha_1\) through \(\alpha_5\). The last level is then set to \(\alpha_5 = -(\alpha_1+\alpha_2+\alpha_3+\alpha_4)\). A contrast comparing “Large City” to “Rural” would then be:
\[ H_0 : \alpha_1 - \alpha_5 = 0, \qquad \alpha_1 - \alpha_5 = \alpha_1 - \bigl(-(\alpha_1+\alpha_2+\alpha_3+\alpha_4)\bigr) = 2\alpha_1 + \alpha_2+\alpha_3+\alpha_4 \] The coefficients are, therefore, 2 1 1 1. Although these do not add to zero, as expected in other procedures, they are correct for this contrast.
Unfortunately, contrasts for interaction terms can become much more complex. In those cases, it may be better to construct a combined categorical variable for both Age and Size
(In a data step, define it as Age_Size = Age||" "||Size;)
and run that variable in the model and as the loglin effect. While it will have many coefficients (24), the last level “Over 65 Rural” will be dropped and contrasts can be set up as shown above.
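A minimal sketch of this combined-variable approach (the data set name survey2 is an assumption, and catx is used in place of the || concatenation above because it trims trailing blanks before joining):
data survey2;
set survey;
length Age_Size $ 25;
Age_Size = catx(' ', Age, Size); * e.g. '18-35 Large City' ;
run;
proc catmod data=survey2 order=data;
weight count;
model Age_Size = _response_ / pred=prob;
loglin Age_Size;
run;
Contrasts on Age_Size would then use 24 coefficients, with the dropped last level (“Over 65 Rural”) handled by the negative-sum rule described above.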
proc catmod data=survey order=data;
weight count;
model Size*Age=_response_/pred=prob;
loglin Size Age Size*Age;
contrast 'Large vs Small City' Size 1 0 -1 0;
contrast 'Medium vs Town' Size 0 1 0 -1;
contrast 'Large vs Rural' Size 2 1 1 1;
run;
Data Summary | |||
---|---|---|---|
Response | Size*Age | Response Levels | 25 |
Weight Variable | Count | Populations | 1 |
Data Set | SURVEY | Total Frequency | 372 |
Frequency Missing | 0 | Observations | 45 |
Population Profiles | |
---|---|
Sample | Sample Size |
1 | 372 |
Response Profiles | ||
---|---|---|
Response | Size | Age |
1 | Large City | 18-35 |
2 | Large City | 36-45 |
3 | Large City | 46-55 |
4 | Large City | 56-65 |
5 | Large City | Over 65 |
6 | Medium City | 18-35 |
7 | Medium City | 36-45 |
8 | Medium City | 46-55 |
9 | Medium City | 56-65 |
10 | Medium City | Over 65 |
11 | Small City | 18-35 |
12 | Small City | 36-45 |
13 | Small City | 46-55 |
14 | Small City | 56-65 |
15 | Small City | Over 65 |
16 | Town | 18-35 |
17 | Town | 36-45 |
18 | Town | 46-55 |
19 | Town | 56-65 |
20 | Town | Over 65 |
21 | Rural | 18-35 |
22 | Rural | 36-45 |
23 | Rural | 46-55 |
24 | Rural | 56-65 |
25 | Rural | Over 65 |
Maximum Likelihood Analysis |
---|
Maximum likelihood computations converged. |
Maximum Likelihood Analysis of Variance | |||
---|---|---|---|
Source | DF | Chi-Square | Pr > ChiSq |
Size | 4 | 68.69 | <.0001 |
Age | 4 | 9.74 | 0.0451 |
Size*Age | 16 | 47.55 | <.0001 |
Likelihood Ratio | 0 | . | . |
Analysis of Maximum Likelihood Estimates |
Effect | Parameter | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
---|---|---|---|---|---|
Size | Large City | -0.0770 | 0.1855 | 0.17 | 0.6780 |
Medium City | -0.3998 | 0.1492 | 7.18 | 0.0074 | |
Small City | -0.3275 | 0.1482 | 4.88 | 0.0271 | |
Town | 0.0104 | 0.1341 | 0.01 | 0.9381 | |
Age | 18-35 | 0.1395 | 0.1257 | 1.23 | 0.2671 |
36-45 | 0.2176 | 0.1285 | 2.87 | 0.0903 | |
46-55 | 0.0601 | 0.1266 | 0.23 | 0.6350 | |
56-65 | 0.1948 | 0.1289 | 2.28 | 0.1307 | |
Size*Age | Large City 18-35 | 0.4335 | 0.2527 | 2.94 | 0.0862 |
Large City 36-45 | 0.7784 | 0.2408 | 10.45 | 0.0012 | |
Large City 46-55 | 0.3411 | 0.2600 | 1.72 | 0.1896 | |
Large City 56-65 | 0.2064 | 0.2611 | 0.62 | 0.4293 | |
Medium City 18-35 | 0.00911 | 0.2697 | 0.00 | 0.9730 | |
Medium City 36-45 | -0.8798 | 0.3513 | 6.27 | 0.0123 | |
Medium City 46-55 | 0.1939 | 0.2626 | 0.55 | 0.4602 | |
Medium City 56-65 | 0.3216 | 0.2474 | 1.69 | 0.1937 | |
Small City 18-35 | -0.4687 | 0.3040 | 2.38 | 0.1232 | |
Small City 36-45 | 0.1464 | 0.2513 | 0.34 | 0.5600 | |
Small City 46-55 | -0.2351 | 0.2900 | 0.66 | 0.4175 | |
Small City 56-65 | 0.6800 | 0.2264 | 9.02 | 0.0027 | |
Town 18-35 | 0.0407 | 0.2327 | 0.03 | 0.8612 | |
Town 36-45 | 0.5817 | 0.2073 | 7.87 | 0.0050 | |
Town 46-55 | -0.2163 | 0.2543 | 0.72 | 0.3949 | |
Town 56-65 | -1.2673 | 0.3453 | 13.47 | 0.0002 |
Contrasts of Maximum Likelihood Estimates | |||
---|---|---|---|
Contrast | DF | Chi-Square | Pr > ChiSq |
Large vs Small City | 1 | 0.82 | 0.3642 |
Medium vs Town | 1 | 3.42 | 0.0645 |
Large vs Rural | 1 | 13.43 | 0.0002 |
Maximum Likelihood Predicted Values for Response Functions |
Function Number | Observed Function | Observed Standard Error | Predicted Function | Predicted Standard Error | Residual |
---|---|---|---|---|---|
1 | -0.3514 | 0.299447 | -0.3514 | 0.299447 | 0 |
2 | 0.071459 | 0.267432 | 0.071459 | 0.267432 | 0 |
3 | -0.52325 | 0.315495 | -0.52325 | 0.315495 | 0 |
4 | -0.52325 | 0.315495 | -0.52325 | 0.315495 | 0 |
5 | -3.29584 | 1.01835 | -3.29584 | 1.018201 | -4.61E-8 |
6 | -1.09861 | 0.3849 | -1.09861 | 0.3849 | 0 |
7 | -1.90954 | 0.535758 | -1.90954 | 0.535759 | 0 |
8 | -0.99325 | 0.370185 | -0.99325 | 0.370185 | 0 |
9 | -0.73089 | 0.33758 | -0.73089 | 0.33758 | 0 |
10 | -1.50408 | 0.451335 | -1.50408 | 0.451336 | 0 |
11 | -1.50408 | 0.451335 | -1.50408 | 0.451336 | 0 |
12 | -0.81093 | 0.346944 | -0.81093 | 0.346944 | 0 |
13 | -1.34993 | 0.424139 | -1.34993 | 0.42414 | 0 |
14 | -0.3001 | 0.29502 | -0.3001 | 0.295021 | 0 |
15 | -1.90954 | 0.535758 | -1.90954 | 0.535759 | 0 |
16 | -0.65678 | 0.329341 | -0.65678 | 0.329341 | 0 |
17 | -0.03774 | 0.27477 | -0.03774 | 0.27477 | 0 |
18 | -0.99325 | 0.370185 | -0.99325 | 0.370185 | 0 |
19 | -1.90954 | 0.535758 | -1.90954 | 0.535759 | 0 |
20 | -0.58779 | 0.322031 | -0.58779 | 0.322031 | 0 |
21 | 0.071459 | 0.267432 | 0.071459 | 0.267432 | 0 |
22 | -0.46262 | 0.309614 | -0.46262 | 0.309614 | 0 |
23 | -0.07696 | 0.277555 | -0.07696 | 0.277556 | 0 |
24 | 0.200671 | 0.2595 | 0.200671 | 0.2595 | 0 |
Maximum Likelihood Predicted Values for Probabilities |
Size | Age | Observed Probability | Observed Standard Error | Predicted Probability | Predicted Standard Error | Residual |
---|---|---|---|---|---|---|
Large City | 18-35 | 0.0511 | 0.0114 | 0.0511 | 0.0114 | 0 |
Large City | 36-45 | 0.078 | 0.0139 | 0.078 | 0.0139 | 0 |
Large City | 46-55 | 0.043 | 0.0105 | 0.043 | 0.0105 | 0 |
Large City | 56-65 | 0.043 | 0.0105 | 0.043 | 0.0105 | 0 |
Large City | Over 65 | 0.0027 | 0.0027 | 0.0027 | 0.0027 | -1E-10 |
Medium City | 18-35 | 0.0242 | 0.008 | 0.0242 | 0.008 | 0 |
Medium City | 36-45 | 0.0108 | 0.0053 | 0.0108 | 0.0053 | 0 |
Medium City | 46-55 | 0.0269 | 0.0084 | 0.0269 | 0.0084 | 0 |
Medium City | 56-65 | 0.0349 | 0.0095 | 0.0349 | 0.0095 | 0 |
Medium City | Over 65 | 0.0161 | 0.0065 | 0.0161 | 0.0065 | 0 |
Small City | 18-35 | 0.0161 | 0.0065 | 0.0161 | 0.0065 | 0 |
Small City | 36-45 | 0.0323 | 0.0092 | 0.0323 | 0.0092 | 0 |
Small City | 46-55 | 0.0188 | 0.007 | 0.0188 | 0.007 | 0 |
Small City | 56-65 | 0.0538 | 0.0117 | 0.0538 | 0.0117 | 0 |
Small City | Over 65 | 0.0108 | 0.0053 | 0.0108 | 0.0053 | 0 |
Town | 18-35 | 0.0376 | 0.0099 | 0.0376 | 0.0099 | 0 |
Town | 36-45 | 0.0699 | 0.0132 | 0.0699 | 0.0132 | 0 |
Town | 46-55 | 0.0269 | 0.0084 | 0.0269 | 0.0084 | 0 |
Town | 56-65 | 0.0108 | 0.0053 | 0.0108 | 0.0053 | 0 |
Town | Over 65 | 0.0403 | 0.0102 | 0.0403 | 0.0102 | 0 |
Rural | 18-35 | 0.078 | 0.0139 | 0.078 | 0.0139 | 0 |
Rural | 36-45 | 0.0457 | 0.0108 | 0.0457 | 0.0108 | 0 |
Rural | 46-55 | 0.0672 | 0.013 | 0.0672 | 0.013 | 0 |
Rural | 56-65 | 0.0887 | 0.0147 | 0.0887 | 0.0147 | 0 |
Rural | Over 65 | 0.0726 | 0.0135 | 0.0726 | 0.0135 | 0 |