In an example from Ries and Smith (1963), the choice of detergent brand (Brand = M or X) is related to three other categorical variables: the softness of the laundry water (Softness = soft, medium, or hard), the temperature of the water (Temperature = high or low), and whether the subject was a previous user of Brand M (Previous = yes or no). The linear response function, which could also be specified as RESPONSE MARGINALS, yields one probability, Pr(brand preference=M), as the response function to be analyzed. Two models are fit in this example: the first model is a saturated one, containing all of the main effects and interactions, while the second is a reduced model containing only the main effects. The following statements produce Output 30.1.1 through Output 30.1.4:
data detergent;
   input Softness $ Brand $ Previous $ Temperature $ Count @@;
   datalines;
soft X yes high 19   soft X yes low 57
soft X no  high 29   soft X no  low 63
soft M yes high 29   soft M yes low 49
soft M no  high 27   soft M no  low 53
med  X yes high 23   med  X yes low 47
med  X no  high 33   med  X no  low 66
med  M yes high 47   med  M yes low 55
med  M no  high 23   med  M no  low 50
hard X yes high 24   hard X yes low 37
hard X no  high 42   hard X no  low 68
hard M yes high 43   hard M yes low 52
hard M no  high 30   hard M no  low 42
;

title 'Detergent Preference Study';
proc catmod data=detergent;
   response 1 0;
   weight Count;
   model Brand=Softness|Previous|Temperature / freq prob;
   title2 'Saturated Model';
run;
The “Data Summary” table (Output 30.1.1) indicates that you have two response levels and twelve populations.
Output 30.1.1: Detergent Preference Study: Linear Model Analysis
Detergent Preference Study |
Saturated Model |
Data Summary | |||
---|---|---|---|
Response | Brand | Response Levels | 2 |
Weight Variable | Count | Populations | 12 |
Data Set | DETERGENT | Total Frequency | 1008 |
Frequency Missing | 0 | Observations | 24 |
The “Population Profiles” table in Output 30.1.2 displays the ordering of independent variable levels as used in the table of parameter estimates.
Output 30.1.2: Population Profiles
Population Profiles | ||||
---|---|---|---|---|
Sample | Softness | Previous | Temperature | Sample Size |
1 | hard | no | high | 72 |
2 | hard | no | low | 110 |
3 | hard | yes | high | 67 |
4 | hard | yes | low | 89 |
5 | med | no | high | 56 |
6 | med | no | low | 116 |
7 | med | yes | high | 70 |
8 | med | yes | low | 102 |
9 | soft | no | high | 56 |
10 | soft | no | low | 116 |
11 | soft | yes | high | 48 |
12 | soft | yes | low | 106 |
Because Brand M is the first level in the “Response Profiles” table (Output 30.1.3), the RESPONSE statement causes Pr(Brand=M) to be the single response function modeled.
Output 30.1.3: Response Profiles, Frequencies, and Probabilities
Response Profiles | |
---|---|
Response | Brand |
1 | M |
2 | X |
Response Frequencies | ||
---|---|---|
Sample | Response Number 1 | Response Number 2 |
1 | 30 | 42 |
2 | 42 | 68 |
3 | 43 | 24 |
4 | 52 | 37 |
5 | 23 | 33 |
6 | 50 | 66 |
7 | 47 | 23 |
8 | 55 | 47 |
9 | 27 | 29 |
10 | 53 | 63 |
11 | 29 | 19 |
12 | 49 | 57 |
Response Probabilities | ||
---|---|---|
Sample | Response Number 1 | Response Number 2 |
1 | 0.41667 | 0.58333 |
2 | 0.38182 | 0.61818 |
3 | 0.64179 | 0.35821 |
4 | 0.58427 | 0.41573 |
5 | 0.41071 | 0.58929 |
6 | 0.43103 | 0.56897 |
7 | 0.67143 | 0.32857 |
8 | 0.53922 | 0.46078 |
9 | 0.48214 | 0.51786 |
10 | 0.45690 | 0.54310 |
11 | 0.60417 | 0.39583 |
12 | 0.46226 | 0.53774 |
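The link between the frequencies, the probabilities, and the response function can be checked by hand. The following Python sketch is an illustration only, not part of the SAS example: it assumes the counts printed in the “Response Frequencies” table and treats the statement `response 1 0` as the weighted sum 1·Pr(M) + 0·Pr(X). The names `frequencies`, `response_coefficients`, and `response_function` are hypothetical helpers introduced here.

```python
# Response frequencies from Output 30.1.3: (Brand M count, Brand X count)
# for the 12 populations, in the order of the "Population Profiles" table.
frequencies = [
    (30, 42), (42, 68), (43, 24), (52, 37),
    (23, 33), (50, 66), (47, 23), (55, 47),
    (27, 29), (53, 63), (29, 19), (49, 57),
]

# Coefficients taken from the statement RESPONSE 1 0.
response_coefficients = (1, 0)


def response_function(counts, coefficients=response_coefficients):
    """Return the linear response function for one population."""
    total = sum(counts)
    probabilities = [count / total for count in counts]
    return sum(c * p for c, p in zip(coefficients, probabilities))


response_functions = [response_function(counts) for counts in frequencies]

for sample, value in enumerate(response_functions, start=1):
    print(f"{sample:2d}  {value:.5f}")
```

With coefficients (1, 0) the response function reduces to the first column of the “Response Probabilities” table, so each printed value is simply the Brand M count divided by the sample size of that population.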
The “Analysis of Variance” table in Output 30.1.4 shows that all of the interactions are nonsignificant.
Output 30.1.4: Analysis of Variance
Analysis of Variance | |||
---|---|---|---|
Source | DF | Chi-Square | Pr > ChiSq |
Intercept | 1 | 983.13 | <.0001 |
Softness | 2 | 0.09 | 0.9575 |
Previous | 1 | 22.68 | <.0001 |
Softness*Previous | 2 | 3.85 | 0.1457 |
Temperature | 1 | 3.67 | 0.0555 |
Softness*Temperature | 2 | 0.23 | 0.8914 |
Previous*Temperature | 1 | 2.26 | 0.1324 |
Softnes*Previou*Temperat | 2 | 0.76 | 0.6850 |
Residual | 0 | . | . |
Therefore, a main-effects model is fit with the following statements:
   model Brand=Softness Previous Temperature / clparm noprofile design;
   title2 'Main-Effects Model';
run;
quit;
The PROC CATMOD statement is not required due to the interactive capability of the CATMOD procedure. The NOPROFILE option suppresses the redisplay of the “Response Profiles” table. The CLPARM option produces 95% confidence limits for the parameter estimates. Output 30.1.5 through Output 30.1.7 are produced.
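The design matrix in Output 30.1.5 displays the results of the differential-effects modeling used in PROC CATMOD: an effect with k levels contributes k - 1 columns, and the last level of each effect is coded as -1 in those columns.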
Output 30.1.5: Main-Effects Design Matrix
Detergent Preference Study |
Main-Effects Model |
Data Summary | |||
---|---|---|---|
Response | Brand | Response Levels | 2 |
Weight Variable | Count | Populations | 12 |
Data Set | DETERGENT | Total Frequency | 1008 |
Frequency Missing | 0 | Observations | 24 |
Response Functions and Design Matrix | ||||||
---|---|---|---|---|---|---|
Sample | Response Function | Design Matrix Column 1 | Column 2 | Column 3 | Column 4 | Column 5 |
1 | 0.41667 | 1 | 1 | 0 | 1 | 1 |
2 | 0.38182 | 1 | 1 | 0 | 1 | -1 |
3 | 0.64179 | 1 | 1 | 0 | -1 | 1 |
4 | 0.58427 | 1 | 1 | 0 | -1 | -1 |
5 | 0.41071 | 1 | 0 | 1 | 1 | 1 |
6 | 0.43103 | 1 | 0 | 1 | 1 | -1 |
7 | 0.67143 | 1 | 0 | 1 | -1 | 1 |
8 | 0.53922 | 1 | 0 | 1 | -1 | -1 |
9 | 0.48214 | 1 | -1 | -1 | 1 | 1 |
10 | 0.45690 | 1 | -1 | -1 | 1 | -1 |
11 | 0.60417 | 1 | -1 | -1 | -1 | 1 |
12 | 0.46226 | 1 | -1 | -1 | -1 | -1 |
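The coding pattern in this design matrix can be reproduced from the level orderings alone. The Python sketch below is a hedged illustration, not PROC CATMOD's internal algorithm: it assumes the level order shown in the “Population Profiles” table and a sum-to-zero coding in which the last level of each effect is -1 in every column for that effect. The names `levels`, `effect_code`, and `design_row` are hypothetical.

```python
from itertools import product

# Level orderings as they appear in the "Population Profiles" table.
levels = {
    "Softness": ["hard", "med", "soft"],
    "Previous": ["no", "yes"],
    "Temperature": ["high", "low"],
}


def effect_code(level, ordered_levels):
    """Sum-to-zero coding: one column per non-final level; final level is all -1."""
    n_columns = len(ordered_levels) - 1
    if level == ordered_levels[-1]:
        return [-1] * n_columns
    return [1 if level == ordered_levels[j] else 0 for j in range(n_columns)]


def design_row(softness, previous, temperature):
    """Intercept column followed by the coded main effects."""
    return (
        [1]
        + effect_code(softness, levels["Softness"])
        + effect_code(previous, levels["Previous"])
        + effect_code(temperature, levels["Temperature"])
    )


# product() varies the last factor fastest, matching the population order.
design_matrix = [design_row(*combo) for combo in product(*levels.values())]

for sample, row in enumerate(design_matrix, start=1):
    print(sample, row)

# 12 response functions minus 5 parameters leaves the residual degrees of freedom.
residual_df = len(design_matrix) - len(design_matrix[0])
```

Under these assumptions the matrix has 12 rows and 5 columns, which leaves 7 degrees of freedom for the residual goodness-of-fit test.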
The analysis of variance table in Output 30.1.6 shows that previous use of Brand M (p < 0.0001) and the temperature of the laundry water (p = 0.0468) are both significant factors, at the 0.05 level, in whether a subject prefers Brand M laundry detergent. The table also shows that the additive model fits, since the goodness-of-fit statistic (the residual chi-square of 8.26 with 7 degrees of freedom, p = 0.3100) is nonsignificant.
Output 30.1.6: ANOVA Table for the Main-Effects Model
Analysis of Variance | |||
---|---|---|---|
Source | DF | Chi-Square | Pr > ChiSq |
Intercept | 1 | 1004.93 | <.0001 |
Softness | 2 | 0.24 | 0.8859 |
Previous | 1 | 20.96 | <.0001 |
Temperature | 1 | 3.95 | 0.0468 |
Residual | 7 | 8.26 | 0.3100 |
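To see where the residual chi-square comes from, the weighted least squares fit can be sketched outside SAS. The Python code below is an approximation under stated assumptions, not the procedure's actual implementation: it assumes each response function Pr(Brand=M) has estimated variance p(1 - p)/n, uses the effect-coded main-effects design from Output 30.1.5, and solves the normal equations with a small hand-written Gauss-Jordan routine. All names (`counts`, `solve`, `beta`, and so on) are hypothetical.

```python
from itertools import product
from math import sqrt

# (Brand M count, Brand X count) for the 12 populations, in profile order.
counts = [
    (30, 42), (42, 68), (43, 24), (52, 37),
    (23, 33), (50, 66), (47, 23), (55, 47),
    (27, 29), (53, 63), (29, 19), (49, 57),
]

# Effect-coded design: intercept, Softness (2 columns), Previous, Temperature.
softness_codes = [(1, 0), (0, 1), (-1, -1)]  # hard, med, soft
binary_codes = [(1,), (-1,)]                 # first level, second level
X = [[1.0, *s, *p, *t]
     for s, p, t in product(softness_codes, binary_codes, binary_codes)]

# Response function F = Pr(Brand = M) and its weight n / (F * (1 - F)),
# the reciprocal of the assumed binomial variance of a sample proportion.
F = [m / (m + x) for m, x in counts]
W = [(m + x) / (f * (1 - f)) for (m, x), f in zip(counts, F)]


def solve(a, b):
    """Solve the square system a * beta = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


k = len(X[0])
# Normal equations (X' W X) beta = X' W F, with W diagonal.
xtwx = [[sum(w * row[i] * row[j] for w, row in zip(W, X)) for j in range(k)]
        for i in range(k)]
xtwf = [sum(w * row[i] * f for w, row, f in zip(W, X, F)) for i in range(k)]
beta = solve(xtwx, xtwf)

# Standard errors: square roots of the diagonal of the inverse of X' W X.
std_err = [sqrt(solve(xtwx, [1.0 if i == j else 0.0 for i in range(k)])[j])
           for j in range(k)]

# Residual chi-square: weighted squared distance between observed and fitted values.
fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
residual_chi_square = sum(w * (f - fit) ** 2 for w, f, fit in zip(W, F, fitted))
residual_df = len(X) - k

print([round(b, 5) for b in beta])
print(round(residual_chi_square, 2), residual_df)
```

If the variance assumption holds, the estimates and standard errors should agree with Output 30.1.7 to the displayed precision, and the residual statistic should agree with the Residual row of Output 30.1.6.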
The chi-square tests in Output 30.1.7 show that the Softness parameters are not significantly different from zero; as expected, the Wald confidence limits for these two estimates contain zero. So softness of the water is not a significant factor in choosing Brand M.
Output 30.1.7: WLS Estimates for the Main-Effects Model
Analysis of Weighted Least Squares Estimates | |||||||
---|---|---|---|---|---|---|---|
Parameter | Level | Estimate | Standard Error | Chi-Square | Pr > ChiSq | 95% Lower Confidence Limit | 95% Upper Confidence Limit |
Intercept | | 0.5080 | 0.0160 | 1004.93 | <.0001 | 0.4766 | 0.5394 |
Softness | hard | -0.00256 | 0.0218 | 0.01 | 0.9066 | -0.0454 | 0.0402 |
Softness | med | 0.0104 | 0.0218 | 0.23 | 0.6342 | -0.0323 | 0.0530 |
Previous | no | -0.0711 | 0.0155 | 20.96 | <.0001 | -0.1015 | -0.0407 |
Temperature | high | 0.0319 | 0.0161 | 3.95 | 0.0468 | 0.000446 | 0.0634 |
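The Wald chi-squares and 95% confidence limits follow from each estimate and its standard error. The Python sketch below is illustrative and assumes the usual Wald formulas: chi-square = (estimate / standard error)² and limits = estimate ± z × standard error, with z the 97.5th percentile of the standard normal distribution. Because it starts from the rounded values printed in the table, its results match the output only approximately. The names `estimates` and `wald_summary` are hypothetical.

```python
from statistics import NormalDist

# (estimate, standard error) pairs as printed in Output 30.1.7.
estimates = {
    "Intercept": (0.5080, 0.0160),
    "Softness hard": (-0.00256, 0.0218),
    "Softness med": (0.0104, 0.0218),
    "Previous no": (-0.0711, 0.0155),
    "Temperature high": (0.0319, 0.0161),
}

# 97.5th percentile of the standard normal, about 1.96.
z = NormalDist().inv_cdf(0.975)


def wald_summary(estimate, std_err):
    """Return (chi-square, lower limit, upper limit) for one parameter."""
    chi_square = (estimate / std_err) ** 2
    return chi_square, estimate - z * std_err, estimate + z * std_err


summaries = {name: wald_summary(*pair) for name, pair in estimates.items()}

for name, (chi_square, lower, upper) in summaries.items():
    contains_zero = lower < 0 < upper
    print(f"{name:17s} {chi_square:8.2f} {lower:9.5f} {upper:9.5f} {contains_zero}")
```

An interval that contains zero corresponds to a parameter that is not significant at the 0.05 level, which is the case for both Softness parameters here.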
The negative coefficient for Previous (-0.0711) indicates that the first level of Previous (which is shown to be 'no') is associated with a smaller probability of preferring Brand M than the second level of Previous (with coefficient constrained to be 0.0711, since the parameter estimates for a given effect must sum to zero). In other words, previous users of Brand M are much more likely to prefer it than those who have never used it before.
Similarly, the positive coefficient for Temperature indicates that the first level of Temperature (which, from the “Population Profiles” table, is 'high') is associated with a larger probability of preferring Brand M than the second level of Temperature. In other words, those who do their laundry in hot water are more likely to prefer Brand M than those who do their laundry in cold water.
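The sum-to-zero constraint makes it possible to recover the coefficients that are not printed and to compute fitted probabilities for any population. The Python sketch below is a worked illustration that uses the rounded estimates from Output 30.1.7, so its results are approximate. The names `printed`, `last_level`, `coefficients`, and `fitted_probability` are hypothetical.

```python
# Estimates printed in Output 30.1.7 (rounded as displayed).
intercept = 0.5080
printed = {
    "Softness": {"hard": -0.00256, "med": 0.0104},
    "Previous": {"no": -0.0711},
    "Temperature": {"high": 0.0319},
}
last_level = {"Softness": "soft", "Previous": "yes", "Temperature": "low"}

# Sum-to-zero constraint: the last level's coefficient is the negative
# of the sum of the printed coefficients for that effect.
coefficients = {
    effect: {**values, last_level[effect]: -sum(values.values())}
    for effect, values in printed.items()
}


def fitted_probability(softness, previous, temperature):
    """Fitted Pr(Brand = M) under the main-effects model."""
    return (
        intercept
        + coefficients["Softness"][softness]
        + coefficients["Previous"][previous]
        + coefficients["Temperature"][temperature]
    )


# Differences in fitted probability attributable to a single factor.
previous_gap = coefficients["Previous"]["yes"] - coefficients["Previous"]["no"]
temperature_gap = coefficients["Temperature"]["high"] - coefficients["Temperature"]["low"]

print(coefficients)
print(round(fitted_probability("hard", "no", "high"), 5))
print(round(previous_gap, 4), round(temperature_gap, 4))
```

Because the model is additive on the probability scale, the gap between previous users and nonusers is twice the Previous coefficient, about 0.14, and the gap between high and low temperature is twice the Temperature coefficient, about 0.06, regardless of the levels of the other factors.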