The CATMOD Procedure

Example 30.1 Linear Response Function, r=2 Responses

In an example from Ries and Smith (1963), the choice of detergent brand (Brand = M or X) is related to three other categorical variables: the softness of the laundry water (Softness = soft, medium, or hard), the temperature of the water (Temperature = high or low), and whether the subject was a previous user of Brand M (Previous = yes or no). The linear response function, requested here with the statement RESPONSE 1 0 (it could also be specified as RESPONSE MARGINALS), yields one probability, Pr(brand preference=M), as the response function to be analyzed. Two models are fit in this example: the first is a saturated model containing all of the main effects and interactions, and the second is a reduced model containing only the main effects. The following statements produce Output 30.1.1 through Output 30.1.4:

data detergent;
   input Softness $ Brand $ Previous $ Temperature $ Count @@;
   datalines;
soft X yes high 19   soft X yes low 57
soft X no  high 29   soft X no  low 63 
soft M yes high 29   soft M yes low 49
soft M no  high 27   soft M no  low 53
med  X yes high 23   med  X yes low 47
med  X no  high 33   med  X no  low 66
med  M yes high 47   med  M yes low 55
med  M no  high 23   med  M no  low 50
hard X yes high 24   hard X yes low 37
hard X no  high 42   hard X no  low 68
hard M yes high 43   hard M yes low 52
hard M no  high 30   hard M no  low 42
;
title 'Detergent Preference Study';
proc catmod data=detergent;
   response 1 0;
   weight Count;
   model Brand=Softness|Previous|Temperature / freq prob;
   title2 'Saturated Model';
run;

The Data Summary table (Output 30.1.1) indicates that you have two response levels and twelve populations.

Output 30.1.1: Detergent Preference Study: Linear Model Analysis

Detergent Preference Study
Saturated Model

The CATMOD Procedure

Data Summary
Response            Brand        Response Levels       2
Weight Variable     Count        Populations          12
Data Set            DETERGENT    Total Frequency    1008
Frequency Missing   0            Observations         24


The Population Profiles table in Output 30.1.2 displays the ordering of independent variable levels as used in the table of parameter estimates.

Output 30.1.2: Population Profiles

Population Profiles
Sample Softness Previous Temperature Sample Size
1 hard no high 72
2 hard no low 110
3 hard yes high 67
4 hard yes low 89
5 med no high 56
6 med no low 116
7 med yes high 70
8 med yes low 102
9 soft no high 56
10 soft no low 116
11 soft yes high 48
12 soft yes low 106
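
The population ordering and sample sizes can be checked outside SAS. The following is a minimal Python sketch, not part of the original example; the names cells, sizes, and populations are illustrative. It sums the input counts over Brand within each Softness x Previous x Temperature profile and sorts the profiles alphabetically, which is assumed here only because it matches the order shown in Output 30.1.2.

```python
from collections import defaultdict

# Cell counts transcribed from the DATA step:
# (Softness, Brand, Previous, Temperature, Count).
cells = [
    ("soft", "X", "yes", "high", 19), ("soft", "X", "yes", "low", 57),
    ("soft", "X", "no", "high", 29), ("soft", "X", "no", "low", 63),
    ("soft", "M", "yes", "high", 29), ("soft", "M", "yes", "low", 49),
    ("soft", "M", "no", "high", 27), ("soft", "M", "no", "low", 53),
    ("med", "X", "yes", "high", 23), ("med", "X", "yes", "low", 47),
    ("med", "X", "no", "high", 33), ("med", "X", "no", "low", 66),
    ("med", "M", "yes", "high", 47), ("med", "M", "yes", "low", 55),
    ("med", "M", "no", "high", 23), ("med", "M", "no", "low", 50),
    ("hard", "X", "yes", "high", 24), ("hard", "X", "yes", "low", 37),
    ("hard", "X", "no", "high", 42), ("hard", "X", "no", "low", 68),
    ("hard", "M", "yes", "high", 43), ("hard", "M", "yes", "low", 52),
    ("hard", "M", "no", "high", 30), ("hard", "M", "no", "low", 42),
]

# Sum the counts over Brand within each profile of the independent variables.
sizes = defaultdict(int)
for softness, brand, previous, temperature, count in cells:
    sizes[(softness, previous, temperature)] += count

# Alphabetical order of the profile keys gives the sample numbering.
populations = sorted(sizes)
for sample, profile in enumerate(populations, start=1):
    print(sample, *profile, sizes[profile])
```

The 24 input observations collapse to 12 populations because each profile appears once for Brand M and once for Brand X.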


Since Brand M is the first level in the Response Profiles table (Output 30.1.3), the RESPONSE statement causes Pr(Brand=M) to be the single response function modeled.

Output 30.1.3: Response Profiles, Frequencies, and Probabilities

Response Profiles
Response Brand
1 M
2 X

Response Frequencies
          Response Number
Sample        1       2
     1       30      42
     2       42      68
     3       43      24
     4       52      37
     5       23      33
     6       50      66
     7       47      23
     8       55      47
     9       27      29
    10       53      63
    11       29      19
    12       49      57

Response Probabilities
          Response Number
Sample          1         2
     1    0.41667   0.58333
     2    0.38182   0.61818
     3    0.64179   0.35821
     4    0.58427   0.41573
     5    0.41071   0.58929
     6    0.43103   0.56897
     7    0.67143   0.32857
     8    0.53922   0.46078
     9    0.48214   0.51786
    10    0.45690   0.54310
    11    0.60417   0.39583
    12    0.46226   0.53774
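
To make the effect of RESPONSE 1 0 concrete, the following Python sketch (an illustration outside SAS; freqs, coef, and response_functions are hypothetical names) applies the coefficients 1 and 0 to the two response probabilities in each sample. The result is simply the first column of the Response Probabilities table, that is, Pr(Brand=M).

```python
# (Brand M, Brand X) frequencies for samples 1-12,
# transcribed from the Response Frequencies table.
freqs = [(30, 42), (42, 68), (43, 24), (52, 37), (23, 33), (50, 66),
         (47, 23), (55, 47), (27, 29), (53, 63), (29, 19), (49, 57)]

# Coefficients from the statement "response 1 0;".
coef = (1, 0)

response_functions = []
for m, x in freqs:
    n = m + x
    probs = (m / n, x / n)  # Pr(Brand=M), Pr(Brand=X) in this sample
    # Linear response function: a weighted sum of the response probabilities.
    response_functions.append(coef[0] * probs[0] + coef[1] * probs[1])

for sample, f in enumerate(response_functions, start=1):
    print(f"{sample:2d}  {f:.5f}")
```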


The Analysis of Variance table in Output 30.1.4 shows that none of the interactions is significant at the 0.05 level; the smallest interaction p-value is 0.1324.

Output 30.1.4: Analysis of Variance

Analysis of Variance
Source DF Chi-Square Pr > ChiSq
Intercept 1 983.13 <.0001
Softness 2 0.09 0.9575
Previous 1 22.68 <.0001
Softness*Previous 2 3.85 0.1457
Temperature 1 3.67 0.0555
Softness*Temperature 2 0.23 0.8914
Previous*Temperature 1 2.26 0.1324
Softnes*Previou*Temperat 2 0.76 0.6850
Residual 0 . .
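
The one-degree-of-freedom chi-squares in this table can be checked by hand. The Python sketch below is an illustration under two stated assumptions: each sample proportion p based on n subjects has estimated variance p(1-p)/n, and, because the design is balanced and the two-level variables are coded +1/-1, each of these effects is estimated in the saturated model by the corresponding contrast of the 12 response functions divided by 12. The function name wald_1df and the variable names are hypothetical.

```python
# (Brand M, Brand X) frequencies for samples 1-12 (Response Frequencies table).
freqs = [(30, 42), (42, 68), (43, 24), (52, 37), (23, 33), (50, 66),
         (47, 23), (55, 47), (27, 29), (53, 63), (29, 19), (49, 57)]

F = [m / (m + x) for m, x in freqs]             # response functions Pr(Brand=M)
V = [m * x / (m + x) ** 3 for m, x in freqs]    # assumed variances p(1-p)/n

# +1/-1 codes in sample order (see the Population Profiles table).
previous = [1, 1, -1, -1] * 3    # no, no, yes, yes within each Softness level
temperature = [1, -1] * 6        # high, low

def wald_1df(codes):
    """Wald chi-square for an effect estimated as sum(codes * F) / 12."""
    estimate = sum(c * f for c, f in zip(codes, F)) / len(F)
    variance = sum(V) / len(F) ** 2   # every code is +1 or -1, so c**2 == 1
    return estimate ** 2 / variance

chi_square = {
    "Intercept": wald_1df([1] * 12),
    "Previous": wald_1df(previous),
    "Temperature": wald_1df(temperature),
    "Previous*Temperature": wald_1df([p * t for p, t in zip(previous, temperature)]),
}
for source, value in chi_square.items():
    print(f"{source:22s} {value:8.2f}")
```

Under these assumptions the four statistics should agree with the Intercept, Previous, Temperature, and Previous*Temperature rows of Output 30.1.4 up to rounding. The two-degree-of-freedom Softness terms need a full matrix computation and are not covered by this shortcut.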


Therefore, a main-effects model is fit with the following statements:

   model Brand=Softness Previous Temperature
       / clparm noprofile design;
   title2 'Main-Effects Model';
run;
quit;

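The PROC CATMOD statement is not repeated because the CATMOD procedure is interactive: it remains active after the RUN statement, so a new MODEL statement can be submitted directly, and the earlier RESPONSE and WEIGHT statements stay in effect. The NOPROFILE option suppresses the redisplay of the Response Profiles table. The CLPARM option produces 95% Wald confidence limits for the parameter estimates. The DESIGN option displays the response functions and the design matrix. Output 30.1.5 through Output 30.1.7 are produced.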

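The design matrix in Output 30.1.5 shows the full-rank, sum-to-zero (effect) coding that PROC CATMOD uses. Column 1 is the intercept, columns 2 and 3 code Softness (hard and med), column 4 codes Previous (no), and column 5 codes Temperature (high). The last level of each variable is coded -1 in that variable's columns.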

Output 30.1.5: Main-Effects Design Matrix

Detergent Preference Study
Main-Effects Model

The CATMOD Procedure

Data Summary
Response            Brand        Response Levels       2
Weight Variable     Count        Populations          12
Data Set            DETERGENT    Total Frequency    1008
Frequency Missing   0            Observations         24

Response Functions and Design Matrix
          Response          Design Matrix
Sample    Function     1     2     3     4     5
     1     0.41667     1     1     0     1     1
     2     0.38182     1     1     0     1    -1
     3     0.64179     1     1     0    -1     1
     4     0.58427     1     1     0    -1    -1
     5     0.41071     1     0     1     1     1
     6     0.43103     1     0     1     1    -1
     7     0.67143     1     0     1    -1     1
     8     0.53922     1     0     1    -1    -1
     9     0.48214     1    -1    -1     1     1
    10     0.45690     1    -1    -1     1    -1
    11     0.60417     1    -1    -1    -1     1
    12     0.46226     1    -1    -1    -1    -1
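
The coding pattern in this matrix can be generated mechanically. The following Python sketch is an illustration only (the dictionaries and the name design are hypothetical, not PROC CATMOD internals): each variable with k levels gets k-1 columns, the first k-1 levels get indicator rows, and the last level gets -1 in every column.

```python
# Effect codes per level, in the level order of the Population Profiles table.
softness_codes = {"hard": [1, 0], "med": [0, 1], "soft": [-1, -1]}
previous_codes = {"no": [1], "yes": [-1]}
temperature_codes = {"high": [1], "low": [-1]}

# One row per population: intercept, Softness (2 columns), Previous, Temperature.
design = []
for s in softness_codes:
    for p in previous_codes:
        for t in temperature_codes:
            design.append([1] + softness_codes[s] + previous_codes[p] + temperature_codes[t])

for sample, row in enumerate(design, start=1):
    print(f"{sample:2d}", *(f"{value:2d}" for value in row))
```

Because every non-intercept column sums to zero over the 12 balanced populations, the parameter estimates for each effect are deviations from the overall mean response.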


The analysis of variance table in Output 30.1.6 shows that previous use of Brand M and the temperature of the laundry water are both significant factors (at the 0.05 level) in whether a subject prefers Brand M laundry detergent. The table also shows that the additive model fits, since the goodness-of-fit statistic (the residual chi-square) is nonsignificant.

Output 30.1.6: ANOVA Table for the Main-Effects Model

Analysis of Variance
Source DF Chi-Square Pr > ChiSq
Intercept 1 1004.93 <.0001
Softness 2 0.24 0.8859
Previous 1 20.96 <.0001
Temperature 1 3.95 0.0468
Residual 7 8.26 0.3100


The chi-square tests in Output 30.1.7 show that neither Softness parameter is significantly different from zero; as expected, the Wald confidence limits for these two estimates contain zero. So softness of the water is not a factor in choosing Brand M.

Output 30.1.7: WLS Estimates for the Main-Effects Model

Analysis of Weighted Least Squares Estimates
Parameter             Estimate   Standard Error   Chi-Square   Pr > ChiSq   95% Confidence Limits
Intercept               0.5080           0.0160      1004.93       <.0001     0.4766      0.5394
Softness     hard     -0.00256           0.0218         0.01       0.9066    -0.0454      0.0402
             med        0.0104           0.0218         0.23       0.6342    -0.0323      0.0530
Previous     no        -0.0711           0.0155        20.96       <.0001    -0.1015     -0.0407
Temperature  high       0.0319           0.0161         3.95       0.0468   0.000446      0.0634
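
For readers who want to see where these numbers come from, the following Python sketch carries out a textbook weighted least squares fit of the 12 response functions on the main-effects design matrix. It is an illustration under stated assumptions, not a description of PROC CATMOD internals: each response function is weighted by the inverse of its estimated variance p(1-p)/n, the estimates solve (X'WX)b = X'WF, the standard errors come from the diagonal of the inverse of X'WX, and the Wald limits use the normal quantile 1.959964. The helper solve and all variable names are hypothetical.

```python
import math

# (Brand M, Brand X) frequencies for samples 1-12 (Response Frequencies table).
freqs = [(30, 42), (42, 68), (43, 24), (52, 37), (23, 33), (50, 66),
         (47, 23), (55, 47), (27, 29), (53, 63), (29, 19), (49, 57)]
F = [m / (m + x) for m, x in freqs]             # response functions
W = [(m + x) ** 3 / (m * x) for m, x in freqs]  # weights n / (p(1-p))

# Effect-coded design matrix, row order as in Output 30.1.5.
soft = {"hard": [1, 0], "med": [0, 1], "soft": [-1, -1]}
X = [[1] + soft[s] + [p, t] for s in soft for p in (1, -1) for t in (1, -1)]

def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [list(map(float, row)) + [float(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

k = len(X[0])
XtWX = [[sum(w * r[i] * r[j] for w, r in zip(W, X)) for j in range(k)] for i in range(k)]
XtWF = [sum(w * r[i] * f for w, r, f in zip(W, X, F)) for i in range(k)]

b = solve(XtWX, XtWF)  # parameter estimates
# Standard errors: square roots of the diagonal of the inverse of X'WX.
se = [math.sqrt(solve(XtWX, [float(i == j) for i in range(k)])[j]) for j in range(k)]
# Residual (goodness-of-fit) chi-square: weighted sum of squared residuals.
residual = sum(w * (f - sum(x * c for x, c in zip(r, b))) ** 2
               for w, r, f in zip(W, X, F))
z = 1.959964  # 97.5th percentile of the standard normal distribution
limits = [(est - z * s, est + z * s) for est, s in zip(b, se)]

labels = ["Intercept", "Softness hard", "Softness med", "Previous no", "Temperature high"]
for label, est, s, (lo, hi) in zip(labels, b, se, limits):
    print(f"{label:17s} {est:9.5f} {s:7.4f} {(est / s) ** 2:8.2f} {lo:9.6f} {hi:8.4f}")
print(f"Residual chi-square ({len(F) - k} df): {residual:.2f}")
```

If the assumptions hold, the printed estimates, standard errors, and limits should agree with Output 30.1.7, and the residual chi-square with Output 30.1.6, up to rounding.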


The negative coefficient for Previous (-0.0711) indicates that the first level of Previous (which, from the Population Profiles table, is 'no') is associated with a smaller probability of preferring Brand M than the second level of Previous (with coefficient constrained to be 0.0711 since the parameter estimates for a given effect must sum to zero). In other words, previous users of Brand M are more likely to prefer it than those who have never used it before: the estimated difference in Pr(Brand=M) is 2 × 0.0711 = 0.1422.

Similarly, the positive coefficient for Temperature indicates that the first level of Temperature (which, from the Population Profiles table, is 'high') has a larger probability of preferring Brand M than the second level of Temperature. In other words, those who do their laundry in hot water are more likely to prefer Brand M than those who do their laundry in cold water.
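
The sum-to-zero constraint and its effect on predicted probabilities can be illustrated with a short Python sketch. The estimates are copied from Output 30.1.7; the function predicted and the dictionary names are hypothetical. The coefficient for the last level of each variable is obtained as the negative sum of the displayed coefficients, and a predicted Pr(Brand=M) is the intercept plus one coefficient per variable.

```python
# Estimates transcribed from Output 30.1.7.
intercept = 0.5080
softness = {"hard": -0.00256, "med": 0.0104}
previous = {"no": -0.0711}
temperature = {"high": 0.0319}

# Sum-to-zero constraint: the omitted last level is minus the sum of the others.
softness["soft"] = -sum(softness.values())
previous["yes"] = -previous["no"]
temperature["low"] = -temperature["high"]

def predicted(s, p, t):
    """Predicted Pr(Brand=M) under the main-effects model."""
    return intercept + softness[s] + previous[p] + temperature[t]

for s in ("hard", "med", "soft"):
    for p in ("no", "yes"):
        for t in ("high", "low"):
            print(f"{s:5s} {p:4s} {t:5s} {predicted(s, p, t):.4f}")
```

Holding the other variables fixed, moving from Previous = no to Previous = yes raises the predicted probability by 0.1422, and moving from low to high temperature raises it by 0.0638.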