The CATMOD Procedure

Example 30.1 Linear Response Function, r=2 Responses

In an example from Ries and Smith (1963), the choice of detergent brand (Brand = M or X) is related to three other categorical variables: the softness of the laundry water (Softness = soft, medium, or hard), the temperature of the water (Temperature = high or low), and whether the subject was a previous user of Brand M (Previous = yes or no). The linear response function, which could also be specified as RESPONSE MARGINALS, yields one probability, Pr(brand preference=M), as the response function to be analyzed. Two models are fit in this example: the first model is a saturated one, containing all of the main effects and interactions, while the second is a reduced model containing only the main effects. The following statements produce Output 30.1.1 through Output 30.1.4:

data detergent;
   input Softness $ Brand $ Previous $ Temperature $ Count @@;
   datalines;
soft X yes high 19   soft X yes low 57
soft X no  high 29   soft X no  low 63 
soft M yes high 29   soft M yes low 49
soft M no  high 27   soft M no  low 53
med  X yes high 23   med  X yes low 47
med  X no  high 33   med  X no  low 66
med  M yes high 47   med  M yes low 55
med  M no  high 23   med  M no  low 50
hard X yes high 24   hard X yes low 37
hard X no  high 42   hard X no  low 68
hard M yes high 43   hard M yes low 52
hard M no  high 30   hard M no  low 42
;

title 'Detergent Preference Study';
proc catmod data=detergent;
   response 1 0;
   weight Count;
   model Brand=Softness|Previous|Temperature / freq prob;
   title2 'Saturated Model';
run;

The “Data Summary” table (Output 30.1.1) indicates that you have two response levels and twelve populations.

Output 30.1.1: Detergent Preference Study: Linear Model Analysis

Detergent Preference Study

Saturated Model

The CATMOD Procedure

Data Summary
Response	Brand	Response Levels	2
Weight Variable	Count	Populations	12
Data Set	DETERGENT	Total Frequency	1008
Frequency Missing	0	Observations	24

The “Population Profiles” table in Output 30.1.2 displays the ordering of independent variable levels as used in the table of parameter estimates.

Output 30.1.2: Population Profiles

Population Profiles
Sample	Softness	Previous	Temperature	Sample Size
1	hard	no	high	72
2	hard	no	low	110
3	hard	yes	high	67
4	hard	yes	low	89
5	med	no	high	56
6	med	no	low	116
7	med	yes	high	70
8	med	yes	low	102
9	soft	no	high	56
10	soft	no	low	116
11	soft	yes	high	48
12	soft	yes	low	106
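
As a cross-check on the "Data Summary" and "Population Profiles" tables, the following Python sketch tallies the DATALINES records by population. It is an illustration only and is not part of the SAS program. The names RECORDS and population_sizes are hypothetical, and the sketch assumes that populations are ordered by the sorted values of Softness, Previous, and Temperature, which agrees with the table above.

```python
from collections import defaultdict

# The 24 records from the DATA step, in INPUT-statement field order:
# Softness Brand Previous Temperature Count.
RECORDS = """
soft X yes high 19   soft X yes low 57
soft X no  high 29   soft X no  low 63
soft M yes high 29   soft M yes low 49
soft M no  high 27   soft M no  low 53
med  X yes high 23   med  X yes low 47
med  X no  high 33   med  X no  low 66
med  M yes high 47   med  M yes low 55
med  M no  high 23   med  M no  low 50
hard X yes high 24   hard X yes low 37
hard X no  high 42   hard X no  low 68
hard M yes high 43   hard M yes low 52
hard M no  high 30   hard M no  low 42
""".split()


def population_sizes(tokens):
    """Sum Count over Brand within each (Softness, Previous, Temperature)."""
    sizes = defaultdict(int)
    for i in range(0, len(tokens), 5):
        softness, _brand, previous, temperature, count = tokens[i:i + 5]
        sizes[(softness, previous, temperature)] += int(count)
    # Sort the profiles so that the sample numbering follows the table.
    return dict(sorted(sizes.items()))


sizes = population_sizes(RECORDS)
observations = len(RECORDS) // 5
total_frequency = sum(sizes.values())

for sample, (profile, n) in enumerate(sizes.items(), start=1):
    print(sample, *profile, n)
print("observations:", observations,
      "populations:", len(sizes),
      "total frequency:", total_frequency)
```

Each population combines one Brand M record and one Brand X record, which is why 24 observations yield 12 populations.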

Since Brand M is the first level in the “Response Profiles” table (Output 30.1.3), the RESPONSE statement causes Pr(Brand=M) to be the single response function modeled.

Output 30.1.3: Response Profiles, Frequencies, and Probabilities

Response Profiles
Response	Brand
1	M
2	X

Response Frequencies
	Response Number
Sample	1	2
1	30	42
2	42	68
3	43	24
4	52	37
5	23	33
6	50	66
7	47	23
8	55	47
9	27	29
10	53	63
11	29	19
12	49	57

Response Probabilities
	Response Number
Sample	1	2
1	0.41667	0.58333
2	0.38182	0.61818
3	0.64179	0.35821
4	0.58427	0.41573
5	0.41071	0.58929
6	0.43103	0.56897
7	0.67143	0.32857
8	0.53922	0.46078
9	0.48214	0.51786
10	0.45690	0.54310
11	0.60417	0.39583
12	0.46226	0.53774
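
The link between the RESPONSE 1 0 statement and the first column of the "Response Probabilities" table can be sketched in Python as follows. The sketch assumes that the response function is the linear combination 1 × Pr(Brand=M) + 0 × Pr(Brand=X) of the response probabilities in each population. The names FREQ and response_function are hypothetical.

```python
# Response frequencies (Brand M, Brand X) for samples 1-12,
# copied from the "Response Frequencies" table.
FREQ = [(30, 42), (42, 68), (43, 24), (52, 37), (23, 33), (50, 66),
        (47, 23), (55, 47), (27, 29), (53, 63), (29, 19), (49, 57)]

# Coefficients taken from the statement RESPONSE 1 0.
COEFFS = (1, 0)


def response_function(n_m, n_x, coeffs=COEFFS):
    """Return the linear response function for one population."""
    total = n_m + n_x
    probs = (n_m / total, n_x / total)
    return sum(a * p for a, p in zip(coeffs, probs))


F = [response_function(n_m, n_x) for n_m, n_x in FREQ]
for sample, value in enumerate(F, start=1):
    print(sample, f"{value:.5f}")
```

With coefficients 1 and 0, the function reduces to the observed proportion preferring Brand M, so the values agree with column 1 of the table.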

The “Analysis of Variance” table in Output 30.1.4 shows that all of the interactions are nonsignificant; the smallest interaction p-value is 0.1324 (Previous*Temperature).

Output 30.1.4: Analysis of Variance

Analysis of Variance
Source	DF	Chi-Square	Pr > ChiSq
Intercept	1	983.13	<.0001
Softness	2	0.09	0.9575
Previous	1	22.68	<.0001
Softness*Previous	2	3.85	0.1457
Temperature	1	3.67	0.0555
Softness*Temperature	2	0.23	0.8914
Previous*Temperature	1	2.26	0.1324
Softnes*Previou*Temperat	2	0.76	0.6850
Residual	0	.	.

Therefore, a main-effects model is fit with the following statements:

   model Brand=Softness Previous Temperature
       / clparm noprofile design;
   title2 'Main-Effects Model';
run;
quit;

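Because PROC CATMOD is interactive, the PROC CATMOD statement does not need to be repeated: the new MODEL statement is submitted to the procedure that is still running, and the earlier RESPONSE and WEIGHT statements remain in effect. The NOPROFILE option suppresses the redisplay of the “Response Profiles” table. The CLPARM option produces 95% confidence limits for the parameter estimates, and the DESIGN option displays the design matrix. Output 30.1.5 through Output 30.1.7 are produced.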

The design matrix in Output 30.1.5 displays the results of the differential-effects modeling used in PROC CATMOD.

Output 30.1.5: Main-Effects Design Matrix

Detergent Preference Study

Main-Effects Model

The CATMOD Procedure

Data Summary
Response	Brand	Response Levels	2
Weight Variable	Count	Populations	12
Data Set	DETERGENT	Total Frequency	1008
Frequency Missing	0	Observations	24

Response Functions and Design Matrix
		Design Matrix
Sample	Response Function	1	2	3	4	5
1	0.41667	1	1	0	1	1
2	0.38182	1	1	0	1	-1
3	0.64179	1	1	0	-1	1
4	0.58427	1	1	0	-1	-1
5	0.41071	1	0	1	1	1
6	0.43103	1	0	1	1	-1
7	0.67143	1	0	1	-1	1
8	0.53922	1	0	1	-1	-1
9	0.48214	1	-1	-1	1	1
10	0.45690	1	-1	-1	1	-1
11	0.60417	1	-1	-1	-1	1
12	0.46226	1	-1	-1	-1	-1
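
The pattern in the design matrix can be reproduced with the following Python sketch. It assumes a full-rank, deviation-from-mean coding inferred from the table above: an effect with k levels contributes k-1 columns, and the last level of each effect is coded as -1 in all of that effect's columns. The names LEVELS, effect_code, and design_row are hypothetical.

```python
from itertools import product

# Levels of each independent variable, in the sorted order used by the
# "Population Profiles" table.
LEVELS = {
    "Softness": ["hard", "med", "soft"],
    "Previous": ["no", "yes"],
    "Temperature": ["high", "low"],
}


def effect_code(level, levels):
    """Code one level as k-1 columns; the last level is -1 in every column."""
    index = levels.index(level)
    width = len(levels) - 1
    if index == width:
        return [-1] * width
    return [1 if j == index else 0 for j in range(width)]


def design_row(profile):
    """Build one row: an intercept column followed by each effect's columns."""
    row = [1]
    for levels, level in zip(LEVELS.values(), profile):
        row += effect_code(level, levels)
    return row


# product() varies the last variable fastest, matching samples 1-12.
profiles = list(product(*LEVELS.values()))
X = [design_row(profile) for profile in profiles]

for sample, (profile, row) in enumerate(zip(profiles, X), start=1):
    print(sample, *profile, row)
```

Columns 2 and 3 carry Softness, column 4 carries Previous, and column 5 carries Temperature.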

The analysis of variance table in Output 30.1.6 shows that previous use of Brand M (p < 0.0001) and the temperature of the laundry water (p = 0.0468) are significant factors at the 0.05 level in whether a subject prefers Brand M laundry detergent. The table also shows that the additive model fits, since the goodness-of-fit statistic (the residual chi-square, 8.26 with 7 DF, p = 0.3100) is nonsignificant.

Output 30.1.6: ANOVA Table for the Main-Effects Model

Analysis of Variance
Source	DF	Chi-Square	Pr > ChiSq
Intercept	1	1004.93	<.0001
Softness	2	0.24	0.8859
Previous	1	20.96	<.0001
Temperature	1	3.95	0.0468
Residual	7	8.26	0.3100
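
The residual chi-square and its 7 degrees of freedom can be checked independently with the following Python sketch. It assumes the usual weighted least squares formulation for a linear response function: each observed proportion p from a population of size n has estimated variance p(1-p)/n, the estimates solve the weighted normal equations, and the residual chi-square is the weighted sum of squared differences between observed and fitted response functions. The helper names solve and wls_fit are hypothetical.

```python
# Response frequencies (Brand M, Brand X) for samples 1-12, from Output 30.1.3.
FREQ = [(30, 42), (42, 68), (43, 24), (52, 37), (23, 33), (50, 66),
        (47, 23), (55, 47), (27, 29), (53, 63), (29, 19), (49, 57)]

# Main-effects design matrix, copied from Output 30.1.5.
X = [[1, 1, 0, 1, 1], [1, 1, 0, 1, -1], [1, 1, 0, -1, 1], [1, 1, 0, -1, -1],
     [1, 0, 1, 1, 1], [1, 0, 1, 1, -1], [1, 0, 1, -1, 1], [1, 0, 1, -1, -1],
     [1, -1, -1, 1, 1], [1, -1, -1, 1, -1], [1, -1, -1, -1, 1],
     [1, -1, -1, -1, -1]]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def wls_fit(freq, design):
    """Return (estimates, residual chi-square) for the linear model."""
    f = [n_m / (n_m + n_x) for n_m, n_x in freq]
    # Weight = inverse of the binomial variance p(1-p)/n of each proportion.
    w = [(n_m + n_x) / (p * (1 - p)) for (n_m, n_x), p in zip(freq, f)]
    k = len(design[0])
    rows = range(len(design))
    xtwx = [[sum(w[i] * design[i][r] * design[i][c] for i in rows)
             for c in range(k)] for r in range(k)]
    xtwf = [sum(w[i] * design[i][r] * f[i] for i in rows) for r in range(k)]
    beta = solve(xtwx, xtwf)
    fitted = [sum(x_ij * b_j for x_ij, b_j in zip(design[i], beta))
              for i in rows]
    residual = sum(w[i] * (f[i] - fitted[i]) ** 2 for i in rows)
    return beta, residual


beta, residual_chi_square = wls_fit(FREQ, X)
residual_df = len(FREQ) - len(beta)  # 12 response functions, 5 parameters
print([round(b, 4) for b in beta], round(residual_chi_square, 2), residual_df)
```

The residual degrees of freedom are the 12 response functions minus the 5 estimated parameters.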

The chi-square tests in Output 30.1.7 show that the two Softness parameters are not significantly different from zero; as expected, the Wald confidence limits for these two estimates contain zero. Softness of the water is therefore not a significant factor in preference for Brand M.

Output 30.1.7: WLS Estimates for the Main-Effects Model

Analysis of Weighted Least Squares Estimates
Parameter		Estimate	Standard Error	Chi-Square	Pr > ChiSq	95% Confidence Limits
Intercept		0.5080	0.0160	1004.93	<.0001	0.4766	0.5394
Softness	hard	-0.00256	0.0218	0.01	0.9066	-0.0454	0.0402
	med	0.0104	0.0218	0.23	0.6342	-0.0323	0.0530
Previous	no	-0.0711	0.0155	20.96	<.0001	-0.1015	-0.0407
Temperature	high	0.0319	0.0161	3.95	0.0468	0.000446	0.0634
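
The chi-square statistics and confidence limits in this table follow from the estimates and standard errors. The Python sketch below assumes the standard Wald formulas: chi-square = (estimate / standard error)² and limits = estimate ± z × standard error, where z is the 0.975 standard normal quantile. Because it starts from the rounded values displayed in the table, its results agree with the displayed chi-squares and limits only approximately. The names ESTIMATES and wald are hypothetical.

```python
from statistics import NormalDist

# (estimate, standard error) pairs as displayed in Output 30.1.7.
ESTIMATES = {
    "Intercept": (0.5080, 0.0160),
    "Softness hard": (-0.00256, 0.0218),
    "Softness med": (0.0104, 0.0218),
    "Previous no": (-0.0711, 0.0155),
    "Temperature high": (0.0319, 0.0161),
}


def wald(estimate, std_err, level=0.95):
    """Return (chi-square, lower limit, upper limit) for one parameter."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    chi_square = (estimate / std_err) ** 2
    return chi_square, estimate - z * std_err, estimate + z * std_err


results = {name: wald(est, se) for name, (est, se) in ESTIMATES.items()}
for name, (chi_square, lower, upper) in results.items():
    contains_zero = lower < 0 < upper
    print(f"{name:17s} {chi_square:8.2f} {lower:9.5f} {upper:9.5f} "
          f"contains zero: {contains_zero}")
```

Only the two Softness intervals contain zero, which matches the nonsignificant chi-square tests for those parameters.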

The negative coefficient for Previous (-0.0711) indicates that the first level of Previous (which is shown to be 'no') is associated with a smaller probability of preferring Brand M than the second level of Previous, 'yes'. Because the parameter estimates for a given effect must sum to zero, the coefficient for 'yes' is constrained to be 0.0711, so the two levels differ by 0.1422 in the estimated Pr(Brand=M). In other words, previous users of Brand M are much more likely to prefer it than those who have never used it before.

Similarly, the positive coefficient for Temperature (0.0319) indicates that the first level of Temperature (which, from the “Population Profiles” table, is 'high') has a larger probability of preferring Brand M than the second level of Temperature, 'low'. In other words, those who do their laundry at a high water temperature are more likely to prefer Brand M than those who use a low water temperature, although this effect is only marginally significant (p = 0.0468).
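
The sum-to-zero constraint can be used to recover the coefficients that the table omits and to compute a fitted probability for any population. The following Python sketch works from the rounded estimates displayed in Output 30.1.7, so its results are approximate. The names REPORTED, LAST_LEVEL, full_effects, and predicted_pr_m are hypothetical.

```python
INTERCEPT = 0.5080

# Estimates displayed in Output 30.1.7 (the last level of each effect is omitted).
REPORTED = {
    "Softness": {"hard": -0.00256, "med": 0.0104},
    "Previous": {"no": -0.0711},
    "Temperature": {"high": 0.0319},
}

# The omitted (last) level of each effect.
LAST_LEVEL = {"Softness": "soft", "Previous": "yes", "Temperature": "low"}


def full_effects(reported, last_level):
    """Add the omitted level so that each effect's coefficients sum to zero."""
    full = {}
    for effect, coefs in reported.items():
        full[effect] = dict(coefs)
        full[effect][last_level[effect]] = -sum(coefs.values())
    return full


EFFECTS = full_effects(REPORTED, LAST_LEVEL)


def predicted_pr_m(softness, previous, temperature):
    """Fitted Pr(Brand=M) under the main-effects model."""
    return (INTERCEPT
            + EFFECTS["Softness"][softness]
            + EFFECTS["Previous"][previous]
            + EFFECTS["Temperature"][temperature])


for effect, coefs in EFFECTS.items():
    print(effect, {level: round(value, 5) for level, value in coefs.items()})
print(round(predicted_pr_m("hard", "yes", "high"), 5))
print(round(predicted_pr_m("hard", "no", "high"), 5))
```

For example, the fitted probability for hard water, a previous user, and high temperature is 0.5080 - 0.00256 + 0.0711 + 0.0319, which can be compared with the observed proportion 0.64179 for sample 3.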