Example 90.7 Domain Analysis

Recall the example in the section Getting Started: SURVEYREG Procedure, which analyzed a stratified simple random sample of students from a junior high school to examine how household income and the number of children in a household affect students’ average weekly spending for ice cream. You can use the same sample to analyze average weekly spending separately for male and female students. Because student gender is unrelated to the design of the sample, this kind of analysis is called domain analysis (subgroup analysis).

This example shows how you can use PROC SURVEYREG to perform domain analysis. The following statements create the data set and compute the sampling weights:

data IceCreamDataDomain;
   input Grade Spending Income Gender$ @@;
   datalines; 
7   7  39  M   7   7  38  F   8  12  47  F 
9  10  47  M   7   1  34  M   7  10  43  M
7   3  44  M   8  20  60  F   8  19  57  M
7   2  35  M   7   2  36  F   9  15  51  F
8  16  53  F   7   6  37  F   7   6  41  M
7   6  39  M   9  15  50  M   8  17  57  F
8  14  46  M   9   8  41  M   9   8  41  F
9   7  47  F   7   3  39  F   7  12  50  M
7   4  43  M   9  14  46  F   8  18  58  M
9   9  44  F   7   2  37  F   7   1  37  M
7   4  44  M   7  11  42  M   9   8  41  M 
8  10  42  M   8  13  46  F   7   2  40  F
9   6  45  F   9  11  45  M   7   2  36  F
7   9  46  F
;

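/* Selection probability = stratum sample size / stratum population size */
data IceCreamDataDomain;
   set IceCreamDataDomain;
   if Grade=7 then Prob=20/1824;
   else if Grade=8 then Prob=9/1025;
   else if Grade=9 then Prob=11/1151;
   Weight=1/Prob;   /* sampling weight = reciprocal of the selection probability */
run;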

In the data set IceCreamDataDomain, the variable Grade indicates a student’s grade, which is the stratification variable. The variable Spending contains the dollar amount of each student’s average weekly spending for ice cream. The variable Income specifies the household income, in thousands of dollars. The variable Gender indicates a student’s gender. The variable Weight contains the sampling weights, which the second DATA step above computes as the reciprocals of the probabilities of selection.

The following statements create a second data set that contains the stratum population totals:

data StudentTotals;
   input Grade _TOTAL_; 
   datalines;
7 1824
8 1025
9 1151
;

In the data set StudentTotals, the variable Grade is the stratification variable, and the variable _TOTAL_ contains the total number of students in each stratum of the survey population.
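
As a quick sanity check on the weights (a sketch that is not part of the original example), the weights within each stratum should sum to the corresponding _TOTAL_ value, because each weight equals the stratum population size divided by the stratum sample size. The following PROC MEANS step assumes that the data set IceCreamDataDomain has been created as shown earlier. It displays the number of sampled students and the sum of the weights for each grade:

```sas
/* Check: within each stratum, the sampling weights should sum */
/* to the stratum population total in StudentTotals.           */
proc means data=IceCreamDataDomain n sum;
   class Grade;
   var Weight;
run;
```

If the weights are set up as intended, the sums for grades 7, 8, and 9 should match the _TOTAL_ values 1824, 1025, and 1151, with 20, 9, and 11 sampled students, respectively.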

The following statements demonstrate how you can analyze the relationship between spending and income among male and female students:

title1 'Ice Cream Spending Analysis';
title2 'Domain Analysis by Gender';
proc surveyreg data=IceCreamDataDomain total=StudentTotals;
   strata Grade; 
   model Spending = Income; 
   domain Gender;
   weight Weight;
run;

Output 90.7.1 gives a summary of the domains.

Output 90.7.1 Domain Analysis Summary
Ice Cream Spending Analysis
Domain Analysis by Gender

The SURVEYREG Procedure
 
Gender=F
 
Domain Regression Analysis for Variable Spending

Domain Summary
Number of Observations                        40
Number of Observations in Domain              19
Number of Observations Not in Domain          21
Sum of Weights in Domain                  1926.9
Weighted Mean of Spending                9.37611
Weighted Sum of Spending                 18066.5

Ice Cream Spending Analysis
Domain Analysis by Gender

The SURVEYREG Procedure
 
Gender=M
 
Domain Regression Analysis for Variable Spending

Domain Summary
Number of Observations                        40
Number of Observations in Domain              21
Number of Observations Not in Domain          19
Sum of Weights in Domain                  2073.1
Weighted Mean of Spending                8.92305
Weighted Sum of Spending                 18498.7
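
The point estimates in the domain summaries are ordinary weighted statistics: the weighted mean is the weighted sum divided by the sum of weights (for example, 18066.5 / 1926.9 is approximately 9.376 for female students). As an illustrative cross-check, which is an assumption of this sketch and not part of the original example, the following PROC MEANS step should reproduce the sums of weights, weighted means, and weighted sums by gender. PROC MEANS does not take the strata or the population totals into account, so it is suitable here only for checking point estimates, not for design-based standard errors.

```sas
/* Cross-check of the domain point estimates only.              */
/* SUMWGT = sum of weights, MEAN = weighted mean,               */
/* SUM = weighted sum when a WEIGHT statement is specified.     */
proc means data=IceCreamDataDomain n sumwgt mean sum;
   class Gender;
   var Spending;
   weight Weight;
run;
```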

Output 90.7.2 shows the parameter estimates for the model within each domain.

Output 90.7.2 Parameter Estimates within Domain
Ice Cream Spending Analysis
Domain Analysis by Gender

The SURVEYREG Procedure
 
Gender=F
 
Domain Regression Analysis for Variable Spending

Estimated Regression Coefficients
Parameter      Estimate   Standard Error   t Value   Pr > |t|
Intercept    -23.751681       2.30795437    -10.29     <.0001
Income         0.735366       0.04757001     15.46     <.0001

Note: The denominator degrees of freedom for the t tests is 37.


Ice Cream Spending Analysis
Domain Analysis by Gender

The SURVEYREG Procedure
 
Gender=M
 
Domain Regression Analysis for Variable Spending

Estimated Regression Coefficients
Parameter      Estimate   Standard Error   t Value   Pr > |t|
Intercept    -23.213291       2.13361241    -10.88     <.0001
Income         0.729419       0.04589801     15.89     <.0001

Note: The denominator degrees of freedom for the t tests is 37.


In this example, the Income effect is significant (p < .0001) in both domain models, and the two fitted models are quite similar: the estimated Income coefficient is 0.735 for female students and 0.729 for male students. In many other cases, regression models vary from subgroup to subgroup.
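
Domain analysis fits a separate model in each subgroup but does not by itself test whether the subgroup models differ. One possible follow-up, offered here as a sketch and not as part of the original example, is to fit a single model to the full sample that includes Gender as a classification effect together with its interaction with Income. The test for the Income*Gender effect then indicates whether the Income slopes differ between male and female students. The choice of this model is an assumption of the sketch; the data sets and variables are those defined earlier.

```sas
/* Full-sample model with a Gender main effect and an            */
/* Income-by-Gender interaction. SOLUTION requests the parameter */
/* estimates, which are not displayed by default when a CLASS    */
/* statement is used.                                            */
proc surveyreg data=IceCreamDataDomain total=StudentTotals;
   strata Grade;
   class Gender;
   model Spending = Income Gender Income*Gender / solution;
   weight Weight;
run;
```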