Example 88.4 Analyzing Survey Data with Missing Values

As described in the section Missing Values, the SURVEYMEANS procedure excludes an observation from the analysis if it has a missing value for the analysis variable or a nonpositive value for the WEIGHT variable.

However, if there is evidence indicating that the nonrespondents are different from the respondents for your study, you can use the NOMCAR option to compute descriptive statistics among respondents while still counting the number of nonrespondents.

This example uses the ice cream data from the section Stratified Sampling to illustrate how to perform a similar analysis when there are missing values.

Suppose that some of the students failed to provide the amounts spent on ice cream, as shown in the following data set, IceCream:

data IceCream;
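   input Grade Spending @@;      /* @@ holds the line so several students are read per record */
   /* Selection probability = stratum sample size / stratum population size */
   if Grade=7 then Prob=20/1824;
   if Grade=8 then Prob=9/1025;
   if Grade=9 then Prob=11/1151;
   Weight=1/Prob;                /* sampling weight is the inverse selection probability */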
   datalines; 
7 7  7  7  8  .  9 10  7  .  7 10  7  3  8 20  8 19  7 2
7 .  9 15  8 16  7  6  7  6  7  6  9 15  8 17  8 14  9 .
9 8  9  7  7  3  7 12  7  4  9 14  8 18  9  9  7  2  7 1
7 4  7 11  9  8  8  .  8 13  7  .  9  .  9 11  7  2  7 9
;

data StudentTotals;
   input Grade _total_; 
   datalines;
7 1824
8 1025
9 1151
;
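
Before running the survey analysis, it can help to confirm how many values are missing in each stratum and that the weights add up to the stratum totals. The following step is a minimal sketch of such a check and is not part of the original example. It uses PROC MEANS on the IceCream data set created above; the choice of statistics (N, NMISS, SUM) is an assumption made for illustration only.

```sas
/* Illustrative sanity check, not part of the original example.      */
/* N and NMISS for Spending give respondents and nonrespondents;     */
/* SUM for Weight should reproduce the _total_ values in             */
/* StudentTotals (1824, 1025, 1151).                                 */
proc means data=IceCream n nmiss sum maxdec=2;
   class Grade;
   var Spending Weight;
run;
```

Because each weight is the stratum population size divided by the stratum sample size, the weights within a grade sum to that grade's population total, and the overall sum is 4000, which matches the Sum of Weights that PROC SURVEYMEANS reports in the Data Summary table.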

Because the students who did not respond might spend differently from those who did, you can use the NOMCAR option to treat the respondents as a domain of the student population instead of excluding the nonrespondents from the analysis.

The following SAS statements produce the desired analysis:

title 'Analysis of Ice Cream Spending';
proc surveymeans data=IceCream total=StudentTotals nomcar mean sum;
   strata Grade; 
   var Spending;
   weight Weight;
run;

Output 88.4.1 summarizes the analysis, including the variance estimation method.

Output 88.4.1 Analysis of Incomplete Ice Cream Data Excluding Observations with Missing Values
Analysis of Ice Cream Spending

The SURVEYMEANS Procedure

Data Summary
Number of Strata 3
Number of Observations 40
Sum of Weights 4000

Variance Estimation
Method Taylor Series
Missing Values NOMCAR

Output 88.4.2 shows the mean and total estimates when respondents are treated as a domain of the student population. The point estimates are the same as in the analysis without the NOMCAR option. For this particular example, however, the variance estimates are slightly higher when you assume that the missingness is not completely at random.

Output 88.4.2 Analysis of Incomplete Ice Cream Data Excluding Observations with Missing Values
Statistics
Variable    Mean        Std Error of Mean    Sum      Std Dev
Spending    9.770542    0.652347             32139    3515.126876
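
To see the comparison described above for yourself, you could rerun the same analysis without the NOMCAR option. The following statements are a sketch of that comparison run. They are identical to the earlier PROC SURVEYMEANS step except that NOMCAR is omitted and the title is changed; the title text is an assumption introduced here for illustration.

```sas
/* Comparison run, not part of the original example: the same        */
/* design and analysis variable, but without NOMCAR, so observations */
/* with missing Spending are excluded from the analysis.             */
title 'Analysis of Ice Cream Spending (without NOMCAR)';
proc surveymeans data=IceCream total=StudentTotals mean sum;
   strata Grade;
   var Spending;
   weight Weight;
run;
```

Comparing the Statistics tables from the two runs shows how the standard errors change while the mean and sum estimates stay the same.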