The SURVEYMEANS Procedure

Statistical Computations

t Test for the Mean

If you specify the keyword T, PROC SURVEYMEANS computes the t value for testing that the population mean equals zero, H_0: \bar{Y}=0. The test statistic equals
t(\hat{\bar{Y}}) = \hat{\bar{Y}} / \mathrm{StdErr}(\hat{\bar{Y}})
The two-sided p-value for this test is
\mathrm{Prob}(|T| > |t(\hat{\bar{Y}})|)
where T is a random variable with the t distribution with df degrees of freedom.
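The test statistic and p-value above can be checked by hand with a short script. The following is a minimal sketch, not the procedure's own algorithm: it assumes the estimate, its standard error, and df are already known, and it evaluates the t distribution by Simpson's rule using only the Python standard library. The function names and the input values are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so the gamma terms stay finite for large df.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=20000):
    """Prob(|T| > |t|), with Prob(0 < T < |t|) integrated by Simpson's rule."""
    a = abs(t)
    if a == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = s * h / 3
    return 1.0 - 2.0 * central

def t_test(mean_estimate, std_err, df):
    """Return (t value, two-sided p-value) for H_0: population mean = 0."""
    t = mean_estimate / std_err
    return t, two_sided_p(t, df)

# Hypothetical inputs: estimate 6.0, standard error 3.0, df = 2
t_value, p_value = t_test(6.0, 3.0, 2)
print(round(t_value, 4), round(p_value, 4))
```

Numerical integration is used here only to keep the sketch dependency-free; a statistics library's t distribution function would normally be preferable, especially for very large t values.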

PROC SURVEYMEANS calculates the degrees of freedom for the t test as the number of clusters minus the number of strata. If there are no clusters, then df equals the number of observations minus the number of strata. If the design is not stratified, then df equals the number of clusters minus one. The procedure displays df for the t test if you specify the keyword DF in the PROC SURVEYMEANS statement.
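The three cases in this rule reduce to one subtraction. The sketch below is only an illustration: the function name and the counts are hypothetical, and treating an unstratified design as a single stratum (which also yields observations minus one when there are neither strata nor clusters) is an assumption consistent with the text but not stated in it.

```python
def design_df(n_obs, n_strata=1, n_clusters=None):
    """Degrees of freedom for the t test.

    n_strata=1 stands for an unstratified design (assumption);
    n_clusters=None means the design has no clusters.
    """
    units = n_obs if n_clusters is None else n_clusters
    return units - n_strata

print(design_df(n_obs=100, n_strata=4, n_clusters=20))  # 16: clusters minus strata
print(design_df(n_obs=100, n_strata=4))                 # 96: observations minus strata
print(design_df(n_obs=100, n_clusters=20))              # 19: clusters minus one
```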

If missing values or missing weights are present in your data, the number of strata, the number of observations, and the number of clusters are counted based on the observations in non-empty strata. See the section "Missing Values" for details. For degrees of freedom in domain analysis, see the section "Domain Analysis".

Domain Analysis

When you use a DOMAIN statement to request a domain analysis, the procedure computes the requested statistics for each domain.

For a domain D, let I_D be the corresponding indicator variable:

I_D(h,i,j) = \begin{cases} 1 & \text{if observation } (h,i,j) \text{ belongs to } D \\ 0 & \text{otherwise} \end{cases}

Let

z_{hij} = y_{hij} I_D(h,i,j) = \begin{cases} y_{hij} & \text{if observation } (h,i,j) \text{ belongs to } D \\ 0 & \text{otherwise} \end{cases}

The requested statistics for variable y in domain D are computed based on the values of z.

Domain Mean

The estimated mean of y in the domain D is

\hat{\bar{Y}_D}=( \sum_{h=1}^H\sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} v_{hij} z_{hij} ) / v_{\cdot\cdot\cdot}

where

v_{hij} = w_{hij} I_D(h,i,j) = \begin{cases} w_{hij} & \text{if observation } (h,i,j) \text{ belongs to } D \\ 0 & \text{otherwise} \end{cases}

v_{\cdot\cdot\cdot} = \sum_{h=1}^H\sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} v_{hij}

The variance of \hat{\bar{Y}_D} is estimated by

\hat{V}(\hat{\bar{Y}_D})=\sum_{h=1}^H { \frac{n_h(1-f_h)}{n_h-1} \sum_{i=1}^{n_h} {(r_{hi\cdot}-\bar{r}_{h\cdot\cdot})^2}}
where
r_{hi\cdot} = ( \sum_{j=1}^{m_{hi}} v_{hij}(z_{hij} - \hat{\bar{Y}_D}) ) / v_{\cdot\cdot\cdot}

\bar{r}_{h\cdot\cdot} = ( \sum_{i=1}^{n_h} r_{hi\cdot} ) / n_h
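The domain mean and its variance estimate can be traced on a small data set. This is a minimal sketch under stated assumptions, not the procedure's implementation: n_h is taken to be the number of clusters in stratum h and f_h its sampling rate (notation defined elsewhere in the chapter), f_h defaults to 0 when no rate is supplied, the data contain no missing values, and strata with a single cluster are simply skipped. The function name and the records are hypothetical.

```python
from collections import defaultdict

def domain_mean_and_variance(records, domain, rates=None):
    """records: iterable of (stratum, cluster, y, weight, domain_value).
    rates: optional {stratum: f_h}; f_h = 0 when omitted (assumption)."""
    rates = rates or {}
    # v_hij = w_hij * I_D and z_hij = y_hij * I_D
    rows = [(h, i, (w if d == domain else 0.0), (y if d == domain else 0.0))
            for h, i, y, w, d in records]
    v_total = sum(v for _, _, v, _ in rows)
    mean = sum(v * z for _, _, v, z in rows) / v_total

    # r_hi. for every cluster, including clusters with no domain members
    r = defaultdict(lambda: defaultdict(float))
    for h, i, v, z in rows:
        r[h][i] += v * (z - mean) / v_total

    variance = 0.0
    for h, clusters in r.items():
        n_h = len(clusters)
        if n_h < 2:
            continue  # simplification: single-cluster strata are skipped
        r_bar = sum(clusters.values()) / n_h
        ss = sum((x - r_bar) ** 2 for x in clusters.values())
        variance += n_h * (1 - rates.get(h, 0.0)) / (n_h - 1) * ss
    return mean, variance

# Hypothetical records: (stratum, cluster, y, weight, domain)
records = [
    (1, 1, 2.0, 1.0, "a"),
    (1, 2, 4.0, 1.0, "a"),
    (1, 3, 10.0, 1.0, "b"),
    (2, 4, 3.0, 2.0, "a"),
    (2, 5, 7.0, 2.0, "b"),
]
mean_a, var_a = domain_mean_and_variance(records, "a")
print(mean_a, var_a)  # 3.0 0.1875
```

Clusters outside the domain still enter the sum with r equal to zero, which is why the sketch builds r for every cluster in the data rather than only for those containing domain members.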

Domain Total

The estimated total in domain D is

\hat{Y}_D=\sum_{h=1}^H\sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} v_{hij} z_{hij}
and its estimated variance is

\hat{V}(\hat{Y}_D)=\sum_{h=1}^H { \frac{n_h(1-f_h)}{n_h-1} \sum_{i=1}^{n_h} {(z_{hi\cdot}-\bar{z}_{h\cdot\cdot})^2}}
where
z_{hi\cdot} = \sum_{j=1}^{m_{hi}} v_{hij} z_{hij}

\bar{z}_{h\cdot\cdot} = ( \sum_{i=1}^{n_h} z_{hi\cdot} ) / n_h
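The domain total follows the same pattern, with weighted cluster totals in place of the linearized values. Again this is only a sketch under assumptions: n_h is taken as the number of clusters in stratum h, f_h as an optional sampling rate defaulting to 0, the data are complete, and single-cluster strata are skipped. The function name and records are hypothetical.

```python
from collections import defaultdict

def domain_total_and_variance(records, domain, rates=None):
    """records: iterable of (stratum, cluster, y, weight, domain_value).
    rates: optional {stratum: f_h}; f_h = 0 when omitted (assumption)."""
    rates = rates or {}
    # z_hi. = sum over j of v_hij * z_hij, kept for every cluster
    z_hi = defaultdict(lambda: defaultdict(float))
    for h, i, y, w, d in records:
        in_domain = 1.0 if d == domain else 0.0
        z_hi[h][i] += (w * in_domain) * (y * in_domain)

    total = sum(sum(clusters.values()) for clusters in z_hi.values())

    variance = 0.0
    for h, clusters in z_hi.items():
        n_h = len(clusters)
        if n_h < 2:
            continue  # simplification: single-cluster strata are skipped
        z_bar = sum(clusters.values()) / n_h
        ss = sum((x - z_bar) ** 2 for x in clusters.values())
        variance += n_h * (1 - rates.get(h, 0.0)) / (n_h - 1) * ss
    return total, variance

# Hypothetical records: (stratum, cluster, y, weight, domain)
records = [
    (1, 1, 2.0, 1.0, "a"),
    (1, 2, 4.0, 1.0, "a"),
    (1, 3, 10.0, 1.0, "b"),
    (2, 4, 3.0, 2.0, "a"),
    (2, 5, 7.0, 2.0, "b"),
]
print(domain_total_and_variance(records, "a"))                  # (12.0, 48.0)
print(domain_total_and_variance(records, "a", rates={1: 0.5}))  # (12.0, 42.0)
```

The second call shows the effect of the factor (1 - f_h): a sampling rate of 0.5 in stratum 1 halves that stratum's contribution to the variance.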

Degrees of Freedom

For domain analysis, PROC SURVEYMEANS computes the degrees of freedom for t tests as the number of clusters in the non-empty strata minus the number of non-empty strata. When the sample design has no clusters, the degrees of freedom equal the number of observations in non-empty strata minus the number of non-empty strata.

As discussed in the section "Missing Values", missing values and missing weights can result in empty strata. In domain analysis, an empty stratum can also occur when the stratum contains no observations in the specified domain: if no observations in a whole stratum belong to a domain, then that stratum is called an empty stratum for that domain.

For example,

   data new;
      input str clu y w d; 
      datalines;
   1 1 . 40 9 
   1 2 2  . 9
   1 3 . 25 9
   2 4 5 20 9
   2 5 8 15 9
   3 6 5 30 7 
   3 7 9 89 7
   3 8 6 23 7
   ;
   proc surveymeans df nobs nclu nmiss; 
      strata str;
      cluster clu;  
      var y;
      weight w;
      domain d;
   run;

Table 13.2: Calculations of df for Y

                                   Domain D=7                   Domain D=9
   Non-Empty Strata                STR=3                        STR=2
   Clusters Used in the Analysis   CLU=6, CLU=7, and CLU=8      CLU=4 and CLU=5
   df                              3-1=2                        2-1=1

Although there are three strata in the data set, STR=1 is an empty stratum for variable Y because of missing values and missing weights. In addition, no observations in stratum STR=3 belong to domain D=9, so STR=3 is also an empty stratum for variable Y in domain D=9. The only non-empty stratum for domain D=9 and variable Y is therefore STR=2, which contains two clusters. Thus, for variable Y in domain D=9, the degrees of freedom for the t test of the domain mean are df=2-1=1.

Similarly, for domain D=7, strata STR=1 and STR=2 are both empty, so the number of non-empty strata is one (STR=3) and the number of clusters is three (CLU=6, CLU=7, and CLU=8), giving df=3-1=2. Table 13.2 illustrates how domains affect the total number of clusters and total number of strata in the df calculation. Figure 13.1 shows the df computed by the procedure.

 
The SURVEYMEANS Procedure

Domain Analysis: d

   d    Variable    N    N Miss    Clusters    DF
   7    y           3    0         3           2
   9    y           2    2         2           1
Figure 13.1: Degrees of Freedom in Domain Analysis
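The df values in Table 13.2 can be reproduced by counting non-empty strata and their clusters directly from the example data. The sketch below reflects one reading of the rule rather than the procedure's internals: an observation is usable only if both y and w are present, a stratum is non-empty for a domain if it holds at least one usable observation in that domain, and clusters are counted among the usable observations of those strata. The function name is hypothetical.

```python
# (str, clu, y, w, d) rows copied from the DATA step; None marks a missing value
rows = [
    (1, 1, None, 40, 9),
    (1, 2, 2, None, 9),
    (1, 3, None, 25, 9),
    (2, 4, 5, 20, 9),
    (2, 5, 8, 15, 9),
    (3, 6, 5, 30, 7),
    (3, 7, 9, 89, 7),
    (3, 8, 6, 23, 7),
]

def domain_df(rows, domain):
    """Clusters in non-empty strata minus non-empty strata, for one domain."""
    # Observations with a missing value or missing weight are excluded.
    usable = [(s, c, d) for s, c, y, w, d in rows
              if y is not None and w is not None]
    # A stratum is non-empty for the domain if it has a usable domain observation.
    non_empty = {s for s, _, d in usable if d == domain}
    clusters = {(s, c) for s, c, _ in usable if s in non_empty}
    return len(clusters) - len(non_empty)

for dom in (7, 9):
    print(dom, domain_df(rows, dom))
# 7 2
# 9 1
```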


Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.