Example 100.4 Variance Estimation by Using Replicate Weights

Consider the data set LibrarySurvey from Getting Started: SURVEYPHREG Procedure. The selected sample contains 100 transactions from ten branch libraries. Replicate weights and jackknife coefficients are created by randomly assigning observation units to disjoint groups of nearly equal size within each stratum, for a total of 46 groups. The data set LibraryRepWeights is similar to the data set LibrarySurvey except that it also contains the replicate weights RepWt_1 to RepWt_46. Each column of replicate weights is obtained by deleting one group of observations and adjusting the sampling weights of the other groups in that stratum (Rust, 1985).

The data set LibraryJKCOEF contains the jackknife coefficient for every replicate sample. The variable replicate denotes the replicate number, donorstratum denotes the stratum identification for that replicate, and jkcoefficient denotes the jackknife coefficient for that replicate sample.
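The relationship between the sampling weights, the replicate weights, and the jackknife coefficients can be sketched as follows. This assumes the usual stratified delete-one-group jackknife with n_h nonempty groups in stratum h. The symbols w_i, n_h, and alpha_r are introduced here for illustration and are not variable names in the data sets. For the replicate r that deletes group g of stratum h:

```latex
w_i^{(r)} =
\begin{cases}
0, & \text{unit } i \text{ is in the deleted group } g \text{ of stratum } h,\\[4pt]
\dfrac{n_h}{n_h - 1}\, w_i, & \text{unit } i \text{ is in another group of stratum } h,\\[4pt]
w_i, & \text{unit } i \text{ is in a different stratum,}
\end{cases}
\qquad
\alpha_r = \frac{n_h - 1}{n_h}.
```

Under this assumption, a stratum that is split into 2, 4, or 8 groups would have a jackknife coefficient of 0.5, 0.75, or 0.875, respectively, for each of its replicates.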

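/* Attach a random sort key to every observation */
data LibrarySurvey;
   set LibrarySurvey;
   randomorder = ranuni(12345);
run;

/* Randomize the order of observations within each branch (stratum) */
proc sort data = LibrarySurvey out = LibrarySurvey;
   by Branch randomorder;
run;

/* Assign each observation to one of nGroup{Branch} groups */
data LibrarySurvey;
   set LibrarySurvey;
   array nGroup{10} (2 2 2 4 4 4 4 8 8 8);
   GroupPSU = mod(_N_,nGroup{Branch});
   drop randomorder nGroup1 nGroup2 nGroup3 nGroup4
        nGroup5 nGroup6 nGroup7 nGroup8 nGroup9 nGroup10;
run;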

/* Save the jackknife replicate weights and coefficients */
proc surveymeans data = LibrarySurvey varmethod = jk
               (outweights = LibraryRepWeights outjkcoefs = LibraryJKCOEF);
   weight SamplingWeight;
   strata Branch;
   cluster GroupPSU;
   var Age;
run;

It is not necessary to provide replicate weights to compute jackknife variance estimates using the SURVEYPHREG procedure. If you do not specify the replicate weights, then the procedure creates replicate weights for you. For this illustration, assume that LibraryRepWeights and LibraryJKCOEF are the only two data sets available for analysis.
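For comparison, here is a minimal sketch of the case in which the procedure creates the replicate weights itself. It assumes that the LibrarySurvey data set, including the GroupPSU variable created by the preceding DATA steps, is still available, which is not the situation assumed in the rest of this example.

```sas
/* Sketch: the procedure builds the jackknife replicates from the design. */
/* Assumes LibrarySurvey still contains Branch and GroupPSU.              */
proc surveyphreg data = LibrarySurvey varmethod = jk;
   weight SamplingWeight;
   strata Branch;       /* strata used to form the replicates */
   cluster GroupPSU;    /* one group is deleted per replicate */
   model lenBorrow*Returned(0) = Age;
run;
```

With the design specified this way, no REPWEIGHTS statement is needed. The replicates are derived from the STRATA and CLUSTER statements rather than read from the input data set.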

The following SAS statements request a proportional hazards regression of lenBorrow on Age. The variable Returned is the censoring indicator, and the value 0 indicates a censored observation. The WEIGHT statement specifies the sampling weight variable, and the REPWEIGHTS statement specifies the replicate weight variables RepWt_1 to RepWt_46. The JKCOEFS= option in the REPWEIGHTS statement names the data set that contains the jackknife coefficient for each replicate sample. The VARMETHOD=JK option in the PROC SURVEYPHREG statement requests the jackknife variance estimation method. A STRATA statement is not required when the REPWEIGHTS statement is specified.

proc surveyphreg data = LibraryRepWeights varmethod = jk;
   weight SamplingWeight;
   repweights RepWt_: / jkcoefs = LibraryJKCOEF;
   model lenBorrow*Returned(0) = Age;
run;

Output 100.4.1 displays some summary information. The "Number of Observations," "Censored Summary," and "Weighted Censored Summary" tables are exactly the same as in the example discussed in Getting Started: SURVEYPHREG Procedure. The "Variance Estimation" table displays information about the variance estimation, such as the name of the variance estimation method and the number of replicate samples.

Output 100.4.1: Summary Statistics for Overall Analysis


Number of Observations Read         100
Number of Observations Used         100
Sum of Weights Read            11616.79
Sum of Weights Used            11616.79

Summary of the Number of Event and Censored
Total    Event    Censored    Percent
  100       90          10      10.00

Summary of the Weighted Number of Event and Censored Values
Total       Event       Censored    Percent
11616.79    10449.22     1167.57      10.05

Variance Estimation
Method                  Jackknife
Number of Replicates    46

Output 100.4.2 shows that the estimated regression coefficient for Age is 0.0616 with a standard error of 0.0092. The denominator degrees of freedom for the t test (46) equal the number of replicates used. The estimated proportional hazards regression coefficient is the same as in the example in Getting Started: SURVEYPHREG Procedure, but the standard error and the denominator degrees of freedom are different. This is expected: the two examples use the same estimator for the regression coefficient but different estimators for its variance.
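The variance estimate behind this standard error can be sketched as follows. This assumes the usual replicate form of the jackknife estimator, where \hat{\beta} is the full-sample estimate, \hat{\beta}_r is the estimate computed with the r-th set of replicate weights, \alpha_r is the corresponding jackknife coefficient, and R = 46. The notation is introduced here for illustration.

```latex
\widehat{V}(\hat{\beta})
  = \sum_{r=1}^{R} \alpha_r \,
    (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})^{\top},
\qquad
\mathrm{SE}(\hat{\beta}) = \sqrt{\widehat{V}(\hat{\beta})}.
```

Only the spread of the replicate estimates around \hat{\beta} enters this formula, so changing the variance estimation method leaves the point estimate itself unchanged.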

Output 100.4.2: Inferences Based on Survey Design for Overall Analysis

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    t Value    Pr > |t|    Hazard Ratio
Age          46    0.061593    0.009159             6.73      <.0001           1.064
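As a rough consistency check on Output 100.4.2, the following DATA step recomputes the t statistic, the two-sided p-value, and the hazard ratio from the displayed estimate and standard error. It is only a sketch: the variable names are hypothetical, and because the inputs are the rounded values from the table, the recomputed t statistic can differ from the reported one in the last digit.

```sas
/* Sketch: recompute the derived columns from the rounded table values */
data _null_;
   estimate = 0.061593;                  /* Estimate for Age           */
   se       = 0.009159;                  /* Standard Error for Age     */
   df       = 46;                        /* number of replicates       */
   t  = estimate / se;                   /* t statistic                */
   p  = 2 * (1 - probt(abs(t), df));     /* two-sided p-value          */
   hr = exp(estimate);                   /* hazard ratio per unit Age  */
   put t= 8.4 p= best12. hr= 8.4;
run;
```

The hazard ratio exp(0.061593) rounds to 1.064, which matches the table, and the t statistic from these rounded inputs is approximately 6.72, within rounding of the reported 6.73.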