Example 97.4 Variance Estimation by Using Replicate Weights

Consider the data set `LibrarySurvey` from Getting Started: SURVEYPHREG Procedure. The selected sample contains 100 transactions from ten branch libraries. A set of replicate weights and jackknife coefficients is created by randomly assigning observation units to disjoint groups of nearly equal size within each stratum. A total of 46 groups are created. The data set `LibraryRepWeights` is similar to the data set `LibrarySurvey` except that it also contains the replicate weights `RepWt_1` to `RepWt_46`. Each column of replicate weights is obtained by deleting one group of observations and adjusting the sampling weights for the other groups in that stratum (Rust, 1985).

The data set `LibraryJKCOEF` contains the jackknife coefficient for every replicate sample. The variable `replicate` denotes the replicate number, `donorstratum` denotes the stratum identification for that replicate, and `jkcoefficient` denotes the jackknife coefficient for that replicate sample.
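The construction described above can be sketched with the usual delete-one-group jackknife formulas. The notation here is introduced for illustration and does not come from the example: $n_h$ is the number of groups in stratum $h$, $h_r$ is the donor stratum of replicate $r$, $w_j$ is the sampling weight of observation $j$, and $\hat\beta_r$ is the estimate computed from the $r$th set of replicate weights.

```latex
% Jackknife coefficient of replicate r (donor stratum h_r)
\alpha_r = \frac{n_{h_r} - 1}{n_{h_r}}

% Replicate weight of observation j in replicate r
w_j^{(r)} =
\begin{cases}
0            & \text{if } j \text{ is in the deleted group,} \\
w_j/\alpha_r & \text{if } j \text{ is in stratum } h_r \text{ but not in the deleted group,} \\
w_j          & \text{if } j \text{ is in any other stratum.}
\end{cases}

% Jackknife variance estimator for a single coefficient, R replicates
\widehat{V}(\hat\beta) = \sum_{r=1}^{R} \alpha_r \,(\hat\beta_r - \hat\beta)^2
```

Under these formulas, and assuming that every group is nonempty, strata that are split into 2, 4, and 8 groups have jackknife coefficients of 0.5, 0.75, and 0.875, respectively, and the number of replicates is $3 \times 2 + 4 \times 4 + 3 \times 8 = 46$.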

```sas
/* Attach a random sort key to each transaction */
data LibrarySurvey;
   set LibrarySurvey;
   randomorder = ranuni(12345);
run;

/* Randomize the order of the transactions within each branch (stratum) */
proc sort data = LibrarySurvey out = LibrarySurvey;
   by Branch randomorder;
run;

/* Assign the transactions to 2, 4, or 8 groups per branch (46 groups in total) */
data LibrarySurvey;
   set LibrarySurvey;
   array nGroup{10} (2 2 2 4 4 4 4 8 8 8);
   GroupPSU = mod(_N_, nGroup{Branch});
   drop randomorder nGroup1-nGroup10;
run;

/* Create the replicate weights and the jackknife coefficients */
proc surveymeans data = LibrarySurvey varmethod = jk
      (outweights = LibraryRepWeights outjkcoefs = LibraryJKCOEF);
   weight SamplingWeight;
   strata Branch;
   cluster GroupPSU;
   var Age;
run;
```

It is not necessary to provide replicate weights to compute jackknife variance estimates using the SURVEYPHREG procedure. If you do not specify the replicate weights, then the procedure creates replicate weights for you. For this illustration, assume that `LibraryRepWeights` and `LibraryJKCOEF` are the only two data sets available for analysis.
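For comparison, if the design variables were available, statements along the following lines would let the procedure construct the replicate weights itself. This is a minimal sketch and not part of the original example. It assumes that the `LibrarySurvey` data set still contains the stratum variable `Branch` and the `GroupPSU` variable created in the preceding DATA step.

```sas
/* Sketch: jackknife variance estimation without precomputed replicate weights. */
/* Assumes LibrarySurvey contains Branch (stratum) and GroupPSU (cluster).      */
proc surveyphreg data = LibrarySurvey varmethod = jk;
   weight SamplingWeight;
   strata Branch;
   cluster GroupPSU;
   model lenBorrow*Returned(0) = Age;
run;
```

When the design variables are specified in this way, the degrees of freedom are derived from the design rather than from the number of replicate weight variables, so the reported DF might differ from the value in the replicate-weight analysis.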

The following SAS statements request a proportional hazards regression of `lenBorrow` on `Age`. The variable `Returned` is the censor indicator, and the value 0 indicates a censored observation. The WEIGHT statement specifies the sampling weight variable, and the REPWEIGHTS statement specifies the replicate weight variables `RepWt_1` to `RepWt_46`. The JKCOEFS= option in the REPWEIGHTS statement names the data set that contains the jackknife coefficient for each replicate sample. The VARMETHOD= option in the PROC SURVEYPHREG statement requests the jackknife variance estimation method. A STRATA statement is not required when the REPWEIGHTS statement is specified.

```sas
proc surveyphreg data = LibraryRepWeights varmethod = jk;
   weight SamplingWeight;
   repweights RepWt_: / jkcoefs = LibraryJKCOEF;
   model lenBorrow*Returned(0) = Age;
run;
```

Output 97.4.1 displays some summary information. The Number of Observations, Censored Summary, and Weighted Censored Summary tables are exactly the same as in the example discussed in Getting Started: SURVEYPHREG Procedure. The Variance Estimation table displays information about the variance estimation, such as the name of the variance estimation method and the number of replicate samples.

Output 97.4.1: Summary Statistics for Overall Analysis

The SURVEYPHREG Procedure

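| Statistic | Value |
|---|---|
| Number of Observations Read | 100 |
| Number of Observations Used | 100 |
| Sum of Weights Read | 11616.8 |
| Sum of Weights Used | 11616.8 |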

Summary of the Number of Event and Censored Values

| Total | Event | Censored | Percent Censored |
|---|---|---|---|
| 100 | 90 | 10 | 10.00 |

Summary of the Weighted Number of Event and Censored Values

| Total | Event | Censored | Percent Censored |
|---|---|---|---|
| 11616.79 | 10449.22 | 1167.57 | 10.05 |

Variance Estimation

| Item | Value |
|---|---|
| Method | Jackknife |
| Replicate Weights | WORK.LIBRARYREPWEIGHTS |
| Number of Replicates | 46 |

Output 97.4.2 shows that the estimated regression coefficient is 0.0616 with a standard error of 0.0092. The denominator degrees of freedom (46) for the t test is equal to the number of replicates used. The estimated regression coefficient is the same as in the example in Getting Started: SURVEYPHREG Procedure, but the standard error and the denominator degrees of freedom are different. This is not surprising, because the two examples use the same estimator for the regression coefficients but different estimators for the variance.

Output 97.4.2: Inferences Based on Survey Design for Overall Analysis

Analysis of Maximum Likelihood Estimates

| Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
|---|---|---|---|---|---|---|
| Age | 46 | 0.061593 | 0.009159 | 6.73 | <.0001 | 1.064 |
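
The derived columns in Output 97.4.2 can be checked by hand from the estimate, the standard error, and the degrees of freedom. The following DATA step is a minimal sketch of that check. The variable names are chosen here for illustration, and because the inputs are the rounded values shown in the table, the results can differ from the displayed values in the last digit.

```sas
/* Sketch: recompute the derived quantities from the rounded values in Output 97.4.2 */
data _null_;
   estimate = 0.061593;                 /* estimated coefficient for Age          */
   se       = 0.009159;                 /* jackknife standard error               */
   df       = 46;                       /* denominator degrees of freedom         */
   t  = estimate / se;                  /* t statistic                            */
   p  = 2 * (1 - probt(abs(t), df));    /* two-sided p-value from t distribution  */
   hr = exp(estimate);                  /* hazard ratio for a one-unit increase   */
   put t= 8.2 p= pvalue6.4 hr= 8.3;     /* write the results to the SAS log       */
run;
```

The hazard ratio is the exponentiated coefficient, so it is interpreted as the multiplicative change in the hazard for a one-unit increase in `Age`.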