Consider the data set LibrarySurvey from Getting Started: SURVEYPHREG Procedure. The selected sample contains 100 transactions from ten branch libraries. A set of replicate weights and jackknife coefficients is created by randomly assigning observation units to disjoint groups of nearly equal size within each stratum. A total of 46 groups are created. The data set LibraryRepWeights is similar to the data set LibrarySurvey except that it also contains the replicate weights RepWt_1 to RepWt_46. Each column of replicate weights is obtained by deleting one group of observations and adjusting the sampling weights for the other groups in that stratum (Rust 1985).
The data set LibraryJKCOEF contains the jackknife coefficient for every replicate sample. The variable replicate denotes the replicate number, donorstratum denotes the stratum identification for that replicate, and jkcoefficient denotes the jackknife coefficient for that replicate sample.
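The text does not state the formulas behind these quantities. As a sketch, the standard delete-one-group stratified jackknife can be written as follows. The symbols are assumptions introduced here for illustration: $w_i$ is the sampling weight of unit $i$, $n_h$ is the number of groups in the donor stratum $h$ of replicate $r$, $\hat\beta$ is the full-sample estimate, and $\hat\beta_r$ is the estimate from replicate $r$ of $R$ replicates.

```latex
w_i^{(r)} =
\begin{cases}
0 & \text{if unit } i \text{ is in the deleted group of replicate } r,\\[2pt]
w_i \,\dfrac{n_h}{n_h - 1} & \text{if unit } i \text{ is in donor stratum } h \text{ but not in the deleted group},\\[6pt]
w_i & \text{otherwise,}
\end{cases}
\qquad
\alpha_r = \frac{n_h - 1}{n_h},
\qquad
\hat V(\hat\beta) = \sum_{r=1}^{R} \alpha_r \,(\hat\beta_r - \hat\beta)(\hat\beta_r - \hat\beta)'
```

Under this form, a stratum split into 2, 4, or 8 groups would have jackknife coefficients of 0.5, 0.75, or 0.875, respectively.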
    /* Attach a random sort key to each transaction */
    data LibrarySurvey;
       set LibrarySurvey;
       randomorder = ranuni(12345);
    run;

    /* Randomize the order of transactions within each branch */
    proc sort data = LibrarySurvey out = LibrarySurvey;
       by Branch randomorder;
    run;

    /* Assign each transaction to one of nGroup{Branch} groups in its branch */
    data LibrarySurvey;
       set LibrarySurvey;
       array nGroup{10} (2 2 2 4 4 4 4 8 8 8);
       GroupPSU = mod(_N_, nGroup{Branch});
       drop randomorder nGroup1-nGroup10;
    run;

    /* Create the replicate weights and the jackknife coefficients */
    proc surveymeans data = LibrarySurvey
          varmethod = jk (outweights = LibraryRepWeights outjkcoefs = LibraryJKCOEF);
       weight SamplingWeight;
       strata Branch;
       cluster GroupPSU;
       var Age;
    run;
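The grouping produced by the statements above can be checked before the replicate weights are used. The following is a minimal sketch, assuming that LibrarySurvey still contains Branch and the newly created GroupPSU; the column name nGroups is chosen here for illustration only.

```sas
/* Count the distinct (Branch, GroupPSU) combinations.                */
/* The nGroup array sums to 46, so 46 groups are expected, provided   */
/* every branch has at least as many transactions as groups.          */
proc sql;
   select count(*) as nGroups
      from (select distinct Branch, GroupPSU from LibrarySurvey);
quit;

/* List the number of transactions that fall in each group */
proc freq data = LibrarySurvey;
   tables Branch*GroupPSU / list;
run;
```

The PROC FREQ listing also makes it easy to see whether the groups within a branch are of nearly equal size.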
It is not necessary to provide replicate weights to compute jackknife variance estimates using the SURVEYPHREG procedure. If you do not specify the replicate weights, then the procedure creates replicate weights for you. For this illustration, assume that LibraryRepWeights and LibraryJKCOEF are the only two data sets available for analysis.
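For comparison, if the design variables were available rather than only the replicate weights, a call along the following lines would let the procedure build the jackknife replicates itself. This is a sketch that assumes the version of LibrarySurvey that contains Branch and GroupPSU; it is not the analysis carried out in this example.

```sas
/* Jackknife variance estimation with procedure-generated replicates: */
/* the STRATA and CLUSTER statements describe the design, so no       */
/* REPWEIGHTS statement is needed.                                    */
proc surveyphreg data = LibrarySurvey varmethod = jk;
   weight SamplingWeight;
   strata Branch;
   cluster GroupPSU;
   model lenBorrow*Returned(0) = Age;
run;
```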
The following SAS statements request a proportional hazards regression of lenBorrow on Age. The variable Returned is the censoring indicator, and the value 0 indicates a censored observation. The WEIGHT statement specifies the sampling weight variable, and the REPWEIGHTS statement specifies the replicate weight variables RepWt_1 to RepWt_46. The JKCOEFS= option in the REPWEIGHTS statement names the data set that contains the jackknife coefficient for each replicate sample. The VARMETHOD= option in the PROC SURVEYPHREG statement requests the jackknife variance estimation method. A STRATA statement is not required when the REPWEIGHTS statement is specified.
    proc surveyphreg data = LibraryRepWeights varmethod = jk;
       weight SamplingWeight;
       repweights RepWt_: / jkcoefs = LibraryJKCOEF;
       model lenBorrow*Returned(0) = Age;
    run;
Output 89.4.1 displays some summary information. The "Number of Observations," "Censored Summary," and "Weighted Censored Summary" tables are exactly the same as in the example discussed in Getting Started: SURVEYPHREG Procedure. The "Variance Estimation" table displays information about the variance estimation, such as the name of the variance estimation method and the number of replicate samples.
Number of Observations Read | 100 |
---|---|
Number of Observations Used | 100 |
Sum of Weights Read | 11616.79 |
Sum of Weights Used | 11616.79 |

Summary of the Number of Event and Censored Values

Total | Event | Censored | Percent Censored |
---|---|---|---|
100 | 90 | 10 | 10.00 |

Summary of the Weighted Number of Event and Censored Values

Total | Event | Censored | Percent Censored |
---|---|---|---|
11616.79 | 10449.22 | 1167.57 | 10.05 |

Variance Estimation | |
---|---|
Method | Jackknife |
Replicate Weights | WORK.LIBRARYREPWEIGHTS |
Number of Replicates | 46 |
Output 89.4.2 shows that the estimated regression coefficient is 0.0616 with a standard error of 0.0092. The denominator degrees of freedom for the t test (46) equal the number of replicates used. The estimated proportional hazards regression coefficient is the same as in the example in Getting Started: SURVEYPHREG Procedure, but the standard error and the denominator degrees of freedom are different. This is expected, because the two examples use the same estimator for the regression coefficient but different estimators for its variance.
Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
Age | 46 | 0.061593 | 0.009159 | 6.73 | <.0001 | 1.064 |
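The derived columns of this table follow from the estimate, its standard error, and the degrees of freedom. The following DATA step is a rough check that uses the rounded values displayed above. The variable names are chosen here for illustration, and the recomputed t value can differ from the displayed one in the last digit because the inputs are rounded.

```sas
data _null_;
   estimate = 0.061593;   /* displayed estimate for Age        */
   stderr   = 0.009159;   /* displayed jackknife standard error */
   df       = 46;         /* number of replicates               */

   t  = estimate / stderr;               /* t statistic              */
   p  = 2 * (1 - probt(abs(t), df));     /* two-sided p-value        */
   hr = exp(estimate);                   /* hazard ratio per unit Age */

   put t= 8.2 p= pvalue6.4 hr= 8.3;      /* write results to the log */
run;
```

The hazard ratio is the exponential of the coefficient, so it describes the multiplicative change in the hazard for a one-unit increase in Age.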