Consider the data set LibrarySurvey from Getting Started: SURVEYPHREG Procedure. The selected sample contains 100 transactions from ten branch libraries. A set of replicate weights and jackknife coefficients is created by randomly assigning observation units to disjoint groups of nearly equal size within each stratum. A total of 46 different groups are created. The data set LibraryRepWeights is similar to the data set LibrarySurvey except that it also contains replicate weights RepWt_1 to RepWt_46. Each column of replicate weights is obtained by deleting one group of observations and adjusting the sampling weights for the other groups in that stratum (Rust, 1985).
The data set LibraryJKCOEF contains the jackknife coefficient for every replicate sample. The variable replicate denotes the replicate number, donorstratum denotes the stratum identification for that replicate, and jkcoefficient denotes the jackknife coefficient for that replicate sample.
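For reference, the delete-one-group jackknife described above is commonly written as follows. This is a sketch that assumes the standard stratified delete-one-group construction; the symbols $w_i$, $n_{h_r}$, $h_r$, and $\alpha_r$ are notation introduced here for illustration and are not names from the data sets.

```latex
% Replicate r deletes one group from its donor stratum h_r,
% which contains n_{h_r} groups; w_i is the original sampling weight.
w_i^{(r)} =
\begin{cases}
0, & \text{unit } i \text{ is in the deleted group} \\[4pt]
w_i \, \dfrac{n_{h_r}}{n_{h_r} - 1}, & \text{unit } i \text{ is in stratum } h_r \text{ but not in the deleted group} \\[4pt]
w_i, & \text{unit } i \text{ is in another stratum}
\end{cases}
\qquad
\alpha_r = \frac{n_{h_r} - 1}{n_{h_r}}
```

Under this convention, a stratum split into 2 groups would have a jackknife coefficient of 1/2 and a weight adjustment factor of 2, a stratum with 4 groups would have 3/4 and 4/3, and a stratum with 8 groups would have 7/8 and 8/7.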
   /* Attach a random number to each observation */
   data LibrarySurvey;
      set LibrarySurvey;
      randomorder = ranuni(12345);
   run;

   /* Put observations in random order within each branch */
   proc sort data = LibrarySurvey out = LibrarySurvey;
      by Branch randomorder;
   run;

   /* Assign each observation to a group; nGroup holds the number of groups per branch */
   data LibrarySurvey;
      set LibrarySurvey;
      array nGroup{10} (2 2 2 4 4 4 4 8 8 8);
      GroupPSU = mod(_N_,nGroup{Branch});
      drop randomorder nGroup1 nGroup2 nGroup3 nGroup4 nGroup5
           nGroup6 nGroup7 nGroup8 nGroup9 nGroup10;
   run;

   /* Save the jackknife replicate weights and coefficients */
   proc surveymeans data = LibrarySurvey
         varmethod = jk (outweights = LibraryRepWeights
                         outjkcoefs = LibraryJKCOEF);
      weight SamplingWeight;
      strata Branch;
      cluster GroupPSU;
      var Age;
   run;
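To check that the two output data sets have the structure described above, a short inspection step such as the following could be run. This is a minimal sketch; it relies only on the data set and variable names already mentioned in this example, and the OBS= limit of 5 is an arbitrary choice.

```sas
/* List the first few jackknife coefficients, one row per replicate */
proc print data=LibraryJKCOEF(obs=5);
run;

/* Count the replicates contributed by each donor stratum */
proc freq data=LibraryJKCOEF;
   tables DonorStratum;
run;

/* Confirm that the replicate weight columns are present */
proc contents data=LibraryRepWeights;
run;
```

If the groups were formed as intended, the replicate counts across donor strata should add up to the 46 replicates used in this example.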
It is not necessary to provide replicate weights to compute jackknife variance estimates using the SURVEYPHREG procedure. If you do not specify the replicate weights, then the procedure creates replicate weights for you. For this illustration, assume that LibraryRepWeights and LibraryJKCOEF are the only two data sets available for analysis.
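For comparison, if the design variables were available, the procedure could be left to build the replicate weights itself. The following is a sketch under that assumption: it uses the LibrarySurvey data set together with the Branch and GroupPSU variables from the preparation step, none of which is assumed to be available in the rest of this example.

```sas
/* Sketch: let PROC SURVEYPHREG create the jackknife replicates   */
/* from the strata and clusters instead of reading them from data */
proc surveyphreg data=LibrarySurvey varmethod=jk;
   weight SamplingWeight;
   strata Branch;
   cluster GroupPSU;
   model lenBorrow*Returned(0) = Age;
run;
```

Because no REPWEIGHTS statement appears here, the STRATA and CLUSTER statements carry the design information that the procedure needs to form the replicates.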
The following SAS statements request a proportional hazards regression of lenBorrow on Age. The variable Returned is the censor indicator, and the value 0 indicates a censored observation. The WEIGHT statement specifies the sampling weight variable, and the REPWEIGHTS statement specifies replicate weight variables RepWt_1 to RepWt_46. The JKCOEFS= option in the REPWEIGHTS statement names the data set that contains the jackknife coefficient for each replicate sample. The VARMETHOD= option in the PROC SURVEYPHREG statement requests the jackknife variance estimation method. A STRATA statement is not required when the REPWEIGHTS statement is specified.
   proc surveyphreg data = LibraryRepWeights varmethod = jk;
      weight SamplingWeight;
      repweights RepWt_: / jkcoefs = LibraryJKCOEF;
      model lenBorrow*Returned(0) = Age;
   run;
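The variance estimate requested by these statements can be sketched as follows, assuming the usual replicate-based jackknife form: the model is refit once with each replicate weight column, and the deviations of the replicate estimates from the full-sample estimate are combined using the jackknife coefficients. The symbols are notation introduced here, not output from the procedure.

```latex
% \hat{\beta}   : estimate computed with the full-sample weights
% \hat{\beta}_r : estimate computed with replicate weights RepWt_r
% \alpha_r      : jackknife coefficient for replicate r (from LibraryJKCOEF)
\widehat{V}(\hat{\beta})
  = \sum_{r=1}^{R} \alpha_r \,
    (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})^{\top},
\qquad R = 46
```

With a single covariate such as Age, the outer product reduces to a squared difference, and the reported standard error is the square root of this sum.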
Output 97.4.1 displays some summary information. The “Number of Observations,” “Censored Summary,” and “Weighted Censored Summary” tables are exactly the same as in the example discussed in Getting Started: SURVEYPHREG Procedure. The “Variance Estimation” table displays information about the variance estimation, such as the name of the variance estimation method and the number of replicate samples.
Output 97.4.1: Summary Statistics for Overall Analysis
Number of Observations Read | 100 |
---|---|
Number of Observations Used | 100 |
Sum of Weights Read | 11616.79 |
Sum of Weights Used | 11616.79 |

Summary of the Number of Event and Censored Values

Total | Event | Censored | Percent Censored |
---|---|---|---|
100 | 90 | 10 | 10.00 |

Summary of the Weighted Number of Event and Censored Values

Total | Event | Censored | Percent Censored |
---|---|---|---|
11616.79 | 10449.22 | 1167.57 | 10.05 |

Variance Estimation | |
---|---|
Method | Jackknife |
Replicate Weights | WORK.LIBRARYREPWEIGHTS |
Number of Replicates | 46 |
Output 97.4.2 shows that the estimated regression coefficient is 0.0616 with a standard error of 0.0092. The denominator degrees of freedom (46) for the t test is equal to the number of replicates used. The estimated regression coefficient is the same as in the example in Getting Started: SURVEYPHREG Procedure, but the standard error and the denominator degrees of freedom are different. This is expected because the two examples use the same estimator for the regression coefficient but different estimators for the variance.
Output 97.4.2: Inferences Based on Survey Design for Overall Analysis
Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
Age | 46 | 0.061593 | 0.009159 | 6.73 | <.0001 | 1.064 |
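
As a brief worked check of the last column, the hazard ratio is the exponential of the estimated coefficient. The arithmetic below uses only the values displayed in the table.

```latex
\mathrm{HR} = \exp(\hat{\beta}) = \exp(0.061593) \approx 1.064
```

Read this way, a one-unit increase in Age multiplies the estimated hazard by about 1.064, which is an increase of roughly 6.4 percent.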