Getting Started: SURVEYPHREG Procedure

This section uses a data set that is obtained by stratified random sampling from a simulated finite population to illustrate some of the basic features of PROC SURVEYPHREG.

Suppose the library system for a small county wants to study the length of time that books are borrowed over a specified study period, adjusting for the age of the borrower and accounting for the fact that some books are never returned. Suppose there are 10 branch libraries in the county. Assume that a list of 11,617 (simulated) transactions is available for the study period October 1, 2008, to December 31, 2008, and assume that this list can be used as the sampling frame. A stratified random sample with replacement is used to select 100 transactions, where branch libraries are the strata. The total number of transactions within a branch ranges from 510 to 2,011 for the study period. The total sample size of 100 transactions is allocated proportionally across branches based on the number of transactions. For each selected transaction, a telephone interview is conducted to collect additional characteristics of the borrower. The data set LibrarySurvey contains the following variables for all units (transactions) in the sample:

  • Branch, the library branch from which the book was borrowed

  • SamplingWeight, the survey sampling weight for the transaction

  • CheckOut, the date the book was borrowed

  • CheckIn, the date the book was returned, with a missing value if the book was not returned by December 31, 2008

  • Age, the age of the borrower

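The following DATA step creates the data set LibrarySurvey:

data LibrarySurvey;
   /* Formatted input for the first four fields, list input for Age */
   input Branch         2.
         SamplingWeight 7.2
         CheckOut       date10.
         CheckIn        date10.
         Age;
   format CheckOut CheckIn date9.;
   datalines;
 1 103.60 08NOV2008 13NOV2008 18
 1 103.60 01OCT2008 07OCT2008 30
 1 103.60 05NOV2008 06NOV2008 73
 1 103.60 25OCT2008 26OCT2008 53
 1 103.60 09NOV2008 10NOV2008 55
 2 127.50 10DEC2008 15DEC2008 39
 2 127.50 19DEC2008         . 33
 2 127.50 26NOV2008 27NOV2008 41
 2 127.50 03NOV2008 07NOV2008 33

   ... more lines ...   

10 118.35 14NOV2008 17NOV2008 29
10 118.35 11DEC2008 13DEC2008 35
10 118.35 21NOV2008 23NOV2008 46
;

The following DATA step adds two variables: Returned, which is 1 if the book was returned during the study period and 0 otherwise, and lenBorrow, the number of days the book was borrowed. For a book that was not returned, lenBorrow is the number of days from the check-out date to the end of the study period (December 31, 2008).

data LibrarySurvey;
   set LibrarySurvey;
   /* Returned = 1 if a check-in date is recorded, 0 if it is missing */
   Returned = (CheckIn ^= .);
   if (Returned) then
      lenBorrow = CheckIn                - CheckOut;
   else
      /* Not returned: censor at the end of the study period */
      lenBorrow = input('31Dec2008',date9.) - CheckOut;
run;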
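As a quick check on the weights, the following sketch tabulates the sample size and the sum of the sampling weights within each branch. It assumes that each weight is the inverse of the transaction's selection probability, in which case the weights in a branch should sum to roughly that branch's transaction count (between 510 and 2,011) and the grand total should be close to 11,617. The table name BranchCheck and the column names n and EstTransactions are illustrative and do not appear elsewhere in this example.

```sas
/* Sample size and sum of weights per branch (illustrative check) */
proc sql;
   create table BranchCheck as
   select Branch,
          count(*)            as n,
          sum(SamplingWeight) as EstTransactions
   from LibrarySurvey
   group by Branch;
quit;

/* Print the per-branch values with column totals */
proc print data=BranchCheck noobs;
   sum n EstTransactions;
run;
```

If the totals differ noticeably from the frame counts, the weights might include adjustments (for example, for nonresponse) that this sketch does not account for.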

PROC SURVEYPHREG can be used to estimate the regression parameters of a proportional hazards model and the design-based variance of the estimated coefficients. The design-based variance is useful when the finite population is considered fixed, as in this example. See Lohr (2010) and Särndal, Swensson, and Wretman (1992) for details.

The following statements request a proportional hazards regression of lenBorrow on Age with Returned as the censor indicator. A transaction is considered to be censored if its check-in date is missing. The WEIGHT statement specifies the sampling weight variable (SamplingWeight), and the STRATA statement specifies the stratification variable (Branch).

proc surveyphreg data = LibrarySurvey;
   weight SamplingWeight;
   strata Branch;
   model lenBorrow*Returned(0) = Age;
run;
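In model terms, the MODEL statement specifies a proportional hazards model with a single covariate. The notation below is introduced here for illustration: the event is the return of a book, \lambda(t | Age) is the hazard of return at t days after checkout, \lambda_0(t) is an unspecified baseline hazard, and \beta is the regression coefficient for Age.

```latex
\lambda(t \mid \mathrm{Age}) = \lambda_0(t)\,\exp(\beta \cdot \mathrm{Age}),
\qquad
\frac{\lambda(t \mid \mathrm{Age} = a + 1)}{\lambda(t \mid \mathrm{Age} = a)} = \exp(\beta)
```

Under this model, \exp(\beta) is the hazard ratio for a one-year increase in the borrower's age, and a value greater than 1 corresponds to books being returned sooner.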

Summary information about the model, number of observations, survey design, censored values, and variance estimation method is shown in Figure 93.1. The Model Information table summarizes the model that you fit. The Number of Observations table displays the number of observations read and used by the procedure, together with the sum of weights read and used. The sum of weights read (11,616.79) can be used as an estimator of the population size, and the sum of weights used can be used as an estimator of the respondent size in the population. The Design Summary table displays survey design information such as stratification and clustering. This example implements a stratified design with 10 strata. The Censored Summary and Weighted Censored Summary tables display the unweighted and weighted numbers of censored and event units. Weighted counts can be used as estimators of the corresponding finite population quantities. For example, Figure 93.1 shows that 10% of the sampled units are censored and that an estimated 10.05% of the population units are censored.

Figure 93.1: Summary Statistics


Model Information
Dependent Variable    lenBorrow
Censoring Variable    Returned
Censoring Value(s)    0
Weight Variable       SamplingWeight
Stratum Variable      Branch
Ties Handling         BRESLOW

Number of Observations Read         100
Number of Observations Used         100
Sum of Weights Read            11616.79
Sum of Weights Used            11616.79

Design Summary
Number of Strata    10

Summary of the Number of Event and Censored Values
   Total      Event   Censored   Percent Censored
     100         90         10              10.00

Summary of the Weighted Number of Event and Censored Values
   Total      Event   Censored   Percent Censored
11616.79   10449.22    1167.57              10.05

Variance Estimation
Method    Taylor Series
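The weighted censoring percentage can also be cross-checked outside PROC SURVEYPHREG. The following is a minimal sketch that assumes the Returned variable created earlier and uses PROC SURVEYFREQ with the same STRATA and WEIGHT specifications. The weighted percentage for Returned = 0 should agree with the 10.05% reported above, which is simply 1167.57 / 11616.79.

```sas
/* Weighted one-way frequency table of the censoring indicator */
proc surveyfreq data=LibrarySurvey;
   strata Branch;
   weight SamplingWeight;
   tables Returned;
run;
```

PROC SURVEYFREQ also reports a design-based standard error for the percentage, which the summary tables from PROC SURVEYPHREG do not show.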

Parameter estimates and their standard errors are shown in Figure 93.2. The estimated regression coefficient for Age is 0.062 and is highly significant (t = 7.36, p < 0.0001). Because the event being modeled is the return of a book, the positive coefficient indicates that the hazard of return increases with the borrower's age (a hazard ratio of 1.064 per year of age), so older borrowers tend to keep books for a shorter time (recall that these are simulated data). In this example, the procedure uses the STRATA and WEIGHT statements to incorporate stratification and unequal weighting, respectively, into variance estimation. The degrees of freedom (90) are calculated as the number of sampling units (100) minus the number of strata (10). Note that the estimated variance reported in Figure 93.2 ignores the finite population correction (fpc). You can use the TOTAL= or RATE= option in the PROC statement to include an fpc in your variance estimator.
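As a sketch of how an fpc might be requested, the following statements repeat the analysis with the RATE= option. Under proportional allocation the sampling rate is roughly the same in every branch, so a single rate of about 100/11,617, or 0.0086, is assumed here. That value is an approximation introduced for illustration and is not part of the original example.

```sas
/* Same model with a common sampling rate supplied for the fpc
   (0.0086 is an assumed, approximate rate of 100/11,617) */
proc surveyphreg data = LibrarySurvey rate = 0.0086;
   weight SamplingWeight;
   strata Branch;
   model lenBorrow*Returned(0) = Age;
run;
```

With a sampling rate this small, the fpc is close to 1, so the standard error is expected to change only slightly. If the rates differ by branch, RATE= and TOTAL= also accept a data set that supplies stratum-level values.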

Figure 93.2: Weighted Estimates and Their Standard Errors

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   t Value   Pr > |t|   Hazard Ratio
Age         90   0.061593         0.008366      7.36     <.0001          1.064
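The derived quantities in Figure 93.2 can be reproduced by hand from the estimate and its standard error. The arithmetic below uses only values reported in the figure. The last line, a hazard ratio for a 10-year age difference, is an added illustration that assumes the log-linear age effect holds over that range.

```latex
\begin{align*}
\mathrm{df} &= 100 - 10 = 90 \\
t &= \frac{\hat\beta}{\mathrm{SE}(\hat\beta)} = \frac{0.061593}{0.008366} \approx 7.36 \\
\text{Hazard ratio (1 year)} &= \exp(\hat\beta) = \exp(0.061593) \approx 1.064 \\
\text{Hazard ratio (10 years)} &= \exp(10\,\hat\beta) = \exp(0.61593) \approx 1.85
\end{align*}
```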