The SURVEYPHREG Procedure

Balanced Repeated Replication (BRR) Method

The balanced repeated replication (BRR) method requires that the full sample be drawn by using a stratified sample design with two primary sampling units (PSUs) per stratum. The BRR method constructs half-sample replicates by deleting one PSU per stratum according to a Hadamard matrix and doubling the original weight of the other PSU in that stratum. Let H be the total number of strata. By default, the total number of replicates R is the smallest multiple of 4 that is greater than H. However, if you prefer a larger number of replicates, you can specify the REPS=n method-option. If an $n\times n$ Hadamard matrix cannot be constructed, the number of replicates is increased until a Hadamard matrix becomes available.
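As a quick illustration of the default replicate count, the following Python sketch computes the smallest multiple of 4 that is greater than H. The function name `default_brr_replicates` is hypothetical, and the sketch does not model the further increase that occurs when no Hadamard matrix of the requested dimension can be constructed.

```python
def default_brr_replicates(num_strata: int) -> int:
    """Smallest multiple of 4 that is strictly greater than H (illustrative only)."""
    if num_strata < 1:
        raise ValueError("the design needs at least one stratum")
    return 4 * (num_strata // 4 + 1)


if __name__ == "__main__":
    for h in (3, 4, 7, 8):
        print(h, default_brr_replicates(h))
```

Note that H = 4 gives R = 8 rather than 4, because R must be strictly greater than H.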

Each replicate is obtained by deleting one PSU per stratum according to a corresponding Hadamard matrix and adjusting the original weights for the remaining PSUs. The new weights are called replicate weights.

Replicates are constructed by using the first H columns of the $R\times R$ Hadamard matrix. The rth ($r=1, 2, ..., R$) replicate is drawn from the full sample according to the rth row of the Hadamard matrix as follows:

  • If the $(r,h)$ element of the Hadamard matrix is 1, then the first PSU of stratum h is included in the rth replicate and the second PSU of stratum h is excluded.

  • If the $(r,h)$ element of the Hadamard matrix is –1, then the second PSU of stratum h is included in the rth replicate and the first PSU of stratum h is excluded.

The replicate weights of the remaining PSUs in each half sample are then set to twice their original weights. For more detail about the BRR method, see Wolter (2007) and Lohr (2010).
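The selection rule above can be sketched in Python as follows. This is a minimal illustration, not the procedure's implementation: the function name `brr_replicate_weights`, the tuple layout of `units`, and the toy weights are all assumptions made for the example. Strata are indexed from 0, PSUs are labeled 1 and 2, and an excluded PSU is given a replicate weight of 0.

```python
def brr_replicate_weights(hadamard, units):
    """Return one list of replicate weights per row of the Hadamard matrix.

    units: list of (stratum, psu, weight) with 0-based stratum and psu in {1, 2}.
    Only the first H columns of each row are consulted, where H is the
    number of strata that appear in `units`.
    """
    replicates = []
    for row in hadamard:
        weights_r = []
        for stratum, psu, weight in units:
            # Element +1 keeps the first PSU; element -1 keeps the second PSU.
            kept_psu = 1 if row[stratum] == 1 else 2
            weights_r.append(2.0 * weight if psu == kept_psu else 0.0)
        replicates.append(weights_r)
    return replicates


if __name__ == "__main__":
    # A 4 x 4 Hadamard matrix, enough for H = 3 strata.
    h4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    toy_units = [(0, 1, 10.0), (0, 2, 12.0), (1, 1, 8.0),
                 (1, 2, 9.0), (2, 1, 5.0), (2, 2, 7.0)]
    for r, w in enumerate(brr_replicate_weights(h4, toy_units), start=1):
        print(r, w)
```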

By default, an appropriate Hadamard matrix is generated automatically to create the replicates. You can display the Hadamard matrix by specifying the VARMETHOD=BRR(PRINTH) method-option. If you provide a Hadamard matrix by specifying the VARMETHOD=BRR(HADAMARD=) method-option, then the replicates are generated according to the provided Hadamard matrix. You can use the VARMETHOD=BRR(OUTWEIGHTS=) method-option to store the replicate weights in a SAS data set.

Let $\hat{\bbeta }$ be the estimated proportional hazards regression coefficients from the full sample, and let ${\hat{\bbeta }_ r}$ be the estimated proportional hazards regression coefficients from the rth replicate by using replicate weights. PROC SURVEYPHREG estimates the covariance matrix of $\hat{\bbeta }$ by

\[  \widehat{\mb {V}}(\hat{\bbeta }) = \frac{1}{R} \sum _{r=1}^ R \left( {\hat{\bbeta }_ r} - \hat{\bbeta } \right) \left( {\hat{\bbeta }_ r} - \hat{\bbeta } \right)'  \]

with H degrees of freedom, where H is the number of strata.
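To make the estimator concrete, the following Python sketch averages the outer products of the deviations of the replicate estimates from the full-sample estimate. The function name `brr_covariance` and the numeric inputs are hypothetical; in practice the replicate estimates come from refitting the model with each set of replicate weights.

```python
def brr_covariance(beta_full, beta_reps):
    """Average of outer products of (beta_r - beta_full) over the R replicates."""
    num_reps = len(beta_reps)
    p = len(beta_full)
    cov = [[0.0] * p for _ in range(p)]
    for beta_r in beta_reps:
        dev = [b - f for b, f in zip(beta_r, beta_full)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += dev[i] * dev[j] / num_reps
    return cov


if __name__ == "__main__":
    full = [1.0, 2.0]
    reps = [[1.5, 2.0], [0.5, 2.0], [1.0, 3.0], [1.0, 1.0]]
    print(brr_covariance(full, reps))
```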

If you specify the CENTER=REPLICATES method-option, then PROC SURVEYPHREG computes the covariance matrix of $\hat{\bbeta }$ by

\[  \widehat{\mb {V}}(\hat{\bbeta }) = \frac{1}{R} \sum _{r=1}^ R \left( {\hat{\bbeta }_ r} - \overline{\hat{\bbeta }_ r} \right) \left( {\hat{\bbeta }_ r} - \overline{\hat{\bbeta }_ r} \right)'  \]

where $\overline{\hat{\bbeta }_ r}$ is the average of the replicate estimates as follows:

\[  \overline{\hat{\bbeta }_ r} = \frac{1}{R} \sum _{r=1}^ R {\hat{\bbeta }_ r}  \]
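The effect of centering at the replicate average can be sketched in Python for a single coefficient. The function name `brr_variance_centered` and the inputs are hypothetical; the only difference from the default estimator is that deviations are taken from the mean of the replicate estimates instead of from the full-sample estimate.

```python
def brr_variance_centered(beta_reps):
    """BRR variance of one coefficient, centered at the replicate average."""
    num_reps = len(beta_reps)
    mean_rep = sum(beta_reps) / num_reps
    return sum((b - mean_rep) ** 2 for b in beta_reps) / num_reps


if __name__ == "__main__":
    print(brr_variance_centered([1.0, 2.0, 3.0, 2.0]))
```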

If one or more components of $\hat\bbeta _ r$ cannot be calculated for some replicates, then the variance estimate is computed by using only the replicates for which the proportional hazards regression coefficients can be estimated. Lack of estimability and nonconvergence are the two most common reasons why $\hat\bbeta _ r$ might not be available for a replicate sample even if $\hat\bbeta $ is defined for the full sample. Let $R_ a$ be the number of replicates where $\hat\bbeta _ r$ is available, and let $R-R_ a$ be the number of replicates where $\hat\bbeta _ r$ is not available. Without loss of generality, assume that $\hat{\bbeta }_ r$ is available only for the first $R_ a$ replicates; then the BRR variance estimator is

\[  \widehat{\mb {V}}(\hat{\bbeta }) = \frac{1}{R_ a} \sum _{r=1}^{R_ a} \left( {\hat{\bbeta }_ r} - \hat{\bbeta } \right) \left( {\hat{\bbeta }_ r} - \hat{\bbeta } \right)'  \]

with degrees of freedom equal to the minimum of H and $R_ a$, where H is the number of strata. Alternatively, you can use the FAY= method-option to request Fay’s BRR method, as discussed in the following section.
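The handling of unavailable replicates can be sketched in Python for a single coefficient. The function name `brr_variance_available` is hypothetical, and representing an unavailable replicate estimate by `None` is an assumption of this example only.

```python
def brr_variance_available(beta_full, beta_reps, num_strata):
    """Return (variance, degrees of freedom) using only available replicates.

    beta_reps may contain None for replicates whose estimate is unavailable.
    """
    available = [b for b in beta_reps if b is not None]
    r_a = len(available)
    if r_a == 0:
        raise ValueError("no replicate estimates are available")
    variance = sum((b - beta_full) ** 2 for b in available) / r_a
    # Degrees of freedom: the minimum of H and the number of usable replicates.
    return variance, min(num_strata, r_a)


if __name__ == "__main__":
    print(brr_variance_available(1.0, [1.5, None, 0.5, None], num_strata=3))
```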

Fay’s BRR Method

The traditional BRR method constructs half-sample replicates by deleting one PSU per stratum according to a Hadamard matrix and doubling the original weight of the other PSU. Fay’s BRR method uses the Fay coefficient, $\epsilon $ $(0 \le \epsilon < 1)$. Instead of deleting one PSU per stratum, it multiplies the original weight of that PSU by the coefficient $\epsilon $. The original weight of the remaining PSU in that stratum is multiplied by $2-\epsilon $. PROC SURVEYPHREG uses $\epsilon = 0.5$ as the default value; alternatively, you can specify a value for $\epsilon $ with the FAY= method-option. When $\epsilon =0$, Fay’s method becomes the traditional BRR method. For more details, see Dippo, Fay, and Morganstein (1984); Fay (1984, 1989); and Judkins (1990). Because the traditional BRR method uses only half of the total sample in every replicate, several replicate estimators ($\hat\bbeta _ r$) might be undefined even when the full sample estimator ($\hat\bbeta $) is defined. Fay’s BRR method is especially useful for this situation because it uses all the sampled units in every replicate.
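The weight adjustment in Fay's method can be sketched in Python for a single replicate. The function name `fay_replicate_weights`, the tuple layout of `units`, and the toy weights are assumptions made for illustration; the sign convention (element 1 favors the first PSU, element –1 favors the second) follows the replicate construction described for the traditional BRR method.

```python
def fay_replicate_weights(hadamard_row, units, fay=0.5):
    """Fay-adjusted weights for one replicate.

    units: list of (stratum, psu, weight) with 0-based stratum and psu in {1, 2}.
    The PSU that traditional BRR would delete is multiplied by `fay`;
    the other PSU in the stratum is multiplied by 2 - fay.
    """
    if not 0.0 <= fay < 1.0:
        raise ValueError("the Fay coefficient must satisfy 0 <= fay < 1")
    adjusted = []
    for stratum, psu, weight in units:
        favored_psu = 1 if hadamard_row[stratum] == 1 else 2
        factor = (2.0 - fay) if psu == favored_psu else fay
        adjusted.append(factor * weight)
    return adjusted


if __name__ == "__main__":
    toy_units = [(0, 1, 10.0), (0, 2, 12.0), (1, 1, 8.0), (1, 2, 9.0)]
    print(fay_replicate_weights([1, -1], toy_units, fay=0.5))
    print(fay_replicate_weights([1, -1], toy_units, fay=0.0))
```

With `fay=0.0` the adjusted weights coincide with the traditional BRR replicate weights: the favored PSU is doubled and the other PSU receives weight 0.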

Let $\hat{\bbeta }$ be the estimated proportional hazards regression coefficients from the full sample, and let ${\hat{\bbeta }_ r}$ be the estimated regression coefficients that are obtained from the rth replicate by using replicate weights. PROC SURVEYPHREG estimates the covariance matrix of $\hat{\bbeta }$ by

\[  \widehat{\mb {V}}(\hat{\bbeta }) = \frac{1}{R(1-{\epsilon })^2} \sum _{r=1}^ R \left( {\hat{\bbeta }_ r} - \hat{\bbeta } \right) \left( {\hat{\bbeta }_ r} - \hat{\bbeta } \right)'  \]

with H degrees of freedom, where H is the number of strata.
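The role of the factor $(1-\epsilon )^2$ can be seen in a short Python sketch for a single coefficient. The function name `fay_brr_variance` and the numeric inputs are hypothetical. Because Fay replicates perturb the weights less than half-sample replicates do, the replicate estimates vary less, and dividing by $(1-\epsilon )^2$ compensates for that.

```python
def fay_brr_variance(beta_full, beta_reps, fay=0.5):
    """Fay's BRR variance of one coefficient: sum of squares / (R * (1 - fay)^2)."""
    if not 0.0 <= fay < 1.0:
        raise ValueError("the Fay coefficient must satisfy 0 <= fay < 1")
    num_reps = len(beta_reps)
    sum_sq = sum((b - beta_full) ** 2 for b in beta_reps)
    return sum_sq / (num_reps * (1.0 - fay) ** 2)


if __name__ == "__main__":
    reps = [1.25, 0.75, 1.5, 0.5]
    print(fay_brr_variance(1.0, reps, fay=0.5))
    print(fay_brr_variance(1.0, reps, fay=0.0))  # same divisor as traditional BRR
```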

Hadamard Matrix

PROC SURVEYPHREG uses a Hadamard matrix to construct replicates for BRR variance estimation. You can provide a Hadamard matrix for replicate construction by using the HADAMARD= method-option for VARMETHOD=BRR. Otherwise, PROC SURVEYPHREG generates an appropriate Hadamard matrix. You can display the Hadamard matrix by specifying the PRINTH method-option.

A Hadamard matrix $\bA $ of dimension R is a square matrix that has all elements equal to 1 or –1 such that $\bA '\bA = R\bI $, where $\bI $ is an identity matrix of appropriate order. The dimension of a Hadamard matrix must equal 1, 2, or a multiple of 4.

For example, the following matrix is a Hadamard matrix of dimension R = 8:

\[  \begin{array}{rrrrrrrr} 1 &  1 &  1 &  1 &  1 &  1 &  1 &  1\\ 1 &  -1 &  1 &  -1 &  1 &  -1 &  1 &  -1\\ 1 &  1 &  -1 &  -1 &  1 &  1 &  -1 &  -1\\ 1 &  -1 &  -1 &  1 &  1 &  -1 &  -1 &  1\\ 1 &  1 &  1 &  1 &  -1 &  -1 &  -1 &  -1\\ 1 &  -1 &  1 &  -1 &  -1 &  1 &  -1 &  1\\ 1 &  1 &  -1 &  -1 &  -1 &  -1 &  1 &  1\\ 1 &  -1 &  -1 &  1 &  -1 &  1 &  1 &  -1 \end{array}  \]
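The matrix shown above happens to coincide with the result of the Sylvester doubling construction, which builds a Hadamard matrix of dimension $2^k$ by repeatedly tiling a smaller one. The following Python sketch illustrates that construction; the function name `sylvester_hadamard` is hypothetical, and the source does not say which construction PROC SURVEYPHREG uses internally. This construction covers only dimensions that are powers of 2, not every multiple of 4.

```python
def sylvester_hadamard(k):
    """Hadamard matrix of dimension 2**k built by Sylvester doubling."""
    if k < 0:
        raise ValueError("k must be nonnegative")
    h = [[1]]
    for _ in range(k):
        # New matrix is [[h, h], [h, -h]] in block form.
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


if __name__ == "__main__":
    for row in sylvester_hadamard(3):
        print(" ".join(f"{x:2d}" for x in row))
```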

For BRR replicate construction, the dimension of the Hadamard matrix must be at least H, where H denotes the number of first-stage strata in your design. If a Hadamard matrix of a given dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS data set in the HADAMARD= method-option. You must ensure that the matrix that you provide is actually a Hadamard matrix; PROC SURVEYPHREG does not check the validity of your Hadamard matrix.
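Because the validity of a user-supplied matrix is not checked, it can be worth verifying the defining property before passing the matrix to the procedure. The following Python sketch performs that check outside SAS; the function name `is_hadamard` is hypothetical, and the matrix is assumed to be available as a list of rows of integers.

```python
def is_hadamard(a):
    """True if `a` is square, has entries in {1, -1}, and satisfies A'A = R*I."""
    n = len(a)
    if n == 0 or any(len(row) != n for row in a):
        return False
    if any(x not in (1, -1) for row in a for x in row):
        return False
    for i in range(n):
        for j in range(n):
            # (i, j) element of A'A is the inner product of columns i and j.
            dot = sum(a[k][i] * a[k][j] for k in range(n))
            if dot != (n if i == j else 0):
                return False
    return True


if __name__ == "__main__":
    good = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    bad = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, 1, -1, 1]]
    print(is_hadamard(good), is_hadamard(bad))
```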

See the section Balanced Repeated Replication (BRR) Method for details about how the Hadamard matrix is used to construct replicates for BRR variance estimation.