The SURVEYPHREG Procedure

Taylor Series Linearization

The Taylor series linearization method is the default variance estimation method used by PROC SURVEYPHREG. See the section Notation and Estimation for definitions of the notation used in this section. Let

\[  S^{(r)}(\bbeta ,t) = \sum _{(h,i,j) \in A} w_{hij}Y_{hij}(t) \exp \left( \bbeta '\bZ _{hij}(t) \right) \bZ _{hij}^{\otimes r}(t)  \]

where $r = 0, 1$. Let A be the set of indices in the selected sample. Let

\[  \mb {a}^{\otimes r} = \left\{  \begin{array}{lcl} \mb {a} & , &  r = 1 \\ 1 & , &  r = 0 \end{array} \right.  \]

so that $S^{(0)}(\bbeta ,t)$ is a scalar and $S^{(1)}(\bbeta ,t)$ is a vector whose dimension $p$ equals the number of regression parameters.
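To make these quantities concrete, the following sketch (written in Python with NumPy, and not part of the SURVEYPHREG procedure) evaluates $S^{(0)}(\bbeta ,t)$ and $S^{(1)}(\bbeta ,t)$ for time-fixed covariates, flattening the $(h,i,j)$ index into a single observation index; the function and variable names are illustrative only.

import numpy as np

def s_r(beta, t, obs_time, w, Z):
    """Return S^(0)(beta, t) (a scalar) and S^(1)(beta, t) (a p-vector)."""
    at_risk = (obs_time >= t).astype(float)   # Y_{hij}(t): 1 if still at risk at time t
    risk_wt = w * at_risk * np.exp(Z @ beta)  # w_{hij} Y_{hij}(t) exp(beta' Z_{hij})
    return risk_wt.sum(), risk_wt @ Z         # the r = 0 and r = 1 sums

# Small illustrative data set: 5 observations, p = 2 covariates
rng = np.random.default_rng(1)
obs_time = rng.exponential(size=5)            # observed event or censoring times
w = rng.uniform(1.0, 3.0, size=5)             # sampling weights w_{hij}
Z = rng.normal(size=(5, 2))                   # covariate matrix
beta = np.array([0.5, -0.2])
s0, s1 = s_r(beta, 0.5, obs_time, w, Z)
zbar = s1 / s0                                # weighted covariate mean over the risk set; see Zbar below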

Let $ \bar{\bZ }(\bbeta ,t) = \frac{S^{(1)}(\bbeta ,t)}{S^{(0)}(\bbeta ,t)} $. The score residual for subject $(h,i,j)$ is

\begin{eqnarray*}
\bL _{hij}(\bbeta ) & = & \Delta _{hij}\biggl \{ \bZ _{hij}(t_{hij}) - \bar{\bZ }(\bbeta , t_{hij})\biggr \} \\
& & -\; \sum _{(h', i', j') \in A} \Delta _{h'i'j'} \frac{w_{h'i'j'}\, Y_{hij}(t_{h'i'j'}) \exp \left( \bbeta '\bZ _{hij}(t_{h'i'j'}) \right) }{S^{(0)}(\bbeta ,t_{h'i'j'})} \biggl \{ \bZ _{hij}(t_{h'i'j'}) - \bar{\bZ }(\bbeta ,t_{h'i'j'})\biggr \}
\end{eqnarray*}

For TIES=EFRON, the computation of the score residuals is modified to comply with the Efron partial likelihood. See the section Residuals for more information.
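The following continuation of the sketch (again Python, illustrative only, reusing the s_r helper defined above) computes the score residuals for the displayed formula, which corresponds to Breslow ties handling; it assumes time-fixed covariates, and delta holds the event indicators $\Delta _{hij}$.

def score_residuals(beta, obs_time, delta, w, Z):
    """Return the n x p matrix of score residuals L_{hij}(beta)."""
    n, p = Z.shape
    resid = np.zeros((n, p))
    event_idx = np.flatnonzero(delta)                      # subjects with Delta = 1
    for k in range(n):
        s0, s1 = s_r(beta, obs_time[k], obs_time, w, Z)
        resid[k] = delta[k] * (Z[k] - s1 / s0)             # Delta_k { Z_k - Zbar(beta, t_k) }
        for kp in event_idx:                               # sum over event subjects (h', i', j')
            if obs_time[k] >= obs_time[kp]:                # Y_{hij}(t_{h'i'j'}) = 1
                s0p, s1p = s_r(beta, obs_time[kp], obs_time, w, Z)
                resid[k] -= (w[kp] * np.exp(Z[k] @ beta) / s0p) * (Z[k] - s1p / s0p)
    return resid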

The Taylor series estimate of the covariance matrix of $\hat\bbeta $ is

\[  \hat{\bV }(\hat\bbeta ) = \mc {I}^{-1}(\hat{\bbeta }) \mb {G} \mc {I}^{-1}(\hat{\bbeta })  \]

where $\mc {I}(\hat{\bbeta })$ is the observed information matrix and the $p \times p$ matrix $\mb {G}$ is defined as

\[  \mb {G}=\frac{n-1}{n-p} \sum _{h=1}^{H} { \frac{n_h(1-f_h)}{n_h-1} \sum _{i=1}^{n_h} { (\mb {e}_{hi+}-\bar{\mb {e}}_{h\cdot \cdot })' (\mb {e}_{hi+}-\bar{\mb {e}}_{h\cdot \cdot }) } }  \]

The observed residuals, their cluster totals, and their stratum means are defined as follows:

\begin{eqnarray*}
\mb {e}_{hij} & = & w_{hij}\, \bL _{hij}(\hat{\bbeta }) \\
\mb {e}_{hi+} & = & \sum _{j=1}^{m_{hi}} \mb {e}_{hij} \\
\bar{\mb {e}}_{h\cdot \cdot } & = & \frac{1}{n_h}\sum _{i=1}^{n_h}\mb {e}_{hi+}
\end{eqnarray*}
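As an illustration of how $\mb {G}$ and the sandwich estimator fit together, the following sketch (Python, continuing the illustrative code above, not procedure code) forms the cluster totals $\mb {e}_{hi+}$, the stratum means $\bar{\mb {e}}_{h\cdot \cdot }$, and $\hat{\bV }(\hat\bbeta )$. It assumes that the weighted residuals, stratum and cluster identifiers (as NumPy arrays), stratum sampling rates $f_h$, and the observed information matrix are supplied; the vadjust argument is a hypothetical switch for the small-sample factor discussed below.

def taylor_covariance(e, stratum, cluster, f, info, vadjust=True):
    """e: (n_obs, p) weighted residuals e_{hij}; stratum, cluster: (n_obs,) ids;
    f: dict mapping stratum id to its sampling rate f_h;
    info: (p, p) observed information matrix evaluated at beta_hat."""
    p = e.shape[1]
    totals = {}                                            # e_{hi+}: sums within each (stratum, cluster)
    for h in np.unique(stratum):
        in_h = stratum == h
        for c in np.unique(cluster[in_h]):
            totals[(h, c)] = e[in_h & (cluster == c)].sum(axis=0)
    n = len(totals)                                        # total number of sampled clusters
    G = np.zeros((p, p))
    for h in np.unique(stratum):
        ehs = np.array([v for (hh, _), v in totals.items() if hh == h])
        n_h = len(ehs)                                     # clusters sampled from stratum h (must be >= 2)
        dev = ehs - ehs.mean(axis=0)                       # e_{hi+} - ebar_{h..}
        G += n_h * (1.0 - f[h]) / (n_h - 1) * (dev.T @ dev)
    if vadjust:
        G *= (n - 1) / (n - p)                             # default small-sample adjustment
    info_inv = np.linalg.inv(info)
    return info_inv @ G @ info_inv                         # V(beta_hat) = I^-1 G I^-1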

The factor $(n-1)/(n-p)$ in the computation of the matrix $\mb {G}$ reduces the small-sample bias that is associated with using the estimated function to calculate deviations (Fuller et al. 1989, pp. 77–81). For simple random sampling, this factor contributes to the degrees-of-freedom correction that is applied to the residual mean square for ordinary least squares regression in which $p$ parameters are estimated. By default, the procedure uses this adjustment in the variance estimation. If you do not want to use this multiplier in the variance estimator, specify the VADJUST=NONE option in the MODEL statement.
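In terms of the hypothetical sketch above (where vadjust is an illustrative argument, not a procedure option), the default and unadjusted variance estimators differ only in whether this multiplier is applied:

V_default = taylor_covariance(e, stratum, cluster, f, info)                 # includes (n - 1)/(n - p)
V_unadj   = taylor_covariance(e, stratum, cluster, f, info, vadjust=False)  # analogous to VADJUST=NONE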