The STDRATE Procedure

Direct Standardization

Direct standardization uses the weights from a reference population to compute the standardized rate of a study group as the weighted average of stratum-specific rates in the study population. The standardized rate is computed as

${\hat\lambda }_{ds} = \frac{ \sum _{j} {\mathcal T}_{rj} \, {\hat\lambda }_{sj} }{ {\mathcal T}_{r} }$

where ${\hat\lambda }_{sj}$ is the rate in the jth stratum of the study population, ${\mathcal T}_{rj}$ is the population-time in the jth stratum of the reference population, and ${\mathcal T}_{r}= \sum _{k} {\mathcal T}_{rk}$ is the total population-time in the reference population.
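As a minimal sketch, the directly standardized rate can be computed from per-stratum event counts $d_{sj}$ and population-times ${\mathcal T}_{sj}$ in the study population, plus the reference population-times ${\mathcal T}_{rj}$. The function name `direct_std_rate` and the assumption that each stratum-specific rate is the crude rate $d_{sj}/{\mathcal T}_{sj}$ are illustrative, not part of the procedure itself.

```python
from typing import Sequence


def direct_std_rate(
    study_events: Sequence[float],
    study_time: Sequence[float],
    ref_time: Sequence[float],
) -> float:
    """Directly standardized rate: sum_j T_rj * lambda_sj / T_r.

    Hypothetical helper; assumes lambda_sj = d_sj / T_sj (events divided by
    population-time) in each stratum j of the study population.
    """
    if not len(study_events) == len(study_time) == len(ref_time):
        raise ValueError("each input needs one entry per stratum")
    total_ref_time = sum(ref_time)  # T_r = sum_k T_rk
    weighted_sum = sum(
        t_r * d / t_s  # T_rj * lambda_sj
        for d, t_s, t_r in zip(study_events, study_time, ref_time)
    )
    return weighted_sum / total_ref_time


# Study rates 10/1000 and 30/2000, reference population-times 3000 and 1000.
print(direct_std_rate([10, 30], [1000, 2000], [3000, 1000]))  # 0.01125
```

If the reference population-times equal the study population-times, the weighted sum collapses to the total number of events, so the standardized rate reduces to the crude rate of the study population.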

Similarly, direct standardization uses the weights from a reference population to compute the standardized risk of a study group as the weighted average of stratum-specific risks in the study population. The standardized risk is computed as

${\hat\gamma }_{ds} = \frac{ \sum _{j} {\mathcal N}_{rj} \, {\hat\gamma }_{sj} }{ {\mathcal N}_{r} }$

where ${\hat\gamma }_{sj}$ is the risk in the jth stratum of the study population, ${\mathcal N}_{rj}$ is the number of observations in the jth stratum of the reference population, and ${\mathcal N}_{r}= \sum _{k} {\mathcal N}_{rk}$ is the total number of observations in the reference population.
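The standardized risk can be read the same way, with the reference shares ${\mathcal N}_{rj}/{\mathcal N}_{r}$ acting as weights that sum to 1. This sketch assumes each stratum-specific risk is the proportion of cases, $c_{sj}/{\mathcal N}_{sj}$; the case counts $c_{sj}$ and both function names are hypothetical.

```python
from typing import List, Sequence


def reference_weights(ref_n: Sequence[float]) -> List[float]:
    """Hypothetical helper: N_rj / N_r for each stratum (the weights sum to 1)."""
    total_ref_n = sum(ref_n)  # N_r = sum_k N_rk
    return [n / total_ref_n for n in ref_n]


def direct_std_risk(
    study_cases: Sequence[float],
    study_n: Sequence[float],
    ref_n: Sequence[float],
) -> float:
    """Directly standardized risk as a weighted average of stratum risks.

    Assumes gamma_sj = cases_sj / N_sj in each stratum of the study population.
    """
    if not len(study_cases) == len(study_n) == len(ref_n):
        raise ValueError("each input needs one entry per stratum")
    risks = [c / n for c, n in zip(study_cases, study_n)]  # gamma_sj
    return sum(w * g for w, g in zip(reference_weights(ref_n), risks))


# Study risks 5/100 and 20/200 weighted by reference counts 300 and 700.
print(direct_std_risk([5, 20], [100, 200], [300, 700]))
```

Because the weights sum to 1, the standardized risk always lies between the smallest and largest stratum-specific risks.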

That is, the directly standardized rate and risk of a study population are weighted averages of the stratum-specific rates and risks, respectively, where the weights are the corresponding stratum sizes in the reference population (population-time for rates, number of observations for risks). Direct standardization is appropriate when the study population is large enough to provide stable stratum-specific rates or risks. When the same reference population is used for multiple study populations, their directly standardized rates and risks can be validly compared with one another.

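Treating the reference-population weights as fixed constants and the stratum-specific estimates as independent, the variances of the directly standardized rate and risk are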

$V({\hat\lambda }_{ds}) = V \left( \frac{ \sum _{j} {\mathcal T}_{rj} \, {\hat\lambda }_{sj} }{ {\mathcal T}_{r} } \right) = \frac{ \sum _{j} {\mathcal T}_{rj}^{2} \, V({\hat\lambda }_{sj}) }{ {\mathcal T}_{r}^{2} }$

$V({\hat\gamma }_{ds}) = V \left( \frac{ \sum _{j} {\mathcal N}_{rj} \, {\hat\gamma }_{sj} }{ {\mathcal N}_{r} } \right) = \frac{ \sum _{j} {\mathcal N}_{rj}^{2} \, V({\hat\gamma }_{sj}) }{ {\mathcal N}_{r}^{2} }$
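These formulas need the stratum-level variances $V({\hat\lambda }_{sj})$ and $V({\hat\gamma }_{sj})$, which this section does not specify. The sketch below assumes a Poisson variance $d_{sj}/{\mathcal T}_{sj}^{2}$ for a rate based on $d_{sj}$ events and a binomial variance ${\hat\gamma }_{sj}(1-{\hat\gamma }_{sj})/{\mathcal N}_{sj}$ for a risk; both choices and the function names are illustrative assumptions.

```python
from typing import Sequence


def var_direct_std_rate(
    study_events: Sequence[float],
    study_time: Sequence[float],
    ref_time: Sequence[float],
) -> float:
    """V(lambda_ds) = sum_j T_rj^2 V(lambda_sj) / T_r^2.

    ASSUMPTION: Poisson event counts, so V(lambda_sj) = d_sj / T_sj**2.
    """
    total_ref_time = sum(ref_time)
    weighted = sum(
        t_r ** 2 * d / t_s ** 2
        for d, t_s, t_r in zip(study_events, study_time, ref_time)
    )
    return weighted / total_ref_time ** 2


def var_direct_std_risk(
    study_cases: Sequence[float],
    study_n: Sequence[float],
    ref_n: Sequence[float],
) -> float:
    """V(gamma_ds) = sum_j N_rj^2 V(gamma_sj) / N_r^2.

    ASSUMPTION: binomial case counts, so V(gamma_sj) = g * (1 - g) / N_sj
    with g = cases_sj / N_sj.
    """
    total_ref_n = sum(ref_n)
    weighted = 0.0
    for c, n_s, n_r in zip(study_cases, study_n, ref_n):
        g = c / n_s
        weighted += n_r ** 2 * g * (1.0 - g) / n_s
    return weighted / total_ref_n ** 2


print(var_direct_std_rate([10, 30], [1000, 2000], [3000, 1000]))
print(var_direct_std_risk([5, 20], [100, 200], [300, 700]))
```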

By using the method of statistical differentials (Elandt-Johnson and Johnson 1980, pp. 70–71), also known as the delta method, the variances of the logarithms of the directly standardized rate and risk can be estimated by

$V( \mbox{log}( {\hat\lambda }_{ds} ) ) = \frac{1}{ {\hat\lambda }_{ds}^{2} } \, V({\hat\lambda }_{ds})$

$V( \mbox{log}( {\hat\gamma }_{ds} ) ) = \frac{1}{ {\hat\gamma }_{ds}^{2} } \, V({\hat\gamma }_{ds})$
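The delta-method step is a single division by the squared estimate. The sketch below applies it to illustrative values (a standardized rate of 0.01125 with variance 6.09375e-6); the function name is hypothetical.

```python
import math


def var_log_estimate(estimate: float, variance: float) -> float:
    """Delta-method variance of log(estimate): V(log b) ~= V(b) / b**2."""
    if estimate <= 0:
        raise ValueError("the log transform needs a positive estimate")
    return variance / estimate ** 2


v_log = var_log_estimate(0.01125, 6.09375e-6)
print(v_log, math.sqrt(v_log))  # variance and standard error on the log scale
```

The same function serves for a standardized risk, since both formulas have the form $V(\beta)/\beta^{2}$.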

Confidence intervals for ${\hat\lambda }_{ds}$ and ${\hat\gamma }_{ds}$ can be constructed from either a normal or a lognormal distribution. For the standardized rate ${\hat\lambda }_{ds}$, a confidence interval based on the gamma distribution can also be constructed.

In the following four subsections, $\beta$ denotes either the rate statistic ($\beta =\lambda$) or the risk statistic ($\beta =\gamma$).

Normal Distribution Confidence Intervals for Standardized Rate and Risk

A $(1-\alpha )$ confidence interval for ${\hat\beta }_{ds}$ based on a normal distribution is then given by

$\left( \; {\hat\beta }_{ds} - z \, \sqrt {V( {\hat\beta }_{ds} )} \, , \; \; {\hat\beta }_{ds} + z \, \sqrt {V( {\hat\beta }_{ds} )} \; \right)$

where $z = \Phi ^{-1} (1-\alpha /2)$ is the $(1-\alpha /2)$ quantile of the standard normal distribution.
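A minimal sketch of the normal interval, using `statistics.NormalDist` from the Python standard library for the quantile $z$; the function name and the example values are illustrative.

```python
import math
from statistics import NormalDist


def normal_ci(estimate: float, variance: float, alpha: float = 0.05):
    """(1 - alpha) normal-theory interval: estimate -/+ z * sqrt(variance)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2) standard normal quantile
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


print(normal_ci(0.01125, 6.09375e-6))  # illustrative rate and variance
print(normal_ci(0.001, 1.0e-6))        # lower limit falls below zero
```

Because the interval is symmetric about the estimate, its lower limit can be negative when the variance is large relative to the estimate, as in the second call; the lognormal interval is always positive for a positive estimate.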

Lognormal Distribution Confidence Intervals for Standardized Rate and Risk

A $(1-\alpha )$ confidence interval for $\mbox{log}( {\hat\beta }_{ds} )$ based on a normal distribution is given by

$\left( \; \mbox{log}({\hat\beta }_{ds}) - z \, \sqrt {V( \mbox{log}({\hat\beta }_{ds}) )} \, , \; \; \mbox{log}({\hat\beta }_{ds}) + z \, \sqrt {V( \mbox{log}({\hat\beta }_{ds}) )} \; \right)$

where $z = \Phi ^{-1} (1-\alpha /2)$ is the $(1-\alpha /2)$ quantile of the standard normal distribution.

Thus, a $(1-\alpha )$ confidence interval for ${\hat\beta }_{ds}$ based on a lognormal distribution is given by

$\left( \; {\hat\beta }_{ds} \; e^{ -z {\sqrt { V( \mbox{log}({\hat\beta }_{ds}) ) }}} \, , \; \; {\hat\beta }_{ds} \; e^{ z {\sqrt { V( \mbox{log}({\hat\beta }_{ds}) ) }}} \; \right)$
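The lognormal interval can be sketched by building the interval on the log scale and exponentiating, with $\sqrt{V(\mbox{log}({\hat\beta }_{ds}))} = \sqrt{V({\hat\beta }_{ds})}/{\hat\beta }_{ds}$. The function name and example values are illustrative.

```python
import math
from statistics import NormalDist


def lognormal_ci(estimate: float, variance: float, alpha: float = 0.05):
    """(1 - alpha) interval built on the log scale, then exponentiated."""
    if estimate <= 0:
        raise ValueError("the lognormal interval needs a positive estimate")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se_log = math.sqrt(variance) / estimate  # sqrt(V(log b)) = sqrt(V(b) / b**2)
    return estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log)


print(lognormal_ci(0.001, 1.0e-6))
```

Both limits are positive, and the interval is symmetric on the log scale, so the product of the two limits equals the squared estimate.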

Gamma Distribution Confidence Interval for Standardized Rate

Fay and Feuer (1997) use the relationship between the Poisson and gamma distributions to derive approximate confidence intervals for the standardized rate ${\hat\lambda }_{ds}$ based on the gamma distribution. As in the construction of the asymptotic normal confidence intervals, the number of events in each stratum is assumed to have a Poisson distribution, so the standardized rate is a weighted sum of independent Poisson random variables. A confidence interval for ${\hat\lambda }_{ds}$ is then given by

$\left( \; \frac{v}{2 {\hat\lambda }_{ds}} \; (\chi ^{2})^{-1}_{\frac{2 {\hat\lambda }_{ds}^{2}}{v}} \left( \frac{\alpha }{2} \right) \, , \; \; \frac{v+w^{2}_ x}{2 ({\hat\lambda }_{ds}+w_ x)} \; (\chi ^{2})^{-1}_{\frac{2 ({\hat\lambda }_{ds}+w_ x)^{2}}{v+w^{2}_ x}} \left( 1 - \frac{\alpha }{2} \right) \; \right)$

where

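$v = \sum _{j} \, w_ j^{2} \, {\hat\lambda }_{sj} \, {\mathcal T}_{sj}$

$w_ j = \frac{{\mathcal T}_{rj}}{{\mathcal T}_{r}} \; \frac{1}{{\mathcal T}_{sj}}$

${\mathcal T}_{sj}$ is the population-time in the jth stratum of the study population, and $w_ x$ is the maximum $w_ j$. Because ${\hat\lambda }_{sj} \, {\mathcal T}_{sj}$ is the number of events in the jth stratum, $v$ equals $V({\hat\lambda }_{ds})$ when $V({\hat\lambda }_{sj}) = {\hat\lambda }_{sj} / {\mathcal T}_{sj}$, the Poisson variance of a stratum-specific rate.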

Tiwari, Clegg, and Zou (2006) propose a less conservative confidence interval for ${\hat\lambda }_{ds}$ with a different upper confidence limit,

$\left( \; \frac{v}{2 {\hat\lambda }_{ds}} \; (\chi ^{2})^{-1}_{\frac{2 {\hat\lambda }_{ds}^{2}}{v}} \left( \frac{\alpha }{2} \right) \, , \; \; \frac{v+w_{2m}}{2 ({\hat\lambda }_{ds}+w_ m)} \; (\chi ^{2})^{-1}_{\frac{2 ({\hat\lambda }_{ds}+w_ m)^{2}}{v+w_{2m}}} \left( 1 - \frac{\alpha }{2} \right) \; \right)$

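where $w_{m}$ is the average of the $w_ j$ and $w_{2m}$ is the average of the $w_ j^{2}$. Because $w_{m} \le w_ x$ and $w_{2m} \le w_ x^{2}$, the upper limit uses smaller adjustment terms than the Fay and Feuer interval does.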
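The two gamma intervals share the lower limit and differ only in the upper limit, so one sketch can compute both. To stay within the Python standard library, it evaluates chi-square quantiles with non-integer degrees of freedom from the regularized incomplete gamma function (standard series and continued-fraction expansions) plus bisection; in practice a statistics library's chi-square quantile function would normally be used instead. It computes $v$ as $\sum_j w_j^2 d_{sj}$, where $d_{sj} = {\hat\lambda }_{sj} {\mathcal T}_{sj}$ is the number of events in stratum j, and it assumes at least one event. All function names are hypothetical.

```python
import math
from typing import Sequence


def _reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Power series; converges quickly when x < a + 1.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= x / (a + n)
            total += term
        return total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper tail Q(a, x) = 1 - P(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def chi2_ppf(q: float, df: float) -> float:
    """Chi-square quantile for (possibly non-integer) df, found by bisection."""
    shape = df / 2.0  # chi-square(df) = 2 * Gamma(shape = df / 2, scale = 1)
    lo, hi = 0.0, max(1.0, shape)
    while _reg_lower_gamma(shape, hi) < q:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _reg_lower_gamma(shape, mid) < q:
            lo = mid
        else:
            hi = mid
    return lo + hi  # 2 * midpoint of the final bracket


def gamma_intervals(events: Sequence[float], study_time: Sequence[float],
                    ref_time: Sequence[float], alpha: float = 0.05):
    """Return (rate, Fay-Feuer interval, Tiwari-Clegg-Zou interval)."""
    total_ref_time = sum(ref_time)
    # w_j = (T_rj / T_r) * (1 / T_sj), so lambda_ds = sum_j w_j * d_sj
    w = [t_r / total_ref_time / t_s for t_s, t_r in zip(study_time, ref_time)]
    rate = sum(wj * d for wj, d in zip(w, events))
    v = sum(wj ** 2 * d for wj, d in zip(w, events))  # sum_j w_j^2 d_sj
    if rate <= 0 or v <= 0:
        raise ValueError("this sketch assumes at least one event")
    lower = v / (2 * rate) * chi2_ppf(alpha / 2, 2 * rate ** 2 / v)

    def upper(shift: float, shift_sq: float) -> float:
        mean, var = rate + shift, v + shift_sq
        return var / (2 * mean) * chi2_ppf(1 - alpha / 2, 2 * mean ** 2 / var)

    w_x = max(w)                              # Fay and Feuer: maximum weight
    w_m = sum(w) / len(w)                     # Tiwari et al.: average weight
    w_2m = sum(wj ** 2 for wj in w) / len(w)  # and average squared weight
    return rate, (lower, upper(w_x, w_x ** 2)), (lower, upper(w_m, w_2m))


rate, fay_feuer, tiwari = gamma_intervals([10, 30], [1000, 2000], [3000, 1000])
print(rate, fay_feuer, tiwari)
```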

Comparing Standardized Rates and Comparing Standardized Risks

When both are computed with the same reference population, two directly standardized rates or risks from different study populations can be compared by using either a difference statistic or a ratio statistic. Assume that ${\hat\beta }_{1}$ and ${\hat\beta }_{2}$ are the directly standardized rates or risks for two independent populations, with variances $V({\hat\beta }_{1})$ and $V({\hat\beta }_{2})$, respectively. The difference test assumes that the difference statistic

${\hat\beta }_{1} - {\hat\beta }_{2}$

has a normal distribution with mean 0 under the null hypothesis $H_0: {\beta }_{1} = {\beta }_{2}$. Assuming that the two populations are independent, its variance is given by

$V( {\hat\beta }_{1} - {\hat\beta }_{2} ) = V( {\hat\beta }_{1} ) + V( {\hat\beta }_{2} )$
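A minimal sketch of the difference test: the difference is divided by its standard error to give a z statistic, and a two-sided p-value is taken from the standard normal distribution. The function name and the second population's rate and variance are hypothetical.

```python
import math
from statistics import NormalDist


def difference_test(b1: float, v1: float, b2: float, v2: float):
    """Two-sided z test of H0: beta_1 = beta_2 based on b1 - b2."""
    diff = b1 - b2
    se = math.sqrt(v1 + v2)  # independence: V(b1 - b2) = V(b1) + V(b2)
    z = diff / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return diff, z, p_value


# Hypothetical standardized rates and variances for two study populations.
print(difference_test(0.01125, 6.09375e-6, 0.008, 4.0e-6))
```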

The ratio test assumes that the log ratio statistic,

$\mbox{log} \left( \frac{ {\hat\beta }_{1} }{ {\hat\beta }_{2} } \right)$

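has a normal distribution with mean 0 under the null hypothesis $H_0: {\beta }_{1} = {\beta }_{2}$, or equivalently, $H_0: \mbox{log} ({\beta }_{1} / {\beta }_{2}) = 0$. Applying the method of statistical differentials to each logarithm, and again assuming independent populations, an estimated variance is given by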

$V \left( \mbox{log} \left( \frac{{\hat\beta }_{1}}{{\hat\beta }_{2}} \right) \right) = V( \mbox{log}({\hat\beta }_{1}) ) + V( \mbox{log}({\hat\beta }_{2}) ) = \frac{1}{ {\hat\beta }_{1}^{2} } \; V( {\hat\beta }_{1} ) \; + \; \frac{1}{ {\hat\beta }_{2}^{2} } \; V( {\hat\beta }_{2} )$
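The ratio test can be sketched the same way on the log scale, with the variance of the log ratio computed as the sum of the two delta-method terms. The function name and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def ratio_test(b1: float, v1: float, b2: float, v2: float):
    """Two-sided z test of H0: log(beta_1 / beta_2) = 0."""
    if b1 <= 0 or b2 <= 0:
        raise ValueError("the log ratio needs positive estimates")
    log_ratio = math.log(b1) - math.log(b2)
    var_log_ratio = v1 / b1 ** 2 + v2 / b2 ** 2  # V(log b1) + V(log b2)
    z = log_ratio / math.sqrt(var_log_ratio)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return b1 / b2, z, p_value


print(ratio_test(0.01125, 6.09375e-6, 0.008, 4.0e-6))
```

Swapping the two populations only changes the sign of the z statistic, so the two-sided p-value does not depend on which population is placed in the numerator.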