66731 - Compute directly standardized stratum-specific rates or risks and confidence intervals

SUPPORT / SAMPLES & SAS NOTES

Support

Usage Note 66731: Compute directly standardized stratum-specific rates or risks and confidence intervals

Direct and indirect standardization of rates or risks is available in the SAS/STAT^® procedure STDRATE. The procedure displays a final table showing the overall standardized rate or risk. If the STATS option is specified in the STRATA statement, a table containing the crude stratum-specific rates or risks is produced. While the stratum-specific standardized rates or risks might also be of interest, these are not provided by PROC STDRATE. The following discusses and illustrates the computation of stratum-specific standardized rates and risks.

The directly standardized rate or risk in stratum j is

ℛ_rj ^ β _sj ℛ_r

where ^ β _sj is the crude rate or risk in stratum j of the study population, ℛ_rj is the population-time (for the rate) or number of observations (for the risk) in the jth stratum of the reference population, and ℛ_r = Σ_k ℛ_rk is the population-time (for the rate) or number of observations (for the risk) in the reference population.

Note that the stratum-specific standardized rate or risk is just the crude rate or risk multiplied by the proportion of population-time or observations for that stratum in the reference population. These proportions serve as weights applied to the observed crude rates or risks in the study population. The sum of the weighted crude rates over the strata yields the overall directly standardized rate or risk provided by PROC STDRATE.

The variance of the directly standardized rate or risk in stratum j is

ℛ_rj² V ( ^ β _sj ) ℛ_r²

where V ( ^ β _sj ) is the variance of the crude stratum-specific rate or risk. For the rate, assuming that the event counts in the study population are Poisson distributed, V ( ^ β _sj ) = ^ β _sj . For the risk, assuming that the event counts in the study population follow a binomial distribution, V ( ^ β _sj ) = [ ^ β _sj (1 - ^ β _sj) ] /𝒩_sj, where 𝒩_sj is the number of trials in stratum j of the study population.

The directly standardized stratum-specific rates or risks can be computed in several ways. The first method uses the above formulas to compute the rates or risks along with 100(1-α)% large sample confidence limits. Alternatively, note that a saturated Poisson model can be used to reproduce the observed crude strata rates in the study population. For risks, a saturated logistic model can be used. The second method fits this model in the NLMIXED procedure and uses its PREDICT statement to compute the product of the crude strata statistics and their corresponding proportion weights from the reference population yielding the standardized strata statistics. The PREDICT statement also provides confidence limits. The third method uses the same model-based approach but fits the model with the GENMOD procedure and uses the NLMeans macro to do the final computation.

The following sections use the example titled "Comparing Directly Standardized Rates" in the STDRATE documentation to illustrate the computation of directly standardized stratum-specific rates for the Alaska population. Computation of the overall standardized rate from the stratum-specific rates is also shown. Similar code can be used to compute directly standardized stratum-specific and overall risks.

Computation from formulas

The following statements add the population-times from the reference population (US) to the Alaska study population data. The above formulas are then computed to obtain the directly standardized stratum-specific rates as well as large sample 100(1-α)% confidence limits. Since the rate per 1,000 person-years is computed in the documentation example, the same multiplier is used in this section and the following sections when computing standardized rates.

      %let alpha=.05;
      %let mult=1000;
      data Strata; 
        merge Alaska US(rename=(PYear=RefPYear)); 
        run;
      proc sql noprint;
        create table DSRj as
        select Age, Sex, 
               &mult*Death/PYear as CrudeRatej,
               &mult*(calculated CrudeRatej)/PYear as Vcrj, 
               sqrt(calculated Vcrj) as SEcrj,
               RefPYear/sum(RefPYear) as RefPropj, 
               (calculated RefPropj)*(calculated CrudeRatej) as DSRj, 
               (calculated Vcrj)*(calculated RefPropj)**2 as Vdsrj,
               sqrt(calculated Vdsrj) as SEdsrj,
               (calculated DSRj)+(calculated SEdsrj)*quantile('normal',1-(&alpha/2)) as Upper,
               (calculated DSRj)-(calculated SEdsrj)*quantile('normal',1-(&alpha/2)) as Lower
        from Strata; 
        quit;
      proc print label;
         id Age Sex;
         var CrudeRatej SEcrj DSRj SEdsrj Lower Upper;
         label CrudeRatej="Crude Rate" SEcrj="Crude Standard Error"
               DSRj="Directly Standardized Rate" SEdsrj="DSR Standard Error";
         title "Directly Standardized Strata Rates";
         run;

Directly Standardized Strata Rates

Age	Sex	Crude Rate	Crude Standard Error	Directly Standardized Rate	DSR Standard Error	Lower	Upper
00-14	Male	0.456	0.07491	0.04995	0.00821	0.03386	0.06605
15-34	Male	0.726	0.08804	0.10371	0.01258	0.07906	0.12836
35-54	Male	1.897	0.13214	0.27594	0.01923	0.23826	0.31363
55-74	Male	10.501	0.54667	0.74437	0.03875	0.66843	0.82032
75+	Male	101.257	4.29424	2.19708	0.09318	2.01446	2.37971
00-14	Female	1.010	0.11440	0.10554	0.01195	0.08212	0.12897
15-34	Female	2.119	0.15751	0.29274	0.02176	0.25009	0.33539
35-54	Female	3.935	0.19798	0.58558	0.02946	0.52783	0.64333
55-74	Female	17.280	0.73350	1.39488	0.05921	1.27883	1.51093
75+	Female	62.200	2.84198	2.31947	0.10598	2.11175	2.52718

These additional statements obtain the overall directly standardized rate that matches the value provided by PROC STDRATE for the Alaska population.

      proc sql noprint;
        create table OverallDSR as
        select sum(DSRj) as DSR, sqrt(uss(SEdsrj)) as SEdsr, 
         (calculated DSR)+(calculated SEdsr)*quantile('normal',1-(&alpha/2)) as Upper,
         (calculated DSR)-(calculated SEdsr)*quantile('normal',1-(&alpha/2)) as Lower
        from DSRj;
        quit;
      proc print data=OverallDSR(obs=1) label noobs;
         var DSR SEdsr Lower Upper;
         label DSR="Rate" SEdsr="Standard Error";
         title "Overall Directly Standardized Rate";
         run;

Overall Directly Standardized Rate

Rate	Standard Error	Lower	Upper
8.06928	0.16432	7.74723	8.39134

Computation using saturated Poisson model (NLMIXED)

The following statements compute the weights (RefProp, the population-time proportions in the US reference population). The log of the population-time is also computed in each stratum for use as an offset in the Poisson model. These are combined with the Alaska data in data set Strata. A Poisson model on the crude rates is fit in PROC NLMIXED. The model includes the LnPYear offset as well as parameters for both the Sex and Age main effects and their interaction resulting in a saturated model. For details about modeling rates, see this note. Since the model is saturated, the predicted values from the first PREDICT statement reproduce the observed crude rates. The second PREDICT statement multiplies the reference population weights by the crude rates to produce the directly standardized stratum-specific rates, their standard errors and confidence limits.

      proc sql noprint;
         create table RefProps as
         select PYear/sum(PYear) as RefProp
         from US; 
         quit;
      data Strata; 
         merge Alaska RefProps; 
         LnPYear=log(PYear);
         run;
      %let alpha=0.05;
      %let mult=1000;
      proc nlmixed data=Strata df=1e8 alpha=&alpha;
         lambda=exp( b0 + b1*(Sex="Male")+
                     b2*(Age="00-14")+b3*(Age="15-34")+b4*(Age="35-54")+b5*(Age="55-74")+
                     b6*(Age="00-14")*(Sex="Male")+b7*(Age="15-34")*(Sex="Male")+
                     b8*(Age="35-54")*(Sex="Male")+b9*(Age="55-74")*(Sex="Male")+
                     LnPYear );
         model Death ~ poisson(lambda);
         predict &mult*lambda/PYear out=CrudeRates(rename=(pred=CrudeRate stderrpred=CrudeSE));
         predict (&mult*lambda/PYear)*RefProp out=StdRates;
         run;
      data StdRates;
         merge CrudeRates StdRates;
         run;
      proc print data=StdRates label;
         id Age Sex;
         var CrudeRate CrudeSE pred stderrpred lower upper;
         label CrudeRate="Crude Rate" CrudeSE="Crude Standard Error" 
               pred="Directly Standardized Strata Rate" stderrpred="Standard Error";
         title "Directly Standardized Strata Rates";
         run;

Directly Standardized Strata Rates

Age	Sex	Crude Rate	Crude Standard Error	Directly Standardized Strata Rate	Standard Error	Lower Confidence Limit	Upper Confidence Limit
00-14	Male	0.456	0.07491	0.04995	0.00821	0.03386	0.06605
15-34	Male	0.726	0.08804	0.10371	0.01258	0.07906	0.12836
35-54	Male	1.897	0.13214	0.27594	0.01923	0.23826	0.31363
55-74	Male	10.501	0.54667	0.74437	0.03875	0.66843	0.82032
75+	Male	101.257	4.29423	2.19708	0.09318	2.01446	2.37971
00-14	Female	1.010	0.11440	0.10554	0.01195	0.08212	0.12897
15-34	Female	2.119	0.15751	0.29274	0.02176	0.25010	0.33539
35-54	Female	3.935	0.19798	0.58558	0.02946	0.52783	0.64333
55-74	Female	17.280	0.73349	1.39488	0.05921	1.27884	1.51093
75+	Female	62.200	2.84196	2.31947	0.10598	2.11175	2.52718

These additional statements combine the stratum-specific rates to compute the overall directly standardized rate as shown by PROC STDRATE. The same table with the overall rate is displayed as shown in the previous section.

      proc means data=StdRates noprint;
         var pred stderrpred;
         output out=overallDSR sum(pred)=dsr uss(stderrpred)=Vdsr;
         run;
      data overallDSR; set overallDSR;
         SEdsr=sqrt(Vdsr);
         Upper=dsr+SEdsr*quantile('normal',1-(&alpha/2));
         Lower=dsr-SEdsr*quantile('normal',1-(&alpha/2));
         run;
      proc print data=overallDSR label noobs;
         var dsr sedsr lower upper;
         label dsr="Rate" sedsr="Standard Error";
         title "Overall Directly Standardized Rate";
         run;

Computation using saturated Poisson model (GENMOD and NLMeans)

The equivalent of the method shown above using PROC NLMIXED can be done using PROC GENMOD followed by the NLMeans macro. As before, the log of the population-time, LnPYear, is computed in each stratum. The Poisson model includes LnPYear as an offset as well as parameters for both the Sex and Age main effects and their interaction resulting in a saturated model. For details about modeling rates, see this note. Since this is a saturated model, the LS-means, transformed by the ILINK option, reproduce the crude rates. The contrast coefficients on the model parameters that define the rates are produced by the E option and saved by the ODS OUTPUT statement. The fitted model is saved by the STORE statement. These are used by the NLMeans macro.

      data Alaska; set Alaska;
        LnPYear=log(PYear);
        run;
      proc genmod data=Alaska;
        class Sex(order=data) Age;
        model Death=Sex|Age / dist=poisson link=log offset=LnPYear;
        lsmeans Sex*Age / ilink e;
        ods output coef=Coeffs;
        store SatPoi;
        run;

As discussed above, the crude rates multiplied by the weights from the reference population are the standardized rates. The NLMeans macro can do this multiplication by specifying a suitable matrix in the contrasts= option of the macro. For this purpose, the matrix, L, must be defined such that the product of L and the vector of crude rates computed from the model produces the desired vector of directly standardized strata rates. So, L needs to be a diagonal matrix with the weights on the diagonal. The multiplier is included in the product to produce rates per 1,000 person-years. The following statements first compute the weights and then produce the diagonal matrix L of weights in data set RefProps. The variables containing the matrix must be named K1, K2, and so on. A variable containing labels for the standardized rates is also created from the values of Age and Sex. Since there is only a single set of LS-means to be processed, the required variable, Set, is assigned the value 1 for all observations, indicating they are all part of set 1. Finally, the NLMeans macro is called, specifying the saved model, SatPoi, the data set of coefficients from the LSMEANS statement, Coeffs, and the data set containing the L matrix, RefProps. A title is specified for the displayed table of results.

      %let alpha=0.05;
      %let mult=1000;
      proc sql noprint;
        create table RefProps as
        select Age, Sex,
               cats(Sex,Age) as Label,
               PYear/sum(PYear) as RefProp
        from US; 
        quit;
      data RefProps; set RefProps;
        Set=1;
        array k k1-k10;
        do i=1 to dim(k);
          k(i)=0; if i=_n_ then k(i)=&mult*RefProp;
        end; 
        run;
      %NLMeans(instore=SatPoi, coef=Coeffs, link=log, contrasts=RefProps,
               title=Directly Standardized Strata Rates)

Direct Standardized Strata Rates

Label	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq	Alpha	Lower	Upper
Male00-14	0.04995	0.008212	37	<.0001	0.05	0.03386	0.06605
Male15-34	0.1037	0.01258	68	<.0001	0.05	0.07906	0.1284
Male35-54	0.2759	0.01923	206	<.0001	0.05	0.2383	0.3136
Male55-74	0.7444	0.03875	369	<.0001	0.05	0.6684	0.8203
Male75+	2.1971	0.09318	556	<.0001	0.05	2.0145	2.3797
Female00-14	0.1055	0.01195	78	<.0001	0.05	0.08212	0.1290
Female15-34	0.2927	0.02176	181	<.0001	0.05	0.2501	0.3354
Female35-54	0.5856	0.02946	395	<.0001	0.05	0.5278	0.6433
Female55-74	1.3949	0.05921	555	<.0001	0.05	1.2788	1.5109
Female75+	2.3195	0.1060	479	<.0001	0.05	2.1118	2.5272

These statements compute the overall standardized rate using the output data set, EST, from the NLMeans macro which contains the strata-specific rates. The same table with the overall rate is displayed as shown in the first section above.

      proc means data=Est noprint;
         var Estimate StandardError;
         output out=overallDSR sum(Estimate)=dsr uss(StandardError)=Vdsr;
         run;
      data overallDSR; set overallDSR;
         SEdsr=sqrt(Vdsr);
         Upper=dsr+SEdsr*quantile('normal',1-(&alpha/2));
         Lower=dsr-SEdsr*quantile('normal',1-(&alpha/2));
         run;
      proc print data=overallDSR label noobs;
         var dsr sedsr lower upper;
         label dsr="Directly Standardized Rate" sedsr="Standard Error";
         title "Overall Directly Standardized Rate";
         run;

Operating System and Release Information

Product Family	Product	System	SAS Release
Product Family	Product	System	Reported	Fixed*
SAS System	N/A	Aster Data nCluster on Linux x64
		DB2 Universal Database on AIX
		DB2 Universal Database on Linux x64
		Netezza TwinFin 32-bit SMP Hosts
		Netezza TwinFin 32bit blade
		Netezza TwinFin 64-bit S-Blades
		Netezza TwinFin 64-bit SMP Hosts
		Teradata on Linux
		Cloud Foundry
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for AArch64
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX
		z/OS
		z/OS 64-bit
		IBM AS/400
		OpenVMS VAX
		N/A
		Android Operating System
		Apple Mobile Operating System
		Chrome Web Browser
		Macintosh
		Macintosh on x64
		Microsoft Windows 10
		Microsoft Windows 7
		Microsoft Windows 8 Enterprise 32-bit
		Microsoft Windows 8 Enterprise x64
		Microsoft Windows 8 Pro 32-bit
		Microsoft Windows 8 Pro x64
		Microsoft Windows 8 x64
		Microsoft Windows Server 2008 R2
		Microsoft Windows Server 2012 R2 Datacenter
		Microsoft Windows Server 2012 R2 Std
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		SAS Cloud
		Microsoft Windows 8.1 Enterprise 32-bit
		Microsoft Windows 8.1 Enterprise x64
		Microsoft Windows 8.1 Pro 32-bit
		Microsoft Windows 8.1 Pro x64
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2003 for x64
Microsoft Windows Server 2008
Microsoft Windows Server 2008 for x64
Microsoft Windows Server 2012 Datacenter
Microsoft Windows Server 2012 Std
Microsoft Windows Server 2016
Microsoft Windows Server 2019
Microsoft Windows XP Professional
Windows 7 Enterprise 32 bit
Windows 7 Enterprise x64
Windows 7 Home Premium 32 bit
Windows 7 Home Premium x64
Windows 7 Professional 32 bit
Windows 7 Professional x64
Windows 7 Ultimate 32 bit
Windows 7 Ultimate x64
Windows Millennium Edition (Me)
Windows Vista
Windows Vista for x64

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	Analytics ==> Categorical Data Analysis SAS Reference ==> Procedures ==> GENMOD SAS Reference ==> Procedures ==> NLMIXED SAS Reference ==> Procedures ==> STDRATE SAS Reference ==> Macro

Date Modified:	2020-10-08 16:40:45
Date Created:	2020-10-07 15:03:29