SUPPORT / SAMPLES & SAS NOTES
 

Support

Usage Note 66731: Compute directly standardized stratum-specific rates or risks and confidence intervals

DetailsAboutRate It

Direct and indirect standardization of rates or risks is available in the SAS/STAT® procedure STDRATE. The procedure displays a final table showing the overall standardized rate or risk. If the STATS option is specified in the STRATA statement, a table containing the crude stratum-specific rates or risks is produced. While the stratum-specific standardized rates or risks might also be of interest, these are not provided by PROC STDRATE. The following discusses and illustrates the computation of stratum-specific standardized rates and risks.

The directly standardized rate or risk in stratum j is

rj ^ β sj r

where ^ β sj is the crude rate or risk in stratum j of the study population, rj is the population-time (for the rate) or number of observations (for the risk) in the jth stratum of the reference population, and r = Σk rk is the population-time (for the rate) or number of observations (for the risk) in the reference population.

Note that the stratum-specific standardized rate or risk is just the crude rate or risk multiplied by the proportion of population-time or observations for that stratum in the reference population. These proportions serve as weights applied to the observed crude rates or risks in the study population. The sum of the weighted crude rates over the strata yields the overall directly standardized rate or risk provided by PROC STDRATE.

The variance of the directly standardized rate or risk in stratum j is

rj2 V ( ^ β sj ) r2

where V ( ^ β sj ) is the variance of the crude stratum-specific rate or risk. For the rate, assuming that the event counts in the study population are Poisson distributed, V ( ^ β sj ) = ^ β sj . For the risk, assuming that the event counts in the study population follow a binomial distribution, V ( ^ β sj ) = [ ^ β sj (1 - ^ β sj) ] /𝒩sj, where 𝒩sj is the number of trials in stratum j of the study population.

The directly standardized stratum-specific rates or risks can be computed in several ways. The first method uses the above formulas to compute the rates or risks along with 100(1-α)% large sample confidence limits. Alternatively, note that a saturated Poisson model can be used to reproduce the observed crude strata rates in the study population. For risks, a saturated logistic model can be used. The second method fits this model in the NLMIXED procedure and uses its PREDICT statement to compute the product of the crude strata statistics and their corresponding proportion weights from the reference population yielding the standardized strata statistics. The PREDICT statement also provides confidence limits. The third method uses the same model-based approach but fits the model with the GENMOD procedure and uses the NLMeans macro to do the final computation.

The following sections use the example titled "Comparing Directly Standardized Rates" in the STDRATE documentation to illustrate the computation of directly standardized stratum-specific rates for the Alaska population. Computation of the overall standardized rate from the stratum-specific rates is also shown. Similar code can be used to compute directly standardized stratum-specific and overall risks.

Computation from formulas

The following statements add the population-times from the reference population (US) to the Alaska study population data. The above formulas are then computed to obtain the directly standardized stratum-specific rates as well as large sample 100(1-α)% confidence limits. Since the rate per 1,000 person-years is computed in the documentation example, the same multiplier is used in this section and the following sections when computing standardized rates.

      %let alpha=.05;
      %let mult=1000;
      data Strata; 
        merge Alaska US(rename=(PYear=RefPYear)); 
        run;
      proc sql noprint;
        create table DSRj as
        select Age, Sex, 
               &mult*Death/PYear as CrudeRatej,
               &mult*(calculated CrudeRatej)/PYear as Vcrj, 
               sqrt(calculated Vcrj) as SEcrj,
               RefPYear/sum(RefPYear) as RefPropj, 
               (calculated RefPropj)*(calculated CrudeRatej) as DSRj, 
               (calculated Vcrj)*(calculated RefPropj)**2 as Vdsrj,
               sqrt(calculated Vdsrj) as SEdsrj,
               (calculated DSRj)+(calculated SEdsrj)*quantile('normal',1-(&alpha/2)) as Upper,
               (calculated DSRj)-(calculated SEdsrj)*quantile('normal',1-(&alpha/2)) as Lower
        from Strata; 
        quit;
      proc print label;
         id Age Sex;
         var CrudeRatej SEcrj DSRj SEdsrj Lower Upper;
         label CrudeRatej="Crude Rate" SEcrj="Crude Standard Error"
               DSRj="Directly Standardized Rate" SEdsrj="DSR Standard Error";
         title "Directly Standardized Strata Rates";
         run;
Directly Standardized Strata Rates

Age Sex Crude Rate Crude Standard
Error
Directly Standardized
Rate
DSR Standard
Error
Lower Upper
00-14 Male 0.456 0.07491 0.04995 0.00821 0.03386 0.06605
15-34 Male 0.726 0.08804 0.10371 0.01258 0.07906 0.12836
35-54 Male 1.897 0.13214 0.27594 0.01923 0.23826 0.31363
55-74 Male 10.501 0.54667 0.74437 0.03875 0.66843 0.82032
75+ Male 101.257 4.29424 2.19708 0.09318 2.01446 2.37971
00-14 Female 1.010 0.11440 0.10554 0.01195 0.08212 0.12897
15-34 Female 2.119 0.15751 0.29274 0.02176 0.25009 0.33539
35-54 Female 3.935 0.19798 0.58558 0.02946 0.52783 0.64333
55-74 Female 17.280 0.73350 1.39488 0.05921 1.27883 1.51093
75+ Female 62.200 2.84198 2.31947 0.10598 2.11175 2.52718

These additional statements obtain the overall directly standardized rate that matches the value provided by PROC STDRATE for the Alaska population.

      proc sql noprint;
        create table OverallDSR as
        select sum(DSRj) as DSR, sqrt(uss(SEdsrj)) as SEdsr, 
         (calculated DSR)+(calculated SEdsr)*quantile('normal',1-(&alpha/2)) as Upper,
         (calculated DSR)-(calculated SEdsr)*quantile('normal',1-(&alpha/2)) as Lower
        from DSRj;
        quit;
      proc print data=OverallDSR(obs=1) label noobs;
         var DSR SEdsr Lower Upper;
         label DSR="Rate" SEdsr="Standard Error";
         title "Overall Directly Standardized Rate";
         run;
Overall Directly Standardized Rate

Rate Standard Error Lower Upper
8.06928 0.16432 7.74723 8.39134

Computation using saturated Poisson model (NLMIXED)

The following statements compute the weights (RefProp, the population-time proportions in the US reference population). The log of the population-time is also computed in each stratum for use as an offset in the Poisson model. These are combined with the Alaska data in data set Strata. A Poisson model on the crude rates is fit in PROC NLMIXED. The model includes the LnPYear offset as well as parameters for both the Sex and Age main effects and their interaction resulting in a saturated model. For details about modeling rates, see this note. Since the model is saturated, the predicted values from the first PREDICT statement reproduce the observed crude rates. The second PREDICT statement multiplies the reference population weights by the crude rates to produce the directly standardized stratum-specific rates, their standard errors and confidence limits.

      proc sql noprint;
         create table RefProps as
         select PYear/sum(PYear) as RefProp
         from US; 
         quit;
      data Strata; 
         merge Alaska RefProps; 
         LnPYear=log(PYear);
         run;
      %let alpha=0.05;
      %let mult=1000;
      proc nlmixed data=Strata df=1e8 alpha=α
         lambda=exp( b0 + b1*(Sex="Male")+
                     b2*(Age="00-14")+b3*(Age="15-34")+b4*(Age="35-54")+b5*(Age="55-74")+
                     b6*(Age="00-14")*(Sex="Male")+b7*(Age="15-34")*(Sex="Male")+
                     b8*(Age="35-54")*(Sex="Male")+b9*(Age="55-74")*(Sex="Male")+
                     LnPYear );
         model Death ~ poisson(lambda);
         predict &mult*lambda/PYear out=CrudeRates(rename=(pred=CrudeRate stderrpred=CrudeSE));
         predict (&mult*lambda/PYear)*RefProp out=StdRates;
         run;
      data StdRates;
         merge CrudeRates StdRates;
         run;
      proc print data=StdRates label;
         id Age Sex;
         var CrudeRate CrudeSE pred stderrpred lower upper;
         label CrudeRate="Crude Rate" CrudeSE="Crude Standard Error" 
               pred="Directly Standardized Strata Rate" stderrpred="Standard Error";
         title "Directly Standardized Strata Rates";
         run;
Directly Standardized Strata Rates

Age Sex Crude Rate Crude Standard
Error
Directly Standardized
Strata Rate
Standard Error Lower Confidence
Limit
Upper Confidence
Limit
00-14 Male 0.456 0.07491 0.04995 0.00821 0.03386 0.06605
15-34 Male 0.726 0.08804 0.10371 0.01258 0.07906 0.12836
35-54 Male 1.897 0.13214 0.27594 0.01923 0.23826 0.31363
55-74 Male 10.501 0.54667 0.74437 0.03875 0.66843 0.82032
75+ Male 101.257 4.29423 2.19708 0.09318 2.01446 2.37971
00-14 Female 1.010 0.11440 0.10554 0.01195 0.08212 0.12897
15-34 Female 2.119 0.15751 0.29274 0.02176 0.25010 0.33539
35-54 Female 3.935 0.19798 0.58558 0.02946 0.52783 0.64333
55-74 Female 17.280 0.73349 1.39488 0.05921 1.27884 1.51093
75+ Female 62.200 2.84196 2.31947 0.10598 2.11175 2.52718

These additional statements combine the stratum-specific rates to compute the overall directly standardized rate as shown by PROC STDRATE. The same table with the overall rate is displayed as shown in the previous section.

      proc means data=StdRates noprint;
         var pred stderrpred;
         output out=overallDSR sum(pred)=dsr uss(stderrpred)=Vdsr;
         run;
      data overallDSR; set overallDSR;
         SEdsr=sqrt(Vdsr);
         Upper=dsr+SEdsr*quantile('normal',1-(&alpha/2));
         Lower=dsr-SEdsr*quantile('normal',1-(&alpha/2));
         run;
      proc print data=overallDSR label noobs;
         var dsr sedsr lower upper;
         label dsr="Rate" sedsr="Standard Error";
         title "Overall Directly Standardized Rate";
         run;

Computation using saturated Poisson model (GENMOD and NLMeans)

The equivalent of the method shown above using PROC NLMIXED can be done using PROC GENMOD followed by the NLMeans macro. As before, the log of the population-time, LnPYear, is computed in each stratum. The Poisson model includes LnPYear as an offset as well as parameters for both the Sex and Age main effects and their interaction resulting in a saturated model. For details about modeling rates, see this note. Since this is a saturated model, the LS-means, transformed by the ILINK option, reproduce the crude rates. The contrast coefficients on the model parameters that define the rates are produced by the E option and saved by the ODS OUTPUT statement. The fitted model is saved by the STORE statement. These are used by the NLMeans macro.

      data Alaska; set Alaska;
        LnPYear=log(PYear);
        run;
      proc genmod data=Alaska;
        class Sex(order=data) Age;
        model Death=Sex|Age / dist=poisson link=log offset=LnPYear;
        lsmeans Sex*Age / ilink e;
        ods output coef=Coeffs;
        store SatPoi;
        run;

As discussed above, the crude rates multiplied by the weights from the reference population are the standardized rates. The NLMeans macro can do this multiplication by specifying a suitable matrix in the contrasts= option of the macro. For this purpose, the matrix, L, must be defined such that the product of L and the vector of crude rates computed from the model produces the desired vector of directly standardized strata rates. So, L needs to be a diagonal matrix with the weights on the diagonal. The multiplier is included in the product to produce rates per 1,000 person-years. The following statements first compute the weights and then produce the diagonal matrix L of weights in data set RefProps. The variables containing the matrix must be named K1, K2, and so on. A variable containing labels for the standardized rates is also created from the values of Age and Sex. Since there is only a single set of LS-means to be processed, the required variable, Set, is assigned the value 1 for all observations, indicating they are all part of set 1. Finally, the NLMeans macro is called, specifying the saved model, SatPoi, the data set of coefficients from the LSMEANS statement, Coeffs, and the data set containing the L matrix, RefProps. A title is specified for the displayed table of results.

      %let alpha=0.05;
      %let mult=1000;
      proc sql noprint;
        create table RefProps as
        select Age, Sex,
               cats(Sex,Age) as Label,
               PYear/sum(PYear) as RefProp
        from US; 
        quit;
      data RefProps; set RefProps;
        Set=1;
        array k k1-k10;
        do i=1 to dim(k);
          k(i)=0; if i=_n_ then k(i)=&mult*RefProp;
        end; 
        run;
      %NLMeans(instore=SatPoi, coef=Coeffs, link=log, contrasts=RefProps,
               title=Directly Standardized Strata Rates)
Direct Standardized Strata Rates

Label Estimate Standard Error Wald Chi-Square Pr > ChiSq Alpha Lower Upper
Male00-14 0.04995 0.008212 37 <.0001 0.05 0.03386 0.06605
Male15-34 0.1037 0.01258 68 <.0001 0.05 0.07906 0.1284
Male35-54 0.2759 0.01923 206 <.0001 0.05 0.2383 0.3136
Male55-74 0.7444 0.03875 369 <.0001 0.05 0.6684 0.8203
Male75+ 2.1971 0.09318 556 <.0001 0.05 2.0145 2.3797
Female00-14 0.1055 0.01195 78 <.0001 0.05 0.08212 0.1290
Female15-34 0.2927 0.02176 181 <.0001 0.05 0.2501 0.3354
Female35-54 0.5856 0.02946 395 <.0001 0.05 0.5278 0.6433
Female55-74 1.3949 0.05921 555 <.0001 0.05 1.2788 1.5109
Female75+ 2.3195 0.1060 479 <.0001 0.05 2.1118 2.5272

These statements compute the overall standardized rate using the output data set, EST, from the NLMeans macro which contains the strata-specific rates. The same table with the overall rate is displayed as shown in the first section above.

      proc means data=Est noprint;
         var Estimate StandardError;
         output out=overallDSR sum(Estimate)=dsr uss(StandardError)=Vdsr;
         run;
      data overallDSR; set overallDSR;
         SEdsr=sqrt(Vdsr);
         Upper=dsr+SEdsr*quantile('normal',1-(&alpha/2));
         Lower=dsr-SEdsr*quantile('normal',1-(&alpha/2));
         run;
      proc print data=overallDSR label noobs;
         var dsr sedsr lower upper;
         label dsr="Directly Standardized Rate" sedsr="Standard Error";
         title "Overall Directly Standardized Rate";
         run;


Operating System and Release Information

Product FamilyProductSystemSAS Release
ReportedFixed*
SAS SystemN/AAster Data nCluster on Linux x64
DB2 Universal Database on AIX
DB2 Universal Database on Linux x64
Netezza TwinFin 32-bit SMP Hosts
Netezza TwinFin 32bit blade
Netezza TwinFin 64-bit S-Blades
Netezza TwinFin 64-bit SMP Hosts
Teradata on Linux
Cloud Foundry
64-bit Enabled AIX
64-bit Enabled HP-UX
64-bit Enabled Solaris
ABI+ for Intel Architecture
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for AArch64
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX
z/OS
z/OS 64-bit
IBM AS/400
OpenVMS VAX
N/A
Android Operating System
Apple Mobile Operating System
Chrome Web Browser
Macintosh
Macintosh on x64
Microsoft Windows 10
Microsoft Windows 7
Microsoft Windows 8 Enterprise 32-bit
Microsoft Windows 8 Enterprise x64
Microsoft Windows 8 Pro 32-bit
Microsoft Windows 8 Pro x64
Microsoft Windows 8 x64
Microsoft Windows Server 2008 R2
Microsoft Windows Server 2012 R2 Datacenter
Microsoft Windows Server 2012 R2 Std
Microsoft® Windows® for 64-Bit Itanium-based Systems
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows XP 64-bit Edition
Microsoft® Windows® for x64
OS/2
SAS Cloud
Microsoft Windows 8.1 Enterprise 32-bit
Microsoft Windows 8.1 Enterprise x64
Microsoft Windows 8.1 Pro 32-bit
Microsoft Windows 8.1 Pro x64
Microsoft Windows 95/98
Microsoft Windows 2000 Advanced Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows 2000 Server
Microsoft Windows 2000 Professional
Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2003 for x64
Microsoft Windows Server 2008
Microsoft Windows Server 2008 for x64
Microsoft Windows Server 2012 Datacenter
Microsoft Windows Server 2012 Std
Microsoft Windows Server 2016
Microsoft Windows Server 2019
Microsoft Windows XP Professional
Windows 7 Enterprise 32 bit
Windows 7 Enterprise x64
Windows 7 Home Premium 32 bit
Windows 7 Home Premium x64
Windows 7 Professional 32 bit
Windows 7 Professional x64
Windows 7 Ultimate 32 bit
Windows 7 Ultimate x64
Windows Millennium Edition (Me)
Windows Vista
Windows Vista for x64
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.