Direct and indirect standardization of rates or risks is available in the SAS/STAT® procedure STDRATE. The procedure displays a final table showing the overall standardized rate or risk. If the STATS option is specified in the STRATA statement, a table containing the crude stratum-specific rates or risks is produced. While the stratum-specific standardized rates or risks might also be of interest, these are not provided by PROC STDRATE. The following discusses and illustrates the computation of stratum-specific standardized rates and risks.
The directly standardized rate or risk in stratum j is
$$\frac{\mathcal{R}_{rj}}{\mathcal{R}_{r}}\,\hat{\beta}_{sj}$$
where $\hat{\beta}_{sj}$ is the crude rate or risk in stratum j of the study population, $\mathcal{R}_{rj}$ is the population-time (for the rate) or number of observations (for the risk) in the jth stratum of the reference population, and $\mathcal{R}_{r} = \sum_k \mathcal{R}_{rk}$ is the population-time (for the rate) or number of observations (for the risk) in the reference population.
Note that the stratum-specific standardized rate or risk is just the crude rate or risk multiplied by the proportion of population-time or observations for that stratum in the reference population. These proportions serve as weights applied to the observed crude rates or risks in the study population. The sum of the weighted crude rates over the strata yields the overall directly standardized rate or risk provided by PROC STDRATE.
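The weighting can be seen in a small, made-up example. The following sketch uses a hypothetical two-stratum data set (ToyStrata, with invented crude rates and reference population-times that do not come from the STDRATE documentation) and applies the same RefPYear/sum(RefPYear) computation used later in this note.

```sas
/* Hypothetical illustration only: the strata and numbers are invented. */
data ToyStrata;
   input Stratum $ CrudeRate RefPYear;
   datalines;
A 2.0 30000
B 5.0 10000
;
run;

proc sql;
   /* The reference proportion is the weight; weight*crude rate is the */
   /* stratum-specific directly standardized rate.                     */
   select Stratum,
          CrudeRate,
          RefPYear/sum(RefPYear) as RefProp,
          (calculated RefProp)*CrudeRate as StdRate
   from ToyStrata;
quit;
```

With these invented values the weights are 0.75 and 0.25, the stratum-specific standardized rates are 1.5 and 1.25, and their sum, 2.75, is the overall directly standardized rate.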
The variance of the directly standardized rate or risk in stratum j is
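$$\frac{\mathcal{R}_{rj}^{2}}{\mathcal{R}_{r}^{2}}\,V(\hat{\beta}_{sj})$$
where $V(\hat{\beta}_{sj})$ is the variance of the crude stratum-specific rate or risk. For the rate, assuming that the event counts in the study population are Poisson distributed, $V(\hat{\beta}_{sj}) = \hat{\beta}_{sj} / \mathcal{T}_{sj}$, where $\mathcal{T}_{sj}$ is the population-time in stratum j of the study population. For the risk, assuming that the event counts in the study population follow a binomial distribution, $V(\hat{\beta}_{sj}) = \hat{\beta}_{sj}(1 - \hat{\beta}_{sj}) / \mathcal{N}_{sj}$, where $\mathcal{N}_{sj}$ is the number of trials in stratum j of the study population.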
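The variance expression follows because the reference-population weight is treated as a fixed constant. As a brief sketch, write $w_j = \mathcal{R}_{rj}/\mathcal{R}_{r}$ for the weight and $z_{1-\alpha/2}$ for the standard normal quantile (both are shorthand introduced here for illustration). The large-sample confidence limits shown are the Wald-type limits that the code later in this note computes.

```latex
\begin{aligned}
V\!\left(w_j\,\hat{\beta}_{sj}\right)
  &= w_j^{2}\,V(\hat{\beta}_{sj})
   = \frac{\mathcal{R}_{rj}^{2}}{\mathcal{R}_{r}^{2}}\,V(\hat{\beta}_{sj}) \\[4pt]
\text{limits:}\quad
  & w_j\,\hat{\beta}_{sj} \;\pm\; z_{1-\alpha/2}\; w_j \sqrt{V(\hat{\beta}_{sj})}
\end{aligned}
```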
The directly standardized stratum-specific rates or risks can be computed in three ways. The first method applies the above formulas directly and adds 100(1-α)% large-sample confidence limits. The second and third methods are model-based: a saturated Poisson model reproduces the observed crude stratum rates in the study population, and a saturated logistic model does the same for risks. The second method fits this model in the NLMIXED procedure and uses its PREDICT statement to multiply the crude stratum statistics by their corresponding proportion weights from the reference population, yielding the standardized stratum statistics along with confidence limits. The third method fits the same model with the GENMOD procedure and uses the NLMeans macro to do the final computation.
The following sections use the example titled "Comparing Directly Standardized Rates" in the STDRATE documentation to illustrate the computation of directly standardized stratum-specific rates for the Alaska population. Computation of the overall standardized rate from the stratum-specific rates is also shown. Similar code can be used to compute directly standardized stratum-specific and overall risks.
The following statements add the population-times from the reference population (US) to the Alaska study population data. The above formulas are then computed to obtain the directly standardized stratum-specific rates as well as large sample 100(1-α)% confidence limits. Since the rate per 1,000 person-years is computed in the documentation example, the same multiplier is used in this section and the following sections when computing standardized rates.
%let alpha=.05;
%let mult=1000;

/* Add the reference (US) population-times to the Alaska strata */
data Strata;
   merge Alaska US(rename=(PYear=RefPYear));
run;

proc sql noprint;
   create table DSRj as
   select Age, Sex,
          /* crude rate per &mult person-years, its variance and standard error */
          &mult*Death/PYear as CrudeRatej,
          &mult*(calculated CrudeRatej)/PYear as Vcrj,
          sqrt(calculated Vcrj) as SEcrj,
          /* reference-population weight for the stratum */
          RefPYear/sum(RefPYear) as RefPropj,
          /* directly standardized rate, its variance and standard error */
          (calculated RefPropj)*(calculated CrudeRatej) as DSRj,
          (calculated Vcrj)*(calculated RefPropj)**2 as Vdsrj,
          sqrt(calculated Vdsrj) as SEdsrj,
          /* large-sample confidence limits */
          (calculated DSRj)+(calculated SEdsrj)*quantile('normal',1-(&alpha/2)) as Upper,
          (calculated DSRj)-(calculated SEdsrj)*quantile('normal',1-(&alpha/2)) as Lower
   from Strata;
quit;

proc print label;
   id Age Sex;
   var CrudeRatej SEcrj DSRj SEdsrj Lower Upper;
   label CrudeRatej="Crude Rate"
         SEcrj="Crude Standard Error"
         DSRj="Directly Standardized Rate"
         SEdsrj="DSR Standard Error";
   title "Directly Standardized Strata Rates";
run;
These additional statements obtain the overall directly standardized rate that matches the value provided by PROC STDRATE for the Alaska population.
proc sql noprint;
   create table OverallDSR as
   select sum(DSRj) as DSR,
          /* USS is the uncorrected sum of squares of the stratum standard errors */
          sqrt(uss(SEdsrj)) as SEdsr,
          (calculated DSR)+(calculated SEdsr)*quantile('normal',1-(&alpha/2)) as Upper,
          (calculated DSR)-(calculated SEdsr)*quantile('normal',1-(&alpha/2)) as Lower
   from DSRj;
quit;

proc print data=OverallDSR(obs=1) label noobs;
   var DSR SEdsr Lower Upper;
   label DSR="Rate" SEdsr="Standard Error";
   title "Overall Directly Standardized Rate";
run;
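To check the overall value against PROC STDRATE, a call along the following lines could be used. This is a sketch modeled on the syntax of the STDRATE documentation example, not code from the original note. It assumes that the Alaska and US data sets contain the variables Sex, Age, Death, and PYear as they are used elsewhere in this note.

```sas
/* Sketch for comparison only; assumes Alaska has Sex, Age, Death, PYear */
/* and US has Sex, Age, PYear.                                           */
proc stdrate data=Alaska refdata=US
             method=direct
             stat=rate(mult=1000);
   population event=Death total=PYear;
   reference  total=PYear;
   strata Sex Age / stats;   /* STATS adds the crude stratum-specific rates */
run;
```

The overall standardized rate in the final table from this step should agree with the DSR value printed above, and the crude rates in the strata table should agree with the CrudeRatej column.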
The following statements compute the weights (RefProp, the population-time proportions in the US reference population). The log of the population-time is also computed in each stratum for use as an offset in the Poisson model. These are combined with the Alaska data in data set Strata. A Poisson model on the crude rates is fit in PROC NLMIXED. The model includes the LnPYear offset as well as parameters for both the Sex and Age main effects and their interaction resulting in a saturated model. For details about modeling rates, see this note. Since the model is saturated, the predicted values from the first PREDICT statement reproduce the observed crude rates. The second PREDICT statement multiplies the reference population weights by the crude rates to produce the directly standardized stratum-specific rates, their standard errors and confidence limits.
/* Reference-population weights */
proc sql noprint;
   create table RefProps as
   select PYear/sum(PYear) as RefProp
   from US;
quit;

/* Combine the weights with the Alaska data and compute the offset */
data Strata;
   merge Alaska RefProps;
   LnPYear=log(PYear);
run;

%let alpha=0.05;
%let mult=1000;

proc nlmixed data=Strata df=1e8 alpha=&alpha;
   /* Saturated Poisson model: Sex, Age, and Sex*Age, with LnPYear as offset */
   lambda=exp( b0 + b1*(Sex="Male")+
               b2*(Age="00-14")+b3*(Age="15-34")+b4*(Age="35-54")+b5*(Age="55-74")+
               b6*(Age="00-14")*(Sex="Male")+b7*(Age="15-34")*(Sex="Male")+
               b8*(Age="35-54")*(Sex="Male")+b9*(Age="55-74")*(Sex="Male")+
               LnPYear );
   model Death ~ poisson(lambda);
   /* Crude rates */
   predict &mult*lambda/PYear out=CrudeRates(rename=(pred=CrudeRate stderrpred=CrudeSE));
   /* Directly standardized stratum-specific rates */
   predict (&mult*lambda/PYear)*RefProp out=StdRates;
run;

data StdRates;
   merge CrudeRates StdRates;
run;

proc print data=StdRates label;
   id Age Sex;
   var CrudeRate CrudeSE pred stderrpred lower upper;
   label CrudeRate="Crude Rate"
         CrudeSE="Crude Standard Error"
         pred="Directly Standardized Strata Rate"
         stderrpred="Standard Error";
   title "Directly Standardized Strata Rates";
run;
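The claim that the saturated model reproduces the crude rates, with the same standard errors as the formula-based method, can be sketched as follows. The notation is introduced here for illustration: $D_j$ is the death count and $\mathcal{T}_{sj}$ the population-time in stratum j, $\eta_j$ is the linear predictor without the offset, $w_j$ is the reference weight, and $m$ is the multiplier. The sketch assumes $D_j > 0$ in every stratum and relies on the delta method, which is what the PREDICT statement uses for its standard errors.

```latex
\begin{aligned}
\lambda_j &= \exp(\eta_j + \ln \mathcal{T}_{sj}) = \mathcal{T}_{sj}\, e^{\eta_j},
  \qquad \text{one free } \eta_j \text{ per stratum (saturated)} \\[4pt]
\frac{\partial \ell}{\partial \eta_j} &= D_j - \lambda_j = 0
  \;\Longrightarrow\; \hat{\lambda}_j = D_j,
  \qquad \frac{m\,\hat{\lambda}_j}{\mathcal{T}_{sj}} = \frac{m\,D_j}{\mathcal{T}_{sj}}
  \quad \text{(the crude rate)} \\[4pt]
V(\hat{\eta}_j) &\approx \frac{1}{D_j}
  \;\Longrightarrow\;
  V(\hat{\lambda}_j) \approx \hat{\lambda}_j^{2}\, V(\hat{\eta}_j) = D_j \\[4pt]
\mathrm{SE}\!\left(w_j \frac{m\,\hat{\lambda}_j}{\mathcal{T}_{sj}}\right)
  &\approx \frac{m\, w_j \sqrt{D_j}}{\mathcal{T}_{sj}}
\end{aligned}
```

The last line matches the standard error from the direct formulas, where the crude rate has variance $\hat{\beta}_{sj}/\mathcal{T}_{sj} = D_j/\mathcal{T}_{sj}^{2}$ before scaling by $m$ and $w_j$.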
These additional statements combine the stratum-specific rates to compute the overall directly standardized rate as shown by PROC STDRATE. The same table with the overall rate is displayed as shown in the previous section.
proc means data=StdRates noprint;
   var pred stderrpred;
   /* Sum of the stratum rates and sum of squared stratum standard errors */
   output out=overallDSR sum(pred)=dsr uss(stderrpred)=Vdsr;
run;

data overallDSR;
   set overallDSR;
   SEdsr=sqrt(Vdsr);
   Upper=dsr+SEdsr*quantile('normal',1-(&alpha/2));
   Lower=dsr-SEdsr*quantile('normal',1-(&alpha/2));
run;

proc print data=overallDSR label noobs;
   var dsr sedsr lower upper;
   label dsr="Rate" sedsr="Standard Error";
   title "Overall Directly Standardized Rate";
run;
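The combination step in this code can be written out as a short formula. This is a sketch that assumes the stratum estimates are independent; $\widehat{\mathrm{DSR}}_j$ and $\mathrm{SE}_j$ are shorthand introduced here for the stratum-specific standardized rate and its standard error. The USS statistic is the uncorrected sum of squares, so its square root gives the overall standard error.

```latex
\begin{aligned}
\widehat{\mathrm{DSR}} &= \sum_j \widehat{\mathrm{DSR}}_j \\[4pt]
\mathrm{SE}\big(\widehat{\mathrm{DSR}}\big) &= \sqrt{\sum_j \mathrm{SE}_j^{2}} \\[4pt]
\text{limits:}\quad & \widehat{\mathrm{DSR}} \;\pm\; z_{1-\alpha/2}\;\mathrm{SE}\big(\widehat{\mathrm{DSR}}\big)
\end{aligned}
```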
The equivalent of the method shown above using PROC NLMIXED can be done using PROC GENMOD followed by the NLMeans macro. As before, the log of the population-time, LnPYear, is computed in each stratum. The Poisson model includes LnPYear as an offset as well as parameters for both the Sex and Age main effects and their interaction resulting in a saturated model. For details about modeling rates, see this note. Since this is a saturated model, the LS-means, transformed by the ILINK option, reproduce the crude rates. The contrast coefficients on the model parameters that define the rates are produced by the E option and saved by the ODS OUTPUT statement. The fitted model is saved by the STORE statement. These are used by the NLMeans macro.
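/* Log population-time, used as the offset in the Poisson model */
data Alaska;
   set Alaska;
   LnPYear=log(PYear);
run;

proc genmod data=Alaska;
   class Sex(order=data) Age;
   model Death=Sex|Age / dist=poisson link=log offset=LnPYear;
   /* ILINK returns the LS-means on the rate scale; E displays the coefficients */
   lsmeans Sex*Age / ilink e;
   ods output coef=Coeffs;
   store SatPoi;
run;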
As discussed above, the crude rates multiplied by the weights from the reference population are the standardized rates. The NLMeans macro can do this multiplication by specifying a suitable matrix in the contrasts= option of the macro. For this purpose, the matrix, L, must be defined such that the product of L and the vector of crude rates computed from the model produces the desired vector of directly standardized strata rates. So, L needs to be a diagonal matrix with the weights on the diagonal. The multiplier is included in the product to produce rates per 1,000 person-years. The following statements first compute the weights and then produce the diagonal matrix L of weights in data set RefProps. The variables containing the matrix must be named K1, K2, and so on. A variable containing labels for the standardized rates is also created from the values of Age and Sex. Since there is only a single set of LS-means to be processed, the required variable, Set, is assigned the value 1 for all observations, indicating they are all part of set 1. Finally, the NLMeans macro is called, specifying the saved model, SatPoi, the data set of coefficients from the LSMEANS statement, Coeffs, and the data set containing the L matrix, RefProps. A title is specified for the displayed table of results.
%let alpha=0.05;
%let mult=1000;

/* Reference-population weights with a label for each stratum */
proc sql noprint;
   create table RefProps as
   select Age, Sex, cats(Sex,Age) as Label,
          PYear/sum(PYear) as RefProp
   from US;
quit;

/* Build the diagonal matrix L: row _n_ has &mult*RefProp in column _n_, 0 elsewhere */
data RefProps;
   set RefProps;
   Set=1;
   array k k1-k10;
   do i=1 to dim(k);
      k(i)=0;
      if i=_n_ then k(i)=&mult*RefProp;
   end;
run;

%NLMeans(instore=SatPoi, coef=Coeffs, link=log, contrasts=RefProps, title=Directly Standardized Strata Rates)
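The matrix built in the DATA step above can be written out explicitly. In this sketch, $m$ is the multiplier, $w_j$ is the reference weight for stratum j, and $\hat{r}_j$ is the crude rate from the jth LS-mean (notation introduced here for illustration). Row j of $L$ corresponds to variables K1 through K10 in observation j of RefProps.

```latex
L = m
\begin{pmatrix}
w_1 & 0 & \cdots & 0 \\
0 & w_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & w_{10}
\end{pmatrix},
\qquad
L\,\hat{\mathbf{r}} =
\begin{pmatrix}
m\,w_1\,\hat{r}_1 \\ m\,w_2\,\hat{r}_2 \\ \vdots \\ m\,w_{10}\,\hat{r}_{10}
\end{pmatrix}
```

Because the diagonal entry is placed by observation number, the product pairs each weight with the correct crude rate only if the rows of the US data set are in the same stratum order as the LS-means. That ordering is an assumption worth verifying against the Coeffs data set when adapting the code to other data.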
These statements compute the overall standardized rate using the output data set, Est, from the NLMeans macro, which contains the stratum-specific rates. The same table with the overall rate is displayed as shown in the first section above.
proc means data=Est noprint;
   var Estimate StandardError;
   /* Sum of the stratum rates and sum of squared stratum standard errors */
   output out=overallDSR sum(Estimate)=dsr uss(StandardError)=Vdsr;
run;

data overallDSR;
   set overallDSR;
   SEdsr=sqrt(Vdsr);
   Upper=dsr+SEdsr*quantile('normal',1-(&alpha/2));
   Lower=dsr-SEdsr*quantile('normal',1-(&alpha/2));
run;

proc print data=overallDSR label noobs;
   var dsr sedsr lower upper;
   label dsr="Directly Standardized Rate" sedsr="Standard Error";
   title "Overall Directly Standardized Rate";
run;
Type: Usage Note
Topic: Analytics ==> Categorical Data Analysis; SAS Reference ==> Procedures ==> GENMOD; SAS Reference ==> Procedures ==> NLMIXED; SAS Reference ==> Procedures ==> STDRATE; SAS Reference ==> Macro
Date Modified: 2020-10-08 16:40:45
Date Created: 2020-10-07 15:03:29