Version | Update Notes |
1.3 (01APR02) | Maintained compatibility with both SAS 8 and SAS 9. Variance formula control for different strata (requested by J. Potterat). |
1.2 (01JAN01) | Cosmetic improvements. |
1.1 (01SEP00) | Initial coding. |
%inc "<location of your file containing the SREGSUB macro>";
Following this statement, you may call the %SREGSUB macro. See the Results tab for an example.
The following parameters are available in the %SREGSUB macro.
%SREGSUB( DATA= sas-data-set,
          ALPHA= alpha-value,
          STRATA= variable,
          CLUSTER= variable,
          POPSIZE= variable,
          WEIGHT= variable,
          CLASS= variables,
          MODEL= dependent= independents < / options >,
          CONTRAST= 'label' effect values < ... & effect values > < /options >,
          ESTIMATE= 'label' effect values < ... effect values > < /options >,
          SUBPOP= variable operator number,
          OUTPUT= ODSTableName=sas-data-set < ... ODSTableName=sas-data-set>,
          VARMULT= < FULLER >,
          COLWIDTH= number,
          FMTCELL= format,
          TITLE= 'text' )
Most of the macro parameters correspond to the statements and options used in PROC SURVEYREG. The required parameters are DATA=, WEIGHT=, and MODEL=. The POPSIZE= parameter serves as a replacement for the TOTAL= option used in the procedure. The SUBPOP= parameter can be used to restrict the regression analysis to a specified subpopulation.
Required Macro Parameters
DATA= sas-data-set
    specifies the data set to be analyzed.

WEIGHT= variable
    specifies the analysis weight.

MODEL= dependent= independents < / options >
    specifies the dependent variable and the independent effects. All the variables used in the model must be numeric. The model options are:

    NOINT
        omits the intercept from the model.
    CLPARM
        requests confidence limits for the parameter estimates. It also requests confidence limits for the linear functions specified in the ESTIMATE= parameter.
Optional Macro Parameters
ALPHA= alpha-value
    specifies the confidence level for confidence limits. The default value of ALPHA=.05 produces 95% confidence limits.

STRATA= variable
    specifies the first-stage strata.

CLUSTER= variable
    specifies the first-stage sample units for a clustered or multistage sample. The first-stage sample units are commonly referred to as the primary sampling units (PSUs).

POPSIZE= variable
    can be used to compute finite population correction (fpc) factors with a variable containing the stratum population totals (or the overall population total if your sample is not stratified).

    The POPSIZE= parameter should only be used to compute fpc factors for single-stage samples that were selected without replacement and with equal probabilities. Conversely, it should not be used to compute fpc factors for single-stage samples selected with replacement, single-stage samples selected with unequal probabilities, or multistage samples selected in any fashion. (For more information on sample design capabilities and when to use the POPSIZE= parameter, see SAMPLE DESIGN SPECIFICATIONS.)

    Appropriate single-stage samples for computing fpc factors include simple random, stratified random, simple one-stage cluster, or stratified one-stage cluster. For a stratified random sample, the population totals would be the number of observational units in each stratum population. For a stratified one-stage cluster sample, the population totals would be the number of clusters in each stratum population.

    The POPSIZE= variable can also contain special values that tell %SREGSUB to compute different variance formulas for different strata. The applicable variance formula for each stratum corresponds to the type of sampling performed in each stratum. The value "0" indicates sample units were selected with certainty (probability equal to one). The value "-1" indicates sample units were selected with replacement.
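As a rough illustration of the stratified random case, the sketch below attaches stratum population totals to a sample and passes them through POPSIZE=. The data set SAMPLE, the variables STRATUM, W, Y, and X, and the totals 500 and 1200 are all hypothetical; only the macro parameters come from the syntax shown above.

```sas
/* Hypothetical input: SAMPLE holds a stratum identifier (STRATUM), */
/* an analysis weight (W), a response (Y), and a predictor (X).     */
data sample_fpc;
   set sample;
   /* Number of observational units in each stratum population      */
   /* (made-up totals, for illustration only).                      */
   if stratum = 1 then npop = 500;
   else if stratum = 2 then npop = 1200;
run;

/* Stratified random sample: fpc factors are computed from NPOP.    */
%sregsub(data=sample_fpc, weight=w, strata=stratum, popsize=npop,
         model= y = x /clparm);
```

Because the totals are carried as an ordinary variable on each observation, every record in a stratum should hold the same POPSIZE= value.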
    For example, all three possible variance formulas could be specified using the following codes:

    ------------------------------------------------------------------------
    | STRATA   POPSIZE   Variance Formula                                  |
    ------------------------------------------------------------------------
    |   1         0      with certainty, variance equals zero              |
    |   2        -1      with replacement, no fpc                          |
    |   3      1000      without replacement, fpc=(1-n/1000)               |
    ------------------------------------------------------------------------

CLASS= variables
    specifies the classification variables to be used in the model. The variables must be numeric.

CONTRAST= 'label' effect values < ... & effect values > < /options >
    performs a custom hypothesis test by specifying an L vector or matrix for testing the hypothesis LB=0. You can specify nine additional contrasts that are numbered from CONTRAST2 to CONTRAST10. For example, to test the pairwise effects for a class variable A with three levels, the specifications would be:

        contrast= 'A: 1 vs 2' a 1 -1 0,
        contrast2= 'A: 1 vs 3' a 1 0 -1,
        contrast3= 'A: 2 vs 3' a 0 1 -1,

    Multiple-degree-of-freedom hypothesis tests can be specified by separating the rows of the L matrix with ampersands. For example, to test the overall A effect, the specification would be:

        contrast= 'Overall A' a 1 0 -1 & a 0 1 -1,

    The options in the CONTRAST parameter are:

    E
        displays the entire L vector.
    SINGULAR=value
        specifies the sensitivity for checking estimability.

ESTIMATE= 'label' effect values < ... effect values > < /options >
    estimates linear functions of the parameters by multiplying the vector L by the parameter estimate vector b, resulting in Lb. You can specify an additional nine linear functions that are numbered from ESTIMATE2 to ESTIMATE10.
    For example, to estimate pairwise differences for a class variable A with three levels, the specifications would be:

        estimate= 'A: 1 vs 2' a 1 -1 0,
        estimate2= 'A: 1 vs 3' a 1 0 -1,
        estimate3= 'A: 2 vs 3' a 0 1 -1,

    The options in the ESTIMATE parameter are:

    E
        displays the entire L vector.
    DIVISOR=number
        specifies a value by which to divide all coefficients so that fractional coefficients can be entered as integers.
    SINGULAR=value
        specifies the sensitivity for checking estimability.

SUBPOP= variable operator number
    restricts the regression analysis to a subpopulation of the overall population. The subpopulation variable must be numeric. The available comparison operators are listed below:

    ------------------------------------------------------------------------
    | Symbol      Mnemonic     Definition                                  |
    |             Equivalent                                               |
    ------------------------------------------------------------------------
    | =           EQ           equal to                                    |
    | ^= or ~=    NE           not equal to                                |
    | >           GT           greater than                                |
    | <           LT           less than                                   |
    | >=          GE           greater than or equal to                    |
    | <=          LE           less than or equal to                       |
    |             IN           equal to one from a list of numbers         |
    ------------------------------------------------------------------------

    For the IN comparison operator, the syntax for 'number' would actually be a list of numbers enclosed in parentheses.

OUTPUT= ODSTableName=sas-data-set < ... ODSTableName=sas-data-set >
    outputs statistics to new SAS data sets. The ODSTableName refers to the ODS table names used in PROC SURVEYREG.
    The available ODS Table Names in the %SREGSUB macro are listed below:

    ------------------------------------------------------------------------
    | ODS Table Name       Description                          Statement  |
    ------------------------------------------------------------------------
    | Effects              Test of Model Effects                MODEL      |
    | ParameterEstimates   Estimated Regression Coefficients    MODEL      |
    | Contrasts            Analysis of Contrasts                CONTRAST   |
    | Estimates            Analysis of Estimable Functions      ESTIMATE   |
    ------------------------------------------------------------------------

VARMULT= < FULLER >
    specifies multiplying the linearized covariance matrix of the regression coefficients by the multiplier [(n-1)/(n-p)], where n is the sample size and p is the number of parameters in the model. Supported by Fuller (1975), the SURVEYREG procedure strictly attaches this multiplier to the covariance matrix. To provide consistency with other literature, where the linearized covariance matrix is presented without the multiplier (e.g., Korn/Graubard 1999), the %SREGSUB macro does not strictly attach the multiplier. Also, by not attaching the multiplier, the variance estimates computed for a cell mean model would be consistent with the variance estimates computed for univariate subgroup means. For example, the following calls to the %SREGSUB and %SMSUB macros would compute the same variance estimates for the mean of variable-y by gender.

        %sregsub(data=sample, weight=w, class=gender, model=y=gender /noint);
        %smsub(data=sample, weight=w, var=y, table=gender);
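To see the effect of the multiplier, the same model can be run with and without VARMULT=FULLER. The sketch below reuses the SAMPLE data set and variables from the call above. As a purely hypothetical illustration, with n=100 and p=2 the multiplier would be 99/98, or about 1.0102, so the standard errors from the second call would be expected to be roughly 0.5% larger than those from the first.

```sas
/* Default: linearized covariance matrix without the multiplier.    */
%sregsub(data=sample, weight=w, class=gender,
         model= y = gender /noint);

/* VARMULT=FULLER: covariance matrix multiplied by (n-1)/(n-p),     */
/* as PROC SURVEYREG does.                                          */
%sregsub(data=sample, weight=w, class=gender,
         model= y = gender /noint,
         varmult=fuller);
```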
Optional Macro Parameters for Output Appearance
COLWIDTH= number specifies the width of the first column in the output tables. The default is 26 spaces. FMTCELL= format specifies numeric format for values in the table cells. TITLE= 'text' specifies a title for the output listing.
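Several of the optional parameters can be combined in one call. The following is a minimal sketch, assuming the BP data set from the example in this document; the labels, the output data set names PARMS and ESTS, and the chosen values for ALPHA= and COLWIDTH= are illustrative only.

```sas
%sregsub(data=bp, weight=w, class=exercise alcohol,
         model= bp = age bmi exercise alcohol /clparm,
         /* Difference between alcohol levels 1 and 3               */
         estimate= 'alc 1 vs 3' alcohol 1 0 -1,
         /* Level 1 versus the mean of levels 2 and 3; DIVISOR=2    */
         /* lets the coefficients 1, -1/2, -1/2 be entered as       */
         /* integers.                                               */
         estimate2= 'alc 1 vs mean of 2 and 3' alcohol 2 -1 -1 /divisor=2,
         alpha=.10,
         output= ParameterEstimates=parms Estimates=ests,
         colwidth=30,
         title='Blood pressure model');

/* Inspect the data sets requested through OUTPUT=.                 */
proc print data=parms; run;
proc print data=ests; run;
```

With ALPHA=.10 and the CLPARM option, the confidence limits for both the parameter estimates and the ESTIMATE= functions would be 90% limits.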
Overview
The %SREGSUB macro provides linear regression capabilities currently not available in PROC SURVEYREG. This includes:
Because survey samples are from finite populations, the correct variance estimator for regression coefficients within a specified subgroup requires all the observations in your sample, including observations outside the subgroup. Consequently, elimination of observations outside the subgroup can lead to incorrect variance estimates. Therefore, unlike with classical SAS procedures, you should not use the BY statement or the WHERE statement to compute subgroup regression estimates in PROC SURVEYREG.
The above consideration for subgroup variance estimation also applies when there is missing data in your data set. If any of your model variables contain missing values, the variance estimates pertain to the part of the population that, if sampled, would give a response to all model variables. Hence, the macro treats the non-missing values among all model variables as a subgroup of model-respondents. Analogous to any subgroup, the variance estimation for model-respondents requires all the observations in your sample, including the model-nonrespondents (i.e., the missing values). Of course, with nontrivial amounts of missing data, the unknown bias contribution to the total error for the target population will not be represented by the variance estimates.
The subgroup variance estimators used in the macro are obtained by attaching a zero-one subgroup indicator to the analysis weight in the variance estimators used in PROC SURVEYREG. Thereby, the subgroup variance estimates are computed over all sample observations. For details on subgroup estimation and for variance estimation in the presence of missing values, refer to Cochran (1977), Levy/Lemeshow (1999), or Korn/Graubard (1999).
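The contrast between the two approaches can be sketched as follows, assuming the BP data set from the example in this document and a deliberately small model. The first step shows the pattern the text advises against; the second keeps every observation and identifies the subgroup through SUBPOP=.

```sas
/* Not recommended for subgroup inference: WHERE removes the        */
/* observations outside the subgroup before the variance is         */
/* estimated.                                                       */
proc surveyreg data=bp;
   where age >= 25;
   weight w;
   model bp = age bmi;
run;

/* Macro approach: all sample observations are retained, and the    */
/* subgroup enters through a zero-one indicator on the weight.      */
%sregsub(data=bp, weight=w,
         model= bp = age bmi,
         subpop= age >= 25);
```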
Acronyms in Documentation:
Sample Design Specifications
The SURVEYREG procedure and the corresponding %SREGSUB macro provide unbiased variance estimators for any sample design that falls under one of the following three categories:
Number of Stages | Replacement Method | Selection Probabilities |
single-stage | with replacement | equal or unequal |
single-stage | without replacement | equal |
multistage | with replacement at first stage | equal or unequal |
For multistage samples, the variance estimator only uses the between-PSU variance component and is unbiased if the PSUs are selected with replacement. The variance components for the subsequent sample-stage units within the PSUs are not needed because the PSUs are independent when selected with replacement. This corresponds to the classical analysis of variance for nested random models. [Refer to Cochran (1977, section 11.9).]
Conversely, the SURVEYREG procedure and the %SREGSUB macro currently do not compute the unbiased variance estimators for any sample design that falls under one of these remaining three categories:
Number of Stages | Replacement Method | Selection Probabilities |
single-stage | without replacement | unequal |
multistage | without replacement at first stage | equal |
multistage | without replacement at first stage | unequal |
The unbiased variance estimators for these sample designs require additional computation. For multistage samples selected without replacement at the first stage, the unbiased variance estimator not only adjusts the between-PSU variance component (with fpc factors if the selection probabilities are equal), but also requires computing additional variance components associated with subsequent sample-stage units. [Refer to Cochran (1977, section 10.4).] Also, if without-replacement sampling is performed with unequal selection probabilities, the unbiased variance estimator requires incorporating the joint selection probabilities instead of computing fpc factors. [Refer to Cochran (1977, section 9A.7).]
At first glance, the variance estimation limitations for without replacement sampling may seem restrictive. However, a common practice is to approximate the variances for these more complicated designs by assuming the PSUs were selected with replacement. If the PSU sampling fractions are small (say, less than 10 percent), the approximation is very close to the variances that would be obtained from the unbiased estimators. In fact, for multistage samples with equal PSU selection probabilities, the approximation results in negligible overestimation, and for conservative practices, may even be preferable. [Refer to Korn and Graubard (1999, section 2.3).]
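In practice, the with-replacement approximation amounts to naming the strata and PSUs and leaving POPSIZE= out of the call. A minimal sketch, assuming a hypothetical multistage data set SURVEY with variables STRATUM, PSU, W, Y, X1, and X2:

```sas
/* Hypothetical multistage sample: PSUs (variable PSU) were         */
/* selected within first-stage strata (variable STRATUM).           */
/* POPSIZE= is omitted, so no fpc factors are applied and the PSUs  */
/* are treated as if they were selected with replacement.           */
%sregsub(data=survey, weight=w, strata=stratum, cluster=psu,
         model= y = x1 x2 /clparm);
```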
NOTE ON POPSIZE= PARAMETER: As stated in the SYNTAX section, the POPSIZE= parameter can be used to compute fpc factors for single-stage samples selected without replacement and with equal probabilities. It should not be used for multistage samples. For example, if the POPSIZE= parameter were used on a multistage sample that had equal selection probabilities at the first stage, the variance estimates would correctly reduce the between-PSU variance component with fpc factors, but would be missing the additional variance components associated with subsequent sample-stage units. Hence, the variances would be underestimated.
Fuller, W.A. (1975), "Regression Analysis for Sample Survey", Sankhya, 37 (3), Series C, 117-132.
Levy, P.S., Lemeshow, S. (1999), Sampling of Populations, Third Edition, New York: John Wiley & Sons, Inc.
Korn, E.L., Graubard, B.I. (1999), Analysis of Health Surveys, New York: John Wiley & Sons, Inc.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
Note: For additional examples featuring the SUBPOP= and POPSIZE= parameters, see the examples for the %SMSUB macro.
/*----------------------------------------------------------------------
|                                                                      |
|                         Health Status Survey                         |
|                         Simple Random Sample                         |
|                                                                      |
|  Dependent variable:                                                 |
|    bp (systolic blood pressure)                                      |
|                                                                      |
|  Independent variables:                                              |
|    age (age, in years)                                               |
|    bmi (body mass index, kg/m2)                                      |
|    exercise (1=regular exercise, 2=no regular exercise)              |
|    alcohol (1=heavy use, 2=moderate use, 3=no use)                   |
|                                                                      |
|  Restricted subpopulation: age >= 25                                 |
|                                                                      |
----------------------------------------------------------------------*/

data bp;
   input bp age bmi exercise alcohol w @@;
   cards;
124 26 29 1 3 20   136 20 34 2 2 20   116 37 28 1 1 20   113 51 30 1 3 20   137 43 37 1 1 20
126 36 28 2 2 20   111 13 31 1 2 20   143 50 33 2 1 20   120 22 28 2 2 20     . 28 30 2 3 20
  . 61 31 1 2 20   118 36 33 1 1 20   132 25 31 1 1 20   101 37 25 1 2 20     . 34 33 2 3 20
150 53 34 2 1 20   136 45 37 2 3 20   125 33 29 1 2 20   131 39 33 2 2 20   116 37 23 2 2 20
139 21 36 2 1 20   112 30 32 1 3 20     . 35 33 1 1 20   143 72 34 2 2 20   137 15 30 2 2 20
124 27 28 1 2 20   129 35 29 1 3 20   151 43 35 1 2 20   138 45 32 2 3 20   126 24 30 2 2 20
126 36 33 2 3 20     . 20 26 2 2 20   124 35 28 2 3 20   136 58 38 1 1 20   118 38 31 2 3 20
119 14 27 1 1 20   120 54 28 1 2 20   112 40 27 2 2 20   121 44 28 2 3 20   122  9 33 1 2 20
141 44 37 1 1 20     . 42 29 1 3 20   120 54 25 2 3 20   115  8 26 1 2 20   126 26 30 2 3 20
124 26 32 1 3 20     . 11 27 2 2 20     . 46 36 1 2 20   138 33 34 2 1 20   141 24 30 2 1 20
;

%include 'sregsub.sas';

%sregsub(data=bp, weight=w, class=exercise alcohol,
         model= bp= age bmi exercise alcohol exercise*alcohol,
         contrast='alc 1 vs 2: exercise=1' alcohol 1 -1 0
                  exercise*alcohol 1 -1 0 0 0 0,
         contrast2='alc 1 vs 2: exercise=2' alcohol 1 -1 0
                   exercise*alcohol 0 0 0 1 -1 0,
         subpop= age >= 25)
run;
The SURVEYREG SUBGROUP Macro

Data Summary
Subpopulation : age >= 25
Dependent Variable: bp

Total Observations                          50    Weight Sum:    1000
Subpopulation Observations                  38    Weight Sum:     760
Subpopulation Nonmissing Observations       32    Weight Sum:     640
Subpopulation Missing Observations           6    Weight Sum:     120
Number of Strata                             1
Number of PSUs                              50
Denominator Degrees of Freedom              49
R-square                               0.66325
bp Mean                              127.21875

Test of Model Effects
Subpopulation : age >= 25
Dependent Variable: bp

-----------------------------------------------------------------
|Effect                  |   Num DF   |   Wald F   |   Pr > F   |
|------------------------+------------+------------+------------|
|Model                   |        7.00|       11.93|      0.0000|
|Intercept               |           .|           .|           .|
|age                     |        1.00|        0.52|      0.4763|
|bmi                     |        1.00|       29.42|      0.0000|
|exercise                |        1.00|        7.08|      0.0105|
|alcohol                 |        2.00|        1.73|      0.1874|
|exercise*alcohol        |        2.00|        2.78|      0.0717|
-----------------------------------------------------------------

Estimated Regression Coefficients
Subpopulation : age >= 25
Dependent Variable: bp

------------------------------------------------------------------------------
|Parameter               |            |  Standard  |            |            |
|                        |  Estimate  |   Error    |  t Value   |  Pr > |t|  |
|------------------------+------------+------------+------------+------------|
|Intercept               |     55.9649|     11.9773|        4.67|      0.0000|
|age                     |      0.0728|      0.1015|        0.72|      0.4763|
|bmi                     |      2.2039|      0.4064|        5.42|      0.0000|
|exercise 1              |     -5.0112|      4.5072|       -1.11|      0.2716|
|exercise 2              |      0.0000|      0.0000|           .|           .|
|alcohol 1               |     10.2015|      3.5027|        2.91|      0.0054|
|alcohol 2               |      2.4587|      3.1108|        0.79|      0.4331|
|alcohol 3               |      0.0000|      0.0000|           .|           .|
|exercise*alcohol 1 1    |     -9.0381|      5.6932|       -1.59|      0.1188|
|exercise*alcohol 1 2    |      4.0482|      6.4240|        0.63|      0.5315|
|exercise*alcohol 1 3    |      0.0000|      0.0000|           .|           .|
|exercise*alcohol 2 1    |      0.0000|      0.0000|           .|           .|
|exercise*alcohol 2 2    |      0.0000|      0.0000|           .|           .|
|exercise*alcohol 2 3    |      0.0000|      0.0000|           .|           .|
------------------------------------------------------------------------------

Analysis of Contrasts
Subpopulation : age >= 25
Dependent Variable: bp

-----------------------------------------------------------------
|Contrast                |   Num DF   |   Wald F   |   Pr > F   |
|------------------------+------------+------------+------------|
|alc 1 vs 2: exercise=1  |        1.00|        1.00|      0.3234|
|alc 1 vs 2: exercise=2  |        1.00|        4.23|      0.0451|
-----------------------------------------------------------------
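The counts and weight sums in the Data Summary can be cross-checked directly from the BP data set. The following PROC SQL step is a sketch of such a check and is not part of the macro; the column aliases are made up for illustration. It relies on SAS evaluating a comparison to 1 or 0, and on BP being the only model variable with missing values in this data set.

```sas
proc sql;
   select count(*)                                   as total_obs,
          sum(w)                                     as total_wt,
          /* observations inside the subpopulation                  */
          sum(age >= 25)                             as subpop_obs,
          sum(w * (age >= 25))                       as subpop_wt,
          /* subpopulation members with a nonmissing response       */
          sum(age >= 25 and bp is not missing)       as nonmiss_obs,
          sum(w * (age >= 25 and bp is not missing)) as nonmiss_wt
   from bp;
quit;
```

The six values should agree with the listing above: 50 and 1000, 38 and 760, 32 and 640.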
Right-click on the link below and select Save to save the %SREGSUB macro definition to a file. It is recommended that you name the file sregsub.sas.
Type: | Sample |
Topic: | SAS Reference ==> Procedures ==> SURVEYREG; Analytics ==> Survey Sampling and Analysis; Analytics ==> Regression |
Date Modified: | 2007-08-11 03:03:07 |
Date Created: | 2005-01-13 15:02:46 |
Product Family | Product | Host | Starting SAS Release | Ending SAS Release |
SAS System | SAS/STAT | z/OS | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | OpenVMS VAX | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | OS/2 | 8 TS M0 | 8.2 TS2M0 |
SAS System | SAS/STAT | Microsoft Windows 95/98 | 8 TS M0 | 8.2 TS2M0 |
SAS System | SAS/STAT | Microsoft Windows 2000 Advanced Server | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | Microsoft Windows 2000 Datacenter Server | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | Microsoft Windows 2000 Server | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | Microsoft Windows 2000 Professional | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | Microsoft Windows NT Workstation | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | Microsoft Windows XP Professional | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | Windows Millennium Edition (Me) | 8 TS M0 | 8.2 TS2M0 |
SAS System | SAS/STAT | ABI+ for Intel Architecture | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | AIX | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | HP-UX | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | IRIX | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | OpenVMS Alpha | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | Solaris | 8 TS M0 | 9.1 TS1M3 SP4 |
SAS System | SAS/STAT | Tru64 UNIX | 8 TS M0 | 9.1 TS1M3 SP4 |