| Version | Update Notes |
| 1.3 (01APR02) | Added variance formula control for different strata, compatible with both SAS 8 and SAS 9 (requested by J. Potterat). |
| 1.2 (01JAN01) | Cosmetic improvements. |
| 1.1 (01SEP00) | Initial coding. |
%inc "<location of your file containing the SREGSUB macro>";
Following this statement, you can call the %SREGSUB macro. See the example at the end of this page.
The following parameters are available in the %SREGSUB macro.
%SREGSUB(
DATA= sas-data-set,
ALPHA= alpha-value,
STRATA= variable,
CLUSTER= variable,
POPSIZE= variable,
WEIGHT= variable,
CLASS= variables,
MODEL= dependent= independents < / options >,
CONTRAST= 'label' effect values < ... & effect values > < /options >,
ESTIMATE= 'label' effect values < ... effect values > < /options >,
SUBPOP= variable operator number,
OUTPUT= ODSTableName=sas-data-set < ... ODSTableName=sas-data-set>,
VARMULT= < FULLER >,
COLWIDTH= number,
FMTCELL= format,
TITLE= 'text'
)
Most of the macro parameters correspond to the statements and options used in PROC SURVEYREG. The required parameters are DATA=, WEIGHT=, and MODEL=. The POPSIZE= parameter serves as a replacement for the TOTAL= option used in the procedure. The SUBPOP= parameter can be used to restrict the regression analysis to a specified subpopulation.
Required Macro Parameters
DATA= sas-data-set
specifies the data set to be analyzed.
WEIGHT= variable
specifies the analysis weight.
MODEL= dependent= independents < / options >
specifies the dependent variable and the independent effects. All the
variables used in the model must be numeric.
The model options are:
NOINT
omits the intercept from the model
CLPARM
requests confidence limits for the parameter estimates. It also
requests confidence limits for the linear functions specified in
the ESTIMATE= parameter.
Optional Macro Parameters
ALPHA= alpha-value
specifies the significance level for confidence limits; a value of alpha
produces 100(1-alpha)% confidence limits. The default value of ALPHA=.05
produces 95% confidence limits.
STRATA= variable
specifies the first-stage strata.
CLUSTER= variable
specifies the first-stage sample units for a clustered or multistage
sample. The first-stage sample units are commonly referred to as the
primary sampling units (PSUs).
POPSIZE= variable
can be used to compute finite population correction factors with a
variable containing the stratum population totals (or overall population
if your sample is not stratified). The POPSIZE= parameter should only be
used to compute fpc factors for single-stage samples that were selected
without replacement and with equal probabilities. Conversely, it should
not be used to compute fpc factors for single-stage samples selected with
replacement, single-stage samples selected with unequal probabilities, or
multistage samples selected in any fashion. (For more information on
sample design capabilities and when to use the POPSIZE= parameter, see the
Sample Design Specifications section.)
Appropriate single-stage samples for computing fpc factors include simple
random, stratified random, simple one-stage cluster, or stratified
one-stage cluster. For a stratified random sample, the population totals
would be the number of observational units in each stratum population.
For a stratified one-stage cluster sample, the population totals would be
the number of clusters in each stratum population.
The POPSIZE= variable can also contain special values that tell %SREGSUB to
compute different variance formulas for different strata. The applicable
variance formula for each stratum corresponds to the type of sampling
performed in each stratum. The value "0" indicates sample units were
selected with certainty (probability equal to one). The value "-1"
indicates sample units were selected with replacement.
For example, all three possible variance formulas could be specified using
the following codes:
| STRATA | POPSIZE | Variance Formula |
| 1 | 0 | with certainty, variance equals zero |
| 2 | -1 | with replacement, no fpc |
| 3 | 1000 | without replacement, fpc=(1-n/1000) |
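The per-stratum rule in the table can be sketched in Python (used here only for illustration; it is neither SAS nor the %SREGSUB source). The sketch assumes the standard stratified linearization form, in which stratum h contributes fpc_h * n_h/(n_h - 1) times the sum of squared deviations of its PSU totals. The function name `stratum_variance` and the PSU totals are hypothetical.

```python
# Sketch (assumed textbook formula, not the macro's code): variance contribution
# of one stratum to an estimated total, with the formula chosen by a
# POPSIZE-style code.

def stratum_variance(psu_totals, popsize):
    """Return one stratum's variance contribution.

    psu_totals -- weighted PSU totals z_hi observed in the stratum (hypothetical)
    popsize    -- 0: certainty stratum, -1: with replacement,
                  N > 0: without replacement from N units, fpc = 1 - n/N
    """
    n = len(psu_totals)
    if popsize == 0:
        return 0.0                    # units taken with certainty: no sampling variance
    if popsize == -1:
        fpc = 1.0                     # with replacement: no fpc
    elif popsize > 0:
        fpc = 1.0 - n / popsize       # without replacement, equal probabilities
    else:
        raise ValueError("POPSIZE must be 0, -1, or a positive population count")
    if n < 2:
        raise ValueError("need at least two PSUs in a non-certainty stratum")
    mean = sum(psu_totals) / n
    ss = sum((z - mean) ** 2 for z in psu_totals)
    return fpc * n / (n - 1) * ss


# The three strata of the table above, with made-up PSU totals.
strata = {
    1: ([40.0, 55.0], 0),
    2: ([10.0, 20.0, 30.0], -1),
    3: ([10.0, 20.0, 30.0], 1000),
}
total_variance = sum(stratum_variance(z, code) for z, code in strata.values())
print(total_variance)
```

Strata 2 and 3 use the same PSU totals, so the only difference between their contributions is the fpc factor (1 - 3/1000).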
CLASS= variables
specifies the classification variables to be used in the model. The
variables must be numeric.
CONTRAST= 'label' effect values < ... & effect values > < /options >
performs a custom hypothesis test by specifying an L vector or matrix for
testing the hypothesis Lβ=0, where β is the vector of regression coefficients.
You can specify nine additional contrasts that are numbered from CONTRAST2
to CONTRAST10. For example, to test the pairwise effects for a class
variable A with three levels, the specifications would be:
contrast= 'A: 1 vs 2' a 1 -1 0,
contrast2= 'A: 1 vs 3' a 1 0 -1,
contrast3= 'A: 2 vs 3' a 0 1 -1,
Multiple-degree-of-freedom hypothesis tests can be specified by separating
the rows of the L matrix with ampersands. For example, to test the
overall A effect, the specification would be:
contrast= 'Overall A' a 1 0 -1 &
a 0 1 -1,
The options in the CONTRAST parameter are:
E displays the entire L vector.
SINGULAR=value specifies the sensitivity for checking estimability.
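The Wald F statistic reported for a contrast can be sketched as F = (Lb)'(LVL')^-1(Lb)/rank(L), where b is the parameter estimate vector and V is its estimated covariance matrix. The Python below is an illustrative assumption, not the macro's code: it uses made-up b and V for a three-level class variable whose last level is the zero reference, and it assumes the rows of L are linearly independent, so that rank(L) equals the number of rows.

```python
# Sketch: Wald F for H0: L*beta = 0, F = (Lb)' (L V L')^{-1} (Lb) / rank(L).
# b and V are hypothetical; the macro obtains them from the fitted model.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(A, y):
    """Solve A x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("L V L' is singular; check that L has full row rank")
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]

def wald_f(L, b, V):
    Lb = matvec(L, b)
    LVLt = matmul(matmul(L, V), transpose(L))
    quad = sum(u * v for u, v in zip(Lb, solve(LVLt, Lb)))
    return quad / len(L)          # assumes rank(L) == number of rows of L


b = [3.0, 1.0, 0.0]               # hypothetical estimates for levels 1, 2, 3 of A
V = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0]]             # reference level has zero variance

one_df = wald_f([[1, -1, 0]], b, V)                # like 'A: 1 vs 2'
two_df = wald_f([[1, 0, -1], [0, 1, -1]], b, V)    # like 'Overall A'
print(one_df, two_df)
```

The two-row L mirrors the ampersand-separated 'Overall A' specification, so its statistic has two numerator degrees of freedom.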
ESTIMATE= 'label' effect values < ... effect values > < /options >
estimates linear functions of the parameters by multiplying the vector L
by the parameter estimate vector b resulting in Lb.
You can specify an additional nine linear functions that are numbered from
ESTIMATE2 to ESTIMATE10. For example, to estimate pairwise differences
for a class variable A with three levels, the specifications would be:
estimate= 'A: 1 vs 2' a 1 -1 0,
estimate2= 'A: 1 vs 3' a 1 0 -1,
estimate3= 'A: 2 vs 3' a 0 1 -1,
The options in the ESTIMATE parameter are:
E displays the entire L vector
DIVISOR=number specifies a value by which to divide all coefficients
so that fractional coefficients can be entered as
integers
SINGULAR=value specifies the sensitivity for checking estimability
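As a worked illustration of Lb and DIVISOR=, the Python sketch below uses the alcohol coefficients printed in the example output (10.2015, 2.4587, 0). Because the exercise*alcohol estimates for exercise=2 are zero, the alcohol 1 vs 2 difference among exercise=2 cells reduces to the difference of the two alcohol coefficients. The helper name `estimate` is hypothetical. Only the point estimate is computed; a standard error would also need the covariance matrix.

```python
# Sketch: point estimate Lb for an ESTIMATE= specification, including the
# effect of DIVISOR=. Only the alcohol parameters are used because the other
# L coefficients are zero in these examples.

def estimate(coefficients, params, divisor=1.0):
    """Return (L / divisor) . b for matching lists of L coefficients and estimates."""
    if len(coefficients) != len(params):
        raise ValueError("L and b must have the same length")
    return sum(l * b for l, b in zip(coefficients, params)) / divisor


alcohol = [10.2015, 2.4587, 0.0]   # alcohol 1, 2, 3 from the example output

# alcohol 1 vs 2 among exercise=2 cells (exercise*alcohol terms are 0 there)
diff_1_vs_2 = estimate([1, -1, 0], alcohol)

# average of levels 1 and 2 versus level 3: L = (1 1 -2)/2 entered as integers
avg_12_vs_3 = estimate([1, 1, -2], alcohol, divisor=2)
print(round(diff_1_vs_2, 4), round(avg_12_vs_3, 4))
```

With DIVISOR=2, the fractional coefficients 0.5, 0.5, -1 are entered as the integers 1, 1, -2.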
SUBPOP= variable operator number
restricts the regression analysis to a subpopulation of the overall
population. The subpopulation variable must be numeric. The available
comparison operators are listed below:
| Symbol | Mnemonic Equivalent | Definition |
| = | EQ | equal to |
| ^= or ~= | NE | not equal to |
| > | GT | greater than |
| < | LT | less than |
| >= | GE | greater than or equal to |
| <= | LE | less than or equal to |
| (none) | IN | equal to one from a list of numbers |
For the IN comparison operator, 'number' is replaced by a list of numbers
enclosed in parentheses.
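The operator table can be read as a rule that turns SUBPOP= into a zero-one subgroup indicator, the kind of indicator the macro attaches to the analysis weight. The following Python sketch is a hypothetical illustration of those semantics (the `subpop_indicator` name and the list form of the IN values are assumptions); it does not show how %SREGSUB actually parses the parameter. The ages are taken from the example data.

```python
import operator

# Sketch (hypothetical): map each SUBPOP= comparison operator, symbol or
# mnemonic, to a Python comparison and build a 0-1 subgroup indicator.
OPERATORS = {
    "=": operator.eq, "EQ": operator.eq,
    "^=": operator.ne, "~=": operator.ne, "NE": operator.ne,
    ">": operator.gt, "GT": operator.gt,
    "<": operator.lt, "LT": operator.lt,
    ">=": operator.ge, "GE": operator.ge,
    "<=": operator.le, "LE": operator.le,
}

def subpop_indicator(values, op, number):
    """Return 1 for observations in the subpopulation and 0 otherwise."""
    key = op.upper()
    if key == "IN":
        allowed = set(number)          # for IN, 'number' is a list of values
        return [1 if v in allowed else 0 for v in values]
    compare = OPERATORS[key]           # KeyError for an unsupported operator
    return [1 if compare(v, number) else 0 for v in values]


ages = [26, 20, 37, 13, 25]            # ages taken from the example data
print(subpop_indicator(ages, ">=", 25))
print(subpop_indicator([1, 2, 3, 2], "in", [1, 2]))
```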
OUTPUT= ODSTableName=sas-data-set < ... ODSTableName=sas-data-set >
outputs statistics to new SAS data sets. The ODSTableName
refers to the ODS table names used in PROC SURVEYREG. The available ODS
Table Names in the %SREGSUB macro are listed below:
| ODS Table Name | Description | Statement |
| Effects | Test of Model Effects | MODEL |
| ParameterEstimates | Estimated Regression Coefficients | MODEL |
| Contrasts | Analysis of Contrasts | CONTRAST |
| Estimates | Analysis of Estimable Functions | ESTIMATE |
VARMULT= < FULLER >
specifies multiplying the linearized covariance matrix of the regression
coefficients by the multiplier [(n-1)/(n-p)], where n is the sample size
and p is the number of parameters in the model. Following Fuller
(1975), the SURVEYREG procedure always applies this multiplier to the
covariance matrix.
To provide consistency with other literature, where the linearized
covariance matrix is presented without the multiplier (e.g., Korn/Graubard
1999), the %SREGSUB macro does not apply the multiplier unless
VARMULT=FULLER is specified. Also, without the multiplier, the variance
estimates computed for a cell mean model are consistent with the variance
estimates computed for univariate subgroup means. For example, the
following calls to the %SREGSUB and %SMSUB macros compute the same
variance estimates for the mean of variable y by gender.
%sregsub(data=sample, weight=w, class=gender, model=y=gender /noint);
%smsub(data=sample, weight=w, var=y, table=gender);
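The VARMULT=FULLER adjustment is a single scalar applied to the whole covariance matrix, so standard errors change by its square root. The Python sketch below is illustrative only; the values n = 50 and p = 8, the covariance matrix, and the standard error are arbitrary and are not taken from the example output.

```python
import math

# Sketch: VARMULT=FULLER multiplies the linearized covariance matrix
# by (n - 1)/(n - p); standard errors therefore scale by its square root.

def fuller_multiplier(n, p):
    if n <= p:
        raise ValueError("need more observations than parameters")
    return (n - 1) / (n - p)

def scale_covariance(V, n, p):
    m = fuller_multiplier(n, p)
    return [[m * v for v in row] for row in V]


m = fuller_multiplier(50, 8)           # illustrative n and p
se_without = 2.0                       # arbitrary unadjusted standard error
se_with = se_without * math.sqrt(m)
print(m, se_with)
```

The multiplier approaches 1 as n grows relative to p, which is why the two conventions differ noticeably only for small samples or large models.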
Optional Macro Parameters for Output Appearance
COLWIDTH= number
specifies the width of the first column in the output tables. The default
is 26 spaces.
FMTCELL= format
specifies the numeric format for values in the table cells.
TITLE= 'text'
specifies a title for the output listing.
Overview
The %SREGSUB macro provides linear regression capabilities currently not available in PROC SURVEYREG. This includes:
Because survey samples are from finite populations, the correct variance estimator for regression coefficients within a specified subgroup requires all the observations in your sample, including observations outside the subgroup. Consequently, eliminating observations outside the subgroup can lead to incorrect variance estimates. Therefore, unlike with classical SAS procedures, you should not use the BY statement or the WHERE statement to restrict a PROC SURVEYREG analysis to a subgroup.
The above consideration for subgroup variance estimation also applies when there is missing data in your data set. If any of your model variables contain missing values, the variance estimates pertain to the part of the population that, if sampled, would give a response to all model variables. Hence, the macro treats the observations with nonmissing values for all model variables as a subgroup of model respondents. As with any subgroup, variance estimation for model respondents requires all the observations in your sample, including the model nonrespondents (i.e., the observations with missing values). Of course, with nontrivial amounts of missing data, the unknown bias contribution to the total error for the target population will not be represented by the variance estimates.
The subgroup variance estimators used in the macro are obtained by attaching a zero-one subgroup indicator to the analysis weight in the variance estimators used in PROC SURVEYREG. Thus, the subgroup variance estimates are computed over all sample observations. For details on subgroup estimation and on variance estimation in the presence of missing values, refer to Cochran (1977), Levy/Lemeshow (1999), or Korn/Graubard (1999).
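A small numeric sketch shows why dropping out-of-subgroup observations changes the variance estimate. It assumes a single-stage, with-replacement design and the textbook linearized variance of an estimated subgroup total, computed from z_i = w_i * d_i * y_i, where d_i is the zero-one subgroup indicator. The responses are the first six bp values from the example data, and the indicator d is arbitrary. This Python code illustrates the principle; it is not the macro's implementation.

```python
# Sketch (assumed with-replacement design): variance of an estimated subgroup
# total computed over ALL sample units with a 0-1 indicator, versus the
# incorrect approach of deleting out-of-subgroup units first.

def wr_total_variance(z):
    """n/(n-1) * sum (z_i - zbar)^2: with-replacement variance of sum(z)."""
    n = len(z)
    zbar = sum(z) / n
    return n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)


y = [124, 136, 116, 113, 137, 126]     # first six bp values in the example data
w = [20] * 6                           # equal analysis weights, as in the example
d = [1, 0, 1, 1, 1, 0]                 # arbitrary subgroup indicator

z_all = [wi * di * yi for wi, di, yi in zip(w, d, y)]        # keeps all units
z_sub = [wi * yi for wi, di, yi in zip(w, d, y) if di == 1]  # drops units (BY/WHERE)

print(sum(z_all) == sum(z_sub))        # same point estimate of the total
print(wr_total_variance(z_all), wr_total_variance(z_sub))
```

Both approaches give the same estimated total, but only the indicator version reflects the randomness of how many sampled units fall in the subgroup, so the two variance estimates differ sharply.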
Acronyms in Documentation:
Sample Design Specifications
The SURVEYREG procedure and the corresponding %SREGSUB macro provide unbiased variance estimators for any sample design that falls under one of the following three categories:
| Number of Stages | Replacement Method | Selection Probabilities |
| single-stage | with replacement | equal or unequal |
| single-stage | without replacement | equal |
| multistage | with replacement at first stage | equal or unequal |
For multistage samples, the variance estimator only uses the between-PSU variance component and is unbiased if the PSUs are selected with replacement. The variance components for the subsequent sample-stage units within the PSUs are not needed because the PSUs are independent when selected with replacement. This corresponds to the classical analysis of variance for nested random models. [Refer to Cochran (1977, section 11.9).]
Conversely, the SURVEYREG procedure and the %SREGSUB macro currently do not compute the unbiased variance estimators for any sample design that falls under one of these remaining three categories:
| Number of Stages | Replacement Method | Selection Probabilities |
| single-stage | without replacement | unequal |
| multistage | without replacement at first stage | equal |
| multistage | without replacement at first stage | unequal |
The unbiased variance estimators for these sample designs require additional computation. For multistage samples using without-replacement sampling at the first stage, the unbiased variance estimator not only adjusts the between-PSU variance component (with fpc factors if equal probabilities), but also requires computing additional variance components associated with subsequent sample-stage units. [Refer to Cochran (1977, section 10.4).] Also, if without-replacement sampling is performed with unequal selection probabilities, the unbiased variance estimator requires incorporating the joint selection probabilities instead of computing fpc factors. [Refer to Cochran (1977, section 9A.7).]
At first glance, the variance estimation limitations for without replacement sampling may seem restrictive. However, a common practice is to approximate the variances for these more complicated designs by assuming the PSUs were selected with replacement. If the PSU sampling fractions are small (say, less than 10 percent), the approximation is very close to the variances that would be obtained from the unbiased estimators. In fact, for multistage samples with equal PSU selection probabilities, the approximation results in negligible overestimation, and for conservative practices, may even be preferable. [Refer to Korn and Graubard (1999, section 2.3).]
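For the simplest case, a single-stage sample drawn without replacement with equal probabilities, the with-replacement formula just omits the fpc factor (1 - f), so it overstates the variance by the factor 1/(1 - f). The Python sketch below tabulates that overstatement for a few sampling fractions. It is an illustration under that single-stage assumption only; it does not capture the additional variance terms that arise in multistage designs.

```python
# Sketch (single-stage, equal-probability, without-replacement case only):
# relative overstatement of the variance when the fpc (1 - f) is omitted.

def wr_overstatement(sampling_fraction):
    f = sampling_fraction
    if not 0 <= f < 1:
        raise ValueError("sampling fraction must be in [0, 1)")
    return 1.0 / (1.0 - f) - 1.0


for f in (0.01, 0.05, 0.10, 0.30):
    print(f"f = {f:.2f}: variance overstated by {100 * wr_overstatement(f):.1f}%")
```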
NOTE ON POPSIZE= PARAMETER: As stated in the description of the POPSIZE= parameter, it can be used to compute fpc factors for single-stage samples selected without replacement and with equal probabilities. It should not be used for multistage samples. For example, if the POPSIZE= parameter were used on a multistage sample that had equal selection probabilities at the first stage, the variance estimates would correctly reduce the between-PSU variance component with fpc factors, but would be missing the additional variance components associated with subsequent sample-stage units. Hence, the variances would be underestimated.
Cochran, W.G. (1977), Sampling Techniques, Third Edition, New York: John Wiley & Sons, Inc.
Fuller, W.A. (1975), "Regression Analysis for Sample Survey", Sankhya, 37 (3), Series C, 117-132.
Korn, E.L., Graubard, B.I. (1999), Analysis of Health Surveys, New York: John Wiley & Sons, Inc.
Levy, P.S., Lemeshow, S. (1999), Sampling of Populations, Third Edition, New York: John Wiley & Sons, Inc.
Note: For additional examples featuring the SUBPOP= and POPSIZE= parameters, see the examples for the %SMSUB macro.
/*----------------------------------------------------------------------
| |
| Health Status Survey |
| Simple Random Sample |
| |
| Dependent variable: |
| bp (systolic blood pressure) |
| |
| Independent variables: |
| age (age, in years) |
| bmi (body mass index, kg/m2) |
| exercise (1=regular exercise, 2=no regular exercise) |
| alcohol (1=heavy use, 2=moderate use, 3=no use) |
| |
| Restricted subpopulation: age >= 25 |
| |
----------------------------------------------------------------------*/
data bp;
input bp age bmi exercise alcohol w @@;
cards;
124 26 29 1 3 20 136 20 34 2 2 20 116 37 28 1 1 20
113 51 30 1 3 20 137 43 37 1 1 20 126 36 28 2 2 20
111 13 31 1 2 20 143 50 33 2 1 20 120 22 28 2 2 20
. 28 30 2 3 20 . 61 31 1 2 20 118 36 33 1 1 20
132 25 31 1 1 20 101 37 25 1 2 20 . 34 33 2 3 20
150 53 34 2 1 20 136 45 37 2 3 20 125 33 29 1 2 20
131 39 33 2 2 20 116 37 23 2 2 20 139 21 36 2 1 20
112 30 32 1 3 20 . 35 33 1 1 20 143 72 34 2 2 20
137 15 30 2 2 20 124 27 28 1 2 20 129 35 29 1 3 20
151 43 35 1 2 20 138 45 32 2 3 20 126 24 30 2 2 20
126 36 33 2 3 20 . 20 26 2 2 20 124 35 28 2 3 20
136 58 38 1 1 20 118 38 31 2 3 20 119 14 27 1 1 20
120 54 28 1 2 20 112 40 27 2 2 20 121 44 28 2 3 20
122 9 33 1 2 20 141 44 37 1 1 20 . 42 29 1 3 20
120 54 25 2 3 20 115 8 26 1 2 20 126 26 30 2 3 20
124 26 32 1 3 20 . 11 27 2 2 20 . 46 36 1 2 20
138 33 34 2 1 20 141 24 30 2 1 20
;
%include 'sregsub.sas';
%sregsub(data=bp,
weight=w,
class=exercise alcohol,
model= bp= age bmi exercise alcohol exercise*alcohol,
contrast='alc 1 vs 2: exercise=1'
alcohol 1 -1 0
exercise*alcohol 1 -1 0 0 0 0,
contrast2='alc 1 vs 2: exercise=2'
alcohol 1 -1 0
exercise*alcohol 0 0 0 1 -1 0,
subpop= age >= 25)
run;
The SURVEYREG SUBGROUP Macro
Data Summary
Subpopulation : age >= 25
Dependent Variable: bp
Total Observations 50 Weight Sum: 1000
Subpopulation Observations 38 Weight Sum: 760
Subpopulation Nonmissing Observations 32 Weight Sum: 640
Subpopulation Missing Observations 6 Weight Sum: 120
Number of Strata 1
Number of PSUs 50
Denominator Degrees of Freedom 49
R-square 0.66325
bp Mean 127.21875
Test of Model Effects
Subpopulation : age >= 25
Dependent Variable: bp
-----------------------------------------------------------------
|Effect | Num DF | Wald F | Pr > F |
|------------------------+------------+------------+------------|
|Model | 7.00| 11.93| 0.0000|
|Intercept | .| .| .|
|age | 1.00| 0.52| 0.4763|
|bmi | 1.00| 29.42| 0.0000|
|exercise | 1.00| 7.08| 0.0105|
|alcohol | 2.00| 1.73| 0.1874|
|exercise*alcohol | 2.00| 2.78| 0.0717|
-----------------------------------------------------------------
Estimated Regression Coefficients
Subpopulation : age >= 25
Dependent Variable: bp
------------------------------------------------------------------------------
|Parameter | | Standard | | |
| | Estimate | Error | t Value | Pr > |t| |
|------------------------+------------+------------+------------+------------|
|Intercept | 55.9649| 11.9773| 4.67| 0.0000|
|age | 0.0728| 0.1015| 0.72| 0.4763|
|bmi | 2.2039| 0.4064| 5.42| 0.0000|
|exercise 1 | -5.0112| 4.5072| -1.11| 0.2716|
|exercise 2 | 0.0000| 0.0000| .| .|
|alcohol 1 | 10.2015| 3.5027| 2.91| 0.0054|
|alcohol 2 | 2.4587| 3.1108| 0.79| 0.4331|
|alcohol 3 | 0.0000| 0.0000| .| .|
|exercise*alcohol 1 1 | -9.0381| 5.6932| -1.59| 0.1188|
|exercise*alcohol 1 2 | 4.0482| 6.4240| 0.63| 0.5315|
|exercise*alcohol 1 3 | 0.0000| 0.0000| .| .|
|exercise*alcohol 2 1 | 0.0000| 0.0000| .| .|
|exercise*alcohol 2 2 | 0.0000| 0.0000| .| .|
|exercise*alcohol 2 3 | 0.0000| 0.0000| .| .|
------------------------------------------------------------------------------
Analysis of Contrasts
Subpopulation : age >= 25
Dependent Variable: bp
-----------------------------------------------------------------
|Contrast | Num DF | Wald F | Pr > F |
|------------------------+------------+------------+------------|
|alc 1 vs 2: exercise=1 | 1.00| 1.00| 0.3234|
|alc 1 vs 2: exercise=2 | 1.00| 4.23| 0.0451|
-----------------------------------------------------------------
Save the %SREGSUB macro definition to a file. It is recommended that you
name the file sregsub.sas.
| Type: | Sample |
| Topic: | SAS Reference ==> Procedures ==> SURVEYREG Analytics ==> Survey Sampling and Analysis Analytics ==> Regression |
| Date Modified: | 2007-08-11 03:03:07 |
| Date Created: | 2005-01-13 15:02:46 |
| Product Family | Product | Host | SAS Release: Starting | SAS Release: Ending |
| SAS System | SAS/STAT | z/OS | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | OpenVMS VAX | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | OS/2 | 8 TS M0 | 8.2 TS2M0 |
| SAS System | SAS/STAT | Microsoft Windows 95/98 | 8 TS M0 | 8.2 TS2M0 |
| SAS System | SAS/STAT | Microsoft Windows 2000 Advanced Server | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | Microsoft Windows 2000 Datacenter Server | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | Microsoft Windows 2000 Server | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | Microsoft Windows 2000 Professional | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | Microsoft Windows NT Workstation | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | Microsoft Windows XP Professional | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | Windows Millennium Edition (Me) | 8 TS M0 | 8.2 TS2M0 |
| SAS System | SAS/STAT | ABI+ for Intel Architecture | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | AIX | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | HP-UX | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | IRIX | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | OpenVMS Alpha | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | Solaris | 8 TS M0 | 9.1 TS1M3 SP4 |
| SAS System | SAS/STAT | Tru64 UNIX | 8 TS M0 | 9.1 TS1M3 SP4 |





