Contents: Purpose / History / Requirements / Usage / Details / See Also / References
(In the SAS 9.2 release, the NOMCAR option was added to the SURVEYMEANS procedure to incorporate missing values into the variance computations for the mean and total statistics. In a future SAS release, the procedure will also incorporate missing values into the variance computation for the ratio statistic.)
| Version | Update Notes |
| 3.1 | Has the same functionality as 3.0, but has minor modifications to maintain compatibility with the SAS 9.2 release. |
| 3.0 | Specify multiple table requests (TABLES=); specify multiple contrast requests (CONTRAST=); create contrasts for any statistic (mean, total, or ratio); create data sets from any output object (OUTPUT=); compute ratio estimates (RATIO=); create formats for subgroup variables (FORMAT=); create Microsoft Word tables (TABLES /STYLE=WORDREADY); specify numerical formats (STATISTICS /Fkeyword=format); suppress printing of output (STATISTICS /NOPRINT); override options coded in macro (OPTIONS=) |
| 2.2 | Cosmetic improvements. |
| 2.1 | Added display of contrast vector. |
| 1.3 | Added CONTRAST= parameter; lifted restriction on the number of analysis variables; lifted restriction on maximum value of subgroup levels |
| 1.2 | Added more comparison operators for SUBPOP= parameter. |
| 1.1 | Initial coding. |
%inc "<location of your file containing the SMSUB macro>";
After this statement runs, you can call the %SMSUB macro. Examples 1 through 6 in this document show typical calls.
The following parameters are available in the %SMSUB macro.
%SMSUB(
DATA= SAS-data-set,
STATISTICS= keywords < / options >,
STRATA= variable,
CLUSTER= variable,
POPSIZE= variable,
WEIGHT= variable,
VAR= variables,
RATIO= request < ... & request >,
TABLES= requests < / options >,
CONTRAST= 'label' variable vector < ... & 'label' variable vector >,
SUBPOP= variable operator value,
OUTPUT= object=SAS-data-set < ... object=SAS-data-set >,
ALPHA= value,
FORMAT= variable value1='format1' ... valuen='formatn'
< ... & variable value1='format1' ... valuen='formatn' >,
COLWIDTH= value,
OPTIONS= options,
TITLE= 'text'
)
Required Macro Parameters
The required parameters are DATA=, WEIGHT=, and either VAR= or RATIO=. The VAR= and RATIO= parameters cannot be used at the same time.
DATA= SAS-data-set
specifies the data set to be analyzed.
WEIGHT= variable
specifies the analysis weight.
VAR= variables
specifies variables for computing mean or total statistics.
RATIO= request < ... & request >
specifies ratio requests. The variables used in each request can
be either continuous or categorical. For example, the following
requests use both types of variables:
ratio= y/x & c=1/c=3,
The first request computes the weighted y-total over the weighted
x-total. The second request computes the weight sum of observations in
c=1 over the weight sum of observations in c=3. The ampersand is used
to separate the two requests.
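The arithmetic behind the two kinds of request can be sketched outside SAS. The Python fragment below is only an illustration of the description above, not the macro's code; the function names and the toy data are assumptions, and missing values are not handled.

```python
def weighted_total(values, weights):
    """Sum of weight * value over all observations."""
    return sum(w * v for v, w in zip(values, weights))

def continuous_ratio(y, x, weights):
    """Mirrors ratio= y/x : weighted y-total over weighted x-total."""
    return weighted_total(y, weights) / weighted_total(x, weights)

def categorical_ratio(c, weights, num_level, den_level):
    """Mirrors ratio= c=1/c=3 : weight sum in one level over weight sum in another."""
    num = sum(w for ci, w in zip(c, weights) if ci == num_level)
    den = sum(w for ci, w in zip(c, weights) if ci == den_level)
    return num / den

# Toy data (hypothetical): four observations with unequal weights.
y = [2.0, 4.0, 6.0, 8.0]
x = [1.0, 2.0, 2.0, 5.0]
c = [1, 3, 1, 3]
w = [10.0, 20.0, 10.0, 20.0]

r_yx = continuous_ratio(y, x, w)      # (20 + 80 + 60 + 160) / (10 + 40 + 20 + 100)
r_c = categorical_ratio(c, w, 1, 3)   # (10 + 10) / (20 + 20)
```

In the categorical case no analysis variable enters the calculation; only the weights of the observations in each level are summed.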
Sample Design Parameters
STRATA= variable
specifies the first-stage strata.
CLUSTER= variable
specifies the first-stage sample units for a clustered or multistage
sample.
POPSIZE= variable
specifies a variable containing stratum population totals (or overall
population if your sample is not stratified) for computing finite
population correction factors. The POPSIZE= parameter should only be
used to compute fpc factors for one-stage samples that were selected
without replacement and with equal probabilities. (For more
information on sample design capabilities and when to use the POPSIZE=
parameter, see SAMPLE DESIGN SPECIFICATIONS.)
For a stratified non-clustered sample, the population totals would be
the number of observational units in each stratum population. For a
stratified one-stage cluster sample, the population totals would be the
number of clusters in each stratum population.
The POPSIZE= variable can also contain special values that tell %SMSUB
to use different variance formulas for different strata. The
applicable variance formula for each stratum corresponds to the type of
sampling performed in each stratum. The value "0" indicates sample
units were selected with certainty (probability equal to one). The
value "-1" indicates sample units were selected with replacement.
For example, all three possible variance formulas could be specified
using the following codes:
---------------------------------------------------------------
| STRATA | POPSIZE | Variance Formula |
---------------------------------------------------------------
| 1 | 0 | with certainty, variance equals zero |
| 2 | -1 | with replacement, no fpc |
| 3 | 1000 | without replacement, fpc=(1-n/1000) |
---------------------------------------------------------------
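As a rough illustration of how the three POPSIZE= codes translate into a variance adjustment, the Python sketch below returns the factor that would multiply a stratum's with-replacement variance. It mirrors the table above rather than the macro's internal code; the function name and the stratum sample sizes are assumptions.

```python
def variance_multiplier(popsize, n_sampled):
    """Factor applied to a stratum's with-replacement variance estimate."""
    if popsize == 0:      # selected with certainty: variance equals zero
        return 0.0
    if popsize == -1:     # selected with replacement: no fpc
        return 1.0
    return 1.0 - n_sampled / popsize   # without replacement: fpc = 1 - n/N

# (stratum, POPSIZE code, sample size); the sample sizes are hypothetical.
design = [(1, 0, 5), (2, -1, 40), (3, 1000, 50)]
factors = {s: variance_multiplier(p, n) for s, p, n in design}
```

With 50 units sampled in stratum 3, the factor is 1 - 50/1000, so the variance for that stratum is reduced by 5 percent relative to the with-replacement formula.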
Optional Macro Parameters
STATISTICS= keywords < / options >
specifies the statistics to be computed.
If the VAR= parameter is used, the available statistics are:
NOBS number of nonmissing observations
NMISS number of missing observations
SUMWGT weight sum of nonmissing observations
MEAN mean
SEMEAN standard error of mean
CLMEAN confidence limits for mean
CVMEAN coefficient of variation for mean
TMEAN t-value for Ho: contrast-mean=0
PMEAN p-value for Ho: contrast-mean=0
TOTAL total
SETOTAL standard error of total
CLTOTAL confidence limits for total
CVTOTAL coefficient of variation for total
ALL all statistics
If the RATIO= parameter is used, the available statistics are:
NOBS number of nonmissing observations
NMISS number of missing observations
SUMWGT weight sum of nonmissing observations
NUMTOTAL numerator total
DENTOTAL denominator total
RATIO ratio
SERATIO standard error of ratio
CLRATIO confidence limits for ratio
CVRATIO coefficient of variation for ratio
TRATIO t-value for Ho: contrast-ratio=0
PRATIO p-value for Ho: contrast-ratio=0
ALL all statistics
The default statistics for the VAR=, RATIO=, and CONTRAST= parameters
are presented below:
------------------------------------------------------------
| Parameters | Default Statistics |
------------------------------------------------------------
| VAR= | NOBS MEAN SEMEAN CLMEAN |
| VAR=, CONTRAST= | NOBS MEAN SEMEAN TMEAN PMEAN |
| RATIO= | NOBS RATIO SERATIO CLRATIO |
| RATIO=, CONTRAST= | NOBS RATIO SERATIO TRATIO PRATIO |
------------------------------------------------------------
You can use the following options to control how the statistics are
printed:
Fkeyword=format
specifies a numerical format for individual statistics.
NOPRINT
suppresses the printing of output.
For example, if you wanted to change the default 12.4 format for SEMEAN
to 12.6, you could specify the format as follows:
statistics= nobs mean semean /fsemean=12.6,
The NOPRINT option is helpful when used in combination with the OUTPUT=
parameter.
TABLES= requests < / options >
requests one-way, two-way, or three-way subgroup tables, where the
subgroup variables are joined by asterisks. For example, the following
requests would generate all three table types:
tables= a a*b a*b*c,
If you omit the TABLES= parameter, the macro produces a table with
overall results. You can also request an overall table by using the
TABLES= parameter with a request using the special keyword, _OVERALL_.
You can use the following options to control how the table rows are
constructed:
LEVELMISS
displays statistics for missing subgroup levels.
EXCLMISSMARG
excludes missing subgroup levels in the marginal estimates.
NOMARG
suppresses marginal estimates.
See Example 6 for an illustration of when LEVELMISS and EXCLMISSMARG
are important options to consider.
You can also use the STYLE= option to produce standard box tables, or
to produce listing output ready for conversion to WORD tables.
STYLE=BOX
presents tables in a standard box style (the default).
STYLE=WORDREADY
generates listing output ready to convert to WORD tables.
For the STYLE=WORDREADY option, the output objects contain pound signs
that act as column delimiters. You create WORD tables by first
pasting your entire listing output into a WORD document. Using your
pointing device, select an output table (excluding the table title),
and click Table-Convert-Text to Table, which will open the table
conversion dialog box. Then choose the following options:
AutoFit behavior: AutoFit to contents
AutoFormat... "choose a table style"
Separate text at: Other #
After selecting the dialog box options, click OK to create the table.
See Example 2 for an illustration of the listing output created by the
STYLE=WORDREADY option.
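Because the pound sign is a plain column delimiter, the same listing can also be read by other tools. The Python sketch below splits a few lines copied from the Example 2 output into cells; it assumes the listing has exactly the layout shown there, and the function name is hypothetical.

```python
def parse_wordready(lines):
    """Split each '#'-delimited listing line into a list of trimmed cells."""
    return [[cell.strip() for cell in line.split("#")]
            for line in lines if line.strip()]

# Lines taken from the Agegrp*Gender table in the Example 2 output.
listing = [
    "Agegrp#Gender#NOBS#MEAN#SEMEAN",
    "Total #Total #72 #0.5431 #0.0561",
    "Total #Male #37 #0.5663 #0.0896",
]
header, *rows = parse_wordready(listing)
```

All cells are kept as text; converting NOBS, MEAN, and SEMEAN to numbers is left to the caller.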
CONTRAST= 'label' variable vector < ... & 'label' variable vector >
estimates a linear contrast. If subgroup tables are also requested,
then contrast estimates are computed for each level. For multiple
contrast requests, ampersands are used to separate each contrast.
For example, the parameters below test pairwise differences of drug use
for a variable c, by each level of gender:
var= druguse,
tables= gender,
contrast= 'c: 1-2' c 1 -1 0 &
'c: 1-3' c 1 0 -1 &
'c: 2-3' c 0 1 -1,
When contrasts are requested, the statistics associated with the "mean",
"total", and "ratio" actually refer to the contrast mean, contrast
total, and contrast ratio.
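The point estimate of a contrast is the coefficient-weighted sum of the subgroup estimates. The Python sketch below shows this for the three pairwise vectors in the example above; the subgroup means are hypothetical, and the standard error is deliberately left out because it depends on the covariances between the subgroup estimates.

```python
def contrast_estimate(level_means, coefficients):
    """Linear contrast: sum of coefficient * subgroup mean."""
    if len(level_means) != len(coefficients):
        raise ValueError("need one coefficient per subgroup level")
    return sum(c * m for c, m in zip(coefficients, level_means))

# Hypothetical mean drug use for levels 1, 2, 3 of variable c.
means = [0.60, 0.45, 0.50]
c12 = contrast_estimate(means, [1, -1, 0])   # level 1 minus level 2
c13 = contrast_estimate(means, [1, 0, -1])   # level 1 minus level 3
c23 = contrast_estimate(means, [0, 1, -1])   # level 2 minus level 3
```

The three pairwise contrasts are linearly dependent: the level 2 minus level 3 estimate always equals c13 minus c12.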
SUBPOP= variable operator value
SUBPOP= variable IN (values)
restricts the analysis to a subpopulation of the overall population.
The available comparison operators are listed below:
-----------------------------------------------------------------
| Symbol | Mnemonic | Definition |
| | Equivalent | |
-----------------------------------------------------------------
| = | EQ | equal to |
| ^= or ~= | NE | not equal to |
| > | GT | greater than |
| < | LT | less than |
| >= | GE | greater than or equal to |
| <= | LE | less than or equal to |
| | IN | equal to one from a list of numbers |
-----------------------------------------------------------------
For example, to restrict an analysis to people aged 25 or older, the
syntax could be written using either the ">=" or "ge" operators:
subpop= age >= 25,
subpop= age ge 25,
For the IN comparison operator, an example that restricts an analysis
to certain age groups could be written as follows:
subpop= agegrp in (3 4),
OUTPUT= object=SAS-data-set < ... object=SAS-data-set >
creates output data sets from the macro output objects. The output
objects are listed below:
----------------------------------------------------
| Object | Description | Invocation |
---------------------------------------------------
| Summary | Data summary | always |
| Tables | Table estimates | always |
| Contrast | Contrast vectors | CONTRAST= |
----------------------------------------------------
For example, to create data sets for all three output objects, the
OUTPUT= parameter could be written as follows:
output= summary=mysummary contrast=mycontrast tables=mytables,
The tables= data set will contain special codes for the marginal and
missing categories of the subgroup variables. The value "0" identifies
marginal categories. The value "9.9E16" identifies any missing
categories if the table option LEVELMISS is specified.
If you omit the OUTPUT= parameter, default output data sets are still
created using the following names: _summary_, _contrast_, and
_tables_.
ALPHA= value
specifies the significance level for confidence limits; the limits have
confidence level 100(1-ALPHA) percent. The default value of ALPHA=.05
produces 95% confidence limits.
Optional Parameters for Output Appearance
FORMAT= variable value1='format1' ... valuen='formatn'
< ... & variable value1='format1' ... valuen='formatn' >
specifies value formats for subgroup variables in the TABLES=
parameter. The ampersands separate the value formats given for each
variable. For example, a three-way table could specify formats as
follows:
tables= agegrp*gender*race,
format= agegrp 1='12-17' 2='18-34' &
gender 1='Male' 2='Female' &
race 1='White' 2='Black',
OPTIONS= options
overrides the SAS system options coded inside the %SMSUB macro, which
are:
NOCENTER NONUMBER NODATE NONOTES NOSTIMER FORMDLIM=' '
FORMCHAR="|----|+|---+=|-/\<>*" ps=max ls=94
For example, to override the NODATE option, you would specify:
OPTIONS=date,
Only the date option would be changed. The remaining system options
would remain in effect.
COLWIDTH= value
specifies the first column width in the output tables. The default is
26 spaces.
TITLE= 'text'
specifies a title for the output listing.
The %SMSUB macro provides additional subgroup capabilities beyond those provided by the DOMAIN statement in PROC SURVEYMEANS. This includes:
Because survey samples are from finite populations, the correct variance estimator for a given subgroup requires all the observations in your sample, including observations outside the subgroup. Consequently, elimination of observations outside the subgroup can often produce incorrect variance estimates. [For conditions that lead to incorrect variance estimates, refer to Korn, Graubard (1999, section 5.4).] Therefore, you should not subset your data set prior to analysis. Also, you should not use the BY statement or the WHERE statement to compute subgroup estimates.
The above consideration for subgroup variance estimation also applies when there is missing data in your data set. If your analysis variables contain missing values, the estimates for the mean, total, or ratio effectively estimate the mean, total, or ratio on the part of the population that, if sampled, would have given a response to the survey item. Analogous to any subgroup, the variance estimation for item-respondents requires all the observations in your sample, including the item-nonrespondents (i.e. the missing values). Of course, with nontrivial amounts of missing data, the total error will equal the variance of the estimator plus the squared bias, where the bias contribution from nonresponse is unknown.
For the mean and total statistics in PROC SURVEYMEANS, the subgroup variance estimators multiply the analysis weight by a zero-one subgroup indicator in each location the weight appears. However, the zero-one indicator does not identify the item response within the subgroup. Rather, it deletes the observations associated with the missing values. In regard to the ratio statistic, PROC SURVEYMEANS currently does not compute subgroup estimates.
The variance estimators in the %SMSUB macro for the mean, total, and ratio statistics start with the same formulas used in PROC SURVEYMEANS. But for each statistic, the analysis weight is multiplied by a zero-one indicator that identifies both the subgroup and item response membership. Thereby, the subgroup variance estimates are always computed over all sample observations. For details on subgroup estimation and variance estimation in the presence of missing values, refer to Cochran (1977), Levy/Lemeshow (1999), or Korn/Graubard (1999).
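The effect of the zero-one indicator can be checked by hand on the data of Example 6. The Python sketch below is not the macro's code: it assumes a single-stage, unstratified sample treated as selected with replacement, uses the usual linearization for a weighted mean, and its function and variable names are hypothetical.

```python
import math

def domain_mean_se(y, w, in_domain):
    """Weighted domain mean, linearized standard error, and respondent count.

    Every sample row counts toward n; rows outside the domain or with y
    missing (None) simply get a zero indicator.
    """
    n = len(y)
    delta = [1 if (d and v is not None) else 0 for v, d in zip(y, in_domain)]
    wsum = sum(wi for wi, di in zip(w, delta) if di)
    mean = sum(wi * v for v, wi, di in zip(y, w, delta) if di) / wsum
    # Linearized values: zero for rows whose indicator is zero.
    z = [wi * (v - mean) / wsum if di else 0.0 for v, wi, di in zip(y, w, delta)]
    zbar = sum(z) / n
    var = n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return mean, math.sqrt(var), sum(delta)

# Example 6 data: (race, v) for the 14 sampled people, None = missing.
rows = [(None, None), (None, None), (None, 0), (None, 1), (None, 1), (None, 0),
        (1, 1), (1, 0), (1, 0), (1, 1), (2, 1), (2, 0), (2, 1), (2, 1)]
race = [r for r, _ in rows]
v = [val for _, val in rows]
w = [500 / 14] * len(rows)

overall = domain_mean_se(v, w, [True] * len(rows))
race1 = domain_mean_se(v, w, [r == 1 for r in race])

# Dropping the item-nonrespondents first shrinks n from 14 to 12.
kept = [val for val in v if val is not None]
subset = domain_mean_se(kept, [500 / 14] * len(kept), [True] * len(kept))
```

Under these assumptions, overall and race1 agree with the NOBS, MEAN, and SEMEAN values printed for the total row and the race=1 row of the Default Table in Example 6 (12, 0.58333, 0.14769 and 4, 0.50000, 0.25944). The subset call returns the same mean but a different standard error, because the two missing rows no longer count toward n.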
Acronyms in Documentation:
Sample Design Specifications
The %SMSUB macro provides the same sample design capabilities as the SURVEYMEANS procedure. This includes any sample design that falls under one of the following three major categories:
| Number of Stages | Replacement Method | Selection Probabilities |
| single-stage | with replacement (WR) | equal or unequal |
| single-stage | without replacement (WOR) | equal |
| multistage | with replacement at first stage | equal or unequal |
For multistage samples, the variance estimator uses only the variance component between primary sampling units (PSUs) and is unbiased if the PSUs are selected with replacement. The variance components for the subsequent sample-stage units within the PSUs are not needed because the PSUs are independent when selected with replacement. This corresponds to the classical analysis of variance for nested random models. [Refer to Cochran (1977, section 11.9).]
The table below expands the major categories into all the possible sample designs and presents the corresponding required sample design parameters:
| Number of Stages | Stage 1 Features | Stage 1 Method | Stage 1 Probs | Required Design Parameters |
| Single | (none) | WR | Equal or Unequal | (none) |
| Single | (none) | WOR | Equal | POPSIZE= |
| Single | Strata | WR | Equal or Unequal | STRATA= |
| Single | Strata | WOR | Equal | STRATA= POPSIZE= |
| Single | Clusters | WR | Equal or Unequal | CLUSTER= |
| Single | Clusters | WOR | Equal | CLUSTER= POPSIZE= |
| Single | Strata, Clusters | WR | Equal or Unequal | STRATA= CLUSTER= |
| Single | Strata, Clusters | WOR | Equal | STRATA= CLUSTER= POPSIZE= |
| Multi | Clusters | WR | Equal or Unequal | CLUSTER= |
| Multi | Strata, Clusters | WR | Equal or Unequal | STRATA= CLUSTER= |
The remaining major categories currently not supported by the %SMSUB macro (or the SURVEYMEANS procedure) are as follows:
| Number of Stages | Replacement Method | Selection Probabilities |
| single-stage | without replacement | unequal |
| multistage | without replacement at first stage | equal |
| multistage | without replacement at first stage | unequal |
The unbiased variance estimators for these sample designs require additional computation. For multistage samples using without-replacement sampling at the first stage, the unbiased variance estimator not only adjusts the between-PSU variance component (with fpc factors if equal probabilities), but also requires computing additional variance components associated with subsequent sample-stage units. [Refer to Cochran (1977, section 10.4).] Also, if without-replacement sampling is performed with unequal selection probabilities, the unbiased variance estimator requires incorporating the joint selection probabilities instead of computing fpc factors. [Refer to Cochran (1977, section 9A.7).]
At first glance, the variance estimation limitations for without replacement sampling may seem restrictive. However, a common practice is to approximate the variances for these more complicated designs by assuming the PSUs were selected with replacement. If the PSU sampling fractions are small (say, less than 10 percent), the approximation is very close to the variances that would be obtained from the unbiased estimators. In fact, for multistage samples with equal PSU selection probabilities, the approximation results in negligible overestimation, and for conservative practices, may even be preferable. [Refer to Korn and Graubard (1999, section 2.3).]
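For the simplest case, where the fpc factor is the only difference between the two variance formulas, the size of the approximation error follows directly from the sampling fraction. The Python sketch below shows how much larger the with-replacement standard error is; it illustrates the single-stage, equal-probability case only, and the variable names are assumptions.

```python
import math

# Relative amount by which the with-replacement SE exceeds the SE that
# includes the fpc factor (1 - f), for a few sampling fractions f.
overstatement = {f: 1.0 / math.sqrt(1.0 - f) - 1.0 for f in (0.01, 0.05, 0.10)}

for f, excess in overstatement.items():
    print(f"sampling fraction {f:.2f}: with-replacement SE is {excess:.1%} larger")
```

At a 10 percent sampling fraction the difference is a little over 5 percent of the standard error, and it shrinks quickly as the fraction gets smaller.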
Note on POPSIZE=
As stated in the SYNTAX section, the POPSIZE= parameter can be used to compute fpc factors for single-stage samples selected without replacement and with equal probabilities. It should not be used for multistage samples. For example, if the POPSIZE= parameter were used on a multistage sample that had equal selection probabilities at the first stage, the variance estimates would correctly reduce the between-PSU variance component with fpc factors, but would be missing the additional variance components associated with subsequent sample-stage units. Hence, the variances would be underestimated.
Cochran, W.G. (1977), Sampling Techniques, Third Edition, New York: John Wiley & Sons, Inc.
Levy, P.S., Lemeshow, S. (1999), Sampling of Populations, Third Edition, New York: John Wiley & Sons, Inc.
Korn, E.L., Graubard, B.I. (1999), Analysis of Health Surveys, New York: John Wiley & Sons, Inc.
Example 1
Sample Design Parameters: STRATA=, POPSIZE=
Other Parameters: VAR=, TABLES=
Three hospital strata, partitioned by level of obstetric services. Sample 15 hospitals from a population of 158 hospitals.
Estimate total number of births by level of obstetric services (1=basic, 2=intermediate, 3=tertiary).
Reference: Levy, P.S., Lemeshow, S. (1999), Sampling of Populations, Third Edition, New York: John Wiley & Sons, Inc. p. 135.
data hospsamp;
input hospno oblevel sampsize tothosp births; weighta=tothosp/sampsize;
datalines;
15 1 4 42 480
80 1 4 42 426
86 1 4 42 342
136 1 4 42 174
7 2 5 99 2022
26 2 5 99 576
62 2 5 99 1999
90 2 5 99 482
101 2 5 99 836
28 3 6 17 3108
34 3 6 17 4674
39 3 6 17 2539
102 3 6 17 1610
119 3 6 17 4618
149 3 6 17 1781
;
%inc "<location of your file containing the SMSUB macro>";
%smsub(data=hospsamp, statistics=nobs total setotal,
strata=oblevel,
popsize=tothosp,
weight=weighta,
var=births,
tables=oblevel,
title='--- Example 1 ---');
run;
--- Example 1 ---
The SURVEYMEANS SUBGROUP Macro
Data Summary
Number of Observations Read: 15 Weight Sum: 158
Number of Strata: 3
Number of PSUs: 15
Denominator Degrees of Freedom: 12
Table: oblevel
Analysis Variable: births
-----------------------------------------------------------------
|By: oblevel | | | Std Error |
| | N | Total | of Total |
|------------------------+------------+------------+------------|
|Total | 15| 183983.0| 34014.3|
|1 | 4| 14931.0| 2669.9|
|2 | 5| 117117.0| 33067.7|
|3 | 6| 51935.0| 7508.4|
-----------------------------------------------------------------
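The totals and standard errors in this table can be reproduced by hand with the textbook formula for a stratified sample selected without replacement. The Python sketch below is an independent check, not the macro's code; because the table's subgroups coincide with the strata here, each subgroup row is simply its stratum's estimate. The variable names are assumptions.

```python
import math
from collections import defaultdict

# (oblevel, tothosp, births) for the 15 sampled hospitals in Example 1.
sample = [
    (1, 42, 480), (1, 42, 426), (1, 42, 342), (1, 42, 174),
    (2, 99, 2022), (2, 99, 576), (2, 99, 1999), (2, 99, 482), (2, 99, 836),
    (3, 17, 3108), (3, 17, 4674), (3, 17, 2539),
    (3, 17, 1610), (3, 17, 4618), (3, 17, 1781),
]

by_stratum = defaultdict(list)
pop = {}
for stratum, N, births in sample:
    by_stratum[stratum].append(births)
    pop[stratum] = N

results = {}
for stratum, ys in by_stratum.items():
    n, N = len(ys), pop[stratum]
    mean = sum(ys) / n
    s2 = sum((y - mean) ** 2 for y in ys) / (n - 1)   # sample variance
    total = N * mean                                   # same as sum of (N/n) * y
    var_total = N ** 2 * (1 - n / N) * s2 / n          # fpc = 1 - n/N
    results[stratum] = (total, var_total)

overall_total = sum(t for t, _ in results.values())
overall_se = math.sqrt(sum(v for _, v in results.values()))
```

The overall variance is the sum of the stratum variances because the strata are sampled independently.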
Example 2
Sample Design Parameters: STRATA=, CLUSTER=
Other Parameters: VAR=, TABLES/style=wordready, FORMAT=
Two first-stage school strata, partitioned by region. Sample 12 schools in first stage. Sample six students per school in second stage (72 total).
Estimate mean drug use by agegrp*gender, and agegrp*race.
NOTE: The STYLE=WORDREADY option is used to generate listing output ready to convert to WORD tables. (See the TABLES= parameter for more information.)
data students;
input studid$ region school gender race agegrp druguse @@;
if region=1 then analwt=1380; else analwt=1120;
datalines;
0101 1 01 2 2 2 1 0102 1 01 1 2 2 1 0103 1 01 1 1 2 1
0104 1 01 2 1 1 0 0105 1 01 1 2 2 1 0106 1 01 1 1 2 1
0201 1 02 2 1 2 0 0202 1 02 2 2 1 1 0203 1 02 2 1 1 1
0204 1 02 1 2 1 1 0205 1 02 2 2 2 1 0206 1 02 2 1 2 0
0301 1 03 2 1 1 1 0302 1 03 1 1 2 1 0303 1 03 1 2 1 0
0304 1 03 1 2 2 0 0305 1 03 2 1 1 0 0306 1 03 2 1 2 1
0401 1 04 1 1 1 1 0402 1 04 2 1 2 0 0403 1 04 2 2 2 1
0404 1 04 2 2 1 0 0405 1 04 1 2 1 1 0406 1 04 1 1 2 0
0501 1 05 2 2 2 0 0502 1 05 2 1 1 1 0503 1 05 1 2 2 1
0504 1 05 2 1 2 1 0505 1 05 1 2 1 0 0506 1 05 1 1 1 0
0601 1 06 1 1 2 0 0602 1 06 1 1 1 0 0603 1 06 2 2 1 1
0604 1 06 2 2 1 0 0605 1 06 1 1 2 1 0606 1 06 1 1 1 0
0701 2 07 1 1 1 1 0702 2 07 1 2 1 1 0703 2 07 1 1 1 0
0704 2 07 2 2 1 0 0705 2 07 2 2 2 1 0706 2 07 1 1 1 1
0801 2 08 2 2 2 0 0802 2 08 1 2 2 1 0803 2 08 1 2 1 0
0804 2 08 1 1 1 1 0805 2 08 1 1 2 0 0806 2 08 1 2 1 1
0901 2 09 2 1 1 0 0902 2 09 1 1 1 1 0903 2 09 2 1 2 0
0904 2 09 2 1 1 0 0905 2 09 2 1 1 0 0906 2 09 2 2 2 0
1001 2 10 1 2 1 0 1002 2 10 2 2 2 1 1003 2 10 2 2 1 1
1004 2 10 1 1 2 1 1005 2 10 2 1 2 1 1006 2 10 1 1 2 1
1101 2 11 2 1 1 0 1102 2 11 2 2 2 0 1103 2 11 1 2 2 1
1104 2 11 2 1 2 1 1105 2 11 1 1 2 0 1106 2 11 2 1 1 1
1201 2 12 1 2 2 0 1202 2 12 2 2 1 1 1203 2 12 1 1 1 1
1204 2 12 2 1 2 1 1205 2 12 1 2 2 0 1206 2 12 1 1 2 0
;
%inc "<location of your file containing the SMSUB macro>";
%smsub(data=students, statistics=nobs mean semean,
strata=region,
cluster=school,
weight=analwt,
var=druguse,
tables=Agegrp*Gender Agegrp*Race /style=wordready,
format= Agegrp 1='15-16' 2='17-18' &
Gender 1='Male' 2='Female' &
Race 1='White' 2='Black',
title='--- Example 2 ---');
run;
--- Example 2 ---
The SURVEYMEANS SUBGROUP Macro
Data Summary
Number of Observations Read:#72 #Weight Sum:#90000
#
Number of Strata:#2
Number of PSUs:#12
Denominator Degrees of Freedom:#10
Table: Agegrp*Gender
Analysis Variable: druguse
Agegrp#Gender#NOBS#MEAN#SEMEAN
Total #Total #72 #0.5431 #0.0561
Total #Male #37 #0.5663 #0.0896
Total #Female #35 #0.5187 #0.0932
15-16 #Total #35 #0.5099 #0.0832
15-16 #Male #18 #0.5387 #0.1384
15-16 #Female #17 #0.4799 #0.1330
17-18 #Total #37 #0.5744 #0.1019
17-18 #Male #19 #0.5921 #0.1474
17-18 #Female #18 #0.5556 #0.1098
Table: Agegrp*Race
Analysis Variable: druguse
Agegrp#Race#NOBS#MEAN#SEMEAN
Total #Total #72 #0.5431 #0.0561
Total #White #40 #0.5224 #0.0758
Total #Black #32 #0.5690 #0.0902
15-16 #Total #35 #0.5099 #0.0832
15-16 #White #20 #0.4947 #0.1024
15-16 #Black #15 #0.5297 #0.1009
17-18 #Total #37 #0.5744 #0.1019
17-18 #White #20 #0.5495 #0.1449
17-18 #Black #17 #0.6042 #0.1269
Example 3
Sample Design Parameters: CLUSTER=, POPSIZE=
Other Parameters: VAR=
Sample two housing developments out of five. Interview all households from each sampled development.
Estimate the total number of persons over 65 in the five housing developments.
Reference: Levy, P.S., Lemeshow, S. (1999), Sampling of Populations, Third Edition, New York: John Wiley & Sons, Inc. p. 236.
data tab9_1a;
input devlpmnt hh nge65 @@; wt1=2.5; M=5;
datalines;
2 1 2 2 2 1 2 3 2 2 4 1 2 5 1
2 6 1 2 7 2 2 8 1 2 9 3 2 10 1
2 11 1 2 12 2 2 13 1 2 14 3 2 15 1
2 16 3 2 17 1 2 18 2 2 19 1 2 20 3
5 1 1 5 2 1 5 3 1 5 4 3 5 5 2
5 6 1 5 7 3 5 8 1 5 9 1 5 10 3
5 11 2 5 12 3 5 13 1 5 14 1 5 15 2
5 16 2 5 17 1 5 18 2 5 19 2 5 20 1
;
%inc "<location of your file containing the SMSUB macro>";
%smsub(data=tab9_1a, statistics=nobs total setotal,
cluster=devlpmnt,
popsize=M,
weight=wt1,
var=nge65,
title='--- Example 3 ---');
run;
--- Example 3 ---
The SURVEYMEANS SUBGROUP Macro
Data Summary
Number of Observations Read: 40 Weight Sum: 100
Number of Strata: 1
Number of PSUs: 2
Denominator Degrees of Freedom: 1
NOTE: Finite population correction to sample PSUs (devlpmnt)
assumes inclusion of all listing units within sample PSUs.
Table: Overall
Analysis Variable: nge65
-----------------------------------------------------------------
|By: Overall | | | Std Error |
| | N | Total | of Total |
|------------------------+------------+------------+------------|
|Total | 40| 167.5| 1.9|
-----------------------------------------------------------------
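For a one-stage cluster sample, the variance is computed from the cluster totals rather than from the individual households. The Python sketch below redoes the calculation for this example by hand, using the textbook formula with the fpc applied to the sampled developments; it is an independent check with assumed variable names, not the macro's code.

```python
import math

# nge65 for every household in the two sampled developments (Example 3).
households = {
    2: [2, 1, 2, 1, 1, 1, 2, 1, 3, 1, 1, 2, 1, 3, 1, 3, 1, 2, 1, 3],
    5: [1, 1, 1, 3, 2, 1, 3, 1, 1, 3, 2, 3, 1, 1, 2, 2, 1, 2, 2, 1],
}
M = 5                   # developments in the population (the POPSIZE= value)
m = len(households)     # developments sampled

cluster_totals = [sum(v) for v in households.values()]   # one total per PSU
mean_total = sum(cluster_totals) / m
s2 = sum((t - mean_total) ** 2 for t in cluster_totals) / (m - 1)

total = (M / m) * sum(cluster_totals)                    # weight 2.5 per household
se_total = math.sqrt(M ** 2 * (1 - m / M) * s2 / m)      # fpc = 1 - m/M
```

The two cluster totals are 33 and 34, which is why the standard error is so small relative to the estimated total.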
Example 4
Sample Design Parameters: none
Other Parameters: VAR=, TABLES=, CONTRAST=, SUBPOP=, FORMAT=
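Perform pairwise comparisons of blood pressure between different levels of alcohol use. Restrict subpopulation to: age >= 25. Note that the third contrast, labeled 'Alcohol 2-3', is coded with coefficients 0 -1 1, so its estimate is the level 3 mean minus the level 2 mean.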
Variables:
| bp | systolic blood pressure |
| age | age, in years |
| exercise | 1=regular exercise, 2=no regular exercise |
| alcohol | 1=heavy use, 2=moderate use, 3=no use |
data bp;
input bp age exercise alcohol w @@;
cards;
124 26 1 3 20 136 20 2 2 20 116 37 1 1 20
113 51 1 3 20 137 43 1 1 20 126 36 2 2 20
111 13 1 2 20 143 50 2 1 20 120 22 2 2 20
. 28 2 3 20 . 61 1 2 20 118 36 1 1 20
132 25 1 1 20 101 37 1 2 20 . 34 2 3 20
150 53 2 1 20 136 45 2 3 20 125 33 1 2 20
131 39 2 2 20 116 37 2 2 20 139 21 2 1 20
112 30 1 3 20 . 35 1 1 20 143 72 2 2 20
137 15 2 2 20 124 27 1 2 20 129 35 1 3 20
151 43 1 2 20 138 45 2 3 20 126 24 2 2 20
126 36 2 3 20 . 20 2 2 20 124 35 2 3 20
136 58 1 1 20 118 38 2 3 20 119 14 1 1 20
120 54 1 2 20 112 40 2 2 20 121 44 2 3 20
122 9 1 2 20 141 44 1 1 20 . 42 1 3 20
120 54 2 3 20 115 8 1 2 20 126 26 2 3 20
124 26 1 3 20 . 11 2 2 20 . 46 1 2 20
138 33 2 1 20 141 24 2 1 20
;
%inc "<location of your file containing the SMSUB macro>";
%smsub(data=bp, statistics=nobs mean tmean pmean/
fnobs=10.0 fmean=10.4 ftmean=10.2 fpmean=10.4,
weight=w,
var=bp,
tables=Exercise,
contrast='Alcohol 1-2' alcohol 1 -1 0 &
'Alcohol 1-3' alcohol 1 0 -1 &
'Alcohol 2-3' alcohol 0 -1 1,
format= exercise 1='Regular Exercise' 2='No Exercise',
subpop= age >= 25,
title='--- Example 4 ---');
run;
--- Example 4 ---
The SURVEYMEANS SUBGROUP Macro
Data Summary
Subpopulation: age >= 25
Number of Observations Read: 50 Weight Sum: 1000
Subpopulation Observations: 38 Weight Sum: 760
Number of Strata: 1
Number of PSUs: 50
Denominator Degrees of Freedom: 49
Contrast Vector(s)
Label Variable Values Coefficients
Alcohol 1-2 alcohol 1 1
2 -1
3 0
Alcohol 1-3 alcohol 1 1
2 0
3 -1
Alcohol 2-3 alcohol 1 0
2 -1
3 1
Table: Exercise *Contrast
Analysis Variable: bp
----------------------------------------------------------------------
|By: Exercise, Contrast | | | t Value | Pr > |t| |
| | N | Mean | Mean=0 | Mean=0 |
|------------------------+----------+----------+----------+----------|
|Total | | | | |
| Alcohol 1-2 | 19| 9.6556| 1.71| 0.0933|
| Alcohol 1-3 | 22| 10.6325| 2.60| 0.0124|
| Alcohol 2-3 | 23| -0.9769| -0.20| 0.8411|
|Regular Exercise | | | | |
| Alcohol 1-2 | 11| 5.8000| 0.71| 0.4840|
| Alcohol 1-3 | 11| 9.6000| 1.93| 0.0596|
| Alcohol 2-3 | 10| -3.8000| -0.49| 0.6294|
|No Exercise | | | | |
| Alcohol 1-2 | 8| 18.0667| 3.14| 0.0029|
| Alcohol 1-3 | 11| 17.5417| 4.66| 0.0000|
| Alcohol 2-3 | 13| 0.5250| 0.09| 0.9250|
----------------------------------------------------------------------
Example 5
Sample Design Parameters: POPSIZE=
Other Parameters: RATIO=, TABLES=
Sample 56 hospitals from a population of 101 hospitals.
Estimate the ratio of newborn medical records on which the mother's hepatitis B surface antigen (HBsAg) status was recorded to the number of births, by level of obstetric services (1=basic, 2=intermediate, 3=tertiary).
Ratio Variables:
| agrecs | number of records having mother's antigen recorded |
| births | number of births |
Reference: Levy, P.S., Lemeshow, S. (1999), Sampling of Populations, Third Edition, New York: John Wiley & Sons, Inc. p. 201.
data hospsamp;
input hosp oblevel agrecs births @@; tothosp=101; w=101/56; cards;
1 2 2898 2898 2 2 1095 1304 3 2 1860 2022 4 2 1227 1395
5 2 618 773 6 2 1499 1630 7 2 1321 1436 8 2 0 2525
9 3 3595 4674 10 2 1732 2279 11 2 1020 1252 12 3 304 2539
13 2 1708 1780 14 3 1195 3320 15 3 3706 4029 16 2 837 1436
17 2 597 844 18 2 490 511 19 2 39 331 20 2 2438 2540
21 2 2222 2526 22 2 1859 1859 23 2 1999 1999 24 2 683 2136
25 2 0 1378 26 2 1004 1142 27 1 342 342 28 2 1050 1094
29 2 1614 1869 30 2 866 1204 31 2 520 651 32 2 771 836
33 3 2911 4044 34 2 1777 1777 35 3 2170 2261 36 2 793 863
37 3 1809 1809 38 2 1302 1302 39 3 2669 2669 40 3 1279 1512
41 2 434 1242 42 2 1178 1286 43 2 253 576 44 2 52 1324
45 2 985 1027 46 2 37 602 47 2 435 518 48 2 1225 1613
49 1 365 381 50 2 719 782 51 2 740 740 52 1 689 752
53 1 365 365 54 2 760 760 55 2 602 1369 56 2 651 651
;
%inc "<location of your file containing the SMSUB macro>";
%smsub(data=hospsamp, statistics=nobs ratio seratio,
popsize=tothosp,
weight=w,
ratio=agrecs/births,
tables=oblevel,
title='---Example 5---');
---Example 5---
The SURVEYMEANS SUBGROUP Macro
Data Summary
Number of Observations Read: 56 Weight Sum: 101
Number of Strata: 1
Number of PSUs: 56
Denominator Degrees of Freedom: 55
Table: oblevel
Ratio: agrecs/births
-----------------------------------------------------------------
|By: oblevel | | | Std Error |
| | N | Ratio | of Ratio |
|------------------------+------------+------------+------------|
|Total | 56| 0.7526| 0.0304|
|1 | 4| 0.9571| 0.0137|
|2 | 43| 0.7560| 0.0360|
|3 | 9| 0.7312| 0.0593|
-----------------------------------------------------------------
Example 6
Sample Design Parameters: none
Other Parameters: VAR=, TABLES/levelmiss exclmissmarg, OUTPUT=
Sample 14 people out of 500. Estimate the proportion for variable v, by one-way tables of race and gender. The variables v, race, and gender have different missing-value patterns.
| Table | Pros | Cons |
| Default | Nobs in total row is consistent across both tables | Nobs in cells do not add up to total row |
| Levelmiss | Nobs in total row is consistent across both tables; Nobs in cells add up to total row | Missing category may be of no interest |
| ExclmissMarg | Nobs in cells add up to total row | Nobs in total row is not consistent across both tables; Nobs in total row excludes some nonmissing values of v |
data sample;
input
race gender v; w=500/14; datalines;
. . .
. . .
. . 0
. . 1
. 1 1
. 2 0
1 1 1
1 2 0
1 1 0
1 2 1
2 1 1
2 2 0
2 1 1
2 2 1
;
%inc "<location of your file containing the SMSUB macro>";
*-----------------------*
* Default Table *
*-----------------------*;
%smsub(data=sample, statistics=nobs mean semean /noprint,
weight=w,
var=v,
tables=Race Gender,
output= summary=summary tables=table6a);
run;
*-----------------------*
* Levelmiss Table *
*-----------------------*;
%smsub(data=sample, statistics=nobs mean semean /noprint,
weight=w,
var=v,
tables=race gender /levelmiss,
output=tables=table6b);
run;
*-----------------------*
* Exclmissmarg Table *
*-----------------------*;
%smsub(data=sample, statistics=nobs mean semean /noprint,
weight=w,
var=v,
tables=race gender /exclmissmarg,
output=tables=table6c);
run;
options ls=75;
proc print data=summary;
title'--- Example 6: Data Summary ---';
run;
proc print data=table6a noobs;
var tableindex variable race gender nobs mean semean;
title'--- Default Table ---';
run;
proc print data=table6b noobs;
var tableindex variable race gender nobs mean semean;
title'--- Levelmiss Table ---';
run;
proc print data=table6c noobs;
var tableindex variable race gender nobs mean semean;
title'--- Exclmissmarg Table ---';
run;
--- Example 6: Data Summary ---
Obs NSTRATA NPSU DDF WSUM NOBSREAD
1 1 14 13 500 14
--- Default Table ---
TABLEINDEX VARIABLE RACE GENDER NOBS MEAN SEMEAN
1 V 0 . 12 0.58333 0.14769
1 V 1 . 4 0.50000 0.25944
1 V 2 . 4 0.75000 0.22468
2 V . 0 12 0.58333 0.14769
2 V . 1 5 0.80000 0.18564
2 V . 2 5 0.40000 0.22736
--- Levelmiss Table ---
TABLEINDEX VARIABLE RACE GENDER NOBS MEAN SEMEAN
1 V 0 . 12 0.58333 0.14769
1 V 1 . 4 0.50000 0.25944
1 V 2 . 4 0.75000 0.22468
1 V 9.9E16 . 4 0.50000 0.25944
2 V . 0 12 0.58333 0.14769
2 V . 1 5 0.80000 0.18564
2 V . 2 5 0.40000 0.22736
2 V . 9.9E16 2 0.50000 0.36690
--- Exclmissmarg Table ---
TABLEINDEX VARIABLE RACE GENDER NOBS MEAN SEMEAN
1 V 0 . 8 0.625 0.17762
1 V 1 . 4 0.500 0.25944
1 V 2 . 4 0.750 0.22468
2 V . 0 10 0.600 0.16077
2 V . 1 5 0.800 0.18564
2 V . 2 5 0.400 0.22736
Right-click on the link below and select Save to save the %SMSUB macro definition to a file. It is recommended that you name the file smsub.sas.
| Type: | Sample |
| Topic: | SAS Reference ==> Procedures ==> SURVEYMEANS; Analytics ==> Survey Sampling and Analysis; Analytics ==> Descriptive Statistics |
| Date Modified: | 2007-08-14 03:03:12 |
| Date Created: | 2005-01-17 12:13:02 |
| Product Family | Product | Host | Starting SAS Release | Ending SAS Release |
| SAS System | SAS/STAT | All | 8 TS M0 | n/a |