Survey Analysis
Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Due to variability among items, researchers
apply scientific probability-based designs to select the sample. This reduces the risk of a distorted view of the population and enables statistically valid inferences to be made from the sample.
The survey analysis procedures in SAS/STAT software properly analyze complex survey data by taking into account the sample design. These procedures can be used for multistage or single-stage designs,
with or without stratification, and with or without unequal weighting.
The SAS/STAT survey analysis procedures include the following:
 SURVEYMEANS Procedure: Means, totals, proportions, quantiles, and ratios from complex multistage survey designs
 SURVEYFREQ Procedure: One-way to n-way frequency and crosstabulation tables from complex multistage survey designs
 SURVEYIMPUTE Procedure: Imputes missing values of an item in a data set by replacing them with observed values from the same item, and computes replicate weights (such as jackknife weights) that account for the imputation
 SURVEYLOGISTIC Procedure: Models with binary, ordinal, or nominal dependent variables for data sampled from complex survey designs
 SURVEYPHREG Procedure: Regression analysis based on the Cox proportional hazards model for sample survey data
 SURVEYREG Procedure: Linear regression analysis for data sampled from complex survey designs
 SURVEYSELECT Procedure: Selects simple random samples or selects samples according to a complex multistage survey design
SURVEYMEANS Procedure
The SURVEYMEANS procedure estimates characteristics of a survey population by using statistics computed from a survey sample.
It enables you to estimate statistics such as means, totals, proportions, quantiles, geometric means, and ratios.
The following are highlights of the SURVEYMEANS procedure's features:
 provides domain analysis, which computes estimates for subpopulations or domains
 estimates variances and confidence limits and performs t tests for these statistics
 computes variances of the parameters by using the following methods:
 Taylor series (linearization)
 balanced repeated replication (BRR)
 delete-1 jackknife
 enables you to employ Fay's method with BRR
 performs poststratification
 enables you to input or output a SAS data set containing a Hadamard matrix for BRR
 enables you to import or export SAS data sets containing replicate weights for BRR or jackknife methods
 creates a SAS data set that contains the jackknife coefficients
 performs BY group processing, which enables you to obtain separate analyses on grouped observations (distinct from subpopulation analysis)
 creates a SAS data set that corresponds to any output table
 automatically creates graphs by using ODS Graphics
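
The delete-1 jackknife named in the list above can be illustrated outside SAS. The following Python sketch is not PROC SURVEYMEANS code; it assumes a stratified design in which each replicate drops one primary sampling unit (PSU), the remaining PSUs in that stratum are reweighted by n_h/(n_h - 1), and each replicate receives the jackknife coefficient (n_h - 1)/n_h. The function names and the record layout are hypothetical.

```python
from collections import defaultdict


def weighted_mean(rows):
    """Weighted mean of y over rows of (stratum, psu, weight, y)."""
    total_weight = sum(w for _, _, w, _ in rows)
    return sum(w * y for _, _, w, y in rows) / total_weight


def jackknife_mean_variance(rows):
    """Return (estimate, variance) from a delete-1 jackknife over PSUs."""
    # Collect the distinct PSUs in each stratum.
    psus = defaultdict(set)
    for stratum, psu, _, _ in rows:
        psus[stratum].add(psu)

    estimate = weighted_mean(rows)
    variance = 0.0
    for stratum, stratum_psus in psus.items():
        n_h = len(stratum_psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        coefficient = (n_h - 1) / n_h   # jackknife coefficient (assumed form)
        adjustment = n_h / (n_h - 1)    # reweights the PSUs that remain
        for dropped in stratum_psus:
            # Build the replicate: drop one PSU, reweight its stratum.
            replicate = [
                (s, p, w * adjustment if s == stratum else w, y)
                for s, p, w, y in rows
                if not (s == stratum and p == dropped)
            ]
            variance += coefficient * (weighted_mean(replicate) - estimate) ** 2
    return estimate, variance
```

For a single stratum of equally weighted one-unit PSUs, this reduces to the familiar s²/n variance of a sample mean, which is a convenient sanity check.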

For further details, see SURVEYMEANS Procedure.
SURVEYFREQ Procedure
The SURVEYFREQ procedure produces one-way to n-way frequency and crosstabulation tables from complex multistage
survey designs with stratification, clustering, and unequal weighting. The following are highlights of the SURVEYFREQ
procedure's features:
 produces tables of population totals, population proportions, and their standard errors
 computes confidence limits, coefficients of variation, and design effects
 provides a variety of options to customize the table display
 provides Rao-Scott chi-square goodness-of-fit tests, which are adjusted for the sample design, for one-way frequency tables
 produces simple and weighted kappa coefficients
 enables you to test a null hypothesis of equal proportions for a one-way frequency table or to specify custom null hypothesis proportions for the test
 provides design-adjusted tests of independence, or no association, between the row and column variables for two-way tables. These tests include the following:
 Rao-Scott chi-square test
 Rao-Scott likelihood ratio test
 Wald chi-square test
 Wald log-linear chi-square test
 computes estimates and confidence limits for risks (or row proportions), the risk difference, the odds ratio, and relative risks for 2x2 tables
 computes variances of the estimated parameters by using the following methods:
 Taylor series (linearization)
 balanced repeated replication (BRR)
 delete-1 jackknife
 enables you to employ Fay's method with BRR
 enables you to input or output a SAS data set containing a Hadamard matrix for BRR
 enables you to import or export SAS data sets containing replicate weights for BRR or jackknife methods
 creates a SAS data set that contains the jackknife coefficients
 provides analysis for subpopulations, or domains, in addition to analysis for the entire study population
 calculates design effects for the overall proportion estimates in frequency and crosstabulation tables
 performs BY group processing, which enables you to obtain separate analyses on grouped observations (distinct from subpopulation analysis)
 creates a SAS data set that corresponds to any output table
 automatically creates graphs by using ODS Graphics
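
To make the table estimates above concrete, here is a minimal Python sketch (not PROC SURVEYFREQ syntax) of how sampling weights turn into estimated population totals, proportions, and a 2x2 odds ratio. It covers point estimates only; the design-based standard errors and Rao-Scott adjustments that the procedure provides are not attempted. All function names and the (row, column, weight) layout are assumptions made for illustration.

```python
from collections import defaultdict


def weighted_crosstab(rows):
    """Estimated population total per (row, column) cell.

    rows is an iterable of (row_level, col_level, sampling_weight).
    """
    totals = defaultdict(float)
    for row_level, col_level, weight in rows:
        totals[(row_level, col_level)] += weight
    return dict(totals)


def cell_proportions(totals):
    """Estimated population proportion for each cell of the table."""
    grand_total = sum(totals.values())
    return {cell: total / grand_total for cell, total in totals.items()}


def odds_ratio(totals, row_levels, col_levels):
    """Odds ratio for a 2x2 table built from weighted cell totals."""
    (r1, r2), (c1, c2) = row_levels, col_levels
    return (totals[(r1, c1)] * totals[(r2, c2)]) / (
        totals[(r1, c2)] * totals[(r2, c1)]
    )
```

Each cell total is simply the sum of the weights of the sampled units that fall in that cell, so a unit with weight 40 stands for 40 population members.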

For further details, see SURVEYFREQ Procedure.
SURVEYIMPUTE Procedure
The SURVEYIMPUTE procedure imputes missing values of an item in a data set by replacing them with observed
values from the same item. The principles by which the imputation is performed are particularly useful for survey data.
The following are highlights of the SURVEYIMPUTE procedure's features:
 provides fully efficient fractional hot-deck imputation
 provides traditional hot-deck imputation with the following donor selection methods:
 approximate Bayesian bootstrap
 simple random samples without replacement
 simple random samples with replacement
 probability proportional to respondent weights with replacement
 computes imputation-adjusted replicate weights
 computes imputation-adjusted balanced repeated replication (BRR) weights
 computes imputation-adjusted jackknife weights
 provides a CELLS statement, which names the variables that identify the imputation cells
 imputes variables jointly or independently for the fully efficient fractional imputation method
 creates a SAS data set that contains the imputed data
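
As an illustration of traditional hot-deck imputation within imputation cells, the Python sketch below replaces each missing value with a value drawn from a respondent (donor) in the same cell, using simple random sampling with replacement. It is not PROC SURVEYIMPUTE code and does not compute replicate weights; the function name, the dictionary record layout, and the added "imputed" flag are hypothetical.

```python
import random


def hot_deck_impute(records, item, cell_vars, seed=0):
    """Return copies of records with missing `item` values imputed.

    records   : list of dicts; a missing value is represented by None
    item      : name of the variable to impute
    cell_vars : names of the variables that define the imputation cells
    """
    rng = random.Random(seed)

    # Pool the observed values (donors) by imputation cell.
    donors = {}
    for rec in records:
        if rec[item] is not None:
            cell = tuple(rec[v] for v in cell_vars)
            donors.setdefault(cell, []).append(rec[item])

    imputed = []
    for rec in records:
        new_rec = dict(rec)          # leave the input records unchanged
        new_rec["imputed"] = False
        if rec[item] is None:
            cell = tuple(rec[v] for v in cell_vars)
            if cell in donors:       # cells without donors stay missing
                new_rec[item] = rng.choice(donors[cell])
                new_rec["imputed"] = True
        imputed.append(new_rec)
    return imputed
```

Because `rng.choice` draws independently for every recipient, the same donor can be used more than once, which is what "with replacement" means here.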

For further details, see SURVEYIMPUTE Procedure.
SURVEYLOGISTIC Procedure
The SURVEYLOGISTIC procedure fits linear logistic regression models for discrete response survey data by the method of
maximum likelihood. For statistical inferences, PROC SURVEYLOGISTIC incorporates complex survey sample designs, including
designs with stratification, clustering, and unequal weighting. The following are highlights of the SURVEYLOGISTIC procedure's features:
 fits models with binary, ordinal, or nominal dependent variables with the following link functions:
 logit
 probit
 complementary log-log
 generalized logit
 computes variances of the regression parameters and odds ratios by using the following methods:
 Taylor series (linearization)
 balanced repeated replication (BRR)
 delete-1 jackknife
 enables you to employ Fay's method with BRR
 enables you to input or output a SAS data set containing a Hadamard matrix for BRR
 enables you to import or export SAS data sets containing replicate weights for BRR or jackknife methods
 creates a SAS data set that contains the jackknife coefficients
 provides analysis for subpopulations, or domains, in addition to analysis for the entire study population
 enables you to control the ordering of the response categories
 computes a generalized R-square measure for the fitted model
 tests linear hypotheses about the regression parameters
 enables you to specify units of change for continuous explanatory variables so that customized odds ratios can be estimated
 performs BY group processing, which enables you to obtain separate analyses on grouped observations (distinct from subpopulation analysis)
 creates a data set that contains the variables in the input data set, the estimated linear predictors and their standard error estimates, the estimates
of the cumulative or individual response probabilities, and the confidence limits for the cumulative probabilities
 creates a SAS data set that corresponds to any output table
 automatically creates graphs by using ODS Graphics
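
The link functions and customized odds ratios listed above have simple closed forms. The Python sketch below is not PROC SURVEYLOGISTIC code; it shows the standard inverse link functions that map a linear predictor to a probability, plus the odds ratio for a change of several units in a continuous explanatory variable. The function names are hypothetical, and the generalized logit version assumes the last category is the reference with a linear predictor of zero.

```python
import math


def inverse_logit(eta):
    """Probability under the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_probit(eta):
    """Probability under the probit link (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inverse_cloglog(eta):
    """Probability under the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))


def generalized_logit_probabilities(etas):
    """Response probabilities for a nominal outcome.

    etas holds the linear predictors of the non-reference categories;
    the reference category (returned last) is assumed to have eta = 0.
    """
    exps = [math.exp(eta) for eta in etas] + [1.0]
    total = sum(exps)
    return [value / total for value in exps]


def odds_ratio_for_change(beta, units=1.0):
    """Odds ratio for a change of `units` in a continuous predictor."""
    return math.exp(units * beta)
```

For example, with a logit coefficient of 0.05 per year of age, `odds_ratio_for_change(0.05, units=10)` gives the odds ratio for a ten-year difference rather than a one-year difference.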

For further details, see SURVEYLOGISTIC Procedure.
SURVEYPHREG Procedure
The SURVEYPHREG procedure performs regression analysis based on the Cox proportional hazards model for sample survey data.
Cox's semiparametric model is widely used in the analysis of survival data to estimate hazard rates when adequate explanatory variables are available.
The following are highlights of the SURVEYPHREG procedure's features:
 computes hazard ratio estimates
 computes variances of the regression parameters by using the following methods:
 Taylor series (linearization)
 balanced repeated replication (BRR)
 delete-1 jackknife
 produces the following observation-level output statistics:
 predicted values and their standard errors
 martingale residuals
 Schoenfeld residuals
 score residuals
 deviance residuals
 enables you to employ Fay's method with BRR
 enables you to input or output a SAS data set containing a Hadamard matrix for BRR
 enables you to import or export SAS data sets containing replicate weights for BRR or jackknife methods
 provides analysis for subpopulations, or domains, in addition to analysis for the entire study population
 supports programming statements that enable you to include time-dependent covariates in the model
 performs BY group processing, which enables you to obtain separate analyses on grouped observations (distinct from subpopulation analysis)
 enables you to test linear hypotheses about the regression parameters
 enables you to estimate a linear function of the regression parameters
 creates a SAS data set that contains the estimated linear predictors and their standard error estimates, the residuals from the linear regression, and the
confidence limits for the predictors
 creates a SAS data set that contains the jackknife coefficients
 saves the context and results in an item store that can be processed with the PLM procedure
 creates a SAS data set that corresponds to any output table
 automatically creates graphs by using ODS Graphics

For further details, see SURVEYPHREG Procedure.
SURVEYREG Procedure
The SURVEYREG procedure performs regression analysis for sample survey data. This procedure can handle complex survey sample designs,
including designs with stratification, clustering, and unequal weighting. The procedure fits linear models for survey data and computes
regression coefficients and their variance-covariance matrix. The following are highlights of the SURVEYREG procedure's features:
 computes the regression coefficient estimators by generalized least squares
estimation using elementwise regression
 computes variances of the regression parameters by using the following methods:
 Taylor series (linearization)
 balanced repeated replication (BRR)
 delete-1 jackknife
 enables you to employ Fay's method with BRR
 enables you to input or output a SAS data set containing a Hadamard matrix for BRR
 enables you to import or export SAS data sets containing replicate weights for BRR or jackknife methods
 creates a SAS data set that contains the jackknife coefficients
 provides analysis for subpopulations, or domains, in addition to analysis for the entire study population
 calculates design effects for the regression coefficients
 enables you to test linear hypotheses about the regression parameters
 enables you to estimate a linear function of the regression parameters
 performs BY group processing, which enables you to obtain separate analyses on grouped observations (distinct from subpopulation analysis)
 creates a SAS data set that contains the estimated linear predictors and their standard error estimates,
the residuals from the linear regression, and the confidence limits for the predictors
 creates a SAS data set that corresponds to any output table
 automatically creates graphs by using ODS Graphics
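
As a rough illustration of how sampling weights enter the coefficient estimates, the Python sketch below fits a weighted least squares line with one explanatory variable. It is not PROC SURVEYREG code: it assumes the simplest case, in which each observation contributes in proportion to its sampling weight, and it produces point estimates only, with no design-based variances. The function name is hypothetical.

```python
def weighted_simple_regression(x, y, weights):
    """Return (intercept, slope) of the weighted least squares line."""
    total_weight = sum(weights)
    # Weighted means of the explanatory and response variables.
    x_bar = sum(w * xi for xi, w in zip(x, weights)) / total_weight
    y_bar = sum(w * yi for yi, w in zip(y, weights)) / total_weight

    # Weighted cross-product and sum of squares about the means.
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar)
               for xi, yi, w in zip(x, y, weights))
    s_xx = sum(w * (xi - x_bar) ** 2 for xi, w in zip(x, weights))
    if s_xx == 0:
        raise ValueError("x has no weighted variation; slope is undefined")

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

When the data lie exactly on a line, the weights do not change the fitted coefficients; they matter when the relationship differs across units that represent different numbers of population members.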

For further details, see SURVEYREG Procedure.
SURVEYSELECT Procedure
The SURVEYSELECT procedure provides a variety of methods for selecting probability-based random samples. The procedure can
select a simple random sample or can sample according to a complex multistage sample design that includes stratification,
clustering, and unequal probabilities of selection. With probability sampling, each unit in the survey population has a known,
positive probability of selection. This property of probability sampling avoids selection bias and enables you to use statistical
theory to make valid inferences from the sample to the survey population. The following are highlights of the SURVEYSELECT procedure's
features:
 selects the sample and produces an output data set that contains the selected units, their selection probabilities, and their sampling weights
 provides methods for both equal probability sampling and probability proportional to size (PPS) sampling
 provides the following equal probability sampling methods:
 simple random sampling
 unrestricted random sampling (with replacement)
 systematic random sampling
 sequential random sampling
 Bernoulli sampling
 provides the following unequal probability sampling methods, which are based on probability proportional to size (PPS):
 PPS sampling without replacement
 PPS sampling with replacement
 PPS systematic sampling
 PPS algorithms for selecting two units per stratum
 sequential PPS sampling with minimum replacement
 performs stratified sampling by selecting samples independently within the specified strata, or nonoverlapping subgroups of the survey population
 enables you to sort by control variables within strata for the additional control
of implicit stratification when using a systematic or sequential selection method
 provides survey design methods to allocate the total sample size among the strata
 provides the following allocation methods: proportional, Neyman, and optimal allocation
 provides replicated sampling, where the total sample is composed of a
set of replicates, and each replicate is selected in the same way
 enables you to randomly assign the observations in the input data set to groups
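
The allocation and selection features above can be sketched in a few lines of Python. This is not PROC SURVEYSELECT code: it assumes the textbook forms of proportional allocation (n_h proportional to N_h) and Neyman allocation (n_h proportional to N_h times S_h), returns unrounded allocations, and then draws a simple random sample without replacement in each stratum. The function names and output field names are hypothetical.

```python
import random


def proportional_allocation(n, stratum_sizes):
    """Allocate total sample size n in proportion to stratum size N_h."""
    total = sum(stratum_sizes.values())
    return {h: n * size / total for h, size in stratum_sizes.items()}


def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Allocate n in proportion to N_h * S_h (size times std. deviation)."""
    products = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(products.values())
    return {h: n * value / total for h, value in products.items()}


def stratified_srs(frame, sample_sizes, seed=0):
    """Select a simple random sample without replacement in each stratum.

    frame        : dict mapping stratum -> list of unit identifiers
    sample_sizes : dict mapping stratum -> integer sample size n_h
    """
    rng = random.Random(seed)
    selected = []
    for stratum, units in frame.items():
        n_h = sample_sizes[stratum]
        probability = n_h / len(units)        # equal within the stratum
        for unit in rng.sample(units, n_h):
            selected.append({
                "stratum": stratum,
                "unit": unit,
                "selection_prob": probability,
                "sampling_weight": 1.0 / probability,
            })
    return selected
```

The sampling weight is the inverse of the selection probability, so within each stratum the weights of the selected units sum to the stratum's population size.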

For further details, see SURVEYSELECT Procedure.