This chapter introduces the SAS/STAT procedures for survey sampling and describes how you can use these procedures to analyze survey data.
Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Because characteristics vary among the items in a population, researchers apply probability-based scientific designs to select the sample. This reduces the risk of a distorted view of the population and enables statistically valid inferences to be made from the sample. For more information about statistical sampling and analysis of complex survey data, see Lohr (2010), Kalton (1983), Cochran (1977), and Kish (1965).
To select probability-based random samples from a study population, you can use the SURVEYSELECT procedure, which provides a variety of methods of probability sampling. To perform imputation of missing values in survey data, you can use the SURVEYIMPUTE procedure, which provides methods of donor-based imputation. To analyze sample survey data, you can use the SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures, which incorporate the sample design into the analyses.
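As a rough illustration of what probability-based selection means, the following Python sketch draws a simple random sample without replacement and computes the sampling weight N/n, which is the inverse of each unit's selection probability. It is not SAS code and does not represent how the SURVEYSELECT procedure is implemented. The sampling frame, the sample size, the seed, and the function name are all hypothetical.

```python
import random


def srs_without_replacement(frame, n, seed=0):
    """Select n distinct units from frame, each with probability n/N.

    Returns the selected units and the common sampling weight N/n.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)          # fixed seed makes the draw reproducible
    sample = rng.sample(frame, n)      # sampling without replacement
    weight = len(frame) / n            # inverse of the selection probability
    return sample, weight


# Hypothetical sampling frame: unit IDs 1 through 1000
frame = list(range(1, 1001))
sample, weight = srs_without_replacement(frame, n=50, seed=42)

# Each sampled unit "represents" weight units of the population,
# so the weights of the sampled units add up to the frame size.
estimated_population_size = len(sample) * weight
```

Because every unit has the same known selection probability, the weights sum to the population size, which is the property that the survey analysis procedures rely on when they estimate totals.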
Many SAS/STAT procedures, such as the MEANS, FREQ, GLM, LOGISTIC, and PHREG procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.
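To make the consequence concrete, the following Python sketch compares two analyses of the same small stratified sample. The first ignores the design and treats the data as a simple random sample from an infinite population. The second uses the standard textbook estimator for stratified simple random sampling, with stratum weights and a finite population correction. The data, stratum names, and function names are made up for illustration, and the code is not a description of any SAS procedure's internals.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical stratified sample: stratum -> (population size N_h, sampled values).
# The small rural stratum is heavily oversampled relative to its size.
strata = {
    "urban": (9000, [10.0, 12.0, 11.0, 13.0]),
    "rural": (1000, [30.0, 34.0, 32.0, 36.0]),
}


def naive_mean_se(strata):
    """Pool all values and ignore strata, weights, and the finite population."""
    values = [y for _, ys in strata.values() for y in ys]
    return mean(values), sqrt(variance(values) / len(values))


def stratified_mean_se(strata):
    """Stratified SRS estimator of the population mean and its standard error."""
    N = sum(N_h for N_h, _ in strata.values())
    estimate, var = 0.0, 0.0
    for N_h, ys in strata.values():
        n_h = len(ys)
        W_h = N_h / N                       # stratum share of the population
        fpc = 1.0 - n_h / N_h               # finite population correction
        estimate += W_h * mean(ys)
        var += W_h ** 2 * fpc * variance(ys) / n_h
    return estimate, sqrt(var)


naive_mean, naive_se = naive_mean_se(strata)
design_mean, design_se = stratified_mean_se(strata)
print(f"naive:        mean={naive_mean:.2f}  se={naive_se:.3f}")
print(f"design-based: mean={design_mean:.2f}  se={design_se:.3f}")
```

In this made-up sample the naive mean is 22.25, whereas the design-based mean is 13.65, because the oversampled rural stratum makes up only 10% of the population. The two standard errors differ as well, so both the point estimate and the inference change when the design is taken into account.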
The SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures properly analyze complex survey data by taking into account the sample design. You can use these procedures for multistage or single-stage designs, with or without stratification, and with or without unequal weighting. The survey analysis procedures provide a choice of variance estimation methods, which include Taylor series linearization, balanced repeated replication (BRR), and the jackknife.
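The idea behind replication-based variance estimation can be sketched in a few lines of Python. The following example implements a delete-one jackknife for a weighted mean under simplifying assumptions that are not taken from this chapter: a single-stage design, no stratification, and no finite population correction. Each replicate drops one unit, the estimate is recomputed, and the squared deviations from the full-sample estimate are summed and scaled by (n - 1)/n. The data and function names are hypothetical, and the survey analysis procedures handle strata and clusters in ways that this sketch does not cover.

```python
from statistics import variance


def weighted_mean(ys, ws):
    """Weighted mean: sum of w*y divided by the sum of the weights."""
    return sum(w * y for y, w in zip(ys, ws)) / sum(ws)


def jackknife_variance(ys, ws):
    """Delete-one jackknife variance of the weighted mean (unstratified sketch)."""
    n = len(ys)
    full = weighted_mean(ys, ws)
    total = 0.0
    for i in range(n):
        # Replicate i omits unit i. Rescaling the remaining weights by
        # n/(n-1) would cancel in a weighted mean, so it is skipped here.
        ys_i = ys[:i] + ys[i + 1:]
        ws_i = ws[:i] + ws[i + 1:]
        total += (weighted_mean(ys_i, ws_i) - full) ** 2
    return (n - 1) / n * total


# Hypothetical sample values with equal sampling weights
ys = [4.0, 7.0, 5.0, 9.0, 10.0]
ws = [20.0] * len(ys)

jk_var = jackknife_variance(ys, ws)
srs_var = variance(ys) / len(ys)   # textbook variance of a sample mean
```

With equal weights, the jackknife variance of the mean reduces algebraically to the familiar s^2/n, so jk_var and srs_var agree up to floating-point rounding. That agreement is a convenient sanity check before applying the same replication idea to estimators that have no simple variance formula.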
Table 14.1 briefly describes the SAS/STAT sampling and analysis procedures.
Table 14.1: Survey Sampling and Analysis Procedures
SURVEYSELECT
   Selection Methods
      Simple random sampling (without replacement)
      Unrestricted random sampling (with replacement)
      Systematic
      Sequential
      Bernoulli
      Poisson
      Probability proportional to size (PPS) sampling, with and without replacement
      PPS systematic
      PPS for two units per stratum
      PPS sequential with minimum replacement
   Allocation Methods
      Proportional
      Optimal
      Neyman
   Sampling Tools
      Stratified sampling
      Cluster sampling
      Replicated sampling
      Serpentine sorting
      Random assignment
SURVEYIMPUTE
   Imputation Methods
      Single and multiple hot-deck
      Approximate Bayesian bootstrap
      Fully efficient fractional
SURVEYMEANS
   Statistics
      Means and totals
      Proportions
      Quantiles
      Geometric means
      Ratios
      Standard errors
      Confidence limits
   Analyses
      Hypothesis tests
      Domain analysis
      Poststratification
   Graphics
      Histograms
      Box plots
      Summary panel plots
      Domain box plots
SURVEYFREQ
   Tables
      One-way frequency tables
      Two-way and multiway crosstabulation tables
      Estimates of population totals and proportions
      Standard errors
      Confidence limits
   Analyses
      Tests of goodness of fit
      Tests of independence
      Risks and risk differences
      Odds ratios and relative risks
      Kappa coefficients
   Graphics
      Weighted frequency and percent plots
      Mosaic plots
      Odds ratio, relative risk, and risk difference plots
      Kappa plots
SURVEYREG
   Analyses
      Linear regression model fitting
      Regression coefficients
      Covariance matrices
      Confidence limits
      Hypothesis tests
      Estimable functions
      Contrasts
      Least squares means (LS-means) of effects
      Custom hypothesis tests among LS-means
      Regression with constructed effects
      Predicted values and residuals
      Domain analysis
   Graphics
      Fit plots
SURVEYLOGISTIC
   Analyses
      Cumulative logit regression model fitting
      Logit, probit, and complementary log-log link functions
      Generalized logit regression model fitting
      Regression coefficients
      Covariance matrices
      Confidence limits
      Hypothesis tests
      Odds ratios
      Estimable functions
      Contrasts
      Least squares means (LS-means) of effects
      Custom hypothesis tests among LS-means
      Regression with constructed effects
      Model diagnostics
      Domain analysis
SURVEYPHREG
   Analyses
      Proportional hazards regression model fitting
      Breslow and Efron likelihoods
      Regression coefficients
      Covariance matrices
      Confidence limits
      Hypothesis tests
      Hazard ratios
      Contrasts
      Predicted values and standard errors
      Martingale, Schoenfeld, score, and deviance residuals
      Domain analysis
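Table 14.1 lists proportional and Neyman allocation among the allocation methods. As a minimal sketch of what these two rules compute, the following Python code splits a total sample size across strata in proportion to the stratum sizes N_h (proportional allocation) or in proportion to N_h times the stratum standard deviation S_h (Neyman allocation). The stratum sizes, standard deviations, function name, and the largest-remainder rounding rule are assumptions made for this illustration. The rounding in particular is not claimed to match what the SURVEYSELECT procedure does.

```python
def allocate(total_n, sizes, std_devs=None):
    """Allocate total_n sample units across strata.

    Without std_devs: proportional allocation, shares proportional to N_h.
    With std_devs:    Neyman allocation, shares proportional to N_h * S_h.
    Rounding uses the largest-remainder rule (an assumption of this sketch).
    """
    if std_devs is None:
        measures = [float(N_h) for N_h in sizes]
    else:
        measures = [N_h * S_h for N_h, S_h in zip(sizes, std_devs)]
    total = sum(measures)
    exact = [total_n * m / total for m in measures]   # unrounded allocation
    alloc = [int(x) for x in exact]                   # round down first
    leftover = total_n - sum(alloc)
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(range(len(exact)),
                          key=lambda h: exact[h] - alloc[h],
                          reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical strata: population sizes and standard deviations
sizes = [5000, 3000, 2000]
std_devs = [10.0, 20.0, 40.0]

proportional = allocate(100, sizes)            # [50, 30, 20]
neyman = allocate(100, sizes, std_devs)        # [26, 32, 42]
```

In this made-up example, Neyman allocation moves sample units away from the large but homogeneous first stratum and toward the small but highly variable third stratum, which is the behavior that distinguishes it from proportional allocation.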