Introduction to Survey Sampling and Analysis Procedures 
This chapter introduces the SAS/STAT procedures for survey sampling and describes how you can use these procedures to analyze survey data.
Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Due to variability among items, researchers apply scientific probability-based designs to select the sample. This reduces the risk of a distorted view of the population and enables statistically valid inferences to be made from the sample. See Lohr (1999), Kalton (1983), Cochran (1977), and Kish (1965) for more information about statistical sampling and analysis of complex survey data. To select probability-based random samples from a study population, you can use the SURVEYSELECT procedure, which provides a variety of methods for probability sampling. To analyze sample survey data, you can use the SURVEYMEANS, SURVEYFREQ, SURVEYREG, and SURVEYLOGISTIC procedures, which incorporate the sample design into the analyses.
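As a minimal sketch of probability-based selection, the following statements draw a stratified simple random sample with PROC SURVEYSELECT. The data set Frame, its variables (Region, HouseholdID, Income), the sample size, and the seed are hypothetical and exist only to make the example self-contained; they are not part of any real study.

```sas
/* Hypothetical sampling frame: 3 regions with 200 households each. */
/* All values are simulated for illustration only.                  */
data Frame;
   call streaminit(2009);
   do Region = 1 to 3;
      do HouseholdID = 1 to 200;
         Income = round(30000 + 5000*Region + 8000*rand('normal'));
         output;
      end;
   end;
run;

/* Stratified simple random sample: 20 households from each region. */
/* The frame is already ordered by Region, as STRATA requires.      */
proc surveyselect data=Frame method=srs n=20 seed=2009 out=Sample;
   strata Region;
run;
```

For a stratified design such as this one, the OUT= data set should carry the selection probabilities and sampling weights along with the selected units, so it can be passed to the survey analysis procedures.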
Many SAS/STAT procedures, such as the MEANS, FREQ, GLM, and LOGISTIC procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.
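The difference can be illustrated by running the same variable through PROC MEANS and PROC SURVEYMEANS. This is a sketch under stated assumptions: the data sets Sample and StrataTotals, their variables, and every value in them are hypothetical, chosen so that each weight equals the stratum population size divided by the stratum sample size.

```sas
/* Hypothetical stratified sample; values are illustrative only. */
data Sample;
   input Region Income SamplingWeight;
   datalines;
1 31000 10
1 35500 10
1 29800 10
2 41200 25
2 39900 25
2 44100 25
3 52300 40
3 49700 40
3 55000 40
;
run;

/* Hypothetical stratum population sizes, used for the */
/* finite population correction.                       */
data StrataTotals;
   input Region _total_;
   datalines;
1 30
2 75
3 120
;
run;

/* Treats the rows as a simple random sample from an infinite population. */
proc means data=Sample mean stderr;
   var Income;
run;

/* Uses the strata, weights, and population totals from the design. */
proc surveymeans data=Sample total=StrataTotals mean stderr;
   strata Region;
   weight SamplingWeight;
   var Income;
run;
```

Because the regions are sampled at different rates, the two procedures should generally report different means and standard errors for Income; only the second analysis reflects how the sample was selected.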
The SURVEYMEANS, SURVEYFREQ, SURVEYREG, and SURVEYLOGISTIC procedures properly analyze complex survey data by taking into account the sample design. These procedures can be used for multistage or single-stage designs, with or without stratification, and with or without unequal weighting. The survey analysis procedures provide a choice of variance estimation methods, which include Taylor series linearization, balanced repeated replication (BRR), and the jackknife.
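The following sketch shows how the variance estimation method might be chosen with the VARMETHOD= option, using PROC SURVEYMEANS as the example. The data set ClusterSample and its variables are hypothetical. The design has two primary sampling units (PSUs) in each stratum because BRR requires exactly that structure.

```sas
/* Hypothetical two-stage sample: 2 strata, 2 PSUs per stratum. */
data ClusterSample;
   input Stratum PSU Score SamplingWeight;
   datalines;
1 1 72 50
1 1 68 50
1 2 80 50
1 2 77 50
2 1 61 80
2 1 65 80
2 2 70 80
2 2 66 80
;
run;

/* Taylor series linearization */
proc surveymeans data=ClusterSample varmethod=taylor mean stderr;
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   var Score;
run;

/* Balanced repeated replication (needs two PSUs per stratum) */
proc surveymeans data=ClusterSample varmethod=brr mean stderr;
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   var Score;
run;

/* Jackknife (deletes one PSU at a time) */
proc surveymeans data=ClusterSample varmethod=jackknife mean stderr;
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   var Score;
run;
```

The design statements (STRATA, CLUSTER, WEIGHT) stay the same across the three steps; only the VARMETHOD= value changes. The estimated mean should be identical in each run, while the standard errors can differ by method.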
Table 14.1 briefly describes the sampling and analysis procedures in SAS/STAT software.
PROC SURVEYSELECT
   Sampling Methods
      simple random sampling
      unrestricted random sampling (with replacement)
      systematic
      sequential
      probability proportional to size (PPS) sampling, with and without replacement
      PPS systematic
      PPS for two units per stratum
      PPS sequential with minimum replacement
   Allocation Methods
      proportional
      optimal
      Neyman

PROC SURVEYMEANS
   Statistics
      estimates of population means and totals
      estimates of population proportions
      estimates of population quantiles
      ratio estimates
      standard errors
      confidence limits
      hypothesis tests
      domain analysis

PROC SURVEYFREQ
   Tables
      one-way frequency tables
      two-way and multiway crosstabulation tables
      estimates of population totals and proportions
      standard errors
      confidence limits
   Analyses
      tests of goodness of fit
      tests of independence
      risks and risk differences
      odds ratios and relative risks

PROC SURVEYREG
   Analyses
      linear regression model fitting
      regression coefficients
      covariance matrices
      confidence limits
      hypothesis tests
      estimable functions
      contrasts
      predicted values and residuals
      domain analysis

PROC SURVEYLOGISTIC
   Analyses
      cumulative logit regression model fitting
      logit, probit, and complementary log-log link functions
      generalized logit regression model fitting
      regression coefficients
      covariance matrices
      confidence limits
      hypothesis tests
      odds ratios
      estimable functions
      contrasts
      model diagnostics
      domain analysis
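
The analysis procedures accept the same design statements, so one design description can be reused across them. The following sketch, which assumes a hypothetical data set Respondents with made-up values, requests a crosstabulation from PROC SURVEYFREQ and a linear regression from PROC SURVEYREG.

```sas
/* Hypothetical survey responses; values are illustrative only. */
data Respondents;
   input Stratum PSU SamplingWeight Sex $ Smoker $ Age Score;
   datalines;
1 1 50 F No  34 72
1 1 50 M Yes 51 64
1 2 50 F Yes 45 69
1 2 50 M No  29 80
2 1 80 F No  62 61
2 1 80 M Yes 38 70
2 2 80 F Yes 57 63
2 2 80 M No  41 74
;
run;

/* Weighted one-way and two-way tables with design-based standard errors */
proc surveyfreq data=Respondents;
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   tables Smoker Sex*Smoker;
run;

/* Linear regression of Score on Age under the same design */
proc surveyreg data=Respondents;
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   model Score = Age;
run;
```

PROC SURVEYLOGISTIC takes the same STRATA, CLUSTER, and WEIGHT statements; it is omitted here only because a data set this small is a poor basis for a logistic model.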
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.