Introduction to Survey Sampling and Analysis Procedures
This chapter introduces the SAS/STAT procedures for survey sampling and describes how you can use these procedures to analyze survey data.
Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Because items in the population vary, researchers apply scientific, probability-based designs to select the sample. This reduces the risk of a distorted view of the population and enables statistically valid inferences to be made from the sample. See Lohr (1999), Kalton (1983), Cochran (1977), and Kish (1965) for more information about statistical sampling and analysis of complex survey data.

To select probability-based random samples from a study population, you can use the SURVEYSELECT procedure, which provides a variety of methods for probability sampling. To analyze sample survey data, you can use the SURVEYMEANS, SURVEYFREQ, SURVEYREG, and SURVEYLOGISTIC procedures, which incorporate the sample design into the analyses.
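As an illustration of what probability-based selection means, the following Python sketch draws a simple random sample without replacement and a systematic sample from a list frame. It is not how PROC SURVEYSELECT is implemented. The function names `simple_random_sample` and `systematic_sample` are hypothetical, and the systematic version assumes that the population size N is a multiple of the sample size n, so that every unit has the same selection probability n/N.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Select n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit after a random start, with k = N // n.

    Assumption of this sketch: N is a multiple of n, so each unit
    has selection probability n/N.
    """
    N = len(frame)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch requires N to be a positive multiple of n")
    k = N // n                                 # sampling interval
    start = random.Random(seed).randrange(k)   # random start in 0..k-1
    return [frame[start + i * k] for i in range(n)]

# Usage: a frame of 100 hypothetical unit IDs
frame = list(range(100))
print(simple_random_sample(frame, 10, seed=1))
print(systematic_sample(frame, 10, seed=1))
```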
Many SAS/STAT procedures, such as the MEANS, FREQ, GLM, and LOGISTIC procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.
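The following Python sketch makes two of these points concrete under simplified assumptions. First, for simple random sampling without replacement from a finite population of N units, the variance of the sample mean includes the finite population correction `1 - n/N`, which an infinite-population formula omits. Second, when units are selected with unequal probabilities, the unweighted sample mean can be far from the design-weighted estimate. The helper names are hypothetical and are not part of any SAS procedure.

```python
import statistics

def var_mean_infinite(sample):
    """Variance of the sample mean assuming independent draws from an infinite population."""
    return statistics.variance(sample) / len(sample)

def var_mean_srswor(sample, N):
    """Variance of the sample mean under simple random sampling without
    replacement from a finite population of N units (includes the FPC)."""
    n = len(sample)
    fpc = 1 - n / N          # finite population correction
    return fpc * statistics.variance(sample) / n

def weighted_mean(values, weights):
    """Design-weighted mean; each weight is the inverse of the unit's selection probability."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(var_mean_infinite(sample))        # ignores that the population is finite
print(var_mean_srswor(sample, N=16))    # half of the population was sampled

# Unequal selection probabilities: the first two units were selected with
# probability 1, the last two with probability 1/9 (hypothetical values).
values, weights = [10, 10, 0, 0], [1, 1, 9, 9]
print(statistics.mean(values), weighted_mean(values, weights))
```

With half of the 16-unit population sampled, the finite population correction halves the variance, and the unweighted mean of 5 overstates the weighted estimate of 1 because the units with y = 10 were oversampled.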
The SURVEYMEANS, SURVEYFREQ, SURVEYREG, and SURVEYLOGISTIC procedures properly analyze complex survey data by taking into account the sample design. These procedures can be used for multistage or single-stage designs, with or without stratification, and with or without unequal weighting. The survey analysis procedures provide a choice of variance estimation methods, which include Taylor series linearization, balanced repeated replication (BRR), and the jackknife.
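To show the mechanics of two of these variance estimation methods, the following Python sketch estimates a population mean as the ratio sum(w*y)/sum(w) and computes its variance by Taylor series linearization and by a delete-one-PSU jackknife. It assumes a stratified design in which primary sampling units (PSUs) are treated as selected with replacement within strata (no finite population correction) and in which every stratum contains at least two PSUs. The record layout `(stratum, psu, weight, y)` and all function names are hypothetical; consult the individual procedure chapters for the exact formulas and options that SAS uses. BRR is omitted because it requires a design with exactly two PSUs per stratum and a Hadamard matrix.

```python
from collections import defaultdict

def weighted_mean(records):
    """Ratio estimate of the population mean: sum(w*y) / sum(w)."""
    num = sum(w * y for _, _, w, y in records)
    den = sum(w for _, _, w, _ in records)
    return num / den

def group_psus(records):
    """Map stratum -> {psu: [records]}; PSU IDs are nested within strata."""
    strata = defaultdict(lambda: defaultdict(list))
    for rec in records:
        strata[rec[0]][rec[1]].append(rec)
    return strata

def taylor_variance(records):
    """Linearization variance of the weighted mean, treating PSUs as
    selected with replacement within strata (no FPC)."""
    mean = weighted_mean(records)
    total_w = sum(w for _, _, w, _ in records)
    var = 0.0
    for psus in group_psus(records).values():
        n_h = len(psus)
        # PSU totals of the linearized residuals w * (y - mean) / sum(w)
        z = [sum(w * (y - mean) for _, _, w, y in recs) / total_w
             for recs in psus.values()]
        z_bar = sum(z) / n_h
        var += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return var

def jackknife_variance(records):
    """Delete-one-PSU jackknife variance of the weighted mean."""
    full = weighted_mean(records)
    var = 0.0
    for h, psus in group_psus(records).items():
        n_h = len(psus)
        for dropped in psus:
            # Drop one PSU and inflate the weights of the others in stratum h
            replicate = []
            for s, p, w, y in records:
                if s != h:
                    replicate.append((s, p, w, y))
                elif p != dropped:
                    replicate.append((s, p, w * n_h / (n_h - 1), y))
            var += (n_h - 1) / n_h * (weighted_mean(replicate) - full) ** 2
    return var

# Usage with hypothetical data: (stratum, psu, weight, y)
sample = [
    (1, "a", 10.0, 3.0), (1, "a", 10.0, 5.0), (1, "b", 12.0, 4.0), (1, "c", 8.0, 6.0),
    (2, "a", 20.0, 7.0), (2, "b", 25.0, 9.0), (2, "b", 25.0, 8.0),
]
print(weighted_mean(sample), taylor_variance(sample), jackknife_variance(sample))
```

For an equal-weight, unstratified sample in which each observation is its own PSU, both methods reduce to the familiar s^2/n; for ratio estimates in general they differ slightly.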
Table 14.1 briefly describes the sampling and analysis procedures in SAS/STAT software.
| Procedure | Category | Methods and features |
|---|---|---|
| PROC SURVEYSELECT | Sampling Methods | simple random sampling; unrestricted random sampling (with replacement); systematic; sequential; probability proportional to size (PPS) sampling, with and without replacement; PPS systematic; PPS for two units per stratum; PPS sequential with minimum replacement |
| | Allocation Methods | proportional; optimal; Neyman |
| PROC SURVEYMEANS | Statistics | estimates of population means and totals; estimates of population proportions; estimates of population quantiles; ratio estimates; standard errors; confidence limits; hypothesis tests; domain analysis |
| PROC SURVEYFREQ | Tables | one-way frequency tables; two-way and multiway crosstabulation tables; estimates of population totals and proportions; standard errors; confidence limits |
| | Analyses | tests of goodness of fit; tests of independence; risks and risk differences; odds ratios and relative risks |
| PROC SURVEYREG | Analyses | linear regression model fitting; regression coefficients; covariance matrices; confidence limits; hypothesis tests; estimable functions; contrasts; predicted values and residuals; domain analysis |
| PROC SURVEYLOGISTIC | Analyses | cumulative logit regression model fitting; logit, probit, and complementary log-log link functions; generalized logit regression model fitting; regression coefficients; covariance matrices; confidence limits; hypothesis tests; odds ratios; estimable functions; contrasts; model diagnostics; domain analysis |
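The three allocation methods listed for PROC SURVEYSELECT distribute a total sample size n across strata. The following Python sketch uses the textbook formulas (see Cochran 1977): proportional allocation sets n_h proportional to the stratum size N_h; Neyman allocation sets n_h proportional to N_h S_h, where S_h is the stratum standard deviation; and optimal allocation additionally divides by the square root of the per-unit cost c_h. This is not a description of SURVEYSELECT's internals. The function names are hypothetical, and the sketch returns unrounded sizes without capping n_h at N_h, both of which a real allocation must handle.

```python
import math

def proportional_allocation(n, N_h):
    """n_h proportional to the stratum population size N_h."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]

def optimal_allocation(n, N_h, S_h, c_h):
    """n_h proportional to N_h * S_h / sqrt(c_h), for a given total sample size n."""
    scores = [Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h)]
    total = sum(scores)
    return [n * s / total for s in scores]

def neyman_allocation(n, N_h, S_h):
    """Neyman allocation: optimal allocation with equal per-unit costs."""
    return optimal_allocation(n, N_h, S_h, [1.0] * len(N_h))

# Usage: two hypothetical strata of 600 and 400 units
N_h, S_h, c_h = [600, 400], [1.0, 3.0], [1.0, 4.0]
print(proportional_allocation(100, N_h))
print(neyman_allocation(100, N_h, S_h))
print(optimal_allocation(100, N_h, S_h, c_h))
```

With these inputs, the larger standard deviation of the second stratum pulls the Neyman allocation toward it, and its higher per-unit cost pulls the optimal allocation back to an even split.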
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.