Overview: Survey Sampling and Analysis Procedures

This chapter introduces the SAS/STAT procedures for survey sampling and describes how you can use these procedures to analyze survey data.

Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Because the items in a population vary, researchers apply probability-based scientific designs to select the sample. A probability-based design reduces the risk of a distorted view of the population and enables statistically valid inferences to be made from the sample. For more information about statistical sampling and analysis of complex survey data, see Lohr (2010), Kalton (1983), Cochran (1977), and Kish (1965).

To select probability-based random samples from a study population, you can use the SURVEYSELECT procedure, which provides a variety of methods for probability sampling. To analyze sample survey data, you can use the SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures, which incorporate the sample design into the analyses.
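As a rough illustration of what probability-based selection means, the following Python sketch draws a simple random sample without replacement and attaches a sampling weight of N/n to each selected unit. It is not how the SURVEYSELECT procedure is implemented; the function name, the frame of unit IDs, and the seed are hypothetical and chosen only for this example.

```python
import random


def srs_without_replacement(frame, n, seed):
    """Draw n units from frame by simple random sampling without replacement.

    Returns a list of (unit, weight) pairs. Every unit has the same
    selection probability n/N, so every sampling weight is N/n.
    """
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)          # a fixed seed makes the draw reproducible
    selected = rng.sample(frame, n)    # sampling without replacement
    weight = len(frame) / n
    return [(unit, weight) for unit in selected]


# Hypothetical sampling frame of 1,000 unit IDs (an assumption for this sketch)
frame = list(range(1, 1001))
sample = srs_without_replacement(frame, n=50, seed=2024)
print(len(sample), sum(w for _, w in sample))
```

The weights sum to the frame size, which is the property that lets weighted sample totals estimate population totals.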

Many SAS/STAT procedures, such as the MEANS, FREQ, GLM, LOGISTIC, and PHREG procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.
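To make the difference concrete, the following Python sketch compares a naive analysis (which treats the data as a simple random sample from an infinite population) with a design-based analysis of a stratified sample that includes a finite population correction. The formulas are the standard textbook ones for stratified simple random sampling; the data and the stratum sizes are invented for illustration, and the code is not a reproduction of any SAS procedure.

```python
import math
from statistics import mean, variance


def naive_se(values):
    """Standard error of the mean, assuming SRS from an infinite population."""
    return math.sqrt(variance(values) / len(values))


def stratified_mean_and_se(strata):
    """Design-based mean and standard error for stratified SRS without replacement.

    strata is a list of (N_h, sample_values) pairs, where N_h is the
    population size of stratum h.
    """
    N = sum(N_h for N_h, _ in strata)
    estimate = 0.0
    var = 0.0
    for N_h, y in strata:
        n_h = len(y)
        W_h = N_h / N                # stratum share of the population
        fpc = 1 - n_h / N_h          # finite population correction
        estimate += W_h * mean(y)
        var += W_h ** 2 * fpc * variance(y) / n_h
    return estimate, math.sqrt(var)


# Hypothetical data: the small stratum is heavily oversampled
strata = [
    (900, [10, 12, 11, 9, 13]),
    (100, [50, 55, 45, 52, 48]),
]
pooled = [y for _, ys in strata for y in ys]

print("naive:       ", mean(pooled), naive_se(pooled))
print("design-based:", *stratified_mean_and_se(strata))
```

In this constructed example, ignoring the design distorts both the point estimate (the oversampled stratum dominates the unweighted mean) and the standard error.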

The SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures properly analyze complex survey data by taking into account the sample design. These procedures can be used for multistage or single-stage designs, with or without stratification, and with or without unequal weighting. The survey analysis procedures provide a choice of variance estimation methods, which include Taylor series linearization, balanced repeated replication (BRR), and the jackknife.
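The following Python sketch illustrates the general idea behind one of these methods, the delete-one jackknife, for a weighted mean in a design without strata. It assumes that each row carries a primary sampling unit (PSU) identifier, a weight, and a response value; these names and the data are hypothetical, and the sketch is a simplification rather than the algorithm that the survey procedures use.

```python
from statistics import variance


def weighted_mean(rows):
    """Weighted mean of y over rows of (psu, weight, y)."""
    total_weight = sum(w for _, w, _ in rows)
    return sum(w * y for _, w, y in rows) / total_weight


def jackknife_variance(rows):
    """Delete-one-PSU jackknife variance of the weighted mean (no strata).

    Each replicate drops one PSU. The usual R/(R-1) adjustment of the
    remaining weights cancels in a weighted mean, so it is omitted here.
    """
    psus = sorted({psu for psu, _, _ in rows})
    R = len(psus)
    if R < 2:
        raise ValueError("at least two PSUs are required")
    theta = weighted_mean(rows)
    sum_sq = 0.0
    for dropped in psus:
        kept = [row for row in rows if row[0] != dropped]
        sum_sq += (weighted_mean(kept) - theta) ** 2
    return (R - 1) / R * sum_sq


# Hypothetical data: one observation per PSU, equal weights
ys = [3, 7, 8, 12, 15]
rows = [(psu, 1.0, y) for psu, y in enumerate(ys)]
print(jackknife_variance(rows), variance(ys) / len(ys))
```

With one observation per PSU and equal weights, the jackknife estimate reduces to the familiar s²/n, which serves as a quick sanity check on the replicate formula.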

Table 14.1 briefly describes the SAS/STAT sampling and analysis procedures.

Table 14.1: Survey Sampling and Analysis Procedures in SAS/STAT Software

PROC SURVEYSELECT
    Selection Methods
        Simple random sampling (without replacement)
        Unrestricted random sampling (with replacement)
        Systematic
        Sequential
        Bernoulli
        Poisson
        Probability proportional to size (PPS) sampling, with and without replacement
        PPS systematic
        PPS for two units per stratum
        PPS sequential with minimum replacement
    Allocation Methods
        Proportional
        Optimal
        Neyman
    Sampling Tools
        Cluster sampling
        Replicated sampling
        Serpentine sorting

PROC SURVEYMEANS
    Statistics
        Means and totals
        Proportions
        Quantiles
        Geometric means
        Ratios
        Standard errors
        Confidence limits
    Analyses
        Hypothesis tests
        Domain analysis
        Poststratification

PROC SURVEYFREQ
    Tables
        One-way frequency tables
        Two-way and multiway crosstabulation tables
        Estimates of population totals and proportions
        Standard errors
        Confidence limits
    Analyses
        Tests of goodness of fit
        Tests of independence
        Risks and risk differences
        Odds ratios and relative risks
    Graphics
        Weighted frequency and percent plots
        Mosaic plots
        Odds ratio, relative risk, and risk difference plots

PROC SURVEYREG
    Analyses
        Linear regression model fitting
        Regression coefficients
        Covariance matrices
        Confidence limits
        Hypothesis tests
        Estimable functions
        Contrasts
        Least squares means (LS-means) of effects
        Custom hypothesis tests among LS-means
        Regression with constructed effects
        Predicted values and residuals
        Domain analysis

PROC SURVEYLOGISTIC
    Analyses
        Cumulative logit regression model fitting
        Logit, probit, and complementary log-log link functions
        Generalized logit regression model fitting
        Regression coefficients
        Covariance matrices
        Confidence limits
        Hypothesis tests
        Odds ratios
        Estimable functions
        Contrasts
        Least squares means (LS-means) of effects
        Custom hypothesis tests among LS-means
        Regression with constructed effects
        Model diagnostics
        Domain analysis

PROC SURVEYPHREG
    Analyses
        Proportional hazards regression model fitting
        Breslow and Efron likelihoods
        Regression coefficients
        Covariance matrices
        Confidence limits
        Hypothesis tests
        Hazard ratios
        Contrasts
        Predicted values and standard errors
        Martingale, Schoenfeld, score, and deviance residuals
        Domain analysis