## Overview: Survey Sampling and Analysis Procedures

This chapter introduces the SAS/STAT procedures for survey sampling and describes how you can use these procedures to analyze survey data.

Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Because items in the population vary, researchers apply probability-based scientific designs to select the sample. Such designs reduce the risk of a distorted view of the population and enable statistically valid inferences to be made from the sample. For more information about statistical sampling and analysis of complex survey data, see Lohr (2010), Kalton (1983), Cochran (1977), and Kish (1965).

To select probability-based random samples from a study population, you can use the SURVEYSELECT procedure, which provides a variety of methods for probability sampling. To analyze sample survey data, you can use the SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures, which incorporate the sample design into the analyses.
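As a minimal sketch of sample selection, the following statements draw a stratified simple random sample with PROC SURVEYSELECT. The SASHELP.CLASS data set stands in for a real sampling frame. The data set names Frame and Sample, the stratum variable Sex, the per-stratum sample size, and the seed value are illustrative assumptions, not part of the original chapter.

```sas
/* Illustrative frame: SASHELP.CLASS stands in for a real sampling frame. */
proc sort data=sashelp.class out=work.Frame;
   by Sex;                      /* STRATA variables must be sorted (or indexed) */
run;

proc surveyselect data=work.Frame
                  method=srs    /* simple random sampling without replacement */
                  n=4           /* sample size within each stratum (assumed) */
                  seed=20240101 /* arbitrary fixed seed for a reproducible draw */
                  stats         /* write SelectionProb and SamplingWeight to OUT= */
                  out=work.Sample;
   strata Sex;
run;
```

The output data set contains the selected units together with the SelectionProb and SamplingWeight variables, so the sampling weight can be passed to the WEIGHT statement of a survey analysis procedure.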

Many SAS procedures, such as the MEANS, FREQ, GLM, LOGISTIC, and PHREG procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.
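The difference can be seen by running the same variable through PROC MEANS and PROC SURVEYMEANS. The following sketch is only an illustration under stated assumptions: SASHELP.CLASS is treated as the finite population, a small stratified sample is drawn from it, and the names Frame, Sample, and StratumTotals are hypothetical. The TOTAL= data set supplies the stratum population sizes in a variable named `_TOTAL_`, which PROC SURVEYMEANS uses for the finite population correction.

```sas
/* Treat SASHELP.CLASS as the finite population (illustrative assumption). */
proc sort data=sashelp.class out=work.Frame;
   by Sex;
run;

/* Stratum population sizes, stored in the _TOTAL_ variable that TOTAL= expects. */
proc freq data=work.Frame noprint;
   tables Sex / out=work.StratumTotals(drop=Percent rename=(Count=_TOTAL_));
run;

/* Stratified simple random sample; STATS writes SamplingWeight to OUT=. */
proc surveyselect data=work.Frame method=srs n=4 seed=20240101
                  stats out=work.Sample;
   strata Sex;
run;

/* Analysis that ignores the design: infinite population, simple random sampling. */
proc means data=work.Sample mean stderr;
   var Height;
run;

/* Design-based analysis: strata, weights, and finite population correction. */
proc surveymeans data=work.Sample total=work.StratumTotals mean stderr clm;
   strata Sex;
   weight SamplingWeight;
   var Height;
run;
```

The two procedures can report different standard errors for the same data, because only PROC SURVEYMEANS uses the stratification, the weights, and the stratum totals when it estimates the variance.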

The SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures properly analyze complex survey data by taking into account the sample design. These procedures can be used for multistage or single-stage designs, with or without stratification, and with or without unequal weighting. The survey analysis procedures provide a choice of variance estimation methods, which include Taylor series linearization, balanced repeated replication (BRR), and the jackknife.
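In the survey analysis procedures, the variance estimation method is chosen with the VARMETHOD= option in the PROC statement. The following sketch requests Taylor series linearization and then the jackknife for the same estimate. The sample is an illustrative stratified draw from SASHELP.CLASS, and the data set names are assumptions. VARMETHOD=BRR is requested in the same way, but it requires a design with two primary sampling units in each stratum, so it is not shown for this sample.

```sas
/* Illustrative stratified sample drawn from SASHELP.CLASS (assumed frame). */
proc sort data=sashelp.class out=work.Frame;
   by Sex;
run;

proc surveyselect data=work.Frame method=srs n=4 seed=20240101
                  stats out=work.Sample;
   strata Sex;
run;

/* Taylor series linearization */
proc surveymeans data=work.Sample varmethod=taylor mean stderr;
   strata Sex;
   weight SamplingWeight;
   var Height;
run;

/* Delete-one jackknife: replicates are formed by dropping one PSU at a time. */
proc surveymeans data=work.Sample varmethod=jackknife mean stderr;
   strata Sex;
   weight SamplingWeight;
   var Height;
run;
```

Because no CLUSTER statement is specified, each observation is treated as its own primary sampling unit in both runs.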

Table 14.1 briefly describes the SAS/STAT sampling and analysis procedures.

Table 14.1: Survey Sampling and Analysis Procedures in SAS/STAT Software

| Procedure | Category | Features |
|---|---|---|
| SURVEYSELECT | Selection methods | Simple random sampling (without replacement); unrestricted random sampling (with replacement); systematic; sequential; Bernoulli; Poisson; probability proportional to size (PPS) sampling, with and without replacement; PPS systematic; PPS for two units per stratum; PPS sequential with minimum replacement |
| SURVEYSELECT | Allocation methods | Proportional; optimal; Neyman |
| SURVEYSELECT | Sampling tools | Cluster sampling; replicated sampling; serpentine sorting |
| SURVEYMEANS | Statistics | Means and totals; proportions; quantiles; geometric means; ratios; standard errors; confidence limits |
| SURVEYMEANS | Analyses | Hypothesis tests; domain analysis; poststratification |
| SURVEYFREQ | Tables | One-way frequency tables; two-way and multiway crosstabulation tables; estimates of population totals and proportions; standard errors; confidence limits |
| SURVEYFREQ | Analyses | Tests of goodness of fit; tests of independence; risks and risk differences; odds ratios and relative risks |
| SURVEYFREQ | Graphics | Weighted frequency and percent plots; mosaic plots; odds ratio, relative risk, and risk difference plots |
| SURVEYREG | Analyses | Linear regression model fitting; regression coefficients; covariance matrices; confidence limits; hypothesis tests; estimable functions; contrasts; least squares means (LS-means) of effects; custom hypothesis tests among LS-means; regression with constructed effects; predicted values and residuals; domain analysis |
| SURVEYLOGISTIC | Analyses | Cumulative logit regression model fitting; logit, probit, and complementary log-log link functions; generalized logit regression model fitting; regression coefficients; covariance matrices; confidence limits; hypothesis tests; odds ratios; estimable functions; contrasts; least squares means (LS-means) of effects; custom hypothesis tests among LS-means; regression with constructed effects; model diagnostics; domain analysis |
| SURVEYPHREG | Analyses | Proportional hazards regression model fitting; Breslow and Efron likelihoods; regression coefficients; covariance matrices; confidence limits; hypothesis tests; hazard ratios; contrasts; predicted values and standard errors; martingale, Schoenfeld, score, and deviance residuals; domain analysis |
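The survey analysis procedures share the same design statements (STRATA, CLUSTER, and WEIGHT), so a design specification written for one procedure carries over to the others. As a hedged illustration, the following sketch produces a one-way frequency table with PROC SURVEYFREQ and fits a linear regression with PROC SURVEYREG. The stratified sample drawn from SASHELP.CLASS, the data set names, and the choice of variables are assumptions made only for this example.

```sas
/* Illustrative stratified sample drawn from SASHELP.CLASS (assumed frame). */
proc sort data=sashelp.class out=work.Frame;
   by Sex;
run;

proc surveyselect data=work.Frame method=srs n=4 seed=20240101
                  stats out=work.Sample;
   strata Sex;
run;

/* One-way frequency table with design-based standard errors */
proc surveyfreq data=work.Sample;
   strata Sex;
   weight SamplingWeight;
   tables Age;
run;

/* Linear regression with design-based standard errors for the coefficients */
proc surveyreg data=work.Sample;
   strata Sex;
   weight SamplingWeight;
   model Height = Age;
run;
```

A multistage design would add a CLUSTER statement that names the primary sampling units. It is omitted here because this illustrative sample has no clustering.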