Survey Analysis

Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Because the items in a population vary, researchers apply scientific probability-based designs to select the sample. Doing so reduces the risk of a distorted view of the population and enables statistically valid inferences to be made from the sample. The survey analysis procedures in SAS/STAT software properly analyze complex survey data by taking the sample design into account. These procedures can be used for multistage or single-stage designs, with or without stratification, and with or without unequal weighting.

The SAS/STAT survey analysis procedures include the following:

SURVEYMEANS Procedure


The SURVEYMEANS procedure estimates characteristics of a survey population by using statistics computed from a survey sample. It enables you to estimate statistics such as means, totals, proportions, quantiles, geometric means, and ratios. The following are highlights of the SURVEYMEANS procedure's features:

  • provides domain analysis, which computes estimates for subpopulations or domains
  • estimates variances and confidence limits and performs t tests for these statistics
  • computes variances of the parameters by using the following methods:
    • Taylor series (linearization)
    • balanced repeated replication (BRR)
    • delete-1 jackknife
  • enables you to employ Fay's method with BRR
  • performs poststratification
  • enables you to input or output a SAS data set containing a Hadamard matrix for BRR
  • enables you to import or export SAS data sets containing replicate weights for BRR or jackknife methods
  • creates a SAS data set that contains the jackknife coefficients
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations (distinct from subpopulation analysis)
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see SURVEYMEANS Procedure
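
As a rough illustration of the delete-1 jackknife listed above, the following Python sketch estimates the variance of a weighted mean. It is not PROC SURVEYMEANS code and does not claim to reproduce the procedure's internals. It assumes a single stratum in which every observation is its own primary sampling unit, and the function names and data are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted (ratio) estimator of the population mean."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def jackknife_variance(values, weights):
    """Delete-1 jackknife variance of the weighted mean.

    Assumption: one stratum, each observation is its own PSU, so every
    replicate uses the jackknife coefficient (n - 1) / n.
    """
    n = len(values)
    full = weighted_mean(values, weights)
    total = 0.0
    for i in range(n):
        # Replicate i drops observation i. The remaining weights are left
        # unscaled because a common rescaling cancels in a weighted mean.
        rep_values = values[:i] + values[i + 1:]
        rep_weights = weights[:i] + weights[i + 1:]
        total += (weighted_mean(rep_values, rep_weights) - full) ** 2
    return (n - 1) / n * total


# Hypothetical sample with equal weights.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [1.0] * 5
variance = jackknife_variance(values, weights)
```

With equal weights the result matches the familiar sample variance divided by n, here 2.5 / 5 = 0.5, which is a convenient sanity check before trying unequal weights.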

SURVEYFREQ Procedure


The SURVEYFREQ procedure produces one-way to n-way frequency and crosstabulation tables from complex multistage survey designs with stratification, clustering, and unequal weighting. The following are highlights of the SURVEYFREQ procedure's features:

  • produces tables of population totals, population proportions, and their standard errors
  • computes confidence limits, coefficients of variation, and design effects
  • provides a variety of options to customize the table display
  • provides Rao-Scott chi-square goodness-of-fit tests, which are adjusted for the sample design, for one-way frequency tables
  • produces simple and weighted kappa coefficients
  • enables you to test a null hypothesis of equal proportions for a one-way frequency table or to input custom null hypothesis proportions for the test
  • provides design-adjusted tests of independence, or no association, between the row and column variables for two-way tables. These tests include the following:
    • Rao-Scott chi-square test
    • Rao-Scott likelihood ratio test
    • Wald chi-square test
    • Wald log-linear chi-square test
  • computes estimates and confidence limits for risks (or row proportions), the risk difference, the odds ratio, and relative risks for 2x2 tables
  • computes variances of the estimated parameters by using the following methods:
    • Taylor series (linearization)
    • balanced repeated replication (BRR)
    • delete-1 jackknife
  • enables you to employ Fay's method with BRR
  • enables you to input or output a SAS data set containing a Hadamard matrix for BRR
  • enables you to import or export SAS data sets containing replicate weights for BRR or jackknife methods
  • creates a SAS data set that contains the jackknife coefficients
  • provides analysis for subpopulations, or domains, in addition to analysis for the entire study population
  • calculates design effects for each overall proportion estimate in frequency and crosstabulation tables
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations (distinct from subpopulation analysis)
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see SURVEYFREQ Procedure
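
To make the weighted table estimates above concrete, here is a minimal Python sketch that builds estimated population totals, cell proportions, and an odds ratio for a 2x2 table from sampling weights. It covers point estimates only; design-based standard errors and the Rao-Scott or Wald tests are not attempted. The data and function names are hypothetical.

```python
from collections import defaultdict


def weighted_crosstab(rows, cols, weights):
    """Estimated population total for each (row, column) cell."""
    totals = defaultdict(float)
    for r, c, w in zip(rows, cols, weights):
        totals[(r, c)] += w
    return dict(totals)


def cell_proportions(totals):
    """Cell proportions relative to the estimated population size."""
    grand = sum(totals.values())
    return {cell: t / grand for cell, t in totals.items()}


def table_odds_ratio(totals, row_levels, col_levels):
    """Odds ratio for a 2x2 table built from weighted cell totals."""
    (r1, r2), (c1, c2) = row_levels, col_levels
    return (totals[(r1, c1)] * totals[(r2, c2)]) / (
        totals[(r1, c2)] * totals[(r2, c1)]
    )


# Hypothetical respondents: exposure, outcome, and sampling weight.
exposure = ["yes", "yes", "yes", "no", "no", "no"]
outcome = ["sick", "sick", "well", "sick", "well", "well"]
weight = [10.0, 30.0, 20.0, 5.0, 15.0, 20.0]

totals = weighted_crosstab(exposure, outcome, weight)
props = cell_proportions(totals)
ratio = table_odds_ratio(totals, ("yes", "no"), ("sick", "well"))
```

Here the weighted cell totals are 40, 20, 5, and 35, so the estimated odds ratio is (40 × 35) / (20 × 5) = 14.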

SURVEYIMPUTE Procedure


The SURVEYIMPUTE procedure imputes missing values of an item in a data set by replacing them with observed values from the same item. The principles by which the imputation is performed are particularly useful for survey data. The following are highlights of the SURVEYIMPUTE procedure's features:

  • performs fully efficient fractional hot-deck imputation
  • performs traditional hot-deck imputation with the following donor selection methods:
    • approximate Bayesian bootstrap
    • simple random samples without replacement
    • simple random samples with replacement
    • probability proportional to respondent weights with replacement
  • computes imputation-adjusted replicate weights
  • computes imputation-adjusted balanced repeated replication (BRR) weights
  • computes imputation-adjusted jackknife weights
  • provides a CELLS statement, which names the variables that identify the imputation cells
  • imputes variables jointly or independently for the fully efficient fractional imputation method
  • creates a SAS data set that contains the imputed data
For further details, see SURVEYIMPUTE Procedure
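
The idea behind traditional hot-deck imputation within cells can be sketched in a few lines of Python. This is an illustration only, not the procedure's algorithm: it assumes missing values are coded as None, selects donors by simple random sampling with replacement, and ignores weights. The record layout and the function name are hypothetical.

```python
import random


def hot_deck_impute(records, item, cell_key, seed=0):
    """Fill missing values of `item` with donor values from the same cell.

    Donors are drawn by simple random sampling with replacement.
    Cells that contain no observed value are left unimputed.
    """
    rng = random.Random(seed)

    # Collect the observed (donor) values in each imputation cell.
    donors = {}
    for rec in records:
        if rec[item] is not None:
            donors.setdefault(rec[cell_key], []).append(rec[item])

    imputed = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input records stay unchanged
        pool = donors.get(rec[cell_key])
        if rec[item] is None and pool:
            new_rec[item] = rng.choice(pool)
        imputed.append(new_rec)
    return imputed


# Hypothetical data: income is missing for two respondents.
records = [
    {"region": "north", "income": 40},
    {"region": "north", "income": 55},
    {"region": "north", "income": None},
    {"region": "south", "income": 70},
    {"region": "south", "income": None},
]
completed = hot_deck_impute(records, item="income", cell_key="region", seed=1)
```

Each imputed income is an observed value from the same region, which is the defining property of a hot-deck method: missing items are replaced with values actually observed for the same item.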

SURVEYLOGISTIC Procedure


The SURVEYLOGISTIC procedure fits linear logistic regression models for discrete response survey data by the method of maximum likelihood. For statistical inferences, PROC SURVEYLOGISTIC incorporates complex survey sample designs, including designs with stratification, clustering, and unequal weighting. The following are highlights of the SURVEYLOGISTIC procedure's features:

  • fits models with binary, ordinal, or nominal dependent variables with the following link functions:
    • logit
    • probit
    • complementary log-log
    • generalized logit
  • computes variances of the regression parameters and odds ratios by using the following methods:
    • Taylor series (linearization)
    • balanced repeated replication (BRR)
    • delete-1 jackknife
  • enables you to employ Fay's method with BRR
  • enables you to input or output a SAS data set containing a Hadamard matrix for BRR
  • enables you to import or export SAS data sets containing replicate weights for BRR or jackknife methods
  • creates a SAS data set that contains the jackknife coefficients
  • provides analysis for subpopulations, or domains, in addition to analysis for the entire study population
  • enables you to control the ordering of the response categories
  • computes a generalized R² measure for the fitted model
  • tests linear hypotheses about the regression parameters
  • enables you to specify units of change for continuous explanatory variables so that customized odds ratios can be estimated
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations (distinct from subpopulation analysis)
  • creates a data set that contains the variables in the input data set, the estimated linear predictors and their standard error estimates, the estimates of the cumulative or individual response probabilities, and the confidence limits for the cumulative probabilities
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see SURVEYLOGISTIC Procedure
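
The link functions and customized odds ratios mentioned above reduce to short formulas. The Python sketch below shows the inverse of the logit, probit, and complementary log-log links, plus the odds ratio for a chosen number of units of change in a continuous covariate under a logit model. The generalized logit link is omitted, and the coefficient value is hypothetical rather than the output of any fitted model.

```python
import math
from statistics import NormalDist


def inverse_link(eta, link="logit"):
    """Map a linear predictor to a probability under the named link."""
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    if link == "probit":
        return NormalDist().cdf(eta)
    if link == "cloglog":
        return 1.0 - math.exp(-math.exp(eta))
    raise ValueError(f"unknown link: {link}")


def odds_ratio_for_change(beta, units=1.0):
    """Odds ratio for a change of `units` in a continuous covariate,
    given its logit-model coefficient `beta`."""
    return math.exp(units * beta)


p_logit = inverse_link(0.0, "logit")
p_probit = inverse_link(0.0, "probit")
p_cloglog = inverse_link(0.0, "cloglog")

# Hypothetical coefficient of 0.05 per unit, reported per 10 units.
or_per_10 = odds_ratio_for_change(beta=0.05, units=10)
```

At a linear predictor of zero, the logit and probit links both give a probability of 0.5, while the complementary log-log link gives 1 − exp(−1), which shows that the latter is not symmetric around zero.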

SURVEYPHREG Procedure


The SURVEYPHREG procedure performs regression analysis based on the Cox proportional hazards model for sample survey data. Cox's semiparametric model is widely used in the analysis of survival data to estimate hazard rates when adequate explanatory variables are available. The following are highlights of the SURVEYPHREG procedure's features:

  • computes hazard ratio estimates
  • computes variances of the regression parameters by using the following methods:
    • Taylor series (linearization)
    • balanced repeated replication (BRR)
    • delete-1 jackknife
  • produces the following observation-level output statistics:
    • predicted values and their standard errors
    • martingale residuals
    • Schoenfeld residuals
    • score residuals
    • deviance residuals
  • enables you to employ Fay's method with BRR
  • enables you to input or output a SAS data set containing a Hadamard matrix for BRR
  • enables you to import or export SAS data sets containing replicate weights for BRR or jackknife methods
  • provides analysis for subpopulations, or domains, in addition to analysis for the entire study population
  • supports programming statements that enable you to include time-dependent covariates in the model
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations (distinct from subpopulation analysis)
  • enables you to test linear hypotheses about the regression parameters
  • enables you to estimate a linear function of the regression parameters
  • creates a SAS data set that contains the estimated linear predictors and their standard error estimates, the residuals, and the confidence limits for the predictors
  • creates a SAS data set that contains the jackknife coefficients
  • saves the context and results in an item store that can be processed with the PLM procedure
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see SURVEYPHREG Procedure
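
For readers who want to see how sampling weights can enter a Cox model, the following Python sketch evaluates a weighted partial log-likelihood for one covariate, using the Breslow form for tied event times. This is a common textbook formulation offered as an assumption, not a statement of how PROC SURVEYPHREG computes its estimates. The data and function names are hypothetical.

```python
import math


def weighted_partial_loglik(beta, times, events, covariate, weights):
    """Weighted Cox partial log-likelihood for a single covariate.

    Breslow form: each event contributes its weight times
    (beta * x_i - log of the weighted sum over its risk set).
    """
    loglik = 0.0
    for t_i, d_i, x_i, w_i in zip(times, events, covariate, weights):
        if not d_i:
            continue  # censored observations enter only through risk sets
        risk_sum = sum(
            w_j * math.exp(beta * x_j)
            for t_j, x_j, w_j in zip(times, covariate, weights)
            if t_j >= t_i  # still at risk at time t_i
        )
        loglik += w_i * (beta * x_i - math.log(risk_sum))
    return loglik


def hazard_ratio(beta):
    """Hazard ratio for a one-unit increase in the covariate."""
    return math.exp(beta)


# Hypothetical data: three subjects, all with observed events.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
covariate = [0.0, 1.0, 0.0]
weights = [1.0, 1.0, 1.0]
ll_at_zero = weighted_partial_loglik(0.0, times, events, covariate, weights)
```

With unit weights and a coefficient of zero, the risk sets have sizes 3, 2, and 1, so the value is −(log 3 + log 2 + log 1) = −log 6. Maximizing this function over the coefficient would give the point estimate; variance estimation is a separate step.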

SURVEYREG Procedure


The SURVEYREG procedure performs regression analysis for sample survey data. This procedure can handle complex survey sample designs, including designs with stratification, clustering, and unequal weighting. The procedure fits linear models for survey data and computes regression coefficients and their variance-covariance matrix. The following are highlights of the SURVEYREG procedure's features:

  • computes the regression coefficient estimators by generalized least squares estimation using elementwise regression
  • computes variances of the regression parameters by using the following methods:
    • Taylor series (linearization)
    • balanced repeated replication (BRR)
    • delete-1 jackknife
  • enables you to employ Fay's method with BRR
  • enables you to input or output a SAS data set containing a Hadamard matrix for BRR
  • enables you to import or export SAS data sets containing replicate weights for BRR or jackknife methods
  • creates a SAS data set that contains the jackknife coefficients
  • provides analysis for subpopulations, or domains, in addition to analysis for the entire study population
  • calculates design effects for the regression coefficients
  • enables you to test linear hypotheses about the regression parameters
  • enables you to estimate a linear function of the regression parameters
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations (distinct from subpopulation analysis)
  • creates a SAS data set that contains the estimated linear predictors and their standard error estimates, the residuals from the linear regression, and the confidence limits for the predictors
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see SURVEYREG Procedure
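
As a simplified stand-in for the regression coefficient estimation described above, this Python sketch fits a one-predictor linear model by weighted least squares, with sampling weights as the weights. It produces point estimates only and makes no attempt at design-based variance estimation. The data and function name are hypothetical.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least squares fit of y = intercept + slope * x."""
    w_sum = sum(weights)
    x_bar = sum(w * xi for xi, w in zip(x, weights)) / w_sum
    y_bar = sum(w * yi for yi, w in zip(y, weights)) / w_sum
    # Weighted cross-product and sum of squares around the weighted means.
    sxy = sum(
        w * (xi - x_bar) * (yi - y_bar) for xi, yi, w in zip(x, y, weights)
    )
    sxx = sum(w * (xi - x_bar) ** 2 for xi, w in zip(x, weights))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x, with unequal weights.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
weights = [10.0, 20.0, 5.0, 15.0]
intercept, slope = weighted_linear_fit(x, y, weights)
```

Because the points lie exactly on a line, the fit recovers an intercept of 1 and a slope of 2 regardless of the weights. With noisy data, the weights shift the fit toward observations that represent more of the population.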

SURVEYSELECT Procedure


The SURVEYSELECT procedure provides a variety of methods for selecting probability-based random samples. The procedure can select a simple random sample or can sample according to a complex multistage sample design that includes stratification, clustering, and unequal probabilities of selection. With probability sampling, each unit in the survey population has a known, positive probability of selection. This property of probability sampling avoids selection bias and enables you to use statistical theory to make valid inferences from the sample to the survey population. The following are highlights of the SURVEYSELECT procedure's features:

  • selects the sample and produces an output data set that contains the selected units, their selection probabilities, and their sampling weights
  • provides methods for both equal probability sampling and probability proportional to size (PPS) sampling
  • provides the following equal probability sampling methods:
    • simple random sampling
    • unrestricted random sampling (with replacement)
    • systematic random sampling
    • sequential random sampling
    • Bernoulli sampling
  • provides the following unequal probability sampling methods:
    • Poisson sampling
  • provides the following probability proportional to size (PPS) methods:
    • PPS sampling without replacement
    • PPS sampling with replacement
    • PPS systematic sampling
    • PPS algorithms for selecting two units per stratum
    • sequential PPS sampling with minimum replacement
  • performs stratified sampling by selecting samples independently within the specified strata, or nonoverlapping subgroups of the survey population
  • enables you to sort by control variables within strata for the additional control of implicit stratification when using a systematic or sequential selection method
  • provides survey design methods to allocate the total sample size among the strata
  • provides the following allocation methods: proportional, Neyman, and optimal allocation
  • provides replicated sampling, where the total sample is composed of a set of replicates, and each replicate is selected in the same way
  • enables you to randomly assign the observations in the input data set to groups
For further details, see SURVEYSELECT Procedure
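
Two of the allocation methods named above, proportional and Neyman, follow simple formulas that the Python sketch below spells out. Proportional allocation assigns sample size in proportion to stratum size N_h, and Neyman allocation assigns it in proportion to N_h × S_h, where S_h is the stratum standard deviation. The stratum sizes and standard deviations are hypothetical, and the results are left unrounded; a real design would round them to whole units.

```python
def proportional_allocation(total_n, stratum_sizes):
    """Allocate the sample in proportion to stratum population sizes."""
    population = sum(stratum_sizes)
    return [total_n * size / population for size in stratum_sizes]


def neyman_allocation(total_n, stratum_sizes, stratum_sds):
    """Allocate in proportion to N_h * S_h (size times standard deviation)."""
    products = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    denom = sum(products)
    return [total_n * p / denom for p in products]


# Hypothetical frame: two strata, the smaller one more variable.
sizes = [100, 300]
sds = [2.0, 1.0]
prop = proportional_allocation(50, sizes)
neyman = neyman_allocation(50, sizes, sds)
```

For a total sample of 50, proportional allocation gives 12.5 and 37.5 units, while Neyman allocation gives 20 and 30, moving sample toward the more variable stratum.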