The SURVEYIMPUTE Procedure

Missing Values

You might have missing values in your data set for several reasons. Some common reasons are data entry errors, ineligible items or units, and nonresponse. You should complete your data preparation (identify data entry errors or ineligibility) and adjustment (fill in deterministic values or edits) before using the SURVEYIMPUTE procedure. Use PROC SURVEYIMPUTE to impute only those missing values that arise from nonresponse. If observations have missing values in variables that are not specified in the VAR statement, then the procedure does not impute those values. The following subsections describe how PROC SURVEYIMPUTE treats missing values in the variables that are specified in other statements.

WEIGHT Statement Variable

If an observation has a missing value or a nonpositive value for the variable in the WEIGHT statement, then PROC SURVEYIMPUTE excludes that observation from the analysis. However, if you use the OUTPUT statement, the observation is included in the output data set.
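
Before you run the procedure, it can help to count how many observations this rule would exclude. The following is a minimal sketch, not part of the procedure itself; the data set name MySurvey and the variable name SamplingWeight are hypothetical.

```sas
/* Sketch only: MySurvey and SamplingWeight are hypothetical names. */
/* Counts observations whose weight is missing or nonpositive.      */
proc sql;
   select count(*) as NExcludedByWeight
   from MySurvey
   where SamplingWeight is missing or SamplingWeight <= 0;
quit;
```

A nonzero count indicates observations that would be left out of the analysis but would still appear in the OUTPUT data set.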

REPWEIGHTS Statement Variables

If you provide replicate weights by using a REPWEIGHTS statement, the values for all variables in that statement must be nonmissing and nonnegative. PROC SURVEYIMPUTE does not perform the analysis when any replicate weight value is missing or negative.
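
Because a single invalid replicate weight stops the analysis, you might want to screen the replicate weights first. The following DATA step is a sketch under stated assumptions: the data set MySurvey is hypothetical, and the replicate weight variables are assumed to be numeric and to share the hypothetical prefix RepWt_. It applies the requirement that replicate weights be nonmissing and nonnegative.

```sas
/* Sketch only: MySurvey and the RepWt_ prefix are hypothetical. */
data BadRepWeights;
   set MySurvey;
   length BadVar $32;
   array rw{*} RepWt_:;               /* all variables that begin with RepWt_ */
   do i = 1 to dim(rw);
      if missing(rw{i}) or rw{i} < 0 then do;
         BadVar   = vname(rw{i});     /* name of the offending variable */
         BadValue = rw{i};
         ObsNum   = _n_;              /* row number in MySurvey */
         output;
      end;
   end;
   keep ObsNum BadVar BadValue;
run;
```

If BadRepWeights has zero observations, none of the replicate weights violate the requirement.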

Variables in the CLUSTER and STRATA Statements

An observation is excluded from the analysis if it has a missing value for any variable in a CLUSTER or STRATA statement. However, if you use the OUTPUT statement, the observation is included in the output data set.
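
As an illustration, you could flag these observations ahead of time with the CMISS function, which counts missing values among both numeric and character arguments. This is a sketch only; the data set MySurvey and the variables Stratum and PSU are hypothetical stand-ins for your own STRATA and CLUSTER variables.

```sas
/* Sketch only: MySurvey, Stratum, and PSU are hypothetical names. */
data DesignCheck;
   set MySurvey;
   /* 1 if any design variable is missing, 0 otherwise */
   MissingDesign = (cmiss(Stratum, PSU) > 0);
run;

proc freq data=DesignCheck;
   tables MissingDesign;
run;
```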

Variables in the CELLS Statement

An observation is excluded from the imputation if it has a missing value for any variable in the CELLS statement. However, if you specify the VARMETHOD=JK option in the PROC SURVEYIMPUTE statement, then the observation unit is used to create replicate weights, unless the observation unit has missing values in any of the variables in the STRATA, CLUSTER, or WEIGHT statement. If you use the OUTPUT statement, the observation is also included in the output data set.

Auxiliary Variables in the IMPJOINT Statement

Variables that you specify in the IMPJOINT statement but do not specify in the VAR statement are used as auxiliary variables in the imputation. If an observation unit has a missing value in any auxiliary variable, then that observation unit is not used in the imputation. However, if you specify the VARMETHOD=JK option in the PROC SURVEYIMPUTE statement, then the observation unit is used to create replicate weights, unless the observation unit has missing values in any of the variables in the STRATA, CLUSTER, or WEIGHT statement. If you use the OUTPUT statement, the observation unit is also included in the output data set.
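
The distinction between being used in the imputation and being used to create replicate weights can be sketched as two separate flags. The following DATA step is an illustration of the rules as described in this section, not the procedure's internal logic. The data set MySurvey, the design variables Stratum, PSU, and SamplingWeight, and the auxiliary variables Region and AgeGroup are all hypothetical.

```sas
/* Sketch only: all data set and variable names are hypothetical. */
data Eligibility;
   set MySurvey;
   /* Design variables present and weight positive */
   DesignOK = (cmiss(Stratum, PSU) = 0) and (SamplingWeight > 0);
   /* Auxiliary variables (in IMPJOINT but not in VAR) present */
   AuxOK = (cmiss(Region, AgeGroup) = 0);
   /* Used in the imputation only when both conditions hold */
   UsedInImputation = DesignOK and AuxOK;
   /* With VARMETHOD=JK, used for replicate weights when the design is complete */
   UsedForReplicates = DesignOK;
run;

proc freq data=Eligibility;
   tables UsedInImputation*UsedForReplicates / norow nocol nopercent;
run;
```

Observations in the cell where UsedInImputation is 0 and UsedForReplicates is 1 are those that have a complete design but a missing auxiliary value.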

Variable in the ID Statement

If an observation unit has a missing value for the variable in the ID statement, then that observation is used in the imputation unless it also has missing values for variables in the STRATA, CLUSTER, WEIGHT, or CELLS statement. If the observation is selected as a donor unit for a recipient unit, then the donor identification for that recipient unit is also missing.
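
To show where the donor identification appears, the following is a hedged sketch of a hot-deck run that requests it. The data set MySurvey and the variables RespondentID, Income, Region, and DonorID are hypothetical. The DONORID= option in the OUTPUT statement is written as an assumption about that statement's syntax, so check it against the syntax section before relying on it.

```sas
/* Sketch only: data set and variable names are hypothetical.       */
/* The DONORID= option is an assumption; see the OUTPUT statement. */
proc surveyimpute data=MySurvey method=hotdeck seed=12345;
   id RespondentID;                     /* source of the donor identification */
   var Income;                          /* variable to impute */
   cells Region;                        /* imputation cells */
   output out=Imputed donorid=DonorID;  /* donor ID written for recipients */
run;
```

Under the rule above, a recipient in Imputed whose selected donor has a missing RespondentID would have a missing DonorID. Checking the ID variable for missing values before imputation avoids this.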