Missing Values
Missing values in your survey data can compromise the quality of your survey results. Some missing values in survey data are caused by nonresponse. An observation whose response to every survey item is available is called a complete respondent, and an observation whose response to one or more survey items is missing is called an incomplete respondent. If the complete respondents differ from the incomplete respondents with regard to a survey effect or outcome, then survey estimates will be biased and will not accurately represent the survey population. A variety of techniques in sample design and survey operations can reduce nonresponse. After data collection is complete, you can use imputation to replace missing values with acceptable values, and you can use sampling weight adjustments to compensate for nonresponse. You should complete this data preparation and adjustment before you analyze your data with PROC SURVEYPHREG. See, for example, Cochran (1977), Kalton and Kasprzyk (1986), and Brick and Kalton (1996) for more details.
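As one illustration of a sampling weight adjustment that you might perform before running the procedure, the following Python sketch applies a weighting-class nonresponse adjustment. This is one common approach, not something PROC SURVEYPHREG does itself, and the record keys `weight`, `cls`, and `responded` are hypothetical. Within each adjustment class, respondent weights are scaled up so that they sum to the total base weight of all sampled units in that class.

```python
from collections import defaultdict


def adjust_weights_for_nonresponse(records):
    """Weighting-class nonresponse adjustment (a sketch).

    Each record is a dict with hypothetical keys:
      'weight'    - base sampling weight
      'cls'       - adjustment class (for example, a region or age group)
      'responded' - True for a respondent, False for a nonrespondent
    Returns only the respondent records, each with an added 'adj_weight'.
    """
    total = defaultdict(float)       # sum of base weights, all sampled units
    resp_total = defaultdict(float)  # sum of base weights, respondents only
    for r in records:
        total[r["cls"]] += r["weight"]
        if r["responded"]:
            resp_total[r["cls"]] += r["weight"]

    adjusted = []
    for r in records:
        if r["responded"]:
            # Inflate respondents so the class keeps its full weight
            factor = total[r["cls"]] / resp_total[r["cls"]]
            adjusted.append({**r, "adj_weight": r["weight"] * factor})
    return adjusted


sample = [
    {"weight": 10.0, "cls": "A", "responded": True},
    {"weight": 10.0, "cls": "A", "responded": False},
    {"weight": 20.0, "cls": "B", "responded": True},
    {"weight": 20.0, "cls": "B", "responded": True},
]
adj = adjust_weights_for_nonresponse(sample)
print([r["adj_weight"] for r in adj])  # [20.0, 20.0, 20.0]
```

Class A's single respondent absorbs its nonrespondent's weight (factor 2), while class B is unchanged, so the adjusted weights still sum to the original total of 60. A class with no respondents would simply lose its weight in this sketch; in practice such classes are usually collapsed with neighboring classes before the adjustment.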
If an observation has a missing value or a nonpositive value for the WEIGHT variable, then PROC SURVEYPHREG excludes that observation from the analysis.
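The weight rule above can be expressed as a simple filter. This Python sketch assumes each observation is a dict with a hypothetical `weight` key, and it represents a missing value as `None` or NaN; it mirrors the documented rule rather than the procedure's internal code.

```python
import math


def has_valid_weight(obs):
    """Return True if the observation's WEIGHT value is usable.

    A missing weight (None or NaN here) or a nonpositive weight
    means the observation is excluded from the analysis.
    """
    w = obs.get("weight")
    if w is None or (isinstance(w, float) and math.isnan(w)):
        return False
    return w > 0


data = [
    {"id": 1, "weight": 2.5},
    {"id": 2, "weight": None},          # missing
    {"id": 3, "weight": 0},             # nonpositive
    {"id": 4, "weight": -1.0},          # nonpositive
    {"id": 5, "weight": float("nan")},  # missing
]
kept = [o["id"] for o in data if has_valid_weight(o)]
print(kept)  # [1]
```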
If you provide replicate weights with a REPWEIGHTS statement for BRR or jackknife variance estimation, all REPWEIGHTS variable values must be nonmissing. Similarly, if you provide jackknife coefficients with the JKCOEFS= option in the REPWEIGHTS statement, all values of the JKCoefficient variable must be nonmissing. The procedure does not perform the analysis when any replicate weight or jackknife coefficient value is missing.
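Unlike a missing analysis weight, which drops only one observation, a single missing replicate weight or jackknife coefficient stops the whole analysis. The following Python sketch of such a pre-check assumes hypothetical inputs: a list of per-observation rows of replicate weights and an optional list of per-replicate jackknife coefficients, with `None` or NaN marking a missing value.

```python
import math


def _is_missing(v):
    """Treat None and float NaN as missing values."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def check_replicate_inputs(rep_weights, jk_coefs=None):
    """Raise ValueError if any replicate weight or jackknife coefficient is missing.

    rep_weights: list of rows, one row of replicate weights per observation
    jk_coefs:    optional list of jackknife coefficients, one per replicate
    """
    for i, row in enumerate(rep_weights):
        for j, v in enumerate(row):
            if _is_missing(v):
                raise ValueError(
                    f"missing replicate weight: observation {i}, replicate {j}")
    if jk_coefs is not None:
        for j, c in enumerate(jk_coefs):
            if _is_missing(c):
                raise ValueError(f"missing jackknife coefficient: replicate {j}")


# Zero replicate weights are allowed; only missing values are rejected.
check_replicate_inputs([[1.0, 2.0], [0.0, 2.0]], [0.5, 0.5])
```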
An observation is excluded from the analysis if it has a missing value for any CLASS, STRATA, CLUSTER, or DOMAIN variable, unless you specify the MISSING option in the PROC SURVEYPHREG statement. If you specify the MISSING option, the procedure treats missing values as a valid (nonmissing) category for all categorical variables, which include STRATA variables, CLUSTER variables, CLASS variables, and DOMAIN variables.
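The following Python sketch shows the effect of the MISSING option on categorical variables such as STRATA and CLUSTER. It builds a level key for one observation, returning `None` when the observation would be excluded. The dict keys and the sentinel string `"<missing>"` are hypothetical, and `None` stands in for a missing value.

```python
def categorical_key(obs, variables, missing_option=False):
    """Build the categorical level key for one observation.

    Returns None when the observation must be excluded. With the MISSING
    option, a missing value becomes its own valid level, represented here
    by the hypothetical sentinel "<missing>".
    """
    key = []
    for var in variables:
        v = obs.get(var)
        if v is None:
            if not missing_option:
                return None      # excluded from the analysis
            v = "<missing>"      # treated as a valid (nonmissing) level
        key.append(v)
    return tuple(key)


data = [
    {"stratum": "N", "psu": 1},
    {"stratum": None, "psu": 2},
    {"stratum": "S", "psu": None},
]
print([categorical_key(o, ["stratum", "psu"]) for o in data])
# [('N', 1), None, None]
print([categorical_key(o, ["stratum", "psu"], missing_option=True) for o in data])
# [('N', 1), ('<missing>', 2), ('S', '<missing>')]
```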
By default, PROC SURVEYPHREG excludes an observation from the likelihood estimation and all associated analyses if the observation has a missing value for any of the variables in the MODEL statement, unless you specify the MISSING or NOMCAR option in the PROC SURVEYPHREG statement. When the procedure excludes observations with missing values from analyses, it displays the total frequency of observations used in the "NObs" table.
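Under the default rule, the count of observations used is the number of complete cases across the MODEL statement variables. This Python sketch computes read and used counts in the spirit of the "NObs" table; the variable names `time`, `status`, and `age` are hypothetical, and `None` marks a missing value.

```python
def nobs_summary(data, model_vars):
    """Count observations read and used under the default (complete-case) rule.

    An observation is used only if every MODEL statement variable listed
    in model_vars is nonmissing (not None).
    """
    used = [o for o in data if all(o.get(v) is not None for v in model_vars)]
    return {"read": len(data), "used": len(used)}


data = [
    {"time": 5, "status": 1, "age": 40},
    {"time": 3, "status": 0, "age": None},   # missing predictor
    {"time": None, "status": 1, "age": 55},  # missing response time
]
print(nobs_summary(data, ["time", "status", "age"]))  # {'read': 3, 'used': 1}
```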
If you specify the MISSING option, the procedure treats missing values as a valid (nonmissing) level for each categorical analysis variable.
If you specify the NOMCAR option for Taylor series variance estimation, the procedure includes observations with missing values of analysis variables in the variance computations.
When you specify the NOMCAR option, PROC SURVEYPHREG computes variance estimates by treating the observations that have nonmissing values for all variables in the regression model as a domain (subpopulation), where the entire population includes both the nonmissing and missing domains. This differs from the default behavior, in which an observation that has a missing value for the dependent variable or for any variable used in the independent effects is excluded from the analysis.
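To see why the domain approach matters, the following Python sketch uses a weighted total instead of the Cox partial likelihood that PROC SURVEYPHREG actually fits, so treat it as an analogy only. It assumes hypothetical keys `stratum`, `cluster`, `weight`, and `y`, applies the textbook with-replacement stratified cluster variance formula, and skips single-cluster strata for simplicity. Observations with a missing `y` stay in the design and contribute zero, so their clusters still count toward the number of clusters in each stratum.

```python
from collections import defaultdict


def domain_total_variance(data, var, drop_missing=False):
    """Taylor (linearization) variance of a weighted domain total.

    Default: observations with missing `var` remain in the design with
    contribution 0 (domain indicator d = 0), as in a domain analysis.
    drop_missing=True removes them first, which can empty clusters.
    Formula: V = sum_h n_h / (n_h - 1) * sum_i (z_hi - zbar_h)^2,
    where z_hi is the weighted total of y * d in cluster i of stratum h.
    """
    if drop_missing:
        data = [o for o in data if o[var] is not None]
    z = defaultdict(lambda: defaultdict(float))
    for o in data:
        y = o[var]
        contrib = o["weight"] * y if y is not None else 0.0
        z[o["stratum"]][o["cluster"]] += contrib  # creates zero-valued clusters too
    v = 0.0
    for clusters in z.values():
        n_h = len(clusters)
        if n_h < 2:
            continue  # single-cluster strata are skipped in this sketch
        zbar = sum(clusters.values()) / n_h
        v += n_h / (n_h - 1) * sum((zi - zbar) ** 2 for zi in clusters.values())
    return v


sample = [
    {"stratum": 1, "cluster": "a", "weight": 1.0, "y": 2.0},
    {"stratum": 1, "cluster": "b", "weight": 1.0, "y": None},
    {"stratum": 1, "cluster": "c", "weight": 1.0, "y": 4.0},
]
print(domain_total_variance(sample, "y"))                     # 12.0
print(domain_total_variance(sample, "y", drop_missing=True))  # 4.0
```

Cluster b contains only a missing value. Keeping it as a zero contribution gives a variance of 12, whereas dropping it leaves two clusters and gives 4, which shows how discarding incomplete observations changes the design structure that the variance estimator sees.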
Note that the NOMCAR option has no effect on categorical predictors when you specify the MISSING option, which treats missing values as a valid nonmissing level. The NOMCAR option does not affect the inclusion of observations with missing values of the WEIGHT, FREQ, CLUSTER, STRATA, or DOMAIN variables. Observations with missing values of the WEIGHT and FREQ variables are always excluded from the analysis. Observations with missing values of the CLUSTER, DOMAIN, or STRATA variables are excluded unless you specify the MISSING option.
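The interplay of the MISSING and NOMCAR options can be summarized as a decision function. The following Python sketch returns whether one observation is used in estimation and whether it is used in variance computation. All parameter and key names are hypothetical, `None` marks a missing value, and the FREQ variable is omitted for brevity (it would be handled like WEIGHT).

```python
def inclusion(obs, *, weight="weight", design_vars=(), class_vars=(),
              other_model_vars=(), missing=False, nomcar=False,
              varmethod="TAYLOR"):
    """Return (in_estimation, in_variance) for one observation.

    design_vars      - CLUSTER, STRATA, and DOMAIN variable names
    class_vars       - categorical (CLASS) model predictors
    other_model_vars - time, censoring, and continuous predictor names
    """
    w = obs.get(weight)
    if w is None or w <= 0:
        return (False, False)  # always excluded; no option changes this
    if not missing and any(obs.get(v) is None for v in design_vars):
        return (False, False)  # NOMCAR does not rescue missing design values
    model_ok = all(obs.get(v) is not None for v in other_model_vars)
    if not missing:
        # Without MISSING, a missing CLASS value is also an incomplete case
        model_ok = model_ok and all(obs.get(v) is not None for v in class_vars)
    if model_ok:
        return (True, True)
    # Incomplete model data: kept for variance only under NOMCAR with Taylor
    return (False, nomcar and varmethod.upper() == "TAYLOR")


base = {"weight": 1.0, "stratum": "N", "time": 5, "status": 1,
        "sex": "F", "age": 40}
kw = dict(design_vars=("stratum",), class_vars=("sex",),
          other_model_vars=("time", "status", "age"))
print(inclusion({**base, "age": None}, nomcar=True, **kw))  # (False, True)
```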
The NOMCAR option applies only to Taylor series variance estimation. The replication methods, which you request with the VARMETHOD=BRR and VARMETHOD=JACKKNIFE options, do not use the NOMCAR option.
PROC SURVEYPHREG computes degrees of freedom for use in confidence limits and F statistics. The degrees of freedom computation depends on the variance estimation method that you request, and missing values can affect it. See the section Degrees of Freedom for details.
The degrees of freedom can depend on the number of clusters, the number of strata, and the number of observations. For Taylor series variance estimation, these numbers are based on the observations included in the analysis; they do not count observations that are excluded from the analysis because of missing values. If all observations in a stratum are excluded from the analysis because of missing values, then that stratum is called an empty stratum. Empty strata are not counted in the total number of strata for the analysis. Similarly, empty clusters and missing observations are not included in the total counts of clusters and observations that are used to compute the degrees of freedom for the analysis.
If you specify the MISSING option, missing values are treated as valid nonmissing levels and are included in computing degrees of freedom. If you specify the NOMCAR option for Taylor series variance estimation, observations with missing values for variables in the regression model are included in computing degrees of freedom.
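The following Python sketch shows how empty strata and clusters drop out of the counts. It assumes the common design-based rule for Taylor series estimation, degrees of freedom equal to the number of clusters minus the number of strata; the Degrees of Freedom section gives the procedure's full rule, including designs without clusters or strata. Each row is a hypothetical `(stratum, cluster, included)` tuple, where `included` reflects whichever exclusion rules apply (default, MISSING, or NOMCAR).

```python
def taylor_df(obs_rows):
    """Degrees of freedom as (number of clusters) - (number of strata).

    obs_rows: (stratum, cluster, included) tuples, one per observation.
    Only included observations count, so a stratum or cluster whose
    observations are all excluded is empty and is not counted.
    """
    strata = {s for s, c, inc in obs_rows if inc}
    clusters = {(s, c) for s, c, inc in obs_rows if inc}  # clusters nested in strata
    return len(clusters) - len(strata)


rows = [
    ("A", 1, True), ("A", 2, True),
    ("B", 3, False), ("B", 4, False),  # stratum B becomes empty
    ("C", 5, True), ("C", 6, True), ("C", 7, False),  # cluster 7 becomes empty
]
print(taylor_df(rows))                                # 4 clusters - 2 strata = 2
print(taylor_df([(s, c, True) for s, c, _ in rows]))  # 7 clusters - 3 strata = 4
```

The second call corresponds to a case where every observation is retained, for example because NOMCAR keeps observations whose only missing values are in the regression model variables.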
For BRR or jackknife variance estimation, by default PROC SURVEYPHREG computes the degrees of freedom by using all valid observations in the input data set. A valid observation is an observation that has a positive value of the WEIGHT variable and, unless you specify the MISSING option, nonmissing values of the STRATA and CLUSTER variables.
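For replication methods, the count of valid observations ignores missing values in the model variables. This Python sketch counts valid observations under that definition; the keys `weight`, `stratum`, `cluster`, and `age` are hypothetical, and `None` marks a missing value.

```python
def replication_valid_count(data, missing=False):
    """Count valid observations for BRR or jackknife degrees of freedom.

    Valid means a positive WEIGHT value and, unless missing=True (the
    MISSING option), nonmissing STRATA and CLUSTER values. Missing values
    of model variables do not affect validity here.
    """
    def valid(o):
        w = o.get("weight")
        if w is None or w <= 0:
            return False
        if missing:
            return True
        return o.get("stratum") is not None and o.get("cluster") is not None

    return sum(1 for o in data if valid(o))


data = [
    {"weight": 1.0, "stratum": "A", "cluster": 1, "age": None},   # valid
    {"weight": 2.0, "stratum": None, "cluster": 2, "age": 30},    # missing stratum
    {"weight": 0.0, "stratum": "B", "cluster": 3, "age": 41},     # nonpositive weight
    {"weight": 1.5, "stratum": "B", "cluster": None, "age": 50},  # missing cluster
]
print(replication_valid_count(data))                # 1
print(replication_valid_count(data, missing=True))  # 3
```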