If an observation has a missing or nonpositive value for the WEIGHT variable, PROC SURVEYFREQ excludes that observation from the analysis.
If you provide replicate weights by specifying a REPWEIGHTS statement, all REPWEIGHTS variable values must be nonmissing. Similarly, if you provide jackknife coefficients by specifying the JKCOEFS= option in the REPWEIGHTS statement, all values of the JKCoefficient variable must be nonmissing. If any replicate weight or jackknife coefficient is missing, PROC SURVEYFREQ does not perform the analysis.
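The two rules above behave differently. A bad WEIGHT value only removes that one observation. A missing replicate weight or jackknife coefficient stops the whole analysis. The following Python sketch models that contrast. It is not SAS code and not the procedure's implementation. The function names, the dictionary field names (`w`, `r1`, and so on), and the use of `None`/NaN to stand in for a SAS missing value are all hypothetical.

```python
import math

def is_missing(value):
    """Hypothetical stand-in for a SAS missing value: None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def screen_weights(rows, weight="w", repweights=(), jkcoefs=None):
    """Sketch of the documented WEIGHT / REPWEIGHTS / JKCOEFS= missing-value rules."""
    # Any missing replicate weight stops the whole analysis.
    for row in rows:
        if any(is_missing(row[r]) for r in repweights):
            raise ValueError("missing replicate weight: analysis not performed")
    # Any missing jackknife coefficient also stops the whole analysis.
    if jkcoefs is not None and any(is_missing(c) for c in jkcoefs):
        raise ValueError("missing jackknife coefficient: analysis not performed")
    # A missing or nonpositive WEIGHT value only drops that observation.
    return [row for row in rows if not is_missing(row[weight]) and row[weight] > 0]

rows = [{"w": 2.0}, {"w": 0.0}, {"w": None}, {"w": float("nan")}, {"w": 1.5}]
print([r["w"] for r in screen_weights(rows)])  # [2.0, 1.5]
```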
If an observation has a missing value for any STRATA or CLUSTER variable, PROC SURVEYFREQ excludes that observation from the analysis unless you specify the MISSING option in the PROC SURVEYFREQ statement. If you specify the MISSING option, PROC SURVEYFREQ treats missing values as a valid (nonmissing) category for all categorical variables, which include STRATA, CLUSTER, and TABLES variables.
If an observation has a missing value for any variable in the TABLES request, PROC SURVEYFREQ excludes that observation from the crosstabulation table (and all associated analyses) unless you specify the MISSING or NOMCAR option in the PROC SURVEYFREQ statement. When the procedure excludes observations with missing values from a table, it displays the total frequency of missing observations below the table.
If you specify the MISSING option, PROC SURVEYFREQ treats missing values as a valid (nonmissing) level for each TABLES variable. The procedure displays these levels in the crosstabulation table and includes them in the computation of totals, percentages, and all other table statistics.
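The MISSING option changes the denominator of the table percentages. The following Python sketch computes weighted one-way percentages both ways. By default, missing values are left out and counted as the frequency missing. With the MISSING option, they form a level of their own. This is an illustrative model only. The function name, the field names, and the `"(missing)"` label are hypothetical, and SAS displays missing levels in its own format.

```python
from collections import defaultdict

def weighted_percentages(rows, var, weight="w", missing_option=False):
    """Sketch: weighted percentages for one TABLES variable, with or without MISSING."""
    totals = defaultdict(float)
    n_missing = 0
    for row in rows:
        level = row[var]
        if level is None:
            if not missing_option:
                n_missing += 1  # excluded from the table; reported as frequency missing
                continue
            level = "(missing)"  # MISSING: treated as a valid level
        totals[level] += row[weight]
    grand_total = sum(totals.values())
    percents = {lvl: 100.0 * t / grand_total for lvl, t in totals.items()}
    return percents, n_missing

rows = [{"g": "A", "w": 1.0}, {"g": "B", "w": 1.0}, {"g": None, "w": 2.0}]
print(weighted_percentages(rows, "g"))                       # A and B at 50%, 1 missing
print(weighted_percentages(rows, "g", missing_option=True))  # A 25%, B 25%, missing 50%
```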
If you specify the NOMCAR option in the PROC SURVEYFREQ statement for Taylor series variance estimation, the procedure includes observations with missing values of TABLES variables in the variance computations. The NOMCAR option does not display missing levels in the crosstabulation table or compute percentages and totals for missing levels.
The NOMCAR (not missing completely at random) option in the PROC SURVEYFREQ statement includes observations with missing values of TABLES variables in the Taylor series variance computations. By default, an observation is excluded entirely from the analysis of a table if it has a missing value for any variable in the current TABLES request. This default treatment is based on the assumption that the values are missing completely at random (MCAR), which implies that the analysis results should not differ substantially between the missing and nonmissing groups. For more information, see the section Analysis Considerations.
When you specify the NOMCAR option, PROC SURVEYFREQ computes variance estimates by analyzing the nonmissing values as a domain (subpopulation), where the entire population includes both nonmissing and missing domains.
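The domain approach matters most when missing values empty out whole clusters. The following Python sketch shows this for the variance of a weighted total. It uses the standard stratified cluster (with-replacement, no finite population correction) estimator: the sum over strata h of n_h/(n_h − 1) times the sum over clusters of (e_hi − ē_h)², where e_hi is the weighted cluster total. Under the default, observations with missing values are dropped, so their clusters disappear. Under NOMCAR, those observations stay in the design with a zero contribution. This is a simplified illustration for totals only, not PROC SURVEYFREQ's implementation. The function name and field names are hypothetical. Skipping single-cluster strata is an assumption of the sketch.

```python
from collections import defaultdict

def taylor_var_total(rows, y="y", weight="w", strata="h", cluster="c", nomcar=False):
    """Sketch: variance of the weighted total of y, default (MCAR) versus NOMCAR."""
    # e[h][c] accumulates w * y over the observations in cluster c of stratum h.
    e = defaultdict(lambda: defaultdict(float))
    for row in rows:
        if row[y] is None:
            if not nomcar:
                continue  # default: observation dropped, its cluster may vanish
            contribution = 0.0  # NOMCAR: cluster kept in the design, contributes zero
        else:
            contribution = row[weight] * row[y]
        e[row[strata]][row[cluster]] += contribution
    variance = 0.0
    for clusters in e.values():
        n_h = len(clusters)
        if n_h < 2:
            continue  # assumption: a single-cluster stratum adds no variance here
        mean = sum(clusters.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean) ** 2 for t in clusters.values())
    return variance

rows = [
    {"h": 1, "c": 1, "w": 1.0, "y": 1.0},
    {"h": 1, "c": 2, "w": 1.0, "y": 0.0},
    {"h": 2, "c": 3, "w": 1.0, "y": 1.0},
    {"h": 2, "c": 4, "w": 1.0, "y": None},  # missing TABLES value
]
print(taylor_var_total(rows), taylor_var_total(rows, nomcar=True))  # 1.0 2.0
```

In this toy data set, dropping the missing observation leaves stratum 2 with a single cluster, so that stratum contributes nothing. The domain approach keeps both clusters of stratum 2 and picks up their variability.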
The NOMCAR option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level. The NOMCAR option does not affect the inclusion of observations with missing values of the WEIGHT, CLUSTER, or STRATA variables. Observations with missing values of the WEIGHT variable are always excluded from the analysis. Observations with missing values of the CLUSTER or STRATA variables are excluded unless you specify the MISSING option.
The NOMCAR option applies only to Taylor series variance estimation (VARMETHOD=TAYLOR). The replication methods, which you request by specifying VARMETHOD=BRR or VARMETHOD=JACKKNIFE, ignore the NOMCAR option.
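Taken together, these rules decide how a single observation enters one table request. The following Python sketch encodes them as a decision function that returns `"excluded"`, `"variance_only"` (NOMCAR with Taylor series), or `"included"`. It models the rules stated in this section and is not SAS code. The function name, return labels, and record layout are hypothetical, and `None` stands in for a SAS missing value.

```python
def table_role(obs, missing=False, nomcar=False, varmethod="TAYLOR"):
    """Sketch: how one observation is treated in the analysis of one table."""
    w = obs["weight"]
    if w is None or w <= 0:
        return "excluded"  # missing or nonpositive WEIGHT: always excluded
    if (obs["strata"] is None or obs["cluster"] is None) and not missing:
        return "excluded"  # design variables: only MISSING keeps these; NOMCAR does not
    if any(v is None for v in obs["tables"]) and not missing:
        if nomcar and varmethod.upper() == "TAYLOR":
            return "variance_only"  # used for variance, not shown as a table level
        return "excluded"  # default, or NOMCAR ignored under BRR/jackknife
    return "included"  # no missing values, or MISSING makes them a valid level

obs = {"weight": 1.0, "strata": "A", "cluster": 1, "tables": [None]}
print(table_role(obs), table_role(obs, nomcar=True), table_role(obs, missing=True))
# excluded variance_only included
```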
PROC SURVEYFREQ computes the degrees of freedom to obtain the t-percentile for confidence limits for proportions, totals, and other statistics. The procedure also uses the degrees of freedom for the F statistics in the Rao-Scott and Wald chi-square tests. The degrees of freedom computation depends on the sample design and the variance estimation method. For more information, see the section Degrees of Freedom. Missing values can affect the degrees of freedom computation.
The degrees of freedom can depend on the number of clusters, the number of strata, and the number of observations. For Taylor series variance estimation, these numbers are based on the observations included in the analysis of the individual table. These numbers do not count observations that are excluded from the table because of missing values. If all observations in a stratum are excluded from the analysis of a table because of missing values, that stratum is called an empty stratum. Empty strata are not counted in the total number of strata for the table. Similarly, empty clusters and missing observations are not included in the total counts of clusters and observations that are used to compute the degrees of freedom for the analysis.
If you specify the MISSING option, missing values are treated as valid nonmissing levels and are included in computing degrees of freedom. If you specify the NOMCAR option for Taylor series variance estimation, observations with missing values of the TABLES variables are included in computing degrees of freedom.
For BRR or jackknife variance estimation, by default PROC SURVEYFREQ computes the degrees of freedom by using all valid observations in the input data set. A valid observation has a positive value of the WEIGHT variable and, unless you specify the MISSING option, nonmissing values of the STRATA and CLUSTER variables. For information about valid observations, see the section Data Summary Table.
If you specify the DFADJ method-option for VARMETHOD=BRR or VARMETHOD=JACKKNIFE, the procedure computes the degrees of freedom based on the nonmissing observations included in the individual table request. This count excludes any empty strata or clusters that result when observations with missing values of the TABLES variables are removed from the analysis for that table.
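The following Python sketch shows which observations feed the degrees-of-freedom count under each combination of options. It assumes that the degrees of freedom equal the number of nonempty clusters minus the number of nonempty strata, with clusters nested within strata, and it applies that count to every method for illustration only. See the section Degrees of Freedom for the exact formula for each variance method. The input rows are assumed to be valid observations already. The function name and field names are hypothetical.

```python
def table_df(rows, table_vars, varmethod="TAYLOR", nomcar=False, dfadj=False,
             strata="h", cluster="c"):
    """Sketch: clusters minus strata, counted over the observations each rule uses."""
    taylor = varmethod.upper() == "TAYLOR"
    if (taylor and not nomcar) or (not taylor and dfadj):
        # Taylor default, or BRR/jackknife with DFADJ: only this table's nonmissing rows.
        used = [r for r in rows if all(r[v] is not None for v in table_vars)]
    else:
        # Taylor with NOMCAR, or BRR/jackknife default: all valid observations.
        used = rows
    n_strata = len({r[strata] for r in used})
    n_clusters = len({(r[strata], r[cluster]) for r in used})  # clusters nested in strata
    return n_clusters - n_strata

rows = [
    {"h": 1, "c": 1, "x": "A"},
    {"h": 1, "c": 2, "x": "B"},
    {"h": 1, "c": 3, "x": None},
    {"h": 2, "c": 1, "x": None},  # stratum 2 is empty for table x
    {"h": 2, "c": 2, "x": None},
]
print(table_df(rows, ["x"]), table_df(rows, ["x"], nomcar=True))  # 1 3
```

In this example, stratum 2 and three clusters are empty for table `x`. The Taylor default and DFADJ counts therefore drop them, while the NOMCAR and replication-default counts keep them.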
For each table request, PROC SURVEYFREQ produces a nondisplayed ODS table, "Table Summary," which contains the number of (nonmissing) observations, strata, and clusters that are included in the analysis of the individual table. If missing observations, empty strata, or empty clusters are excluded from the analysis, the "Table Summary" data set also reports them. If you request any confidence limits or chi-square tests for the table, which require degrees of freedom, the "Table Summary" data set also provides the degrees of freedom.
Because of missing values, the number of observations used for an individual table analysis can differ from the number of valid observations in the input data set, which is reported in the "Data Summary" table. The numbers of clusters and strata can differ in the same way. See Example 97.3 for more information about the "Table Summary" output data set.
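The difference between the two summaries comes down to counting over two sets of observations. The following Python sketch derives the missing-observation, empty-stratum, and empty-cluster counts by comparing counts over all valid observations ("Data Summary") with counts over the observations that enter one table ("Table Summary") under the default treatment. The dictionary keys are hypothetical and are not the column names of the actual ODS tables. Clusters are assumed to be nested within strata.

```python
def _counts(rows, strata, cluster):
    """Observations, nonempty strata, and nonempty (stratum, cluster) pairs."""
    return {
        "obs": len(rows),
        "strata": len({r[strata] for r in rows}),
        "clusters": len({(r[strata], r[cluster]) for r in rows}),
    }

def summaries(rows, table_vars, strata="h", cluster="c"):
    """Sketch: Data Summary counts versus Table Summary counts for one table."""
    data = _counts(rows, strata, cluster)  # all valid observations
    used = [r for r in rows if all(r[v] is not None for v in table_vars)]
    table = _counts(used, strata, cluster)  # observations in this table only
    table["missing_obs"] = data["obs"] - table["obs"]
    table["empty_strata"] = data["strata"] - table["strata"]
    table["empty_clusters"] = data["clusters"] - table["clusters"]
    return data, table

rows = [
    {"h": 1, "c": 1, "x": "A"}, {"h": 1, "c": 2, "x": "B"}, {"h": 1, "c": 3, "x": None},
    {"h": 2, "c": 1, "x": None}, {"h": 2, "c": 2, "x": None},
]
print(summaries(rows, ["x"]))
```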
If you specify the NOMCAR option for Taylor series variance estimation, the "Table Summary" data set reflects all observations used for variance estimation, which includes those observations with missing values of the TABLES variables.
Missing values in survey data, whether from nonresponse or any other cause, can compromise the quality of your survey results. An observation without missing values is called a complete respondent, and an observation with missing values is called an incomplete respondent. If the complete respondents differ from the incomplete respondents with regard to a survey effect or outcome, then survey estimates will be biased and will not accurately represent the survey population. A variety of techniques in sample design and survey operations can reduce nonresponse. After data collection is complete, you can use imputation to replace missing values with acceptable values, and you can use sampling weight adjustments to compensate for nonresponse. You should complete this data preparation and adjustment before you analyze your data with PROC SURVEYFREQ. For more information, see Cochran (1977), Kalton and Kasprzyk (1986), and Brick and Kalton (1996).
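One common form of sampling weight adjustment is the weighting-class nonresponse adjustment. Within each adjustment class, each respondent's weight is multiplied by the ratio of the total weight of all sampled units in the class to the total weight of the respondents. This preserves each class's weighted total. The following Python sketch illustrates that technique as a data-preparation step you would run before the analysis. PROC SURVEYFREQ does not perform it, and the function name and field names (`cls`, `w`, `resp`) are hypothetical.

```python
from collections import defaultdict

def weighting_class_adjust(rows, cls="cls", weight="w", responded="resp"):
    """Sketch: weighting-class nonresponse adjustment; returns respondents only."""
    total = defaultdict(float)       # weight of all sampled units per class
    resp_total = defaultdict(float)  # weight of respondents per class
    for r in rows:
        total[r[cls]] += r[weight]
        if r[responded]:
            resp_total[r[cls]] += r[weight]
    # Note: a class with no respondents loses its weight here; in practice such
    # classes are usually collapsed with neighboring classes first.
    return [{**r, weight: r[weight] * total[r[cls]] / resp_total[r[cls]]}
            for r in rows if r[responded]]

rows = [
    {"cls": "A", "w": 2.0, "resp": True}, {"cls": "A", "w": 2.0, "resp": False},
    {"cls": "B", "w": 1.0, "resp": True}, {"cls": "B", "w": 1.0, "resp": True},
    {"cls": "B", "w": 2.0, "resp": False},
]
print([r["w"] for r in weighting_class_adjust(rows)])  # [4.0, 2.0, 2.0]
```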