Usage Note 22600: Distinction between the WEIGHT and FREQ statements
The FREQ statement effectively treats each observation in the input data set as though it occurred the number of times indicated by its value of the FREQ variable. When fitting a model, you can use the FREQ statement to enter summarized data containing only the unique, observed combinations of independent variable settings along with a variable indicating the number of times each occurred in the raw data. For instance, the FREQ statement can be used when the observations in your data are the cell counts of a contingency table as opposed to observations for each subject. Note that because the FREQ variable is intended to indicate counts, the values are required to be integers. Noninteger values are truncated to integers.
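As a minimal sketch of the contingency-table case, the following step fits a logistic model to a 2x2 table entered as four cell counts. The data set name CELLS, the variable names, and the counts are all hypothetical and chosen only for illustration.

```sas
/* Hypothetical 2x2 table: one row per cell, COUNT holds the cell frequency */
data cells;
   input exposed response count;
   datalines;
0 0 40
0 1 10
1 0 25
1 1 25
;
run;

proc logistic data=cells;
   freq count;                           /* each row stands for COUNT subjects */
   model response(event='1') = exposed;
run;
```

The four rows are analyzed as though the data set contained one observation per subject, 100 in total for these made-up counts.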
Use the WEIGHT statement when you want the observations to contribute unequally in the fitting of the model. For a procedure using the maximum likelihood (ML) estimation method, the weights multiply the contributions of observations to the likelihood function that is maximized by the procedure. In logistic models, weights are often used to adjust for oversampling (sampling with respect to the response levels). While weighting can also be used to adjust the parameter estimates for sampling of populations defined by the predictor variables, it does not assure proper estimation of their standard errors. Survey sampling methods are required for proper variance estimation. These methods are only available in the SURVEY procedures such as the SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures.
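The oversampling adjustment can be sketched as follows. This example assumes, purely for illustration, that events make up 5% of the population but 50% of the sample, so each response level is weighted by the ratio of its population proportion to its sample proportion. The data set, variable names, counts, and the 5% rate are all hypothetical.

```sas
/* Hypothetical summarized sample: 60 events and 60 nonevents (50% each) */
data oversampled;
   input x y count;
   /* weight = assumed population proportion / sample proportion */
   if y = 1 then w = 0.05 / 0.50;   /* events were oversampled   */
   else          w = 0.95 / 0.50;   /* nonevents were undersampled */
   datalines;
1 0 30
1 1 10
2 0 20
2 1 20
3 0 10
3 1 30
;
run;

proc logistic data=oversampled;
   freq count;                      /* COUNT gives the number of subjects per row */
   weight w;                        /* W rescales each subject's likelihood contribution */
   model y(event='1') = x;
run;
```

With these made-up values the weights sum to the sample size (60 x 0.1 + 60 x 1.9 = 120). As noted above, this kind of weighting adjusts the parameter estimates but is not a substitute for the SURVEY procedures when proper variance estimation is needed.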
Note that the WEIGHT statement does not change the sample size (N) that is used in formulas for some statistics. The FREQ statement does affect N as noted above. Unlike the FREQ variable whose values must be integers, weights can be fractional. Observations with zero or negative FREQ or WEIGHT values are ignored by modeling procedures.
In many cases, using a given integer-valued variable as a WEIGHT variable or as a FREQ variable results in the same parameter estimates and standard errors (this is not true for GEE models fit in PROC GENMOD using the REPEATED statement). However, other statistics such as the Schwarz criterion (SC or BIC) might differ if N is involved in their computation. For example, in the Hosmer-Lemeshow test that is produced by the LACKFIT option in PROC LOGISTIC, weights affect the statistic only through their effect on the model parameter estimates, whereas frequencies affect the observed and total counts in the statistic's formula. Consequently, a variable that is used as a FREQ variable can result in a dramatically different LACKFIT statistic than if it is used as a WEIGHT variable.
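One way to see this difference is to fit the same model twice, using the same integer-valued variable first as a FREQ variable and then as a WEIGHT variable, and compare the SC and LACKFIT results from the two runs. The following is a sketch with hypothetical data (the data set DOSES and its values are invented for illustration).

```sas
/* Hypothetical summarized data: N_SUBJ is an integer count per row */
data doses;
   input dose response n_subj;
   datalines;
1 0 18
1 1  2
2 0 15
2 1  5
3 0 12
3 1  8
4 0  9
4 1 11
5 0  5
5 1 15
6 0  2
6 1 18
;
run;

/* As a FREQ variable: N is the sum of the counts (120) */
proc logistic data=doses;
   freq n_subj;
   model response(event='1') = dose / lackfit;
run;

/* As a WEIGHT variable: N is the number of rows (12) */
proc logistic data=doses;
   weight n_subj;
   model response(event='1') = dose / lackfit;
run;
```

The parameter estimates and standard errors should agree between the two runs, while statistics that involve N can differ.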
In some modeling procedures such as PROC LOGISTIC and PROC GENMOD, arbitrarily inflating the values of a FREQ or WEIGHT variable (for instance, multiplying them by a constant) drives all effects toward significance. For this reason, weights are typically normalized so that they sum to the actual sample size. You can request normalizing of your weights by using the NORMALIZE option in the WEIGHT statement in the LOGISTIC or PHREG procedure.
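A minimal sketch of the NORMALIZE option follows. The data set, variable names, weight values, and the inflation factor of 100 are hypothetical. W_BIG is an arbitrarily inflated copy of W, and NORMALIZE rescales the weights so that they sum to the actual sample size before the model is fit.

```sas
/* Hypothetical data: W is a fractional weight, W_BIG inflates it by 100 */
data doses_w;
   input dose response w;
   w_big = 100 * w;
   datalines;
1 0 1.8
1 1 0.2
2 0 1.5
2 1 0.5
3 0 1.2
3 1 0.8
4 0 0.9
4 1 1.1
5 0 0.5
5 1 1.5
6 0 0.2
6 1 1.8
;
run;

proc logistic data=doses_w;
   weight w_big / normalize;        /* weights are rescaled to sum to the sample size */
   model response(event='1') = dose;
run;
```

Because normalization removes any constant multiplier, replacing W_BIG with W in the WEIGHT statement (keeping NORMALIZE) should give the same fit.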
Type: | Usage Note |
Priority: | low |
Topic: | Analytics ==> Regression; SAS Reference ==> Procedures ==> SURVEYLOGISTIC; Analytics ==> Longitudinal Analysis; SAS Reference ==> Procedures ==> LOGISTIC; SAS Reference ==> Procedures ==> GENMOD; SAS Reference ==> Procedures ==> PHREG; SAS Reference ==> Procedures ==> PROBIT; Analytics ==> Categorical Data Analysis; Analytics ==> Analysis of Variance; SAS Reference ==> Procedures ==> ANOVA; SAS Reference ==> Procedures ==> GLM; SAS Reference ==> Procedures ==> REG; Analytics ==> Survey Sampling and Analysis; Analytics ==> Survival Analysis |
Date Modified: | 2010-09-10 11:24:29 |
Date Created: | 2002-12-16 10:56:39 |