Generic Collector Appendix 1: Algorithm Used by GENERATE SOURCE
The control statement GENERATE SOURCE uses model data as the basis for the table and variable definitions that it generates. As the table and variable statements are constructed, certain assumptions are made by the rules embedded in the GENERATE SOURCE logic. By understanding the algorithm used by GENERATE SOURCE and building your model data set to take advantage of the algorithm, you can decrease the amount of work that you need to do to customize the table and variable characteristics that are assigned by default.
The algorithm determines the specific interpretation type for each variable. This interpretation type determines how a variable is to be displayed, what summary statistics to collect, and how these summary statistics are to be calculated. For more information about interpretation types, see Shared Appendix 6: Characteristics of Variables.
The interpretation type is determined by a cascading series of IF...THEN...ELSE tests that runs until an assignment is made. Because the first test that matches makes the assignment, it is important to pay attention to the order in which the heuristics operate, as follows:
First, the algorithm handles the simple case of character-type data, as follows:
   if the variable's SAS display format contains the string 'HEX'
      then set the interpretation type to HEXFLAGS
      else set the interpretation type to STRING
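The character-data rule above can be sketched in Python as follows. This is only an illustration of the stated rule, not the GENERATE SOURCE implementation. The function name `character_interpretation` is hypothetical, and the case-insensitive comparison is an assumption that the source does not confirm.

```python
def character_interpretation(display_format: str) -> str:
    """Return the interpretation type for a character-type variable.

    Assumption: the search for 'HEX' ignores case; the source does not say.
    """
    if "HEX" in (display_format or "").upper():
        return "HEXFLAGS"
    return "STRING"


print(character_interpretation("$HEX8."))    # HEXFLAGS
print(character_interpretation("$CHAR20."))  # STRING
```

A variable with no display format falls through to STRING in this sketch, because an empty format cannot contain 'HEX'.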
For numeric-type data, the algorithm is considerably more complicated. It is very important to remember that the IF...THEN...ELSE tests start at the beginning of these lists and proceed to the end. Consequently, a variable that matches a test late in the lists has already failed every test above it.
First, the variable's SAS display format is examined to try to set the interpretation type, as follows:
| IF the variable format contains the string | THEN set the interpretation type to |
If this does not yield a match, the algorithm then examines the variable's label. Because it matters whether a string occurs at the beginning or at the end of a word, some search strings include a blank (' '). The complete algorithm for this stage is as follows (rows that share a right-column value belong to the same test, and each new right-column value begins a new ELSE IF):
| IF the variable label contains the word or phrase | THEN set the interpretation type to |
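To illustrate why a blank in a search string matters, here is a minimal Python sketch of blank-delimited label matching. The helper `label_contains` and the pattern `ABC` are hypothetical placeholders, not entries from the actual rule list. The sketch also assumes that the label is padded with a blank on each side and that case is ignored; the source confirms neither detail.

```python
def label_contains(label: str, pattern: str) -> bool:
    """Test whether a blank-delimited pattern occurs in a label.

    Assumptions: the label is padded with one blank on each side so that
    a pattern such as ' ABC' can match the first word, runs of blanks are
    collapsed, and the comparison ignores case.
    """
    padded = " " + " ".join(label.upper().split()) + " "
    return pattern.upper() in padded


# ' ABC' matches only where a word begins with ABC.
print(label_contains("abc usage", " ABC"))    # True
print(label_contains("total xabc", " ABC"))   # False
# 'ABC ' matches only where a word ends with ABC.
print(label_contains("total xabc", "ABC "))   # True
# ' ABC ' matches only the whole word.
print(label_contains("abcd count", " ABC "))  # False
```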
If the algorithm has still not assigned an interpretation type, then the interpretation type is set to GAUGE.
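The overall cascade for numeric-type data (format tests first, then label tests, then the GAUGE default) can be sketched in Python as shown below. This is an illustration under stated assumptions rather than the actual GENERATE SOURCE logic. The function names are hypothetical, and the rule lists are passed in as parameters; the entries used in the example (`FMTA`, ` FOO`, `TYPE_1`, and so on) are placeholders, not real search strings or interpretation types.

```python
from typing import Iterable, Optional, Tuple

Rule = Tuple[str, str]  # (search string, interpretation type)


def first_match(text: str, rules: Iterable[Rule]) -> Optional[str]:
    """Return the type of the first rule whose search string occurs in text."""
    for pattern, interp_type in rules:
        if pattern in text:
            return interp_type
    return None


def numeric_interpretation(display_format, label, format_rules, label_rules):
    """Apply format rules, then label rules, then fall back to GAUGE.

    Assumptions: rule search strings are uppercase, matching ignores case,
    and the label is padded with a blank on each side.
    """
    fmt = (display_format or "").upper()
    padded_label = " " + " ".join((label or "").upper().split()) + " "
    return (
        first_match(fmt, format_rules)
        or first_match(padded_label, label_rules)
        or "GAUGE"
    )


# Hypothetical rule lists, in test order (earlier rules win).
format_rules = [("FMTA", "TYPE_FROM_FORMAT")]
label_rules = [(" FOO", "TYPE_1"), ("BAR ", "TYPE_2")]

print(numeric_interpretation("FMTA8.", "foo bar", format_rules, label_rules))
print(numeric_interpretation("8.2", "foo bar", format_rules, label_rules))
print(numeric_interpretation("8.2", "some bar", format_rules, label_rules))
print(numeric_interpretation("8.2", "other text", format_rules, label_rules))
```

In the second call the label matches both label rules, but only the earlier one is applied, which is why the order of the tests matters. The last call matches nothing and therefore receives GAUGE.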