The DATASOURCE Procedure

Variable Lists

Variable lists used in PROC DATASOURCE statements can consist of any combination of variable names and name range specifications. Items in variable lists can have the following forms:

  • a name, such as PZU.

  • an alphabetic range name1-name2. For example, A-DZZZZZZZ specifies all variables with names starting with A, B, C, or D.

  • a prefix range prefix:. For example, IP: selects all variables with names starting with the letters IP.

  • an order range name1--name2. For example, GLR72--GLRD72 specifies all variables in the input data file between GLR72 and GLRD72 inclusive.

  • a numeric order range name1-NUMERIC-name2. For example, GLR72-NUMERIC-GLRD72 specifies all numeric variables between GLR72 and GLRD72 inclusive.

  • a character order range name1-CHARACTER-name2. For example, GLR72-CHARACTER-GLRD72 specifies all character variables between GLR72 and GLRD72 inclusive.

  • one of the keywords _NUMERIC_, _CHARACTER_, or _ALL_. The keyword _NUMERIC_ specifies all numeric variables, _CHARACTER_ specifies all character variables, and _ALL_ specifies all variables.
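The list forms above can be modeled in a few lines of Python. This is a minimal sketch and not SAS code. It also does not describe how PROC DATASOURCE is implemented. The sketch makes the following assumptions:

  • The file's variables arrive as an ordered list of (name, type) pairs with uppercase names.
  • "Alphabetically" means plain string comparison.
  • Matches come back in file order.
  • A missing order-range endpoint raises ValueError.

The names `select` and `_order_slice` are hypothetical.

```python
from typing import List, Tuple

Var = Tuple[str, str]  # (NAME, TYPE), TYPE is "N" (numeric) or "C" (character); assumed model

def _order_slice(vars_: List[Var], first: str, last: str) -> List[Var]:
    """Inclusive slice in file order; list.index raises ValueError if an endpoint is absent."""
    names = [n for n, _ in vars_]
    i = names.index(first)
    j = names.index(last)
    return vars_[i:j + 1]

def select(item: str, vars_: List[Var]) -> List[str]:
    """Expand one variable-list item against the ordered variables of a file."""
    spec = item.upper()
    if spec == "_ALL_":
        return [n for n, _ in vars_]
    if spec == "_NUMERIC_":
        return [n for n, t in vars_ if t == "N"]
    if spec == "_CHARACTER_":
        return [n for n, t in vars_ if t == "C"]
    # Typed order ranges: name1-NUMERIC-name2 and name1-CHARACTER-name2.
    for keyword, typ in (("-NUMERIC-", "N"), ("-CHARACTER-", "C")):
        if keyword in spec:
            first, last = spec.split(keyword)
            return [n for n, t in _order_slice(vars_, first, last) if t == typ]
    # Order range name1--name2: both endpoints must exist in the file.
    if "--" in spec:
        first, last = spec.split("--")
        return [n for n, _ in _order_slice(vars_, first, last)]
    # Prefix range prefix:
    if spec.endswith(":"):
        prefix = spec[:-1]
        return [n for n, _ in vars_ if n.startswith(prefix)]
    # Alphabetic range name1-name2: endpoints need not exist; plain string comparison.
    if "-" in spec:
        low, high = spec.split("-")
        return [n for n, _ in vars_ if low <= n <= high]
    # A single name.
    return [n for n, _ in vars_ if n == spec]
```

Here are two examples against the hypothetical file used in the checks:

  • select("A-DZZZZZZZ", ...) works even though neither endpoint is a variable.
  • select("GLR72--NOPE", ...) fails, because an order range needs real endpoints.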

To determine the order of series in a data file, run PROC DATASOURCE with the OUTCONT= option and print the output data set. Order and alphabetic range specifications are inclusive: the beginning and ending names of the range are themselves included in the variable list.

For order ranges, the names used to define the range must actually name variables in the input data file. For alphabetic ranges, however, the names used to define the range need not be present in the data file.

Variable specifications are applied to each cross section independently. As a result, an order range can behave differently in PROC DATASOURCE than the same range in a DATA step or a data set option. PROC DATASOURCE knows which variables are defined for which cross sections, whereas the DATA step applies an order range specification to the whole collection of time series variables.

If the ending variable name in an order range specification is not in the current cross section, all variables from the beginning variable through the last variable defined in that cross section are selected. If the beginning variable is not in the current cross section, the order range specification has no effect for that cross section.
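The per-cross-section rule can be sketched the same way. This is an illustrative model, not SAS code, and it rests on three assumptions:

  • Each cross section is represented by its own ordered list of variable names.
  • "No effect" means the range contributes no variables for that section.
  • The helper name `order_range_per_section` and the sample sections are hypothetical.

```python
from typing import Dict, List

def order_range_per_section(first: str, last: str, section_vars: List[str]) -> List[str]:
    """Apply the order range FIRST--LAST to one cross section's ordered variables."""
    if first not in section_vars:
        # Beginning variable absent: the range contributes nothing for this section.
        return []
    start = section_vars.index(first)
    if last not in section_vars:
        # Ending variable absent: run through the last variable in this section.
        return section_vars[start:]
    return section_vars[start:section_vars.index(last) + 1]

# Hypothetical cross sections, each with its own variable order.
sections: Dict[str, List[str]] = {
    "US": ["GLR72", "GLRB72", "GLRD72", "PZU"],
    "CA": ["GLR72", "GLRB72", "PZU"],
    "MX": ["GLRB72", "GLRD72"],
}

selected = {key: order_range_per_section("GLR72", "GLRD72", vs) for key, vs in sections.items()}
print(selected)
```

With these sample sections, the range GLR72--GLRD72 gives a different result in each one:

  • US keeps GLR72 through GLRD72.
  • CA runs from GLR72 to its last variable, PZU, because GLRD72 is missing there.
  • MX gets nothing, because GLR72 is missing there.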

The variable names used in variable list specifications can refer either to series names appearing in the input data file or, if the series names are not recorded in the INFILE= file, to the SAS names assigned internally to the series data fields. In the latter case, the internally defined variable names are listed in Data Elements Reference: DATASOURCE Procedure later in this chapter.

The following are examples of the use of variable lists:

   keep  ip: pw112-pw117 pzu;
   drop  data1-data99  data151-data350;
   length data1-numeric-aftnt350  ucode 4;

The first statement keeps all the variables whose names start with IP, all the variables that fall alphabetically between PW112 and PW117 (including PW112 and PW117 themselves), and the single variable PZU. The second statement drops all the variables that fall alphabetically between DATA1 and DATA99 and between DATA151 and DATA350. Finally, the third statement assigns a length of 4 bytes to UCODE and to all the numeric variables defined between DATA1 and AFTNT350. Variable lists cannot exceed 200 characters in length.
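As a rough check on the first example statement, the following sketch expands a KEEP-style list against a hypothetical set of variable names. It handles names, prefix ranges, and alphabetic ranges, and returns the union of the matches. It makes these assumptions:

  • "Alphabetically" means plain uppercase string comparison.
  • The 200-character limit is counted against the list text as written.
  • Order ranges and typed ranges such as DATA1-NUMERIC-AFTNT350 are out of scope.

The function name `expand_list` is hypothetical.

```python
from typing import List

MAX_LIST_LEN = 200  # limit stated in the text; counting the raw list text is an assumption

def expand_list(var_list: str, names: List[str]) -> List[str]:
    """Union of matches for names, prefix ranges (P:) and alphabetic ranges (A-B)."""
    if len(var_list) > MAX_LIST_LEN:
        raise ValueError("variable list exceeds 200 characters")
    upper = [n.upper() for n in names]
    selected: List[str] = []
    for item in var_list.upper().split():
        if item.endswith(":"):
            hits = [n for n in upper if n.startswith(item[:-1])]
        elif "-" in item:
            # Alphabetic range: endpoints need not exist; plain string comparison assumed.
            low, high = item.split("-")
            hits = [n for n in upper if low <= n <= high]
        else:
            hits = [n for n in upper if n == item]
        for n in hits:
            if n not in selected:  # keep each variable once
                selected.append(n)
    return selected

# Hypothetical variable names for the first example statement.
names = ["IPA", "IPB", "PW112", "PW115", "PW117", "PW118", "PZU", "LABEL"]
print(expand_list("ip: pw112-pw117 pzu", names))
```

Under these assumptions, PW118 is left out because it sorts after PW117. LABEL is left out because no item matches it.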