The DATASOURCE Procedure

Selecting Time Series Variables – The KEEP and DROP Statements

If you want to include specific series in the OUT= data set, list them in a KEEP statement. If, on the other hand, you want to exclude some variables from the OUT= data set, list them in a DROP statement. For example, the following statements extract monthly foreign exchange rates for Japan (EXRJAN), Switzerland (EXRSW), and the United Kingdom (EXRUK) from a DRIBASIC file CITIFILE:

   proc datasource filetype=dribasic infile=citifile
                   interval=month  out=dataset;
      keep  exrjan exrsw exruk;
   run;

The KEEP statement also allows input names to be quoted strings. If the name of a series in the input file contains blanks or special characters that are not valid SAS name syntax, put the series name in quotes to select it. Another way to allow the use of special characters in your SAS variable names is to use the SAS options statement to designate VALIDVARNAME=ANY. This option will allow PROC DATASOURCE to include special characters in your SAS variable names. The following is an example of extracting series from a FAME database by using the DATASOURCE procedure.

   proc datasource filetype=fame dbname='fame_nys /disk1/prc/prc'
                   interval=weekday out=outds outcont=attrds;
      range '1jan90'd to '1feb90'd;
      keep cci.close
           '{ibm.high,ibm.low,ibm.close}'
           'mave(ibm.close,30)'
           'crosslist({gm,f,c},{volume})'
           'cci.close+ibm.close';
      rename 'mave(ibm.close,30)' = ibm30day
             'cci.close+ibm.close' = cci_ibm;
   run;

The resulting output data set OUTDS contains the following series: DATE, CCI_CLOS, IBM_HIGH, IBM_LOW, IBM_CLOS, IBM30DAY, GM_VOLUM, F_VOLUME, C_VOLUME, CCI_IBM.

Obviously, to be able to use KEEP and DROP statements, you need to know the name of time series variables available in the data file. The OUTCONT= option gives you this information. More specifically, the OUTCONT= option creates a data set containing descriptive information on the same frequency time series. This descriptive information includes series names, a flag indicating if the series is selected for output, series variable types, lengths, position of series in the OUT= data set, labels, format names, format lengths, format decimals, and a set of FILETYPE= specific descriptor variables.

For example, the following statements list some of the monthly series available in the CITIFILE and are shown in Figure 12.1.

/*-- Selecting Time Series Variables -- The KEEP and DROP Statements --*/
filename citifile "%sysget(DATASRC_DATA)citiaf.dat" RECFM=F LRECL=80;
proc datasource filetype=dribasic infile=citifile
                interval=month  outcont=vars;
   drop e: ;
run;

title1 'Some Time Series Variables Available in CITIFILE';
proc print data=vars;
run;

Figure 12.1: Listing of the OUTCONT= Data Set

Some Time Series Variables Available in CITIFILE

Obs NAME KEPT SELECTED TYPE LENGTH VARNUM LABEL FORMAT FORMATL FORMATD CODE
1 BUS 1 1 1 5 . INDEX OF NET BUSINESS FORMATION, (1967=100;SA)   0 0 BUS
2 CCBPY 1 1 1 5 . RATIO, CONSUMER INSTAL CREDIT TO PERSONAL INCOME (%,SA)(BCD-95)   0 0 CCBPY
3 CCI30M 1 1 1 5 . CONSUMER INSTAL.LOANS: DELINQUENCY RATE,30 DAYS & OVER, (%,SA)   0 0 CCI30M
4 CCIPY 1 1 1 5 . RATIO, CONSUMER INSTAL CREDIT TO PERSONAL INCOME (%,SA)(BCD-95)   0 0 CCIPY
5 COCI77 1 1 1 5 . CONSTRUCTION COST INDEX: DEPT OF COMMERCE COMPOSITE(1977=100,NSA)   0 0 COCI77
6 CONU 1 1 1 5 . CONSTRUCT.PUT IN PLACE: PRIV NEW HOUSING UNITS (MIL$,SAAR)   0 0 CONU
7 DLEAD 1 1 1 5 . COMPOSITE INDEX OF 12 LEADING INDICATORS(67=100,SA)   0 0 DLEAD
8 F6CMB 1 1 1 5 . DEPOSITORY INST RESERVES: TOTAL BORROWINGS AT RES BANKS(MIL$,NSA)   0 0 F6CMB
9 F6EDM 1 1 1 5 . U.S.MDSE EXPORTS: MANUFACTURED GOODS (MIL$,NSA)   0 0 F6EDM
10 WTNO8 1 1 1 5 . MFG & TRADE SALES:MERCHANT WHOLESALERS,OTHR NONDUR GDS,82$   0 0 WTNO8
11 WTNR 1 1 1 5 . MERCHANT WHOLESALERS' SALES: NONDURABLE GOODS (MIL$,SA)   0 0 WTNR
12 WTR 1 1 1 5 . MERCHANT WHOLESALERS' SALES: TOTAL (MIL$,SA)   0 0 WTR