The DATASOURCE Procedure

Example 12.2 BLS Consumer Price Index Surveys

This example compares changes of the prices in medical care services with respect to different regions for all urban consumers (SURVEY=’CU’) since May 1975. The source of the data is the Consumer Price Index Surveys distributed by the U.S. Department of Labor, Bureau of Labor Statistics.

An initial run of PROC DATASOURCE gives the descriptive information on different regions available (the OUTBY= data set), as well as the series variable name corresponding to medical care services (the OUTCONT= data set).

options yearcutoff = 1900;

filename datafile "%sysget(DATASRC_DATA)blscpi1.data" recfm=v lrecl=152;
proc datasource filetype=blscpi
   interval=mon
   outselect=off
   outby=cpikey(where=( upcase(areaname) 
                   in ('NORTHEAST','NORTH CENTRAL','SOUTH','WEST')) )
   outcont=cpicont(where= ( index( upcase(label), 'MEDICAL CARE' )) );
   where survey='CU';
run;

title1 'OUTBY= Data Set, By AREANAME Selection';
proc print
   data=cpikey;
run;

title1 'OUTCONT= Data Set, By LABEL Selection';
proc print
   data=cpicont;
run;

The OUTBY= data set in Output 12.2.1 lists all cross sections available for the four geographical regions: Northeast (AREA=’0100’), North Central (AREA=’0200’), Southern (AREA=’0300’), and Western (AREA=’0400’). The OUTCONT= data set in Output 12.2.2 gives the variable names for medical care related series.

Output 12.2.1: Partial Listings of the OUTBY= Data Set

OUTBY= Data Set, By AREANAME Selection

Obs SURVEY SEASON AREA BASPTYPE BASEPER BYSELECT ST_DATE END_DATE NTIME NOBS NSERIES NSELECT SURTITLE AREANAME
1 CU U 0200 S 1982-84=100 1 DEC1977 JUL1990 152 152 2 2 ALL URBAN CONSUM NORTH CENTRAL
2 CU U 0100 S 1982-84=100 1 . . . 0 0 0 ALL URBAN CONSUM NORTHEAST
3 CW U 0400 S 1982-84=100 0 DEC1977 JUL1990 152 0 1 0 URBAN WAGE EARN WEST
4 CW U 0100 S 1982-84=100 0 . . . 0 0 0 URBAN WAGE EARN NORTHEAST
5 CW U 0200 S 1982-84=100 0 . . . 0 0 0 URBAN WAGE EARN NORTH CENTRAL


Output 12.2.2: Partial Listings of the OUTCONT= Data Set

OUTCONT= Data Set, By LABEL Selection

Obs NAME SELECTED TYPE LENGTH VARNUM LABEL FORMAT FORMATL FORMATD
1 ASL5 1 1 5 . SERVICES LESS MEDICAL CARE   0 0
2 A512 1 1 5 . MEDICAL CARE SERVICES   0 0
3 A0L5 0 1 5 . ALL ITEMS LESS MEDICAL CARE   0 0


The following statements make use of this information to extract the data for A512 and descriptive information on cross sections containing A512. Output 12.2.3 and Output 12.2.4 show these results.

options yearcutoff = 1900;

filename datafile "%sysget(DATASRC_DATA)blscpi1.data" recfm=v lrecl=152;

proc format;
   value $areafmt '0100' = 'Northeast Region'
                  '0200' = 'North Central Region'
                  '0300' = 'Southern Region'
                  '0400' = 'Western Region';
run;

proc datasource filetype=blscpi interval=month
             out=medical outall=medinfo;
   where survey='CU' and area in ( '0100','0200','0300','0400' );
   keep date a512;
   range  from 1988:9;
   format area $areafmt.;
   rename a512=medcare;
run;

title1 'Information on Medical Care Service, OUTALL= Data Set';
proc print
   data=medinfo;
run;

title1 'Medical Care Service By Region, OUT= Data Set';
title2 'Range from September, 1988';
proc print
   data=medical;
run;

Output 12.2.3: Printout of the OUTALL= Data Set

Information on Medical Care Service, OUTALL= Data Set

Obs SURVEY SEASON AREA BASPTYPE BASEPER BYSELECT NAME KEPT SELECTED TYPE LENGTH VARNUM BLKNUM LABEL FORMAT FORMATL FORMATD ST_DATE END_DATE NTIME NOBS NINRANGE SURTITLE AREANAME S_CODE UNITS NDEC
1 CU U North Central Region S 1982-84=100 1 medcare 1 1 1 5 7 50 MEDICAL CARE SERVICES   0 0 DEC1977 JUL1990 152 152 23 ALL URBAN CONSUM NORTH CENTRAL CUUR0200SA512   1


Output 12.2.4: Printout of the OUT= Data Set

Medical Care Service By Region, OUT= Data Set
Range from September, 1988

Obs SURVEY SEASON AREA BASPTYPE BASEPER DATE medcare
1 CU U North Central Region S 1982-84=100 SEP1988 1364
2 CU U North Central Region S 1982-84=100 OCT1988 1365
3 CU U North Central Region S 1982-84=100 NOV1988 1368
4 CU U North Central Region S 1982-84=100 DEC1988 1372
5 CU U North Central Region S 1982-84=100 JAN1989 1387
6 CU U North Central Region S 1982-84=100 FEB1989 1399
7 CU U North Central Region S 1982-84=100 MAR1989 1405
8 CU U North Central Region S 1982-84=100 APR1989 1413
9 CU U North Central Region S 1982-84=100 MAY1989 1416
10 CU U North Central Region S 1982-84=100 JUN1989 1425
11 CU U North Central Region S 1982-84=100 JUL1989 1439
12 CU U North Central Region S 1982-84=100 AUG1989 1452
13 CU U North Central Region S 1982-84=100 SEP1989 1460
14 CU U North Central Region S 1982-84=100 OCT1989 1473
15 CU U North Central Region S 1982-84=100 NOV1989 1481
16 CU U North Central Region S 1982-84=100 DEC1989 1485
17 CU U North Central Region S 1982-84=100 JAN1990 1500
18 CU U North Central Region S 1982-84=100 FEB1990 1516
19 CU U North Central Region S 1982-84=100 MAR1990 1528
20 CU U North Central Region S 1982-84=100 APR1990 1538
21 CU U North Central Region S 1982-84=100 MAY1990 1548
22 CU U North Central Region S 1982-84=100 JUN1990 1557
23 CU U North Central Region S 1982-84=100 JUL1990 1573


The OUTALL= data set in Output 12.2.3 indicates that data values are stored with one decimal place (see the NDEC variable). Therefore, they need to be rescaled, as follows:

   data medical;
      set medical;
      medcare = medcare * 0.1;
   run;

This example illustrates the following features:

  • Descriptive information needed to write KEEP and WHERE statements can be obtained with an initial run of the DATASOURCE procedure.

  • The OUTCONT= and OUTALL= data sets contain information on how data values are stored, such as the precision, the units, and so on.

  • The OUTCONT= and OUTALL= data sets report the new series names assigned by the RENAME statement, not the old names (see the NAME variable in Output 12.2.3).

  • You can use PROC FORMAT to define formats for series or BY variables to enhance your output. Note that PROC DATASOURCE associates a permanent format, $AREAFMT., with the BY variable AREA. As a result, the formatted values are displayed in the printout of the OUTALL=MEDINFO data set (see Output 12.2.3).