The DATASOURCE Procedure |
This example compares changes of the prices in medical care services with respect to different regions for all urban consumers (SURVEY=’CU’) since May 1975. The source of the data is the Consumer Price Index Surveys distributed by the U.S. Department of Labor, Bureau of Labor Statistics.
An initial run of PROC DATASOURCE gives the descriptive information on different regions available (the OUTBY= data set), as well as the series variable name corresponding to medical care services (the OUTCONT= data set).
options yearcutoff = 1900; filename datafile 'blscpi1.data' recfm=v lrecl=152; proc datasource filetype=blscpi interval=mon outselect=off outby=cpikey(where=( upcase(areaname) in ('NORTHEAST','NORTH CENTRAL','SOUTH','WEST')) ) outcont=cpicont(where= ( index( upcase(label), 'MEDICAL CARE' )) ); where survey='CU'; run; title1 'OUTBY= Data Set, By AREANAME Selection'; proc print data=cpikey; run; title1 'OUTCONT= Data Set, By LABEL Selection'; proc print data=cpicont; run;
The OUTBY= data set in Output 11.2.1 lists all cross sections available for the four geographical regions: Northeast (AREA=’0100’), North Central (AREA=’0200’), Southern (AREA=’0300’), and Western (AREA=’0400’). The OUTCONT= data set in Output 11.2.2 gives the variable names for medical care related series.
OUTCONT= Data Set, By LABEL Selection |
Obs | SURVEY | SEASON | AREA | BASPTYPE | BASEPER | BYSELECT | ST_DATE | END_DATE | NTIME | NOBS | NSERIES | NSELECT | SURTITLE | AREANAME |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | CU | U | 0200 | S | 1982-84=100 | 1 | DEC1977 | JUL1990 | 152 | 152 | 2 | 2 | ALL URBAN CONSUM | NORTH CENTRAL |
2 | CU | U | 0100 | S | 1982-84=100 | 1 | . | . | . | 0 | 0 | 0 | ALL URBAN CONSUM | NORTHEAST |
3 | CW | U | 0400 | S | 1982-84=100 | 0 | DEC1977 | JUL1990 | 152 | 0 | 1 | 0 | URBAN WAGE EARN | WEST |
4 | CW | U | 0100 | S | 1982-84=100 | 0 | . | . | . | 0 | 0 | 0 | URBAN WAGE EARN | NORTHEAST |
5 | CW | U | 0200 | S | 1982-84=100 | 0 | . | . | . | 0 | 0 | 0 | URBAN WAGE EARN | NORTH CENTRAL |
The following statements make use of this information to extract the data for A512 and descriptive information on cross sections containing A512. Output 11.2.3 and Output 11.2.4 show these results.
options yearcutoff = 1900; filename datafile 'blscpi1.data' recfm=v lrecl=152; proc format; value $areafmt '0100' = 'Northeast Region' '0200' = 'North Central Region' '0300' = 'Southern Region' '0400' = 'Western Region'; run; proc datasource filetype=blscpi interval=month out=medical outall=medinfo; where survey='CU' and area in ( '0100','0200','0300','0400' ); keep date a512; range from 1988:9; format area $areafmt.; rename a512=medcare; run; title1 'Information on Medical Care Service, OUTALL= Data Set'; proc print data=medinfo; run; title1 'Medical Care Service By Region, OUT= Data Set'; title2 'Range from September, 1988'; proc print data=medical; run;
Medical Care Service By Region, OUT= Data Set |
Range from September, 1988 |
Obs | SURVEY | SEASON | AREA | BASPTYPE | BASEPER | BYSELECT | NAME | KEPT | SELECTED | TYPE | LENGTH | VARNUM | BLKNUM | LABEL | FORMAT | FORMATL | FORMATD | ST_DATE | END_DATE | NTIME | NOBS | NINRANGE | SURTITLE | AREANAME | S_CODE | UNITS | NDEC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | CU | U | North Central Region | S | 1982-84=100 | 1 | medcare | 1 | 1 | 1 | 5 | 7 | 50 | MEDICAL CARE SERVICES | 0 | 0 | DEC1977 | JUL1990 | 152 | 152 | 23 | ALL URBAN CONSUM | NORTH CENTRAL | CUUR0200SA512 | 1 |
Medical Care Service By Region, OUT= Data Set |
Range from September, 1988 |
Obs | SURVEY | SEASON | AREA | BASPTYPE | BASEPER | DATE | medcare |
---|---|---|---|---|---|---|---|
1 | CU | U | North Central Region | S | 1982-84=100 | SEP1988 | 1364 |
2 | CU | U | North Central Region | S | 1982-84=100 | OCT1988 | 1365 |
3 | CU | U | North Central Region | S | 1982-84=100 | NOV1988 | 1368 |
4 | CU | U | North Central Region | S | 1982-84=100 | DEC1988 | 1372 |
5 | CU | U | North Central Region | S | 1982-84=100 | JAN1989 | 1387 |
6 | CU | U | North Central Region | S | 1982-84=100 | FEB1989 | 1399 |
7 | CU | U | North Central Region | S | 1982-84=100 | MAR1989 | 1405 |
8 | CU | U | North Central Region | S | 1982-84=100 | APR1989 | 1413 |
9 | CU | U | North Central Region | S | 1982-84=100 | MAY1989 | 1416 |
10 | CU | U | North Central Region | S | 1982-84=100 | JUN1989 | 1425 |
11 | CU | U | North Central Region | S | 1982-84=100 | JUL1989 | 1439 |
12 | CU | U | North Central Region | S | 1982-84=100 | AUG1989 | 1452 |
13 | CU | U | North Central Region | S | 1982-84=100 | SEP1989 | 1460 |
14 | CU | U | North Central Region | S | 1982-84=100 | OCT1989 | 1473 |
15 | CU | U | North Central Region | S | 1982-84=100 | NOV1989 | 1481 |
16 | CU | U | North Central Region | S | 1982-84=100 | DEC1989 | 1485 |
17 | CU | U | North Central Region | S | 1982-84=100 | JAN1990 | 1500 |
18 | CU | U | North Central Region | S | 1982-84=100 | FEB1990 | 1516 |
19 | CU | U | North Central Region | S | 1982-84=100 | MAR1990 | 1528 |
20 | CU | U | North Central Region | S | 1982-84=100 | APR1990 | 1538 |
21 | CU | U | North Central Region | S | 1982-84=100 | MAY1990 | 1548 |
22 | CU | U | North Central Region | S | 1982-84=100 | JUN1990 | 1557 |
23 | CU | U | North Central Region | S | 1982-84=100 | JUL1990 | 1573 |
The OUTALL= data set in Output 11.2.3 indicates that data values are stored with one decimal place (see the NDEC variable). Therefore, they need to be rescaled, as follows:
data medical; set medical; medcare = medcare * 0.1; run;
This example illustrates the following features:
Descriptive information needed to write KEEP and WHERE statements can be obtained with an initial run of the DATASOURCE procedure.
The OUTCONT= and OUTALL= data sets contain information on how data values are stored, such as the precision, the units, and so on.
The OUTCONT= and OUTALL= data sets report the new series names assigned by the RENAME statement, not the old names (see the NAME variable in Output 11.2.3).
You can use PROC FORMAT to define formats for series or BY variables to enhance your output. Note that PROC DATASOURCE associates a permanent format, $AREAFMT., with the BY variable AREA. As a result, the formatted values are displayed in the printout of the OUTALL=MEDINFO data set (see Output 11.2.3).
Copyright © SAS Institute, Inc. All Rights Reserved.