The DATASOURCE Procedure

Example 12.9 CRSP Daily NYSE/AMEX Combined Stocks

This sample code reads all the data on a three-volume daily NYSE/AMEX combined character data set. Assume that the following filerefs are assigned to the calendar/indices file and security files that this database comprises:

Fileref

VOLSER

File Type

calfile

DXAA1

calendar/indices file on volume 1

secfile1

DXAA1

security file on volume 1

secfile2

DXAA2

security file on volume 2

secfile3

DXAA3

security file on volume 3

The data set CALDATA is created by the following statements to contain the calendar/indices file:

proc datasource filetype=crspdci infile=calfile out=caldata;
run;

Here the FILETYPE=CRSPDCI indicates that you are reading a character format (indicated by a C in the 6th position) daily (indicated by a D in the 5th position) calendar/indices file (indicated by an I in the 7th position).

The annual data in security files can be obtained by the following statements:

proc datasource filetype=crspdca
                infile=( secfile1 secfile2 secfile3 )
                out=annual;
run;

Similarly, the data sets to contain the daily security data (the OUT= data set) and the event data (the OUTEVENT= data set) are obtained by the following statements:

proc datasource filetype=crspdcs
                infile=( calfile secfile1 secfile2 secfile3 )
                out=periodic index outevent=events;
run;

Note that the FILETYPE= has an S in the 7th position, since you are reading the security files. Also, the INFILE= option first expects the fileref of the calendar/indices file since the dating variable (CALDT) is contained in that file. Following the fileref of calendar/indices file, you give the list of security files in the order in which you want to read them. When data span more than one physical volume, the filerefs of the security files residing on each volume must be given following the fileref of the calendar/indices file. The DATASOURCE procedure reads each of these files in the order in which they are specified. Therefore, you can request that all three volumes be mounted to the same drive, if you choose to do so.

This sample code illustrates the following points:

  • The INDEX option in the second PROC DATASOURCE run creates an index file for the OUT=PERIODIC data set. This index file provides random access to the OUT= data set and may increase the efficiency of the subsequent PROC and DATA steps that use BY and WHERE statements. The index variables are CUSIP, CRSP permanent number (PERMNO), NASDAQ company number (COMPNO), NASDAQ issue number (ISSUNO), header exchange code (HEXCD), and header SIC code (HSICCD). Each one of these variables forms a different key which is a single index. If you want to form keys from a combination of variables (composite indexes) or use some other variables as indexes, you should use the INDEX= data set option for the OUT= data set.

  • The OUTEVENT=EVENTS data set is sparse. In fact, for each EVENT type, a unique set of event variables are defined. For example, for EVENT=’SHARES’, only the variables SHROUT and SHRFLG are defined, and they have missing values for all other EVENT types. Pictorially, this structure is similar to the data set shown in Table 12.4. Because of this sparse representation, you should create the OUTEVENT= data set only when you need a subset of securities and events.

By default, the OUT= data set contains only the periodic data. However, you may also want to include the event-oriented data in the OUT= data set. This is accomplished by listing the event variables together with periodic variables in a KEEP statement. For example, if you want to extract the historical CUSIP (NCUSIP), number of shares outstanding (SHROUT), and dividend cash amount (DIVAMT) together with all the periodic series, use the following statements.

proc datasource filetype=crspdcs
                infile=( calfile secfile1 secfile2 secfile3 )
                out=both outevent=events;
   where cusip='09523220';
   keep  bidlo askhi prc vol ret sxret bxret ncusip shrout divamt;
run;

The KEEP statement has no effect on the event variables output to the OUTEVENT= data set. If you want to extract only a subset of event variables, you need to use the KEEPEVENT statement. For example, the following sample code outputs only NCUSIP and SHROUT to the OUTEVENT= data set for CUSIP=’09523220’:

proc datasource filetype=crspdxc
                infile=( calfile secfile)
                outevent=subevts;
   where cusip='09523220';
   keepevent  ncusip shrout;
run;

Output 12.9.1, Output 12.9.2, Output 12.9.3, and Output 12.9.4 show how to read the CRSP Daily NYSE/AMEX Combined ASCII Character Files.

filename dxci "%sysget(DATASRC_DATA)dxccal95.dat" RECFM=F LRECL=130;
filename dxc "%sysget(DATASRC_DATA)dxcsub95.dat" RECFM=F LRECL=400;

/*--- create output data sets from character format DX files ---*/
/*- create securities output data sets using DATASOURCE  -------*/
/*- statements                                                 -*/
proc datasource filetype=crspdcs  ascii
                infile=( dxci dxc )
                interval=day
                outcont=dxccont
                outkey=dxckey
                outall=dxcall
                out=dxc
                outevent=dxcevent
                outselect=off;
   range from '15aug95'd to '28aug95'd ;
   where cusip in ('12709510','35614220');
run;

title1 'Date Range 15aug95-28aug95 ';

title3 'DX Security File Outputs';
title4 'OUTKEY= Data Set';
proc print data=dxckey;
run;

title4 'OUTCONT= Data Set';
proc print data=dxccont;
run;

title4 "Listing of OUT= Data Set for cusip in ('12709510','35614220')";
proc print data=dxc;
run;

title4 "Listing of OUTEVENT= Data Set for cusip in ('12709510','35614220')";
proc print data=dxcevent;
run;

Output 12.9.1: Listing of the OUTBY= Data Set with OUTSELECT=OFF

Date Range 15aug95-28aug95
 
DX Security File Outputs
OUTKEY= Data Set

Obs CUSIP PERMNO COMPNO ISSUNO HEXCD HSICCD BYSELECT ST_DATE END_DATE NTIME NOBS NINRANGE NSERIES NSELECT
1 68391610 10000 7952 9787 3 3990 0 07JAN1986 11JUN1987 521 0 0 35 7
2 12709510 10010 7967 9809 3 3840 1 17JAN1986 28AUG1995 3511 2431 10 35 7
3 49307510 10020 7972 9824 3 6710 0 27JAN1986 30APR1993 2651 0 0 35 7
4 00338690 10030 22160 0 1 3310 0 02JUL1962 26DEC1968 2370 0 0 35 7
5 41741F20 10040 7988 9846 3 6210 0 07FEB1986 15JUN1989 1225 0 0 35 7
6 00074210 10050 13 11 3 3448 0 29DEC1972 16JUN1978 1996 0 0 35 7
7 35614220 10060 8007 9876 3 1040 1 24FEB1986 29DEC1995 3596 2492 10 35 7


Output 12.9.2: Listing of the OUTCONT= Data Set

Date Range 15aug95-28aug95
 
DX Security File Outputs
OUTCONT= Data Set

Obs NAME KEPT SELECTED TYPE LENGTH VARNUM LABEL FORMAT FORMATL FORMATD
1 BIDLO 1 1 1 6 8 Bid or Low   0 0
2 ASKHI 1 1 1 6 9 Ask or High   0 0
3 PRC 1 1 1 6 10 Closing Price of Bid/Ask average   0 0
4 VOL 1 1 1 6 11 Share Volume   0 0
5 RET 1 1 1 6 12 Holding Period Return   0 0
6 SXRET 1 1 1 6 13 Standard Deviation Excess Return   0 0
7 BXRET 1 1 1 6 14 Beta Excess Return   0 0
8 NCUSIP 0 0 2 8 . Name CUSIP   0 0
9 TICKER 0 0 2 5 . Exchange Ticker Symbol   0 0
10 COMNAM 0 0 2 32 . Company Name   0 0
11 SHRCLS 0 0 2 1 . Share Class   0 0
12 SHRCD 0 0 1 6 . Share Code   0 0
13 EXCHCD 0 0 1 6 . Exchange Code   0 0
14 SICCD 0 0 1 6 . Standard Industrial Classification Code   0 0
15 DISTCD 0 0 1 6 . Distribution Code   0 0
16 DIVAMT 0 0 1 6 . Dividend Cash Amount   0 0
17 FACPR 0 0 1 6 . Factor to adjust price   0 0
18 FACSHR 0 0 1 6 . Factor to adjust shares outstanding   0 0
19 DCLRDT 0 0 1 6 . Declaration date DATE 7 0
20 RCRDDT 0 0 1 6 . Record date DATE 7 0
21 PAYDT 0 0 1 6 . Payment date DATE 7 0
22 SHROUT 0 0 1 6 . Number of shares outstanding   0 0
23 SHRFLG 0 0 1 6 . Share flag   0 0
24 DLSTCD 0 0 1 6 . Delisting code   0 0
25 NWPERM 0 0 1 6 . New CRSP permanent number   0 0
26 NEXTDT 0 0 1 6 . Date of next available information DATE 7 0
27 DLBID 0 0 1 6 . Delisting bid   0 0
28 DLASK 0 0 1 6 . Delisting ask   0 0
29 DLPRC 0 0 1 6 . Delisting price   0 0
30 DLVOL 0 0 1 6 . Delisting volume   0 0
31 DLRET 0 0 1 6 . Delisting return   0 0
32 TRTSCD 0 0 1 6 . Traits code   0 0
33 NMSIND 0 0 1 6 . National Market System Indicator   0 0
34 MMCNT 0 0 1 6 . Market maker count   0 0
35 NSDINX 0 0 1 6 . NASD index   0 0


Output 12.9.3: Listing of the OUT= Data Set with OUTSELECT=OFF for CUSIPs 12709510 and 35614220

Date Range 15aug95-28aug95
 
DX Security File Outputs
Listing of OUT= Data Set for cusip in ('12709510','35614220')

Obs CUSIP PERMNO COMPNO ISSUNO HEXCD HSICCD DATE BIDLO ASKHI PRC VOL RET SXRET BXRET
1 12709510 10010 7967 9809 3 3840 15AUG1995 7.500 7.8750 7.5625 29200 -0.008197 . .
2 12709510 10010 7967 9809 3 3840 16AUG1995 7.500 7.8750 7.5000 22365 -0.008264 . .
3 12709510 10010 7967 9809 3 3840 17AUG1995 7.500 7.8750 7.5000 33416 0.000000 . .
4 12709510 10010 7967 9809 3 3840 18AUG1995 7.375 7.5000 7.3750 16666 -0.016667 . .
5 12709510 10010 7967 9809 3 3840 21AUG1995 7.375 7.3750 7.3750 9382 0.000000 . .
6 12709510 10010 7967 9809 3 3840 22AUG1995 7.250 7.3750 7.2500 33674 -0.016949 . .
7 12709510 10010 7967 9809 3 3840 23AUG1995 7.250 7.3750 7.3125 22371 0.008621 . .
8 12709510 10010 7967 9809 3 3840 24AUG1995 7.125 7.5000 7.1250 38621 -0.025641 . .
9 12709510 10010 7967 9809 3 3840 25AUG1995 6.875 7.3750 7.0000 29713 -0.017544 . .
10 12709510 10010 7967 9809 3 3840 28AUG1995 7.000 7.1250 7.0000 38798 0.000000 . .
11 35614220 10060 8007 9876 3 1040 15AUG1995 12.375 12.6875 12.3750 39136 0.000000 . .
12 35614220 10060 8007 9876 3 1040 16AUG1995 12.125 12.3750 12.2031 45916 -0.013889 . .
13 35614220 10060 8007 9876 3 1040 17AUG1995 12.250 12.3125 12.2500 43644 0.003841 . .
14 35614220 10060 8007 9876 3 1040 18AUG1995 12.250 12.6250 12.3750 11027 0.010204 . .
15 35614220 10060 8007 9876 3 1040 21AUG1995 12.375 12.6250 12.3750 7378 0.000000 . .
16 35614220 10060 8007 9876 3 1040 22AUG1995 12.250 12.3750 12.2500 99655 -0.010101 . .
17 35614220 10060 8007 9876 3 1040 23AUG1995 12.125 12.2500 12.1250 95148 -0.010204 . .
18 35614220 10060 8007 9876 3 1040 24AUG1995 12.125 12.3750 12.3750 185572 0.020619 . .
19 35614220 10060 8007 9876 3 1040 25AUG1995 12.000 12.2500 12.0000 9575 -0.030303 . .
20 35614220 10060 8007 9876 3 1040 28AUG1995 12.000 12.0625 12.0625 12854 0.005208 . .


Output 12.9.4: Listing of the OUTEVENT= Data Set in Range 15aug95-28aug95

Date Range 15aug95-28aug95
 
DX Security File Outputs
Listing of OUTEVENT= Data Set for cusip in ('12709510','35614220')

Obs CUSIP PERMNO COMPNO ISSUNO HEXCD HSICCD EVENT DATE NCUSIP TICKER COMNAM SHRCLS SHRCD EXCHCD SICCD DISTCD DIVAMT FACPR FACSHR DCLRDT RCRDDT PAYDT SHROUT SHRFLG DLSTCD NWPERM NEXTDT DLBID DLASK DLPRC DLVOL DLRET TRTSCD NMSIND MMCNT NSDINX
1 12709510 10010 7967 9809 3 3840 DELIST 28AUG1995         . . . . . . . . . . . . 203 23588 . . . 0 . 0.037500 . . . .
2 12709510 10010 7967 9809 3 3840 NASDIN 24AUG1995         . . . . . . . . . . . . . . . . . . . . 1 2 17 2


Note in Output 12.9.4 that there were no events in range for cusip 35614220. See Chapter 39: The SASECRSP Interface Engine, for more on CRSPAccess Data access.