This sample code reads all the data on a three-volume daily NYSE/AMEX combined character data set. Assume that the following filerefs are assigned to the calendar/indices file and security files that this database comprises:
Fileref |
VOLSER |
File Type |
calfile |
DXAA1 |
calendar/indices file on volume 1 |
secfile1 |
DXAA1 |
security file on volume 1 |
secfile2 |
DXAA2 |
security file on volume 2 |
secfile3 |
DXAA3 |
security file on volume 3 |
The data set CALDATA is created by the following statements to contain the calendar/indices file:
proc datasource filetype=crspdci infile=calfile out=caldata; run;
Here the FILETYPE=CRSPDCI indicates that you are reading a character format (indicated by a C in the 6th position) daily (indicated by a D in the 5th position) calendar/indices file (indicated by an I in the 7th position).
The annual data in security files can be obtained by the following statements:
proc datasource filetype=crspdca infile=( secfile1 secfile2 secfile3 ) out=annual; run;
Similarly, the data sets to contain the daily security data (the OUT= data set) and the event data (the OUTEVENT= data set) are obtained by the following statements:
proc datasource filetype=crspdcs infile=( calfile secfile1 secfile2 secfile3 ) out=periodic index outevent=events; run;
Note that the FILETYPE= has an S in the 7th position, since you are reading the security files. Also, the INFILE= option first expects the fileref of the calendar/indices file since the dating variable (CALDT) is contained in that file. Following the fileref of calendar/indices file, you give the list of security files in the order in which you want to read them. When data span more than one physical volume, the filerefs of the security files residing on each volume must be given following the fileref of the calendar/indices file. The DATASOURCE procedure reads each of these files in the order in which they are specified. Therefore, you can request that all three volumes be mounted to the same drive, if you choose to do so.
This sample code illustrates the following points:
The INDEX option in the second PROC DATASOURCE run creates an index file for the OUT=PERIODIC data set. This index file provides random access to the OUT= data set and may increase the efficiency of the subsequent PROC and DATA steps that use BY and WHERE statements. The index variables are CUSIP, CRSP permanent number (PERMNO), NASDAQ company number (COMPNO), NASDAQ issue number (ISSUNO), header exchange code (HEXCD), and header SIC code (HSICCD). Each one of these variables forms a different key which is a single index. If you want to form keys from a combination of variables (composite indexes) or use some other variables as indexes, you should use the INDEX= data set option for the OUT= data set.
The OUTEVENT=EVENTS data set is sparse. In fact, for each EVENT type, a unique set of event variables are defined. For example, for EVENT=’SHARES’, only the variables SHROUT and SHRFLG are defined, and they have missing values for all other EVENT types. Pictorially, this structure is similar to the data set shown in Table 13.4. Because of this sparse representation, you should create the OUTEVENT= data set only when you need a subset of securities and events.
By default, the OUT= data set contains only the periodic data. However, you may also want to include the event-oriented data in the OUT= data set. This is accomplished by listing the event variables together with periodic variables in a KEEP statement. For example, if you want to extract the historical CUSIP (NCUSIP), number of shares outstanding (SHROUT), and dividend cash amount (DIVAMT) together with all the periodic series, use the following statements.
proc datasource filetype=crspdcs infile=( calfile secfile1 secfile2 secfile3 ) out=both outevent=events; where cusip='09523220'; keep bidlo askhi prc vol ret sxret bxret ncusip shrout divamt; run;
The KEEP statement has no effect on the event variables output to the OUTEVENT= data set. If you want to extract only a subset of event variables, you need to use the KEEPEVENT statement. For example, the following sample code outputs only NCUSIP and SHROUT to the OUTEVENT= data set for CUSIP=’09523220’:
proc datasource filetype=crspdxc infile=( calfile secfile) outevent=subevts; where cusip='09523220'; keepevent ncusip shrout; run;
Output 13.9.1, Output 13.9.2, Output 13.9.3, and Output 13.9.4 show how to read the CRSP Daily NYSE/AMEX Combined ASCII Character Files.
filename dxci "%sysget(DATASRC_DATA)dxccal95.dat" RECFM=F LRECL=130; filename dxc "%sysget(DATASRC_DATA)dxcsub95.dat" RECFM=F LRECL=400; /*--- create output data sets from character format DX files ---*/ /*- create securities output data sets using DATASOURCE -------*/ /*- statements -*/ proc datasource filetype=crspdcs ascii infile=( dxci dxc ) interval=day outcont=dxccont outkey=dxckey outall=dxcall out=dxc outevent=dxcevent outselect=off; range from '15aug95'd to '28aug95'd ; where cusip in ('12709510','35614220'); run; title1 'Date Range 15aug95-28aug95 '; title3 'DX Security File Outputs'; title4 'OUTKEY= Data Set'; proc print data=dxckey; run; title4 'OUTCONT= Data Set'; proc print data=dxccont; run; title4 "Listing of OUT= Data Set for cusip in ('12709510','35614220')"; proc print data=dxc; run; title4 "Listing of OUTEVENT= Data Set for cusip in ('12709510','35614220')"; proc print data=dxcevent; run;
Output 13.9.1: Listing of the OUTBY= Data Set with OUTSELECT=OFF
Date Range 15aug95-28aug95 |
DX Security File Outputs |
OUTKEY= Data Set |
Obs | CUSIP | PERMNO | COMPNO | ISSUNO | HEXCD | HSICCD | BYSELECT | ST_DATE | END_DATE | NTIME | NOBS | NINRANGE | NSERIES | NSELECT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 68391610 | 10000 | 7952 | 9787 | 3 | 3990 | 0 | 07JAN1986 | 11JUN1987 | 521 | 0 | 0 | 35 | 7 |
2 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 1 | 17JAN1986 | 28AUG1995 | 3511 | 2431 | 10 | 35 | 7 |
3 | 49307510 | 10020 | 7972 | 9824 | 3 | 6710 | 0 | 27JAN1986 | 30APR1993 | 2651 | 0 | 0 | 35 | 7 |
4 | 00338690 | 10030 | 22160 | 0 | 1 | 3310 | 0 | 02JUL1962 | 26DEC1968 | 2370 | 0 | 0 | 35 | 7 |
5 | 41741F20 | 10040 | 7988 | 9846 | 3 | 6210 | 0 | 07FEB1986 | 15JUN1989 | 1225 | 0 | 0 | 35 | 7 |
6 | 00074210 | 10050 | 13 | 11 | 3 | 3448 | 0 | 29DEC1972 | 16JUN1978 | 1996 | 0 | 0 | 35 | 7 |
7 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 1 | 24FEB1986 | 29DEC1995 | 3596 | 2492 | 10 | 35 | 7 |
Output 13.9.2: Listing of the OUTCONT= Data Set
Date Range 15aug95-28aug95 |
DX Security File Outputs |
OUTCONT= Data Set |
Obs | NAME | KEPT | SELECTED | TYPE | LENGTH | VARNUM | LABEL | FORMAT | FORMATL | FORMATD |
---|---|---|---|---|---|---|---|---|---|---|
1 | BIDLO | 1 | 1 | 1 | 6 | 8 | Bid or Low | 0 | 0 | |
2 | ASKHI | 1 | 1 | 1 | 6 | 9 | Ask or High | 0 | 0 | |
3 | PRC | 1 | 1 | 1 | 6 | 10 | Closing Price of Bid/Ask average | 0 | 0 | |
4 | VOL | 1 | 1 | 1 | 6 | 11 | Share Volume | 0 | 0 | |
5 | RET | 1 | 1 | 1 | 6 | 12 | Holding Period Return | 0 | 0 | |
6 | SXRET | 1 | 1 | 1 | 6 | 13 | Standard Deviation Excess Return | 0 | 0 | |
7 | BXRET | 1 | 1 | 1 | 6 | 14 | Beta Excess Return | 0 | 0 | |
8 | NCUSIP | 0 | 0 | 2 | 8 | . | Name CUSIP | 0 | 0 | |
9 | TICKER | 0 | 0 | 2 | 5 | . | Exchange Ticker Symbol | 0 | 0 | |
10 | COMNAM | 0 | 0 | 2 | 32 | . | Company Name | 0 | 0 | |
11 | SHRCLS | 0 | 0 | 2 | 1 | . | Share Class | 0 | 0 | |
12 | SHRCD | 0 | 0 | 1 | 6 | . | Share Code | 0 | 0 | |
13 | EXCHCD | 0 | 0 | 1 | 6 | . | Exchange Code | 0 | 0 | |
14 | SICCD | 0 | 0 | 1 | 6 | . | Standard Industrial Classification Code | 0 | 0 | |
15 | DISTCD | 0 | 0 | 1 | 6 | . | Distribution Code | 0 | 0 | |
16 | DIVAMT | 0 | 0 | 1 | 6 | . | Dividend Cash Amount | 0 | 0 | |
17 | FACPR | 0 | 0 | 1 | 6 | . | Factor to adjust price | 0 | 0 | |
18 | FACSHR | 0 | 0 | 1 | 6 | . | Factor to adjust shares outstanding | 0 | 0 | |
19 | DCLRDT | 0 | 0 | 1 | 6 | . | Declaration date | DATE | 7 | 0 |
20 | RCRDDT | 0 | 0 | 1 | 6 | . | Record date | DATE | 7 | 0 |
21 | PAYDT | 0 | 0 | 1 | 6 | . | Payment date | DATE | 7 | 0 |
22 | SHROUT | 0 | 0 | 1 | 6 | . | Number of shares outstanding | 0 | 0 | |
23 | SHRFLG | 0 | 0 | 1 | 6 | . | Share flag | 0 | 0 | |
24 | DLSTCD | 0 | 0 | 1 | 6 | . | Delisting code | 0 | 0 | |
25 | NWPERM | 0 | 0 | 1 | 6 | . | New CRSP permanent number | 0 | 0 | |
26 | NEXTDT | 0 | 0 | 1 | 6 | . | Date of next available information | DATE | 7 | 0 |
27 | DLBID | 0 | 0 | 1 | 6 | . | Delisting bid | 0 | 0 | |
28 | DLASK | 0 | 0 | 1 | 6 | . | Delisting ask | 0 | 0 | |
29 | DLPRC | 0 | 0 | 1 | 6 | . | Delisting price | 0 | 0 | |
30 | DLVOL | 0 | 0 | 1 | 6 | . | Delisting volume | 0 | 0 | |
31 | DLRET | 0 | 0 | 1 | 6 | . | Delisting return | 0 | 0 | |
32 | TRTSCD | 0 | 0 | 1 | 6 | . | Traits code | 0 | 0 | |
33 | NMSIND | 0 | 0 | 1 | 6 | . | National Market System Indicator | 0 | 0 | |
34 | MMCNT | 0 | 0 | 1 | 6 | . | Market maker count | 0 | 0 | |
35 | NSDINX | 0 | 0 | 1 | 6 | . | NASD index | 0 | 0 |
Output 13.9.3: Listing of the OUT= Data Set with OUTSELECT=OFF for CUSIPs 12709510 and 35614220
Date Range 15aug95-28aug95 |
DX Security File Outputs |
Listing of OUT= Data Set for cusip in ('12709510','35614220') |
Obs | CUSIP | PERMNO | COMPNO | ISSUNO | HEXCD | HSICCD | DATE | BIDLO | ASKHI | PRC | VOL | RET | SXRET | BXRET |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 15AUG1995 | 7.500 | 7.8750 | 7.5625 | 29200 | -0.008197 | . | . |
2 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 16AUG1995 | 7.500 | 7.8750 | 7.5000 | 22365 | -0.008264 | . | . |
3 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 17AUG1995 | 7.500 | 7.8750 | 7.5000 | 33416 | 0.000000 | . | . |
4 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 18AUG1995 | 7.375 | 7.5000 | 7.3750 | 16666 | -0.016667 | . | . |
5 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 21AUG1995 | 7.375 | 7.3750 | 7.3750 | 9382 | 0.000000 | . | . |
6 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 22AUG1995 | 7.250 | 7.3750 | 7.2500 | 33674 | -0.016949 | . | . |
7 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 23AUG1995 | 7.250 | 7.3750 | 7.3125 | 22371 | 0.008621 | . | . |
8 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 24AUG1995 | 7.125 | 7.5000 | 7.1250 | 38621 | -0.025641 | . | . |
9 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 25AUG1995 | 6.875 | 7.3750 | 7.0000 | 29713 | -0.017544 | . | . |
10 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | 28AUG1995 | 7.000 | 7.1250 | 7.0000 | 38798 | 0.000000 | . | . |
11 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 15AUG1995 | 12.375 | 12.6875 | 12.3750 | 39136 | 0.000000 | . | . |
12 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 16AUG1995 | 12.125 | 12.3750 | 12.2031 | 45916 | -0.013889 | . | . |
13 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 17AUG1995 | 12.250 | 12.3125 | 12.2500 | 43644 | 0.003841 | . | . |
14 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 18AUG1995 | 12.250 | 12.6250 | 12.3750 | 11027 | 0.010204 | . | . |
15 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 21AUG1995 | 12.375 | 12.6250 | 12.3750 | 7378 | 0.000000 | . | . |
16 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 22AUG1995 | 12.250 | 12.3750 | 12.2500 | 99655 | -0.010101 | . | . |
17 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 23AUG1995 | 12.125 | 12.2500 | 12.1250 | 95148 | -0.010204 | . | . |
18 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 24AUG1995 | 12.125 | 12.3750 | 12.3750 | 185572 | 0.020619 | . | . |
19 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 25AUG1995 | 12.000 | 12.2500 | 12.0000 | 9575 | -0.030303 | . | . |
20 | 35614220 | 10060 | 8007 | 9876 | 3 | 1040 | 28AUG1995 | 12.000 | 12.0625 | 12.0625 | 12854 | 0.005208 | . | . |
Output 13.9.4: Listing of the OUTEVENT= Data Set in Range 15aug95-28aug95
Date Range 15aug95-28aug95 |
DX Security File Outputs |
Listing of OUTEVENT= Data Set for cusip in ('12709510','35614220') |
Obs | CUSIP | PERMNO | COMPNO | ISSUNO | HEXCD | HSICCD | EVENT | DATE | NCUSIP | TICKER | COMNAM | SHRCLS | SHRCD | EXCHCD | SICCD | DISTCD | DIVAMT | FACPR | FACSHR | DCLRDT | RCRDDT | PAYDT | SHROUT | SHRFLG | DLSTCD | NWPERM | NEXTDT | DLBID | DLASK | DLPRC | DLVOL | DLRET | TRTSCD | NMSIND | MMCNT | NSDINX |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | DELIST | 28AUG1995 | . | . | . | . | . | . | . | . | . | . | . | . | 203 | 23588 | . | . | . | 0 | . | 0.037500 | . | . | . | . | ||||
2 | 12709510 | 10010 | 7967 | 9809 | 3 | 3840 | NASDIN | 24AUG1995 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | 1 | 2 | 17 | 2 |
Note in Output 13.9.4 that there were no events in range for cusip 35614220. See Chapter 46: The SASECRSP Interface Engine, for more on CRSPAccess Data access.