The SASECRSP Interface Engine

The LIBNAME libref SASECRSP Statement

LIBNAME libref SASECRSP ’physical name options ;

The physical name required by the LIBNAME statement should point to the directory of CRSPAccess data files where the CRSP database you want to open is located. Note that the physical name must end in a slash for UNIX environments and a backslash for Windows environments.

The CRSP environment variable CRSPDB_SASCAL must be defined before the SASECRSP engine can access the CRSPAccess database calendars. The CRSP environment variable CRSPDB_SASCAL is necessary for the SASECRSP LIBNAME to assign successfully. This necessary environment variable should be defined automatically by either the CRSP software installation or, in later versions, the CRSP data installation. Since occasional instances occur where the variable is not set properly, always check to ensure the CRSPDB_SASCAL environment variable is set to the location where your most recent CRSP data resides. Remember to include the final slash or backslash required.

After the LIBNAME is assigned, you can access any of the available data sets/members within the opened database. For a complete description of available data sets and their fields, see the section Data Elements Reference: SASECRSP Interface Engine.

The following options can be used in the LIBNAME libref SASECRSP statement:

SETID=crsp_setidnumber

Specifies the CRSP database you want to read from. SETID is a required option. Choose one SETID from seven possible values in Table 39.1. The SETID limits the frequency selection of time series that are included in the SAS data set.

As an example, to access monthly CRSP Stock data, you would use the following statements:

   LIBNAME myLib sasecrsp 'physical-name'
                           SETID=20;
PERMNO=crsp_permnumber

By default, the SASECRSP engine reads all keys for the CRSPAccess database that you specified in your SASECRSP libref. The PERMNO= option enables you to select data from your CRSP database by the PERMNO(s) (or other keys) you specify. PERMNOs are CRSP’s unique permanent issue identification number. There is no limit to the number of crsp_permnumber options that you can use.

From a performance standpoint, the PERMNO= option does efficient random access and reads only the data for the PERMNOs specified.

The following LIBNAME statement reads data only for Microsoft Corporation (PERMNO=10107) and International Business Machines Corporation (PERMNO=12490) using the primary PERMNO key, and thus is very efficient.

   LIBNAME myLib sasecrsp 'physical-name'
      SETID=20
      PERMNO=10107
      PERMNO=12490;

The PERMCO=, CUSIP=, HCUSIP=, SICCD=, TICKER=, GVKEY=, and INDNO= options behave similarly and you can use them in conjunction with or in place of the PERMNO= option. For example you could have used the following statement to access monthly data for Microsoft and IBM:

   LIBNAME myLib sasecrsp 'physical-name'
                           SETID=20
                           TICKER='MSFT'
                           CUSIP=59491810;

Details on the use of other key selection options are described separately later.

PERMNOs specified by this option can select the companies or issues to keep for CRSP Stock or for CRSP/Compustat Merged databases, but PERMNO is not a supported option for CRSP Indices databases. Use the INDNO= option for the CRSP Indices database and use the PERMNO= option with CRSP US Stock and with CRSP/Compustat Merged databases. Details on the use of key selection options for each type of database follows.

STK Databases

PERMNO is the primary key for CRSP Stock databases. Every valid PERMNO you specify with the PERMNO= option keeps exactly one issue.

CCM Databases

PERMNO can be used as a secondary key for the CCM database through CRSPLink™. Linking between the CRSP and Compustat databases is a complex, many-to-many relationship between PERMNO/PERMCOs and GVKEYs. When accessing CCM data by PERMNO, all GVKEYs that link to the given PERMNO are amalgamated to provide seamless access to all linked data. However, note that accessing CCM data by PERMNO is logically different than accessing it by its linked GVKEY(s).

In particular, when the PERMNO you specify is linked to several different GVKEYs, one link is designated as the primary link. This designation is set by CRSP and its researchers, and serves in a specific role for the link information in the CCM database. Only data for the primary link is retrieved for the header. For other members, including all time series members, all links are examined, but data is extracted only for the active period of the links and only if the data is within any possible user-specified date restriction. If two or more GVKEY-to-PERMNO links overlap in time, data from the later (more recent) GVKEY is used. For more information about CRSP links, see Link Used Array in the CRSP/Compustat Merged Database Guide.

For example, PERMNO=10083 is CRSP’s unique issue identifier for Teknowledge Incorporated, and later (due to a name change) Cimflex Teknowledge Corporation. To access CCM data for IBM Corporation, Teknowledge Inc., and Cimflex Teknowledge Corp., you can use the following statement:

   LIBNAME myLib1 sasecrsp 'physical-name'
                           SETID=200
                           GVKEY=6066      /* IBM */
                           PERMNO=10083;   /* Teknowledge and Cimflex */
  

Teknowledge Inc. and Cimflex Corp. have separate GVKEYs in the CCM database, so the previous statement is actually an example of using one PERMNO to access data for several (linked) GVKEYs. The first link to GVKEY=11947 spans March 5, 1986, to December 31, 1988, and the second link to GVKEY=15495 spans February 2, 1989, to September 9, 1993.

An alternate way of accessing the data is by using the linked GVKEYs directly as seen in this statement.

   LIBNAME myLib2 sasecrsp 'physical-name'
                           SETID=200
                           GVKEY=6066
                           GVKEY=11947
                           GVKEY=15495;
  

These two LIBNAME statements look similar, but do not perform the same operation. myLib1 assumes you are selecting the issue data for PERMNO=10083, so only observations from the CCM database that are within the time period of the used links are accessed. In the previous example for myLib1, only data ranging from March 5, 1986, to December 31, 1988, are extracted for GVKEY=11947 and only data ranging from February 28, 1989, to September 9, 1993, are extracted for GVKEY=15496.

Furthermore, while both GVKEYs 11947 and 15495 are linked to the PERMNO, GVKEY 15495 is the primary link, and when accessing the header, only 15495 is used. If the two links overlap, the data from the later (more recent) GVKEY of 15495 is used.

In contrast, myLib2 uses an open range for all three specified keys. If there are data overlapping in time between GVKEY 11947 and 15495, data for both are reported. Similarly, when accessing the header, data for both 11947 and 15497 are retrieved.

IND Databases

INDNO is the primary key for accessing CRSP Indices databases. PERMNO is not available as a key for the IND (CRSP Indices) database; use INDNO for efficient access of IND database.

GVKEY=crsp_gvkey

The GVKEY= option is similar to the PERMNO= option. It enables you to use the Compustat’s Permanent SPC Identifier key (GVKEY) to select the companies or issues to keep. There is no limit to the number of crsp_gvkey options that you can use.

STK Databases

GVKEY can serve as a secondary key for accessing CRSP Stock databases. This requires the additional use of the CRSPLINKPATH= option. Linking between the Compustat and CRSP databases is a complex, many-to-many relationship between GVKEYs and PERMNO/PERMCOs. When accessing CRSP data by GVKEY, all links of the specified GVKEY are followed and processed. No additional logic is applied, and link ranges are ignored. Accessing CRSP data by GVKEY is identical to accessing CRSP data by all of its linked PERMNOs.

For example, Wolverine Exploration Co. and Amerac Energy Corp have different PERMNOs but the same GVKEY, and there are two identical ways of accessing CRSP Stock data on these two entities.

   LIBNAME myLib1 sasecrsp 'physical-name'
                           SETID=10
                           PERMNO=13638  /* Wolverine Exploration */
                           PERMNO=84641; /* Amerac Energy */

   LIBNAME myLib2 sasecrsp 'physical-name'
                            SETID=10
                            CRSPLINKPATH='physical-name'
                            GVKEY=1544;

The CRSPLINKPATH= option is required when accessing CRSP Stock databases by GVKEY. See the discussion later in this section on the CRSPLINKPATH= option.

CCM Databases

GVKEY is the primary key for accessing the CCM database. Every valid GVKEY you specify keeps exactly one company.

IND Databases

INDNO is the primary key for accessing CRSP Indices databases; use INDNO instead of GVKEY for IND databases. GVKEY is not available as a key for accessing CRSP Indices databases.

PERMCO=crsp_permcompany

The PERMCO= option is similar to the PERMNO= option. It enables you to use the CRSP’s unique permanent company identification key (PERMCO) to select the companies or issues to keep. There is no limit to the number of crsp_permcompany options that you can use.

STK Databases

PERMCO is a secondary key for accessing CRSP Stock databases. One PERMCO can map to multiple PERMNOs. Access by PERMCO is equivalent to access by all mapped PERMNOs.

CCM Databases

PERMCO can also be used as a secondary key for accessing the CCM database. Linking between the CRSP and CCM databases is a complex, many-to-many relationship. When accessing CCM data by PERMCO, all linking GVKEYs are amalgamated and processed. Link active ranges are respected. Only data for the primary link is returned for the header. In cases when the active ranges of various links overlap, the most recent link is used. See PERMNO= option for more details.

IND Databases

Use INDNO for accessing CRSP Indices databases. PERMCO is not available as a key for accessing CRSP Indices databases; use INDNO instead.

CUSIP=crsp_cusip

The CUSIP= option is similar to the PERMNO= option. It enables you to use the CUSIP key to select the companies or issues to keep. There is no limit to the number of crsp_cusip options that you can use.

STK Databases

CUSIP is a secondary key for accessing CRSP Stock databases. One CUSIP maps to one PERMNO.

CCM Databases

CUSIP is not available as a key for accessing CCM databases.

IND Databases

Use INDNO for accessing CRSP Indices databases. CUSIP is not available as a key for accessing CRSP Indices databases; use INDNO instead.

HCUSIP=crsp_hcusip

The HCUSIP= option is similar to the PERMNO= option. It enables you to use the historical CUSIP key, HCUSIP, to select the companies or issues to keep. There is no limit to the number of crsp_hcusip options that you can use.

STK Databases

HCUSIP is a secondary key for accessing CRSP Stock databases. One HCUSIP maps to one PERMNO.

CCM Databases

HCUSIP is not available as a key for accessing CCM databases.

IND Databases

Use INDNO for accessing CRSP Indices databases. HCUSIP is not available as a key for accessing CRSP Indices databases; use INDNO instead.

TICKER=crsp_ticker

The TICKER= option is similar to the PERMNO= option. It enables you to use the TICKER key to select the companies or issues to keep. There is no limit to the number of crsp_ticker options that you can use.

STK Databases

TICKER is a secondary key for accessing CRSP Stock databases. One TICKER maps to one PERMNO. Note that some PERMNOs are inaccessible by TICKER.

CCM Databases

TICKER is not available as a key for accessing CCM databases.

IND Databases

Use INDNO for accessing CRSP Indices databases. TICKER is not available as a key for accessing CRSP Indices databases; use INDNO instead.

SICCD=crsp_siccd

The SICCD= option is similar to the PERMNO= option. It enables you to use the Standard Industrial Classification (SIC) Code (SICCD) to select the companies or issues to keep. There is no limit to the number of crsp_siccd options that you can use.

STK Databases

SICCD is a secondary key for accessing CRSP Stock databases. One SICCD can map to multiple PERMNOs. All PERMNOs that have been classified once under the specified SICCD are mapped and data for them is retrieved. Access by SICCD is equivalent to access by all PERMNOs that have ever been classified under the specified SICCD.

CCM Databases

SICCD is not available as a key for accessing CCM databases.

IND Databases

Use INDNO for accessing CRSP Indices databases. SICCD is not available as a key for accessing CRSP Indices databases; use INDNO instead.

INDNO=crsp_indno

The INDNO= option is similar to the PERMNO= option. It enables you to use CRSP’s permanent index number INDNO to select the companies or issues to keep. There is no limit to the number of crsp_indno options that you can use.

STK Databases

INDNO is not available as a key for accessing CRSP Stock databases, but it can be used in the combined CRSP Stock and Indices databases.

CCM Databases

INDNO is not available as a key for accessing CCM databases; use GVKEY instead.

IND Databases

INDNO is the primary key for accessing CRSP Indices databases. Every INDNO you specify keeps exactly one index series or group.

For example, you can use the following statement to access the CRSP NYSE Value-Weighted and Equal-Weighted daily market indices:

   LIBNAME myLib3 sasecrsp 'physical-name'
                           SETID=460
                           INDNO=1000000   /* Value-Weighted */
                           INDNO=1000001;  /* Equal-Weighted */
  
CRSPLINKPATH=’crsp_linkpath’

To access CRSP Stock data with GVKEYs, use the CRSPLINKPATH= option. CRSPLINKPATH= specifies the physical location where your CCM database resides. Note: The physical name must end in a slash for UNIX environments and a backslash for Windows environments.

RANGE=’crsp_begdt-crsp_enddt’

To limit the time range of data read from your CRSPAccess database, specify the RANGE= option in your SASECRSP libref, where crsp_begdt is the beginning date in YYYYMMDD format and crsp_enddt is the ending date of the range in YYYYMMDD format.

As an example, to access monthly stock data for Microsoft Corporation and for International Business Machines Corporation for the first quarter of 1999, you can use the following statement:

   LIBNAME myLib sasecrsp 'physical-name'
                          SETID=20
                          PERMNO=10107
                          PERMNO=12490
                          RANGE='19990101-19990331';

The given beginning and ending dates are interpreted as calendar dates by default. If you want these dates to be interpreted as fiscal dates, you must prepend the character 'f' to the range.

For example, the following statement extracts data for the 1994 fiscal year of both Microsoft and IBM.

   LIBNAME myLib sasecrsp 'physical-name'
      SETID=20
      PERMNO=10107
      PERMNO=12490
      RANGE='f19940101-19941231';

The result of the previous statement is that data from actual calendar date July 1,1993, to June 30,1994, is extracted for Microsoft because its fiscal year end month is June. Data from January 1,1994, to December 31,1994, is extracted for IBM because its fiscal year end month is December. See Example 39.10 for a more detailed example.

The RANGE= option can be used on all CRSP Stock, Indices, and CCM members. When this option is applied to segment data members however, the behavior is slightly different in the following ways.

  • Dates associated with segment member data records are in years and can resolve only to years. This is unique to segment members. All other CRSP data members have a date resolution to the day. For example, monthly time series, though monthly, resolve to the last trading day of the month. However, segment members have a maximum resolution of years because they are not mapped to a calendar in the CRSP/Compustat database. Hence, when range restrictions are applied to segment members, only the 'YYYY' year portion of the range is considered.

  • Multiple dates are sometimes associated with a particular segment member record. In such cases, the preferred date for use in determining the date range restriction is the data year as opposed to the source year. This multiple date behavior is unique only to segment members. All other CRSP data members are associated with only one date.

INSET='setname[,keyfieldname,keyfieldtype,date1field,date2field,datetype]'

When you specify a SAS data set named setname as input for issues, the SASECRSP engine assumes that a default PERMNO field that contains selected CRSP PERMNOs is present in the data set. If optional parameters are used, they must all be specified. The only acceptable shorthand for dropping the parameters is to drop those at the very end, assuming they are all being omitted. Dropped parameters use their defaults.

The optional parameters are explained below:

  • label of the field that contains the keys to be selected. If unspecified, the default is PERMNO.

  • specifies the CRSPAccess key type of the provided keys. Possible key types are: PERMNO, PERMCO, CUSIP, HCUSIP, TICKER, SICCD, GVKEY or INDNO. If unspecified, the default is PERMNO.

  • beginning date of the specific date range restriction being applied to this key. If either date1field or date2field is omitted, the default is for there to be no date range restriction.

  • ending date of the specific date range restriction being applied to this key. If either date1field or date2field is omitted, the default is for there to be no date range restriction.

  • indicates whether the provided beginning and ending dates are calendar dates or fiscal dates. A fiscal date type means the dates given are based on the fiscal calendar of the respective company or GVKEY. A calendar date means the dates are based on the standard Julian calendar.

    The strings 'calendar' and 'fiscal' are used to indicate the respective date types. If unspecified, the default type is calendar.

    It is important to note that fiscal dates are applicable only to members with fiscal data. Fiscal members consists of all period descriptors, items, and segment members of the CCM database. If a fiscal date range is applied to nonfiscal members, it is ignored.

Individual date range restrictions specified by the inset can be used in combination with the RANGE= option on the LIBNAME. In such a case, only data from the intersection of the individual date restriction and the global RANGE= option date restriction are read.