The DATASOURCE Procedure

Reading in Data Files Containing Cross Sections

Some data files group time series data by cross-section identifiers. For example, International Financial Statistics (IFS) files, distributed by the International Monetary Fund (IMF), group data by country (COUNTRY). Within each country, data are further grouped by Control Source Code (CSC), Partner Country Code (PARTNER), and Version Code (VERSION).

If a data file contains cross-section identifiers, the DATASOURCE procedure adds them to the output data set as BY variables. For example, the data set in Table 12.2 contains three cross sections:

  • Cross section one is identified by (COUNTRY='112' CSC='F' PARTNER=' ' VERSION='Z').

  • Cross section two is identified by (COUNTRY='146' CSC='F' PARTNER=' ' VERSION='Z').

  • Cross section three is identified by (COUNTRY='158' CSC='F' PARTNER=' ' VERSION='Z').

Table 12.2: The Form of a SAS Data Set Containing BY Variables

BY Variables                    Time ID   Time Series
                                Variable  Variables
COUNTRY  CSC  PARTNER  VERSION  DATE      EFFEXR  EXRINDEX
-------  ---  -------  -------  --------  ------  --------
112      F             Z        SEP1987     9326     12685
112      F             Z        OCT1987     9393     12813
112      F             Z        NOV1987     9626     13694
112      F             Z        DEC1987     9675     14099
112      F             Z        JAN1988     9581     13910
112      F             Z        FEB1988     9493     13549
146      F             Z        SEP1987    12046     16192
146      F             Z        OCT1987    12067     16266
146      F             Z        NOV1987    12558     17596
146      F             Z        DEC1987    12759     18301
146      F             Z        JAN1988    12642     18082
146      F             Z        FEB1988    12409     17470
158      F             Z        SEP1987    13841     16558
158      F             Z        OCT1987    13754     16499
158      F             Z        NOV1987    14222     17505
158      F             Z        DEC1987    14768     18423
158      F             Z        JAN1988    14933     18565
158      F             Z        FEB1988    14915     18331
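
As a minimal sketch of how a data set of this form might be produced, the following statements read an IFS file and keep only the three cross sections listed above. The fileref IFSFILE, its path, the output data set name IFS_BY, and the choice of FILETYPE=IMFIFSP (the packed IFS format) are assumptions for illustration; substitute the file type and path that match your own file.

```sas
/* Hypothetical path: point this fileref at your own IFS data file. */
filename ifsfile 'path-to-your-ifs-file';

/* Assumes a monthly IFS file in packed format (FILETYPE=IMFIFSP). */
proc datasource filetype=imfifsp
                infile=ifsfile
                interval=month
                out=ifs_by;
   /* Select cross sections by their identifiers; these identifiers */
   /* become the BY variables of the output data set.               */
   where country in ('112','146','158') and csc='F'
         and partner=' ' and version='Z';
run;

/* Print one BY group per cross section. NOTSORTED avoids an error */
/* if the cross sections are not in ascending order in the file.   */
proc print data=ifs_by;
   by country csc partner version notsorted;
   id date;
run;
```

Because the cross-section identifiers are ordinary variables in the output data set, later steps can process each cross section separately with a BY statement.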


Note that the data sets in Table 12.1 and Table 12.2 represent time series data for three countries in two different ways. The countries are the United Kingdom (COUNTRY='112'), Switzerland (COUNTRY='146'), and Japan (COUNTRY='158'). The first representation (Table 12.1) incorporates each country's name into the series names, while the second representation (Table 12.2) treats the countries as separate cross sections identified by the BY variable COUNTRY. For more information, see the section "Time Series and SAS Data Sets" in Chapter 3, "Working with Time Series Data."
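
The relationship between the two representations can be illustrated with a short sketch that converts a BY-variable data set into one where the cross-section identifier is part of each series name. The data set names BY_FORM and WIDE_EFFEXR and the generated series names (EFFEXR_112 and so on) are hypothetical and are not the names used in Table 12.1; only the first two months of each cross section from Table 12.2 are entered.

```sas
/* Rebuild a small part of Table 12.2 (first two months per country). */
data by_form;
   length country $3 csc $1 partner $1 version $1;
   input country $ csc $ version $ date :monyy7. effexr exrindex;
   partner = ' ';              /* PARTNER is blank in all three cross sections */
   format date monyy7.;
   datalines;
112 F Z SEP1987 9326 12685
112 F Z OCT1987 9393 12813
146 F Z SEP1987 12046 16192
146 F Z OCT1987 12067 16266
158 F Z SEP1987 13841 16558
158 F Z OCT1987 13754 16499
;
run;

/* PROC TRANSPOSE needs the data ordered by the new BY variable (DATE). */
proc sort data=by_form;
   by date country;
run;

/* One observation per date, one EFFEXR series per country. */
proc transpose data=by_form
               out=wide_effexr(drop=_name_)
               prefix=EFFEXR_;
   by date;
   id country;                 /* country code becomes part of the series name */
   var effexr;
run;
```

The resulting data set has one observation per DATE and one variable per country, which is the layout that embeds the cross-section identifier in the series name instead of storing it in a BY variable.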