The DATASOURCE Procedure 
OUT= Data Set 
The OUT= data set can contain the following variables:
the BY variables, which identify cross-sectional dimensions when the input data file contains time series replicated for different values of the BY variables. Use the BY variables in a WHERE statement to process the OUT= data set by cross sections. The order in which BY variables are defined in the OUT= data set corresponds to the order in which the data file is sorted.
DATE, a SAS date-, time-, or datetime-valued variable that reports the time period of each observation. The values of the DATE variable may span different time ranges for different BY groups. The format of the DATE variable depends on the INTERVAL= option.
the periodic time series variables, which are included in the OUT= data set only if they have data in at least one selected BY group and they are not discarded by a KEEP or DROP statement.
the event variables, which are included in the OUT= data set if they are not discarded by a KEEP or DROP statement. By default, these variables are not output to the OUT= data set.
The values of BY variables remain constant in each cross section. Observations within each BY group correspond to the sampling of the series variables at the time periods indicated by the DATE variable.
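As a minimal sketch of processing one cross section, the step below prints a single BY group from an OUT= data set. The data set name WORK.DSOUT, the BY variable COUNTRY, and the value '112' are hypothetical; substitute the BY variables that your file type actually defines.

```sas
/* Hypothetical names: WORK.DSOUT is an OUT= data set created earlier by
   PROC DATASOURCE; COUNTRY is assumed to be a character BY variable. */
proc print data=work.dsout;
   where country = '112';   /* select one cross section */
   id date;                 /* DATE identifies the time period of each row */
run;
```

Because the BY variables are constant within a cross section, the WHERE expression selects the whole time series for that group.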
You can create a set of single indexes for the OUT= data set by using the INDEX option, provided there are BY variables. Under some circumstances, this may increase the efficiency of subsequent PROC and DATA steps that use BY and WHERE statements. However, there is a cost associated with creation and maintenance of indexes. The SAS Language Reference: Concepts lists the conditions under which the benefits of indexes outweigh the cost.
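The following sketch shows where the INDEX option might be specified. It assumes an IMF International Financial Statistics file in packed format (FILETYPE=IMFIFSP) read through a fileref. The fileref IFSFILE, the path, the data set name, and the BY variable COUNTRY are all placeholders, and the exact option placement should be checked against the PROC DATASOURCE syntax section.

```sas
/* Placeholder fileref and path: point this at your own data file.
   Host-specific FILENAME options may also be required. */
filename ifsfile 'host-specific-file-name';

/* INDEX requests single indexes on the BY variables of the OUT= data set.
   FILETYPE=, INTERVAL=, and the output name are illustrative assumptions. */
proc datasource filetype=imfifsp infile=ifsfile
                interval=month out=work.dsout index;
run;

/* A later step that subsets by a BY variable may benefit from the index. */
proc means data=work.dsout;
   where country = '112';   /* COUNTRY is assumed to be a BY variable */
run;
```

Whether the index pays off depends on how selective the WHERE expression is and how often the data set is reused.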
With data files containing cross sections, there can be various degrees of overlap among the series variables. One extreme is when all the series variables contain data for all the cross sections. In this case, the output data set is very compact. At the other extreme, the set of time series variables is unique for each cross section, making the output data set very sparse, as depicted in Table 11.4.
BY Variables    Series in first     Series in second    ...             Series in last
                BY group            BY group                            BY group
BY1 ... BYP     F1 F2 F3 ... FN     S1 S2 S3 ... SM     ...             T1 T2 T3 ... TK

BY group 1      DATA is here        missing             ...             missing
BY group 2      missing             DATA is here        ...             missing
...             ...                 ...                 DATA is here    ...
BY group N      missing             missing             ...             DATA is here

(Data is missing everywhere except on the diagonal.)
The data in Table 11.4 can be represented more compactly if cross-sectional information is incorporated into series variable names.
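One way to sketch this idea is to fold the BY value into each series name and merge the cross sections by DATE. The example assumes a hypothetical OUT= data set WORK.DSOUT with a character BY variable COUNTRY, two cross sections ('112' and '146'), series F1 and F2 in the first, and S1 and S2 in the second. None of these names come from a particular file type. It also assumes that observations within each cross section are already in DATE order, as MERGE with a BY statement requires.

```sas
/* Hypothetical sparse input: WORK.DSOUT with BY variable COUNTRY.
   Each cross section keeps only its own series, renamed to carry the
   BY value as a suffix, and the pieces are then matched on DATE. */
data work.compact;
   merge work.dsout(where=(country = '112')
                    keep=country date f1 f2
                    rename=(f1=f1_112 f2=f2_112))
         work.dsout(where=(country = '146')
                    keep=country date s1 s2
                    rename=(s1=s1_146 s2=s2_146));
   by date;
   drop country;   /* the cross section is now encoded in the names */
run;
```

The result has one observation per DATE value, and no cell is missing merely because a series belongs to another cross section. The cost is longer variable names and the loss of the BY variables.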
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.