About Creating a SAS Data Set with a DATA Step

Creating a SAS Data File or a SAS View

You can create either a SAS data file (a data set that holds actual data) or a SAS view (a data set that references data that is stored elsewhere). By default, you create a SAS data file. To create a SAS view instead, use the VIEW= option in the DATA statement. With a SAS view, you can process current input data values, such as monthly sales figures, without having to edit your DATA step. Whenever you create output, the output from a SAS view reflects the current input data values.
The following DATA statement creates a SAS view called MONTHLY_SALES.
data monthly_sales / view=monthly_sales;
The following DATA statement creates a data file called TEST_RESULTS.
data test_results;
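As a minimal sketch of how a view picks up current input values, the following step defines MONTHLY_SALES as a view over an external file and then prints it. The variable names Region and Amount are assumptions made for illustration, and 'your-input-file' is a placeholder for your own file.

```sas
/* Define the view. Region and Amount are hypothetical variable names. */
data monthly_sales / view=monthly_sales;
   infile 'your-input-file';      /* placeholder for your external file */
   input Region $ Amount;
run;

/* Referencing the view runs the stored DATA step against the file as it is now. */
proc print data=monthly_sales;
run;
```

The DATA step that defines the view does not read the file when it is submitted. The file is read each time the view is referenced, here by PROC PRINT, so rerunning only the PROC PRINT step after the file changes shows the new values.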

Sources of Input Data

You select data-reading statements based on the source of your input data. There are at least six sources of input data:
  • raw data in an external file
  • raw data in the jobstream (instream data)
  • data in SAS data sets
  • data that is created by programming statements
  • data that you can remotely access through FTP, a TCP/IP socket, a SAS catalog entry, or a URL
  • data that is stored in a Database Management System (DBMS) or other vendor's data files.
Usually, DATA steps read input data records from only one of the first three sources of input. However, DATA steps can use a combination of some or all of the sources.

Reading Raw Data: Examples

Example 1: Reading External File Data

The components of a DATA step that produce a SAS data set from raw data stored in an external file are outlined here.
data Weight;  1
   infile 'your-input-file';  2
   input IDnumber $ week1 week16;  3
   WeightLoss=week1-week16;  4
run;  5

proc print data=Weight;  6
run;  7
1 Begin the DATA step and create a SAS data set called WEIGHT.
2 Specify the external file that contains your data.
3 Read a record and assign values to three variables.
4 Calculate a value for variable WeightLoss.
5 Execute the DATA step.
6 Print data set WEIGHT using the PRINT procedure.
7 Execute the PRINT procedure.
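One variation, sketched here on the assumption that you prefer to name the file once and refer to it by a fileref: the FILENAME statement associates a fileref with the external file, and the INFILE statement then uses the fileref instead of the quoted file name. The fileref RAWDATA is a name chosen for illustration.

```sas
filename rawdata 'your-input-file';   /* RAWDATA is an illustrative fileref */

data Weight;
   infile rawdata;                    /* refer to the file by its fileref */
   input IDnumber $ week1 week16;
   WeightLoss=week1-week16;
run;
```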

Example 2: Reading Instream Data Lines

This example reads raw data from instream data lines.
data Weight2;  1
   input IDnumber $ week1 week16;  2
   WeightLoss2=week1-week16;  3
   datalines;  4
2477 195 163
2431 220 198
2456 173 155
2412 135 116
;  5
proc print data=Weight2;  6
run;  7
1 Begin the DATA step and create SAS data set WEIGHT2.
2 Read a data line and assign values to three variables.
3 Calculate a value for variable WeightLoss2.
4 Begin the data lines.
5 Signal end of data lines with a semicolon and execute the DATA step.
6 Print data set WEIGHT2 using the PRINT procedure.
7 Execute the PRINT procedure.

Example 3: Reading Instream Data Lines with Missing Values

You can also take advantage of options in the INFILE statement when you read instream data lines. This example shows the use of the MISSOVER option, which assigns missing values to variables for records that contain no data for those variables.
data weight2;
   infile datalines missover;  1
   input IDnumber $ Week1 Week16;  
   WeightLoss2=Week1-Week16;
   datalines;  2
2477 195  163
2431 
2456 173  155
2412 135  116
;  3

proc print data=weight2;  4
run;  5
1 Use the MISSOVER option so that, when a record ends before the INPUT statement has read a value for every variable, the remaining variables are assigned missing values.
2 Begin data lines.
3 Signal end of data lines and execute the DATA step.
4 Print data set WEIGHT2 using the PRINT procedure.
5 Execute the PRINT procedure.
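For contrast, here is a sketch of the same step without MISSOVER. The data set name WEIGHT2_FLOWOVER is made up for this illustration. Under the default behavior (FLOWOVER), when the INPUT statement reaches the end of a data line before every variable has a value, it continues reading on the next line. The short record for 2431 would therefore be expected to take its Week1 and Week16 values from the line that follows it, and that following line would not produce an observation of its own.

```sas
data weight2_flowover;
   input IDnumber $ Week1 Week16;   /* no INFILE options: default FLOWOVER behavior */
   WeightLoss2=Week1-Week16;
   datalines;
2477 195  163
2431 
2456 173  155
2412 135  116
;

proc print data=weight2_flowover;
run;
```

Comparing this output with the MISSOVER output is a quick way to check which behavior a program needs. SAS also writes a note to the log when INPUT goes to a new line in this way.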

Example 4: Using Multiple Input Files in Instream Data

This example shows how to use multiple input files whose locations are supplied as instream data. The DATA step reads a file location from each data line, reads the records in that file, and creates the ALL_ERRORS SAS data set. The program then sorts the observations by Station and creates a sorted data set called SORTED_ERRORS. The PRINT procedure prints the results.

data all_errors;
   length filelocation $ 60;
   input filelocation;  /* reads instream data */
   infile daily filevar=filelocation
                filename=daily end=done;
   do while (not done);
      input Station $ Shift $ Employee $ NumberOfFlaws;
      output;
   end;
   put 'Finished reading ' daily=;
   datalines;
pathmyfile_A
pathmyfile_B
pathmyfile_C
;

proc sort data=all_errors out=sorted_errors;
   by Station;
run;

proc print data = sorted_errors;
   title 'Flaws Report sorted by Station';
run;
Multiple Input Files in Instream Data

Reading Data from SAS Data Sets

This example reads data from one SAS data set, generates a value for a new variable, and creates a new data set.
data average_loss;  1  
   set weight;  2
   Percent=round((WeightLoss * 100) / Week1);  3
run;  4
1 Begin the DATA step and create a SAS data set called AVERAGE_LOSS.
2 Read an observation from SAS data set WEIGHT.
3 Calculate a value for variable Percent.
4 Execute the DATA step.

Generating Data from Programming Statements

You can create data for a SAS data set by generating observations with programming statements rather than by reading data. A DATA step that reads no input goes through only one iteration.
data investment;  1  
   begin='01JAN1990'd;
   end='31DEC2009'd;
   do year=year(begin) to year(end);  2  
      Capital+2000 + .07*(Capital+2000);
      output;  3     
   end;
   put 'The number of DATA step iterations is '_n_;  4 
run;  5     

proc print data=investment;  6   
   format Capital dollar12.2;  7  
run;  8  
1 Begin the DATA step and create a SAS data set called INVESTMENT.
2 Calculate a value based on a $2,000 capital investment and 7% interest each year from 1990 to 2009. Calculate variable values for one observation per iteration of the DO loop.
3 Write each observation to data set INVESTMENT.
4 Write a note to the SAS log showing that the DATA step iterates only once.
5 Execute the DATA step.
6 To see your output, print the INVESTMENT data set with the PRINT procedure.
7 Use the FORMAT statement to write numeric values with dollar signs, commas, and decimal points.
8 Execute the PRINT procedure.