Ways to Create Variables

Overview

These are some of the most common ways that you can create variables in a DATA step:
  • Use an assignment statement.
  • Read data with the INPUT statement in a DATA step.
  • Specify a new variable in a FORMAT or INFORMAT statement.
  • Specify a new variable in a LENGTH statement.
  • Specify a new variable in an ATTRIB statement.
  • Use the IN= data set option.
Note: This list is not exhaustive. For example, the SET, MERGE, MODIFY, and UPDATE statements can also create variables.

Using an Assignment Statement

In a DATA step, you can create a new variable and assign it a value by using it for the first time on the left side of an assignment statement. SAS determines the length of a variable from its first occurrence in the DATA step. The new variable gets the same type and length as the expression on the right side of the assignment statement.
When the type and length of a variable are not explicitly set, SAS gives the variable a default type and length, as shown in the examples in the following table.
Table: Resulting Variable Types and Lengths Produced When They Are Not Explicitly Set

Expression: Numeric variable
   Example: length a 4; x=a;
   Resulting Type of X: Numeric variable
   Resulting Length of X: 8
   Explanation: Default numeric length (8 bytes unless otherwise specified)

Expression: Character variable
   Example: length a $ 4; x=a;
   Resulting Type of X: Character variable
   Resulting Length of X: 4
   Explanation: Length of source variable

Expression: Character literal
   Example: x='ABC'; x='ABCDE';
   Resulting Type of X: Character variable
   Resulting Length of X: 3
   Explanation: Length of first literal encountered

Expression: Concatenation of variables
   Example: length a $ 4 b $ 6 c $ 2; x=a||b||c;
   Resulting Type of X: Character variable
   Resulting Length of X: 12
   Explanation: Sum of the lengths of all variables

Expression: Concatenation of variables and literal
   Example: length a $ 4; x=a||'CAT'; x=a||'CATNIP';
   Resulting Type of X: Character variable
   Resulting Length of X: 7
   Explanation: Sum of the lengths of variables and literals encountered in first assignment statement

If a variable appears for the first time on the right side of an assignment statement, SAS assumes that it is a numeric variable and that its value is missing. If no later statement gives it a value, SAS prints a note in the log that the variable is uninitialized.
Note: A RETAIN statement initializes a variable and can assign it an initial value, even if the RETAIN statement appears after the assignment statement.
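The following is a minimal sketch of the rules described in this section, not a definitive test. It creates X from a character literal and uses the VLENGTH function to report the resulting length in the log. The variable names X, LEN, Y, and Z are chosen for illustration only.

```sas
data _null_;
   x='ABC';          /* first occurrence: X becomes character with length 3 */
   x='ABCDE';        /* value is truncated on the right to fit length 3 */
   len=vlength(x);   /* VLENGTH returns the compile-time length of X */
   put x= len=;      /* expected in the log: x=ABC len=3 */

   z=y+1;            /* Y first appears on the right side: numeric and missing */
   put y= z=;        /* both Y and Z are missing */
run;
```

Because Y never receives a value, the log for this step should also contain the note that Y is uninitialized.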

Reading Data with the INPUT Statement in a DATA Step

When you read raw data in SAS by using an INPUT statement, you define variables based on positions in the raw data. You can use one of the following methods with the INPUT statement to provide information to SAS about how the raw data is organized:
  • column input
  • list input (simple or modified)
  • formatted input
  • named input
See SAS Formats and Informats: Reference for more information about using each method.
The following example uses simple list input to create a SAS data set named GEMS and defines four variables based on the data provided:
data gems;
   input Name $ Color $ Carats Owner $;
   datalines;
emerald green 1 smith
sapphire blue 2 johnson
ruby red 1 clark
;
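For comparison, here is a sketch of the same data read with column input instead of list input. It assumes that the records have been realigned so that each field occupies fixed columns. The data set name GEMS_COL and the column ranges are assumptions made for this illustration.

```sas
data gems_col;
   /* Column input: each variable is read from a fixed range of columns */
   input Name $ 1-8 Color $ 10-14 Carats 16 Owner $ 18-24;
   datalines;
emerald  green 1 smith
sapphire blue  2 johnson
ruby     red   1 clark
;
```

With column input, the length of each character variable comes from the width of its column range, so NAME has a length of 8 here.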

Specifying a New Variable in a FORMAT or an INFORMAT Statement

You can create a variable and specify its format or informat with a FORMAT or an INFORMAT statement. For example, the following DATA step creates a variable named SALE_PRICE with a format of 6.2 in a new data set named SALES:
data sales;
   Sale_Price=49.99;  
   format Sale_Price 6.2;
run;
SAS creates a numeric variable with the name SALE_PRICE and a length of 8.
See SAS Formats and Informats: Reference for more information about using the FORMAT and INFORMAT statements.
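As a hedged variation on the example above, the sketch below places the FORMAT statement before the assignment, so that the FORMAT statement is the first reference to the variable. The data set name SALES_FMT is hypothetical. PROC CONTENTS is included only as one way to inspect the resulting type, length, and format.

```sas
data sales_fmt;
   format Sale_Price 6.2;   /* first reference: creates numeric Sale_Price */
   Sale_Price=49.99;        /* assigns a value to the existing variable */
run;

proc contents data=sales_fmt;   /* lists the type, length, and format */
run;
```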

Specifying a New Variable in a LENGTH Statement

You can use the LENGTH statement to create a variable and set the length of the variable, as in the following example:
data sales; 
   length Salesperson $20;
run; 
For character variables, you must specify the longest length that you will need in the first statement that uses the variable, because you cannot change the length with a subsequent LENGTH statement within the same DATA step. The maximum length of any character variable in SAS is 32,767 bytes. For numeric variables, you can change the length of the variable by using a subsequent LENGTH statement.
When SAS assigns a value to a character variable, it pads the value with blanks or truncates the value on the right side, if necessary, to make it match the length of the target variable. Consider the following statements:
length address1 address2 address3 $ 200;
address3=address1||address2;
Because the length of ADDRESS3 is 200 bytes, only the first 200 bytes of the concatenation (the value of ADDRESS1) are assigned to ADDRESS3. You might be able to avoid this problem by using the TRIM function to remove trailing blanks from ADDRESS1 before performing the concatenation, as follows:
address3=trim(address1)||address2;
For more information, see LENGTH Statement in SAS Statements: Reference.
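The truncation described above can be observed with a short sketch such as the following. The address values are invented for illustration, and the blank literal between the two parts is an addition that is not in the statements shown earlier.

```sas
data _null_;
   length address1 address2 address3 $ 200;
   address1='100 Main Street';   /* hypothetical value, padded to 200 bytes */
   address2='Apt 4';             /* hypothetical value */

   /* ADDRESS1 fills all 200 bytes, so ADDRESS2 is truncated away */
   address3=address1||address2;
   put address3=;                /* log: address3=100 Main Street */

   /* TRIM removes trailing blanks, so ADDRESS2 fits */
   address3=trim(address1)||' '||address2;
   put address3=;                /* log: address3=100 Main Street Apt 4 */
run;
```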

Specifying a New Variable in an ATTRIB Statement

The ATTRIB statement enables you to specify one or more of the following variable attributes for an existing variable:
  • FORMAT=
  • INFORMAT=
  • LABEL=
  • LENGTH=
If the variable does not already exist, one or more of the FORMAT=, INFORMAT=, and LENGTH= attributes can be used to create a new variable. For example, the following DATA step creates a variable named FLAVOR in a data set named LOLLIPOPS:
data lollipops;
   Flavor="Cherry";
   attrib Flavor format=$10.;
run;
Note: You cannot create a new variable by using a LABEL statement or the ATTRIB statement's LABEL= attribute by itself. Labels can be applied only to existing variables.
For more information, see ATTRIB Statement in SAS Statements: Reference.
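As a further sketch, the step below uses the ATTRIB statement as the first reference to the variable, so that the LENGTH= attribute creates it and LABEL= is applied in the same statement. The data set name LOLLIPOPS_ATTRIB and the label text are assumptions for illustration.

```sas
data lollipops_attrib;
   /* LENGTH= creates the character variable; LABEL= alone could not */
   attrib Flavor length=$10 label='Lollipop flavor';
   Flavor='Cherry';
run;
```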

Using the IN= Data Set Option

The IN= data set option creates a special Boolean variable that indicates whether the data set contributed data to the current observation. The variable has a value of 1 when true, and a value of 0 when false. You can use IN= on the SET, MERGE, and UPDATE statements in a DATA step.
The following example shows a merge of the OLD and NEW data sets where the IN= option is used to create a variable named X that indicates whether the NEW data set contributed data to the observation:
data master missing;
   merge old new(in=x);
   by id;
   if x=0 then output missing;
   else output master;
run;
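To try the merge above, you need OLD and NEW data sets that are sorted by ID. The following sketch builds two small ones with invented values. The variable names VALUE_OLD and VALUE_NEW are hypothetical.

```sas
data old;
   input id value_old;
   datalines;
1 10
2 20
3 30
;

data new;
   input id value_new;
   datalines;
1 11
3 33
;

data master missing;
   merge old new(in=x);
   by id;
   if x=0 then output missing;   /* NEW did not contribute: id 2 */
   else output master;           /* NEW contributed: id 1 and id 3 */
run;
```

With this input, MASTER should contain the observations for ID 1 and 3, and MISSING should contain the observation for ID 2. The IN= variable X is available only during the DATA step and is not written to either output data set.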