Working with Character Variables
This example illustrates why you might want to specify a length for a character variable rather than let the first assigned value determine the length. Because New York City has two airports, the abbreviations for both John F. Kennedy International Airport and La Guardia Airport can be assigned to the Airport variable, as in the following DATA step.
Note: When you create character variables, SAS determines the length of the variable from its first occurrence in the DATA step. Therefore, you must allow for the longest possible value in the first statement that mentions the variable. If you do not assign the longest value the first time the variable is assigned, then data can be truncated.
/* first attempt */
options pagesize=60 linesize=80 pageno=1 nodate;

data aircode;
   set mylib.departures;
   if USGate = 'San Francisco' then Airport = 'SFO';
   else if USGate = 'Honolulu' then Airport = 'HNL';
   else if USGate = 'New York' then Airport = 'JFK or LGA';
run;

proc print data=aircode;
   var Country USGate Airport;
   title 'Country by US Point of Departure';
run;
The following output displays the results:
Truncation of Character Values
                Country by US Point of Departure                1

Obs    Country      USGate           Airport

 1     Japan        San Francisco    SFO
 2     Italy        New York         JFK
 3     Australia    Honolulu         HNL
 4     Venezuela    Miami
 5     Brazil
Only the characters JFK appear in the observation for New York. SAS first encounters Airport in the statement that assigns the value SFO. Therefore, SAS creates Airport with a length of three bytes and uses only the first three characters in the New York observation.
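The truncation described above can be reproduced without the mylib.departures data set. The following is a minimal sketch, not part of the original example: the data set names departures_demo and trunc_demo and the variable AirportLen are hypothetical, and the input rows are illustrative stand-ins. The VLENGTH function returns the length that SAS assigned to the variable at compile time.

```sas
/* Hypothetical stand-in for mylib.departures; values are illustrative only */
data departures_demo;
   input USGate $13.;          /* formatted input keeps embedded blanks */
   datalines;
San Francisco
New York
Honolulu
;
run;

data trunc_demo;
   set departures_demo;
   /* First reference to Airport: SAS sets its length to 3 bytes here */
   if USGate = 'San Francisco' then Airport = 'SFO';
   else if USGate = 'Honolulu' then Airport = 'HNL';
   /* The 10-character value does not fit and is stored as 'JFK' */
   else if USGate = 'New York' then Airport = 'JFK or LGA';
   AirportLen = vlength(Airport);   /* length of Airport as compiled */
run;

proc print data=trunc_demo;
   var USGate Airport AirportLen;
run;
```

In this sketch AirportLen should be 3 in every observation, because the length is fixed when the DATA step is compiled, not when each value is assigned.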
To allow space to write JFK or LGA, use a LENGTH statement as the first reference to Airport. The LENGTH statement is a declarative statement and has the form
LENGTH variable-list $ number-of-bytes;
where variable-list is the variable or variables to which you are assigning the length number-of-bytes. The dollar sign ($) indicates that the variable is a character variable. The LENGTH statement determines the length of a character variable in both the program data vector and the data set that are being created. (In contrast, a LENGTH statement determines the length of a numeric variable only in the data set that is being created.) The maximum length of any character value in SAS is 32,767 bytes.
This LENGTH statement assigns a length of 10 to the character variable Airport:
length Airport $ 10;
Note: If you use a LENGTH statement to assign a length to a character variable, then it must be the first reference to that character variable in the DATA step. Therefore, the best position in the DATA step for a LENGTH statement is immediately after the DATA statement.
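To see why the position of the LENGTH statement matters, the two DATA steps below can be compared. This is a hedged sketch with hypothetical data set names (late_length, early_length) and a hypothetical variable Len; it does not read mylib.departures.

```sas
/* LENGTH appears after the first reference: the length is already 3 */
data late_length;
   Airport = 'SFO';            /* first reference fixes the length at 3 */
   length Airport $ 10;        /* too late to change a character length */
   Airport = 'JFK or LGA';     /* stored truncated to 3 characters */
   Len = vlength(Airport);
run;

/* LENGTH is the first reference: the full value fits */
data early_length;
   length Airport $ 10;        /* declared before any assignment */
   Airport = 'SFO';
   Airport = 'JFK or LGA';     /* all 10 characters are stored */
   Len = vlength(Airport);
run;

proc print data=late_length;  run;
proc print data=early_length; run;
```

For the first step, SAS typically writes a warning to the log that the length of the character variable has already been set. Checking the log after running it is a quick way to catch a misplaced LENGTH statement.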
The following DATA step includes the LENGTH statement for Airport. Remember that you can use the DATASETS procedure to display the length of variables in a SAS data set.
/* correct method */
options pagesize=60 linesize=80 pageno=1 nodate;

data aircode2;
   length Airport $ 10;
   set mylib.departures;
   if USGate = 'San Francisco' then Airport = 'SFO';
   else if USGate = 'Honolulu' then Airport = 'HNL';
   else if USGate = 'New York' then Airport = 'JFK or LGA';
   else if USGate = 'Miami' then Airport = 'MIA';
run;

proc print data=aircode2;
   var Country USGate Airport;
   title 'Country by US Point of Departure';
run;
The following output displays the results:
Using a LENGTH Statement to Capture Complete Variable Information
                Country by US Point of Departure                1

Obs    Country      USGate           Airport

 1     Japan        San Francisco    SFO
 2     Italy        New York         JFK or LGA
 3     Australia    Honolulu         HNL
 4     Venezuela    Miami            MIA
 5     Brazil
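As mentioned earlier, the DATASETS procedure can display the length of variables in a SAS data set. The following sketch shows one way to do that, assuming a small hypothetical data set named length_check in the WORK library rather than the aircode2 data set, so that it runs on its own.

```sas
/* Hypothetical one-observation data set used only to inspect the length */
data length_check;
   length Airport $ 10;
   Airport = 'JFK or LGA';
run;

/* The CONTENTS statement lists each variable with its type and length */
proc datasets library=work nolist;
   contents data=length_check;
run;
quit;
```

The variable listing in the output should report Airport as a character variable with a length of 10. Substituting aircode2 for length_check would apply the same check to the data set created above.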
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.