Reading Raw Data |
Definitions |
A numeric value contains only numbers, and sometimes a decimal point, a minus sign, or both. When numeric values are read into a SAS data set, they are stored in the floating-point format native to the operating environment. Nonstandard numeric values can contain other characters in addition to numbers; you can use formatted input to enable SAS to read them.
Standard data are character or numeric values that can be read with list, column, formatted, or named input. Examples of standard numeric data, such as 23, 23.0, and -23, are listed in the table in the Numeric Data section.
Nonstandard data is data that can be read only with the aid of informats. Examples of nonstandard data include numeric values that contain commas, dollar signs, or blanks; date and time values; and hexadecimal and binary values.
Numeric Data |
Numeric data can be represented in several ways. SAS can read standard numeric values without any special instructions. To read nonstandard values, SAS requires special instructions in the form of informats. The following table, Reading Different Types of Numeric Data, shows standard, nonstandard, and invalid numeric data values and the special tools, if any, that are required to read them. For complete descriptions of all SAS informats, see SAS Language Reference: Dictionary.
Example of Numeric Data | Description | Solution Required to Read
---|---|---
Standard Numeric Data | |
23 | input right aligned | None needed
23 | input not aligned | None needed
23 | input left aligned | None needed
00023 | input with leading zeros | None needed
23.0 | input with decimal point | None needed
2.3E1 | in E-notation, 2.30x10^1 | None needed
230E-1 | in E-notation, 230x10^-1 | None needed
-23 | minus sign for negative numbers | None needed
Nonstandard Numeric Data | |
2 3 | embedded blank | COMMA. or BZ. informat
- 23 | embedded blank | COMMA. or BZ. informat
2,341 | comma | COMMA. informat
(23) | parentheses | COMMA. informat
C4A2 | hexadecimal value | HEX. informat
1MAR90 | date value | DATE. informat
Invalid Numeric Data | |
23 - | minus sign follows number | Put the minus sign before the number or solve programmatically (see table note 1).
.. | double instead of single periods | Code missing values as a single period, or use the ?? modifier in the INPUT statement to code any invalid input value as a missing value.
J23 | not a number | Read as a character value, or edit the raw data to change it to a valid number.
TABLE NOTE 1: It might be possible to use the S370FZDTw.d informat, but positive values require the trailing plus sign (+).
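
As an illustration of the nonstandard rows in the table, the following minimal sketch reads four of those values with formatted input. The data set name, variable names, and column positions are assumptions chosen for this example only; the two-digit year in the date is interpreted according to the YEARCUTOFF= system option in effect.

```sas
/* Sketch only: data set name, variable names, and columns are hypothetical. */
data nonstandard;
   /* Pointer controls (@n) move to the column where each field starts. */
   input @1  amount  comma7.   /* 2,341  is read as  2341           */
         @9  loss    comma6.   /* (23)   is read as  -23            */
         @16 hexval  hex4.     /* C4A2   is read as a hex integer   */
         @21 saledt  date7.;   /* 1MAR90 is read as a SAS date      */
   format saledt date9.;       /* display the date in readable form */
   datalines;
2,341   (23)   C4A2 1MAR90
;
run;

proc print data=nonstandard;
run;
```

Without the informats, each of these fields would be flagged as invalid numeric data and the variables would be set to missing.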
Remember the following rules for reading numeric data:
Parentheses or a minus sign preceding the number (without an intervening blank) indicates a negative value.
Leading zeros and the placement of a value in the input field do not affect the value assigned to the variable. Leading zeros and leading and trailing blanks are not stored with the value. Unlike some languages, SAS does not read trailing blanks as zeros by default. To cause trailing blanks to be read as zeros, use the BZ. informat described in SAS Language Reference: Dictionary.
Numeric data can have leading and trailing blanks but cannot have embedded blanks (unless they are read with a COMMA. or BZ. informat).
To read decimal values from input lines that do not contain explicit decimal points, indicate where the decimal point belongs by using a decimal parameter with column input or an informat with formatted input. See the full description of the INPUT statement in SAS Language Reference: Dictionary for more information. An explicit decimal point in the input data overrides any decimal specification in the INPUT statement.
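
The rules above can be sketched in a short DATA step. This is an illustrative example under assumed names and column positions, not a prescribed method: it shows an implied decimal with column input, an explicit decimal point overriding the informat's decimal specification, the ?? modifier applied to an invalid value, and the BZ. informat converting an embedded blank to zero.

```sas
/* Sketch only: data set name, variable names, and columns are hypothetical. */
data decimals;
   input price 1-5 .2        /* no decimal point in data: 2314 becomes 23.14 */
         @7  cost  5.2       /* explicit decimal point in data overrides .2  */
         @13 score ?? 3.     /* invalid value (..) is set to missing quietly */
         @17 code  bz3.;     /* embedded blank is read as zero: 2 3 is 203   */
   datalines;
 2314 23.14 ..  2 3
;
run;

proc print data=decimals;
run;
```

The ?? modifier suppresses both the invalid-data note in the log and the automatic setting of _ERROR_ for that variable, so it is best reserved for fields where invalid values are expected.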
Character Data |
A value that is read with an INPUT statement is assumed to be a character value if one of the following is true:
A dollar sign ($) follows the variable name in the INPUT statement.
The variable has been previously defined as character: for example, in a LENGTH statement, in the RETAIN statement, by an assignment statement, or in an expression.
Input data that you want to store in a character variable can include any character. Use the guidelines in the following table when your raw data includes leading blanks and semicolons.
Characters in the Data | What to Use | Reason
---|---|---
leading or trailing blanks that you want to preserve | formatted input and the $CHARw. informat | List input trims leading and trailing blanks from a character value before the value is assigned to a variable.
semicolons in instream data | DATALINES4 or CARDS4 statements and four semicolons (;;;;) to mark the end of the data | With the normal DATALINES and CARDS statements, a semicolon in the data prematurely signals the end of the data.
delimiters, blank characters, or quoted strings | DSD option, with DLM= or DLMSTR= option on the INFILE statement | These options enable SAS to read a character value that contains a delimiter within a quoted string; these options can also treat two consecutive delimiters as a missing value and remove quotation marks from character values.
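
The guidelines in the table can be sketched as follows. The data set names, variable names, and data values are made up for illustration. The first step combines the $CHARw. informat with DATALINES4 so that leading blanks and an embedded semicolon survive; the second uses the DSD and DLM= options to read a quoted value that contains the delimiter.

```sas
/* Sketch only: names and data values are hypothetical. */
data notes;
   /* The same columns are read twice for comparison:          */
   /* $CHAR10. keeps leading blanks, $10. trims them.          */
   input @1 raw $char10. @1 trimmed $10.;
   datalines4;
   a;b
;;;;
run;

data quoted;
   /* DSD: commas inside quotation marks are data, the quotation marks  */
   /* are removed, and two consecutive commas mean a missing value.     */
   infile datalines dsd dlm=',';
   length name $20 city $10;
   input name $ city $ amount;
   datalines;
"Smith, J",Cary,23
Jones,,41
;
run;
```

In the second step, the first record yields the name Smith, J without quotation marks, and the second record yields a missing value for city.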
Remember the following when reading character data:
In a DATA step, when you place a dollar sign ($) after a variable name in the INPUT statement, character data that is read from data lines remains in its original case. If you want SAS to read data from data lines as uppercase, use the CAPS system option or the $UPCASE informat.
If the value is shorter than the length of the variable, SAS adds blanks to the end of the value to give the value the specified length. This process is known as padding the value with blanks.
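
A brief sketch of both points, using assumed names: the $UPCASE informat converts the value to uppercase as it is read, and the bracketed copy makes the padding blanks visible.

```sas
/* Sketch only: data set and variable names are hypothetical. */
data padded;
   length name $8;
   input name $upcase8.;           /* cary is stored as CARY           */
   shown = '[' || name || ']';     /* brackets expose trailing blanks  */
   datalines;
cary
;
run;

proc print data=padded;
run;
```

Because name has a length of 8 and the value has only four characters, the stored value carries four trailing blanks, which appear between the brackets in shown.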
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.