Delimited Files

Overview

In computer programming, a delimited text file is a file in which the individual data values contain embedded delimiters, such as quotation marks, commas, and tabs. A delimiter is a character that separates words or phrases in a text string that defines the beginning or end of a contiguous string of character data.
  • The delimiter is not considered part of the character data string.
  • The first row of data is usually read as column headings.
  • The column headings are then converted to SAS variable names.
  • A line character indicates a new row.
A delimited text file is also called a delimiter-separated values file (CSV or DSM).
Note: Support of delimited files is included in Base SAS. The SAS/ACCESS to PC Files license is not needed to use this list of features.

CSV Files

A comma-separated values file is a form of a delimited file. The data values are separated by commas. In a CSV-type file, each line can represent one of these items:
  • an entry
  • a record
  • a row
  • an observation in a database management system
  • other applications

Tab-Delimited Files

A tab-delimited file is a form of delimited file. The data values are separated by control characters that represent the TAB key. The data values form columns of a database table. The columns can be exported to a database table.

Other Delimiters

Files that have other delimiters such as spaces or semicolons are also known as delimited text files or delimited files.

External File Interface (EFI)

The SAS Import and Export Wizards use the SAS External File Interface methods to read and write data in delimited external files. Be aware of these behaviors when using the wizards and procedures to import or export data in delimited files.
  • When data values are enclosed in quotation marks, delimiters within the value are treated as character data.
  • Quotation marks are removed from character values.
  • Two consecutive delimiters indicate a missing value.
  • A delimiter can be specified as one or more characters.
  • While exporting data, the EXPORT procedure discards items that exceed the output line length. See the DROPOVER= option in the FILE statement in SAS Statements: Reference.
  • The delimiter can be in binary form. For example: delimiter='09'X
  • As the IMPORT procedure reaches the end of the current input data row, variables without any values are set to missing.

IMPORT and EXPORT Procedure Statements for Delimited Files

The supported file types are CSV (comma-separated values), TAB (tab-separated values), and DLM (delimiter-separated values).
IMPORT and EXPORT Procedure Statements
Delimited File Type
Statement Options
PROC Import
PROC Export
Valid Value
Default Value
CSV and TAB
DATAROW
Yes
No
1 to 2147483647
Depends on GETNAMES= option value
GETNAMES
Yes
No
Yes | No
Yes
GUESSINGROWS
Yes
No
1 to 2147483647
20
PUTNAMES
No
Yes
Yes | No
Yes
DLM
DATAROW
Yes
No
1 to 2147483647
2
DELIMITER
Yes
Yes
'char' | 'nn'x
' '
GETNAMES
Yes
No
Yes | No
Yes
GUESSINGROWS
Yes
No
1 to 2147483647
20
PUTNAMES
No
Yes
Yes | No
Yes
DATAROW= n
specifies the row number where the IMPORT procedure starts reading data.
Default:When GETNAMES=NO: 1; when GETNAMES=YES: 2
Range:1 to 2147483647
Restrictions:If GETNAMES = Yes, then DATAROW must be greater than or equal to 2.

If GETNAMES = No, then DATAROW must be greater than or equal to 1.

DELIMITER= 'char' | 'nn'x
specifies the delimiter (either a single character or hexadecimal value) that separates the columns of data for the IMPORT and EXPORT procedures.
Default:A blank character
Restriction:Support only for DLM type files.
GETNAMES= YES | NO
specifies whether the IMPORT procedure is to generate SAS variable names from the data values in the first row of the import file. If a data value in the first record contains special characters that are not valid in a SAS name, SAS converts the character to an underscore. For example, The column name Occupancy Code becomes the SAS variable name Occupancy_Code.
YES specifies that the IMPORT procedure generate SAS variable names from the data values in the first row of the imported Excel file.
NO specifies that the IMPORT procedure generate SAS variable names as F1, F2, F3, and so on.
Default:Yes
Restriction: Valid only for the IMPORT procedure.
GUESSINGROWS= n
specifies the number of rows that the IMPORT procedure is to scan to determine the appropriate data type for the columns. The scan process scans from row 1 to the row number that is specified by GUESSINGROWS = option.
The default row number can be changed in the SAS REGISTRY as follows: From the SAS menu, Solutions then select Accessories then selectRegistry Editor.
When the Registry Editor opens, select Products then selectBASE then selectEFI then selectGuessingRows. This opens the Edit Signed Integer Value window, where you can modify the Value Data item.
Default:20
Range:1 to 2147483647
PUTNAMES= YES | NO
YES specifies that the EXPORT procedure is to do the following tasks:
  • Write the SAS variable names to the first row of the exported data file as column headings.
  • Write the first row of the SAS data set to the second row of the exported data file.
NO specifies that the EXPORT procedure is to write the first row of SAS data set values to the first row of the exported data file.
Default:Yes
Restriction:Valid only for the EXPORT procedure.
Note:If you specify the LABEL= option, the SAS variable labels (not the variable names) are written as column headings.

Example 1: Import a Tab-Delimited File into SAS

This code illustrates how the IMPORT procedure uses the first row of the tab delimited file to generate SAS variable names. SAS starts to read data from row 2, and scans 10 rows of data to determine data types for each column. The invoice.txt file saves data with the tab character ('09'x) as the delimiter.
PROC IMPORT OUT=WORK.TEST
    FILE="&dlmdir.\invoice.txt"
        DBMS=TAB REPLACE;
        GETNAMES=YES;
    DATAROW=2;
    GUESSINGROWS=10;
RUN;

Example 2: Import a Space-Delimited File into SAS

The IMPORT procedure generates generic variable names such as VAR1 and VAR2. It starts to read data from row 2, and scans the default number of rows (20) to determine the data types for each column. '20'x is the hexadecimal value for a space in ASCII code.
 PROC IMPORT OUT=WORK.TEST
         DATAFILE="&dlmdir.\invoice.txt"
         DBMS=DLM REPLACE;
     DELIMITER='20'x;
     GETNAMES=NO;
     DATAROW=2;
RUN;

Example 3: Export a SAS Data Set to a CSV File

The EXPORT procedure exports the SAS data set, SDF.INVOICE, to a CSV file; invoice.csv. The SAS variable name is not used. The first row of the SAS data set is written to the first row of the CSV file.
LIBNAME SDF "&sasdir";
PROC EXPORT DATA=SDF.INVOICE
         OUTFILE='c:\temp\invoice.csv'
         DBMS=DLM REPLACE;
     DELIMITER=',';
     PUTNAMES=NO;
RUN;

Example 4: Import a Subset of a CSV File into SAS

The IMPORT procedure starts to read data in row 6. It reads ten observations from the selected columns in the customer CSV file. The global OBS= option limits the number of data rows to import. The OBS=MAX option resets the OBS= option to the default value.
OPTIONS OBS=10;
PROC IMPORT OUT= WORK.Test (KEEP=Customer_ID Name Address First-Ordered_Date)
         DATAFILE= "&dlmdir.\customer.csv"
         DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=6;
RUN;
OPTIONS OBS=MAX;