Data Preparation

Width Limitation for Text Fields

Character variables in SAS data sets cannot exceed 32K (32,767 bytes). As a result, when you use the TEXTPARSE statement in a SAS program, each text field is limited to 32K.
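
Before loading documents into a SAS table, it can help to check which ones would exceed this limit. The following is a minimal Python sketch, not part of SAS: the function name find_oversized and the constant SAS_CHAR_MAX_BYTES are hypothetical, and it assumes the documents are available as Python strings and that you know the encoding the table will use. The size is measured in bytes, so the same text can fit in one encoding and not in another.

```python
# Assumption: 32,767 bytes is the maximum length of a SAS character variable.
SAS_CHAR_MAX_BYTES = 32767


def find_oversized(texts, encoding="utf-8", limit=SAS_CHAR_MAX_BYTES):
    """Return (index, byte_length) for each text whose encoded size exceeds limit."""
    oversized = []
    for index, text in enumerate(texts):
        # Measure the encoded size in bytes, not the number of characters.
        byte_length = len(text.encode(encoding))
        if byte_length > limit:
            oversized.append((index, byte_length))
    return oversized
```

Documents reported by this check would need to be shortened or split into several rows before they are stored in a single text field.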

Encoding

The linguistic data files and libraries are sensitive to the NLS encoding. The encoding used in the TEXTPARSE statement is derived from the encoding of the active table that you parse.

A Document ID Variable Is Required

The TEXTPARSE statement requires that each document be uniquely identified by a column in the table. Because each row is treated as a separate document, this column serves as a record ID for the table.
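
The uniqueness requirement can be verified before the table is loaded. The following is a minimal Python sketch under stated assumptions: the rows are represented as dictionaries, and the function names duplicate_ids and add_sequential_ids, as well as the column name doc_id, are hypothetical and not part of SAS.

```python
from collections import Counter


def duplicate_ids(rows, id_column):
    """Return the sorted ID values that appear in more than one row."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def add_sequential_ids(rows, id_column="doc_id"):
    """Return copies of the rows with a unique, 1-based ID in id_column."""
    # Any existing value in id_column is overwritten in the copy;
    # the input rows themselves are left unchanged.
    return [{**row, id_column: number} for number, row in enumerate(rows, start=1)]
```

If duplicate_ids returns an empty list, the existing column already identifies each document uniquely. Otherwise, a row number such as the one assigned by add_sequential_ids is one simple way to create a suitable ID column.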