In directives where Spark is not the preferred runtime target,
character columns are truncated to the length that is specified in the
Maximum length for SAS columns field. This field is available in the
General Preferences panel of the Configuration window.
The default value is 1024 characters. Source columns with string data
types such as VAR and VARCHAR are truncated in SAS when their length
exceeds the specified limit. The truncation occurs when SAS reads
source columns into memory.
In Spark-enabled directives, the truncation of string columns depends
on whether the source column returns a length. Hive releases prior to
0.14.0 do not return a length for VAR and VARCHAR columns.
When Spark is enabled and string columns return a length, strings are
truncated according to the value of the configuration option
EXPRESS_MAX_STRING_LENGTH. The value of the Maximum length for SAS
columns field is ignored.
When Spark is enabled and string columns do not return a length, the
maximum string length is the lesser of two values: the configuration
option EXPRESS_MAX_STRING_LENGTH and the Maximum length for SAS columns
field.
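The three truncation cases above can be summarized in a short sketch. This is an illustration only, not product code: the function name effective_max_length and its parameters are hypothetical, the 5 MB default is assumed to mean 5 * 1024 * 1024, and both limits are assumed to be expressed in the same unit.

```python
# Illustrative sketch of the truncation rules described above.
# All names here are hypothetical; they are not part of any SAS API.

DEFAULT_SAS_MAX_LENGTH = 1024            # default of "Maximum length for SAS columns"
DEFAULT_EXPRESS_MAX = 5 * 1024 * 1024    # 5 MB default, assumed to be 5 * 1024 * 1024


def effective_max_length(
    spark_enabled: bool,
    source_returns_length: bool,
    sas_max_length: int = DEFAULT_SAS_MAX_LENGTH,
    express_max: int = DEFAULT_EXPRESS_MAX,
) -> int:
    """Return the limit that governs string truncation for one column."""
    if not spark_enabled:
        # Spark is not the preferred runtime target: the SAS field applies.
        return sas_max_length
    if source_returns_length:
        # Spark enabled, length known: the SAS field is ignored.
        return express_max
    # Spark enabled, length unknown: the lesser of the two limits applies.
    return min(express_max, sas_max_length)
```

With the default values, a column that does not return a length is limited to 1024 in a Spark-enabled directive, because the field value is smaller than the configuration option.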
The default value of
the EXPRESS_MAX_STRING_LENGTH configuration option is 5 MB. To specify
a different value, ask your Hadoop administrator to update the app.cfg
file on each node that runs the SAS Data Management Accelerator for
Spark. In those files, add or update the label/value pair for EXPRESS_MAX_STRING_LENGTH.
Note: The value of EXPRESS_MAX_STRING_LENGTH
also specifies the maximum amount of memory that is allocated for
the underlying expression. For this reason, Hadoop administrators
should change the default value with care.
VAR and VARCHAR columns
that do not return a length are converted to the STRING type in the
target so that they can receive a default length. To retain the original
column types, use the Manage Columns task in the directive. In Manage
Columns, set the type of the target column to VAR or VARCHAR and
specify a length.