SAS Data Set Reference

Overview

The SAS data set is a SASProprietary file format, which contains data values that are organized as a table of rows (SAS observations) and columns (SAS variables). A supported SAS data set uses the extension .sas7bdat.

Understanding the Driver for Base SAS 

The table services driver for Base SAS is a SASProprietary driver that provides Read and Update access to legacy SAS data sets. With the table services driver for Base, you can create SAS data sets that can be accessed by both the legacy and the table services data access services.
The driver supports much of the Base SAS functionality, such as SAS indexing and general integrity constraints, as well as much of the Federated Query Language (FedSQL) functionality.
The table services driver for Base SAS is an in-process driver, which means that it accesses data in the same process that executes the data access services. All server connections that are made with the table services driver for Base SAS use LOCKTABLE=SHARED and PATH_BIND=ACCESS connection options.

Data Service Connection Options for SAS Data Sets

Connection Options

To access data that is hosted on the table services, a client must submit a connection string, which defines how to connect to the data. The data service connection arguments for a SAS data set include connection options and advanced options. Here is an example:
driver=sql;conopts=(driver=base;catalog=acat;
schema=(name=dblib;primarypath=/u/path/mydir))
The following connection options are supported for SAS data sets:
Option
Description
CATALOG
CATALOG=catalog-identifier;
Specifies an arbitrary identifier for an SQL catalog, which groups logically related schemas. A catalog name can be up to 32 characters long. You must specify a catalog.
Note: The FedSQL language processor automatically quotes SQL identifiers that do not meet the regular naming convention as defined in SAS FedSQL Reference Guide.
DRIVER
DRIVER=BASE;
Identifies the data service to which you want to connect, which is a SAS data set.
Note: You must specify DRIVER=BASE to access a SAS data set.
(SCHEMA) NAME
NAME=schema-identifier;
Specifies an arbitrary identifier for an SQL schema. Any identifier is valid (for example, name=myfiles). The schema identifier is an alias for the physical location of the SAS library, which is much like the Base SAS libref. A schema name must be a valid SAS name and can be up to 32 characters long. You must specify a schema identifier.
PRIMARY PATH
PRIMARYPATH=physical-location;
Specifies the physical location for the SAS library, which is a collection of one or more SAS files. For example, in directory-based operating environments, a SAS library is a group of SAS files that are stored in the same directory. 
Note: You must specify a primary path.
SCHEMA (ATTRIBUTES)
SCHEMA=(attributes);
Specifies schema attributes that are specific to a SAS data set. A schema is a data container object that groups tables. The schema contains a name, which is unique within the catalog that qualifies table names. For a SAS data set, a schema is similar to a SAS library, which is a collection of tables with assigned attributes.

Advanced Options

Advanced driver options are additional options that are not required in order to connect to the data source. They are used to establish connections to catalogs, data source names (DSNs), and schemas. Although advanced options can also be used when connecting to a data service, doing so causes the specified options to apply to all data service connections.
The following advanced options are supported for SAS data sets:
Option
Description
ACCESS
ACCESS=READONLY | TEMP;
  • READONLY Assigns a read-only attribute to the schema. You cannot open a SAS data set to update or write new information.
  • TEMP specifies that the SAS data sets be treated as scratch files. That is, the system will not consume CPU cycles to ensure that the files do not become corrupted.
    Tip
    Use ACCESS=TEMP to save resources only when the data is recoverable. If TEMP is specified, data in memory might not be written to disk on a regular basis. This saves I/O, but could cause a loss of data if there is a crash.
CT_PRESERVE
CT_PRESERVE = STRICT | SAFE | FORCE | FORCE_COL_SIZE
Enables users to control how data types are mapped. Note that data type mapping is disabled when CT_PRESERVE is set to STRICT. If the requested type does not exist on the target database, an error is returned. Here are the options:
  • STRICT The requested type must exist in the target database. No type promotion occurs. If the type does not exist, an error is returned.
  • SAFE Target data types are upscaled only if they do not result in a loss of precision or scale. When character encodings are changed, the new column size is recalculated to ensure all characters can be stored in the new encoding.
  • FORCE This is the default for all drivers. The best corresponding target data type is chosen, even if it could potentially result in a loss of precision or scale. When character encodings are changed, the new column size is recalculated to ensure that all characters can be stored in the new encoding.
  • FORCE_COL_SIZE This option is the same as FORCE, except that the column size for the new encoding is the same as the original encoding. This option can be used to avoid column size creep. However, the resulting column might be too large or too small for the target data.
COMPRESS
COMPRESS=NO | YES | CHAR | BINARY;
Controls the compression of rows in created SAS data sets.
  • NO Specifies that the rows in a newly created SAS data set are uncompressed (fixed-length records). This setting is the default.
  • YES | CHAR Specifies that the rows in a newly created SAS data set are compressed (variable-length records) by using RLE (Run Length Encoding). RLE compresses rows by reducing repeated consecutive characters (including blanks) to two- or three-byte representations.
    Tip
    Use this compression algorithm for character data.
  • BINARY Specifies that the rows in a newly created SAS data set are compressed (variable-length records) by using RDC (Ross Data Compression). RDC combines run-length encoding and sliding-window compression to compress the file.
    Tip
    This method is highly effective for compressing medium to large (several hundred bytes or larger) blocks of binary data (numeric columns). Because the compression function operates on a single record at a time, the record length must be several hundred bytes or larger for effective compression.
DEFAULT_ATTR
DEFAULT_ATTR=(attr=value;...)
Used to specify connection handle or statement handle attributes that are supported for initial connect-time configuration, where attr=value corresponds to any of the following options:
  • CURSORS=n- Connection handle option. This option controls the driver’s use of client-side, result set cursors. The possible values are 0, 1, or 2.
    0 Causes the driver to use client-side static cursor emulation if a scrollable cursor is requested but the database server cannot provide one.
    1 Causes the driver to always use client-side static cursor emulation if a scrollable cursor is requested. The database server’s native cursor is not used.
    2 (Default) Causes the driver to never use client-side static cursor emulation if a scrollable cursor is requested. The database server’s native cursor is used if available. Otherwise, the cursor is forward-only.
    Example: DEFAULT_ATTR=(CURSORS=2)
  • USE_EVP=n - Statement handle option. This option optimizes the driver for large result sets. The possible values are 0 (OFF) or 1 (ON), which is the default. Example: DEFAULT_ATTR=(USE_EVP=0)
  • XCODE_WARN=n - Statement handle option. Used to warn about possible character transcoding errors that occur during row input or output operations. Possible values are 0 (returns an error), 1 (returns a warning), or 2 (ignore transaction errors). 0 is the default. Example: DEFAULT_ATTR=(XCODE_WARN=1)
ENCODING
ENCODING=encoding-value;
Overrides and transcodes the encoding for input or output processing of SAS data sets.
Note: The default value is the current operating system setting.
LOCKTABLE
LOCKTABLE=SHARED|EXCLUSIVE
Places exclusive or shared locks on SAS data sets. You can lock tables only if you are the owner or have been granted the necessary privilege. The default value for the table services is SHARED.
  • SHARED Locks tables in shared mode, allowing other users or processes to read data from the tables, but preventing other users from updating.
  • EXCLUSIVE Locks tables exclusively, preventing other users from accessing any table that you open.
PATH_BIND
PATH_BIND=CONNECT|ACCESS
Specifies when and how schemas are validated during connection. CONNECT validates the entire connection string at the time of connection and returns an error if one or more schemas is invalid. ACCESS validates schemas when they are accessed so that processing continues regardless of errors in the schema portion of the connection string. ACCESS is the default for the table services.