Data Encryption

Overview

The SAS LASR Analytic Server 2.6 release introduces on-disk encryption for SASHDAT tables. SASHDAT tables are created with the SASHDAT engine or are saved to disk by the SAS LASR Analytic Server. The tables are encrypted with AES using 256-bit keys.
To comply with import and export restrictions, the encryption software is delivered in an installation program that is separate from the program that installs the server. The SAS TKGrid Encryption Extension is installed with the TKGrid_Sec_x86_64.sh file. For information about installation, see SAS High-Performance Analytics Infrastructure: Installation and Configuration Guide.
The data in a SASHDAT file is encrypted. The header is not encrypted, which enables procedures like PROC CONTENTS and components like HDFS browsers to show column information.
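As a minimal sketch of the point about the unencrypted header, the following step lists column information for an encrypted table without supplying a passphrase. It assumes a deployment without metadata, the hdfs libref and the /hps/heart.sashdat table used in the examples later in this section, and a session in which the GRIDHOST and GRIDINSTALLLOC environment variables are already set.

```sas
/* Assumption: GRIDHOST and GRIDINSTALLLOC are already set for the session, */
/* as shown in the examples later in this section.                          */
libname hdfs sashdat path="/hps";

/* Column information is read from the header, which is not encrypted. */
/* No ENCRYPTKEY= value is specified in this step.                      */
proc contents data=hdfs.heart;
run;
```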
Note: If you implement encryption and find that performance suffers due to the processing required to decrypt data, consider anonymizing the data before it is transferred to the cluster and avoiding encryption.
For deployments that use SAS metadata, consider the following items:
  • Passphrases are not preserved during promotion. After the initial import of an encrypted SASHDAT library or server, you must use SAS Management Console to re-apply the passphrase in the target environment.
  • If you export or copy metadata for a SASHDAT library with encryption properties, the encryption key is ignored. You must use SAS Management Console to re-apply the passphrase.

Example: SAS Metadata Environment

In a deployment that uses SAS metadata, such as SAS Visual Analytics, administrators can register the encryption settings in SAS metadata. The metadata for the connection object for the Hadoop server must enable the SAS LASR authorization service. Users must have the Read permission on the SASHDAT table that is registered in metadata. Additional considerations are described in SAS Visual Analytics: Administration Guide.
options set=GRIDHOST="grid001.example.com" set=GRIDINSTALLLOC="/opt/TKGrid";
options metaserver="server.example.com" metaport=8561;  /* 1 */
options metauser=sasdemo metapass="secret";

libname hdfs sashdat path="/hps" signer=
     "https://server.example.com/SASLASRAuthorization";  /* 2 */

data hdfs.heart(replace=yes);  /* 3 */
   set sashelp.heart;
run;

proc lasr create port=10010
    signer="https://server.example.com/SASLASRAuthorization";  /* 4 */
  performance nodes=all;
run;

proc lasr add data=hdfs.heart signer=
    "https://server.example.com/SASLASRAuthorization"  /* 5 */
    signerfilepolicy noclass port=10010 verbose;
run;

libname example sasiola tag="hps"
    signer="https://server.example.com/SASLASRAuthorization";

proc imstat signer="https://server.example.com/SASLASRAuthorization";
  table example.heart;
  save fullpath path="/hps/heart2" signerfilepolicy replace;  /* 6 */
run;
1 The metadata-related options enable the SAS session to communicate with the SAS Metadata Server and to read encryption settings that are stored in metadata.
2 The SIGNER= option is used so that the engine can determine the metadata settings that are associated with a library. This enables the engine to exchange keys with the metadata server for decrypting tables as they are read. The library encryption settings also determine when an in-memory table should be encrypted as it is saved as a SASHDAT file.
3 If a SASHDAT engine library that is registered in metadata specifies encryption settings for a Hadoop server on host grid001.example.com (the GRIDHOST environment variable) and directory /hps (the PATH= option), then the Heart table is read from Sashelp and written to /hps/heart.sashdat in encrypted form.
4 In a metadata environment, a server must be started with the SIGNER= option.
5 The encrypted /hps/heart.sashdat file is decrypted and loaded to memory by the server.
6 Because the /hps directory that is associated with the Hadoop server is associated with encryption settings in SAS metadata (the same circumstance as item 3), the /hps/heart2.sashdat file is created with encryption.

Example: Environment without Metadata

In a deployment that does not use SAS metadata, programmers can specify passphrases themselves.
options set=GRIDHOST="grid001.example.com" set=GRIDINSTALLLOC="/opt/TKGrid";

libname hdfs sashdat path="/hps";

data hdfs.heart(replace=yes encrypt=aes encryptkey="secret");
   set sashelp.heart;
run;

proc lasr create port=10010;
  performance nodes=all;
run;

proc lasr add data=hdfs.heart encryptkey="secret" noclass port=10010 verbose;
run;

libname example sasiola tag="hps";

proc imstat;
  table example.heart;
  save fullpath path="/hps/heart2" encryptkey="moresecret" replace;
run;
If you write programs with passphrases in a metadata environment, it is possible to specify ENCRYPT=AES and the passphrase in the ENCRYPTKEY= option instead of using the SIGNER= option. However, be aware that in the metadata environment, the passphrase is managed at the level of the Hadoop server or SASHDAT library. If you encrypt files in a directory with more than one passphrase, some files cannot be opened in a metadata environment.
As a best practice, if you are working with a metadata environment, using the SIGNER= option and managing passphrases in metadata is simpler than specifying passphrases in programs.
To decrypt a SASHDAT table, the server must read the entire table into memory at load time. Using a WHERE clause with PROC LASR ADD to subset the rows at load time prevents the server from loading the table.
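One way to work within this restriction is to load the whole encrypted table and then subset the rows once they are in memory. The following is a sketch only. It assumes the server, port, passphrase, and tag from the example above, and it uses the Sex column of Sashelp.Heart as an illustrative filter. The WHERE and NUMROWS statements are PROC IMSTAT statements and are not part of the original example.

```sas
/* Load the entire encrypted table. No WHERE clause is used at load time. */
proc lasr add data=hdfs.heart encryptkey="secret" noclass port=10010;
run;

libname example sasiola tag="hps";

/* Subset the in-memory table after the load has completed. */
proc imstat;
  table example.heart;
  where sex="Female";   /* illustrative filter on a Sashelp.Heart column */
  numrows;              /* reports how many rows satisfy the WHERE clause */
run;
quit;
```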

Encryption and Compression

Encrypted SASHDAT files are always uncompressed when they are loaded into the server. You can use compression to conserve disk space for an encrypted SASHDAT file. However, compressing an encrypted SASHDAT file does not conserve memory. Before an encrypted file is loaded, it must be decrypted—and decryption requires that the data be uncompressed.
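The following sketch shows how an encrypted SASHDAT file might also be written in compressed form in an environment without metadata. It reuses the libref and passphrase from the earlier example. SQUEEZE=YES is assumed here to be the SASHDAT engine data set option that requests compression. Confirm the option name against the engine reference for your release before relying on it.

```sas
libname hdfs sashdat path="/hps";

/* Assumption: SQUEEZE=YES requests a compressed SASHDAT file.            */
/* The file takes less disk space, but the table is uncompressed when the */
/* server loads it, so no memory is saved.                                */
data hdfs.heart(replace=yes squeeze=yes encrypt=aes encryptkey="secret");
   set sashelp.heart;
run;
```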