Resources

What's New Table of Contents  

What's New in SAS Scalable Performance Data Server 4.3

Overview

This document summarizes enhancements and upgrades contained in SAS Scalable Performance Data Server 4.3. To view the enhancements and upgrades that were contained in SPD Server 4.4, see What's New in SAS Scalable Performance Data Server 4.4.


SAS 9.1.3 Compatibility and Large Table Support

SPD Server 4.3 is compatible with the improved I/O infrastructure of SAS 9.1.3.

SPD Server 4.3 provides on-disk structures that are compatible with SAS®9 and the large table capacities that it supports. Enterprise-wide data mining often creates immense tables. In order to generate business intelligence quickly, the ability to update tables that contain billions of rows is more important then ever. Previous versions of SPD Server were based on 32-bit architecture that supported just over 2 billion rows and 32,768 columns. SPD Server 4.3 is based on a 64-bit architecture which supports tables with over 9 quintillion rows and over 2 billion columns.

The architectural differences between SAS®9 and earlier versions mean that SPD Server 4.3 cannot access SPD Server 3.x stores, and vice versa. For more information on sharing SPD Server 3.x and SPD Server 4.3 data stores, see the chapter, "SPD Server 3.x and SPD Server 4.x Compatibility" in the in the online SPD Server 4.4 Administrator's Guide, located at http://support.sas.com/documentation/onlinedoc/91pdf/index_913.html.


SPD Server 4.3 and SAS 9.1.3 Password Encoding

SPD Server 4.3 supports the integration of the SAS 9.1.3 PROC PWENCODE. This permits scripts to be generated that do not explicitly contain secure passwords that could easily be used without authorization. You must run PROC PWENCODE in Base SAS to enable the usage of script password encoding within SPD Server 4.3. See the Base SAS documentation for detailed instruction on running PROC PWENCODE for use with SPD Server 4.3.

The example below shows an SPD Server 4.3 LIBNAME statement that utilizes the password encoding option:

libname mylib sasspds 'spdsdata'
server=kaboom.5200
user='spdsuser'
password='{sas001}c3BkczEyMw=='


SPD Server 4.3 and the SAS Management Console Utility

The SAS Management Console (SMC) is a Java application that provides a single point of control for managing multiple SAS application resources. Rather than using a separate administrative interface for each application in your enterprise intelligence environment, you can use SAS Management Console's single interface to perform the administrative tasks required to create and maintain an integrated environment.

SAS Management Console manages resources and controls by creating and maintaining metadata definitions for entities such as:

After installing the SPD Server 4.3 Java plug-in file, SPD Server administrators can use the SPD Server Manager utility in SAS Management Console to configure SPD Server 4.3 user/ group passwords and ACLs instead of using the traditional SPD Server psmgr utility and PROC SPDO commands.

By default, SAS Management Console looks for plug-ins in the plugins subdirectory of each installed SAS product. The Java plug-in file that makes the SPD Server Manager utility available in SAS Management Console is located at:

SASROOT/spds43/plugins/sas.smc.SpdsMgr.jar

Note: SASROOT represents the path to the base directory of the SAS software installation on your client machine. The Java plug-in file path above is specifically for SPD Server 4.3.  The Java plug-in file for SPD Server 4.4 resides in a different location.


SPD Server 4.3 and SAS Data Integration Studio

You can integrate the processing power of SPD Server 4.3 with other SAS software tools, such as SAS Data Integration Studio. The same plug-in file that SPD Server uses to integrate with the SAS Management Console can be used to integrate SPD Server resources into the SAS Data Integration Studio user interface.

SAS Data Integration Studio is software that enables data warehouse specialists to create and manage metadata objects that define sources, targets, and the sequence of steps for the extraction, transformation, and loading of data into data marts or warehouses. SPD Server can be an excellent tool for managing the large tables of data associated with large data marts and warehouses.

By default, SAS Data Integration Studio looks for plug-ins in the plugins subdirectory of the SAS Data Integration Studios installation. To integrate SPD Server 4.3 functionality into the SAS Data Integration Studio user interface, copy the SPD Server 4.3 Java plug-in file into the SAS Data Integration Studio plugins subdirectory.

The SPD Server 4.3 Java plug-in file is located at:

SASROOT/spds43/plugins/sas.smc.SpdsMgr.jar

Note: SASROOT represents the path to the base directory of the SAS software installation on your client machine. Spds43/ represents the installed SPD Server software directory. The name of the installed SPD Server software directory varies according to the specific version and release of your SPD Server software. For example, the path to your SPD Server Java plug-in file might begin with SASROOT/spds43, SASROOT/spds43tsm1, or SASROOT/spds43tsm2, depending on if you have the original SPD Server 4.3 software, or the first or second maintenance release of the SPD Server 4.3 software.

Copy the SPD Server 4.3 Java plug-in file to the SAS Data Integration Studio plugins directory:

SASROOT/SASETLStudio/9.1/plugins/sas.smc.SpdsMgr.jar


SPD Server 4.3 Utility Requirements

SPD Server 4.3 provides NLS functionality, or National Language Support for multiple languages and character sets in database operations. As a result, all SPD Server 4.3 utilities require access to the InstallDir/bin64 directory, and you must ensure that the InstallDir/bin64 directory is included in your SPD Server 4.3 PATH.

Here is an example of statement that specifies the necessary path:

    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:InstallDir/bin64
    export LD_LIBRARY_PATH

SPD Server 4.3 SQL Planner Enhancements

SPD Server 4.3 includes SQL Planner optimizations. SQL Planner optimizations improve the performance of the more frequent query types that used in data mining solutions such as Enterprise Marketing Automation. A key enhancement to the SPD Server 4.3 SQL Planner is optimizing correlated queries through the use of query rewrite techniques. Correlated queries are common in business and analytic intelligence data mining. Another significant enhancement is the tighter integration of the Parallel Group-By technology in the planner. The tighter integration adds performance benefits to nested Group-By syntax.

SPD Server 4.3 STARJOIN Facility

The SPD Server 4.3 enhanced SQL planner includes the new STARJOIN facility. The SPD Server 4.3 STARJOIN facility validates, optimizes, and executes SQL queries on data that is configured in a star schema. Star schemas are composed of two or more normalized dimension tables that surround a centralized fact table. The centralized fact table contains data elements of interest derived from the dimension tables.

For more detailed information on expression support for STARJOIN, see the SQL section in the chapter on, "STARJOIN Optimization," in the SPD Server 4.4 User's Guide, located at http://support.sas.com/documentation/onlinedoc/91pdf/index_913.html.

SPD Server 4.3 Index Scans

SPD Server 4.3 SQL provides users with the capability to use lightning-fast index scans on large tables. Rather than scanning entire tables which may have millions or billions of rows, in specific cases, SPD Server 4.3 SQL can utilize index data to resolve the query. Index data is compact, small, and faster to scan than an entire table. SPD Server 4.3 SQL provides enhanced index scan support for the following functions:

min, max, count, count distinct, nmiss, and range functions.

For more detailed information on server index scans, see the section on Index Scans in the chapter on, "Optimizing SAS Scalable Performance Data Server Performance," in the SPD Server 4.4 User's Guide, located at http://support.sas.com/documentation/onlinedoc/91pdf/index_913.html.

SPD Server 4.3 Optimized Correlated Queries

Intelligent storage must have the ability to interpret and process complex requests such as correlated queries. A correlated query is a select expression where a predicate within the query has a relationship to a column that is defined in another scope. Today's business and analytic intelligence tools often generate SQL queries that are nested 3 or 4 layers deep. Queries with cross nested relationships consume significant processor resources and require more time to complete processing. New algorithms in the SQL Planner of SPD Server 4.3 implement techniques that significantly improve the performance of correlated queries for patterns that permit query rewrites or query de-correlation.

SPD Server 4.3 Parallel Group-By

Parallel Group-By is a high performance parallel summarization of data executed using SQL. Parallel Group-By works against single tables used to aggregate data. Summarization tasks are common in data warehousing applications. Parallel Group-By was developed to speed up processor performance summarization tasks. Parallel Group-By is often used in SQL queries (through the use of sub queries) to apply selection lists for inclusion or exclusion.

The Parallel Group-By support in SPD Server 4.3 has been expanded in many areas. Parallel Group-By is integrated into the WHERE-clause planner code so that it will boost the capabilities of the SPD Server SQL engine. Any section of code that matches the Parallel Group-By trigger pattern will use it.

For more detailed information on server index scans, see the section on Index Scans in the section on Parallel Group-By in the chapter on, "Optimizing SAS Scalable Performance Data Server Performance," in the SPD Server 4.4 User's Guide, located at http://support.sas.com/documentation/onlinedoc/91pdf/index_913.html.

SPD Server 4.3 Parallel Join

Parallel join is a high performance pairwise join of two SPD Server tables. The parallel join feature enhances join performance in two ways. First, SPD Server parallel joins are performed using parallel threading. Second, SPD Server parallel joins use enhanced data summarization methods after rows in a table are joined.

For more detailed information on SPD Server parallel joins, see the section on Parallel Joins in the in the chapter on, "Optimizing SAS Scalable Performance Data Server Performance," in the SPD Server 4.4 User's Guide, located at http://support.sas.com/documentation/onlinedoc/91pdf/index_913.html.


SPD Server 4.3 Minmax Table Indexing

SPD Server 4.3 contains a new table option called MINMAXVARLIST=. The primary purpose of the MIINMAXVARLIST= table option is for use with SPD Server 4.3 dynamic cluster tables, where specific members in the dynamic cluster contain a set or range of values, such as sales data for a given month. When a SPD Server SQL sub setting where- clause specifies specific months from a range of sales data, the WHERE planner checks the min/max indexes. Based on the min/max index information, the SPD Server WHERE planner includes or eliminates member tables in the dynamic cluster for evaluation.

Use the MIINMAXVARLIST= table option with numeric columns. MINMAXVARLIST= uses the list of columns you submit to build an index. The MINMAXVARLIST= index contains only the minimum and maximum values for each column. The WHERE-clause planner uses the index to filter SQL predicates quickly, and to include or eliminate member tables belonging to the cluster table from the evaluation.

Although the MINMAXVARLIST= table option is primarily intended for use with dynamic clusters, it also works on standard SPD Server tables. MINMAXVARLIST= can help reduce the need to create many indexes on a table, which can save valuable resources and space.

For more detailed information on SPD Server parallel joins, see the section on Minmax Indexes in the in the chapter on, "Optimizing SAS Scalable Performance Data Server Performance," in the SPD Server 4.4 User's Guide, located at http://support.sas.com/documentation/onlinedoc/91pdf/index_913.html.


SPD Server 4.3 WHERE-Costing Improvements

The WHERE-clause Planner implemented in SPD Server 4.3 avoids computation-intensive operations and uses simple computations where possible. WHERE-clauses in large database operations can be very resource-intensive operations. In SPD Server 3.x and earlier releases, query authors often needed to manually "tune" queries for performance. The "tuning" was accomplished using macro variables and index settings. The WHERE-clause planner integrated into SPD Server 4.3 does the "tuning" work for the user by costing the different approaches to index evaluation.

For more detailed information on SPD Server WHERE-costing improvements, see the section on the WHERE-Clause Planner in the in the chapter on, "Optimizing SAS Scalable Performance Data Server Performance," in the SPD Server 4.4 User's Guide, located at http://support.sas.com/documentation/onlinedoc/91pdf/index_913.html.


SPD Server 4.3 Clustered Tables

SPD Server 4.3 uses a virtual table structure called a clustered table. Why are clustered tables important? Clustered data tables provide a storage architecture that SPD Server 4.3 can take advantage of by using its parallel processing and data management capabilities.

A clustered table is a structure that can store multiple SPD Server tables. A clustered table is composed of members (also called generations or partitions). Each member can store a single SPD Server table. The clustered table uses a layer of metadata to manage the members. clustered tables can be used in WHERE-clause costing as well. Each member in a clustered table is analyzed and assigned an EVAL strategy that best fits the data patterns in the member or slot. Using multiple EVAL strategies while performing WHERE-clause costing on a clustered table provides better process granularity, which can improve overall data throughput and performance.

Dynamic Cluster Tables

SPD Server Cluster Tables are virtual SPD Server table structures. SPD Server 4.3 cluster tables are a bound collection of multiple members. Each member is a standard SPD Server table. All member tables that belong to a given dynamic cluster table must share the same metadata formats and organization. SPD Server 4.3 clustered data tables use the metadata layer to manage the data that is contained in the member tables.

The SPD Server 4.3 dynamic cluster table structure provides an architecture that enables flexible loading, rapid storage, and parallel processing features for very large data tables. Using dynamic cluster tables, loading, removing data from, and refreshing very large data marts becomes easier and more timely. Dynamic cluster tables also provide organizational features and performance benefits that traditional SAS and SPD Server tables do not have.

For example, users can add new data or remove historical data from very large tables by accessing only the member tables that are affected by the change. It is possible to access the individual member tables in parallel. This strategy reduces the time needed for the job to complete and is accomplished using very simple commands. Furthermore, a complete refresh of a dynamic cluster table can take place using a fraction of the disk space that would be needed to refresh a large standard table that contained the same data.

For more detailed information on SPD Server dynamic cluster tables, see the chapter on, "Dynamic Cluster Tables," in the SPD Server User's Guide, located at http://support.sas.com/documentation/onlinedoc/91pdf/index_913.html.

Unsupported Features in Clustered Tables

Because of differences in the load and read structures for clustered tables, some standard features normally associated with Base SAS and SPD Server tables are currently not supported in SPD Server 4.3 clustered tables.

The non-supported features in Dynamic Cluster tables are:

If you have a task for a Dynamic Cluster table that requires one of the above features, you should undo the cluster and create standard SPD Server tables before continuing.


SPD Server 4.3 Random Placement of Initial Data Partitions in DATAPATH= List

In SPD Server 3.x, the first data partition files for all tables in the same domain started in the first DATAPATH= setting that was defined in the libnames.parm LIBNAME configuration file.  Subsequent data partition files for that table would be placed in subsequent data paths.  When all SPD Server datapath contain a data partition file, the process returns to the first datapath and continues in the same fashion.  However, many SPD Server installations have many small-to-medium-sized tables that would not have data partition files in all of the available datapath. This strategy could unevenly balance the distribution of data on the disk, resulting with the first few datapath in a domain containing significantly more data than the last few datapath in the domain.  The skewed data distribution results in unbalanced I/O. 

In SPD Server 4.3, the first partition for tables in a domain is no longer assigned to the first DATAPATH= setting that was defined in the libnames.parm domain definition. Instead, SPD Server chooses randomly from the available data paths when placing the first partition for a large data table.  As a result, the data is physically distributed more evenly and permits more balanced I/O within SPD Server processing.

By default, SPD Server 4.3 is configured to utilize random placement of initial data partitions among SPD Server data paths.  The RANDOMPLACEDPF option is specified in the spdsserv.parm file.  To disable random placement of initial data partitions in the DATAPATH= list, remove the RANDOMPLACEDPF option from your spdsserv.parm file.

Additional information on the RANDOMPLACEDPF option can be found in the SPD Server Administrator's Guide, Second Edition: Setting Up SPD Server Parameter Files, under spdsserv.parm file options.


SPD Server 4.3 Debugging Tools

SPD Server 4.3 includes debugging tools that system administrators will find useful.  The debugging tools will allow SPD Server system administrators to create debug images and to evaluate test images without interfering with a pre-existing production SPD Server environment.  The debugging tools are for use with SPD Server 4.3 running on SAS 9.1.3.  The debugging tools are organized into LIBNAME statement options for debugging, and server parameter file options for debugging.   enable SPD Server 4.3 users to create a debug image that will not interfere with the SPD Server environment.  

For more detailed information on SPD Server debugging tools, see the chapter on, "SPD Server Debugging Tools," in the SPD Server 4.4 Administrator's Guide, located at http://support.sas.com/documentation/onlinedoc/91pdf/index_913.html.