SAS Scalable Performance Data (SPD) Server Frequently Asked Questions

Does SPD Server support files that are larger than 2 Gigabytes in size?

Yes. SPD Server does so by breaking up larger files into partitions that are smaller than 2 Gigabytes. The SPD Server host performs this function automatically and it requires no special syntax.

Can I create file systems that are larger than 2 Gigabytes in size?

Yes, if you use a volume manager that lets you create file systems greater than 2 Gigabytes. SAS recommends this practice.

How do SPD Server client and server processes communicate?

An SPD Server client communicates with three SPD Server processes.
When a client issues a LIBNAME assignment to the SPD Server host, the client communicates with the SPD Server Name Server process using the HOST= and SERV= options that were specified in the LIBNAME connection. The HOST= option specifies the host system where the SPD Server Name Server is running, and the SERV= option is the well-known port number of the SPD Server Name Server that was specified when the software was started. The SPD Server Name Server ensures that the domain of the LIBNAME assignment is valid and returns the HOST= and SERV= option settings to the client. This ends the interaction of the client with the SPD Server Name Server for that LIBNAME assignment. The client communicates with the SPDSSERV process to complete the LIBNAME assignment.
The SPDSSERV process authenticates the USER and PASSWORD portion of the LIBNAME assignment, and validates whether the USER has access to the domain. If the LIBNAME is successfully authenticated, the SPDSSERV process forks and executes a user proxy, the SPDSBASE process, which continues to service all other client requests for that LIBNAME connection. Subsequent LIBNAME assignments from the same client that are resolved to the same SPD Server user and SPDSSERV context are passed directly to SPDSBASE for processing without any further SPDSSERV interaction. (No further interaction is required because the authentication is inherited by subsequent LIBNAME assignments.)
LIBNAME assignments from the same client for a different SPD Server user or LIBNAME assignments to a domain that is serviced by a different SPDSSERV results in a new SPDSBASE process to service that LIBNAME assignment.
Connections that use the record-level locking option LOCKING=YES to connect to a server in any domain are handled differently. All LIBNAME assignments share the same SPDSBASE record-level locking process. When the LOCKING=YES option is in force, instead of forking and executing a new user proxy, the SPDSSERV process initiates communication with the shared LOCKING=YES SPDSBASE process and the client.

How do I know which ports must be surfaced through an Internet firewall?

There are two ports that the SPD Server Name Server uses that you can specify using command-line options. The listenport option defines the port that must be used by clients (such as SAS) in LIBNAME and SQL CONNECT statements. The listenport option can also define the port that an ODBC data source requires to communicate with the SPD Server Name Server. The operport option defines a second port that is used for various command communications from SPD Server utilities. Either of these ports can be specified using well-known port definitions in the operating system's services file, instead of specifying them on the command-line. In UNIX systems, this is typically the /etc/services file. In the services file, the spdsname specification corresponds to listenport, and the spdsoper setting corresponds to the operport setting. Both of these ports should be surfaced through the firewall.
The SPDSSERV process uses two types of ports. The first type of port is a port that SPDSSERV uses for local machine communications, internal to SPD Server. The second type of ports is ports that must be accessed by SPD Server clients.
Ports in the first category are not discussed here, because they do not need to be visible beyond the local machine. Ports in the first category do not need firewall connectivity. There are two ports in the second category. The first port in the second category is the port that is defined by the SPDSSERV listenport command-line option. The SPDSSERV listenport command performs LIBNAME authentication of the SPD Server user and password, and validates access to the SPD Server domain. The second port in the second category is the port that is used for various communications from SPD Server utilities, and is defined by the SPDSSERV -operport command-line option.
The SPDSSERV listenport and operport specifications are registered in the SPD Server Name Server by the SPDSSERV process when it starts. Both specifications are returned to the SPD Server client from the SPD Server Name Server when it maps the LIBNAME domain to an SPDSSERV. If you do not specify a listenport or operport in the SPDSSERV command-line, any port that is available is used. Both of these ports should be specified in the SPDSSERV command-line and surfaced through the firewall.
Ports that the SPDSBASE process uses also fall into the two same categories. The first type of ports is used for local machine communications that are internal to SPD Server. The second type of ports is ports that must be accessed by SPD Server Clients. Like the SPDSSERV process, the SPDSBASE process only cares about the ports that outside clients need to access through an Internet firewall.
The way that the SPDSBASE processes use ports is complex and requires a range of port numbers that are declared using the SPD Server MINPORTNO=/MAXPORTNO= server parameter specifications. The MINPORTNO= and MAXPORTNO= parameters must both be specified to define the range of port numbers that are available to communicate with SPD Server clients. Therefore, they both require access from outside of the firewall. If the SPD Server parameters for MINPORTNO= and MAXPORTNO= are not specified, the SPDSBASE processes uses any port that is available to communicate with the SPD Server client.
How many port numbers need to be set aside for SPDSBASE proxy processes? Each SPDSBASE process produces its own operator port that can be accessed using command-line specifications issued by an SPD Server client. In addition, each SPD Server table that is opened creates its own port. Each table's port becomes a dedicated data transfer connection that is used to stream data transfers to and from the SPD Server client. SPD Server table ports are normally dynamically assigned, unless the MINPORTNO= and MAXPORTNO= parameters have been specified. If the MINPORTNO= and MAXPORTNO= parameters have been specified, SPD Server table ports are assigned from within the specified port range.
Therefore, it follows that the range of ports that is specified for the MINPORTNO= and MAXPORTNO= parameters must consider the peak number of concurrent LIBNAME connections that are made to the server, as well as the I/O streams that are channeled between the SPDSBASE processes and the SPD Server clients.
The following ports must be surfaced for access beyond the firewall:
  • Two SPD Server Name Server ports, listenport and operport , must be surfaced for access beyond the firewall. This is also true for any other ports that are identified in SPDSNAME and SPDSOPER services.
  • Two SPDSSERV ports: listenport and operport , as well as any other ports that are identified in SPDSSERV_SAS and SPDSSERV_OPER services.
  • Any other ports that are defined in the MINPORTNO= and MAXPORTNO= range that is specified in the spdsserv.parm file.

How does SPD Server interact with multi-homed hosts?

A multi-homed host is a machine that has two or more IP addresses. For SPD Server to work properly on host machines that have more than one IP address, you must define which IP address you want to associate with the socket bind calls. Socket bind calls listen for the SPD Server Name Server and the SPDSSERV processes. You use the SPDSBINDADDR environment variable to define the preferred IP address. You set the SPDSBINDADDR environment variable in the rc.spds script that you use to initiate the SPD Server Name Server and SPDSSERV processes on the SPD Server host machine.

Can I use standard UNIX backup procedures?

Yes. SPD Server files are standard files. If all the components of a table are in the same directory, then you can use the standard backup utility. This is our recommendation. SPD Server includes an incremental backup utility.

What do I need to know about SPD Server installation? How long does it take?

The SPD Server install is quick and easy to do. The hard-copy installation instructions and shell scripts that are included on the installation media guides you through the installation process. Installation and verification take less than an hour. You might need additional time if you have several SAS client platforms to update.
On UNIX, the SPD Server installation can be performed using a non-privileged UNIX account, although to implement all recommendations, UNIX root privilege is required.

Is it necessary to run UNIX SPD Server as root?

No. SAS recommends that you use a UNIX user ID other than root to run your production SPD Server environment. Root access is not required to run the SPD Server environment when you properly configure the UNIX directory ownership and permissions on your LIBNAME domains. There is no real benefit from running the SPD Server package as root. You should carefully consider whether any convenience that you might obtain justifies the potential risk from running as root.

What is the SPD Server Name Server, and why do I need it?

All access to SPD Server is controlled and managed by the SPD Server Name Server. All clients first connect to the Name Server, which acts as a gateway to named SPD Server domains. The Name Server maintains a dynamically updated list of valid SPD Server hosts and LIBNAME domains. When a user client needs a domain connection, the Name Server parses the requested LIBNAME domain into a physical address, and then creates a proxy connection to the corresponding SPD Server host. The SPD Server Name Server means that users do not have to keep track of the physical addresses of SPD Server hosts. The only server that an SPD Server client has to know about is the Name Server, which handles the details of connecting SPD Server client users to the appropriate domains.

Does every SPD Server client need a UNIX ID or Windows Networking ID?

No. SPD Server does not use UNIX or Windows networking IDs for login security. Each SPD Server client must have a valid SPD Server ID in order to login to the server. Access to the server is controlled by this ID. Access to individual data is controlled by ACLs that are created by the owner of the data.

Can an SPD Server host, SPD Server Name Server and an SPD Server client all run on the same machine?

Yes, they can. In fact, this even boosts performance because the client engine uses direct access where possible instead of issuing requests to the server. For example, the SPD Server client can perform direct reads from disk. WHERE clause evaluation and index retrieval are faster, too.

Can I have multiple SPD Server hosts on the same machine?

Yes. They can either be all connected to the same SPD Server Name Server or different SPD Server Name Servers. Within each Name Server, all SPD Server LIBNAME domains must be unique.

How do I create LIBNAME domains?

LIBNAME domains are defined in a LIBNAME startup file. The required SPD Server command-line option, -libnamefile, specifies the LIBNAME startup file. For more information about LIBNAME domains and LIBNAME startup files, see Domains and Data Spaces in SAS Scalable Performance Data Server: Administrator's Guide

How do I specify a LIBNAME domain in SAS?

LIBNAME domains are defined by using a SAS LIBNAME statement. A sample syntax is
LIBNAME sample sasspds 'ldname' server=spdshost.spdsname user='johndoe' prompt=yes ; 
where
sample is the name of the libref
sasspds is the name of the SPD Server engine
ldname is the LIBNAME domain
spdshost is the IP name of the node that is running the Name Server
spdsname is the port number that the Name Server uses
johndoe is the SPD Server login ID
prompt is the prompt for password Y | N

Is there anything else I have to change to run my existing SAS applications?

Typically, no. Once the librefs have been assigned, your existing SAS application runs unchanged.

How can I get existing data loaded into an SPD Server table?

There are several ways to accomplish this. Here are the three most common:
  1. Use PROC COPY:
      PROC COPY
       in=old
       out=spds
       memtype=data ;
      run ;
    
    This copies the data and build any existing indexes automatically.
  2. Use the DATA Step and SET statement:
       DATA spds.a ;
        set old.a ;
        run ;
    
    This copies the data. You have to specify the indexes that you want to build.
       DATA spds.a(index=(z));
        set old.a ;
       run ;
    
    This copies the data and create an index on variable Z.
  3. Use the Microsoft Windows ODBC driver.
Also, see Migrating Tables between SAS and SPD Server, which examines table conversions.

Can SPD Server create indexes in parallel?

Yes, SPD Server can create multiple indexes at the same time. It does this by launching one thread per index and driving them all at the same time. You can accomplish this with
PROC DATASETS lib=spds ;
  modify a(asyncindex=yes) ;
  index create x ;
  index create y ;
  index create comp=(x y) ;
quit;
In the above example, X, Y, and COMP are created in parallel. Notice the ASYNCINDEX=YES data set option in the MODIFY statement.
%LET spdsiasy=YES ;
PROC DATASETS lib=spds ;
  modify a ;
   index create x ;
   index create y ;
  modify a ;
   index create
   comp=(x y)
   comp2=(y x) ;
quit ;
In the above example, X and Y are created in parallel; COMP and COMP2 are created in a second parallel index create as soon as the first pair completes. Notice the use of the SPDSIASY macro variable to specify parallel index creation. In this example, a table scan is required for each batch of indexes identified for creation in parallel: one table scan for the X and Y indexes and a second table scan for the COMP and COMP2 indexes.
How many indexes should you create in parallel? It depends on how many CPUs are in the SMP configuration, available disk space for index key sorting, and other tasks. Some results show that on an 8-way UltraSparc, you can create four indexes in almost the same time it takes to create 1. You can group index creates to minimize table scans or auxiliary disk space consumption, but generally there is an inverse relationship between the two: minimizing table scans requires more auxiliary disk space and vice versa. The Help documentation contains more information about Parallel Index Creation.

Does SPD Server append indexes in parallel?

Yes, SPD Server appends indexes in parallel by default.

What are ACLs and how do I use them to control access to data tables?

ACLs define who can access a data table and what type of access they are granted. Currently, there are four levels of access defined: Access List Entry, Owner Access, Group Access, and Universal Access. Every SPD Server user has access to at least one group. During login, an SPD Server user must specify a particular ACL group if the SPD Server password file has the user entered as a member of more than one group. Every data table has an ACL owner and the owner's ACL group attached to it. The precedence of the access levels is the following:
  • Access List Entry
  • Owner Access
  • Group Access
  • Universal Access
Types of access are Read, Write, Alter, and Control. To create access lists that you must have CONTROL access. The owner by default has control access. For more information about ACL access lists and commands, see Using the ACL Command Set in SAS Scalable Performance Data Server: Administrator's Guide.

How do I get a list of the SAS macro variables introduced for SPD Server?

In a SAS session, get into PROC SPDO and issue the SPDSMAC command. For example:
   LIBNAME foo sasspds ... ;
   PROC SPDO lib=foo ;
   SPDSMAC ;

What about unique indexes? Can I do something to speed appends?

You can use the SPDSAUNQ=YES server option to speed up appends to unique indexes. For information about the SPDSAUNQ option, see SPDSAUNQ=.

What about disk compression for SPD Server tables?

You can request compression for an SPD Server table by using the COMPRESS= data set option. You can also set a macro variable named SPDSDCMP to the same value that you would set in the COMPRESS= option. This causes compression on all data sets you generate without explicitly specifying COMPRESS= on each DATA step. SPD Server compresses your table set by "blocks" and the way you control this amount is through the IOBLOCKSIZE= table option. Once you create a compressed table, the compression block size (that is, the number observations per block) cannot be changed. You must PROC COPY the data set to a new data set with a different IOBLOCKSIZE= on the output data set.
For more information about SPD Server disk compression settings, see COMPRESS=.
In any case, you select the default SPD Server compression by asserting COMPRESS=YES or using %let SPDSDCMP=YES. The default compression algorithm is a run-length compression.

What about estimates for disk space consumption when using SPD Server?

Overview of Disk Space Consumption

The answer to this question depends on what type of component file within the SPD Server data you need to estimate. Recall that there are three classes of component files that make up an SPD Server table: metadata, data, and indexes. You always get the first two for every table. You get an index component file for each index that you create on the table.

Metadata Space Consumption

The approximate estimate here is:
SpaceBytes = 12Kb + (#columns * 120) + (5Kb * #indexes)
This estimate increases if you delete observations from the table or use compression on the table. In general, the size of this component file should not exceed approximately 400K.

Data Space Consumption

The estimate here is for uncompressed tables:
SpaceBytes = #rows * RowLength
Your space consumption for compressed tables obviously varies with the compression factor for your table as a whole.

Hybrid Index Space Consumption

The hybrid index uses two data files. The .hbx file contains the global portion of the hybrid index. You can estimate space consumption approximately for the .hbx component of a hybrid index as follows:
If the index is NOT unique:
number_of_discrete_values_in_the_index * (22.5 +
  (length_of_columns_composing_the_index))
If the index IS unique:
number_of_discrete_value_in_the_index * (6 +
  (length_of_columns_composing_the_index))
The .idx file contains the per-value segment lists and bitmap portion of the hybrid index. Estimating disk space consumption for this file is much more difficult than the .hbx file. This is because the .idx file size depends on the distribution of the key values across the rows of the table. The size also depends on the number of updates and appends performed on the index. The .idx files of an indexed table initially created with "n" rows consumes considerably less space than the .idx files of an identical table created and with several append or updates performed afterward. The wasted space in the latter example can be reclaimed by reorganizing the index.
With the above in mind, a worst case estimate for space consumption of the .idx component of a hybrid index is:
8192 + (number_of_discrete_values_in_more_than_one_obs  * (24 +
  (avg_number_of_segments_per_value * (16 + (seg_size / 8)))))
This estimate does not consider the compression factor for the bitmaps, which could be substantial. The fewer occurrences of a value in a given segment, the more the bitmap for that segment can be compressed. The uncompressed bitmap size is the (seg_size/8) component of the algorithm.
To estimate the disk usage for a nonunique hybrid index on a column with a length of 8, where the column contains 1024 discrete values, and each value exists in an average of 4 segments, where each segment occupies 8192 rows, the calculation would be:
.hyb_size = 1024 * (22.5 + 8) = 31323 bytes

.idx_size = 8192 + (10000 * (24 + (4 * (16 + (8192/8))))) = 4343808 bytes
To estimate the disk usage of a unique hybrid index on a column with a length of 8 that contains 100000 values would be:
.hyb_size = 100000 * (6 + 8) = 1400000 bytes

.idx_size = 8192 + (0 * (...)) = 8192 bytes
Note: The size of the .idx file for a unique index will always be 8192 bytes because the unique index contains no values that are in more than one observation.
There is a hidden workspace requirement when creating indexes or when appending indexes in SPD Server. This need arises from the fact that SPD Server sorts the rows of the table by the key value before adding the key values to the hybrid index. This greatly improves the index create and append performance but comes with a price requiring temporary disk space to hold the sorted keys while the index create and append is in progress. This workspace is controlled for SPD Server by the WORKPATH= parameter in the SPD Server host parameter file.
You can estimate workspace requirements for index creation as follows for a given index "x":
SpaceBytes ~ #rows * SortLength(x)
where #rows = Number of rows in the table if creating; number of rows in the append if appending.
if KeyLength(x) >= 20 bytes
  SortLength(x) = (4 + KeyLength(x))
if KeyLength(x) < 20 bytes
  SortLength(x) = 4 + (4 * floor((KeyLength(x) + 3) / 4))
For example, consider the following SAS code:
DATA foo.test ;
  length xc $15 ;
  do x=1 to 1000 ;
   xc = left(x) ;
   output ;
  end ;
run ;

PROC DATASETS lib=foo ;
  modify test ;
  index create x xc xxc=(x xc) ;
quit ;
For index X, space would be:
SpaceBytes = 1000 * (4 + (4 * floor((8 + 3) / 4)))  
           = 1000 * (4 + (4 * floor(11 / 4)))
           = 1000 * (4 + 4 * 2)
           = 12,000
For index XC, space would be:
SpaceBytes = 1000 * (4 + (4 * floor(15 + 3) / 4)))
           = 1000 * (4 + (4 * floor(18 / 4)))
           = 1000 * (4 + 4 * 4)
           = 20,000
For index XXC, space would be:
SpaceBytes = 1000 * (4 + 23)
           = 1000 * 27
           = 27,000
There is one other factor that plays into workspace computation: Are you creating the indexes in parallel or serially? If you create the indexes in parallel by using the ASYNCINDEX=YES data set option or by asserting the SPDSIASY macro variable, you need to sum the space requirements for each index that you create in the same create phase.
As is noted in the FAQ example about creating SPD Server indexes in parallel, Can SPD Server create indexes in parallel?, the indexes X and Y constitute a create phase, as do COMP and COMP2. You would need to sum the space requirement for X and Y, and for COMP and COMP2, and take the maximum of these two numbers to get the workspace needed to complete the PROC DATASETS indexes successfully.
The same applies to PROC APPEND runs when appending to the table with indexes. In this case all of the indexes are appended in parallel, so you would need to sum the workspace requirement across all indexes.

How can I estimate the transient space needed to perform PROC SORT / BY processing?

Workspace is required for SPD Server sorting just as it is required for SPD Server sorted index creation. There are two modes of sorting in SPD Server: tag and non-tag sorting. In either case, you sort based on the columns selected in the BY clause. The difference is in the auxiliary data that is carried along by the sort in addition to the key constructed from the BY columns. The default for SPD Server is to use the non-tag sort.
In the case of non-tag sorting, SPD Server carries along the entire row contents (that is, all columns) as the auxiliary data for the key. In the mode of tag sorting, SPD Server only carries along the row ID that points back to the original table row as the auxiliary data. You control the amount of a sort problem that fits in memory at one time by the SPD Server parameter SORTSIZE. Obviously, for a given sort size the number of sort records that fits is a function of the sort mode(#records = SORTSIZE / (SortKeyLength + AuxillaryLength)). When the sort problem does not fit in one SORTSIZE bin, the bins are written to workspace on disk and then merged back to make the final sorted run.
Estimating the disk space required for SPD Server sorting depends on the mode.
For non-tag sorting the estimate is
SpaceBytes = #rows * (SortKeyLength + 4 + RowLength)
For tag sorting the estimate is
SpaceBytes = #rows * (SortKeyLength + 8)
So there is a very obvious question here: Because non-tag sort requires so much more space than a tag sort, why would you ever choose a non-tag sort, much less make it the default? The answer lies in the post-processing phase required for the tag sort. When the tag sort completes all you have is the sorted list of row IDs. You must probe the table using the row IDs to return the rows in the desired order. This generally means a highly randomized I/O access pattern to the original table that can add significantly to the time to complete the BY clause. There is definitely a trade-off between tag and non-tag sorting. The critical factors are the row length, the total number of rows to process, and the clustering of consecutive row IDs in the final ordering.

How do I, as a LIBNAME domain owner, allow others to create tables in my domain?

For example, Tom is a LIBNAME domain owner, and he wants to give Fred access to create tables in Tom's domain. Tom needs to do the following:
SAS Code to Give Access to User "Fred"
SAS code, by line
Remarks
LIBNAME dmowner sasspds
"tomdom"
host="samson"
serv="5555"
user="tom"
passwd="tompw" ;
  • dmowner is the libref for the location of the SPD Server data.
  • tomdom is the previously established SPD Server domain.
  • host= specifies the name of the computer where SPD Server resides.
  • serv= is followed by the port number of the SPD Server's Name Server.
  • passwd= is followed by the required password for tom.
PROC SPDO lib=dmowner ;
PROC SPDO opens the command set that allows the user tom to change ACLs in the tomdom domain using the libref dmowner.
set acluser tom ;
SET ACLUSER command allows ACLs under user ID tom to be modified.
add acl/LIBNAME ;
Command to add the ACL for a LIBNAME domain.
LIBNAME is the syntax used to indicate the LIBNAME domain assigned, which is tomdom to the libref that PROC SPDO is started with, which is dmowner.
modify acl/LIBNAME fred=(Y,Y,,) ;
Modifies the ACL in the LIBNAME domain ACL to give user ID fred Read and Write access to the tomdom domain.
quit ;
Fred can now connect to the TOMDOM domain and create tables.

How does the system administrator list the access control lists for "user 1"?

To see the ACL privileges for a domain, the system administrator lists them for each user.
For this to work, your SPD Server user ID must be previously set up to have the SPECIAL (level 7) privilege, to use the ACLSPECIAL=YES option in a LIBNAME statement.
Code to List ACLs
Command from command prompt >
Remark
LIBNAME test saspds
'temp'
server=servname.7880
prompt=yes ;
Issue LIBNAME statement for test domain, specify server and port number, ask system for a password prompt.
user="username" aclspecial=YES ;
aclspecial=YES now gives "username" access to special ACL commands, such as setting a new user ID.
PROC SPDO lib=test ;
Connects to the temp LIBNAME domain using the libref test.
set acluser user1 ;
Sets the SPD Server user scope to user1.
list acl _all_ ;
Lists all ACLs owned by user1.
The resulting output, described in the table below, lists all of the tables in "test".
Results from List Command
Resulting output from list acl _all_; command
Remarks
The SAS System 10:58 Tuesday, November 17, 2003
System message
ACL Info for A.DATA
This ACL affects table A if table A exists and user1 is the owner or has ACL control of table A.
Owner = USER1
USER1 created and owns the A.DATA ACL.
Group = TECH
This ACL was created while user1was connected with an ACL group of TECH. All group permissions affect the permissions of the members of the TECH ACL group.
Default Access (R,W,A,C) = (Y,N,N,N)
R=Read; W=Write; A=Alter (rename, delete, or replace tables) C=Control (define and update ACLs for a table)
Y=Yes; N=No;
Universal privileges are limited to read on table A.DATA.
Group Access (R,W,A,C) = (N,N,N,N)
Users in the ACL group TECH have no privileges on table A.DATA.
The SAS System 10:58 Tuesday, November 17, 2003
ACL Info for NTE*.DATA
NTE*.DATA refers to a set of tables, which begin with NTE. ACLs of this kind are created using the generic option. If you create a specific ACL for a table that starts with NTE, the specific ACL overrides the generic ACL.
Owner = user1
Group = TECH
Default Access (R,W,A,C) = (N,N,N,N)
Group Access (R,W,A,C) = (Y,Y,N,N)
Users from the ACL group TECH have Read and Write access to tables with names that start with NTE.

How do I change existing PROC SQL code that works with SAS to query SPD Server tables?

Overview

You do not have to change your PROC SQL code. The way to do this is to wrap your code inside a CONNECT statement, which points to the location of the SPD Server tables. This technique is referred to as the pass-through facility. Normal operating system and ACL privileges apply to the user ID making the query during the CONNECT process. Your PROC SQL code should work with a few exceptions. For more information, see Differences between SAS SQL and SPD Server SQL.
Once you establish a working CONNECT statement that points to the location of your SPD Server tables, you can assign a LIBNAME to the SPD Server table path with a libref command. This enables the simple name that you assign to the SPD Server table to be used in the SQL query, which keeps your SQL query as short as possible.
Here are four progressive examples:

Example 1: PROC SQL Query, Designed to Work with a SAS Data Set, with a two-level SAS Filename Example

Example 1 Code
Code
Remarks
  /* Issue a LIBNAME statement which   */
  /* creates a LIBREF called "test"    */
   LIBNAME test '/path/for/your/data' ;
  /* Query using base LIBREF of test   */
   PROC SQL ;
   select sum(table1+table2)
     as pass,
   carrier from test.carriers
   where carrier in('AA','JI')
     and bstate='TX'
   group by carrier ;
   quit ;
This is an example of SQL code that works with SAS. The code contains a two-level SAS filename reference, which is typical for PROC SQL, but it does not work if we attempt to use it inside a pass-through CONNECT statement.
Each of the following examples shows variations of this code, modified to access SPD Server information. We also discuss the pros and cons of each method.

Example 2: PROC SQL Query, without Using the Pass-through Facility, Pointing to an SPD Server Table, with a two-level SAS Filename

Why would you want to do this? You might NOT want to do this, because without the pass-through facility, all the processing is done on the CPU of the client machine. When processing large tables, this is impractical, if not impossible.
Example 2 Code
Code
Remarks:
/* Issue a SPD Server (mkt) library      */
/* LIBNAME statement                     */

   LIBNAME mkt sasspds 'mkt'
     server=servername.4228
     user='anonymous' ;
   PROC CONTENTS data=mkt.carriers ;
       run ;
/* query spds LIBREF (mkt)                */
/* (two-level SAS filename)              */
   PROC SQL;
    select sum(table1+table2)
      as pass, carrier
    from mkt.carriers
    where carrier in
      ('AA','JI')
      and bstate='TX'
   group by carrier ;
   quit ;
This example shows you how to make a query against SPD Server tables, using your original SQL code, without using the SQL pass-through facility.

Example 3: PROC SQL Query, Using Pass-through Facility, Pointing to an SPD Server Table, with a LIBNAME Example, with SQL Code Modified, to Avoid Using a two-level SAS Filename

Why would you want to do this? By modifying your SQL code slightly, you can use the pass-through facility to perform the work and send the results to the client.
Example 3 Code
Code
//* Query spds LIBREF (mkt) (pass-through one-level LIBREF )*/
   PROC SQL;
   connect to sasspds
    (dbq='mkt'
     serv='8770'
     user='anonymous'
     host='localhost') ;
   select *
     from connection
     to sasspds

   (select sum(table1+table2)
     as pass,
    carrier
      from carriers
      where carrier
      in('AA','JI')
      and bstate='TX'
   group by carrier) ;
   quit ;

Example 4: PROC SQL Query, Using the Pass-through Facility, Pointing to an SPD Server Table, Executing a Libref Statement on the Server, So That Existing Code Can Be Used "as Is"

Why would you want to do this? Without modifying your SQL code, you can use the pass-through facility so that SPD Server performs the work and sends the results to the client.
Example 4 Code
Code
   PROC SQL ;
   connect to sasspds
    (dbq='mkt'
     serv='8770'
     user='anonymous'
     host='localhost');
  /* Issue passthru LIBREF (mkt) for use */
  /* in two-level queries                */
   execute(LIBREF mkt)
     by sasspds;
  /* Query the SPD Server LIBREF (mkt)   */
  /* that is a pass-through LIBREF       */
  select *
   from connection
   to sasspds

  (select sum(table1+table2)
    as pass,
  carrier from mkt.carriers
    where carrier in('AA','JI')
    and bstate='TX'
  group by carrier) ;
  quit;

Can I use pass-through async to create multiple indexes on a single existing table?

No. Multiple create indexes on the same existing table are not supported with async.
PROC DATASETS can be used to create indexes in parallel for a single existing table.
For example:
PROC DATASETS lib=foo ;
modify customer(
  asyncindex=yes
  index=(state) ;
index create state ;
index create phoneno ;
index create custno ;
index create totsales ;
quit ;

Can I use pass-through async to create multiple indexes on existing tables?

Yes. As long as you create only one index per table, the index creation can be run with async.
For example, to create an index State on table Customer, an index Totals on table Billing, and an index Orderno on table Orders asynchronously, you use the following code:
execute(begin async operation)
  by sasspds ;

execute(create index state on customer(state))
  by sasspds ;

execute(create index totals on billing(totals))
  by sasspds ;

execute(create index orderno on orders(orderno))
  by sasspds ;

execute(end async operation)
  by sasspds ;

What size increases can I expect for tables that are stored in domains with BACKUP=YES?

Tables created in domains that have Backup=YES will have an additional 17 bytes per observation.

What files are created in the SPD Server WORKPATH directory?

Some SPD Server operations can create temporary files that are too large to fit in memory. The SPD Server WORKPATH directory contains these temporary files. The temporary files in the WORKPATH directory are also called spill files, because SPD Server spills intermediate files from volatile memory to a temporary file. Temporary files in the WORKPATH directory are removed when the operation that generated the temporary file completes. There are four types of temporary that SPD Server can store in the WORKPATH directory:
Parallel Sort Spill Files
When SPD Server sorts a table, it uses multiple concurrent threads to sort portions of the table in memory. Each thread creates a sort bin to which it can spill temporary results files to. When the sort threads complete, the sort bin contents are merged to produce the final result. The sort bin files are named using the following convention:
spdssr_<pid>_<unique_thread_id>.0.0.0.spds9
pid is the identifier for the SPDSBASE user proxy that is performing the parallel sort operation.
Parallel GROUP BY Spill Files
When SPD Server performs a parallel GROUP BY operation, it uses multiple concurrent threads to group intermediate results in parallel. Each thread creates a sort bin to which it can spill temporary results files to. When the GROUP BY threads complete, the sort bin contents are merged to produce the final result. The GROUP BY bin files are named using the following convention:
spdspgb_<pid>_<unique_thread_id>.0.0.0.spds9
pid is the identifier for the SPDSBASE user proxy that is performing the parallel GROUP BY operation.
Parallel Join Spill Files
When SPD Server performs a parallel join operation, it uses multiple concurrent threads to join portions of the table in memory. Each thread creates a join bin to which it can spill temporary results files to. When the parallel join threads complete, the join bin contents are merged to produce the final result. The join bin files are named using the following convention:
spdspllj_<pid>_<unique_thread_id>.0.0.0.spds9
pid is the identifier for the SPDSBASE user proxy that is performing the parallel join operation.
SQL Temporary Work Files
SPD Server creates temporary work space files for SQL operations that require intermediate results to be spilled to a workspace file. These temporary workspace files are named using the following convention:
spds_<pid>_<unique_id>.0.0.0.spds9
pid is the identifier for the SPDSBASE SQL proxy that created the temporary workspace file.
Debug Log Files
SPD Server can create temporary work space files that contain SPD Server debugging information. The debugging information can be found in the SPD Server SAS log, under headings such as “WHERE Debug” or “SQL Planning.” These temporary workspace files are generally small and are named using the following convention:
spdslog_<pid>_<unique_id>.0.0.0.spds9
pid is the identifier for the SPDSBASE user proxy that created the temporary workspace file.

How can I identify the spdsbase user proxy processes for a given SPD Server user?

The spdsbase process contains identifier start-up parameters. The start-up parameters can be used to determine which SPD User submitted the command to create a particular proxy process.
  • The first parameter is the SPD Server User ID that the spdsbase proxy process was created on behalf of.
  • The second parameter is the operating system User ID of the client that the spdsbase proxy process was created on behalf of.
  • The fifth parameter is the IP address of the client that the spdsbase proxy process was created on behalf of.
You can display spdsbase process parameter information by submitting the command ps -ef in UNIX, or by viewing the command line information in Windows task manager.