/u/hadoop/hdist/cdh/cdh52, then the following syntax sets the option appropriately:
HADOOPJAR=/u/hadoop/hdist/cdh/cdh52;
HADOOPJAR=/u/hadoop/hdist/cdh/cdh52:/u/myjars/cdh/cdh52;
/u/hadoop/hdist/cdh/confdir, then the following syntax sets the option appropriately:
HADOOPCFG=/u/hadoop/hdist/cdh/confdir;
FOO.ORG, and your keytab file is in /<keytab-path>/userid.keytab, then the following LIBNAME parameter file syntax sets the HADOOPREALM= and HADOOPKEYTAB= options appropriately:
HADOOPREALM=FOO.ORG;
HADOOPKEYTAB=/<keytab-path>/userid.keytab;
rc.spds file.
If you define an SPD Server LIBNAME domain that contains a HADOOP=YES
setting, SPD Server creates a directory using the following schema:
<HADOOPACLPATH>/HADOOPACLS/<domain_name>
The <HADOOPACLPATH> value is either the HADOOPACLPATH= parameter option or the ACLDIR= start-up option in the rc.spds file. The specified location contains the ACLs for the declared libref.
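The directory schema described above can be sketched in Python. This is an illustration only, not SPD Server code: the function name `acl_directory` and its arguments are hypothetical, and the sketch assumes that the ACLDIR= location is used only when no HADOOPACLPATH= value is given.

```python
import posixpath
from typing import Optional


def acl_directory(domain_name: str,
                  hadoopaclpath: Optional[str],
                  acldir: str) -> str:
    """Build <HADOOPACLPATH>/HADOOPACLS/<domain_name> (hypothetical helper).

    Assumption: HADOOPACLPATH= is preferred when present; otherwise the
    ACLDIR= start-up location is used as the base directory.
    """
    base = hadoopaclpath if hadoopaclpath else acldir
    return posixpath.join(base, "HADOOPACLS", domain_name)


# Example with made-up local paths.
print(acl_directory("MYDOMAIN", "/u/mylocal/acls", "/opt/spds/site"))
print(acl_directory("MYDOMAIN", None, "/opt/spds/site"))
```

The base directory is always a local file system path, which is why the sketch builds an ordinary POSIX path rather than an HDFS URI.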
InstallDir/site directory.
If you do not keep the file in this location, you need to specify
the location using the ACLDIR start-up option. This location must
be on the local file system, not HDFS. This restriction exists for
the following reasons:
Libname MYDOMAIN added to Name Server (HADOOP=yes)
HADOOPCFG=/u/hadoop/hdist/cdh/confdir;
HADOOPJAR=/u/hadoop/hdist/cdh/cdh52;
HADOOPACLPATH=/u/mylocal/acls/HADOOPACLS/MYDOMAIN;
LIBNAME foo sasspds 'public' server=myhost.5400 user="anonymous";
LIBNAME foo sasspds 'mydomain' server=myhost.5400 user="anonymous";

NOTE: This is a SPD 5.2 Engine executing SAS (r) 9.4 (TS1M2) on the Linux platform.
NOTE: User anonymous(ACL Group ) connected to SPD(LAX) 5.2 server at 10.24.7.79.
NOTE: Libref FOO was successfully assigned as follows:
      Engine:        SASSPDS
      Physical Name: :23107/my_Hadoop_domain_path/

data foo.atable;
   x=1;
run;

NOTE: The data set FOO.ATABLE has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           3.69 seconds
      cpu time            0.06 seconds

proc datasets lib=foo;
run;

Directory

Libref               foo
Engine               SASSPDS
Physical Name        :23107/my_Hadoop_domain_path/
Local Host Name      myhost
Local Host IP addr   10.24.7.79
Server Hostname      N/A
Server IP addr       10.24.7.79
Server Portno        55120
Free Space (Kbytes)  9.0071993E15
Metapath             '/my_Hadoop_domain_path/'
Indexpath            '/my_Hadoop_domain_path/'
Datapath             '/my_Hadoop_domain_path/'

#    Name      Member Type
1    ATABLE    DATA

NOTE: PROCEDURE DATASETS used (Total process time):
      real time           2:02.19
      cpu time            0.16 seconds

PROC CONTENTS data=foo.atable;
run;
PROC PRINT data=foo.atable;
run;

The CONTENTS Procedure

Data Set Name        FOO.ATABLE             Observations          1
Member Type          DATA                   Variables             1
Engine               SASSPDS                Indexes               0
Created              01/28/2015 11:01:58    Observation Length    8
Last Modified        01/28/2015 11:01:58    Deleted Observations  0
Protection                                  Compressed            NO
Data Set Type                               Sorted                NO
Label
Data Representation  Default
Encoding             latin1 Western (ISO)

Engine/Host Dependent Information

Blocking Factor (obs/block)  131072
ACL Entry                    NO

The CONTENTS Procedure

Engine/Host Dependent Information

ACL User Access(R,W,A,C)     (Y,Y,Y,Y)
ACL UserName                 ANONYMOU
ACL OwnerName                ANONYMOU
Data set is Ranged           NO
Data set is a Cluster        NO

Alphabetic List of Variables and Attributes

#    Variable    Type    Len
1    x           Num       8

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.50 seconds
      cpu time            0.04 seconds
HADOOP=
specifies whether a domain can access data in the Hadoop file system.
YES
specifies that a domain can access data in the Hadoop file system.
NO
specifies that a domain cannot access data in the Hadoop file system.
Valid in | LIBNAME parameter file |
Default | NO |
HADOOPACCELJVER=
specifies the Java version used in the Hadoop environment when performing SPD Server WHERE processing optimization with MapReduce.
Valid in | SPD Server parameter file |
Interactions | The HADOOPACCELJVER= option affects only domains that specify HADOOP=YES or domains that specify any other SPD Server HADOOP* option. |
The SPD engine submits the Java class to the Hadoop cluster as a component in a MapReduce program. By requesting that data subsetting be performed in the Hadoop cluster, performance might be improved by taking advantage of the filtering and ordering capabilities of the MapReduce framework. As a result, only the subset of the data is returned to the SAS client. Performance is often improved with large data sets when the WHERE expression qualifies only a relatively small subset. The default value is 1.6. | |
See | For more information about SPD Server and WHERE optimization with MapReduce, see SPD Server Hadoop WHERE Processing Optimization Data Set and SAS Code Requirements |
Example | HADOOPACCELJVER= <Java-version> ; |
HADOOPACCELWH=
specifies whether to perform SPD Server WHERE processing optimization with MapReduce by performing data subsetting in the Hadoop cluster.
Valid in | SPD Server parameter file |
Interactions | The HADOOPACCELWH= option affects only domains that specify HADOOP=YES or domains that specify any other SPD Server HADOOP* option. |
By requesting that data subsetting be performed in the Hadoop cluster, performance might be improved by taking advantage of the filtering and ordering capabilities of the MapReduce framework. As a result, only the subset of the data is returned to the SPD Server client. Performance is often improved with large data sets when the WHERE expression qualifies only a relatively small subset. | |
See | For more information about SPD Server and WHERE optimization with MapReduce, see SPD Server Hadoop WHERE Processing Optimization Data Set and SAS Code Requirements |
Example | HADOOPACCELWH= YES|NO ; |
HADOOPACLPATH=
specifies the local file system directory in which to store Hadoop ACL files.
Valid in | SPD Server parameter file |
Default | The ACL files are created in the location identified in the -acldir spdsserv process start-up parameter. This is the same location as the psmgr database. |
Interaction | If you specify a value for the -hadoopaclpath option with the SPDSCLEAN utility, that value overrides any other HADOOPACLPATH= settings that are specified in the SPD Server parameter file. |
HADOOPCFG=
specifies the local path to the Hadoop configuration files.
Valid in | LIBNAME parameter file, SPD Server parameter file |
Interactions | When you specify this option in the SPD Server parameter file, this option affects only domains that specify HADOOP=YES or domains that have any other SPD Server HADOOP* option specified. |
If you specify any SPD Server HADOOP* domain option, the SPD Server assumes that the HADOOP=YES option is set for that domain. | |
Example | HADOOPCFG=/u/hadoop/hdist/cdh/confdir; |
HADOOPJAR=
specifies the local path to the Hadoop JAR files.
Valid in | LIBNAME parameter file, SPD Server parameter file |
Interactions | When you specify this option in the SPD Server parameter file, this option affects only domains that specify HADOOP=YES or domains that have any other SPD Server HADOOP* option specified. |
If you specify any SPD Server HADOOP* domain option, the SPD Server assumes that the HADOOP=YES option is set for that domain. | |
Example | HADOOPJAR=/u/hadoop/hdist/cdh/cdh52; |
HADOOPKERBEROS=
specifies whether a Hadoop cluster requires authentication using Kerberos.
YES
specifies that a Hadoop cluster requires authentication using Kerberos.
NO
specifies that a Hadoop cluster does not require authentication using Kerberos.
Valid in | LIBNAME parameter file |
Interactions | To access a Hadoop cluster that is secured with Kerberos, the HADOOPREALM= and the HADOOPKEYTAB= options must be specified in either the LIBNAME parameter file or the SPD Server parameter file. |
If either the HADOOPREALM= or HADOOPKEYTAB= option is specified in the SPD Server parameter file, then a domain that has HADOOP=YES specified is secured by Kerberos unless HADOOPKERBEROS=NO is specified. |
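The requirement above, that both HADOOPREALM= and HADOOPKEYTAB= be present for a Kerberos-secured cluster, could be checked before start-up with a small script. The following is a minimal sketch under stated assumptions: the function name `missing_kerberos_options` is hypothetical, the two dictionaries stand in for options already read from the LIBNAME parameter file and the SPD Server parameter file (keys lowercased), and an option is assumed to count as specified if it appears in either file.

```python
from typing import Dict, List

# Options that the text says must be specified for a Kerberos-secured cluster.
REQUIRED_KERBEROS_OPTIONS = ("hadooprealm", "hadoopkeytab")


def missing_kerberos_options(libname_options: Dict[str, str],
                             server_options: Dict[str, str]) -> List[str]:
    """Return the required Kerberos options found in neither file.

    Assumption: an option counts as specified if it has a non-empty value
    in either the LIBNAME parameter file or the SPD Server parameter file.
    """
    missing = []
    for name in REQUIRED_KERBEROS_OPTIONS:
        if not (libname_options.get(name) or server_options.get(name)):
            missing.append(name)
    return missing


# Example: the realm is set for the domain, but no keytab is set anywhere.
print(missing_kerberos_options({"hadooprealm": "FOO.ORG"}, {}))
```

An empty result means both options were found; the sketch does not verify that the keytab file exists or that the realm is reachable.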
HADOOPKEYTAB=
specifies the path to a Kerberos keytab file when accessing a Hadoop cluster that is secured with Kerberos. If you specify any SPD Server HADOOP* domain option, then SPD Server assumes that HADOOP=YES is set for that domain.
Valid in | LIBNAME parameter file, SPD Server parameter file |
Interactions | When you specify this option in the SPD Server parameter file, this option affects only domains that specify HADOOP=YES or domains that have any other SPD Server HADOOP* option specified. |
If you specify any SPD Server HADOOP* domain option, the SPD Server assumes that the HADOOP=YES option is set for that domain. |
HADOOPREALM=
specifies the Kerberos realm to use to access a Hadoop cluster that is secured with Kerberos.
Valid in | LIBNAME parameter file, SPD Server parameter file |
Interactions | When you specify this option in the SPD Server parameter file, this option affects only domains that specify HADOOP=YES or domains that have any other SPD Server HADOOP* option specified. |
If you specify any SPD Server HADOOP* domain option, the SPD Server assumes that the HADOOP=YES option is set for that domain. |
HADOOPWORKPATH=
specifies the path to the directory in the Hadoop Distributed File System that stores the temporary results of the MapReduce output.
Valid in | SPD Server parameter file |
Default | /tmp |
Interaction | This option affects only domains that specify HADOOP=YES or domains that specify any other SPD Server HADOOP* option. |
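Several of the Interactions entries above state that specifying any SPD Server HADOOP* domain option implies HADOOP=YES for that domain. The following Python sketch illustrates that rule for a single LIBNAME parameter file entry. The function names are hypothetical, the tokenizer is a simplification that assumes option values contain no spaces, and the text does not say how an explicit HADOOP=NO combined with other HADOOP* options is resolved, so the sketch simply applies the implication rule.

```python
from typing import Dict


def parse_domain_entry(entry: str) -> Dict[str, str]:
    """Split one 'key=value key=value ...;' entry into a dict (lowercase keys).

    Simplification: values are assumed to contain no whitespace.
    """
    options = {}
    for token in entry.strip().rstrip(";").split():
        key, _, value = token.partition("=")
        options[key.lower()] = value
    return options


def is_hadoop_domain(options: Dict[str, str]) -> bool:
    """Apply the rule: HADOOP=YES, or any other HADOOP* option, marks a Hadoop domain."""
    if options.get("hadoop", "").lower() == "yes":
        return True
    return any(key.startswith("hadoop") and key != "hadoop" for key in options)


entry = ("libname=foo pathname=/user/userlname/mydomain "
         "hadoopcfg=/u/hadoop/hdist/cdh/confdir/hdp20p1 "
         "hadoopjar=/u/hadoop/hdist/cdh/sas_cdh20u;")
print(is_hadoop_domain(parse_domain_entry(entry)))
```

In this example the entry has no explicit HADOOP=YES, but the HADOOPCFG= and HADOOPJAR= options are enough for the sketch to treat it as a Hadoop domain.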
/site directory of your SPD Server installation location.
spdsclean script, or you can incorporate the parameter value statements within the spdsclean script itself.
export SAS_HADOOP_CONFIG_PATH=/u/fedadmin/hadoopcfg/cdh52p1
export SAS_HADOOP_JAR_PATH=/u/fedadmin/hadoopjars/cdh52
spdsclean script to reference Hadoop configuration information that is defined in your LIBNAME parameter file, your SPD Server parameter file, or both.
spdsclean script:
spdsclean -libnamefile libnames.parm -parmfile spdsserv.parm
libname=Stuff1 pathname=/user/userlname hadoopcfg=/u/fedadmin/hadoopcfg/cdh52p1 hadoopjar=/u/fedadmin/hadoopjars/cdh52 hadoop=yes;
libname=foo pathname=/user/userlname/mydomain hadoopcfg=/u/hadoop/hdist/cdh/confdir/hdp20p1 hadoopjar=/u/hadoop/hdist/cdh/sas_cdh20u;
HADOOPCFG=/u/hadoop/hdist/cdh/confdir/hdp20p1;
HADOOPJAR=/u/hadoop/hdist/cdh/sas_cdh20u;
libname=Stuff1 pathname=/user/userlname/mydomain1 hadoop=yes;
libname=Stuff2 pathname=/user/userlname/mydomain2 hadoop=yes;
libname Stuff1 sasspds 'Stuff1' server=lax94d01.14526 user="anonymous";
1.6.
/tmp.
The Hadoop workpath that you specify must be an existing path structure.
If you specify a HADOOPWORKPATH= path that does not exist, the
MapReduce job fails.
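Because a missing work path makes the MapReduce job fail, it can help to confirm that the directory exists in HDFS before setting HADOOPWORKPATH=. The following is a minimal sketch that wraps the standard `hdfs dfs -test -d` shell command. It assumes that the `hdfs` client is on the PATH and configured for your cluster. The function name `hdfs_dir_exists` and the injectable `run` argument are illustrative choices, not part of SPD Server.

```python
import subprocess


def hdfs_dir_exists(path: str, run=subprocess.run) -> bool:
    """Return True if `path` is an existing HDFS directory.

    Relies on `hdfs dfs -test -d <path>`, which exits with status 0 when
    the path exists and is a directory. `run` is injectable so the check
    can be exercised without a cluster.
    """
    result = run(["hdfs", "dfs", "-test", "-d", path], check=False)
    return result.returncode == 0


if __name__ == "__main__":
    work_path = "/tmp"  # the documented default work path
    if not hdfs_dir_exists(work_path):
        print(f"HDFS directory {work_path} does not exist; create it before use")
```

Running the check as the user that the SPD Server process runs under gives the most representative result, because HDFS permissions can hide a directory from other users.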
NO in the server parameter file by the SPD Server administrator, no WHERE processing optimization is allowed. User overrides are not available.
YES in the server parameter file by the SPD Server administrator, SPD Server users can override and disable WHERE processing optimization by setting the SPDSACWH macro to NO or by issuing the ACCELWHERE=NO table option.
YES or by issuing the ACCELWHERE=YES table option in a statement.
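The two fully stated cases above (the administrator sets NO, or the administrator sets YES and a user disables the optimization) can be sketched as a small decision function. This is an illustration under stated assumptions, not SPD Server logic: the function and argument names are hypothetical, `server_setting` is assumed to be the HADOOPACCELWH= value from the server parameter file, and the sketch treats a NO from either the SPDSACWH macro or the ACCELWHERE= table option as sufficient to disable the optimization, because the text does not state a precedence between them.

```python
from typing import Optional


def where_optimization_enabled(server_setting: str,
                               spdsacwh: Optional[str] = None,
                               accelwhere: Optional[str] = None) -> bool:
    """Decide whether WHERE processing optimization is in effect (sketch).

    - Server setting NO: never enabled; user overrides are not available.
    - Server setting YES: enabled unless a user-level setting is NO.
    Assumption: either user-level NO disables the optimization.
    """
    if server_setting.upper() != "YES":
        return False
    for user_setting in (spdsacwh, accelwhere):
        if user_setting is not None and user_setting.upper() == "NO":
            return False
    return True


print(where_optimization_enabled("NO", spdsacwh="YES"))
print(where_optimization_enabled("YES", accelwhere="NO"))
print(where_optimization_enabled("YES"))
```

The first call shows that a user-level YES cannot turn the optimization on when the administrator has set NO.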
WHERE lastname;
WHERE z = -(x+y);
WHERE lastname=: 'S';
create table as select … query, causing the selected rows to be read from SPD Server to SAS. The selected rows are then returned to SPD Server to create the table.