/u/hadoop/hdist/cdh/cdh52,
then the following syntax sets the option appropriately:
HADOOPJAR=/u/hadoop/hdist/cdh/cdh52;
HADOOPJAR=/u/hadoop/hdist/cdh/cdh52:/u/myjars/cdh/cdh52;
/u/hadoop/hdist/cdh/confdir,
then the following syntax sets the option appropriately:
HADOOPCFG=/u/hadoop/hdist/cdh/confdir;
FOO.ORG, and your
keytab file is in /<keytab-path>/userid.keytab,
then the following LIBNAME parameter file syntax sets the HADOOPREALM=
and HADOOPKEYTAB= options appropriately:
HADOOPREALM=FOO.ORG;
HADOOPKEYTAB=/<keytab-path>/userid.keytab;
rc.spds file.
If you define an SPD Server LIBNAME domain that contains a HADOOP=YES
setting, SPD Server creates a directory using the following schema:
<HADOOPACLPATH>/HADOOPACLS/<domain_name>.
The <HADOOPACLPATH> value
is either the HADOOPACLPATH= parameter option or the ACLDIR= start-up
option in the rc.spds file. The specified
location contains the ACLs for the declared libref.
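The directory schema above can be sketched as a small shell helper. This is only an illustration: the function name `hadoop_acl_dir` is hypothetical, and the sketch assumes that the domain name is used in the path exactly as it is written.

```shell
# Hypothetical helper (not part of SPD Server): compose the ACL directory
# for a Hadoop domain from the <HADOOPACLPATH>/HADOOPACLS/<domain_name> schema.
hadoop_acl_dir() {
    acl_base=$1   # HADOOPACLPATH= value, or the ACLDIR= location when it is not set
    domain=$2     # LIBNAME domain name (assumed to be used verbatim)
    # Strip one trailing slash so that the result has no doubled separator.
    printf '%s/HADOOPACLS/%s\n' "${acl_base%/}" "$domain"
}

hadoop_acl_dir /u/mylocal/acls MYDOMAIN
# prints /u/mylocal/acls/HADOOPACLS/MYDOMAIN
```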
InstallDir/site directory.
If you do not keep the file in this location, you need to specify
the location using the ACLDIR= start-up option. This location must
be on the local file system, not HDFS. This restriction exists for
the following reasons:
Libname MYDOMAIN added to Name Server (HADOOP=yes)
HADOOPCFG=/u/hadoop/hdist/cdh/confdir;
HADOOPJAR=/u/hadoop/hdist/cdh/cdh52;
HADOOPACLPATH=/u/mylocal/acls/HADOOPACLS/MYDOMAIN;
LIBNAME foo sasspds 'public' server=myhost.5400 user="anonymous";
LIBNAME foo sasspds 'mydomain' server=myhost.5400 user="anonymous";
NOTE: This is a SPD 5.2 Engine
executing SAS (r) 9.4 (TS1M2) on the Linux platform.
NOTE: User anonymous(ACL Group ) connected to SPD(LAX) 5.2 server at
10.24.7.79.
NOTE: Libref FOO was successfully assigned as follows:
Engine: SASSPDS
Physical Name: :23107/my_Hadoop_domain_path/
data foo.atable;
x=1;
run;
NOTE: The data set FOO.ATABLE has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 3.69 seconds
cpu time 0.06 seconds
proc datasets lib=foo; run;
Directory
Libref foo
Engine SASSPDS
Physical Name :23107/my_Hadoop_domain_path/
Local Host Name myhost
Local Host IP addr 10.24.7.79
Server Hostname N/A
Server IP addr 10.24.7.79
Server Portno 55120
Free Space (Kbytes) 9.0071993E15
Metapath '/my_Hadoop_domain_path/'
Indexpath '/my_Hadoop_domain_path/'
Datapath '/my_Hadoop_domain_path/'
Member
# Name Type
1 ATABLE DATA
NOTE: PROCEDURE DATASETS used (Total process time):
real time 2:02.19
cpu time 0.16 seconds
PROC CONTENTS data=foo.atable; run;
PROC PRINT data=foo.atable; run;
The CONTENTS Procedure
Data Set Name FOO.ATABLE Observations 1
Member Type DATA Variables 1
Engine SASSPDS Indexes 0
Created 01/28/2015 11:01:58 Observation Length 8
Last Modified 01/28/2015 11:01:58 Deleted Observations 0
Protection Compressed NO
Data Set Type Sorted NO
Label
Data Representation Default
Encoding latin1 Western (ISO)
Engine/Host Dependent Information
Blocking Factor (obs/block) 131072
ACL Entry NO
The CONTENTS Procedure
Engine/Host Dependent Information
ACL User Access(R,W,A,C) (Y,Y,Y,Y)
ACL UserName ANONYMOU
ACL OwnerName ANONYMOU
Data set is Ranged NO
Data set is a Cluster NO
Alphabetic List of Variables and Attributes
# Variable Type Len
1 x Num 8
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.50 seconds
cpu time 0.04 seconds
specifies whether a domain can access data in the Hadoop file system.
specifies that a domain can access data in the Hadoop file system.
specifies that a domain cannot access data in a Hadoop file system.
| Valid in | LIBNAME parameter file |
| Default | NO |
specifies the Java version used in the Hadoop environment when performing SPD Server WHERE processing optimization with MapReduce.
| Valid in | SPD Server parameter file |
| Interactions | The HADOOPACCELJVER= option affects only domains that specify HADOOP=YES or domains that specify any other SPD Server HADOOP* option. |
| | The SPD engine submits the Java class to the Hadoop cluster as a component in a MapReduce program. By requesting that data subsetting be performed in the Hadoop cluster, performance might be improved by taking advantage of the filtering and ordering capabilities of the MapReduce framework. As a result, only the subset of the data is returned to the SAS client. Performance is often improved with large data sets when the WHERE expression qualifies only a relatively small subset. The default value is 1.6. |
| See | For more information about SPD Server and WHERE optimization with MapReduce, see SPD Server Hadoop WHERE Processing Optimization Data Set and SAS Code Requirements |
| Example | HADOOPACCELJVER= <Java-version> ; |
specifies whether to perform SPD Server WHERE processing optimization with MapReduce by performing data subsetting in the Hadoop cluster.
| Valid in | SPD Server parameter file |
| Interactions | The HADOOPACCELWH= option affects only domains that specify HADOOP=YES or domains that specify any other SPD Server HADOOP* option. |
| | By requesting that data subsetting be performed in the Hadoop cluster, performance might be improved by taking advantage of the filtering and ordering capabilities of the MapReduce framework. As a result, only the subset of the data is returned to the SPD Server client. Performance is often improved with large data sets when the WHERE expression qualifies only a relatively small subset. |
| See | For more information about SPD Server and WHERE optimization with MapReduce, see SPD Server Hadoop WHERE Processing Optimization Data Set and SAS Code Requirements |
| Example | HADOOPACCELWH= YES\|NO ; |
specifies the local file system directory to store Hadoop ACL files.
| Valid in | SPD Server parameter file |
| Default | The ACL files are created in the location identified in the -acldir spdsserv process start-up parameter. This is the same location as the psmgr database. |
| Interaction | If you specify a value for the -hadoopaclpath option with the SPDSCLEAN utility, that value overrides any other HADOOPACLPATH= settings that are specified in the SPD Server parameter file. |
specifies the local path to the Hadoop configuration files.
| Valid in | LIBNAME parameter file |
| | SPD Server parameter file |
| Interactions | When you specify this option in the SPD Server parameter file, it affects only domains that specify HADOOP=YES or domains that have other SPD Server HADOOP* options specified. |
| | If you specify any SPD Server HADOOP* domain option, SPD Server assumes that the HADOOP=YES option is set for that domain. |
| Example | HADOOPCFG=/u/hadoop/hdist/cdh/confdir; |
specifies the local path to the Hadoop JAR files.
| Valid in | LIBNAME parameter file |
| | SPD Server parameter file |
| Interactions | When you specify this option in the SPD Server parameter file, it affects only domains that specify HADOOP=YES or domains that have other SPD Server HADOOP* options specified. |
| | If you specify any SPD Server HADOOP* domain option, SPD Server assumes that the HADOOP=YES option is set for that domain. |
| Example | HADOOPJAR=/u/hadoop/hdist/cdh/cdh52; |
specifies whether a Hadoop cluster requires authentication using Kerberos.
specifies that a Hadoop cluster requires authentication using Kerberos.
specifies that a Hadoop cluster does not require authentication using Kerberos.
| Valid in | LIBNAME parameter file |
| Interactions | To access a Hadoop cluster that is secured with Kerberos, the HADOOPREALM= and the HADOOPKEYTAB= options must be specified in either the LIBNAME parameter file or the SPD Server parameter file. |
| | If either the HADOOPREALM= or the HADOOPKEYTAB= option is specified in the SPD Server parameter file, then a domain that has HADOOP=YES specified is secured by Kerberos unless HADOOPKERBEROS=NO is specified. |
specifies the path to a Kerberos keytab file when accessing a Hadoop cluster that is secured with Kerberos. If you specify any SPD Server HADOOP* domain option, then SPD Server assumes that HADOOP=YES is set for that domain.
| Valid in | LIBNAME parameter file |
| | SPD Server parameter file |
| Interactions | When you specify this option in the SPD Server parameter file, it affects only domains that specify HADOOP=YES or domains that have other SPD Server HADOOP* options specified. |
| | If you specify any SPD Server HADOOP* domain option, SPD Server assumes that the HADOOP=YES option is set for that domain. |
specifies the Kerberos realm to use to access a Hadoop cluster that is secured with Kerberos.
| Valid in | LIBNAME parameter file |
| | SPD Server parameter file |
| Interactions | When you specify this option in the SPD Server parameter file, it affects only domains that specify HADOOP=YES or domains that have other SPD Server HADOOP* options specified. |
| | If you specify any SPD Server HADOOP* domain option, SPD Server assumes that the HADOOP=YES option is set for that domain. |
specifies the path to the directory in the Hadoop Distributed File System that stores the temporary results of the MapReduce output.
| Valid in | SPD Server parameter file |
| Default | /tmp |
| Interaction | This option affects only domains that specify HADOOP=YES or domains that specify any other SPD Server HADOOP* option. |
/site directory
of your SPD Server installation location.
spdsclean script,
or you can incorporate the parameter value statements within the spdsclean script
itself.
export SAS_HADOOP_CONFIG_PATH=/u/fedadmin/hadoopcfg/cdh52p1
export SAS_HADOOP_JAR_PATH=/u/fedadmin/hadoopjars/cdh52
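Before the utility runs, it can be useful to confirm that both environment variables point at directories on the local file system. The following is a minimal sketch of such a check. The function name `check_hadoop_env` is hypothetical, and the sketch assumes that each variable can hold either one directory or a colon-separated list of directories, as in the HADOOPJAR= example.

```shell
# Hypothetical pre-flight check (not part of SPD Server). Returns non-zero
# when either variable is unset or names something that is not a local directory.
check_hadoop_env() {
    rc=0
    for name in SAS_HADOOP_CONFIG_PATH SAS_HADOOP_JAR_PATH; do
        # Read the variable whose name is stored in $name (names are fixed above).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            rc=1
            continue
        fi
        # Walk the colon-separated list one directory at a time.
        rest=$value
        while [ -n "$rest" ]; do
            dir=${rest%%:*}
            if [ "$rest" = "$dir" ]; then rest=; else rest=${rest#*:}; fi
            if [ ! -d "$dir" ]; then
                echo "$name: not a local directory: $dir" >&2
                rc=1
            fi
        done
    done
    return "$rc"
}
```

A wrapper script could call `check_hadoop_env` after the two export statements and stop before invoking the utility if the check fails.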
spdsclean script
to reference Hadoop configuration information that is defined in your
LIBNAME parameter file, your SPD Server parameter file, or both.
spdsclean script:
spdsclean -libnamefile libnames.parm -parmfile spdsserv.parm
libname=Stuff1 pathname=/user/userlname hadoopcfg=/u/fedadmin/hadoopcfg/cdh52p1 hadoopjar=/u/fedadmin/hadoopjars/cdh52 hadoop=yes;
libname=foo pathname=/user/userlname/mydomain hadoopcfg=/u/hadoop/hdist/cdh/confdir/hdp20p1 hadoopjar=/u/hadoop/hdist/cdh/sas_cdh20u;
HADOOPCFG=/u/hadoop/hdist/cdh/confdir/hdp20p1; HADOOPJAR=/u/hadoop/hdist/cdh/sas_cdh20u;
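The rule that any HADOOP* domain option implies HADOOP=YES can be illustrated with a short shell sketch that inspects one LIBNAME parameter file entry. The function name `is_hadoop_domain` is hypothetical, and the sketch does not model how SPD Server treats an entry that combines an explicit HADOOP=NO with other HADOOP* options.

```shell
# Hypothetical illustration (not part of SPD Server): succeeds when a LIBNAME
# parameter file entry would be treated as a Hadoop domain.
is_hadoop_domain() {
    # Lowercase the entry and drop the terminating semicolon, then split it
    # into whitespace-separated option tokens.
    for token in $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d ';'); do
        case $token in
            hadoop=yes) return 0 ;;   # explicit setting
            hadoop=*)   ;;            # any other explicit value decides nothing here
            hadoop?*=*) return 0 ;;   # hadoopcfg=, hadoopjar=, hadoopkeytab=, ...
        esac
    done
    return 1
}

if is_hadoop_domain 'libname=foo pathname=/user/userlname/mydomain hadoopcfg=/u/hadoop/hdist/cdh/confdir/hdp20p1 hadoopjar=/u/hadoop/hdist/cdh/sas_cdh20u;'; then
    echo "foo is treated as a Hadoop domain"
fi
```

The entry for `foo` has no HADOOP=YES setting, yet the check succeeds because HADOOPCFG= and HADOOPJAR= are present.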
libname=Stuff1 pathname=/user/userlname/mydomain1 hadoop=yes;
libname=Stuff2 pathname=/user/userlname/mydomain2 hadoop=yes;
libname Stuff1 sasspds 'Stuff1' server=lax94d01.14526 user="anonymous";
1.6.
/tmp.
The Hadoop workpath that you specify must be an existing path structure.
If you specify a HADOOPWORKPATH= parameter that does not exist, the
MapReduce job fails.
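Because a missing work path makes the job fail, the path can be verified in HDFS before it is placed in the parameter file. The following sketch uses the standard `hdfs dfs -test -d` command. The function name `require_hadoop_workpath` is hypothetical, and the sketch assumes that the `hdfs` client is on the PATH and is configured for the target cluster.

```shell
# Hypothetical check (not part of SPD Server): succeeds only when the given
# path exists in HDFS and is a directory.
require_hadoop_workpath() {
    if hdfs dfs -test -d "$1"; then
        return 0
    fi
    echo "HADOOPWORKPATH= target is missing in HDFS: $1" >&2
    return 1
}
```

For example, `require_hadoop_workpath /tmp` checks the documented default location.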
NO in
the server parameter file by the SPD Server administrator, no WHERE
processing optimization is allowed. User overrides are not available.
YES in
the server parameter file by the SPD Server administrator, SPD Server
users can override and disable WHERE processing optimization by setting
the SPDSACWH macro to NO or by issuing
the ACCELWHERE=NO table option.
YES or
by issuing the ACCELWHERE=YES table option in a statement.
WHERE lastname;
WHERE z = -(x+y);
WHERE lastname=: 'S';
create table as
select …. query, causing the selected rows to
be read from SPD Server to SAS. The selected rows are returned to
SPD Server to create the table.