Before running the %INDHD_RUN_MODEL macro, the metadata
file for the input data file must be present in an HDFS location.
How you make the metadata file available depends on where the
input file resides. There are three cases:
-
The file is not in Hive.
No metadata exists.
You must use PROC HDMD to create the metadata file. Here is an example:
/***************************************************************
* Assign a libname to the Hadoop Engine that specifies the
* locations where data, metadata, and temporary data will be stored.
*******************************************************************/
libname hdlib hadoop
server=hdoop
user=hadoop_user1
HDFS_METADIR="/metadata"
HDFS_TEMPDIR="/tmp";
/***************************************************************
* Create a metadata file for the input file specified in DATA_FILE=.
* The metadata file name is defined in the NAME= option and is
* stored under the HDFS folder defined in HDFS_METADIR.
****************************************************************/
proc hdmd
name=hdlib.pilotmd
format=delimited
sep=','
data_file='pilot.dat';
column EmployeeID char(6);
column FirstName char(13);
column LastName char(15);
column JobCode char(7);
column Salary char(6);
column Category char(3);
run;
-
The file is in a Hive library.
Metadata is associated with the file in Hive, but the metadata is not
in HDMD format. You must generate an HDMD file for it. Here is an example:
/********************************************************
* Assigns a libname to the Hadoop Engine that specifies the locations
* where data, metadata and temporary data will be stored.
**********************************************************************/
libname gridlib hadoop server="cdh123"
user="hadoop"
HDFS_TEMPDIR="/data/temp"
HDFS_DATADIR="/data/dlm/data"
HDFS_METADIR="/data/dlm/meta"
DBCREATE_TABLE_EXTERNAL=NO;
/********************************************************
* Assigns a libname to the Hadoop Engine that specifies
* that the data and metadata will be in Hive
*********************************************************/
libname hive hadoop server="cdh123"
user=hadoop
database=hpsumm
subprotocol=hive2;
/************************************************************
* Creates an HDMD file from Hive table 'stthive' and stores
* it in the directory specified in the HDFS_METADIR= option of
* the 'gridlib' libname.
*************************************************************/
proc hdmd
from=hive.stthive
name=gridlib.sttout;
run;
-
The file is created with the SAS/ACCESS Hadoop engine.
When a file is created with a Hadoop LIBNAME statement that contains the
HDFS_DATADIR= and HDFS_METADIR= options, the HDMD file is automatically
generated. Here is an example:
/****************************************************
* Assigns a libname to the Hadoop Engine that specifies
* the locations where data, metadata and temporary data
* will be stored.
********************************************************/
libname gridlib hadoop server="cdh123"
user="hadoop"
HDFS_TEMPDIR="/data/temp"
HDFS_DATADIR="/data/dlm/data"
HDFS_METADIR="/data/dlm/meta"
DBCREATE_TABLE_EXTERNAL=NO;
/*****************************************
* Assigns a libname to a local SAS directory
*****************************************/
libname mydata "C:/tmp/myfiles";
/**************************************************************
* Creates a Hadoop file from mydata.intrid, along with its HDMD
* file, and stores the data file in the directory specified in
* the HDFS_DATADIR= option of the 'gridlib' libname.
***************************************************************/
proc sql;
create table gridlib.flights98
as select * from mydata.intrid;
quit;
The metadata file has
the extension .sashdmd.
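One way to check that the metadata file is usable is to read the table's column definitions back through the same libref. The following is a minimal sketch, not a required step. It assumes that the gridlib libref from the preceding example is still assigned and that the flights98 table was created successfully.

```sas
/**************************************************************
* Assumption: the 'gridlib' libref from the preceding example
* is still assigned. Lists the columns of gridlib.flights98.
***************************************************************/
proc contents data=gridlib.flights98;
run;
```

If the table cannot be found or no columns are listed, check the HDFS_METADIR= location in the LIBNAME statement.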
For more information,
see PROC HDMD in SAS/ACCESS for Relational Databases: Reference.