Creating a Metadata File for the Input Data File

Before running the %INDHD_RUN_MODEL macro, the metadata file for the input data file must be present in an HDFS location.
How you create the metadata file depends on where the input data file resides. There are three cases:
  • The file is not in Hive.
    No metadata exists. You must use PROC HDMD to create the metadata file. Here is an example:
    /***************************************************************
    * Assign a libname to the Hadoop Engine that specifies the 
    * locations where data, metadata, and temporary data will be stored.
    *******************************************************************/
    libname hdlib hadoop
       server=hdoop
       user=hadoop_user1
       HDFS_METADIR="/metadata"
       HDFS_TEMPDIR="/tmp";
    
    /***************************************************************
    * Create a metadata file for the input file specified in DATA_FILE=.
    * The metadata file name is specified in the NAME= option, and the
    * file is stored in the HDFS folder specified in HDFS_METADIR=.
    ****************************************************************/
    proc hdmd
       name=hdlib.pilotmd 
       format=delimited
       sep=','
       data_file='pilot.dat';
    
          column EmployeeID char(6);
          column FirstName  char(13);
          column LastName   char(15);
          column JobCode    char(7);
          column Salary     char(6);
          column Category   char(3);
    run;
    
  • The file is in a Hive library.
    Metadata is associated with the file in Hive, but the metadata is not in HDMD format. You must generate an HDMD file for it. Here is an example:
    /********************************************************
    * Assigns a libname to the Hadoop Engine that specifies the locations
    * where data, metadata, and temporary data will be stored.
    **********************************************************************/
    libname gridlib hadoop server="cdh123"
       user="hadoop"
       HDFS_TEMPDIR="/data/temp"
       HDFS_DATADIR="/data/dlm/data"
       HDFS_METADIR="/data/dlm/meta"
       DBCREATE_TABLE_EXTERNAL=NO;
    
    /********************************************************
    * Assigns a libname to the Hadoop Engine that specifies 
    * that the data and metadata will be in Hive.
    *********************************************************/
    libname hive hadoop server="cdh123"
       user=hadoop
       database=hpsumm
       subprotocol=hive2;
    
    /************************************************************
    * Creates an HDMD file from the Hive table 'stthive' and stores
    * it in the directory specified in the HDFS_METADIR= option of
    * the 'gridlib' libname.
    *************************************************************/
    proc hdmd
       from=hive.stthive
       name=gridlib.sttout;
    run;
    
  • The file is created with the ACCESS Hadoop engine.
    When a file is created with a Hadoop LIBNAME statement that contains the HDFS_DATADIR= and HDFS_METADIR= options, the HDMD file is generated automatically. Here is an example:
    /****************************************************
    * Assigns a libname to the Hadoop Engine that specifies 
    * the locations where data, metadata, and temporary data 
    * will be stored.
    ********************************************************/
    libname gridlib hadoop server="cdh123"
       user="hadoop"
       HDFS_TEMPDIR="/data/temp"
       HDFS_DATADIR="/data/dlm/data"
       HDFS_METADIR="/data/dlm/meta"
       DBCREATE_TABLE_EXTERNAL=NO;
    
    /*****************************************
    * Assigns a libname to a local SAS directory
    *****************************************/
    libname mydata "C:/tmp/myfiles";
    
    /**************************************************************
    * Creates a Hadoop file from mydata.intrid, along with its HDMD
    * file, and stores it in the location specified in the
    * HDFS_DATADIR= option of 'gridlib'.
    ***************************************************************/
    proc sql;
    create table gridlib.flights98
            as select * from mydata.intrid;
    quit;
    
    
The metadata file has the extension .sashdmd.
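Before running the macro, it can be useful to confirm that SAS can read the table through the metadata file. The following is a minimal sketch, not a required step. It assumes that the hdlib libref and the pilotmd metadata file from the first example already exist, and that the Hadoop engine resolves hdlib.pilotmd through the HDMD file stored in HDFS_METADIR=. If your site uses a different libref or table name, substitute those values.

```sas
/* Assumption: hdlib and pilotmd were created as in the first example. */

/* List the members that are visible through the Hadoop libref. */
proc datasets library=hdlib;
quit;

/* Display the column names, types, and lengths described for pilotmd. */
proc contents data=hdlib.pilotmd;
run;
```

If the table does not appear or the column attributes do not match the COLUMN statements, re-create the metadata file before calling %INDHD_RUN_MODEL.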
For more information, see PROC HDMD in SAS/ACCESS for Relational Databases: Reference.