HCatalog is a table management layer that presents a relational view
of data in HDFS to applications within the Hadoop ecosystem. With
HCatalog, data structures that are registered in the Hive metastore,
including SAS data, can be accessed through standard MapReduce code
and Pig. HCatalog is part of Apache Hive.
The SAS In-Database
Scoring Accelerator for Hadoop uses HCatalog to read the native Hive
file types Avro, ORC, Parquet, and RCFile.
By default, the output file is delimited. To write a binary file
instead, use the OUTRECORDFORMAT argument of the %INDHD_RUN_MODEL
macro.
Consider these requirements when using HCatalog:
- Data that you want to access with HCatalog must first be registered
  in the Hive metastore.
- The recommended Hive version for the SAS In-Database Scoring
  Accelerator for Hadoop is 0.13.0.
- Support for HCatalog varies by vendor. For more information, see the
  documentation for your Hadoop vendor.
- Additional configuration steps are needed when processing HCatalog
  files.
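
The first requirement above, registering data in the Hive metastore, can be sketched in HiveQL as follows. This is a minimal illustration, not a prescribed procedure: the database name, table name, column list, and HDFS location are all hypothetical, and ORC is chosen only because it is one of the native Hive file types listed earlier.

```sql
-- Hypothetical database, table, columns, and HDFS path, for illustration only.
CREATE DATABASE IF NOT EXISTS scoring_db;

-- Register existing ORC files in HDFS as an external table in the metastore.
-- EXTERNAL means that dropping the table leaves the underlying files in place.
CREATE EXTERNAL TABLE IF NOT EXISTS scoring_db.customer_input (
  customer_id BIGINT,
  age         INT,
  income      DOUBLE,
  region      STRING
)
STORED AS ORC
LOCATION '/data/scoring/customer_input';

-- Confirm the registration and the storage format recorded in the metastore.
DESCRIBE FORMATTED scoring_db.customer_input;
```

Once a table is registered this way, HCatalog-aware applications can refer to it by database and table name instead of by file path and format.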