Checklist to Verify the Hadoop Environment

A good understanding of your Hadoop environment is critical to a successful connection between SPD Server and Hadoop. It is recommended that you verify your Hadoop environment by reviewing the following items:
  • Gain a working knowledge of the Hadoop distribution that you are using (for example, Cloudera). You also need a working knowledge of HDFS and of the MapReduce 1, MapReduce 2, and YARN services. For more information, see the Apache website or the vendor’s website.
  • Ensure that the HDFS, MapReduce, and YARN services are running on the Hadoop cluster.
  • Know the location of the MapReduce home.
  • Know the host name of the NameNode.
  • Determine where the HDFS cluster is running.
  • Understand and verify your Hadoop user authentication.
  • Understand and verify your security setup. It is recommended that you enable Kerberos for data security.
  • Verify that you can connect to the Hadoop cluster from your client machine outside of SPD Server with your defined security protocol.
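You can spot-check the NameNode, authentication, and connectivity items from the client machine. The following Python sketch is an illustration only and is not part of SPD Server. It assumes that a copy of the cluster's core-site.xml is available on the client. It reads the standard Hadoop properties fs.defaultFS and hadoop.security.authentication and then tests whether a TCP connection to the NameNode succeeds. The helper names are hypothetical. The fallback port 8020 is also an assumption: it is the usual NameNode RPC default, but your distribution might differ. On a high-availability cluster, fs.defaultFS usually names a logical nameservice rather than a host, so take the NameNode host names from hdfs-site.xml instead.

```python
# Hypothetical pre-flight helpers for checking a Hadoop client configuration.
# Assumption: a local copy of the cluster's core-site.xml is available.
import socket
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_hadoop_properties(path):
    """Return a dict of name -> value pairs from a Hadoop *-site.xml file."""
    props = {}
    root = ET.parse(path).getroot()  # the root element is <configuration>
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def namenode_address(props, default_port=8020):
    """Return (host, port) from fs.defaultFS, for example hdfs://nn1:8020.

    default_port is an assumed fallback used when the URI has no port.
    """
    uri = urlparse(props.get("fs.defaultFS", ""))
    if uri.scheme != "hdfs" or not uri.hostname:
        raise ValueError("fs.defaultFS is missing or is not an hdfs:// URI")
    return uri.hostname, uri.port or default_port


def auth_mode(props):
    """Return the Hadoop authentication mode, typically 'simple' or 'kerberos'."""
    return props.get("hadoop.security.authentication", "simple").lower()


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, pass the result of read_hadoop_properties("core-site.xml") to namenode_address and auth_mode, and then call can_connect with the returned host and port. A successful TCP connection shows only that the NameNode is reachable over the network. It does not verify authentication. When Kerberos is enabled, obtain a ticket with kinit and run a command such as hdfs dfs -ls / to confirm that your security setup works end to end.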