CDP Environment Prerequisites for Deploying Gathr

Below are the CDP environment prerequisites for installing Gathr on CentOS/RHEL:

  • The Gathr deployment node must be an edge node or ghost node of the cluster. The parcels for the SPARK, HADOOP, and HIVE services must be available on this machine.

  • The cluster URL must be accessible, and valid admin user credentials for it must be available.

  • If the cluster is running with SSL, the user must import the required certificates into the JVM of the Gathr node.

  • If the Kafka service is available, the keystore.jks and truststore.jks files are needed, along with their respective certificate passwords.

  • If Kerberos is enabled, two keytab files are needed: the service keytab file and the zk keytab file.

  • If the NameNode and ResourceManager are running with HA and Kerberos, the core-site.xml, hdfs-site.xml, yarn-site.xml, tez-site.xml, and krb5.conf files are required.

  • Gathr requires HDFS locations in which to store pipeline jars and pipeline data.

  • Gathr must be able to submit Spark jobs to YARN queues as the application user.

  • The local file system must have directories for storing keytabs and Hive connection data.

    Example: /tmp/Kerberos, /tmp/hive
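
Some of the file and directory prerequisites above can be checked before installation starts. The following is a minimal sketch, not part of Gathr: it assumes the five configuration files named in the HA and Kerberos prerequisite have been collected into a single directory, and it checks the example local directories. The function names and the /path/to/collected-conf location are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# File names copied from the HA + Kerberos prerequisite in the list above.
REQUIRED_CONFIG_FILES = (
    "core-site.xml",
    "hdfs-site.xml",
    "yarn-site.xml",
    "tez-site.xml",
    "krb5.conf",
)


def missing_files(directory, names=REQUIRED_CONFIG_FILES):
    """Return the entries of `names` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


def missing_dirs(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [str(path) for path in paths if not Path(path).is_dir()]


def preflight(config_dir, local_dirs):
    """Run both checks; an empty list means nothing is missing."""
    return {
        "missing_config_files": missing_files(config_dir),
        "missing_local_dirs": missing_dirs(local_dirs),
    }


if __name__ == "__main__":
    # Hypothetical config location; the local directories are the examples
    # given in the list above. Substitute the paths used in your environment.
    report = preflight("/path/to/collected-conf", ["/tmp/Kerberos", "/tmp/hive"])
    for check, items in report.items():
        print(f"{check}: {', '.join(items) if items else 'none'}")
```

The sketch only confirms that files and directories exist. It does not validate their contents, keytab principals, certificates, or YARN queue permissions, which still need to be verified separately.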
