CDP Environment Prerequisites for Deploying Gathr
Below are the CDP environment prerequisites for installing Gathr on CentOS/RHEL:
The Gathr deployment node must be an edge node or ghost node of the cluster, and the Spark, Hadoop, and Hive service parcels must be available on that machine.
The cluster URL must be accessible, and valid cluster admin user credentials are required.
If the cluster runs with SSL enabled, the user must import the required certificates into the truststore of the JVM on the Gathr node.
If the Kafka service is available, the keystore.jks and truststore.jks files are required, along with the password for each.
If Kerberos is enabled, two keytab files are required: the service keytab file and the ZooKeeper (zk) keytab file.
If the NameNode and ResourceManager are running with high availability (HA) and Kerberos, the core-site.xml, hdfs-site.xml, yarn-site.xml, tez-site.xml, and krb5.conf files are required.
Gathr requires HDFS locations to store pipeline JARs and pipeline data.
Gathr must be able to submit Spark jobs to YARN queues as the application user.
Directories on the local file system are required for storing keytabs and Hive connection files.
For example: /tmp/Kerberos and /tmp/hive.
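For the SSL prerequisite, one way to import a cluster CA certificate into a JVM truststore is the JDK's keytool utility. The Python sketch below only assembles and runs a keytool -importcert command. The certificate path, alias, and truststore location are hypothetical, and it assumes the JVM's default cacerts password ("changeit") has not been changed.

```python
import subprocess
from typing import List


def keytool_import_cmd(cert_path: str, alias: str, truststore: str,
                       storepass: str = "changeit") -> List[str]:
    """Build a keytool command that imports one certificate into a truststore.

    Assumption: `truststore` is the cacerts file of the JVM that Gathr runs on,
    and "changeit" (the JDK default) is still its password.
    """
    return [
        "keytool", "-importcert", "-noprompt",
        "-alias", alias,          # unique name for this certificate entry
        "-file", cert_path,       # certificate exported from the cluster
        "-keystore", truststore,  # path to the JVM truststore (cacerts)
        "-storepass", storepass,
    ]


def import_cert(cert_path: str, alias: str, truststore: str,
                storepass: str = "changeit") -> None:
    # Runs keytool; raises subprocess.CalledProcessError if the import fails.
    subprocess.run(keytool_import_cmd(cert_path, alias, truststore, storepass),
                   check=True)
```

For example, import_cert("/tmp/cdp-ca.pem", "cdp-ca", "/usr/java/default/jre/lib/security/cacerts") would target a typical Java 8 cacerts location (a hypothetical path; writing to cacerts usually requires root). Passing -storepass on the command line makes the password visible in the process list, so an interactive prompt may be preferable on shared hosts.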
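The Kafka, Kerberos, and HA file prerequisites can be checked before installation with a small script. This sketch assumes all of these files have been copied into a single directory on the Gathr node. The keytab names service.keytab and zk.keytab are hypothetical placeholders, because the prerequisites do not name those files.

```python
import os
from typing import List

# Names from the prerequisites list; the two keytab names are hypothetical.
KAFKA_FILES = ["keystore.jks", "truststore.jks"]
KERBEROS_KEYTABS = ["service.keytab", "zk.keytab"]
HA_KERBEROS_FILES = ["core-site.xml", "hdfs-site.xml", "yarn-site.xml",
                     "tez-site.xml", "krb5.conf"]


def missing_prereq_files(directory: str, kafka: bool, kerberos: bool,
                         ha: bool) -> List[str]:
    """Return the required files that are not present in `directory`."""
    required: List[str] = []
    if kafka:
        required += KAFKA_FILES
    if kerberos:
        required += KERBEROS_KEYTABS
    # The *-site.xml and krb5.conf files are needed only with HA *and* Kerberos.
    if ha and kerberos:
        required += HA_KERBEROS_FILES
    return [name for name in required
            if not os.path.isfile(os.path.join(directory, name))]
```

An empty result means every file expected for the given cluster configuration is present; the check does not validate the file contents or passwords.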
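To confirm that the application user can submit Spark jobs to a YARN queue, a test job can be submitted manually with spark-submit. The sketch below builds such a command from spark-submit's --master, --deploy-mode, --queue, --class, --principal, and --keytab options. The queue name, jar, main class, and principal are hypothetical, and this is a manual check rather than a description of how Gathr submits its own pipelines.

```python
from typing import List, Optional


def spark_submit_cmd(app_jar: str, main_class: str, queue: str,
                     principal: Optional[str] = None,
                     keytab: Optional[str] = None) -> List[str]:
    """Build a spark-submit command that targets a specific YARN queue."""
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--queue", queue,  # YARN queue the application user should be allowed to use
        "--class", main_class,
    ]
    # On a Kerberized cluster, spark-submit can log in from a keytab.
    if principal and keytab:
        cmd += ["--principal", principal, "--keytab", keytab]
    cmd.append(app_jar)  # the application jar follows all spark-submit options
    return cmd
```

Running the resulting list with subprocess.run(cmd, check=True) as the application user should create an application in the chosen queue if the queue ACLs permit that user to submit.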
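The local directories for keytabs and Hive connection files can be created ahead of time. This sketch takes the directory paths as input and assumes owner-only permissions (0o700) are appropriate, since keytabs are credentials; adjust the mode if other users must read these files.

```python
import os
from typing import Iterable, List


def ensure_local_dirs(paths: Iterable[str], mode: int = 0o700) -> List[str]:
    """Create each directory if needed and return the ones that are writable."""
    ready: List[str] = []
    for path in paths:
        # `mode` applies only to newly created directories and is masked by umask.
        os.makedirs(path, mode=mode, exist_ok=True)
        if os.access(path, os.W_OK):
            ready.append(path)
    return ready
```

For example, ensure_local_dirs(["/tmp/Kerberos", "/tmp/hive"]) returns both paths when they exist and are writable by the current user.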