Deploy Gathr Using Ansible Playbook
Deploy Gathr and its essential components, such as Elasticsearch, RabbitMQ, Zookeeper, PostgreSQL, Spark Standalone, and HDFS/YARN, using Ansible Playbook.
Prerequisites
Ensure the following prerequisites are met before setting up and deploying these components.
1. Ansible Server
- Ansible must be installed on the server from which you will run the playbook. 
- Passwordless SSH must be enabled from the Ansible server to the Gathr and HAProxy servers. 
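A minimal sketch of setting up passwordless SSH from the Ansible server, assuming an `appuser` account and a placeholder hostname (both are assumptions, as is the key path):

```shell
# Generate a dedicated key pair on the Ansible server (path is an assumed example).
KEY="${KEY:-$HOME/.ssh/id_ed25519_gathr}"
mkdir -p "$(dirname "$KEY")"
if command -v ssh-keygen >/dev/null 2>&1; then
  [ -f "$KEY" ] || ssh-keygen -t ed25519 -N "" -f "$KEY" -q
  echo "key ready: $KEY"
else
  echo "ssh-keygen not found"
fi
# Copy the public key to each Gathr and HAProxy host, then confirm that
# non-interactive login works (user and host are placeholders):
#   ssh-copy-id -i "$KEY.pub" appuser@<GATHR_HOST>
#   ssh -o BatchMode=yes appuser@<GATHR_HOST> true
```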
2. System Requirements for Gathr
- Operating System: RHEL 9 / OEL 9 is required. 
- High Availability (HA) Deployment: HAProxy and Gathr must run on separate machines. 
- Hardware Specifications:

  | Service | CPU | RAM | Optional |
  |---|---|---|---|
  | Zookeeper | 2 | 4 GB | No |
  | Postgres | 2 | 4 GB | No |
  | Gathr | 8 | 32 GB | No |
  | Elasticsearch | 2 | 4 GB | Yes |
  | RabbitMQ | 1 | 2 GB | Yes |
  | Spark Standalone | Custom | Custom | Yes |
  | HDFS & YARN | Custom | Custom | Yes |
- Software Dependencies:
  - Python 3.9 must be available (preinstalled on RHEL 9 / OEL 9).
  - SELinux must be disabled on the servers.
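A quick sketch for checking the current SELinux mode and what a persistent disable involves:

```shell
# Check the current SELinux mode; the playbook requires "Disabled".
if command -v getenforce >/dev/null 2>&1; then
  MODE="$(getenforce)"
else
  MODE="selinux tooling not installed"
fi
echo "SELinux: $MODE"
# To disable SELinux persistently, set SELINUX=disabled in /etc/selinux/config
# and reboot; "sudo setenforce 0" only switches to Permissive until reboot.
```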
 
3. Shared Location
- Create a shared path accessible by both Gathr and Spark/YARN nodes. 
- For Gathr Analytics Deployment, ensure the same path is available on Analytics Nodes. 
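One common way to provide such a shared path is an NFS mount present at the same location on every node; a sketch, where the server name and paths are assumptions:

```shell
# /etc/fstab entry on each Gathr and Spark/YARN node (hypothetical server/path):
#   nfs-server.example.com:/export/gathr-shared  /opt/gathr/shared  nfs  defaults,_netdev  0 0
# Then create the mount point and mount it on each node:
#   sudo mkdir -p /opt/gathr/shared && sudo mount /opt/gathr/shared
```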
4. HAProxy Server
- Ensure the HAProxy Server is running with SSL for Analytics Deployment. Non-SSL is acceptable if not using Analytics. 
- A valid .pem file is required for SSL on HAProxy. 
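HAProxy expects a single .pem file containing the server certificate followed by its private key. A sketch of assembling one, using a throwaway self-signed pair to stand in for your CA-issued files (file names and CN are assumptions):

```shell
# Create a scratch self-signed certificate and key for illustration only.
TMP="$(mktemp -d)"
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=gathr.example.com" \
  -keyout "$TMP/gathr.key" -out "$TMP/gathr.crt" 2>/dev/null
# Concatenate certificate + key into the combined .pem HAProxy will load.
cat "$TMP/gathr.crt" "$TMP/gathr.key" > "$TMP/gathr.pem"
# Sanity check: the pem should contain exactly one cert and one key block.
grep -c "BEGIN" "$TMP/gathr.pem"
```

In production, concatenate your real certificate and key the same way instead of generating a self-signed pair.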
5. Port Availability
Ensure the following ports are available on the servers where the services are deployed:
| Service | Ports (Default) | Optional | Notes | 
|---|---|---|---|
| Zookeeper | 2181, 2888, 3888, 8081 | No | Ports can be modified before running the playbook. | 
| Postgres | 5432 | No | Port can be modified before running the playbook. | 
| Gathr | 8090, 9595 | No | Ports can be modified before running the playbook. | 
| Elasticsearch | 9200, 9300 | Yes | Ports can be modified before running the playbook. | 
| RabbitMQ | 5672, 15672 | Yes | Ports can be modified before running the playbook. | 
| Spark Standalone | 7077, 8080, 8081, 6066, 18080 | Yes | Ports can be modified before running the playbook. | 
| HDFS & YARN (Non-HA) | 8020, 9870, 9864, 9866, 9867, 8141, 8088, 8030, 8025, 8050, 8042, 8040, 19888, 10020, 10033 | Yes | Ports are non-configurable. | 
| HDFS & YARN (HA) | 8485, 8480, 8020, 50070, 8019, 9866, 9867, 50075, 8141, 8088, 8030, 8025, 8050, 8042, 8040, 19888, 10020, 10033 | Yes | Ports are non-configurable. | 
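Before deploying, you can confirm ports are free on each target host with a quick check (the port list below is a sample; edit it to match the services planned for that host, and note it uses `ss` from the iproute2 package):

```shell
# Report whether each required port is already bound on this host.
RESULT=""
for port in 2181 5432 8090 9595 9200 5672; do
  if ss -tln 2>/dev/null | grep -q ":${port}\b"; then
    RESULT="$RESULT ${port}:in-use"
  else
    RESULT="$RESULT ${port}:free"
  fi
done
echo "$RESULT"
```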
6. Application User and Group
- Create an application user on the servers with password-based authentication.
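A sketch for checking whether the application user exists and the commands to create it; `gathr` is an assumed name, so substitute your own user and group:

```shell
# Check for the application user on this server ("gathr" is an assumed example).
APP_USER="${APP_USER:-gathr}"
if id "$APP_USER" >/dev/null 2>&1; then
  STATUS="exists"
else
  STATUS="missing"
fi
echo "user $APP_USER: $STATUS"
# To create the user and group (run as root), then set a password so
# password-based authentication works:
#   groupadd gathr && useradd -m -g gathr gathr && passwd gathr
```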
7. (Optional) OpenAI API Key Requirement
- An OpenAI API key is necessary for the Gathr IQ features. 
- Internet access is needed for these features to work. 
8. (Optional) Data Intelligence
- Shared Storage for Logs and Data (optional for single-node setups):
  - Configure a shared mount (EFS/NFS) on the Docker-installed machine for Data Intelligence logs and data.
 
- Logstash Configuration:
  - Install and configure Logstash (version 6.8.23) on the chosen node, ensuring it is integrated with the designated Elasticsearch instance.
  - Configure Logstash to process logs from the DI log directory for efficient log management.
  - The Logstash service should run on the same node where the DI application logs are stored.
 
- Access Control for Gathr User:
  - Ensure Gathr users have access to the Data Intelligence Docker logs.
  - Set up ACL permissions if necessary.
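The ACL grant could look like the following sketch, which assumes the Gathr OS user is `gathr` and the DI Docker logs live under `/var/lib/docker/containers` (both assumptions); it is demonstrated here on a scratch directory for the current user:

```shell
# Demonstrate the ACL commands on a temporary directory.
if command -v setfacl >/dev/null 2>&1; then
  d="$(mktemp -d)"
  # Production equivalent (as root): setfacl -R -m u:gathr:rX /var/lib/docker/containers
  setfacl -m "u:$(id -un):rX" "$d" 2>/dev/null || true
  ACL_OUT="$(getfacl --absolute-names "$d" 2>/dev/null)"
else
  ACL_OUT="acl package not installed"
fi
echo "$ACL_OUT"
```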
 
- Load Balancer for Multi-DI Deployments (optional for single-node DI Docker):
  - Configure a private load balancer for multi-DI Docker deployments.
 
- AWS CloudWatch Integration (applicable for AWS deployments only):
  - Configure AWS CloudWatch and the CloudWatch Agent for log monitoring.
 
9. (Optional) For MLflow
- Ensure a private Docker registry is available for MLflow images. 
- A Kubernetes cluster should be operational. 
- Artifact Storage Configuration:
  - MLflow generates artifacts when a model is registered; these can be stored in either S3 or NFS.
  - If using S3 or Ceph for artifact storage, ensure you have the following credentials:
    - S3 Access Key
    - S3 Secret Key
    - S3 Endpoint URL
 
 
- If using NFS, ensure you have the name of the PVC (Persistent Volume Claim) that the Gathr pod uses, so the path can be shared between the Gathr and MLflow pods. 
- (Optional) Private Docker registry access must be available to pull from the private Docker repository. 
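For reference, MLflow's S3 artifact store reads S3-compatible credentials from these standard environment variables (values below are placeholders; how this playbook consumes the credentials may differ, so treat this as a sketch):

```shell
# Standard MLflow/S3 credential variables; MLFLOW_S3_ENDPOINT_URL is only
# needed for S3-compatible stores such as Ceph, not AWS S3 itself.
export AWS_ACCESS_KEY_ID="<S3 access key>"
export AWS_SECRET_ACCESS_KEY="<S3 secret key>"
export MLFLOW_S3_ENDPOINT_URL="https://ceph.example.internal"
```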
Steps to Install Gathr and Supported Components
- Download the playbook bundle shared by the Gathr team. 
- Extract the bundle using the following command: `tar -xzf GathrBundle.tar.gz`
- Navigate to the playbook path: `cd /path/to/playbook`
- Open the `saxconfig_parameters` file with a text editor. Refer to the comments in the file to modify the configuration properties as required for your environment.
- Execute the following command to reload the Ansible variables: `./config.sh`
- Execute the playbook with the following commands:
  - `export ANSIBLE_DISPLAY_SKIPPED_HOSTS=false`
  - `ansible-playbook -i hosts installGathrStack.yml -vv`
Post Deployment Validation
- Navigate to the Gathr UI using the URL http://<GATHR_HOST>:8090/. When the interface loads, a license agreement page is displayed:
  - Check the “I accept” box and click the “Accept” button.
- The Upload License page will then be visible:
  - Upload your valid Gathr license and click “Confirm”.
- Proceed by clicking “Continue” on the welcome page. This will lead you to the Login Page:
  - Log in using the default superuser credentials:
    - Email: super@mailinator.com
    - Password: superuser
- The deployment of Gathr is now complete.
If you have any feedback on Gathr documentation, please email us!