Gathr Integrated Deployment
- Prerequisites
  - 1. Verify Docker and Docker Compose
  - 2. System Requirements
  - 3. PEM File Requirement for HAProxy (Optional for HAProxy with SSL)
  - 4. Ansible Server
  - 5. NFS Server
  - 6. Port Availability
  - 7. User Permissions for Docker
  - 8. Spark Cluster Requirement for Pipeline Execution
  - 9. Local Volume for Gathr Data
  - 10. (Optional) OpenAI API Key Requirement
  - 11. (Optional) Data Intelligence (DI)
  - 12. (Optional) MLflow Configuration
- Steps to Install Gathr Using Playbook
- Post Deployment Validation
This article describes the prerequisites and the steps to deploy Gathr in Docker containers.
Prerequisites
1. Verify Docker and Docker Compose
Ensure that the latest versions of Docker and Docker Compose are installed on the Gathr deployment node.
Verify the installation:
$ docker --version
$ docker compose version
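The same check can be scripted when several nodes need to be prepared. This is a minimal sketch that assumes bash and treats the Compose v2 plugin (`docker compose`, as in the command above) as required; `require_cmd` and `check_docker` are hypothetical helper names, not part of the Gathr bundle.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of the Gathr bundle.

# Succeeds if the command is on PATH, otherwise reports it and fails.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing command: $1" >&2
    return 1
  fi
}

# Checks the Docker CLI and the Compose v2 plugin ("docker compose").
check_docker() {
  require_cmd docker || return 1
  docker --version || return 1
  docker compose version || return 1
}
```

Run `check_docker` on the Gathr deployment node; a non-zero exit status means Docker or the Compose plugin is missing.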
2. System Requirements
- The base OS must be RHEL 9 / OEL 9. 
- Three machines are required:
  - Two for Gathr in HA (including Zookeeper, RabbitMQ, Elasticsearch, and Postgres).
  - One for HAProxy.
- Note: HAProxy and Gathr cannot be on the same machine.
- Minimum system requirements: 8 CPU cores and 32 GB RAM.
- vm.max_map_count must be set to 262144 in /etc/sysctl.conf.
- Python3 and pip3 are required.
- The Gathr Docker image uses Debian Linux 11 as its OS.
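The requirements above can be checked on each machine with a short script. This is a sketch under assumptions: it runs in bash on RHEL 9, reads values from `nproc` and `/proc`, and treats 8 cores and a `vm.max_map_count` of at least 262144 as hard limits. RAM is only reported, because the kernel reserves some memory and a 32 GB machine shows slightly less. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical node preflight for the system requirements.

# usage: at_least <label> <actual> <minimum>
at_least() {
  if [ "$2" -ge "$3" ]; then
    echo "OK: $1=$2 (min $3)"
  else
    echo "FAIL: $1=$2 (min $3)"
    return 1
  fi
}

check_node() {
  local rc=0 mem_kb cmd
  at_least cpu_cores "$(nproc)" 8 || rc=1
  # MemTotal is in kB and excludes kernel-reserved memory; report it only.
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "INFO: ram_gib=$(( ${mem_kb:-0} / 1024 / 1024 ))"
  at_least vm.max_map_count "$(cat /proc/sys/vm/max_map_count)" 262144 || rc=1
  for cmd in python3 pip3; do
    command -v "$cmd" >/dev/null || { echo "FAIL: $cmd not found"; rc=1; }
  done
  return "$rc"
}

# Run as root: persist vm.max_map_count=262144 and reload /etc/sysctl.conf.
set_max_map_count() {
  if grep -q '^vm.max_map_count' /etc/sysctl.conf; then
    sed -i 's/^vm.max_map_count.*/vm.max_map_count=262144/' /etc/sysctl.conf
  else
    echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
  fi
  sysctl -p
}
```

Call `check_node` on every Gathr machine; if only `vm.max_map_count` fails, `set_max_map_count` (as root) fixes it without a reboot.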
3. PEM File Requirement for HAProxy (Optional for HAProxy with SSL)
- A valid .pem file is required to start HAProxy with SSL.
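HAProxy is commonly given a single PEM file that bundles the certificate (plus any intermediate chain) with its private key. The sketch below assumes that layout and checks for both parts before deployment; `check_pem` is a hypothetical helper, and the source file names in the comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical check: the PEM should contain a certificate and a private key.
# A combined file is typically built with, for example:
#   cat server.crt intermediate.crt server.key > haproxy.pem
check_pem() {
  local pem="$1"
  grep -q -e '-----BEGIN CERTIFICATE-----' "$pem" \
    || { echo "no certificate in $pem" >&2; return 1; }
  grep -qE -e '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' "$pem" \
    || { echo "no private key in $pem" >&2; return 1; }
}
```

To see when the certificate expires, `openssl x509 -in haproxy.pem -noout -enddate` prints the `notAfter` date of the first certificate in the file.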
4. Ansible Server
- Ansible should be installed on the server where you will run the playbook. 
- Passwordless SSH must be enabled from the Ansible server to the Gathr and HAProxy machines.
- Note: Passwordless SSH must be configured for the root user on the remote servers.
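One common way to meet these requirements with OpenSSH is sketched below: generate a key on the Ansible server, copy it to each remote `root` account, then confirm that login works without a password. The ed25519 key type and the helper names are assumptions; `ssh-copy-id` asks for each host's root password once.

```shell
#!/usr/bin/env bash
# Run on the Ansible server. Host names passed as arguments are placeholders.

# Create a key pair once, then install the public key for root on each host.
setup_ssh() {
  mkdir -p ~/.ssh && chmod 700 ~/.ssh
  [ -f ~/.ssh/id_ed25519 ] || ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
  local host
  for host in "$@"; do
    ssh-copy-id -i ~/.ssh/id_ed25519.pub "root@$host"
  done
}

# BatchMode=yes makes ssh fail instead of prompting, so a prompt shows up as FAIL.
check_ssh() {
  local host failed=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
      echo "OK: $host"
    else
      echo "FAIL: $host"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `setup_ssh gathr-node1 gathr-node2 haproxy-node` followed by `check_ssh gathr-node1 gathr-node2 haproxy-node` (hypothetical host names).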
5. NFS Server
- The NFS server must be up and running. 
- Create a shared path that is accessible by both Gathr nodes. 
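A minimal server-side sketch for RHEL 9 with `nfs-utils`, assuming the export path and client addresses are placeholders and that the `rw,sync` options suit your environment. NFS checks permissions by numeric UID/GID, which is likely why the application user must have the same UID and GID on every node.

```shell
#!/usr/bin/env bash
# Hypothetical NFS server setup; run as root on the NFS server.

# Prints one /etc/exports line for a path and a client (IP or hostname).
export_line() {
  printf '%s %s(rw,sync)\n' "$1" "$2"
}

# usage: export_share <path> <client>...
export_share() {
  local path="$1" client
  shift
  mkdir -p "$path"
  for client in "$@"; do
    export_line "$path" "$client" >> /etc/exports
  done
  systemctl enable --now nfs-server
  exportfs -ra   # re-read /etc/exports
}
# From a Gathr node, list what the server exports: showmount -e <nfs_server>
```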
6. Port Availability
Ports 8090 and 9595 must be free on the machines where Gathr and HAProxy will be deployed.
Check port status (no output means both ports are free):
$ netstat -anp | grep -E ':(8090|9595)\b'
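On RHEL 9, `netstat` comes from the optional `net-tools` package, so `ss` (from the `iproute` package, normally installed) may be easier to use. The sketch below, an assumption rather than part of the Gathr bundle, reports whether anything is listening on each required TCP port.

```shell
#!/usr/bin/env bash
# Returns 0 if no TCP socket listens on the port, 1 if one does, 2 if ss is missing.
port_free() {
  command -v ss >/dev/null || { echo "ss not found" >&2; return 2; }
  # -H: no header, -l: listening, -t: TCP, -n: numeric; filter on local port
  [ -z "$(ss -Hltn "sport = :$1")" ]
}

for port in 8090 9595; do
  port_free "$port"
  case $? in
    0) echo "port $port is free" ;;
    1) echo "port $port is in use" ;;
    *) echo "could not check port $port" ;;
  esac
done
```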
7. User Permissions for Docker
- Create an application user on the server with password-based authentication. 
- Note: The UID and GID for this user must be the same across all machines. 
- The docker and docker compose commands must be accessible to this application user.
- Ensure the application user is part of the docker group. 
Verify the user’s group membership:
$ id <username>
Confirm that Docker commands work as the application user:
$ docker ps
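A sketch of creating the application user with a fixed UID and GID, run as root on every machine. The user name `gathr` and the ID `1500` are hypothetical; choose values that are unused on all nodes. Group changes apply to new login sessions, so the user should log out and back in before running `docker ps`.

```shell
#!/usr/bin/env bash
# Hypothetical values; use the same ones on every node.
APP_USER=gathr
APP_UID=1500
APP_GID=1500

create_app_user() {
  groupadd -g "$APP_GID" "$APP_USER"
  useradd -m -u "$APP_UID" -g "$APP_GID" "$APP_USER"
  passwd "$APP_USER"              # prompts for the login password
  usermod -aG docker "$APP_USER"  # docker group grants access to the daemon socket
}

# Prints "<uid> <gid>" so the values can be compared across machines.
uid_gid() {
  printf '%s %s\n' "$(id -u "$1")" "$(id -g "$1")"
}

# Succeeds if the user is a member of the docker group.
in_docker_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx docker
}
```

Running `uid_gid gathr` on each node should print the same pair everywhere.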
8. Spark Cluster Requirement for Pipeline Execution
One of the following Spark clusters is required:
- Spark Standalone v3.5.0 
- Spark on YARN v3.3.3 
- Spark on Kubernetes (K8s) v3.5.0 
These services must be accessible from the Docker container node.
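Reachability can be checked from the Docker node with a plain TCP connect. The sketch below uses bash's `/dev/tcp` redirection and coreutils `timeout`. Host names are placeholders, and the ports are common defaults (7077 for a standalone master, 8032 for the YARN ResourceManager, 6443 for a Kubernetes API server) that your cluster may override.

```shell
#!/usr/bin/env bash
# Succeeds if a TCP connection to <host> <port> opens within 5 seconds.
reachable() {
  timeout 5 bash -c ">/dev/tcp/$1/$2" 2>/dev/null
}

# usage: check_endpoints <host:port>...
check_endpoints() {
  local endpoint
  for endpoint in "$@"; do
    if reachable "${endpoint%%:*}" "${endpoint##*:}"; then
      echo "reachable: $endpoint"
    else
      echo "NOT reachable: $endpoint"
    fi
  done
}
# Example (placeholder hosts): check_endpoints spark-master:7077 yarn-rm:8032 k8s-api:6443
```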
9. Local Volume for Gathr Data
A local directory is required to store Gathr data.
Note: This path should be accessible on the Gathr nodes via NFS.
Create the directory:
$ mkdir -p /path/to/gathr-volume
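If the directory is backed by the NFS share from section 5, it can be mounted and persisted as sketched below. The server name, export path and mount options are placeholders; `_netdev` marks the mount as needing the network before it is attempted at boot. The last helper confirms the path really sits on NFS.

```shell
#!/usr/bin/env bash
# Placeholders; replace with your NFS server and export.
NFS_SERVER=nfs-server.example.com
NFS_EXPORT=/export/gathr
MOUNT_POINT=/path/to/gathr-volume

# Run as root on each Gathr node.
mount_volume() {
  mkdir -p "$MOUNT_POINT"
  mount -t nfs "$NFS_SERVER:$NFS_EXPORT" "$MOUNT_POINT" || return 1
  # Persist across reboots (only if no entry exists yet)
  grep -qs " $MOUNT_POINT nfs " /etc/fstab ||
    echo "$NFS_SERVER:$NFS_EXPORT $MOUNT_POINT nfs defaults,_netdev 0 0" >> /etc/fstab
}

# Succeeds if the path is on an NFS (v3 or v4) filesystem.
is_nfs_mount() {
  case "$(findmnt -n -o FSTYPE --target "$1" 2>/dev/null)" in
    nfs | nfs4) return 0 ;;
    *) return 1 ;;
  esac
}
```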
10. (Optional) OpenAI API Key Requirement
- An OpenAI API key is required to use the Gathr IQ feature. 
- Internet access is mandatory for this feature. 
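Both requirements can be checked from the Gathr node with one request. This sketch assumes the key is exported as `OPENAI_API_KEY` (a variable name chosen here, not a Gathr setting) and calls the public `GET /v1/models` endpoint.

```shell
#!/usr/bin/env bash
# Prints the HTTP status of an authenticated call to the OpenAI API.
# 200: internet access and key both work; 401: network is fine, key rejected;
# 000: no HTTP response (no internet access or blocked by a proxy/firewall).
openai_status() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    https://api.openai.com/v1/models
}
```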
11. (Optional) Data Intelligence (DI)
Shared Storage for Logs and Data (Optional for Single-Node Setups)
- Configure a shared mount (EFS/NFS) on the Docker-installed machine to store DI logs and data efficiently.
Logstash Configuration
- Install and configure Logstash (version 6.8.23) on the chosen node. 
- Ensure Logstash is integrated with Elasticsearch. 
- Configure Logstash to process logs from the DI log directory. 
- Logstash should run on the same node where DI logs are stored. 
Access Control for Gathr User
- Ensure the Gathr user has read access to the DI Docker logs.
- If access is missing, grant it with ACL permissions.
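One way to grant this access with the standard `acl` tools is sketched below. The user name and log directory are placeholders; the default ACL on each directory makes files that DI writes later inherit the same read access.

```shell
#!/usr/bin/env bash
# ACL entry giving read access to files and traverse access to directories
# (capital X adds execute only for directories or already-executable files).
acl_spec() {
  printf 'u:%s:rX\n' "$1"
}

# usage: grant_log_read <user> <di_log_dir>   (run as root)
grant_log_read() {
  local spec
  spec=$(acl_spec "$1")
  setfacl -R -m "$spec" "$2"                           # existing files and dirs
  find "$2" -type d -exec setfacl -d -m "$spec" {} +  # default ACL for new files
}
# Check the result with: getfacl <di_log_dir>
```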
Load Balancer for Multi-DI Deployments (Optional)
- Configure a private load balancer for multi-DI Docker deployments. 
- Ensure it listens on the port specified in the Gathr configuration.
AWS CloudWatch Integration (For AWS Deployments Only)
- Configure AWS CloudWatch and CloudWatch Agent for log monitoring.
12. (Optional) MLflow Configuration
- A private Docker registry is required.
- The Gathr team provides MLflow images. Load these into your private registry.
- A Kubernetes (K8s) cluster must be up and running.
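Loading the provided images into a private registry typically follows the `docker load`, `docker tag`, `docker push` sequence. The archive name, image names and registry address are hypothetical; use the names that `docker load` prints, and run `docker login` first if the registry requires authentication.

```shell
#!/usr/bin/env bash
# Hypothetical registry address (host:port).
REGISTRY=registry.example.com:5000

# Maps a loaded image name to its name in the private registry,
# keeping only the last path segment and the tag.
target_name() {
  printf '%s/%s\n' "$REGISTRY" "${1##*/}"
}

# usage: push_images <image-archive.tar>
push_images() {
  local image target
  # docker load prints one "Loaded image: <name:tag>" line per tagged image
  docker load -i "$1" | sed -n 's/^Loaded image: //p' | while read -r image; do
    target=$(target_name "$image")
    docker tag "$image" "$target" && docker push "$target"
  done
}
# Confirm the cluster is up before deploying: kubectl get nodes
```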
Artifact Storage Configuration
- MLflow generates artifacts when a model is registered. Artifacts can be stored in:
  - S3 or Ceph
  - NFS
If using S3 or Ceph, ensure you have:
- S3 Access Key 
- S3 Secret Key 
- S3 Endpoint URL 
If using NFS, ensure you have the PVC (Persistent Volume Claim) name.
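The storage details can be sanity-checked before configuring MLflow. The sketch assumes the AWS CLI is installed (it also works against S3-compatible Ceph RGW endpoints through `--endpoint-url`) and that `kubectl` points at the target cluster; the default region and namespace are placeholders.

```shell
#!/usr/bin/env bash
# usage: check_s3 <access_key> <secret_key> <endpoint_url>
# Lists buckets with the given credentials; a non-zero exit means they failed.
check_s3() {
  AWS_ACCESS_KEY_ID="$1" AWS_SECRET_ACCESS_KEY="$2" \
    AWS_DEFAULT_REGION="${AWS_DEFAULT_REGION:-us-east-1}" \
    aws s3 ls --endpoint-url "$3" >/dev/null
}

# usage: pvc_bound <pvc_name> [namespace]
# Succeeds if the PVC exists and its phase is "Bound".
pvc_bound() {
  [ "$(kubectl get pvc "$1" -n "${2:-default}" -o jsonpath='{.status.phase}' 2>/dev/null)" = "Bound" ]
}
```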
(Optional) Private Docker Registry Access
- The machine should have private Docker registry access.
Steps to Install Gathr Using Playbook
1. Download the Playbook Bundle
Download the playbook bundle shared by the Gathr team to the Ansible server.
2. Extract the Bundle
# tar -xzf GathrBundle.tar.gz
3. Navigate to the Playbook Directory
# cd /path/to/playbook
4. (Optional) Add Host Entries Inside the Gathr Container
Create a file named hosts in the packages folder.
Example:
# vim packages/hosts
Add the following entries:
10.0.0.1 gathr-node1
10.0.0.2 gathr-node2
Save the file.
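The same file can be created non-interactively, which is convenient when scripting the setup. The sketch writes the example entries above; replace them with your own IP addresses and host names. `write_hosts` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# usage: write_hosts [packages_dir]   (run from the playbook directory)
write_hosts() {
  local dir="${1:-packages}"
  mkdir -p "$dir"
  # Quoted 'EOF' keeps the lines exactly as written (no variable expansion)
  cat > "$dir/hosts" <<'EOF'
10.0.0.1 gathr-node1
10.0.0.2 gathr-node2
EOF
}
```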
5. (Optional) Copy HAProxy PEM File
Copy the haproxy.pem file into the packages folder.
This file is required to enable SSL on HAProxy.
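Because the PEM file contains the private key, it is reasonable to copy it with owner-only permissions. A sketch using coreutils `install`; the source path is a placeholder, and mode `600` is a suggestion rather than a documented Gathr requirement.

```shell
#!/usr/bin/env bash
# usage: copy_pem <source.pem> [packages_dir]
copy_pem() {
  local src="$1" dir="${2:-packages}"
  # install copies the file and sets its mode in one step
  install -m 600 "$src" "$dir/haproxy.pem"
}
```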
6. Update the Properties in saxconfig_parameters 
A sample saxconfig_parameters file is provided with useful comments.
Update it with the appropriate values.
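To review only the values you have set, without the explanatory comments, a filter such as the one below can help. It assumes comment lines in saxconfig_parameters start with `#`; check the sample file, since its format is not described here.

```shell
#!/usr/bin/env bash
# Prints non-blank lines that are not comments.
show_settings() {
  grep -Ev '^[[:space:]]*(#|$)' "${1:-saxconfig_parameters}"
}
```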
7. Reload Ansible Variables
Run the following command to reload Ansible variables:
# ./config.sh saxconfig_parameters
8. Run the Playbook
Execute the following command to deploy Gathr:
# ansible-playbook -i hosts gathr_one.yaml -v
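Before the full run, it can help to confirm that Ansible reaches every host and that the playbook parses. A sketch using standard Ansible options (`-m ping`, `--syntax-check`); run it from the playbook directory, where the command above expects the `hosts` inventory and `gathr_one.yaml`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Run from the playbook directory.
preflight() {
  ansible all -i hosts -m ping || return 1   # SSH login + Python on each host
  ansible-playbook -i hosts gathr_one.yaml --syntax-check || return 1
}

deploy() {
  preflight && ansible-playbook -i hosts gathr_one.yaml -v
}
```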
Post Deployment Validation
- Access the Gathr UI at https://<haproxy_hostname>:8090/.
- After Gathr is up, a license agreement page appears. Select the “I accept” check box and click the “Accept” button.
- An Upload License page appears. Upload a valid Gathr license and click “confirm”.
- Click “continue” on the welcome page.
- The login page appears. Log in with the default superuser credentials:
  - Email: your email address
  - Password: your password
- Gathr is now deployed successfully.
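Reaching the UI can also be scripted, for example in automation that waits for the deployment to finish. This sketch polls with `curl`; `-k` skips certificate verification (useful with a self-signed haproxy.pem), and any 2xx or 3xx response counts as up because the UI may redirect to the license or login page. The retry count and interval are arbitrary.

```shell
#!/usr/bin/env bash
# usage: wait_for_ui <url> [tries] [interval_seconds]
wait_for_ui() {
  local url="$1" tries="${2:-30}" interval="${3:-10}" code i
  for ((i = 1; i <= tries; i++)); do
    # curl prints 000 when no HTTP response is received
    code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
    case "$code" in
      2?? | 3??) echo "UI is up ($code): $url"; return 0 ;;
    esac
    if [ "$i" -lt "$tries" ]; then sleep "$interval"; fi
  done
  echo "UI not reachable, last HTTP code: ${code:-none}" >&2
  return 1
}
# Example: wait_for_ui https://<haproxy_hostname>:8090/
```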