Deploy Gathr Using Ansible Playbook

Deploy Gathr and its essential components, such as Elasticsearch, RabbitMQ, Zookeeper, PostgreSQL, Spark Standalone, and HDFS/YARN, using an Ansible playbook.


Prerequisites

Ensure the following prerequisites are met to successfully set up and deploy these components.

1. Ansible Server

  • Ansible should be installed on the server from where you will run the playbook.

  • Passwordless SSH must be enabled from the Ansible server to the Gathr and HAProxy servers.
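
A minimal sketch for enabling passwordless SSH from the Ansible server, assuming the application user is gathr and the hostnames are placeholders:

    # Generate a key pair on the Ansible server (skip if one already exists).
    ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

    # Copy the public key to each Gathr and HAProxy server (hostnames are examples).
    ssh-copy-id gathr@gathr-node1.example.com
    ssh-copy-id gathr@haproxy.example.com

    # Verify that login works without a password prompt.
    ssh gathr@gathr-node1.example.com hostname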


2. System Requirements for Gathr

  • Operating System: RHEL 9 / OEL 9 is required.

  • High Availability (HA) Deployment: HAProxy and Gathr must run on separate machines.

  • Hardware Specifications:

    Service          | CPU    | RAM    | Optional
    -----------------|--------|--------|---------
    Zookeeper        | 2 CPU  | 4 GB   | No
    Postgres         | 2 CPU  | 4 GB   | No
    Gathr            | 8 CPU  | 32 GB  | No
    Elasticsearch    | 2 CPU  | 4 GB   | Yes
    RabbitMQ         | 1 CPU  | 2 GB   | Yes
    Spark Standalone | Custom | Custom | Yes
    HDFS & YARN      | Custom | Custom | Yes
  • Software Dependencies:

    • Python 3.9 must be available (preinstalled on RHEL 9 / OEL 9).

    • SELinux must be disabled on the servers.
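
A minimal sketch for disabling SELinux on RHEL 9 / OEL 9; the persistent change takes effect after a reboot:

    # Switch to permissive mode immediately (no reboot required).
    sudo setenforce 0

    # Disable SELinux persistently across reboots.
    sudo sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

    # Confirm the current mode.
    getenforce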


3. Shared Location

  • Create a shared path accessible by both Gathr and Spark/YARN nodes.

  • For Gathr Analytics Deployment, ensure the same path is available on Analytics Nodes.
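
One common way to provide such a shared path is an NFS mount. The sketch below assumes a hypothetical NFS server nfs.example.com exporting /export/gathr-shared, and would be run on every Gathr, Spark/YARN, and Analytics node:

    # Install the NFS client and create the local mount point.
    sudo dnf install -y nfs-utils
    sudo mkdir -p /mnt/gathr-shared

    # Mount the shared export (server and path are examples).
    sudo mount -t nfs nfs.example.com:/export/gathr-shared /mnt/gathr-shared

    # Persist the mount across reboots.
    echo 'nfs.example.com:/export/gathr-shared /mnt/gathr-shared nfs defaults 0 0' | sudo tee -a /etc/fstab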


4. HAProxy Server

  • Ensure the HAProxy Server is running with SSL for Analytics Deployment. Non-SSL is acceptable if not using Analytics.

  • A valid .pem file is required for SSL on HAProxy.
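
HAProxy expects the certificate chain and private key concatenated into a single .pem file. A minimal sketch, assuming hypothetical file names:

    # Combine the certificate (plus any chain) and the private key into one .pem file.
    sudo mkdir -p /etc/haproxy/certs
    cat server.crt intermediate.crt server.key | sudo tee /etc/haproxy/certs/gathr.pem > /dev/null
    sudo chmod 600 /etc/haproxy/certs/gathr.pem

    # Reference the file from a frontend bind line in haproxy.cfg, for example:
    #   bind *:443 ssl crt /etc/haproxy/certs/gathr.pem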


5. Port Availability

Ensure the following ports are available on the servers where the services are deployed:

Service              | Ports (Default)                                                                                                  | Optional | Notes
---------------------|------------------------------------------------------------------------------------------------------------------|----------|---------------------------------------------------
Zookeeper            | 2181, 2888, 3888, 8081                                                                                           | No       | Ports can be modified before running the playbook.
Postgres             | 5432                                                                                                             | No       | Port can be modified before running the playbook.
Gathr                | 8090, 9595                                                                                                       | No       | Ports can be modified before running the playbook.
Elasticsearch        | 9200, 9300                                                                                                       | Yes      | Ports can be modified before running the playbook.
RabbitMQ             | 5672, 15672                                                                                                      | Yes      | Ports can be modified before running the playbook.
Spark Standalone     | 7077, 8080, 8081, 6066, 18080                                                                                    | Yes      | Ports can be modified before running the playbook.
HDFS & YARN (Non-HA) | 8020, 9870, 9864, 9866, 9867, 8141, 8088, 8030, 8025, 8050, 8042, 8040, 19888, 10020, 10033                      | Yes      | Ports are non-configurable.
HDFS & YARN (HA)     | 8485, 8480, 8020, 50070, 8019, 9866, 9867, 50075, 8141, 8088, 8030, 8025, 8050, 8042, 8040, 19888, 10020, 10033  | Yes      | Ports are non-configurable.
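
A quick sketch to confirm that the required ports are free on a target server before deployment (adjust the port list per service):

    # Report whether each required port is already in use (example list).
    for port in 2181 2888 3888 5432 8090 9595; do
        if ss -ltn | grep -q ":${port} "; then
            echo "Port ${port} is already in use"
        else
            echo "Port ${port} is free"
        fi
    done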

6. Application User and Group

  • Create an application user and group on the servers with password-based authentication.
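
A minimal sketch, assuming gathr as the application user and group name:

    # Create the application group and user (names are examples).
    sudo groupadd gathr
    sudo useradd -m -g gathr gathr

    # Set a password so password-based authentication works.
    sudo passwd gathr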

7. (Optional) OpenAI API Key Requirement

  • An OpenAI API key is necessary for the Gathr IQ features.

  • Internet access is needed for these features to work.


8. (Optional) Data Intelligence

  • Shared Storage for Logs and Data (Optional for single-node setups)

    • Configure a shared mount (EFS/NFS) on the Docker-installed machine for Data Intelligence logs and data.

  • Logstash Configuration

    • Install and configure Logstash (version 6.8.23) on the chosen node, ensuring it is seamlessly integrated with the designated Elasticsearch instance.

    • Additionally, configure Logstash to process logs from the DI log directory for efficient log management.

    • The Logstash service should run on the same node where the DI application logs are stored.

  • Access Control for Gathr User

    • Ensure Gathr users have access to Data Intelligence Docker logs.

    • Set up ACL permissions if necessary (see the setfacl sketch after this list).

  • Load Balancer for Multi-DI Deployments (Optional for single-node DI Docker)

    • Configure a private load balancer for multi-DI Docker deployments.

  • AWS CloudWatch Integration (Applicable for AWS Deployments Only)

    • Configure AWS CloudWatch and its agent for log monitoring.
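
As referenced above, a sketch for granting the Gathr user read access to the Data Intelligence Docker logs via ACLs, assuming the user is gathr and an example log path:

    # Grant the gathr user read/traverse access to the DI Docker log directory
    # (path is an example; substitute your actual DI log location).
    # Requires the acl package (sudo dnf install -y acl).
    sudo setfacl -R -m u:gathr:rx /var/lib/docker/containers

    # Make the ACL apply to files created in the future as well.
    sudo setfacl -R -d -m u:gathr:rx /var/lib/docker/containers

    # Verify the ACL entries.
    getfacl /var/lib/docker/containers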

9. (Optional) For MLflow

  • Ensure a private Docker registry is available for MLflow images.

  • A Kubernetes cluster should be operational.

  • Artifact Storage Configuration:

    • MLflow generates artifacts when a model is registered, which can be stored in either S3 or NFS.

    • If using S3 or Ceph for artifact storage, ensure you have the following credentials:

      • S3 Access Key

      • S3 Secret Key

      • S3 Endpoint URL

  • If using NFS, make sure you have the name of the PVC (Persistent Volume Claim) that the Gathr pod is using, so the path can be shared between the Gathr and MLflow pods.

  • (Optional) Private Docker registry access should be available to pull from the private Docker repository.
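
A short sketch of the related checks and credentials, assuming an S3-compatible artifact store; the environment variable names below are the standard ones read for S3 access (values are placeholders):

    # Confirm the Kubernetes cluster is reachable.
    kubectl cluster-info

    # Standard S3 credentials and endpoint for an S3-compatible artifact store.
    export AWS_ACCESS_KEY_ID='<s3-access-key>'
    export AWS_SECRET_ACCESS_KEY='<s3-secret-key>'
    export MLFLOW_S3_ENDPOINT_URL='https://s3.example.com'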


Steps to Install Gathr and Supported Components

  1. Download the playbook bundle shared by the Gathr team.

  2. Extract the bundle using the following command:

    tar -xzf GathrBundle.tar.gz
    
  3. Navigate to the Playbook Path:

    cd /path/to/playbook
    
  4. Open the saxconfig_parameters file with a text editor. Refer to the comments in the file to modify the configuration properties as required for your environment.

  5. Execute the following command to reload the Ansible variables:

    ./config.sh
    
  6. Execute the playbook with the following command:

    export ANSIBLE_DISPLAY_SKIPPED_HOSTS=false
    ansible-playbook -i hosts installGathrStack.yml -vv
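
    Optionally, before the full run, confirm that Ansible can reach every host in the inventory; the dry run below is a sketch, and some tasks may not support check mode:

    ansible -i hosts all -m ping
    ansible-playbook -i hosts installGathrStack.yml --check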
    

Post Deployment Validation

  1. Navigate to the Gathr UI using the URL http://<GATHR_HOST>:8090/. When the interface loads, a license agreement page will be displayed:

    License Agreement

    Check the “I accept” box and click the “Accept” button.

  2. The Upload License page will then be visible:

    Upload License

    Upload your valid Gathr license and click “Confirm”.

    Confirm License

  3. Proceed by clicking “Continue” on the welcome page:

    Welcome Page

    This will lead you to the Login Page:

    Login Page

    Log in using the default superuser credentials:

    • Email: super@mailinator.com

    • Password: superuser

  4. The deployment of Gathr is now complete.

    Deployment Success
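
    As an additional command-line check, you can verify that the Gathr UI port responds with an HTTP status code (host is a placeholder):

    curl -s -o /dev/null -w "%{http_code}\n" http://<GATHR_HOST>:8090/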
