Enable HA on Gathr Webstudio

This topic describes the procedure to set up High Availability (HA) on Gathr Webstudio (Tomcat).

Enabling HA ensures that Gathr Webstudio remains available even if one node fails. It is therefore recommended to enable HA on Gathr Webstudio for better availability.

Prerequisites

  • A minimum of two nodes is required for this deployment. Before deploying Gathr, verify the Gathr prerequisites on each node.

  • A load balancer is required; haproxy is preferred.

  • An NFS mount point or other common shared location must be accessible from both Gathr Webstudio machines.

  • Haproxy must be configured beforehand. (See the haproxy configuration step below.)

  • Tomcat should be deployed on the first machine. To know more, see Embedded Gathr →

Steps to Enable HA

  1. In config.properties, set deployment.mode to Cluster instead of standalone on the first machine.
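
For example, the relevant entry in config.properties would look like the following (a minimal sketch; the exact property-file syntax is an assumption, and surrounding properties are omitted):

```
deployment.mode=Cluster
```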

  2. Update the following config parameters in env-config.yaml:

    sax.web.url: http://<HA-proxy host>:8090/Gathr
    sax.ui.host: <HA-proxy host>

  3. Take a backup of the existing Gathr folder.

  4. Create an NFS mount point directory.

    Create one common folder, e.g. “gathrfiles”.
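
Creating and mounting the shared location might look like the following sketch, assuming a hypothetical NFS server `nfs-server` exporting `/exports/gathrfiles` (run on both machines; all paths are examples):

```
# Create the local mount point (path is an example)
sudo mkdir -p /mnt/gathrfiles

# Mount the shared NFS export; server name and export path are placeholders
sudo mount -t nfs nfs-server:/exports/gathrfiles /mnt/gathrfiles

# Optionally persist the mount across reboots via /etc/fstab:
# nfs-server:/exports/gathrfiles  /mnt/gathrfiles  nfs  defaults  0 0
```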

  5. Move the folders below from the Gathr installation directory to the NFS mount point on the first machine:

  • lib
  • conf
  • pipelineData
  • uploadjar
  • workflowData
  • pythonVirtualEnvironments
  • work
  • udfjar
  • external (if using external templates)
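
The moves in step 5 can be scripted in one loop; a sketch, assuming GATHR_HOME and MOUNT_DIR point at the Gathr installation directory and the NFS mount (both example paths):

```shell
# Example paths - adjust to your environment
GATHR_HOME="${GATHR_HOME:-/opt/gathr}"
MOUNT_DIR="${MOUNT_DIR:-/mnt/gathrfiles}"

# Relocate the shared folders to the NFS mount; folders that do not
# exist in this installation are skipped
for d in lib conf pipelineData uploadjar workflowData \
         pythonVirtualEnvironments work udfjar external; do
  if [ -d "$GATHR_HOME/$d" ]; then
    mv "$GATHR_HOME/$d" "$MOUNT_DIR/"
  fi
done
```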

ha_proxy_1.png

  6. Copy the Gathr installation folder from the first machine to the other machines; the folder structure for Gathr should be the same.

  7. Create softlinks from the mount point to the Gathr installation directory (on both machines).

E.g., run the following commands from the Gathr installation directory:

```
ln -s <mountdir>/workflowData workflowData
ln -s <mountdir>/lib lib
ln -s <mountdir>/conf conf
ln -s <mountdir>/pipelineData pipelineData
ln -s <mountdir>/uploadjar uploadjar
ln -s <mountdir>/pythonVirtualEnvironments pythonVirtualEnvironments
ln -s <mountdir>/work work
ln -s <mountdir>/udfjar udfjar
ln -s <mountdir>/external external
```

ha_proxy_2.png

  8. Start Gathr on the first node with config.reload=true. Once it is up and running and the URL is accessible, start Gathr on the other nodes without config.reload=true.
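
How config.reload=true is passed depends on your Gathr/Tomcat packaging; one plausible approach, sketched here purely as an assumption, is a JVM system property set before the Tomcat startup script runs (the installation path is also an example):

```shell
# Assumption: config.reload is read as a JVM system property
export CATALINA_OPTS="$CATALINA_OPTS -Dconfig.reload=true"

# Start the embedded Tomcat if the startup script is present (example path)
if [ -x /opt/gathr/bin/startup.sh ]; then
  /opt/gathr/bin/startup.sh
fi
```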

  9. Both URLs should now be accessible independently.

Validation:

  • If you create a pipeline on one node, the same pipeline should be available on the other Gathr Webstudio instance as well.

  • You should be able to inspect pipelines on both instances.
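
The validation above can be automated with a quick reachability check against each node; a sketch with placeholder hostnames:

```shell
# Placeholder hostnames - replace with your two Gathr Webstudio nodes
for host in gathr-node1.example.com gathr-node2.example.com; do
  if curl -fsS --max-time 5 -o /dev/null "http://$host:8090/Gathr"; then
    echo "$host: reachable"
  else
    echo "$host: UNREACHABLE"
  fi
done
```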

  10. Make the changes below in the haproxy.cfg file.
#---------------------------------------------------------------------
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #
    log         127.0.0.1 local2 debug

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    0
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          0
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend web_server
   bind 0.0.0.0:8090
   mode http
   #http-request redirect scheme https code 301 unless { ssl_fc }
   stats enable
   stats uri /haproxy?stats
   stats realm Strictly\ Private
   option http-server-close
   option forwardfor

   stats auth admin:admin



   option httplog
   option logasap

   default_backend web_server

frontend web_server2
   bind 0.0.0.0:9595
   #log 127.0.0.1:514 local0 debug
   mode http
   #http-request redirect scheme https code 301 unless { ssl_fc }
   stats enable
   stats uri /haproxy?stats
   stats realm Strictly\ Private
   option http-server-close
   option forwardfor
   use_backend web_server2


backend web_server
  mode http
  balance roundrobin
  cookie WEBSTUDIOID insert indirect nocache
  server g1 10.80.72.187:8090 cookie g1
  server g2 10.80.72.204:8090 cookie g2


backend web_server2
   mode http

   cookie WEBSTUDIOID insert indirect nocache
   server g1 10.80.72.187:9595 cookie g1
   server g2 10.80.72.204:9595 cookie g2

  11. Restart the haproxy service, then open Gathr using the haproxy URL.
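
Before restarting, the configuration can be validated; a sketch for a systemd host, assuming the default config location /etc/haproxy/haproxy.cfg:

```
# Validate the configuration file first
haproxy -c -f /etc/haproxy/haproxy.cfg

# Restart the service and check its status
sudo systemctl restart haproxy
sudo systemctl status haproxy

# Then open Gathr via the haproxy frontend, e.g.:
#   http://<HA-proxy host>:8090/Gathr
```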