Gathr Metrics

Set up monitoring and alerting for Gathr Metrics by installing the monitoring tools with an automated script and an Ansible playbook.

The installer supports Prometheus, Grafana, Alertmanager, Node Exporter, Blackbox Exporter, Postgres Exporter, Promtail, and Loki.


Gathr Metrics Monitoring Setup Prerequisites

  • Ansible must be installed on one server, which acts as the Ansible server for the installation.

  • A service user is required for SSH connectivity from the Ansible server to each machine where the monitoring services will be installed.


Installation of Software Required for Metrics & Alerting

  1. Download the setup bundle “monitoring.tar.gz” provided by the Gathr team.

  2. Extract it using the command below:

tar -zxvf monitoring.tar.gz

  3. Go to the extracted “monitoring” directory using the command below:

cd monitoring

  4. Open the “user_input” file in a text editor and update the install flags for the services you want to install.

Content of user_input file:

# User Input File

## install: Specify whether the service should be installed ("yes") or skipped ("no").
## tar_bundle_path: Specify where the installer bundle is fetched from. Use either an absolute file path (/some/path) or an HTTP download URL (https://some-url).

services:
  prometheus:
    install: "yes"
    tar_bundle_path: "./roles/prometheus/files/prometheus-2.50.0-rc.1.linux-amd64.tar.gz"

  grafana:
    install: "yes"
    tar_bundle_path: "./roles/grafana/files/grafana-v10.3.3.linux-amd64.tar.gz"

  alertmanager:
    install: "yes"
    tar_bundle_path: "./roles/alertmanager/files/alertmanager-0.27.0-rc.0.linux-amd64.tar.gz"

  blackbox:
    install: "yes"
    tar_bundle_path: "./roles/blackbox_exporter/files/blackbox_exporter-0.24.0.linux-amd64.tar.gz"

  nodeexporter:
    install: "yes"
    tar_bundle_path: "./roles/node_exporter/node_exporter-1.7.0.linux-amd64.tar.gz"

  postgresexporter:
    install: "yes"
    tar_bundle_path: "./roles/postgres_exporter/files/postgres_exporter_v0.5.1_linux-amd64.tar.gz"

  promtail:
    install: "yes" 
    tar_bundle_path: "./roles/promtail/files/blackbox_exporter-0.24.0.linux-amd64.tar.gz"

  loki:
    install: "yes" 
    tar_bundle_path: "/home/sax/monitoring/roles/loki/files/loki-linux-amd64.tar.gz"
  • Set the install flag to “yes” to install a service, or to “no” to skip its installation.

  • To install a version other than the default one provided, change the tar_bundle_path of that service.
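
When a tar_bundle_path is changed to point at a different local bundle, it can help to confirm that the file exists and is a readable tar archive before running the installer. The sketch below is illustrative and not part of the Gathr bundle: the helper name check_bundle is hypothetical, and it applies only to local file paths, not to HTTP download URLs.

```python
import tarfile
from pathlib import Path


def check_bundle(path):
    """Return True if `path` is an existing file that tarfile can open.

    Hypothetical helper for local tar_bundle_path values only. It does not
    handle HTTP download URLs and does not inspect the archive contents.
    """
    bundle = Path(path)
    return bundle.is_file() and tarfile.is_tarfile(str(bundle))


# Example path copied from the user_input file above; it is resolved
# relative to the current directory, so run this from "monitoring".
EXAMPLE = "./roles/grafana/files/grafana-v10.3.3.linux-amd64.tar.gz"
status = "ok" if check_bundle(EXAMPLE) else "missing or not a tar archive"
print(EXAMPLE, "->", status)
```

A result of "missing or not a tar archive" only means the path could not be opened as a tar file from the current directory; it says nothing about whether the version inside is supported.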

  5. Run the deployment script “run.sh” using the command below:
./run.sh

The script prompts for the following inputs, depending on the services selected for installation:

i. Change Default Ports for the Services

Do you want to change default ports for Prometheus, Node Exporter, Alert Manager, Grafana, Blackbox Exporter, Postgres Exporter or Loki (y/n)?:

If set to n:

Select n if you want to proceed with default ports.

If set to y:

Select y if you want to change the port configurations.

=> Enter port for Prometheus (default: 9120): 
=> Enter port for Node Exporter (default: 9125): 
=> Enter port for Alert Manager (default: 9093): 
=> Enter port for Grafana (default: 3000): 
=> Enter port for Blackbox (default: 9115): 
=> Enter port for Postgres Exporter (default: 9187): 
=> Enter port for Loki (default: 3100): 
=> Enter port for Promtail (default: 9080):
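
Before entering custom ports, you may want to confirm that the chosen ports are not already in use on the target machine. The following is a minimal sketch and not part of run.sh: port_is_free and busy_ports are hypothetical helper names, the port numbers are the defaults listed in the prompts above, and the result only reflects the machine on which the code runs.

```python
import socket

# Defaults copied from the prompts above; replace them with the values
# you intend to enter.
DEFAULT_PORTS = {
    "Prometheus": 9120,
    "Node Exporter": 9125,
    "Alert Manager": 9093,
    "Grafana": 3000,
    "Blackbox": 9115,
    "Postgres Exporter": 9187,
    "Loki": 3100,
    "Promtail": 9080,
}


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
        return True


def busy_ports(ports):
    """Return the subset of `ports` (name -> port) that cannot be bound."""
    return {name: port for name, port in ports.items() if not port_is_free(port)}
```

Calling busy_ports(DEFAULT_PORTS) on a target machine returns a dictionary of the services whose port is currently taken; an empty dictionary means every port could be bound at the time of the check.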

ii. Change SSH User for Ansible Connection

Do you want to change ssh user for ansible connection (y/n)? The user must exist on all systems in inventory:

If set to n:

Select n if you want to proceed with default ssh user (i.e., root).

If set to y:

Select y if you want to change the ssh user.

=> Enter SSH username (default: root):

iii. Change Inventory File

Do you want to change inventory file (y/n)?

If set to n:

Select n if you want to proceed with default inventory (i.e., inventory file).

If set to y:

Select y to open the inventory file, then modify it with the IP addresses or hostnames of your installation targets.

Sample inventory file:

[prometheus]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[node_exporter]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[alertmanager]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[grafana]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[blackbox_exporter]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[postgres_exporter]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[loki]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"

Replace <IP ADDRESS> with the actual IP address of each target machine.
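
If the same layout has to be produced for several environments, the inventory text can be generated instead of edited by hand. The sketch below is only an illustration: render_inventory is a hypothetical helper, the group names are taken from the sample inventory above, and the 192.0.2.x addresses are documentation placeholders. Listing more than one host under a group is standard Ansible INI syntax, but whether each Gathr role supports multiple hosts is an assumption to verify.

```python
# Group names as they appear in the sample inventory file.
GROUPS = [
    "prometheus",
    "node_exporter",
    "alertmanager",
    "grafana",
    "blackbox_exporter",
    "postgres_exporter",
    "loki",
]


def render_inventory(hosts_by_group, interpreter="/usr/bin/python3"):
    """Render an INI-style inventory string from a group -> hosts mapping."""
    lines = []
    for group in GROUPS:
        lines.append(f"[{group}]")
        for host in hosts_by_group.get(group, []):
            lines.append(f'{host} ansible_python_interpreter="{interpreter}"')
    return "\n".join(lines) + "\n"


# Placeholder addresses; replace them with your own targets.
print(render_inventory({
    "prometheus": ["192.0.2.10"],
    "node_exporter": ["192.0.2.10", "192.0.2.11"],
}))
```

Groups without hosts are emitted as empty sections, which keeps the file layout identical to the sample.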

iv. Change Alertmanager Email Variables File

Do you want to change alertmanager email variables file (y/n)?:

If set to n:

Select n if you want to proceed with default alertmanager email variables file (i.e., ./roles/alertmanager/vars/main.yml).

If set to y:

Select y to open the Alertmanager variables file, then change the four values below to match your SMTP configuration.

The values below are samples; change them accordingly.

  • to_email: "receiver@org.com"

  • from_email: "sender@org.com"

  • smtp_server: "smtpserveraddress:587"

  • auth_pass: "changeit"

v. Change Postgres Env File

Do you want to change Postgres env file (y/n)?:

If set to n:

Select n if you want to proceed with default Postgres env file (i.e., ./roles/postgres_exporter/vars/main.yml).

If set to y:

Select y if you want to change the Postgres env file. The value below is a sample; change it accordingly.

DATA_SOURCE_NAME="postgresql://username:password@<IP>:<Port>/?sslmode=disable"
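
Because DATA_SOURCE_NAME is a URL, a username or password containing characters such as @, : or / generally needs to be percent-encoded so that the connection string still parses. The sketch below shows one way to build the value; build_data_source_name is a hypothetical helper, and the credentials and address are placeholders, not values from the Gathr bundle.

```python
from urllib.parse import quote


def build_data_source_name(user, password, host, port, sslmode="disable"):
    """Build a postgresql:// URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including "/".
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"postgresql://{user_enc}:{password_enc}@{host}:{port}/?sslmode={sslmode}"


# Placeholder values for illustration only.
print(build_data_source_name("monitor", "p@ss:w/rd", "192.0.2.10", 5432))
```

The printed value can then be placed between the double quotes of DATA_SOURCE_NAME in the env file.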


After these five inputs, the playbook runs for every service whose install flag is set to “yes” in the user_input file.

Once the playbook completes, check the status of the installed services using the commands below:

sudo systemctl status prometheus
sudo systemctl status node_exporter
sudo systemctl status blackbox
sudo systemctl status alertmanager
sudo systemctl status postgres_exporter
sudo systemctl status loki
sudo systemctl status promtail
sudo systemctl status grafana
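
In addition to the systemctl checks, each service can be probed over HTTP. The sketch below is an assumption-based example, not part of the installer: it uses the default ports from the installation prompts and the upstream health, readiness, or metrics paths of each tool, and it assumes plain HTTP without authentication. The helper names health_url, is_healthy, and check_all are hypothetical.

```python
from http.client import HTTPException
from urllib.request import urlopen

# service -> (port, path). Ports are the installer defaults listed in the
# prompts; adjust them if you changed the ports during installation.
ENDPOINTS = {
    "prometheus": (9120, "/-/healthy"),
    "alertmanager": (9093, "/-/healthy"),
    "grafana": (3000, "/api/health"),
    "loki": (3100, "/ready"),
    "promtail": (9080, "/ready"),
    "node_exporter": (9125, "/metrics"),
    "blackbox": (9115, "/metrics"),
    "postgres_exporter": (9187, "/metrics"),
}


def health_url(host, service):
    """Build the probe URL for one service on the given host."""
    port, path = ENDPOINTS[service]
    return f"http://{host}:{port}{path}"


def is_healthy(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, HTTPException):
        # Covers refused connections, timeouts, and non-2xx HTTP errors.
        return False


def check_all(host):
    """Probe every known service on `host` and return name -> bool."""
    return {name: is_healthy(health_url(host, name)) for name in ENDPOINTS}
```

For example, check_all("localhost") run on a machine that hosts all services returns a dictionary that maps each service name to True or False.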

Update Spark Static Configurations

After a successful installation, add the Spark static configurations below to the Prometheus file /etc/prometheus/prometheus.conf, and update the host and port of each target accordingly.

- job_name: spark-driver
  metrics_path: /metrics/prometheus
  static_configs:
    - targets: ["localhost:4040"]

- job_name: spark-executor
  metrics_path: /metrics/executors/prometheus/
  static_configs:
    - targets: ["localhost:4040"]

- job_name: spark-master
  metrics_path: /metrics/master/prometheus
  static_configs:
    - targets: ["localhost:8081"]

- job_name: spark-apps
  metrics_path: /metrics/applications/prometheus/
  static_configs:
    - targets: ["localhost:8081"]

- job_name: spark-worker
  metrics_path: /metrics/prometheus
  static_configs:
    - targets: ["localhost:8082"]

If Spark is SSL enabled, add the section below to the prometheus.conf file.

Supply the SSL configuration under each SSL-enabled job, at the same level as the “static_configs” key:

scheme: 'https'
tls_config:
  ca_file: '/etc/ssl/certs/sa-certs/gathr_impetus_com.pem' # Path to the CA certificate used by Spark Metrics
  cert_file: '/etc/ssl/certs/sa-certs/my_key_store.crt' # Path to the client certificate (if required)
  key_file: '/etc/ssl/certs/sa-certs/my_store.key' # Path to the client private key (if required)

Restart Prometheus

After updating the configuration properties, restart the Prometheus service using the command below:

sudo systemctl restart prometheus

Once Prometheus is back up, alerts from Gathr start arriving in Prometheus.
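
To confirm that Prometheus picked up the Spark jobs after the restart, one option is to query its HTTP API endpoint /api/v1/targets and look at the health of each scrape target. The sketch below assumes Prometheus listens on the installer's default port 9120 without authentication; fetch_targets and spark_target_health are hypothetical helper names, and the sample payload is a hand-written illustration of the response shape, not real output.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url, timeout=5.0):
    """Fetch the parsed JSON body of <base_url>/api/v1/targets."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as response:
        return json.load(response)


def spark_target_health(payload, prefix="spark-"):
    """Return sorted (job, scrape_url, health) tuples for matching jobs."""
    rows = []
    for target in payload["data"]["activeTargets"]:
        job = target["labels"].get("job", "")
        if job.startswith(prefix):
            rows.append((job, target["scrapeUrl"], target["health"]))
    return sorted(rows)


# Hand-written sample in the shape of the targets API response.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "spark-driver", "instance": "localhost:4040"},
                "scrapeUrl": "http://localhost:4040/metrics/prometheus",
                "health": "up",
            },
            {
                "labels": {"job": "prometheus", "instance": "localhost:9120"},
                "scrapeUrl": "http://localhost:9120/metrics",
                "health": "up",
            },
        ]
    },
}

for job, url, health in spark_target_health(SAMPLE):
    print(job, url, health)
```

Against a live server, spark_target_health(fetch_targets("http://localhost:9120")) lists the Spark jobs defined above; a health value of "down" usually points to a wrong host, port, or metrics path in the job definition.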


Superuser Role Configuration Settings

In Gathr, set the Prometheus configuration under the Superuser role by providing the following details in Superuser Main Menu > Configurations > Others section > Prometheus tab:

Enable Spark Metrics: Enable this option to see Spark cluster metrics-based visualizations on Grafana dashboards.

Grafana URL: The Grafana dashboard URL where users with the defined role access can view the metrics-based visualizations.
