Gathr Metrics
Set up Gathr Metrics monitoring and alerting by installing the monitoring tools with an automated script and an Ansible playbook.
The following tools can be installed: Prometheus, Grafana, Alertmanager, Node Exporter, Blackbox Exporter, Postgres Exporter, Promtail, and Loki.
Gathr Metrics Monitoring Setup Prerequisites
Ansible must be installed on a server of your choice (the Ansible server).
A service user is required for SSH connectivity from the Ansible server to each machine on which the monitoring services will be installed.
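Both prerequisites can be checked before running the installer. The following is a minimal POSIX shell sketch, assuming key-based (password-less) SSH authentication for the service user; require_commands and check_ssh are hypothetical helper names, not part of the Gathr bundle.

```shell
#!/bin/sh
# Prerequisite check sketch (hypothetical helpers, not shipped with Gathr).

# Print found/missing for each command name; return 1 if any is missing.
require_commands() {
  rc=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Try a non-interactive, key-based SSH login as user $1 on host $2.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1@$2" true
}
```

For example, run require_commands ansible ansible-playbook ssh && check_ssh <service-user> <target-host> on the Ansible server. Once the inventory is prepared, Ansible's ad-hoc ping module (ansible all -i inventory -m ping -u <service-user>) exercises the same connection through Ansible itself.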
Installation of Software Required for Metrics & Alerting
Download the setup bundle “monitoring.tar.gz” provided by the Gathr team.
Extract it using the following command:
tar -zxvf monitoring.tar.gz
- Go to the extracted “monitoring” directory:
cd monitoring
- Open the “user_input” file in a text editor and set the install flag for each service you want to install.
Contents of the user_input file:
# User Input File
## install: Specify whether the service should be installed ("yes") or skipped ("no").
## tar_bundle_path: Specify the path from where the installer bundle is fetched. You can specify a file path (/some/path) OR an HTTP download URL (https://some-url).
services:
  prometheus:
    install: "yes"
    tar_bundle_path: "./roles/prometheus/files/prometheus-2.50.0-rc.1.linux-amd64.tar.gz"
  grafana:
    install: "yes"
    tar_bundle_path: "./roles/grafana/files/grafana-v10.3.3.linux-amd64.tar.gz"
  alertmanager:
    install: "yes"
    tar_bundle_path: "./roles/alertmanager/files/alertmanager-0.27.0-rc.0.linux-amd64.tar.gz"
  blackbox:
    install: "yes"
    tar_bundle_path: "./roles/blackbox_exporter/files/blackbox_exporter-0.24.0.linux-amd64.tar.gz"
  nodeexporter:
    install: "yes"
    tar_bundle_path: "./roles/node_exporter/node_exporter-1.7.0.linux-amd64.tar.gz"
  postgresexporter:
    install: "yes"
    tar_bundle_path: "./roles/postgres_exporter/files/postgres_exporter_v0.5.1_linux-amd64.tar.gz"
  promtail:
    install: "yes"
    tar_bundle_path: "./roles/promtail/files/blackbox_exporter-0.24.0.linux-amd64.tar.gz"
  loki:
    install: "yes"
    tar_bundle_path: "/home/sax/monitoring/roles/loki/files/loki-linux-amd64.tar.gz"
For the install flag, “yes” means the service will be installed and “no” means its installation will be skipped. To install a version other than the default, update tar_bundle_path to point to the corresponding bundle.
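Before running run.sh, it can help to confirm which services are enabled and that every local bundle path exists. This is a minimal sketch, assuming the two-space indentation shown in the user_input sample and bundle paths without spaces; enabled_services and check_enabled_bundles are hypothetical helpers. Relative paths are resolved from the current directory, so run it from inside “monitoring”.

```shell
#!/bin/sh
# Sketch: list services flagged install "yes" in user_input and check that each
# local tar_bundle_path exists. Assumes two-space indentation for service names.

# Print "<service> <tar_bundle_path>" for every service whose install flag is yes.
enabled_services() {
  awk '
    function flush() { if (svc != "" && inst == "yes") print svc, path }
    /^  [A-Za-z_]+:[ \t]*$/ {   # service name line, e.g. "  grafana:"
      flush(); svc = $1; sub(/:$/, "", svc); inst = ""; path = ""; next
    }
    /^[ \t]+install:/         { inst = $2; gsub(/"/, "", inst); next }
    /^[ \t]+tar_bundle_path:/ { path = $2; gsub(/"/, "", path); next }
    END { flush() }
  ' "$1"
}

# Report each enabled bundle; return 1 if any local bundle file is missing.
# HTTP(S) URLs are listed but not downloaded.
check_enabled_bundles() {
  enabled_services "$1" | (
    missing=0
    while read -r svc path; do
      case "$path" in
        http://*|https://*) echo "$svc: remote bundle, not checked: $path" ;;
        *) if [ -f "$path" ]; then echo "$svc: found $path"
           else echo "$svc: MISSING $path"; missing=1; fi ;;
      esac
    done
    exit "$missing"
  )
}
```

For example, check_enabled_bundles user_input prints one line per enabled service and exits non-zero if a bundle file is missing.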
- Run the deployment script “run.sh”:
./run.sh
The script displays the following prompts, depending on the services selected for installation:
i. Change Default Ports for the Services
Do you want to change default ports for Prometheus, Node Exporter, Alert Manager, Grafana, Blackbox Exporter, Postgres Exporter or Loki (y/n)?:
If set to n:
Select n if you want to proceed with default ports.
If set to y:
Select y if you want to change the port configurations.
=> Enter port for Prometheus (default: 9120):
=> Enter port for Node Exporter (default: 9125):
=> Enter port for Alert Manager (default: 9093):
=> Enter port for Grafana (default: 3000):
=> Enter port for Blackbox (default: 9115):
=> Enter port for Postgres Exporter (default: 9187):
=> Enter port for Loki (default: 3100):
=> Enter port for Promtail (default: 9080):
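Before accepting the defaults or entering custom ports, you can check that each value is a valid TCP port and is not already taken on the target machine. A minimal sketch, assuming the iproute2 ss utility is installed on the host being checked; valid_port, port_in_use and check_default_ports are hypothetical helpers, and the port list simply mirrors the defaults in the prompts above.

```shell
#!/bin/sh
# Port sanity-check sketch (hypothetical helpers).

# Return 0 if $1 is an integer in the TCP port range 1-65535.
valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Return 0 if a process is already listening on TCP port $1 on this host.
# ss prints a header line, so any line after the first is a listener.
port_in_use() {
  ss -ltn "sport = :$1" | tail -n +2 | grep -q .
}

# Report which of the installer's default ports are already in use locally.
check_default_ports() {
  for p in 9120 9125 9093 3000 9115 9187 3100 9080; do
    if port_in_use "$p"; then echo "in use: $p"; fi
  done
}
```

Run check_default_ports on each machine listed in the inventory; any port it reports should be changed at this prompt.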
ii. Change SSH User for Ansible Connection
Do you want to change ssh user for ansible connection (y/n)? The user must exist on all systems in inventory:
If set to n:
Select n if you want to proceed with default ssh user (i.e., root).
If set to y:
Select y if you want to change the ssh user.
=> Enter SSH username (default: root):
iii. Change Inventory File
Do you want to change inventory file (y/n)?
If set to n:
Select n if you want to proceed with default inventory (i.e., inventory file).
If set to y:
Select y to open the inventory file, then update it with the IP addresses or hostnames of your target machines.
Sample inventory file:
[prometheus]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[node_exporter]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[alertmanager]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[grafana]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[blackbox_exporter]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[postgres_exporter]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
[loki]
<IP ADDRESS> ansible_python_interpreter="/usr/bin/python3"
Replace <IP ADDRESS> in each group with the IP address of the machine on which that service should be installed.
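When all services go on a single machine, the inventory can be written in one step instead of being edited group by group. This is a minimal sketch, assuming a single-host layout; write_single_host_inventory is a hypothetical helper, and the group names are copied from the sample inventory above.

```shell
#!/bin/sh
# Sketch: write an Ansible INI inventory that places every monitoring group
# from the sample inventory on one host (single-host layout is an assumption).
write_single_host_inventory() {
  host="$1"   # IP address or hostname of the target machine
  out="$2"    # inventory file to (over)write
  : > "$out"
  for group in prometheus node_exporter alertmanager grafana \
               blackbox_exporter postgres_exporter loki; do
    printf '[%s]\n%s ansible_python_interpreter="/usr/bin/python3"\n' \
      "$group" "$host" >> "$out"
  done
}
```

For example, write_single_host_inventory 10.0.0.5 inventory (the IP address is a placeholder) writes the default inventory file, after which you can answer n at this prompt. Running ansible-inventory -i inventory --graph shows how Ansible parsed the groups.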
iv. Change Alertmanager Email Variables File
Do you want to change alertmanager email variables file (y/n)?:
If set to n:
Select n if you want to proceed with default alertmanager email variables file (i.e., ./roles/alertmanager/vars/main.yml).
If set to y:
Select y to open the Alertmanager variables file, then update the following four values to match your SMTP configuration.
The values below are samples; replace them with your own.
to_email: "receiver@org.com"
from_email: "sender@org.com"
smtp_server: "smtpserveraddress:587"
auth_pass: "changeit"
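A common mistake is leaving one of the sample values in place. This minimal sketch flags any of the four keys that is missing or still holds a sample value, assuming one key: value pair per line with no trailing comments; check_alertmanager_vars is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: verify the four SMTP keys in the Alertmanager vars file were changed
# from the documented sample values. Returns 1 if any key needs attention.
check_alertmanager_vars() {
  file="$1"
  rc=0
  for key in to_email from_email smtp_server auth_pass; do
    # Take the first "key: value" line, strip the key and surrounding quotes.
    val=$(sed -n "s/^[[:space:]]*$key:[[:space:]]*//p" "$file" | head -n 1 | tr -d "\"'")
    case "$val" in
      '') echo "missing: $key"; rc=1 ;;
      receiver@org.com|sender@org.com|smtpserveraddress:587|changeit)
        echo "still a sample value: $key"; rc=1 ;;
      *) echo "set: $key" ;;
    esac
  done
  return "$rc"
}
```

For example, check_alertmanager_vars ./roles/alertmanager/vars/main.yml before running the playbook.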
v. Change Postgres Env File
Do you want to change Postgres env file (y/n)?:
If set to n:
Select n if you want to proceed with default Postgres env file (i.e., ./roles/postgres_exporter/vars/main.yml).
If set to y:
Select y if you want to change the Postgres env file, then update the sample value below to match your PostgreSQL connection details.
DATA_SOURCE_NAME="postgresql://username:password@<IP>:<Port>/?sslmode=disable"
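Because DATA_SOURCE_NAME is a URL, characters such as @, : or / in the username or password must be percent-encoded, or the connection string will be parsed incorrectly. A minimal sketch, assuming an ASCII password; urlencode and build_dsn are hypothetical helpers that produce a value in the format shown above.

```shell
#!/bin/sh
# Sketch: build a DATA_SOURCE_NAME with percent-encoded credentials.

# Percent-encode $1, keeping RFC 3986 unreserved characters as-is (ASCII only).
urlencode() {
  s="$1"
  out=""
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}   # first character of s
    s=${s#?}          # drop the first character
    case "$c" in
      [A-Za-z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;   # e.g. @ -> %40
    esac
  done
  printf '%s\n' "$out"
}

# Arguments: username, password, host/IP, port.
build_dsn() {
  printf 'postgresql://%s:%s@%s:%s/?sslmode=disable\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4"
}
```

For example, build_dsn gathr 'p@ss' 10.0.0.5 5432 prints postgresql://gathr:p%40ss@10.0.0.5:5432/?sslmode=disable (user, password and address are placeholders).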
After these five prompts, the playbook runs for every service whose install flag is set to “yes” in the user_input file.
Once the playbook completes, check the status of the installed services using the following commands:
sudo systemctl status prometheus
sudo systemctl status node_exporter
sudo systemctl status blackbox
sudo systemctl status alertmanager
sudo systemctl status postgres_exporter
sudo systemctl status loki
sudo systemctl status promtail
sudo systemctl status grafana
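The individual status commands can be condensed into one summary. This minimal sketch uses systemctl is-active, which prints the unit state without the full status output; check_services is a hypothetical helper, and its default unit list is copied from the commands above, so pass only the services you actually installed.

```shell
#!/bin/sh
# Sketch: print the state of each monitoring unit and return 1 if any unit
# is not "active". With no arguments, checks all units listed above.
check_services() {
  rc=0
  for unit in ${*:-prometheus node_exporter blackbox alertmanager postgres_exporter loki promtail grafana}; do
    state=$(systemctl is-active "$unit" 2>/dev/null)
    echo "$unit: ${state:-unknown}"
    [ "$state" = "active" ] || rc=1
  done
  return "$rc"
}
```

For example, check_services prometheus grafana loki checks three units. For a unit that is not active, journalctl -u <unit> -n 50 shows its most recent log lines.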
Update Spark Static Configurations
After a successful installation, add the following Spark static configurations under the scrape_configs section of the Prometheus configuration file, /etc/prometheus/prometheus.conf, and update the hosts and ports to match your Spark deployment.
  - job_name: spark-driver
    metrics_path: /metrics/prometheus
    static_configs:
      - targets: ["localhost:4040"]
  - job_name: spark-executor
    metrics_path: /metrics/executors/prometheus/
    static_configs:
      - targets: ["localhost:4040"]
  - job_name: spark-master
    metrics_path: /metrics/master/prometheus
    static_configs:
      - targets: ["localhost:8081"]
  - job_name: spark-apps
    metrics_path: /metrics/applications/prometheus/
    static_configs:
      - targets: ["localhost:8081"]
  - job_name: spark-worker
    metrics_path: /metrics/prometheus
    static_configs:
      - targets: ["localhost:8082"]
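When the driver, master and worker run on different hosts, the five entries can be generated instead of edited by hand. A minimal sketch, assuming each address is passed as host:port; spark_job and write_spark_jobs are hypothetical helpers that print YAML in the same layout and with the same metrics paths as the entries above, ready to paste under scrape_configs.

```shell
#!/bin/sh
# Sketch: print Prometheus scrape job entries for Spark in the layout above.

# Arguments: job name, metrics path, "host:port" target.
spark_job() {
  printf '  - job_name: %s\n    metrics_path: %s\n    static_configs:\n      - targets: ["%s"]\n' \
    "$1" "$2" "$3"
}

# Arguments: driver UI, master UI and worker UI addresses as host:port.
write_spark_jobs() {
  driver="$1"; master="$2"; worker="$3"
  spark_job spark-driver   /metrics/prometheus                "$driver"
  spark_job spark-executor /metrics/executors/prometheus/     "$driver"
  spark_job spark-master   /metrics/master/prometheus         "$master"
  spark_job spark-apps     /metrics/applications/prometheus/  "$master"
  spark_job spark-worker   /metrics/prometheus                "$worker"
}
```

For example, write_spark_jobs driver.example.com:4040 master.example.com:8081 worker.example.com:8082 prints all five jobs (host names are placeholders).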
If Spark is SSL-enabled, add the following section to prometheus.conf for each SSL-enabled job.
Place these keys at the job level, alongside metrics_path and static_configs (not inside static_configs next to the targets key):
    scheme: 'https'
    tls_config:
      ca_file: '/etc/ssl/certs/sa-certs/gathr_impetus_com.pem'  # Path to the CA certificate used by Spark Metrics
      cert_file: '/etc/ssl/certs/sa-certs/my_key_store.crt'     # Path to the client certificate (if required)
      key_file: '/etc/ssl/certs/sa-certs/my_store.key'          # Path to the client private key (if required)
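Prometheus will fail to scrape if it cannot read the certificate files or if the CA does not match Spark's certificate. A minimal sketch for checking both from the Prometheus host, assuming curl is installed; check_tls_files and probe_spark_tls are hypothetical helpers. Run the file check as the user the Prometheus service runs under (for example with sudo -u), since readability depends on that user.

```shell
#!/bin/sh
# Sketch: sanity-check the TLS files and endpoint used in tls_config.

# Return 1 if any of the given files is not readable by the current user.
check_tls_files() {
  rc=0
  for f in "$@"; do
    if [ -r "$f" ]; then echo "readable: $f"; else echo "not readable: $f"; rc=1; fi
  done
  return "$rc"
}

# Fetch a Spark metrics URL ($1) with the same CA ($2), client cert ($3) and
# key ($4) that Prometheus will use; curl exits non-zero on TLS or HTTP errors.
probe_spark_tls() {
  curl -fsS --cacert "$2" --cert "$3" --key "$4" "$1" >/dev/null
}
```

For example, probe_spark_tls https://<spark-host>:<port>/metrics/prometheus with the three paths from the tls_config section above.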
Restart Prometheus
After updating the configuration, restart the Prometheus service using the following command:
sudo systemctl restart prometheus
Once Prometheus is up, it starts collecting the Gathr metrics configured above.
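A typo in prometheus.conf prevents Prometheus from starting, so it is safer to validate the file before restarting. A minimal sketch, assuming promtool (shipped in the Prometheus release tarball) is on PATH and Prometheus listens on the default port 9120 from the installer prompt; restart_prometheus, PROM_CONFIG and PROM_URL are hypothetical names.

```shell
#!/bin/sh
# Sketch: validate the config, restart only if it is valid, then wait up to
# ~30 seconds for Prometheus's /-/ready endpoint to respond.
PROM_CONFIG=${PROM_CONFIG:-/etc/prometheus/prometheus.conf}
PROM_URL=${PROM_URL:-http://localhost:9120}

restart_prometheus() {
  promtool check config "$PROM_CONFIG" || return 1
  sudo systemctl restart prometheus || return 1
  i=0
  while [ "$i" -lt 30 ]; do
    if curl -fsS "$PROM_URL/-/ready" >/dev/null 2>&1; then
      echo "Prometheus is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Prometheus did not become ready" >&2
  return 1
}
```

If a custom Prometheus port was chosen, set PROM_URL before calling restart_prometheus.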
Superuser Role Configuration Settings
In Gathr, configure Prometheus from the Superuser role by providing the following details under Superuser Main Menu > Configurations > Others section > Prometheus tab:
Enable Spark Metrics: Enable this option to view Spark cluster metrics visualizations on Grafana dashboards.
Grafana URL: The Grafana dashboard URL where users with the defined role access can view the metrics-based visualizations.
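Before saving the Grafana URL, you can confirm that the Grafana server behind it is reachable from the Gathr host. A minimal sketch, assuming curl is installed and that you pass the base URL of the Grafana server (for example http://<grafana-host>:3000), not a specific dashboard path; check_grafana is a hypothetical helper that queries Grafana's /api/health endpoint.

```shell
#!/bin/sh
# Sketch: return 0 if Grafana's health endpoint reports its database as ok.
check_grafana() {
  body=$(curl -fsS "${1%/}/api/health") || return 1   # strip one trailing slash
  printf '%s\n' "$body" | grep -q '"database": *"ok"'
}
```

For example, check_grafana http://<grafana-host>:3000 && echo "Grafana is reachable".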