Prometheus
22 Apr 2018
- for live reload, either start Prometheus with --web.enable-lifecycle and then run
  curl -X POST localhost:9090/-/reload
  or send SIGHUP to the running process:
  kill -1 $prometheus_PID
Configuration
- relabel instances (aka remove ports) - in main Prometheus YAML file
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['host:port']
    relabel_configs:
      # localhost:9100 (node exporter) becomes instance="mac-localhost"
      - source_labels: [__address__]
        target_label: instance
        regex: '(localhost):9100'
        replacement: 'mac-${1}'
      # anything on port 9091 (Pushgateway) becomes instance="pushgw"
      - source_labels: [__address__]
        target_label: instance
        regex: '.*:9091'
        replacement: pushgw
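If the goal is only to drop the port from every target, a single generic rule may be enough. This is a sketch, not taken from the config above: it assumes plain host:port addresses (a bracketed IPv6 address would not match as intended) and relies on Prometheus anchoring relabel regexes at both ends.

```yaml
relabel_configs:
  # Capture everything before the first colon and use it as the instance label.
  - source_labels: [__address__]
    target_label: instance
    regex: '([^:]+)(:[0-9]+)?'
    replacement: '${1}'
```

With this rule a target such as host:9100 would get instance="host". The __address__ label itself is left untouched, so scraping still uses the original port.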
Alerting (for version 2.x)
- Alert rules configuration - in main Prometheus YAML file
rule_files:
  - "rules/test.rules"
- the test.rules file
groups:
  - name: host
    rules:
      - alert: idle_below_30pct
        # idle below 30% is the same as usage above 70%
        expr: (100 * (1 - avg by(job)(irate(node_cpu{mode='idle'}[1m])))) > 70
        annotations:
          summary: "Instance CPU usage is dangerously high"
          description: "{{ $labels.job }} is using a LOT of CPU. CPU usage is {{ $value }}%."
        labels:
          severity: warning
      - alert: low_disk_space_root
        # 50e9 bytes = 50 GB
        expr: node_filesystem_avail{mountpoint="/"} < 50e9
        annotations:
          summary: "Instance disk is low on space"
          description: "{{ $labels.instance }} disk is at {{ $value }}B"
        labels:
          group: storage
          severity: critical
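Rules like these fire as soon as a single evaluation crosses the threshold. A `for:` clause makes an alert wait until the condition has held for a while. The sketch below shows the idea on the disk rule; the 5m duration is an assumed value, not something from the original setup.

```yaml
groups:
  - name: host
    rules:
      - alert: low_disk_space_root
        expr: node_filesystem_avail{mountpoint="/"} < 50e9
        # Assumed duration: stay "pending" for 5 minutes before firing.
        for: 5m
        labels:
          group: storage
          severity: critical
        annotations:
          summary: "Instance disk is low on space"
```

The file can be validated before a reload with promtool check rules rules/test.rules.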
- Alert configuration in Alertmanager
route:
  receiver: default
  routes:
    - match:
        alertname: ALL
      repeat_interval: 1m
      # must match a name under receivers exactly (names are case-sensitive)
      receiver: all
receivers:
  - name: default
  - name: all
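The receivers above have no notification settings, so matched alerts are routed but nothing is sent. As a minimal sketch, a receiver can forward alerts to a webhook. The URL here is a placeholder for a hypothetical local listener, not part of the original setup.

```yaml
receivers:
  - name: default
    webhook_configs:
      # Hypothetical endpoint; replace with whatever service should receive alerts.
      - url: 'http://localhost:5001/'
        # Also notify when the alert stops firing.
        send_resolved: true
```

Alertmanager POSTs a JSON payload to that URL for each notification group.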