Prometheus
22 Apr 2018
- for live reload, either start Prometheus with --web.enable-lifecycle and then run curl -X POST localhost:9090/-/reload, or
- send SIGHUP to the Prometheus process: kill -1 $prometheus_PID
Configuration
- relabel instances (aka remove ports) - in main Prometheus YAML file
scrape_configs:
    - job_name: 'prometheus'
      static_configs:
          - targets: ['host:port']
      relabel_configs:
          - source_labels: [__address__]
            target_label:  instance
            regex: ^(localhost):9100
            replacement: mac-${1}
          - source_labels: [__address__]
            target_label:  instance
            regex: ^.*:9091
            replacement: pushgw
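
To see what these two rules do to the instance label, here is a rough Python sketch that emulates the default relabel action (replace). It is a simplified model, not Prometheus code: the names relabel_replace, RULES and instance_for are made up for illustration, and it assumes the documented behaviour that the regex is anchored at both ends and that instance falls back to __address__ when no rule sets it.

```python
import re

def relabel_replace(labels, source_labels, target_label, regex, replacement, separator=";"):
    """Simplified emulation of a relabel rule with the default 'replace' action."""
    # Source label values are joined with the separator before matching.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors relabel regexes at both ends, hence fullmatch.
    match = re.fullmatch(regex, value)
    if match is None:
        return labels  # no match: labels are left unchanged
    # Translate Prometheus-style ${1} references into Python's \g<1> form.
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result

# Hypothetical mirror of the two relabel_configs entries above.
RULES = [
    dict(source_labels=["__address__"], target_label="instance",
         regex=r"(localhost):9100", replacement="mac-${1}"),
    dict(source_labels=["__address__"], target_label="instance",
         regex=r".*:9091", replacement="pushgw"),
]

def instance_for(address):
    """Apply RULES in order and return the resulting instance label."""
    labels = {"__address__": address}
    for rule in RULES:
        labels = relabel_replace(labels, **rule)
    # If no rule set instance, it defaults to the scrape address.
    return labels.get("instance", address)
```

Under these assumptions, instance_for("localhost:9100") gives mac-localhost, any address ending in :9091 gives pushgw, and everything else keeps host:port.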
Alerting (for version 2.x)
- Alert rules configuration - in main Prometheus YAML file
rule_files:
    - "rules/test.rules"
- the test.rules file
groups:
- name: host
  rules:
  - alert: idle_below_30pct
    # CPU usage in percent; idle below 30% means usage above 70%
    expr: (100 * (1 - avg by(job)(irate(node_cpu{mode='idle'}[1m])))) > 70
    annotations:
      summary: "Job {{ $labels.job }} CPU usage is dangerously high"
      description: "{{ $labels.job }} is using a LOT of CPU. CPU usage is {{ $value }}%."
    labels:
      severity: warning
  - alert: low_disk_space_root
    expr: node_filesystem_avail{mountpoint="/"} < 50e9
    annotations:
      summary: "Instance {{ $labels.instance }} disk {{ $labels.device }} is low on space"
      description: "Disk {{ $labels.device }} is at {{ $value }}B"
    labels:
      group: storage
      severity: critical
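
The arithmetic behind the CPU rule is easy to get backwards, so here is a small Python sketch of it. It assumes each input is a per-CPU idle rate between 0 and 1 (roughly what irate over the idle counter yields); the function names are hypothetical and only mirror the alert name.

```python
def cpu_usage_pct(idle_rates):
    """CPU usage in percent from per-CPU idle rates (each between 0 and 1)."""
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 * (1 - avg_idle)

def idle_below_30pct(idle_rates):
    """True when average idle is below 30%, i.e. usage is above 70%."""
    return cpu_usage_pct(idle_rates) > 70
```

For example, two CPUs that are each 25% idle give a usage of 75%, which is over the 70% mark, so the condition holds.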
- Alert configuration in Alertmanager
route:
    receiver: default
    routes:
        - match:
            alertname: ALL
          repeat_interval: 1m
          receiver: ALL
receivers:
- name: default
- name: ALL
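
As a rough mental model of how the route tree picks a receiver, here is a Python sketch. It is a simplification and not Alertmanager code: it only handles exact match entries, ignores continue, match_re and grouping, and the names pick_receiver and ROUTE are made up for illustration.

```python
def pick_receiver(route, labels, inherited=None):
    """Return the receiver name for an alert's labels (simplified routing)."""
    # A route without its own receiver inherits the parent's.
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        matchers = child.get("match", {})
        # A child route applies when all of its match labels are equal.
        if all(labels.get(k) == v for k, v in matchers.items()):
            return pick_receiver(child, labels, receiver)
    # No child matched: the alert stays with this route's receiver.
    return receiver

# Hypothetical mirror of the route section above.
ROUTE = {
    "receiver": "default",
    "routes": [
        {"match": {"alertname": "ALL"}, "repeat_interval": "1m", "receiver": "ALL"},
    ],
}
```

With this model, an alert named ALL goes to the ALL receiver and any other alert, such as low_disk_space_root, falls through to default. Note that the receiver named in a route has to match a name under receivers exactly, including case.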