October 27, 2023

Validate Prometheus Alert Rules and Config Using promtool

Prometheus is a powerful, widely used open source monitoring tool. Alerting is one of the most critical features of Prometheus that allows users to define alerts based on predefined rules and thresholds. One of the key benefits of alerting in Prometheus is its flexibility and granularity. Users can define alerts at different levels of specificity, from individual hosts or applications to entire clusters or environments. This allows users to tailor their alerting strategies to their specific needs and requirements while ensuring they are notified only when relevant issues arise.

In production environments where hundreds of alerts are configured, it quickly becomes difficult to be sure that every change to the alert rules is valid. An invalid change usually results in Prometheus rejecting all of the defined alert rules, so it is highly recommended to validate rules before they are applied. This blog post explores a simple way to do that with “promtool”, a command-line tool included in the standard Prometheus package.

Prometheus alert rules are defined in YAML, with the alert condition written as a PromQL expression. Here is an example of an alert rule that triggers when the free disk space on an instance stays below 10% for two minutes:

- alert: HostOutOfDiskSpace
  expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10
    and ON (instance, device, mountpoint) node_filesystem_readonly == 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Host out of disk space (instance {{ $labels.instance }})
    description: Disk is almost full (< 10% left)  VALUE = {{ $value }}
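A rule like this cannot be loaded on its own: Prometheus expects rules to be nested inside a named group under a top-level groups key. As a minimal sketch, the following writes a complete rules file around the alert. The file name hosts-alerts.yaml and the group name node-exporter are assumptions borrowed from the promtool output shown later in this post.

```shell
#!/usr/bin/env bash
# Write a complete rules file that wraps the alert in a rule group.
# The quoted 'EOF' stops the shell from expanding $value and $labels.
cat > hosts-alerts.yaml <<'EOF'
groups:
  - name: node-exporter
    rules:
      - alert: HostOutOfDiskSpace
        expr: >-
          (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10
          and ON (instance, device, mountpoint) node_filesystem_readonly == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host out of disk space (instance {{ $labels.instance }})
          description: Disk is almost full (< 10% left)  VALUE = {{ $value }}
EOF
```

The folded scalar (>-) keeps the long expression readable while still producing a single-line PromQL expression.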

Installing Promtool

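Promtool can be installed on a Linux machine with the following steps. The archive used here is the 32-bit x86 build (linux-386); on a 64-bit x86 machine, use the linux-amd64 archive instead: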

curl -L https://github.com/prometheus/prometheus/releases/download/v2.42.0/prometheus-2.42.0.linux-386.tar.gz -o prometheus-2.42.0.linux-386.tar.gz
tar xvf prometheus-2.42.0.linux-386.tar.gz
cd prometheus-2.42.0.linux-386/
./promtool
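Prometheus publishes a separate release archive for each CPU architecture. As a hedged sketch, the snippet below derives the archive suffix from uname -m and prints the matching download URL. The function name release_arch is illustrative, and the mapping covers only a few common architectures.

```shell
#!/usr/bin/env bash
# Map a machine name (as reported by `uname -m`) to the architecture
# suffix used in Prometheus release archive names.
release_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    i386|i686)     echo "386" ;;
    *)             echo "unsupported" ;;
  esac
}

VERSION="2.42.0"   # same release as in the steps above
ARCH="$(release_arch "$(uname -m)")"

if [ "$ARCH" = "unsupported" ]; then
  echo "no mapping for $(uname -m); pick an archive from the releases page" >&2
else
  # Print the URL; download it with curl -L as in the steps above.
  echo "https://github.com/prometheus/prometheus/releases/download/v${VERSION}/prometheus-${VERSION}.linux-${ARCH}.tar.gz"
fi
```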

Validating a Rules File Using Promtool

Assuming the Prometheus rules are located in a hosts-alerts.yaml file, run the following command to validate it:

$ ./promtool check rules hosts-alerts.yaml
Checking hosts-alerts.yaml
SUCCESS: 29 rules found

The tool displays a descriptive error message when there are errors in the rules file:

$ ./promtool check rules hosts-alerts.yaml
Checking hosts-alerts.yaml
  FAILED:
hosts-alerts.yaml: 290:13: group "node-exporter", rule 29, "HostClockNotSynchronising": could not parse expression: 1:86: parse error: unclosed left parenthesis
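Because promtool exits with a non-zero status when a file fails validation, the check is easy to script, for example as a CI step. The following is a minimal sketch, assuming the rule files sit in a directory named rules/ (a hypothetical layout). The PROMTOOL variable and the check_rule_files function are illustrative names, not part of promtool.

```shell
#!/usr/bin/env bash
# Binary to use; override with PROMTOOL=./promtool if it is not on the PATH.
PROMTOOL="${PROMTOOL:-promtool}"

# Run `promtool check rules` on every .yaml/.yml file in a directory.
# Returns 0 only if every file passes.
check_rule_files() {
  local dir="$1" status=0 file
  for file in "$dir"/*.yaml "$dir"/*.yml; do
    [ -e "$file" ] || continue          # skip unmatched glob patterns
    if ! "$PROMTOOL" check rules "$file"; then
      status=1                          # remember the failure, keep checking
    fi
  done
  return "$status"
}

# Validate the (hypothetical) rules/ directory when it exists.
if [ -d rules ] && command -v "$PROMTOOL" >/dev/null 2>&1; then
  check_rule_files rules || echo "one or more rule files failed validation" >&2
fi
```

Checking every file before reporting, rather than stopping at the first failure, surfaces all broken rules in a single run.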

Promtool offers several other handy features as well. The “help” subcommand, for example, provides the following details:

$ ./promtool help
usage: promtool [<flags>] <command> [<args> ...]

Tooling for the Prometheus monitoring system.

Flags:
  -h, --help                 Show context-sensitive help (also try --help-long and --help-man).
      --version              Show application version.
      --enable-feature= ...  Comma separated feature names to enable (only PromQL related and no-default-scrape-port). See
                             https://prometheus.io/docs/prometheus/latest/feature_flags/ for the options and more details.

Commands:
  help [<command>...]
    Show help.

  check service-discovery [<flags>] <config-file> <job>
    Perform service discovery for the given job name and report the results, including relabeling.

  check config [<flags>] <config-files>...
    Check if the config files are valid or not.

  check web-config <web-config-files>...
    Check if the web config files are valid or not.

  check rules [<flags>] <rule-files>...
    Check if the rule files are valid or not.

  check metrics
    Pass Prometheus metrics over stdin to lint them for consistency and correctness.

    examples:

    $ cat metrics.prom | promtool check metrics

    $ curl -s http://localhost:9090/metrics | promtool check metrics

  query instant [<flags>] <server> <expr>
    Run instant query.

  query range [<flags>] <server> <expr>
    Run range query.

  query series --match=MATCH [<flags>] <server>
    Run series query.

  query labels [<flags>] <server> <name>
    Run labels query.

  debug pprof <server>
    Fetch profiling debug information.

  debug metrics <server>
    Fetch metrics debug information.

  debug all <server>
    Fetch all debug information.

  test rules <test-rule-file>...
    Unit tests for rules.

  tsdb bench write [<flags>] [<file>]
    Run a write performance benchmark.

  tsdb analyze [<flags>] [<db path>] [<block id>]
    Analyze churn, label pair cardinality and compaction efficiency.

  tsdb list [<flags>] [<db path>]
    List tsdb blocks.

  tsdb dump [<flags>] [<db path>]
    Dump samples from a TSDB.

  tsdb create-blocks-from openmetrics <input file> [<output directory>]
    Import samples from OpenMetrics input and produce TSDB blocks. Please refer to the storage docs for more details.

  tsdb create-blocks-from rules --start=START [<flags>] <rule-files>...
    Create blocks of data for new recording rules.
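Of the commands listed, check config is the natural companion to check rules: it validates the main configuration file and also checks the rule files that the configuration references. The following is a minimal sketch, assuming a hosts-alerts.yaml rules file sits next to the configuration. The file name prometheus.yml, the job name, and the scrape target are illustrative.

```shell
#!/usr/bin/env bash
# Write a small Prometheus configuration that loads the rules file.
cat > prometheus.yml <<'EOF'
global:
  evaluation_interval: 1m
rule_files:
  - hosts-alerts.yaml
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]
EOF

# Validate the configuration and the rule files it references.
# Skipped when promtool is not on the PATH.
if command -v promtool >/dev/null 2>&1; then
  promtool check config prometheus.yml || echo "configuration check failed" >&2
fi
```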

We at OpsVerse use Prometheus extensively in ObserveNow, an open source-based observability tool built for enterprises. Learn more about ObserveNow here and take it for a free 14-day ride.

Written by Arul Jegadish Francis
