When you create alerts in Prometheus, you don’t always want them to fire immediately. Sometimes issues are just temporary, like a short network glitch or a momentary CPU spike. This is where the for clause helps.
In this post, you’ll learn what the for clause does in alerting rules, why it’s important, and how to use it correctly with an example.
1. Why Not Fire Alerts Instantly?
Let’s say you’re monitoring a service like node_exporter to check whether it’s running. If a single Prometheus scrape shows it down, should Prometheus immediately fire an alert?
Probably not. The issue could be a temporary network glitch, a slow response, or a one-time blip.
If you alert too quickly, you might end up with a lot of false alarms, which can lead to alert fatigue and people ignoring important notifications.
2. What is the for Clause?
With the ‘for’ clause, Prometheus fires an alert only if the condition stays true for a specified amount of time.
So Prometheus waits and watches the condition. If the problem goes away before the time is up, no alert is fired, and the timer resets.
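For example, here is a minimal sketch of a rule for the momentary CPU spike mentioned earlier (the 80% threshold and 5m duration are illustrative placeholders, not recommendations):

- alert: HighCpuUsage
  expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "CPU usage above 80% for 5 minutes on {{ $labels.instance }}"

A brief CPU spike clears the condition before the 5 minutes elapse, so no alert fires and the timer resets.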
3. Example Alert Rule Using for
alert_using_for.yaml
groups:
  - name: example_alerts
    rules:
      - alert: NodeExporterDown
        expr: up{job="node_exporter"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Node Exporter is down"
          description: "No data received from node_exporter for more than 2 minutes"
The rule checks whether node_exporter is down. Since we set for to 2m, Prometheus waits for 2 minutes before firing the alert: it must see the service down continuously for 2 minutes before sending a notification.
prometheus.yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_using_for.yaml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
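Before starting Prometheus, you can validate both files with promtool, which ships with Prometheus:

$ promtool check config prometheus.yaml
$ promtool check rules alert_using_for.yaml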
Start Prometheus by running the command below.
prometheus --config.file=./prometheus.yaml --web.enable-lifecycle
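Since --web.enable-lifecycle is enabled, you can later reload the configuration (for example, after editing the alert rule) without restarting Prometheus:

$ curl -X POST http://localhost:9090/-/reload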
Navigate to the Prometheus alerts section (http://localhost:9090/alerts).
On this page, the NodeExporterDown alert shows as INACTIVE, since node_exporter is currently running fine.
Let me kill the node_exporter process.
$ ps ax | grep node_exporter
24384 s001  S+     0:00.71 node_exporter
30047 s002  S+     0:00.00 grep node_exporter
$ kill -9 24384
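After the next scrape (at most 15 seconds with the scrape_interval above), Prometheus records the target as down. You can confirm this by running the following query in the expression browser at http://localhost:9090/graph; it should return 0:

up{job="node_exporter"}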
Wait for 15 seconds and reload the alerts page.
The alert is now in the PENDING state, which means the condition just became true and Prometheus is waiting for the ‘for’ duration to elapse before firing.
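You can also watch these state transitions as a time series using the built-in ALERTS metric:

ALERTS{alertname="NodeExporterDown"}

While the alert is pending, this series carries the label alertstate="pending"; once the alert fires, the label changes to alertstate="firing".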
If node_exporter stays down for 2 minutes continuously, the alert will be triggered and sent to Alertmanager.
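Note that the prometheus.yaml above does not tell Prometheus where to send alerts. To actually deliver the notification, you would add an alerting block like this sketch, which assumes Alertmanager is running on its default port 9093:

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']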
You can observe the NodeExporterDown alert move to the FIRING state after 2 minutes.
Let me start the node_exporter service again.
$ node_exporter
time=2025-04-14T10:16:49.293Z level=INFO source=node_exporter.go:216 msg="Starting node_exporter" version="(version=1.9.1, branch=, revision=unknown)"
time=2025-04-14T10:16:49.293Z level=INFO source=node_exporter.go:217 msg="Build context" build_context="(go=go1.24.1, platform=darwin/arm64, user=Homebrew, date=, tags=unknown)"
time=2025-04-14T10:16:49.293Z level=INFO source=filesystem_common.go:265 msg="Parsed flag --collector.filesystem.mount-points-exclude" collector=filesystem flag=^/(dev)($|/)
time=2025-04-14T10:16:49.293Z level=INFO source=filesystem_common.go:294 msg="Parsed flag --collector.filesystem.fs-types-exclude" collector=filesystem flag=^devfs$
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:135 msg="Enabled collectors"
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=boottime
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=cpu
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=diskstats
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=filesystem
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=loadavg
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=meminfo
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=netdev
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=os
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=powersupplyclass
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=textfile
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=thermal
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=time
time=2025-04-14T10:16:49.294Z level=INFO source=node_exporter.go:141 msg=uname
time=2025-04-14T10:16:49.296Z level=INFO source=tls_config.go:347 msg="Listening on" address=[::]:9100
time=2025-04-14T10:16:49.296Z level=INFO source=tls_config.go:350 msg="TLS is disabled." http2=false address=[::]:9100
Wait for 15 seconds (until the next evaluation_interval) and reload the alerts page. You can observe that the alert returns to the INACTIVE state.