Programming for beginners: Match on specific labels using the on keyword in Prometheus

When you use binary operators in PromQL like +, -, *, /, and, or, Prometheus matches time series based on all their labels by default.

But sometimes, you only want to match series based on specific labels and ignore all the others.

That’s where the on keyword comes in.

The on(...) keyword restricts the matching to only the labels you specify.

Example

node_cpu_seconds_total{mode="idle"} 
and on(cpu) 
node_cpu_seconds_total{mode="system"}

Here,

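· I am using the and operator, which keeps the series on the left-hand side that have a matching series on the right-hand side.

· Normally, Prometheus would try to match on all labels, including mode, instance, etc. Because mode is idle on one side and system on the other, nothing would match and the result would be empty.

· But with on(cpu), it only uses the cpu label to match.

· So this query returns every idle series whose cpu value also appears on a system series, even if the two series differ in mode or instance.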
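To make the matching rule concrete, here is a minimal Python sketch of how and behaves with and without on(cpu). It is a toy model, not how Prometheus is implemented: the sample series, the host-a instance name, and the helper names match_key and vector_and are all made up for illustration.

```python
# Toy model of PromQL's `and` operator. The series below are invented samples.
idle = [
    {"cpu": "0", "mode": "idle", "instance": "host-a"},
    {"cpu": "1", "mode": "idle", "instance": "host-a"},
    {"cpu": "2", "mode": "idle", "instance": "host-a"},
]
system = [
    {"cpu": "0", "mode": "system", "instance": "host-a"},
    {"cpu": "1", "mode": "system", "instance": "host-a"},
]


def match_key(labels, on=None):
    """Return the part of a label set that takes part in matching."""
    if on is None:
        # Default behaviour: every label is compared.
        return tuple(sorted(labels.items()))
    # With on(...): only the listed labels are compared.
    return tuple((name, labels.get(name, "")) for name in on)


def vector_and(left, right, on=None):
    """Keep left-hand series that have at least one right-hand match."""
    right_keys = {match_key(labels, on) for labels in right}
    return [labels for labels in left if match_key(labels, on) in right_keys]


default_result = vector_and(idle, system)             # mode differs, no match
on_cpu_result = vector_and(idle, system, on=["cpu"])  # match on cpu only

print(default_result)
print([series["cpu"] for series in on_cpu_result])
```

In this model the default match returns nothing because mode differs on every pair, while the on(cpu) match keeps the idle series for cpu 0 and cpu 1. The idle series for cpu 2 is dropped because no system series shares its cpu value.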

Why is on useful?

Because sometimes:

· The labels differ, but you still want to match.

· You want precise control over which labels are used for matching.

· You want to avoid unexpected mismatches due to extra labels like mode, instance, or job.

Understanding Prometheus Aggregation Operators

When you query metrics in Prometheus, you often get a bunch of time series, sometimes too many. Aggregation operators help you combine, group, or filter these series into something more useful.

For example,

sum(prometheus_http_requests_total)

This gives you the total number of HTTP requests across all codes, handlers, and other labels.

The following table summarizes the aggregation operators supported by Prometheus.

Operator	Description
sum	Adds up values
min	Gets the smallest value
max	Gets the largest value
avg	Calculates the average
count	Counts how many values there are
stddev	Calculates the population standard deviation of the values
stdvar	Calculates the population variance of the values
group	Groups series together; every value in the result is 1
count_values("label")	Counts how many series have each value and records that value in the given label
topk(n, ...)	Gets the n largest values
bottomk(n, ...)	Gets the n smallest values
quantile(φ, ...)	Calculates the φ-quantile, with 0 ≤ φ ≤ 1 (for example, 0.5 for the median or 0.95 for the 95th percentile)
limitk(n, ...)	Samples n series (experimental)
limit_ratio(r, ...)	Samples approximately a ratio r of the series (experimental)
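The statistical operators in the table are easier to picture with numbers. The following Python sketch computes the equivalents of stdvar, stddev, and quantile over a made-up list of values. The quantile helper uses linear interpolation between the two nearest ranks, which is my understanding of how Prometheus calculates it, so treat that detail as an assumption rather than a specification.

```python
import math
import statistics

# Invented sample values, standing in for one value per time series.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def quantile(phi, data):
    """phi-quantile with linear interpolation between neighbouring ranks."""
    ordered = sorted(data)
    rank = phi * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(len(ordered) - 1, lower + 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


variance = statistics.pvariance(values)  # like stdvar: population variance
deviation = statistics.pstdev(values)    # like stddev: population std deviation
median = quantile(0.5, values)           # like quantile(0.5, ...)

print(variance, deviation, median)
```

Both stdvar and stddev divide by the number of values (population statistics), not by the number of values minus one.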

Examples

Example 1: Sum of all requests

sum(prometheus_http_requests_total)

Example 2: Sum of all requests grouped by response code

sum(prometheus_http_requests_total) by (code)
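The difference between Example 1 and Example 2 can be sketched in Python. This is only a model of the grouping idea: the sample values, the handler paths, and the sum_by helper are invented for illustration.

```python
from collections import defaultdict

# Invented samples: each entry is (labels, value).
samples = [
    ({"code": "200", "handler": "/metrics"}, 120.0),
    ({"code": "200", "handler": "/api/v1/query"}, 30.0),
    ({"code": "400", "handler": "/api/v1/query"}, 5.0),
    ({"code": "503", "handler": "/api/v1/query"}, 1.0),
]


def sum_by(samples, by=()):
    """Add up values, keeping one total per combination of the `by` labels."""
    totals = defaultdict(float)
    for labels, value in samples:
        # Only the `by` labels survive; all other labels are dropped.
        key = tuple((name, labels.get(name, "")) for name in by)
        totals[key] += value
    return dict(totals)


total = sum_by(samples)                  # like sum(...)
per_code = sum_by(samples, by=["code"])  # like sum(...) by (code)

print(total)
print(per_code)
```

With no grouping labels everything collapses into a single total. With by (code), the two series that share code 200 are added together and the handler label disappears from the result.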

Example 3: Top 2 HTTP codes by request count

topk(2, sum(prometheus_http_requests_total) by (code))

Example 4: Bottom 2 HTTP codes by request count

bottomk(2, sum(prometheus_http_requests_total) by (code))
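Examples 3 and 4 rank the per-code totals and keep only the first few. A rough Python equivalent, using made-up totals and hypothetical helper names, looks like this:

```python
# Invented per-code totals, as if produced by sum(...) by (code).
per_code = {"200": 150.0, "400": 5.0, "503": 1.0}


def topk(k, vector):
    """Return the k entries with the largest values, largest first."""
    ranked = sorted(vector.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def bottomk(k, vector):
    """Return the k entries with the smallest values, smallest first."""
    ranked = sorted(vector.items(), key=lambda item: item[1])
    return ranked[:k]


print(topk(2, per_code))
print(bottomk(2, per_code))
```

The inner aggregation runs first and the ranking is applied to its result, which is why the sum appears inside topk and bottomk in the queries above.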

Example 5: Total CPU time (in seconds) grouped by mode

sum(node_cpu_seconds_total) by (mode)

Example 6: Count the series in the vector

count(node_cpu_seconds_total)

In summary, aggregation operators help you make sense of large amounts of data, turn chaos into clarity, and build dashboards that actually tell a story. Start with simple aggregations such as sum and count. Then slowly explore topk, quantile, and the rest as your needs grow.

References

https://prometheus.io/docs/prometheus/latest/querying/operators/


Wednesday, 4 June 2025
