Prometheus Signals Guide¶
Prometheus signals let you add application-level checks to idle detection and scale-down guards. Instead of relying on CPU alone, you can query actual business metrics.
How Prometheus Signals Work¶
Hybernate sends an instant query to the Prometheus API (/api/v1/query) with a 5-second timeout.
- Non-zero, non-empty result: signal confirms the action
- Zero value or empty result: signal denies the action
The key insight: write your PromQL so that a non-zero result means "yes, proceed with this action."
Configuration¶
Prometheus signals appear in two places:
Idle Detection Signals¶
| managedworkload.yaml | |
|---|---|
The query above returns 1 (non-zero) when the request rate is zero, confirming idle.
Scale-Down Guards¶
| managedworkload.yaml | |
|---|---|
The query returns 1 when active connections are below 10, confirming it's safe to scale down.
Prometheus Endpoint¶
The operator needs to know where your Prometheus instance lives. This is configured via the PROMETHEUS_ENDPOINT environment variable on the operator deployment:
| deployment.yaml | |
|---|---|
Example Signals¶
Idle Detection¶
| Use Case | PromQL | Logic |
|---|---|---|
| No HTTP traffic | rate(http_requests_total{service="api"}[10m]) == 0 |
Confirms idle when request rate is zero |
| No WebSocket connections | sum(websocket_connections{service="api"}) == 0 |
Confirms idle when no connections |
| No queue messages | sum(rabbitmq_queue_messages{queue="tasks"}) == 0 |
Confirms idle when queue is empty |
| No active sessions | sum(active_sessions{app="dashboard"}) == 0 |
Confirms idle when no users |
| No recent logins | increase(login_total{app="auth"}[1h]) == 0 |
Confirms idle when no logins in the last hour |
| No Kafka consumer lag | sum(kafka_consumer_lag{group="worker"}) == 0 |
Confirms idle when consumer is caught up |
Scale-Down Guards¶
| Use Case | PromQL | Logic |
|---|---|---|
| Low connection count | sum(active_connections{service="api"}) < 10 |
Safe to scale when connections are low |
| No in-flight jobs | sum(in_flight_jobs{service="worker"}) == 0 |
Safe when all jobs are complete |
| Low memory pressure | container_memory_working_set_bytes{pod=~"api-.*"} / container_spec_memory_limit_bytes{pod=~"api-.*"} < 0.7 |
Safe when memory is below 70% |
| Low request latency | histogram_quantile(0.99, rate(request_duration_seconds_bucket{service="api"}[5m])) < 0.5 |
Safe when p99 latency is under 500ms |
Writing Good Signals¶
Use comparison operators for idle signals¶
Idle signals should evaluate to a boolean. Use == 0 for "is this metric absent":
| managedworkload.yaml | |
|---|---|
Use appropriate time ranges¶
Too short: noisy, may trigger on brief pauses between requests. Too long: slow to react, may delay idle detection.
| managedworkload.yaml | |
|---|---|
Filter by the specific workload¶
Include labels that identify your workload to avoid matching other services:
| managedworkload.yaml | |
|---|---|
Debugging Signals¶
Test your PromQL query directly against Prometheus first:
curl -s 'http://prometheus:9090/api/v1/query?query=rate(http_requests_total{service="api"}[10m])==0'
Check operator events for signal evaluation results:
Events will show which signals confirmed or denied, helping you tune your queries.