Troubleshooting
Common Issues
Workload is idle but not being paused
Check 1: Is dry run enabled?
If dry run is set to true, the operator evaluates idle signals but takes no action. Set it to false to enable pausing.
Check 2: Are all signals confirming?
Look at events for signal evaluation results. If any signal denies, idle detection resets.
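As a rough sketch, the events for a single ManagedWorkload can be listed with a field selector instead of scanning the full `kubectl describe` output. The helper name `mw_events` is hypothetical, and the sketch assumes the operator records its events against the ManagedWorkload object itself.

```shell
# List events recorded against one ManagedWorkload, oldest first.
# Assumption: the operator attaches its events to the ManagedWorkload object.
mw_events() {
  mw_ns="$1"; mw_name="$2"
  kubectl get events -n "$mw_ns" \
    --field-selector "involvedObject.kind=ManagedWorkload,involvedObject.name=$mw_name" \
    --sort-by=.lastTimestamp
}
```

Usage: `mw_events <ns> <name>`. If nothing is returned, fall back to `kubectl describe managedworkload <name> -n <ns>`.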
Check 3: Is the grace period still running?
Check the idle signal metric:
Check 4: Is the prediction engine active?
If dailyPhase is Observing, the engine hasn't collected enough data yet (needs 24+ hours).
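One way to read a field such as `dailyPhase` without knowing its exact path is to search the resource's YAML. This is a minimal sketch: `mw_field` is a hypothetical helper, and the match is case-insensitive so it does not depend on the exact casing of the field name.

```shell
# Print the lines of a ManagedWorkload's YAML that mention a given field.
# The match is case-insensitive, so the exact casing of the field is not needed.
mw_field() {
  mw_ns="$1"; mw_name="$2"; mw_pattern="$3"
  kubectl get managedworkload "$mw_name" -n "$mw_ns" -o yaml | grep -i -- "$mw_pattern"
}
```

Usage: `mw_field <ns> <name> dailyPhase`. The same helper may work for the dry-run check with a pattern such as `dryrun`, though that field name is an assumption and is not confirmed here.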
Workload keeps cycling between paused and running
This usually means idle detection triggers a pause, and auto-resume then immediately detects "not idle". A paused workload would report zero CPU, which is below the threshold, but it has no pods to measure, so a CPU-only signal cannot confirm that the workload is still idle.
Fix: In managedworkload.yaml, ensure your Prometheus signals check for actual traffic, not just CPU.
Prediction confidence stays at 0
The confidence scorer needs a full 24-hour window of data before reporting. Wait 24+ hours after creating the ManagedWorkload.
Also check that metrics-server is running and returning data:
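A minimal sketch of that check, assuming `kubectl top pods` (used elsewhere on this page) is an acceptable probe for metrics-server. The helper name `metrics_available` is hypothetical.

```shell
# Succeeds if `kubectl top pods` returns at least one data row for the namespace.
# A failure means metrics-server is unreachable or has no data for these pods yet.
metrics_available() {
  kubectl top pods -n "$1" --no-headers 2>/dev/null | grep -q .
}
```

Usage: `metrics_available <ns> && echo "metrics OK" || echo "no metrics"`.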
Scale-down is blocked
Look for:
- "in stabilization window": cooldown from a recent scale event. Wait for the stabilization period to elapse.
- Guard probe denial: a Prometheus guard query returned zero/empty. Check the query against Prometheus directly.
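To check a guard query against Prometheus directly, something like the following sketch could be used. It relies on the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The `PROM_URL` default and both helper names are assumptions, so point them at your own Prometheus.

```shell
# Run an instant query against the Prometheus HTTP API and print the raw JSON.
# Assumption: PROM_URL points at a reachable Prometheus (default is a local port-forward).
prom_query() {
  curl -fsS --get "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Read a query response on stdin and report whether the result set is empty.
prom_result_state() {
  prom_body="$(tr -d ' \n')"
  case "$prom_body" in
    "") echo "no response" ;;
    *'"result":[]'*) echo "empty" ;;
    *) echo "has data" ;;
  esac
}
```

Usage: `prom_query '<your guard query>' | prom_result_state`. A result of "has data" can still carry a value of zero, so inspect the raw JSON from `prom_query` when the guard keeps denying.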
Target not found
If you see a Degraded condition with "target not found":
- Verify the target exists: `kubectl get deployment my-api -n staging`
- Check that `target.kind` matches (Deployment vs StatefulSet)
- Ensure the ManagedWorkload is in the same namespace as the target
Duplicate target error
Only one ManagedWorkload can manage a given workload. Check for duplicates:
kubectl get managedworkloads -n staging -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.target.name}{"\n"}{end}'
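With many ManagedWorkloads, spotting a repeated target by eye gets tedious. The following sketch filters the `name: target` lines produced by the command above and prints only targets that occur more than once. The helper name `dup_targets` is hypothetical.

```shell
# Read "name: target" lines on stdin and print each target that appears more than once.
dup_targets() {
  awk -F': ' 'NF >= 2 { print $2 }' | sort | uniq -d
}
```

Usage: pipe the `kubectl get managedworkloads ... -o jsonpath=...` command above into `dup_targets`. Empty output means no target name is repeated. The comparison covers target names only, so a Deployment and a StatefulSet that share a name would also be flagged.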
Cost data shows $0.00
- Cost accumulation requires metrics-server data. Check `kubectl top pods`.
- On day 1 of the month, `estimatedMonthlyCost` shows "pending" until day 2.
Debug Checklist
- Operator running? `kubectl get pods -n hybernate-system`
- CRDs installed? `kubectl get crd managedworkloads.hybernate.io`
- Metrics server running? `kubectl top nodes`
- RBAC correct? `kubectl auth can-i get pods --as=system:serviceaccount:hybernate-system:hybernate-controller-manager`
- Events? `kubectl describe managedworkload <name> -n <ns>`
- Logs? `kubectl logs -n hybernate-system deployment/hybernate-controller-manager`
- Status? `kubectl get managedworkload <name> -n <ns> -o yaml`
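The cluster-level items in this checklist can be run in one pass. This is a sketch rather than an official tool: `run_check` and `debug_checklist` are hypothetical helpers, and "ok" only means the command exited successfully, so an empty pod list still reports "ok".

```shell
# Run one checklist command quietly and print a one-line verdict.
run_check() {
  check_label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "[ok]   $check_label"
  else
    echo "[fail] $check_label"
  fi
}

# Run the cluster-level checks from the checklist (defined here, not invoked).
debug_checklist() {
  run_check "Operator running" kubectl get pods -n hybernate-system
  run_check "CRDs installed" kubectl get crd managedworkloads.hybernate.io
  run_check "Metrics server running" kubectl top nodes
  run_check "RBAC correct" kubectl auth can-i get pods \
    --as=system:serviceaccount:hybernate-system:hybernate-controller-manager
}
```

Usage: run `debug_checklist`, then rerun any failing command by hand to see its full output. The events, logs, and status items need a specific resource name and are best read manually.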
Getting Help
If you've gone through this checklist and still have issues:
- Check the GitHub Issues for known problems
- Open a new issue with:
  - Operator version and Kubernetes version
  - ManagedWorkload YAML (sanitized)
  - Relevant operator logs
  - Output of `kubectl describe managedworkload`
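The items above can be gathered into one directory before opening an issue. This is a minimal sketch: `collect_report` is a hypothetical helper, the output file names are arbitrary, and reading the operator version from the controller image tag is an assumption about how the operator is deployed.

```shell
# Collect the items requested for a bug report into a directory and print its path.
collect_report() {
  rep_ns="$1"; rep_name="$2"; rep_dir="${3:-hybernate-report}"
  mkdir -p "$rep_dir"
  kubectl version > "$rep_dir/kubernetes-version.txt" 2>&1
  # Assumption: the operator version can be read from the controller image tag.
  kubectl get deployment hybernate-controller-manager -n hybernate-system \
    -o jsonpath='{.spec.template.spec.containers[*].image}' \
    > "$rep_dir/operator-image.txt" 2>&1
  kubectl get managedworkload "$rep_name" -n "$rep_ns" -o yaml \
    > "$rep_dir/managedworkload.yaml" 2>&1
  kubectl describe managedworkload "$rep_name" -n "$rep_ns" \
    > "$rep_dir/describe.txt" 2>&1
  kubectl logs -n hybernate-system deployment/hybernate-controller-manager \
    --tail=500 > "$rep_dir/operator-logs.txt" 2>&1
  echo "$rep_dir"
}
```

Usage: `collect_report <ns> <name>`. Review and sanitize `managedworkload.yaml` and the logs before attaching them.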