Standardized test checklist for health endpoints, Prometheus metrics, and structured logging. Every deployable component must pass these baseline observability tests.
Philosophy: If it runs, it should emit metrics. If it's important, it should have alerts. Tests must verify this is actually happening.
```go
package reconciler_test

import (
	"testing"

	"github.com/prometheus/client_golang/prometheus/testutil"

	"gitlab.bluewillows.net/root/dnsweaver/internal/metrics"
)

func TestMetrics_ReconcileCounter(t *testing.T) {
	// Reset metrics before test
	metrics.ReconcileTotal.Reset()

	// ... run reconcile ...

	count := testutil.ToFloat64(metrics.ReconcileTotal)
	if count != 1 {
		t.Errorf("ReconcileTotal = %v, want 1", count)
	}
}

func TestMetrics_DurationObserved(t *testing.T) {
	metrics.ReconcileDuration.Reset()

	// ... run reconcile ...

	count := testutil.CollectAndCount(metrics.ReconcileDuration)
	if count == 0 {
		t.Error("ReconcileDuration not observed")
	}
}
```
All dnsweaver metrics follow Prometheus naming conventions:
| Pattern | Example | Type |
|---------|---------|------|
| `dnsweaver_<noun>_total` | `dnsweaver_reconcile_total` | Counter |
| `dnsweaver_<noun>_<unit>` | `dnsweaver_reconcile_duration_seconds` | Histogram |
| `dnsweaver_<noun>` | `dnsweaver_active_hostnames` | Gauge |
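The naming patterns above could be enforced with a small helper that a test calls for each registered metric. This is a minimal sketch using only the standard library: `CheckMetricName`, the regular expression, and the list of accepted unit suffixes are assumptions made for illustration and are not part of dnsweaver.

```go
package main

import (
	"regexp"
	"strings"
)

// metricNameRe is an assumed pattern for this sketch: the dnsweaver_ prefix
// followed by lowercase snake_case words.
var metricNameRe = regexp.MustCompile(`^dnsweaver_[a-z][a-z0-9]*(_[a-z0-9]+)*$`)

// CheckMetricName (hypothetical helper) reports whether name fits the
// convention for the given metric type: "counter", "histogram", or "gauge".
func CheckMetricName(name, metricType string) bool {
	if !metricNameRe.MatchString(name) {
		return false
	}
	switch metricType {
	case "counter":
		// Counters end in _total.
		return strings.HasSuffix(name, "_total")
	case "histogram":
		// Histograms end in a unit. Only two units are listed here as an
		// assumption; extend the list as needed.
		return strings.HasSuffix(name, "_seconds") || strings.HasSuffix(name, "_bytes")
	case "gauge":
		// Gauges carry no _total suffix.
		return !strings.HasSuffix(name, "_total")
	default:
		return false
	}
}
```

A table-driven test could loop over every metric name and type the component exposes and fail on the first name that `CheckMetricName` rejects.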
Labels should be bounded:
- provider — instance name (bounded by config)
- action — create, delete, update, skip (bounded enum)
- status — success, error (bounded enum)
- Never use hostnames as label values (unbounded cardinality)
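One way to test the bounded-label rule is to validate every label set before it reaches a metric. The sketch below assumes a hypothetical `ValidateLabels` helper and takes the set of configured provider names as a parameter; neither exists in dnsweaver as far as this document shows. The action and status values are the ones listed above.

```go
package main

import "fmt"

// Bounded label values from the list above. Provider values come from
// configuration, so the caller passes them in.
var (
	allowedActions  = map[string]bool{"create": true, "delete": true, "update": true, "skip": true}
	allowedStatuses = map[string]bool{"success": true, "error": true}
)

// ValidateLabels (hypothetical helper) returns an error if labels contains a
// label name or value outside the bounded sets. providers holds the
// configured provider instance names.
func ValidateLabels(labels map[string]string, providers map[string]bool) error {
	for name, value := range labels {
		var ok bool
		switch name {
		case "provider":
			ok = providers[value]
		case "action":
			ok = allowedActions[value]
		case "status":
			ok = allowedStatuses[value]
		default:
			// Any other label name, such as "hostname", is rejected outright.
			return fmt.Errorf("label %q is not in the bounded set", name)
		}
		if !ok {
			return fmt.Errorf("label %s=%q is not an allowed value", name, value)
		}
	}
	return nil
}
```

Rejecting unknown label names by default means a new label has to be added to the allow-list deliberately, which is where a cardinality review would naturally happen.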