Monitoring, Drift & Evaluation
Observability Pillars
- Metrics: latency (p50/p95), throughput, errors, GPU/CPU utilization
- Logs: inputs/outputs with policy-aware redaction, model decisions, and guardrail outcomes
- Traces: end‑to‑end request path (feature fetch → inference → post‑processing)
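A minimal sketch of the metrics and logs pillars above, assuming the `prometheus_client` package is available; the metric names, bucket boundaries, redaction pattern, and the `run_model` stub are illustrative, not a prescribed schema.

```python
import json
import logging
import re
import time

from prometheus_client import Counter, Histogram

# Latency histogram: p50/p95 are computed from these buckets by the metrics backend.
REQUEST_LATENCY = Histogram(
    "inference_request_latency_seconds",
    "End-to-end inference latency",
    buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0),
)
REQUEST_ERRORS = Counter("inference_request_errors_total", "Failed inference requests")

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # illustrative redaction policy

def redact(text: str) -> str:
    """Apply policy-aware redaction before anything reaches the log sink."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

def run_model(prompt: str) -> str:
    return "stub response"  # stand-in for the real model call

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("inference")

def handle_request(prompt: str) -> str:
    start = time.perf_counter()
    try:
        response = run_model(prompt)
        # Structured log of redacted input/output, ready for a policy-aware log pipeline.
        logger.info(json.dumps({"prompt": redact(prompt), "response": redact(response)}))
        return response
    except Exception:
        REQUEST_ERRORS.inc()
        raise
    finally:
        REQUEST_LATENCY.observe(time.perf_counter() - start)

handle_request("Summarize the ticket from jane.doe@example.com")
```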
ML/LLM Specifics
- Quality: accuracy, BLEU/ROUGE, hallucination rate, safety violations
- Data Drift: feature distribution shifts; prompt distribution changes for LLMs
- Performance: tokens/sec, context length utilization, cache hit rate
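One common way to quantify the feature or prompt distribution shifts above is the population stability index (PSI) between a reference window and the current serving window. A minimal NumPy sketch follows; the bin count and the alert thresholds in the comment are widely used rules of thumb, not fixed standards.

```python
import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference window and the current window for one numeric feature."""
    # Bin edges come from the reference window so both distributions share the same buckets.
    # Values outside the reference range fall out of the bins; widen the outer edges if tails matter.
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor the proportions so empty buckets do not produce log(0) or division by zero.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Rule of thumb often used in practice: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift.
psi = population_stability_index(np.random.normal(0, 1, 10_000), np.random.normal(0.3, 1, 10_000))
print(f"PSI: {psi:.3f}")
```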
Continuous Evaluation
- Maintain golden datasets and prompt suites
- Automate offline evaluation in CI with pass/fail thresholds that block deploys (see the CI gate sketch after this list)
- Sample online traffic for periodic evaluation and regression detection (see the sampling sketch after this list)
- Export metrics in a standardized format; use OpenTelemetry for traces (see the tracing sketch after this list)
- Build evaluation pipelines; store results with run metadata (e.g., model version, dataset version, prompt/config hashes)
- Alert on quality regressions, drift, and anomalous inputs/outputs
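A hedged sketch of the CI eval gate referenced above: the golden-dataset path, the exact-match scorer, the `run_candidate_model` stub, and the 0.85 threshold are placeholders for whatever your pipeline defines.

```python
import json
import sys
from pathlib import Path

GOLDEN_PATH = Path("evals/golden_dataset.jsonl")  # placeholder location of the golden dataset
ACCURACY_THRESHOLD = 0.85                          # placeholder quality bar

def run_candidate_model(prompt: str) -> str:
    return "stub prediction"  # stand-in for the candidate model call

def score(example: dict) -> bool:
    """Illustrative scorer: exact match against the expected answer."""
    prediction = run_candidate_model(example["input"])
    return prediction.strip() == example["expected"].strip()

def main() -> int:
    examples = [json.loads(line) for line in GOLDEN_PATH.read_text().splitlines() if line]
    accuracy = sum(score(ex) for ex in examples) / len(examples)
    print(f"golden-set accuracy: {accuracy:.3f} (threshold {ACCURACY_THRESHOLD})")
    # A non-zero exit code fails the CI job and blocks the deploy.
    return 0 if accuracy >= ACCURACY_THRESHOLD else 1

if __name__ == "__main__":
    sys.exit(main())
```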
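For the online-traffic sampling item, one simple approach is deterministic hash-based sampling so the same request gets the same decision on every replica; the 2% rate and the in-memory queue are assumptions for the sketch.

```python
import hashlib

SAMPLE_RATE = 0.02  # assumed: route roughly 2% of production traffic to periodic eval

eval_queue: list[dict] = []  # hypothetical sink; replace with your eval queue or warehouse table

def should_sample(request_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministic sampling: hash the request ID into [0, 1) and compare to the rate."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate

def maybe_enqueue_for_eval(request_id: str, prompt: str, response: str) -> None:
    if should_sample(request_id):
        eval_queue.append({"request_id": request_id, "prompt": prompt, "response": response})

maybe_enqueue_for_eval("req-42", "What is PSI?", "Population stability index ...")
print(f"queued for eval: {len(eval_queue)}")
```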
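And a minimal OpenTelemetry tracing sketch for the request path named in the pillars (feature fetch → inference → post-processing), assuming the `opentelemetry-sdk` package; the console exporter keeps it self-contained, and the span names, attributes, and stand-in calls are illustrative.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# In production the exporter would point at your collector; the console exporter is for the sketch only.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("inference-service")

def handle_request(request_id: str, user_features: dict) -> str:
    with tracer.start_as_current_span("request", attributes={"request.id": request_id}):
        with tracer.start_as_current_span("feature_fetch"):
            features = list(user_features.values())  # stand-in for a feature-store call
        with tracer.start_as_current_span("inference") as span:
            span.set_attribute("feature.count", len(features))
            output = "model output"  # stand-in for the model call
            span.set_attribute("output.chars", len(output))
        with tracer.start_as_current_span("post_processing"):
            return output.strip()

print(handle_request("req-123", {"age": 42}))
```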