- Prediction accuracy (when ground truth is available)
- Prediction confidence distributions
- Prediction latency (p50, p95, p99)
- Throughput (requests per second)
- Error rates
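As an illustration, all five metrics above can be computed from a window of per-request logs. The sketch below is a minimal example under stated assumptions. `PredictionRecord` and `summarize` are hypothetical names, not part of any monitoring library. Latency percentiles use the nearest-rank method and cover successful requests only. Throughput counts every request, including failures, over a caller-supplied window length. Accuracy counts only requests whose ground-truth label has arrived.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PredictionRecord:
    # Hypothetical per-request log entry; field names are illustrative.
    latency_ms: float              # end-to-end prediction latency
    confidence: float              # model score for the predicted label, in [0, 1]
    predicted: Optional[str] = None
    actual: Optional[str] = None   # ground-truth label, if it has arrived yet
    error: bool = False            # True if the request failed


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data <= it."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(records: List[PredictionRecord], window_seconds: float, num_bins: int = 10) -> dict:
    if not records or window_seconds <= 0:
        raise ValueError("need at least one record and a positive window")

    ok = [r for r in records if not r.error]
    labeled = [r for r in ok if r.actual is not None]  # ground truth available
    latencies = [r.latency_ms for r in ok]

    # Confidence distribution as equal-width histogram bins over [0, 1].
    histogram = [0] * num_bins
    for r in ok:
        histogram[min(int(r.confidence * num_bins), num_bins - 1)] += 1

    return {
        "accuracy": sum(r.predicted == r.actual for r in labeled) / len(labeled) if labeled else None,
        "confidence_histogram": histogram,
        "latency_p50_ms": percentile(latencies, 50) if latencies else None,
        "latency_p95_ms": percentile(latencies, 95) if latencies else None,
        "latency_p99_ms": percentile(latencies, 99) if latencies else None,
        "throughput_rps": len(records) / window_seconds,
        "error_rate": (len(records) - len(ok)) / len(records),
    }
```

In production these values are usually aggregated by a metrics system, such as a histogram-based store, rather than recomputed from raw logs. Accuracy also tends to lag the other metrics, because ground-truth labels often arrive long after the prediction is served.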