- Model compression (reduce model size without significant accuracy loss)
- Model quantization (use lower-precision numbers for inference)
- Batching (group multiple inference requests to amortize overhead)
- Caching (cache predictions for frequently seen inputs)
- ONNX conversion (export the model to ONNX for optimized inference runtimes)
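
A minimal sketch of three of these techniques, assuming a small PyTorch model; the toy network, `cached_predict`, and `batched_predict` are illustrative names, not a specific library API:

```python
import torch
import torch.nn as nn
from functools import lru_cache

# Hypothetical toy model standing in for a real trained network.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()

# Quantization: convert Linear layers to int8 weights for smaller, faster CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Caching: memoize predictions for frequently seen inputs.
# Inputs must be hashable, so the cache is keyed on a tuple of features.
@lru_cache(maxsize=10_000)
def cached_predict(features: tuple) -> int:
    x = torch.tensor(features, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        return int(quantized(x).argmax(dim=1))

# Batching: group several requests into one forward pass to amortize overhead.
def batched_predict(batch_of_features: list) -> list:
    x = torch.tensor(batch_of_features, dtype=torch.float32)
    with torch.no_grad():
        return quantized(x).argmax(dim=1).tolist()
```

In practice the quantized model can also be exported with `torch.onnx.export` and served through an optimized runtime, which covers the ONNX conversion step above.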