5.3 Performance Optimization

Benchmark the serving setup:

- Latency: time to first token (TTFT) and inter-token latency (ITL) at input lengths of 128, 512, 1024, and 2048 tokens.
- Throughput: maximum requests per second at concurrency levels of 1, 4, 8, 16, and 32 concurrent requests.
- Use a load-testing tool (e.g., `
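The following is a minimal sketch of how the latency and throughput numbers above could be computed. It assumes a streaming endpoint that yields tokens one at a time. `fake_stream`, `one_request`, `run_level`, and the sleep constants are hypothetical stand-ins for a real server client, to be replaced with the actual streaming call. TTFT is measured as first-token arrival minus send time. ITL is the mean gap between consecutive tokens. Throughput is completed requests divided by wall-clock time, with an `asyncio.Semaphore` capping the number of in-flight requests.

```python
import asyncio
import statistics
import time
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class RequestMetrics:
    ttft_s: float      # time to first token, seconds
    mean_itl_s: float  # mean inter-token latency, seconds
    num_tokens: int


def latency_metrics(start: float, token_times: List[float]) -> RequestMetrics:
    """Derive TTFT and mean ITL from the send time and per-token arrival times."""
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = statistics.mean(gaps) if gaps else 0.0
    return RequestMetrics(ttft, mean_itl, len(token_times))


async def fake_stream(prompt_len: int, out_tokens: int = 8) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming endpoint (replace with a real client).
    Simulated prefill time grows with prompt length; each later token costs one decode step."""
    await asyncio.sleep(prompt_len * 1e-5)  # simulated prefill
    for i in range(out_tokens):
        if i:
            await asyncio.sleep(0.002)  # simulated decode step
        yield f"tok{i}"


async def one_request(prompt_len: int) -> RequestMetrics:
    """Send one request and timestamp every token as it arrives."""
    start = time.perf_counter()
    times: List[float] = []
    async for _ in fake_stream(prompt_len):
        times.append(time.perf_counter())
    return latency_metrics(start, times)


async def run_level(concurrency: int, total: int, prompt_len: int) -> float:
    """Run `total` requests with at most `concurrency` in flight; return requests/second."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded() -> RequestMetrics:
        async with sem:
            return await one_request(prompt_len)

    t0 = time.perf_counter()
    await asyncio.gather(*(bounded() for _ in range(total)))
    return total / (time.perf_counter() - t0)


async def main() -> None:
    # Latency sweep over the input lengths listed above.
    for n in (128, 512, 1024, 2048):
        m = await one_request(n)
        print(f"input={n:5d}  TTFT={m.ttft_s * 1e3:6.2f} ms  ITL={m.mean_itl_s * 1e3:5.2f} ms")
    # Throughput sweep over the concurrency levels listed above.
    for c in (1, 4, 8, 16, 32):
        rps = await run_level(c, total=32, prompt_len=512)
        print(f"concurrency={c:2d}  throughput={rps:7.1f} req/s")


if __name__ == "__main__":
    asyncio.run(main())
```

Because the simulated server has no shared batch or GPU contention, its throughput scales almost linearly with concurrency. Against a real deployment, the concurrency level at which requests per second stops rising (while TTFT keeps growing) is the figure worth recording.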