Quarter 3: LLM Infrastructure

- Studied inference optimization: KV caching, speculative decoding, and vLLM (a minimal serving sketch follows this list)
- Learned about distributed inference for large models
- Contributed to an open-source inference framework
- Milestone: Deployed a production LLM serving pipeline
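Since vLLM is named above, here is a minimal offline-inference sketch of the kind of serving code involved. It is illustrative only: the model name, prompts, and sampling settings are assumptions, not details of the pipeline described in this quarter's milestone.

```python
# Minimal vLLM sketch (assumed example; model, prompts, and sampling
# parameters are placeholders, not taken from the actual pipeline).
from vllm import LLM, SamplingParams

prompts = [
    "Explain KV caching in one sentence.",
    "What is speculative decoding?",
]

# Sampling settings for generation; vLLM handles batching and KV-cache
# memory management (PagedAttention) inside the engine.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# A small model keeps this runnable as a local smoke test.
llm = LLM(model="facebook/opt-125m")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt)
    print(output.outputs[0].text)
```

A production deployment would more likely run vLLM's OpenAI-compatible HTTP server and call it from clients, but the offline API above shows the same engine with less setup.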
