Support at least two LLM backends:
- A cloud API (OpenAI GPT-4, Anthropic Claude, or equivalent).
- A local model served via vLLM or Ollama (e.g., Llama 3.1 8B, Mistral 7B).
- Implement a clean abstraction layer so backends can be swapped via configuration (a minimal sketch follows this list).
- Support streaming responses (server-sent events).
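
As a rough sketch of what the abstraction layer could look like, assuming Python and the official `openai` and `ollama` client packages (class and function names such as `LLMBackend`, `stream_chat`, and `backend_from_config` are illustrative, not prescribed by this spec):

```python
# Minimal sketch of a swappable LLM backend abstraction (illustrative only).
from abc import ABC, abstractmethod
from typing import Iterator


class LLMBackend(ABC):
    """Common interface so cloud and local backends are interchangeable."""

    @abstractmethod
    def stream_chat(self, messages: list[dict]) -> Iterator[str]:
        """Yield response text chunks as they arrive (suitable for relaying over SSE)."""


class OpenAIBackend(LLMBackend):
    """Cloud backend using the OpenAI Chat Completions API."""

    def __init__(self, model: str = "gpt-4o"):
        from openai import OpenAI  # lazy import: only required when this backend is selected
        self.client = OpenAI()
        self.model = model

    def stream_chat(self, messages: list[dict]) -> Iterator[str]:
        stream = self.client.chat.completions.create(
            model=self.model, messages=messages, stream=True
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta


class OllamaBackend(LLMBackend):
    """Local backend talking to an Ollama server (assumes `ollama serve` is running)."""

    def __init__(self, model: str = "llama3.1:8b"):
        import ollama  # lazy import: only required when this backend is selected
        self._ollama = ollama
        self.model = model

    def stream_chat(self, messages: list[dict]) -> Iterator[str]:
        for chunk in self._ollama.chat(model=self.model, messages=messages, stream=True):
            yield chunk["message"]["content"]


def backend_from_config(config: dict) -> LLMBackend:
    """Select a backend from configuration, e.g. {"backend": "ollama", "model": "llama3.1:8b"}."""
    registry = {"openai": OpenAIBackend, "ollama": OllamaBackend}
    cls = registry[config["backend"]]
    return cls(model=config["model"]) if "model" in config else cls()
```

With this shape, swapping backends is a configuration change only, and adding a third backend (e.g., Anthropic or vLLM's OpenAI-compatible server) means implementing `stream_chat` and registering the class, with no changes to calling code.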