Compare your system's performance to:

- A text-only baseline (the same system without image understanding).
- A no-agent baseline (a direct LLM call without tool use).
- (If budget allows) A commercial API baseline (GPT-4o or Claude with vision).

Present the comparisons in tables and/or charts.
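One possible way to produce such a comparison table is sketched below. It assumes each system's scores have already been collected into a dictionary. The function name `comparison_table`, the metric names, and the system labels are hypothetical, and the values shown are empty placeholders, not measured results.

```python
def comparison_table(results, metrics):
    """Render {system: {metric: value}} as a Markdown table.

    Missing or None values are shown as "n/a", e.g. for an optional
    baseline that was not run.
    """
    header = "| System | " + " | ".join(metrics) + " |"
    divider = "|---|" + "---:|" * len(metrics)
    rows = []
    for system, scores in results.items():
        cells = []
        for metric in metrics:
            value = scores.get(metric)
            cells.append("n/a" if value is None else f"{value:.2f}")
        rows.append(f"| {system} | " + " | ".join(cells) + " |")
    return "\n".join([header, divider] + rows)


# Placeholder entries only (assumption): fill in your own measurements.
example_results = {
    "Full system": {"accuracy": None, "latency_s": None},
    "Text-only baseline": {"accuracy": None, "latency_s": None},
    "No-agent baseline": {"accuracy": None, "latency_s": None},
    "Commercial API baseline": {"accuracy": None, "latency_s": None},
}

print(comparison_table(example_results, ["accuracy", "latency_s"]))
```

Keeping every system in the same dictionary and evaluating all of them on the same test set makes the rows directly comparable. The same dictionary could also feed a bar chart if a chart reads better than a table.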