Build an ingestion pipeline that processes a collection of documents and images into the multimodal knowledge base.

- For each ingested item, store:
  - The original content (or a reference to it).
  - Text embedding(s).
  - Image embedding(s), if the content contains images.
  - CLIP embedding (for cross-m