1.3 Multimodal Embedding

Generate embeddings for both text and images in a shared embedding space.

- Use a model such as CLIP (`openai/clip-vit-large-patch14`) or SigLIP for joint text-image embeddings.
- For text-only content, also generate text embeddings with a sentence transformer for higher-quality text retrieval.
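The two bullets above imply two separate vector spaces. CLIP or SigLIP vectors and sentence-transformer vectors come from different models, usually have different dimensions, and cannot be compared with each other. The sketch below is one possible layout and should not be read as a prescribed design. All names in it are hypothetical: `DualIndex`, `l2_normalize`, `top_k`, and `clip_encoders`. Brute-force NumPy cosine search stands in for a real vector store. The optional `clip_encoders` helper assumes `torch` and the Hugging Face `transformers` 4.x CLIP API (`CLIPModel.get_text_features` / `get_image_features`). The sentence-transformer side is left as plain vectors you supply, for example from `SentenceTransformer(...).encode(texts)`. The specific text model is not chosen here.

```python
"""Hypothetical dual-space embedding index (a sketch, not a production store)."""
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def top_k(query, matrix, k):
    """Brute-force cosine search: indices and scores of the k best rows."""
    scores = l2_normalize(matrix) @ l2_normalize(query)
    order = np.argsort(-scores)[:k]
    return order, scores[order]


class DualIndex:
    """Keeps the shared text-image space and the text-only space apart,
    because their vectors come from different models and are not comparable."""

    def __init__(self):
        self._ids = {"joint": [], "text": []}
        self._vecs = {"joint": [], "text": []}

    def _add(self, space, item_id, vec):
        self._ids[space].append(item_id)
        self._vecs[space].append(np.asarray(vec, dtype=np.float32))

    def add_image(self, item_id, joint_vec):
        # Images are embedded only in the shared (CLIP/SigLIP) space.
        self._add("joint", item_id, joint_vec)

    def add_text(self, item_id, joint_vec, text_vec):
        # Text is embedded twice: in the shared space for text<->image search,
        # and with a sentence transformer for text<->text search.
        self._add("joint", item_id, joint_vec)
        self._add("text", item_id, text_vec)

    def search(self, query_vec, space="joint", k=5):
        """Return up to k (item_id, cosine score) pairs from one space."""
        if not self._vecs[space]:
            return []
        order, scores = top_k(query_vec, np.stack(self._vecs[space]), k)
        return [(self._ids[space][i], float(s)) for i, s in zip(order, scores)]


def clip_encoders(model_id="openai/clip-vit-large-patch14"):
    """Build (encode_texts, encode_images) from a Hugging Face CLIP checkpoint.

    Assumes `torch` and `transformers` (4.x CLIP API) are installed. They are
    imported here so the index above runs without them.
    """
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_id).eval()
    processor = CLIPProcessor.from_pretrained(model_id)

    @torch.no_grad()
    def encode_texts(texts):
        # CLIP's text encoder accepts at most 77 tokens, so longer text is truncated.
        inputs = processor(text=list(texts), return_tensors="pt",
                           padding=True, truncation=True)
        return l2_normalize(model.get_text_features(**inputs).cpu().numpy())

    @torch.no_grad()
    def encode_images(images):
        # `images` is a list of PIL images.
        inputs = processor(images=list(images), return_tensors="pt")
        return l2_normalize(model.get_image_features(**inputs).cpu().numpy())

    return encode_texts, encode_images
```

With real models, the outputs of `encode_texts` or `encode_images` go in as `joint_vec`, and the sentence transformer's output goes in as `text_vec`. A text query can then be run against both spaces. How the two result lists are merged (for example, by rank fusion) is a separate design choice that this sketch does not settle.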