- Support at least three document formats: PDF, HTML, and Markdown.
- Use appropriate parsing libraries (e.g., `pymupdf` or `pdfplumber` for PDF, `beautifulsoup4` for HTML).
- Extract and preserve document structure: titles, headings, paragraphs, tables, and lists.
- Handle encoding issues, malformed