Chapter 12 Exercises: Multimodal Prompting
These exercises progress from single-modality practice to mixed-modal workflows. Complete them in the order presented, as later exercises build on earlier ones.
Section A: Image Inputs
Exercise 1 — First Image Analysis (Core)
Goal: Build your first image analysis prompt using the three-part structure.
Choose any professional image you have access to:
- A screenshot of a website or application interface
- A photo of a whiteboard or physical document
- A marketing or advertising image (your own or a competitor's)
- A diagram or flowchart
Write a three-part image prompt:
1. Context: What the image is and why you're analyzing it
2. Specific questions: At least 3 specific things you want to know or assess
3. Format: How you want the output structured
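Assembled, the three parts might read like this (a sketch for a hypothetical pricing-page screenshot; substitute your own details):

```
Context: This is a screenshot of our SaaS pricing page, which we're
redesigning next quarter.

Questions:
1. Which plan does the visual hierarchy draw the eye to first?
2. Is any text too small or low-contrast to read comfortably?
3. What information would a first-time visitor still be missing?

Format: A short assessment per question, then a prioritized list of
the top 3 changes you'd recommend.
```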
Run the prompt. Assess: did the AI understand the image correctly? Were there any misreadings of text or misinterpretations of visual elements? How close was the output to what you needed?
Exercise 2 — Screenshot Troubleshooting (Core)
Goal: Use vision prompting for technical or interface problem-solving.
Take a screenshot of an application error, an interface that isn't working as expected, or a layout issue you need to diagnose. Write a prompt that:
- Describes what the screenshot shows and what should be happening instead
- Asks the AI to transcribe any visible error text exactly
- Asks for a diagnosis of likely causes
- Asks for specific recommended next steps
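Put together, such a prompt might look like this (the app and error situation are hypothetical):

```
This screenshot shows the export dialog in our reporting app. Clicking
"Export to PDF" should download a file, but instead the dialog shows an
error banner.

1. Transcribe the error text exactly as it appears.
2. What are the most likely causes of this error?
3. What specific steps should I try next, in order?
```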
Compare the AI's transcript of the error text against the actual screenshot — are there any OCR errors? How accurate was the diagnosis?
Exercise 3 — Design Feedback (Core)
Goal: Use vision prompting to get structured design feedback.
Find a marketing or interface design — a website, an email, an advertisement, a product mockup, a presentation slide — and write a design critique prompt.
Your prompt should:
- Specify the role of the reviewer ("as an experienced UX designer" or "as a consumer encountering this product for the first time")
- Ask for specific, named design elements to be evaluated
- Request a rating or scoring on at least one dimension
- Ask for specific improvement suggestions, not just issue identification
After reading the output: would a professional designer agree with this assessment? What did it catch that you hadn't noticed? What did it miss or get wrong?
Exercise 4 — Multi-Image Comparison (Extension)
Goal: Use vision prompting to compare multiple images systematically.
Collect 3-4 similar images — competitor product pages, examples of the same design element done differently, or before/after versions of the same asset.
Write a comparative analysis prompt that:
- Identifies what each image is
- Specifies exactly what you're comparing across them
- Requests a structured comparison (table or criteria-by-criteria analysis)
- Asks for a recommendation about which approach is most effective and why
What advantage does the multi-image comparison provide over analyzing each image individually? What are its limitations?
Exercise 5 — Whiteboard Extraction (Extension)
Goal: Practice extracting structured content from a photograph.
Take a photograph of a whiteboard, sticky-note session, handwritten notes, or hand-drawn diagram. Write a prompt to extract and organize the content.
Your prompt should:
- Acknowledge the handwritten/informal format
- Ask for transcription of the text as accurately as possible
- Ask for the content to be organized into a specified structure (action items, themes, diagram description, etc.)
- Ask the model to flag any text that appears ambiguous or unclear
Verify the extraction against the original. What error rate did you observe? For what types of content is AI-assisted whiteboard extraction reliably useful vs. unreliable?
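To put a number on the error rate, you can compare your hand-corrected transcript against the model's output word by word. A minimal sketch using Python's difflib (the two strings below are made-up stand-ins for your own transcripts):

```python
import difflib

# stand-ins: your hand-corrected transcript vs. the model's transcription
original = "launch checklist: confirm vendor quotes, update pricing page, email beta users"
transcribed = "launch checklist: confirm vendor notes, update pricing page, email beta users"

# word-level similarity: 1.0 means a perfect transcription
ratio = difflib.SequenceMatcher(None, original.split(), transcribed.split()).ratio()
print(f"word-level similarity: {ratio:.2%}")  # one wrong word out of eleven -> 90.91%
```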
Section B: Document Inputs
Exercise 6 — Document Q&A (Core)
Goal: Practice extracting specific answers from a document.
Find a document of at least 10 pages — a report, a policy document, a research paper, a long article. Write a Q&A prompt that:
- Describes the document
- Asks 5 specific, factual questions about its content
- Explicitly instructs the AI to answer only from the document (not general knowledge)
- Requests a citation or location reference for each answer
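A skeleton for such a prompt (the document description and question are placeholders):

```
Attached is our 2024 annual operations report (28 pages). Answer the
questions below using only this document. If the answer is not in the
document, say "not in document." For each answer, cite the page or
section it comes from.

1. What was total headcount at year end?
2. ...
```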
Check each answer against the document. How many are correct? How many include invented information not in the document? How many say "not in document" when the information is actually present?
Exercise 7 — Structured Extraction from a Document (Core)
Goal: Use the Extractor pattern on a real document.
Choose a document type you process regularly — meeting notes, a report, a proposal, a contract, an article, an email chain. Build an Extractor prompt for this document type that:
- Specifies exactly what fields to extract
- Defines the output format (JSON, table, bulleted list)
- Includes the "not in document" rule for absent fields
- Extracts at least 5 distinct fields
Run the extraction. Review the output: did any fields get fabricated? Did the model infer information that wasn't stated? How would you use this extracted output downstream?
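If you asked for JSON output, a short script can check the shape of what came back before you use it downstream. A sketch, assuming a meeting-notes extractor; the field names here are placeholders for your own:

```python
import json

# placeholder field list -- replace with the fields your Extractor prompt specifies
REQUIRED_FIELDS = ["date", "attendees", "decisions", "action_items", "next_meeting"]
NOT_IN_DOC = "not in document"

def check_extraction(raw):
    """Return a list of problems found in the model's JSON output."""
    problems = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    for field in REQUIRED_FIELDS:
        if field not in data:
            problems.append(f"missing field: {field}")
        elif data[field] in ("", None):
            problems.append(f"empty field (should be '{NOT_IN_DOC}'): {field}")
    return problems

sample = ('{"date": "2024-03-01", "attendees": ["A. Smith"], "decisions": [],'
          ' "action_items": ["send recap"], "next_meeting": "not in document"}')
print(check_extraction(sample))  # -> [] when the output is well-formed
```

This only checks structure; whether a field was fabricated still requires reading the source document.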
Exercise 8 — Multi-Document Comparison (Core)
Goal: Compare two or three related documents using a structured comparison prompt.
Choose 2-3 documents that should be compared on shared criteria:
- Two competing vendor proposals
- Two versions of a policy or plan
- Multiple reports on the same topic from different sources
- Your own plan vs. a competitor's
Write a comparison prompt with:
- A brief description of each document
- 4-6 specific comparison criteria
- Table format output
- A synthesis request at the end ("note any significant contradictions" or "provide a recommendation given these criteria")
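A sketch of the shape (the vendor names and criteria are placeholders):

```
Document 1 is Vendor A's proposal; Document 2 is Vendor B's. Compare
them on: price, implementation timeline, support terms, and contract
flexibility. Present the comparison as a table with one row per
criterion and one column per vendor. After the table, note any
significant contradictions between the two proposals.
```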
Was the comparison reliable? Were there any cases where the AI misstated a document's position? How would you verify the comparison before acting on it?
Exercise 9 — Long Document Strategy (Core)
Goal: Practice the section-by-section approach for a long document.
Find a document over 30 pages long. Instead of uploading it all at once, apply the strategic approach:
- Upload only the table of contents and ask which sections are most relevant to a specific question or goal
- Upload the recommended sections and perform targeted extraction or Q&A
- Ask a synthesis question based on the extracted sections
Compare this workflow to uploading the full document and asking a general question. Which approach produced more accurate, relevant results? How much more efficient was the targeted approach?
Exercise 10 — Document Privacy Audit (Core)
Goal: Build the habit of reviewing documents before upload.
Take a document you might typically upload for AI analysis (a report, a contract, meeting notes). Before uploading, conduct a manual privacy audit:
- What sensitive information is in this document (personal names, financial data, company confidential, PII, etc.)?
- Is this information necessary for the analysis I need, or is it incidental?
- Can I redact or remove the sensitive information without affecting the analysis?
- What is my organization's policy on uploading this type of document to the AI tool I'm using?
Based on this audit: should you upload as-is, redact before uploading, paste only specific sections, or not use AI for this document? Document your decision and the reasoning.
Section C: Data Inputs
Exercise 11 — CSV Analysis Prompt (Core)
Goal: Practice data analysis prompting with pasted structured data.
Find or create a small dataset (under 100 rows) in CSV format. This could be:
- An export from a spreadsheet you use
- Sample data from a tool you work with
- A public dataset from data.gov or Kaggle (small version)
Write a data analysis prompt that:
- Describes the data source and what it represents
- Asks for an initial data quality check (what might be wrong or incomplete)
- Asks 3-5 specific analytical questions that require reasoning about the data
- Specifies the interpretation you need (not computation)
Verify any numerical claims the AI makes against manual calculation or your own spreadsheet tool. Where did it get numbers right? Where did it estimate incorrectly?
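The quality check you're asking the AI for can also be scripted, which gives you ground truth to compare against. A sketch with a tiny inline stand-in dataset (replace `raw` with your own CSV text):

```python
import csv
import io

# a tiny inline sample standing in for your exported CSV (values are made up)
raw = """month,revenue
Jan,1200
Feb,
Mar,950
Apr,abc
"""

def quality_report(text, numeric_cols):
    """Count blank and non-numeric cells per column -- a first-pass quality check."""
    rows = list(csv.DictReader(io.StringIO(text)))
    report = {}
    for col in numeric_cols:
        blanks = sum(1 for r in rows if not r[col].strip())
        bad = sum(
            1 for r in rows
            if r[col].strip()
            and not r[col].strip().lstrip("-").replace(".", "", 1).isdigit()
        )
        report[col] = {"rows": len(rows), "blank": blanks, "non_numeric": bad}
    return report

print(quality_report(raw, ["revenue"]))
# -> {'revenue': {'rows': 4, 'blank': 1, 'non_numeric': 1}}
```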
Exercise 12 — Advanced Data Analysis (Extension)
Goal: Use ChatGPT Advanced Data Analysis (or equivalent) for actual computation.
Note: This exercise requires access to ChatGPT Plus or equivalent with code execution capability.
Upload a real dataset (Excel or CSV) to ChatGPT with Advanced Data Analysis enabled.
Ask for:
1. A statistical summary (min, max, mean, median, standard deviation for key columns)
2. An identification of outliers with explanation
3. A visualization (histogram, scatter plot, or time series as appropriate to the data)
4. A plain-language interpretation of what the data shows
Compare this to what you would get from pasting the same data without code execution. What is qualitatively different? When would you use each approach?
Exercise 13 — Interpretation vs. Computation (Core)
Goal: Understand the boundary between what AI handles well (interpretation) and what requires tools (computation).
Take a dataset you have access to and write two different prompts:
Prompt A (Computation request): "Here is sales data for the last 12 months. Calculate the total, the monthly average, and identify which month had the highest and lowest revenue."
Prompt B (Interpretation request): "Here is sales data for the last 12 months. Describe the trend pattern, identify any anomalies, and explain what the data suggests about our business performance. Do not calculate totals — focus on patterns and interpretations."
Compare the outputs. For Prompt A, verify the calculations with your own spreadsheet or calculator. How many errors are there? For Prompt B, is the interpretation insightful and accurate?
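For Prompt A, the ground truth takes only a few lines to compute yourself (the figures below are placeholders for your own data):

```python
import statistics

# placeholder monthly revenue figures -- substitute your own export
revenue = {"Jan": 1200, "Feb": 980, "Mar": 1430, "Apr": 1105,
           "May": 1310, "Jun": 990, "Jul": 1510, "Aug": 1460,
           "Sep": 1205, "Oct": 1340, "Nov": 1120, "Dec": 1680}

total = sum(revenue.values())
monthly_avg = statistics.mean(revenue.values())
highest = max(revenue, key=revenue.get)   # month with the highest revenue
lowest = min(revenue, key=revenue.get)    # month with the lowest revenue

print(f"total={total}, avg={monthly_avg:.2f}, highest={highest}, lowest={lowest}")
```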
Use this exercise to calibrate your trust for each type of request.
Section D: Code Inputs
Exercise 14 — Code Review Prompt (Core)
Goal: Build an effective code review prompt.
Choose a piece of code — your own, open-source code from GitHub, or code from a tutorial. Write a code review prompt that:
- Identifies the language and relevant context (framework, version)
- Specifies the role of the reviewer (adjust for your stack)
- Lists specific review criteria (security, performance, readability, test coverage — choose what's relevant)
- Asks for quoted code with each issue and a specific improvement suggestion
Run the review. Are the identified issues genuine? Are the improvement suggestions accurate and applicable? Did the reviewer miss anything important?
Exercise 15 — Code Explanation for Non-Technical Audience (Core)
Goal: Use the Explainer pattern with code as input.
Find a piece of code that describes a process or logic that a non-technical person would benefit from understanding — perhaps a calculation, a data transformation, a business rule implemented in code.
Write a prompt that asks for an explanation of what the code does (not how it works at a technical level) for a specific non-technical audience. Use the Explainer pattern from Chapter 11:
- Specify the audience knowledge level
- Specify an analogy approach
- Ask what this means for the audience's use case
Is the explanation accurate? Is it at the right level for the specified audience?
Section E: Mixed-Modal Prompts
Exercise 16 — Text + Image Combo (Core)
Goal: Write a prompt that requires both the image and the text context to answer properly.
Design a prompt where:
- An image provides visual information that couldn't be described efficiently in text
- Text provides context, goals, or constraints that the image alone doesn't contain
- The analysis requires integrating both
Example: Upload a screenshot of a webpage and add text context about your conversion goals. Or upload a diagram and add text about the problem you're trying to solve.
Write the prompt explicitly so both sources are needed. Could the analysis have been done with text alone? With the image alone? What does the combination enable that neither provides independently?
Exercise 17 — Document + Focused Question (Core)
Goal: Practice the most efficient document prompting workflow.
Instead of asking "summarize this document," write a document prompt that:
- Provides the document
- Specifies exactly one focused question that requires reading the document to answer
- Specifies the level of detail you need
- Instructs the model not to summarize the whole document — only answer the specific question
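A focused version might read like this (the document and question are placeholders):

```
Attached is the Q3 board deck. Do not summarize it. Answer one question
only: what reasons does the deck give for the drop in renewal rate?
Two or three sentences is enough.
```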
Compare this to a general "summarize this" prompt on the same document. Which produces more useful output for your specific need? How much shorter is the focused output?
Exercise 18 — Full Multimodal Workflow (Extension)
Goal: Design and execute a multi-step workflow combining at least 2 modalities.
Design a real professional task that benefits from multiple input types. Some examples:
- Competitive analysis: use vision for ad screenshots + document upload for press releases + text synthesis
- Meeting debrief: transcript as document + action item extraction + follow-up email draft
- Product review: product photos + spec sheet + comparison with competitor specs
Plan the workflow in steps before executing:
1. What are the steps, and in what order?
2. What modality is used in each step?
3. How does the output of each step feed into the next?
Execute the workflow. Document: total time, where AI helped most, where human review was essential, and what you would do differently next time.
Exercise 19 — Multimodal Failure Analysis (Extension)
Goal: Understand when multimodal prompting fails and why.
Deliberately test the limitations described in the chapter:
- Counting objects: upload an image with more than 15 items of the same type and ask the model to count them. How accurate is the count?
- Fine text: take a photo of a document with small, complex text and ask the model to read it. Where does it make errors?
- Spatial reasoning: upload an image with multiple labeled items and ask specific questions about relative positions ("which item is closest to the top-left corner?"). How accurate is the spatial reasoning?
- Long document middle content: upload a 30+ page document and ask a specific question about content that appears near the middle. How does accuracy compare to asking about content in the first and last sections?
Document what you find. How would these limitations affect the workflows you were planning to build?
Exercise 20 — Build Your Multimodal Pattern Library (Core)
Goal: Extend your Chapter 11 pattern library with multimodal variants.
For the most common document or image types you encounter at work, build one multimodal pattern for each. Include the three-part structure:
- What the input is (description/context)
- What specific analysis to perform
- How to format the output
Add each to your pattern library from Chapter 11 with a tag indicating it requires a specific modality. Test each pattern on a real example of that input type.
You should have at least 3 multimodal patterns in your library after this exercise. If you work with images regularly: at least 1 image analysis pattern. If you work with documents regularly: at least 2 document analysis patterns.