Chapter 19 Exercises: Specialized and Domain-Specific AI Tools

These exercises build practical skill in evaluating, selecting, and using specialized AI tools. Many exercises use the evaluation framework from the chapter and can be completed with tools you currently have access to. Several exercises require accessing free tiers or trials of specific tools.


Exercise 1: The Spectrum Mapping Exercise

Goal: Develop an eye for where tools actually sit on the general-to-specialized spectrum.

Take five AI tools you currently use or have heard of. For each, identify:

  1. What underlying model powers it (if disclosed)?
  2. What domain-specific training or fine-tuning has been applied (if any)?
  3. What workflow integrations does it offer?
  4. Where on the general-to-specialized spectrum does it actually sit?

Discuss: Are any of these tools positioned as "specialized" but are actually general models with domain-specific system prompts? How can you tell?


Exercise 2: The Six-Question Evaluation Framework Applied

Goal: Practice using the full evaluation framework on a real tool.

Choose a specialized AI tool in a domain relevant to your professional work. Apply all six evaluation questions from the chapter:

  1. What training/fine-tuning data was used? (Research vendor documentation, press releases, academic papers if available)
  2. Is there independent validation? (Search for third-party studies, not just vendor claims)
  3. How does it handle uncertainty? (Test this directly — ask questions with genuinely uncertain answers)
  4. What are the data privacy terms? (Read the actual privacy policy)
  5. What are the documented failure modes? (Search for user reports, published evaluations, any documented errors)
  6. Is human expert oversight built into the workflow?

Write up your findings in a 300-500 word evaluation summary. Would you recommend this tool for your use case? Why or why not?


Exercise 3: Elicit for Literature Review (Free Tier)

Goal: Experience AI-assisted academic literature search with a specialized research tool.

Go to Elicit (elicit.com) and set up a free account. Run a research question relevant to your professional domain. Suggested questions:

  • If you work in technology: "What evidence exists for specific programming practices improving code quality?"
  • If you work in marketing: "What factors predict customer retention in subscription businesses?"
  • If you work in consulting: "What management interventions have strong evidence for improving organizational change adoption?"
  • If you work in any field: choose a question you have actually wondered about

After Elicit returns results:

  1. Review three of the papers it identified
  2. How accurately did Elicit summarize each paper's findings?
  3. What did Elicit miss or oversimplify in the summaries?
  4. Did Elicit surface papers you would not have found with a Google Scholar search?
  5. What limitations of Elicit did you notice?


Exercise 4: Consensus for Specific Research Questions

Goal: Compare AI research synthesis tools for different question types.

Go to Consensus (consensus.app) and run three research questions — one where you expect strong consensus, one where you expect mixed evidence, and one that is genuinely contested.

Examples:

  • Strong consensus expected: "Does exercise reduce risk of cardiovascular disease?"
  • Mixed evidence: "Does remote work improve productivity?"
  • Contested: "Do stock buybacks create long-term shareholder value?"

For each question:

  • What does Consensus report about the evidence direction?
  • How many papers did it find?
  • Does its confidence level seem calibrated to the actual state of the evidence?
  • What does it say about uncertainty?

Compare your Consensus results to a quick Google Scholar search on the same questions. What does the specialized tool add?


Exercise 5: General vs. Specialized Comparison Benchmark

Goal: Empirically compare a specialized tool against a general-purpose model for domain-specific tasks.

Choose at least five domain-specific tasks in your professional field. Run each on both a specialized tool and a general-purpose model (Claude or ChatGPT) with a well-crafted prompt.

Example tasks by domain:

  • Legal: "Identify potential issues in this contract clause: [paste a contract clause]"
  • Marketing: "Write three email subject line variations for this campaign: [describe campaign]"
  • Research: "Summarize the key debate in the literature about [topic]"
  • HR: "Draft a job description for [role] emphasizing [specific criteria]"

For each task, rate both outputs on:

  • Accuracy/correctness (1-5)
  • Domain appropriateness (1-5)
  • Usefulness for your actual workflow (1-5)

Tabulate results. What does the comparison show about whether the specialized tool is worth its cost?
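If you record the ratings in a structured form, tabulating them is straightforward. A minimal sketch of one way to do it (the task names and scores below are placeholders; substitute your own):

```python
# Tabulate 1-5 ratings for each tool across tasks and compare means per criterion.
from statistics import mean

CRITERIA = ["accuracy", "domain_fit", "usefulness"]

# ratings[tool][task] = (accuracy, domain_fit, usefulness) -- placeholder scores
ratings = {
    "specialized": {
        "task_1": (4, 5, 4),
        "task_2": (3, 4, 4),
        "task_3": (5, 5, 3),
        "task_4": (4, 4, 4),
        "task_5": (3, 5, 4),
    },
    "general": {
        "task_1": (4, 3, 4),
        "task_2": (4, 3, 3),
        "task_3": (4, 3, 4),
        "task_4": (5, 4, 4),
        "task_5": (3, 3, 3),
    },
}

def summarize(ratings):
    """Return per-tool mean for each criterion, plus an overall mean."""
    summary = {}
    for tool, tasks in ratings.items():
        cols = list(zip(*tasks.values()))  # one tuple of scores per criterion
        per_criterion = {c: mean(col) for c, col in zip(CRITERIA, cols)}
        per_criterion["overall"] = mean(s for scores in tasks.values() for s in scores)
        summary[tool] = per_criterion
    return summary

for tool, stats in summarize(ratings).items():
    print(tool, {k: round(v, 2) for k, v in stats.items()})
```

Seeing the criterion-level means side by side often tells you more than the overall average: a specialized tool frequently wins on domain appropriateness while tying on raw accuracy.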


Exercise 6: Privacy Policy Audit

Goal: Develop practical literacy in reading AI tool privacy policies for professional use.

Select two specialized AI tools — one you use or are considering using professionally, and one in a high-stakes domain (legal, medical, or financial). For each, find and read the actual privacy policy (not a summary). Answer:

  1. Is user-submitted data used to train future models? Is there an opt-out?
  2. Where is data stored geographically?
  3. How long is data retained?
  4. What happens in the event of a legal order or subpoena?
  5. Does the tool have enterprise/business tiers with enhanced privacy protections?
  6. Is there any mention of relevant compliance certifications (HIPAA, SOC 2, GDPR, etc.)?

Based on what you found, would you be comfortable using this tool with confidential client data? What questions would you need answered before doing so?


Exercise 7: The Hallucination Test for Specialized Tools

Goal: Identify how specialized tools handle questions at the edge of their domain.

Choose a specialized AI tool — legal, medical, research, or other. Run five tests designed to find its failure modes:

  1. Ask about a case, study, citation, or piece of legislation that you know exists — verify the tool's answer
  2. Ask about something that sounds plausible but does not exist (a fictional court case, a made-up drug name, a nonexistent regulation)
  3. Ask about something current that may be beyond the training cutoff
  4. Ask a cross-domain question that requires knowledge beyond the tool's specialization
  5. Ask about a genuinely contested or unsettled question in the domain

For tests 1-3, verify the answers against authoritative sources. How accurate was the tool? Did it acknowledge uncertainty appropriately? Did it hallucinate confidently or hedge honestly?


Exercise 8: Meeting AI Tool Evaluation

Goal: Evaluate an AI meeting assistant against your actual meeting workflow.

If you have access to Otter.ai, Fireflies.ai, or a similar meeting transcription tool (most have free tiers), use it in an actual meeting. Evaluate afterward:

  1. Transcription accuracy: what percentage of words were correct?
  2. Speaker attribution accuracy (if applicable)
  3. Summary quality: does the AI summary capture the key decisions and discussion points?
  4. Action item extraction: did it identify the actual commitments made?
  5. What did the manual note-taker capture that the AI missed?
  6. What did the AI capture that was missed or forgotten in manual notes?

Would this tool improve your current meeting note-taking workflow? What would need to be different?
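The "percentage of words correct" in item 1 is conventionally measured as word error rate (WER): edit distance over words divided by the reference length. A minimal sketch, using a short placeholder transcript:

```python
# Word error rate: (substitutions + deletions + insertions) / reference word count.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Standard dynamic-programming edit distance, computed over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

reference = "we agreed to ship the beta on friday"   # your manual transcript
hypothesis = "we agreed to ship the beta friday"     # the tool's transcript
print(f"WER: {wer(reference, hypothesis):.1%}")      # one deleted word out of eight
```

In practice you would compare a hand-corrected reference transcript of a few minutes of the meeting against the tool's raw output; note that WER can exceed 100% when the hypothesis inserts many extra words.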


Exercise 9: The Tool Proliferation Audit

Goal: Assess and rationalize your current AI tool usage.

List every AI tool you currently use or pay for. For each:

  1. What specific job does it do in your workflow?
  2. Is it a specialized or a general-purpose tool?
  3. How often do you use it each week?
  4. What is the annual cost? (For free tiers, estimate the time cost of use.)
  5. Could a different tool you already use handle this job adequately?

After completing the audit:

  • Which tools have clearly overlapping functions?
  • Which tools are underused relative to their cost?
  • If you could keep only three AI tools, which would they be, and why?

This exercise often reveals subscriptions you are paying for but rarely use, along with unnecessary duplication across tools.


Exercise 10: Evaluating Adobe Firefly for Commercial Creative Work

Goal: Understand Adobe's "commercially safe" positioning for AI image generation and its practical tradeoffs.

If you have access to Creative Cloud (or use the free tier at firefly.adobe.com):

  1. Generate the same prompt in Adobe Firefly and either Midjourney or DALL·E 3
  2. Compare output quality
  3. Research Adobe Firefly's current commercial licensing terms — specifically, how does it differ from Midjourney and DALL·E 3 in terms of training data provenance and commercial rights?

For a practitioner who creates commercial visual content professionally: when would you choose Firefly over Midjourney despite potential quality differences? When would the reverse be true?


Exercise 11: Canva AI for Non-Designer Workflow

Goal: Evaluate AI-assisted design for professionals without design backgrounds.

Using Canva AI (free tier available):

  1. Create a presentation slide for a business concept using AI-assisted layout and AI-generated imagery
  2. Create a social media graphic using AI text and image features
  3. Use the AI background removal on a photo you have

After completing each task, evaluate:

  • How much design skill was required?
  • How does the output quality compare to what you could produce without AI assistance?
  • Would this tool replace your need for stock photos, basic image editing, and design templates for common tasks?
  • What limitations did you encounter?


Exercise 12: Building Your Specialized Tool Evaluation Scorecard

Goal: Create a reusable evaluation tool for future specialized AI tool assessments.

Based on the six evaluation questions from the chapter and your experience with the other exercises, build a personal evaluation scorecard for specialized AI tools. Your scorecard should:

  1. List all evaluation criteria (expand the six from the chapter based on your domain experience)
  2. Weight each criterion by importance for your professional context
  3. Include a rating scale for each criterion
  4. Include a minimum threshold: what score is required for adoption?
  5. Include a cost-benefit analysis section

Apply your scorecard to one specialized tool you evaluated in an earlier exercise. Does using the scorecard produce a clearer, more defensible decision than your gut reaction?
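The scorecard in steps 1-5 can be expressed as a small weighted-scoring sketch. The criteria follow the chapter's six questions, but the weights, the rating values, and the 3.5 threshold below are placeholders to adapt to your own context:

```python
# Weighted scorecard: each criterion gets a weight; ratings are on a 1-5 scale.
CRITERIA = {                              # criterion: weight (weights sum to 1.0)
    "training_data_transparency": 0.15,
    "independent_validation":     0.25,
    "uncertainty_handling":       0.20,
    "privacy_terms":              0.20,
    "documented_failure_modes":   0.10,
    "human_oversight":            0.10,
}
ADOPTION_THRESHOLD = 3.5                  # minimum weighted score to adopt

def score_tool(ratings: dict[str, int]) -> float:
    """Weighted average of 1-5 ratings; raises KeyError if a criterion is missing."""
    return sum(CRITERIA[c] * ratings[c] for c in CRITERIA)

ratings = {                               # example ratings for one tool
    "training_data_transparency": 3,
    "independent_validation": 4,
    "uncertainty_handling": 4,
    "privacy_terms": 5,
    "documented_failure_modes": 2,
    "human_oversight": 4,
}
total = score_tool(ratings)
print(f"Weighted score: {total:.2f} -> "
      f"{'adopt' if total >= ADOPTION_THRESHOLD else 'do not adopt'}")
```

Writing the weights down explicitly is most of the value: it forces you to decide, before seeing any tool, whether (say) independent validation matters more to you than privacy terms.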


Exercise 13: Trust Calibration for Medical/Legal/Financial Content

Goal: Develop specific calibration for the highest-stakes tool categories.

Choose one of the three high-stakes domains (medical, legal, or financial). Using a specialized or general AI tool, ask it five questions in that domain ranging from simple to complex:

  • Simple: "What is the standard treatment for [common condition]?"
  • Moderate: "What are the key differences between [two legal concepts]?"
  • Complex: "How should I approach [a scenario requiring professional judgment]?"

For each answer:

  1. How confidently does the AI present its answer?
  2. Does it recommend consulting a professional?
  3. Verify the answer against authoritative sources in the domain
  4. Rate the calibration: was the AI's confidence level appropriate to the actual reliability of its answer?

What patterns do you notice about when specialized tools express appropriate uncertainty versus when they are overconfident?


Exercise 14: The "One General Plus One Specialized" Strategy

Goal: Design a streamlined AI toolkit for your professional role.

Based on your professional domain and workflow, design the ideal "one general plus one specialized" AI toolkit:

  1. Choose your general-purpose AI (Claude, ChatGPT, Gemini) and justify: why does this model fit your work better than the alternatives?
  2. Identify your highest-volume, most distinctive professional task that a specialized tool could handle better than a general model
  3. Research which specialized tool is most appropriate for that task (use the evaluation framework from Exercise 2)
  4. Write a brief implementation plan: how would you integrate these two tools into your current workflow?

Present your toolkit to a colleague or write it up as if making a recommendation to your team.


Exercise 15: Multi-Domain Research Task Comparison

Goal: Experience where specialized research tools outperform and where they fall short.

Design a research task that crosses domain boundaries. Example: "I need to understand the evidence base for AI adoption in the healthcare sector, including both the clinical outcomes research and the business case for hospital systems."

Run this task using:

  1. Elicit or Consensus (specialized research tool)
  2. Claude or ChatGPT (general-purpose model)
  3. A combination: start with Elicit or Consensus for the literature, then synthesize with a general model

Compare the quality of the final synthesis produced by each approach:

  • What did the specialized tool contribute?
  • What did the general model contribute?
  • What was best about the combined approach?
  • Which approach would you use for this type of task going forward?


Exercise 16: Domain Expert Interview (Optional Extended Exercise)

Goal: Ground your evaluation skills in actual domain expertise.

If you have access to a domain expert in a field where AI tools are active (a lawyer who uses Harvey or Casetext, a physician who uses Nuance DAX or Glass AI, a financial analyst who uses AlphaSense), conduct a 20-30 minute interview:

Questions to ask:

  1. What AI tools do you use in your professional workflow, and for what tasks?
  2. How has your trust in these tools evolved over time?
  3. Have you caught the tool being wrong in a way that mattered? What happened?
  4. What do you wish general users understood about the limits of these tools in your domain?
  5. What do you see as the most significant risk of AI use in your domain?

Write up your findings. How does the expert's perspective compare to the vendor's claims?


Reflection Questions

After completing several exercises:

  1. What is your current AI toolkit, and which tools have the clearest ROI versus which are underused or duplicative?
  2. What evaluation criteria matter most for your specific professional domain?
  3. Where have you encountered specialized AI tools being overconfident or hallucinating in domain-specific ways?
  4. How has working through the evaluation framework changed how you respond to new tool announcements?
  5. What one specialized tool, if you have not already adopted it, would you most want to evaluate for your workflow? What evaluation process would you use?