
Chapter 10: Advanced Prompting Techniques

The Ceiling You're About to Break Through

If you've worked through Chapters 7, 8, and 9, you have a solid foundation. You know how to construct a prompt with clear task statements, relevant context, and explicit output specifications. You've seen how adding a single sentence of context can transform vague output into something useful. These fundamentals genuinely work — and they work better than most people's intuitive approach to prompting.

But they have a ceiling.

The ceiling shows up in specific situations: tasks that require multi-step reasoning, outputs that need to match a precise style, problems where the first answer is plausible but wrong, complex projects that are too large to address in a single exchange. For these situations, the foundational techniques — task + context + format + constraints — are necessary but not sufficient.

This chapter covers the techniques that push past that ceiling. They are not magic, and they are not difficult once you understand what they are doing. They are principled extensions of the same ideas you already know, applied to more demanding tasks.

The four core techniques in this chapter are:

  1. Chain-of-Thought (CoT) Prompting — getting the model to reason through problems step by step before answering, which dramatically improves accuracy on anything involving multiple logical steps
  2. Few-Shot Prompting — providing worked examples inside the prompt to teach the model your specific standards, format, and style
  3. Self-Critique and Self-Correction — asking the model to evaluate and improve its own output, adding a quality layer to the generation process
  4. Structured Decomposition — breaking complex tasks into explicit subtasks, either in the same prompt or across multiple exchanges

We will also briefly cover Tree-of-Thought reasoning and then spend significant time on how to combine these techniques for maximum effect.


1. Why Standard Prompting Hits a Ceiling

To understand why advanced techniques work, it helps to understand specifically where standard prompting fails.

Consider a standard prompt for a moderately complex task:

"What is the most cost-effective way for a 50-person SaaS company to move from AWS to a hybrid cloud/on-premise setup while maintaining 99.9% uptime?"

A capable AI given this prompt will produce a reasonable-sounding answer. It will mention cost analysis, phased migration, load balancing, redundancy. The answer will be coherent, professional, and in many respects informed.

It will also frequently be subtly wrong — or at least untrustworthy.

Why? Because the model jumped directly to output generation. It pattern-matched to responses it has seen for similar prompts, assembled a plausible-looking answer, and delivered it. It did not reason through the actual constraints. It did not check whether its suggested approach actually maintains 99.9% uptime during the migration window. It did not verify whether the hybrid setup it's recommending is less expensive than the AWS setup the company has, once you factor in on-premise capital costs, staffing, and maintenance.

The answer sounds right. But if you test it against the actual numbers for this company, it may fall apart.

This is the ceiling of standard prompting on complex tasks: plausibility without rigor. The model generates a plausible response; it does not reason to a correct one.

The advanced techniques in this chapter address this problem from different angles:

  • CoT forces the reasoning process to be explicit and verifiable
  • Few-shot examples show the model what "right" looks like for your specific standards
  • Self-critique asks the model to pressure-test its own output
  • Structured decomposition prevents the model from trying to answer everything at once

Each technique is addressing a specific failure mode. Understanding the failure mode helps you choose the right technique.


2. Chain-of-Thought (CoT) Prompting

What It Is

Chain-of-thought prompting is a technique where you instruct the model to show its reasoning process — to think through the problem step by step before arriving at an answer, rather than jumping directly to a conclusion.

The insight behind CoT is that many tasks require a sequence of intermediate reasoning steps, and if the model skips those steps in its generation process, it often gets the wrong answer. By making the reasoning process explicit, you accomplish two things simultaneously: you improve the model's accuracy on the task, and you make the reasoning transparent enough that you can catch errors.

Why It Works

The technical explanation involves how language models generate tokens sequentially. When a model generates an answer directly, it is essentially making a highly compressed inference from prompt to output. When it reasons step by step, each step becomes context for the next step — the model's intermediate conclusions shape subsequent conclusions. This makes complex multi-step reasoning substantially more reliable.

The practical result is striking. A classic example from the original CoT research: models given standard prompts solve grade-school math word problems at roughly 18% accuracy. The same models given chain-of-thought prompts solve the same problems at roughly 57% accuracy — more than three times better, with no change to the model itself, only to how the prompt is structured.

The improvement persists across task types: arithmetic, commonsense reasoning, symbolic reasoning, multi-step planning. Any task that requires more than one inferential step benefits from some form of CoT.

Zero-Shot CoT: "Let's Think Step by Step"

The simplest form of chain-of-thought is zero-shot CoT: adding the phrase "Let's think step by step" (or a variation) to the end of your prompt, without providing any examples of what the reasoning should look like.

This four-word addition consistently improves performance on reasoning tasks. It tells the model that reasoning is expected before answering, which shifts the generation process.

Effective zero-shot CoT triggers:

  • "Let's think step by step."
  • "Think through this carefully before answering."
  • "Walk me through your reasoning."
  • "Before giving your answer, work through the logic."
  • "Reason through this step by step, then state your conclusion."

Zero-shot CoT is best used when:

  • You want a quick reasoning boost without much prompt engineering
  • The task is moderately complex (not trivial, not extremely open-ended)
  • You need to see the reasoning process to verify correctness
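If you build prompts in code, the trigger can be appended mechanically so it is never forgotten. A minimal sketch; the function name is illustrative, not from any library:

```python
# Append a zero-shot CoT trigger to any prompt. The default trigger is
# the classic phrase; any of the variants above would work the same way.

COT_TRIGGER = "Let's think step by step."

def with_cot(prompt: str, trigger: str = COT_TRIGGER) -> str:
    """Return the prompt with a chain-of-thought trigger appended."""
    return f"{prompt.rstrip()}\n\n{trigger}"
```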

Manual CoT: Showing the Format

Manual CoT goes further: you provide an example of what good step-by-step reasoning looks like for your type of problem, then ask the model to follow the same pattern for the new problem.

This is more work upfront, but it gives you control over the reasoning format and often produces higher-quality reasoning traces than zero-shot CoT alone.

Full Example 1: Numerical Reasoning (Zero-Shot CoT)

Standard Prompt:

"A project has a budget of $240,000. Phase 1 costs $80,000 and takes 3 months. Phase 2 costs 40% more than Phase 1. Phase 3 costs $60,000 but has a 25% chance of requiring a $20,000 contingency. What is the expected total cost, and is it within budget?"

Standard Response (typical):

"The expected total cost is approximately $244,000, which slightly exceeds the budget."

(This is actually wrong — see the reasoning below.)

CoT Prompt:

"A project has a budget of $240,000. Phase 1 costs $80,000 and takes 3 months. Phase 2 costs 40% more than Phase 1. Phase 3 costs $60,000 but has a 25% chance of requiring a $20,000 contingency. What is the expected total cost, and is it within budget? Let's think step by step."

CoT Response:

"Let me work through this step by step.

Phase 1 cost: $80,000 (given directly)

Phase 2 cost:
Phase 2 costs 40% more than Phase 1.
40% of $80,000 = $32,000
Phase 2 cost = $80,000 + $32,000 = $112,000

Phase 3 cost (expected):
Phase 3 has a base cost of $60,000 plus a 25% chance of a $20,000 contingency.
Expected contingency = 0.25 × $20,000 = $5,000
Expected Phase 3 cost = $60,000 + $5,000 = $65,000

Total expected cost: $80,000 + $112,000 + $65,000 = $257,000

Comparison to budget: $257,000 exceeds the $240,000 budget by $17,000.

Conclusion: No, the expected total cost is $257,000, which exceeds the budget by $17,000 (about 7%). If the Phase 3 contingency does not trigger, the total would be $252,000 — still over budget by $12,000. The project as currently scoped is likely to exceed budget."

The CoT version gets the Phase 2 calculation right ($112,000, where the direct answer likely slipped) and properly handles the expected value of Phase 3. The transparent reasoning also lets you spot-check each step.
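One benefit of explicit reasoning is that it can be verified mechanically. This sketch simply re-runs the expected-value arithmetic from the response above:

```python
# Re-running the budget calculation from the CoT response.
# Integer arithmetic throughout, so the results are exact.

phase1 = 80_000
phase2 = phase1 + phase1 * 40 // 100   # 40% more than Phase 1: 112,000
contingency_ev = 20_000 * 25 // 100    # 25% chance of $20,000: 5,000
phase3 = 60_000 + contingency_ev       # expected Phase 3 cost: 65,000

expected_total = phase1 + phase2 + phase3
budget = 240_000

print(expected_total)           # 257000
print(expected_total - budget)  # 17000 over budget
```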

Full Example 2: Strategic Analysis (CoT for Multi-Step Decisions)

Task: Decide whether a small business should accept a bulk order that would double their production for three months.

CoT Prompt:

"A bakery generates $15,000/month in revenue with 60% gross margin. A new client wants a 3-month bulk order worth $45,000 total, but it would require hiring one part-time employee ($3,000/month) and a one-time equipment rental of $4,000. After the contract ends, the equipment is returned. The owner worries this will distract from regular customers. Reason through whether this is financially beneficial, step by step."

CoT Response:

"Let me analyze this systematically.

Current monthly economics:
- Revenue: $15,000
- Gross profit (60% margin): $9,000

3-month baseline (without the contract):
- Total revenue: $45,000
- Total gross profit: $27,000

New contract financials:
- Contract revenue over 3 months: $45,000
- Additional labor cost: $3,000/month × 3 = $9,000
- Equipment rental: $4,000 (one-time)
- Total additional costs: $13,000

But what is the gross margin on the bulk order? The prompt doesn't specify whether the $45,000 contract is at the same 60% margin or different. I'll note this is an important unknown. Assuming the same 60% margin:
- Gross profit from contract: $45,000 × 60% = $27,000
- Net additional profit after contract-specific costs: $27,000 - $13,000 = $14,000

3-month financial picture with contract:
- Regular revenue: $45,000 (assuming no loss of regular customers)
- Contract revenue: $45,000
- Regular gross profit: $27,000
- Contract gross profit after costs: $14,000
- Total gross profit: $41,000 vs. baseline $27,000 = +$14,000 improvement

Key risks to quantify:
1. If regular customer revenue drops 10% due to distraction: -$2,700 in gross profit
2. If bulk order margin is 40% instead of 60%: gross profit = $18,000 - $13,000 = $5,000 (still positive but much smaller)

Conclusion: Financially beneficial under the stated assumptions — $14,000 incremental gross profit over 3 months. But the owner should clarify: (1) the actual margin on the bulk order, and (2) what capacity impact regular customer service will feel, as that's the main risk."

The CoT approach here not only gets the math right but surfaces the key unknown (margin on the bulk order) and the main risk in a way that a direct answer would not.
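As with the budget example, the model's arithmetic can be spot-checked in a few lines. This sketch recomputes the incremental profit under the two margin assumptions the reasoning flagged:

```python
# Incremental profit from the bulk order under different margin
# assumptions, following the CoT response above.

contract_revenue = 45_000
extra_labor = 3_000 * 3            # part-time hire for 3 months
equipment = 4_000                  # one-time rental
extra_costs = extra_labor + equipment

def incremental_profit(margin_pct: int) -> int:
    """Gross profit on the contract minus contract-specific costs."""
    gross = contract_revenue * margin_pct // 100
    return gross - extra_costs

print(incremental_profit(60))  # 14000  (the assumed 60% margin)
print(incremental_profit(40))  # 5000   (the downside case)
```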

Full Example 3: Technical Debugging (CoT for Systematic Diagnosis)

Task: Debug a failing function without running the code.

CoT Prompt:

"This Python function is supposed to return the average of all values in a nested dictionary, but it's returning 0 in some cases. Think through what could cause this, step by step.

def nested_average(data):
    total = 0
    count = 0
    for key, value in data.items():
        if isinstance(value, dict):
            for inner_key, inner_val in value.items():
                total += inner_val
                count += 1
        else:
            total += value
            count += 1
    return total / count"

CoT Response:

"Let me trace through potential failure modes step by step.

Step 1: When does it return 0? The function returns total / count. For this to return 0, either total = 0 (all values are 0), or there's a division producing 0 from integer division if total < count.

Wait — in Python 3, / is float division, so integer division isn't the issue. But in Python 2, total / count with integer operands would do integer division. Let me flag that as Possibility A.

Step 2: What if count = 0? If the dictionary is empty, count stays 0 and we get a ZeroDivisionError, not 0. So that's not the returning-0 case.

Step 3: What about deeply nested data? The function only handles one level of nesting. If a value is itself a dictionary with nested dictionaries, isinstance(value, dict) would be True, but then inner_val could itself be a dict. If you try to add a dict to total, you get a TypeError. But the question says it returns 0 in some cases, not that it crashes.

Step 4: Integer division (most likely cause)
The most likely cause is Python 2 integer division. If total and count are both integers and total < count (possible when many values are 0 or small), total / count returns 0. Example: total = 3, count = 5 → 3/5 = 0 in Python 2.

Step 5: Confirm and fix
Fixes:
- Python 2: change return total / count to return float(total) / count
- Python 3: the function should work unless the inputs contain non-numeric values
- Both: add input validation for empty dict and non-numeric values

Most likely root cause: Python 2 integer division, especially when the average is a fraction less than 1."
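For completeness, here is one possible repaired version along the lines the diagnosis suggests: explicit float division so Python 2 and 3 behave the same, recursion so arbitrarily deep nesting works, and a guard for the empty case. A sketch, not the only correct fix:

```python
# A repaired nested_average: handles any nesting depth, divides as
# floats in both Python 2 and 3, and fails loudly on empty input.

def nested_average(data):
    """Average all numeric leaf values in an arbitrarily nested dict."""
    def leaves(d):
        for value in d.values():
            if isinstance(value, dict):
                for leaf in leaves(value):   # recurse into nested dicts
                    yield leaf
            else:
                yield value

    values = list(leaves(data))
    if not values:
        raise ValueError("cannot average an empty dictionary")
    return float(sum(values)) / len(values)
```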

When to Use CoT (and When Not To)

Use CoT when:

  • The task requires multiple sequential reasoning steps
  • Accuracy is more important than response speed
  • You want to verify the reasoning, not just the conclusion
  • The problem involves math, logic, planning, or diagnosis
  • The model frequently makes errors on this task type without CoT

Do not use CoT when:

  • The task is factual retrieval (no reasoning required)
  • You need a short, direct output and reasoning would be noise
  • The task is highly creative and open-ended (CoT can over-constrain)
  • Latency or response length is a significant concern


3. Few-Shot Prompting

What It Is

Few-shot prompting provides the model with a small number of worked examples — input/output pairs — before presenting the actual task. Instead of describing what you want in abstract terms, you show what it looks like in practice.

The term comes from machine learning: zero-shot means no examples, one-shot means one example, few-shot means a small number (typically two to eight). In prompting, few-shot refers to including these examples directly in the prompt text.

Why It Works

Abstract instruction and concrete demonstration are different things. When you tell someone "write in a concise, direct style," you are describing an abstract quality. When you show them three examples of writing in that style, you are communicating the same thing more precisely and more completely.

AI models respond to this same dynamic. When you provide examples, the model can extract pattern information that is extremely difficult to specify verbally: the exact rhythm of your sentences, the specific vocabulary register you prefer, the way you handle transitions, the level of formality, what you always include and what you always omit.

Few-shot prompting is especially powerful for style imitation, classification, and structured extraction — any task where you have a clear standard that's easier to show than to describe.

How Many Examples? What Research Shows

The research on few-shot prompting yields some clear guidance:

  • Zero-shot vs. few-shot: Even one example improves performance on most tasks. The jump from zero to one example is typically larger than the jump from one to two.
  • Optimal range: For most tasks, two to six examples produces the best balance of performance and prompt length. Beyond eight examples, the marginal gain diminishes sharply and you start to use up context window space that could hold other useful information.
  • Quality over quantity: Three high-quality, representative examples outperform eight mediocre or mismatched examples. The examples themselves are the constraint.
  • Diminishing returns on complex models: Very large models (like current frontier models) require fewer examples than smaller models to reach the same performance, because they extract more pattern information from each example.

Practical rule of thumb: Start with three examples. If quality is still insufficient, try adding one or two more. If you are working with a very powerful model and a clear, well-chosen set of three examples, that is often enough.

Choosing Your Examples

The choice of examples is the most important decision in few-shot prompting. Poor example selection will produce poor results regardless of quantity.

Guidelines for selecting examples:

  1. Represent the range of your task. If your task involves different difficulty levels or different subtypes, include at least one example from each major subtype. Don't choose only your easiest examples.

  2. Demonstrate the qualities you most care about. If consistency of tone is your top priority, make sure all examples show that tone consistently. If structure is critical, make the structure explicit in every example.

  3. Match the actual input type you'll use. Examples with short inputs don't teach the model how to handle your long inputs. Make examples as similar to your real use cases as possible.

  4. Exclude examples that are atypical or edge cases unless edge cases are part of your task. Edge cases in few-shot examples can confuse the model about what "normal" looks like.

  5. Keep format consistent. If one example has a header for each section, all examples should have headers. Inconsistent format in examples produces inconsistent format in output.

Format: The Critical Consistency Rule

Whatever format your examples use, maintain it perfectly. This includes:

  • How you delimit the input from the output (a blank line, a label like "Output:", a separator like "---")
  • Whether there is a label at the start of each example ("Example 1:", "Input:", etc.)
  • Capitalization, punctuation, and spacing patterns
  • Length consistency (roughly similar output lengths across examples)

The model will imitate the format of your examples. If your examples vary in format, you get variable format in output.
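If you assemble few-shot prompts in code, the easiest way to guarantee consistency is to render every example from a single template, so labels and separators can never drift. A minimal sketch; the labels and field names here are illustrative:

```python
# Build a few-shot prompt where every example shares one exact format.
# The final block leaves the output label empty for the model to fill.

def build_few_shot_prompt(instruction, examples, new_input,
                          input_label="Input", output_label="Output",
                          separator="---"):
    """Assemble instruction + examples + query with uniform formatting."""
    blocks = [instruction]
    for ex in examples:
        blocks.append(f"{separator}\n"
                      f"{input_label}: {ex['input']}\n"
                      f"{output_label}: {ex['output']}")
    blocks.append(f"{separator}\n"
                  f"{input_label}: {new_input}\n"
                  f"{output_label}:")
    return "\n\n".join(blocks)
```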

Worked Example 1: Writing Style / Brand Voice

Context: Alex needs product descriptions in her company's brand voice — energetic, benefit-focused, under 50 words, present tense, no jargon.

Standard prompt:

"Write a product description for our new insulated water bottle."

Typical output: A generic product description in no particular style.

Few-Shot Prompt:

Write product descriptions in our brand voice: energetic, benefit-focused,
under 50 words, present tense, no jargon.

---
Product: Reusable tote bag
Description: Carry everything you love in a bag that loves the planet back.
This lightweight, machine-washable tote holds up to 40 lbs and folds into
your pocket. Perfect for farmers markets, grocery runs, or that spontaneous
beach trip you didn't plan for. One bag, every adventure.

---
Product: Bamboo phone stand
Description: Give your neck a break and your desk a glow-up. This sleek
bamboo stand holds any phone at the perfect viewing angle — streaming,
video calls, recipe browsing. Stable, sustainable, and under 4 inches wide.
Work smarter. Look better doing it.

---
Product: Stainless steel lunch container
Description: Hot at noon, cold at eight. This leak-proof container keeps
your food exactly how you made it, for up to 12 hours. Three sealed
compartments keep flavors separate. Dishwasher safe. Finally, a lunch
worth looking forward to.

---
Product: Insulated water bottle
Description:

Output:

"Stay hydrated on your terms. This 32 oz insulated bottle keeps drinks cold for 24 hours and hot for 12 — from morning commute to post-gym cool-down. Wide-mouth opening for easy filling and cleaning. One bottle, all day."

The few-shot approach reliably produces the right voice, length, structure, and energy level without describing any of those qualities abstractly.

Worked Example 2: Classification

Task: Classify customer support tickets by urgency (Urgent, Normal, Low) for routing.

Few-Shot Prompt:

Classify each customer support ticket as Urgent, Normal, or Low priority.

Urgent = account access issues, billing errors, service outages
Normal = product questions, change requests, minor bugs
Low = general feedback, feature requests, compliments

---
Ticket: "I've been charged twice for my subscription this month and need
this resolved immediately."
Classification: Urgent

---
Ticket: "Is there a way to change the email notifications to daily instead
of weekly?"
Classification: Normal

---
Ticket: "Just wanted to say the new dashboard design is really clean.
Great work!"
Classification: Low

---
Ticket: "I can't log in to my account — the password reset link isn't working."
Classification: Urgent

---
Ticket: "Would love to see dark mode added in a future update."
Classification: Low

---
Now classify these:

Ticket: "The payment page is throwing an error and I can't complete my purchase."
Classification:

Ticket: "How do I export my data to CSV?"
Classification:

Ticket: "I accidentally created two accounts. Can you merge them?"
Classification:

Output:

"Payment page error: Urgent
CSV export question: Normal
Duplicate accounts: Normal"

The few-shot examples do two things: they define the categories more precisely than the written criteria alone, and they establish the output format (a single-word classification without explanation) so the model doesn't pad its responses.

Worked Example 3: Structured Extraction

Task: Extract structured data from unstructured job postings.

Few-Shot Prompt:

Extract key information from job postings into this JSON structure:
{
  "title": "",
  "company": "",
  "location": "",
  "remote": true/false,
  "salary_range": "",
  "required_experience_years": 0,
  "key_skills": []
}

---
Job Posting 1:
"Senior Data Analyst at Meridian Health. Full-time, based in Chicago IL
(hybrid 3 days/week). $90,000–$115,000. We're looking for someone with 5+
years of analytics experience. Must know SQL, Python, and Tableau. Bonus
points for healthcare data experience."

Extraction:
{
  "title": "Senior Data Analyst",
  "company": "Meridian Health",
  "location": "Chicago, IL",
  "remote": false,
  "salary_range": "$90,000–$115,000",
  "required_experience_years": 5,
  "key_skills": ["SQL", "Python", "Tableau", "healthcare data (preferred)"]
}

---
Job Posting 2:
"Marketing Coordinator – Fully remote anywhere in the US. Startup
environment, $55K–$65K. 2 years experience in content marketing or social
media. Canva, Hootsuite, and HubSpot experience a plus."

Extraction:
{
  "title": "Marketing Coordinator",
  "company": "Not specified",
  "location": "Remote (US)",
  "remote": true,
  "salary_range": "$55,000–$65,000",
  "required_experience_years": 2,
  "key_skills": ["content marketing", "social media", "Canva (preferred)", "Hootsuite (preferred)", "HubSpot (preferred)"]
}

---
Now extract from this posting:
"Product Manager, Cloudify. New York or San Francisco, hybrid.
Compensation: $130K–$160K + equity. 4+ years in product management,
experience with B2B SaaS required. Agile/Scrum, Jira, strong
communication skills."

Extraction:

Output:

{
  "title": "Product Manager",
  "company": "Cloudify",
  "location": "New York or San Francisco",
  "remote": false,
  "salary_range": "$130,000–$160,000 + equity",
  "required_experience_years": 4,
  "key_skills": ["B2B SaaS", "Agile/Scrum", "Jira", "communication"]
}
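In a pipeline, it is worth checking the model's JSON before anything downstream consumes it. A minimal validation sketch, mirroring the schema from the prompt above; the function name is illustrative:

```python
# Parse extraction output and confirm each schema field is present
# with a sane type before passing it on.

import json

EXPECTED_TYPES = {
    "title": str, "company": str, "location": str, "remote": bool,
    "salary_range": str, "required_experience_years": int,
    "key_skills": list,
}

def validate_extraction(raw: str) -> dict:
    """Return the parsed dict, or raise if a field is missing/mistyped."""
    data = json.loads(raw)
    for field, expected in EXPECTED_TYPES.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise TypeError(f"{field} should be {expected.__name__}")
    return data
```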

4. Self-Critique and Self-Correction

The Core Idea

Self-critique prompting asks the model to evaluate its own output against explicit criteria, identify weaknesses, and produce an improved version. Instead of getting one output and hoping it's good, you get an output plus a quality assessment plus a revised output.

This technique works because the skills required to generate text and the skills required to evaluate text are partly separable. A model may produce a good first draft most of the time, but its internal quality checking during generation is compressed into the generation process itself. By explicitly separating generation from critique, you give the quality-assessment function room to work more carefully.

The Three-Step Pattern: Generate, Critique, Revise

The basic pattern is:

Step 1 (Generation): [Your original prompt]

Step 2 (Critique): Now review your response above. Evaluate it against these criteria: [criteria].
Identify 2–3 specific weaknesses or areas for improvement.

Step 3 (Revision): Rewrite the response addressing the weaknesses you identified.

You can run this in a single prompt or across multiple exchanges. Single-prompt self-critique works because modern language models can generate text, evaluate it, and revise within one context window.
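The loop is straightforward to automate if you call a model from code. A sketch, where `complete` is a hypothetical stand-in for whatever text-generation call you use, not a real API:

```python
# Generate -> critique -> revise as three calls. `complete` is any
# function that takes a prompt string and returns a response string.

def critique_and_revise(complete, task_prompt, criteria):
    """Return (draft, critique, revised) for a task and quality criteria."""
    draft = complete(task_prompt)

    critique = complete(
        f"Here is a response to a task:\n\n{draft}\n\n"
        f"Evaluate it against these criteria: {criteria}. "
        "Identify 2-3 specific weaknesses or areas for improvement."
    )

    revised = complete(
        f"Original response:\n\n{draft}\n\n"
        f"Critique:\n\n{critique}\n\n"
        "Rewrite the response addressing the weaknesses identified."
    )
    return draft, critique, revised
```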

Constitutional-Style Self-Correction

A more structured variant asks the model to evaluate against a specific set of principles or standards — a "constitution" — rather than open-ended quality criteria. This is useful when you have clear, articulated standards.

Generate [output].

Then evaluate your output against these standards:
1. [Standard 1]
2. [Standard 2]
3. [Standard 3]

For each standard, note whether your output meets it. If any standard is not met,
revise the output to meet all standards.

Full Worked Example: Self-Critique for a Business Email

Original Prompt and First Output:

Draft an email to our largest client explaining that their project deadline
will be delayed by two weeks due to a key team member's unexpected departure.
Keep it professional and solution-focused.

Typical First Output:

"Dear [Client],

I wanted to reach out regarding the [Project Name] timeline. Unfortunately, we have experienced an unexpected staffing change that has impacted our capacity. As a result, we need to extend the project deadline by approximately two weeks, from [Original Date] to [New Date].

We understand this is not the news you were hoping for, and we sincerely apologize for any inconvenience this may cause. We remain fully committed to delivering high-quality work and will do everything in our power to minimize the impact.

Please let me know if you'd like to schedule a call to discuss.

Best regards, [Name]"

Self-Critique Prompt:

Review the email you just wrote. Evaluate it against these criteria:
1. Specificity: Does it give the client the information they actually need (exact new deadline, specific impact on deliverables)?
2. Solution orientation: Does it offer concrete next steps or mitigation, or just apologize?
3. Tone: Does it sound confident and in-control, or defensive and apologetic?
4. Client perspective: Does it acknowledge the specific inconvenience to this client, or speak in generalities?

Identify 2-3 specific improvements needed.

Self-Critique Response:

"Reviewing against the criteria:

  1. Specificity: Weak. The email uses placeholder text [Original Date] and [New Date] rather than specific dates. It says 'key team member's unexpected departure' but doesn't tell the client what deliverable is affected.

  2. Solution orientation: Insufficient. 'We will do everything in our power to minimize the impact' is a vague reassurance, not a plan. There's no mention of what specific actions are being taken — temporary resource allocation, reprioritization, or anything concrete.

  3. Tone: Defensive-leaning. 'We understand this is not the news you were hoping for' and 'we sincerely apologize for any inconvenience' together position us as in the wrong rather than as professionals handling an unexpected situation with competence.

Main improvements needed:
- Add specific dates and affected deliverables
- Replace the vague commitment with 2–3 concrete mitigation steps
- Shift the tone from apologetic to confident and action-oriented"

Revision Prompt:

"Now rewrite the email addressing those specific improvements."

Revised Output:

"Dear Sarah,

I'm writing to update you on the Horizon Dashboard project timeline. A key team member transitioned out of our company last week, and we've been assessing the impact on your deliverables carefully before reaching out.

Here's where we stand: the main dashboard build is on track; what will be delayed is the custom reporting module. The new delivery date for the full package is March 28 (shifted from March 14).

We've taken three steps to minimize the impact for your team:
1. Reassigned the reporting module to Marcus Chen, our senior developer, who is ramping up now
2. Delivered the dashboard wireframes 3 days early to give your team review time during the gap
3. Scheduled our weekly sync for this Thursday at 2 PM to walk through the revised plan in detail

Please confirm Thursday's call works for your team, or suggest an alternative time. I'm available to discuss any concerns directly.

Best, David"

The difference between the first and revised versions demonstrates exactly why self-critique is worth the extra step. The critique identified real, specific problems; the revision addressed them directly.

When Self-Critique Is Most Valuable

  • High-stakes written communications (emails to important clients, executive reports)
  • Content that needs to meet multiple specific criteria simultaneously
  • First drafts you would normally spend significant time editing manually
  • Situations where you are unsure whether you've forgotten to address a requirement

Limitations of Self-Critique

Self-critique is not infallible. Models can critique incorrectly (identifying false problems while missing real ones), and they can revise in ways that fix the stated problem but introduce new ones. Self-critique catches many errors but is not a substitute for human review on high-stakes outputs.


5. Structured Decomposition

The Core Problem: Scope Overload

When a task is large or complex, asking AI to address all of it in a single prompt often produces a superficial output — broad coverage at the expense of depth and accuracy. The model tries to do everything and does none of it well.

Structured decomposition solves this by breaking the task into defined subtasks, addressing each one with focused attention, and assembling the results.

Two Approaches

Single-Prompt Decomposition: You list the subtasks explicitly in your prompt and ask the model to address each one in sequence. Best for tasks that can be addressed in one long context.

I need to [overall goal]. Let's break this into steps:

Step 1: [Subtask 1]
Step 2: [Subtask 2]
Step 3: [Subtask 3 — which builds on Step 2]

Please work through each step in sequence.

Multi-Exchange Decomposition: You address each subtask in a separate exchange, using the output from one step as input to the next. Best for very large tasks or tasks where your review and decisions between steps matter.

"Plan Then Execute"

A particularly powerful variant of structured decomposition is "plan then execute":

  1. In the first prompt, ask the model to create a plan for addressing the task (outline, steps, structure) and do nothing else
  2. Review and revise the plan
  3. In subsequent prompts, execute each section of the approved plan

This approach prevents the model from making large structural decisions you disagree with and then having to backtrack. You approve the skeleton before the flesh goes on.
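Driven programmatically, plan-then-execute is a short loop with a human checkpoint in the middle. A sketch, where `complete` is a hypothetical generation call and `review_plan` stands in for the step where you approve or edit the outline:

```python
# Plan-then-execute across multiple exchanges: get an outline, let a
# human approve it, then generate each section against the approved plan.

def plan_then_execute(complete, task, review_plan=lambda plan: plan):
    """Return (approved_plan, list_of_section_texts)."""
    plan = complete(
        f"{task}\n\nBefore writing any content, create a detailed outline: "
        "sections, what each covers, and what data each should include. "
        "Return the outline as a numbered list, one section per line."
    )
    approved = review_plan(plan)   # the human checkpoint

    sections = []
    for line in approved.splitlines():
        if line.strip():
            sections.append(complete(
                f"Following the approved outline:\n{approved}\n\n"
                f"Now write this section in full: {line.strip()}"
            ))
    return approved, sections
```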

Full Worked Example: Strategic Analysis Document

Task: Write a 10-page competitive analysis of the enterprise CRM market for a startup's board presentation.

Without Decomposition (single prompt): The model produces a 10-page document that covers everything shallowly. Market size estimates are round numbers. Competitor analyses are generic. Strategic recommendations are safe and obvious. The document looks professional but a knowledgeable reader will find it thin.

With Plan Then Execute:

Prompt 1 (Plan):

"I need to write a 10-page competitive analysis of the enterprise CRM market for a board presentation. Our company is a 30-person startup with a product that focuses on AI-assisted data quality for CRM entries. Before writing any content, create a detailed outline: sections, what each section will cover, and what specific data points or analyses each section should include."

Output (Plan):

Detailed outline with 7 sections, specific competitor names to analyze, specific metrics to include, strategic framework to use, etc.

Prompt 2 (Execute Section 1):

"Great. Now write Section 1: Market Overview. [Paste section 1 spec from outline]. Include current market size, growth rate, and the 3 key trends driving enterprise CRM adoption. Be specific and cite ranges where exact figures vary."

And so on for each section.

The decomposed approach produces a substantially more detailed, accurate, and useful document — and gives you control over each component before moving to the next.


6. Tree-of-Thought: A Brief Overview

Tree-of-Thought (ToT) prompting is an extension of chain-of-thought that explores multiple reasoning paths simultaneously rather than committing to a single chain.

Instead of "think step by step," ToT says: "Generate three different approaches to this problem. For each approach, reason through the first two steps. Evaluate which approach looks most promising, then continue with that one."

This is particularly valuable for:

- Open-ended problems where there are genuinely multiple viable approaches
- Problems where it is easy to commit to a wrong path early
- Creative tasks where comparing several candidate directions beats committing to one chosen arbitrarily

In practice, for most business tasks, chain-of-thought produces excellent results without the additional complexity of ToT. Consider ToT when your problem is genuinely multi-path and the early choice of path matters significantly.

Basic ToT Prompt Structure:

"Consider three different approaches to [problem]. For each approach, describe the first two steps of the solution. Then evaluate which approach is most likely to succeed, and why. Proceed with the most promising approach."
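If you reuse this structure often, it is easy to parameterize. A small sketch; the `tree_of_thought_prompt` name and its parameters are invented for illustration:

```python
def tree_of_thought_prompt(problem, n_approaches=3, depth=2):
    # Branch into several candidate approaches, evaluate, then commit to one.
    return (
        f"Consider {n_approaches} different approaches to: {problem}\n"
        f"For each approach, describe the first {depth} steps of the solution.\n"
        "Then evaluate which approach is most likely to succeed, and why.\n"
        "Proceed with the most promising approach."
    )

tot = tree_of_thought_prompt("reducing churn in our SMB segment")
```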


7. Combining Techniques

The real power of these techniques comes from combining them strategically. Here are the most productive combinations:

Few-Shot + CoT

Provide examples that include the reasoning steps, not just the answers. This teaches the model both the desired output format and the desired reasoning process.

Example:
Input: [Problem]
Reasoning: [Step 1... Step 2... Step 3...]
Answer: [Answer]

Now solve:
Input: [New Problem]
Reasoning:
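The pattern above can be templated from a list of worked examples. A minimal sketch, assuming examples are stored as (input, reasoning, answer) triples; the helper name is hypothetical:

```python
def few_shot_cot_prompt(examples, new_input):
    # examples: list of (input, reasoning, answer) triples.
    blocks = [
        f"Input: {inp}\nReasoning: {why}\nAnswer: {ans}"
        for inp, why, ans in examples
    ]
    # End with the new problem and a trailing "Reasoning:" cue so the
    # model continues the reasoning pattern rather than answering directly.
    blocks.append(f"Now solve:\nInput: {new_input}\nReasoning:")
    return "\n\n".join(blocks)

demo = few_shot_cot_prompt(
    [("Is 91 prime?",
      "91 = 7 x 13, so it has divisors other than 1 and itself.",
      "No")],
    "Is 97 prime?",
)
```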

Role + Self-Critique

Assign a specific role that naturally includes critical judgment, then use self-critique.

You are a senior editor with 20 years of experience at business publications.

First, write [content].

Then, review your draft as an editor who is known for being direct and
demanding. Identify the three weakest parts of what you just wrote and
explain why they're weak. Then revise those sections.

Decomposition + CoT

Use structured decomposition to break the problem into steps, and CoT within each step.

We're going to solve this in three steps. For each step, show your reasoning.

Step 1: [Task 1]. Think through this carefully.
Step 2: [Task 2] — building on Step 1. Think through this carefully.
Step 3: [Task 3] — using both previous steps. Reason through to a conclusion.
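A hypothetical helper that emits this combined structure for any ordered list of subtasks:

```python
def decomposition_cot_prompt(steps):
    # Decompose into numbered steps, each with an explicit reasoning cue.
    lines = [f"We're going to solve this in {len(steps)} steps. "
             "For each step, show your reasoning.", ""]
    for i, step in enumerate(steps, start=1):
        cue = ("Think through this carefully." if i == 1
               else f"Building on Step {i - 1}, think through this carefully.")
        lines.append(f"Step {i}: {step} {cue}")
    return "\n".join(lines)
```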

The "Master Prompt" Example

Here is a complete example that combines role + context + few-shot + CoT + self-critique for a high-stakes business analysis task:

You are a strategy consultant with deep expertise in SaaS pricing strategy.

CONTEXT: We are a project management SaaS with 500 customers on a $49/month
flat-rate plan. We want to migrate to a per-seat pricing model. Average
company size among our customers is 12 seats, but 20% of customers have 30+ seats.

Here is how you structure pricing migration analyses:

EXAMPLE ANALYSIS STRUCTURE:
[Short example showing the format and reasoning depth expected]

TASK: Analyze whether we should migrate to per-seat pricing.
Think through: (1) revenue impact by customer segment, (2) churn risk by
segment, (3) total expected revenue change, (4) recommended migration approach.

Reason through each element step by step before stating conclusions.

After completing the analysis, identify the two assumptions in your reasoning
that are most uncertain and explain what additional data would resolve them.

8. Scenario: Alex — Few-Shot for Brand-Consistent Copy

Alex manages marketing at Brightleaf Consumer Goods, a lifestyle brand known for its warm, conversational voice. Every week she needs to create product copy for social media, email subject lines, and product page descriptions. The brand voice is distinctive — casual but not flippant, personal but not unprofessional, optimistic without being saccharine.

For the first three months of using AI for copywriting, Alex described this voice in her prompts: "conversational, warm, authentic, lifestyle-oriented." The results were generic. The AI produced copy that could have come from any consumer brand.

She shifted to few-shot prompting. She collected her five best examples of existing brand copy — pieces that were consistently praised internally and performed well in testing — and built them into a reference section at the top of every copy prompt.

The change was immediate. By showing the AI exactly what the voice looked like in practice, she eliminated the interpretation gap between "conversational and warm" and what that actually meant for Brightleaf specifically.

Her current template:

Write [copy type] for [product] in Brightleaf's brand voice.

BRIGHTLEAF VOICE REFERENCE:
[5 examples of established brand copy]

---
NEW COPY REQUEST:
Product: [Product Name]
Key benefit: [1-2 sentence brief]
Audience: [Target persona]
Format: [Specific format]

Results: The AI-generated copy now consistently passes the brand review stage without major revisions, compared to the 60-70% revision rate she had before building the few-shot reference.

The lesson Alex learned: "Describing a voice never works as well as showing it. Three good examples are worth a page of description."


9. Scenario: Raj — CoT for Debugging Complex Logic

Raj is a senior engineer at a fintech company. He uses AI regularly for code review and debugging, but early on he found that AI-generated debugging advice was often superficial — identifying symptoms rather than causes, suggesting generic solutions that didn't address the actual problem.

The turning point came when he started using explicit chain-of-thought prompting for debugging sessions. His approach:

Here is a bug I'm investigating. Before suggesting any fix, I want you to:

1. Identify what the code is SUPPOSED to do (infer this from the code itself)
2. Identify what it is ACTUALLY doing (what's wrong, as specifically as possible)
3. List every place in the code where a failure could cause this symptom
4. For each potential failure point, explain why it could or could not cause the specific symptom observed
5. Rank the most likely causes
6. THEN suggest a fix

Here is the code and the error:
[Code]
[Error output]

This approach transformed his debugging sessions. By forcing explicit reasoning through the code's behavior before jumping to solutions, the AI consistently identified the actual root cause rather than the surface symptom.

In one memorable case, the bug was a race condition in an asynchronous payment processing function that only manifested under specific load conditions. The initial prompt (without CoT) suggested checking the database connection — a reasonable guess but wrong. The CoT prompt traced through the execution path, identified that two coroutines were both reading and writing the same resource without locking, and pinpointed the exact condition under which the race occurred.

Total debugging time with the CoT approach: 8 minutes. Previous attempts using standard prompts and documentation searches had consumed 2.5 hours.


10. Scenario: Elena — Self-Critique for Quality-Checking Consulting Outputs

Elena is a strategy consultant who uses AI heavily for first-draft deliverables. Her concern is not whether AI can produce professional-sounding content — it can — but whether the content is actually right. Consulting deliverables that are confidently wrong are professionally dangerous.

Her solution is a systematic self-critique protocol she runs on every AI-generated section of a deliverable:

Step 1 (Generate): Full prompt with context, role, task, format requirements.

Step 2 (Factual Audit): "Review your response above. Flag any statement that is a factual claim and rate each claim as: (A) demonstrably correct, (B) plausible but unverified, or (C) potentially incorrect. List all Category B and C claims."

Step 3 (Logic Audit): "Now review the reasoning structure. Identify any logical leaps — places where a conclusion is drawn without sufficient support. List them."

Step 4 (Client-Fit Audit): "This deliverable is for [client type/industry]. Is everything in the response appropriate for that context? Are there any recommendations that wouldn't apply to their situation as described?"

Step 5 (Revise): "Rewrite the response, removing unverified claims, flagging where I'll need to verify data before finalizing, and tightening the logical structure."
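Because the protocol is a fixed sequence of follow-up prompts, it can be run as a simple loop. In this sketch `call_model` is a stand-in for a real API call that carries conversation history:

```python
def call_model(prompt, history=()):
    # Placeholder: replace with a real LLM call that includes prior exchanges.
    return f"[response {len(history) + 1} to: {prompt[:30]}...]"

AUDIT_PROMPTS = [
    # Step 2: factual audit
    "Review your response above. Rate each factual claim as (A) demonstrably "
    "correct, (B) plausible but unverified, or (C) potentially incorrect. "
    "List all B and C claims.",
    # Step 3: logic audit
    "Now review the reasoning structure. Identify any logical leaps and list them.",
    # Step 4: client-fit audit
    "Is everything appropriate for this client's context? List anything that "
    "wouldn't apply to their situation as described.",
    # Step 5: revise
    "Rewrite the response, removing unverified claims and flagging what needs "
    "human verification before finalizing.",
]

def critique_pipeline(generation_prompt):
    # Step 1: generate, then run each audit as a follow-up in the same thread.
    history = [call_model(generation_prompt)]
    for audit in AUDIT_PROMPTS:
        history.append(call_model(audit, history))
    return history  # history[-1] is the revised deliverable
```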

This five-step protocol adds 10–15 minutes to her AI workflow but has dramatically reduced the incidence of errors in her final deliverables. More importantly, it creates a natural pipeline for her human review: the model identifies what needs human verification, so she knows exactly where to spend her checking time.

"I used to read everything the AI produced with the same uniform skepticism," she says. "Now the AI does the first pass of telling me where to be skeptical, and I use my expertise for the judgment calls, not the line-by-line reading."


11. Research Breakdown: Foundational Findings

Chain-of-Thought Prompting

The foundational paper — "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (Wei et al., 2022) — demonstrated that providing step-by-step reasoning examples in few-shot prompts dramatically improves performance on reasoning tasks, with larger models benefiting more. The paper showed 3× improvement on math word problems.

"Large Language Models are Zero-Shot Reasoners" (Kojima et al., 2022) showed that simply appending "Let's think step by step" achieves substantial gains without any examples — the zero-shot CoT finding.

Few-Shot Learning

"Language Models are Few-Shot Learners" (Brown et al., 2020) — the GPT-3 paper — established that very large language models can perform new tasks from just a few examples in the context window, without any fine-tuning. This formalized the few-shot prompting paradigm.

Self-Critique

"Constitutional AI" (Bai et al., 2022, Anthropic) introduced the idea of using explicit principles for AI self-evaluation and correction. While this paper primarily addresses training methodology, its principles apply directly to prompting-time self-critique techniques.

Practical Synthesis

Research across these papers supports the following practical conclusions:

- CoT has near-universal benefit for tasks requiring more than one inferential step
- Few-shot examples improve consistency and quality more reliably than an equivalent word count spent on description
- Self-critique at inference time catches a meaningful percentage of errors that generation alone misses
- Combining techniques (especially CoT + few-shot) produces better results than either technique alone


12. Technique Selection Guide

| Situation | Best Technique | Why |
| --- | --- | --- |
| Multi-step math or logic problem | Chain-of-Thought | Forces explicit reasoning |
| Matching a specific style or format | Few-Shot | Shows rather than describes |
| Classifying items consistently | Few-Shot | Examples define category boundaries |
| High-stakes output needing QA | Self-Critique | Catches errors before human review |
| Large or complex project | Structured Decomposition | Prevents scope overload |
| Multiple viable approaches | Tree-of-Thought | Explores paths before committing |
| Style + accuracy required | Few-Shot + CoT | Addresses both dimensions |
| Diagnosis or debugging | CoT + Decomposition | Systematic reasoning through causes |
| Iterative improvement needed | Self-Critique loop | Multiple passes on the same output |
| Recurring task | Build a few-shot template | Reuse examples, reduce setup time |

13. Content Blocks

💡 Intuition: Why CoT Works

When a model generates text, each word is influenced by all preceding words. When you ask for the answer directly, the path from prompt to answer is short — and the model can "cut corners" by pattern-matching rather than reasoning. CoT extends the path. By generating intermediate reasoning steps, the model is forced to commit to intermediate conclusions, which constrain and improve the final conclusion. It's the difference between calculating in your head and showing your work on paper.


⚠️ Common Pitfall: The Fake Reasoning Trace

CoT prompting can produce outputs that look like reasoning but aren't. The model writes "Step 1... Step 2..." while still pattern-matching to plausible responses at each step rather than actually reasoning. How to spot this: the steps are vague, don't build on each other specifically, or skip over exactly the step where the hard reasoning would happen. The fix: require explicit quantitative results at each step ("calculate X, then use that result in the next step") or require the model to explain why it's choosing each step.


✅ Best Practice: The Three-Example Rule

Before spending time building a large few-shot library, test with three examples. In most cases, three well-chosen, high-quality examples produce 80-90% of the benefit you'd get from eight. Start with three. Add more only if quality is still insufficient. Choose examples that represent the range of your task, not just the easy cases.


⚠️ Common Pitfall: Self-Critique as Sycophancy

Without explicit criteria, models sometimes produce self-critiques that are too positive — identifying trivial issues while missing real ones, because the model's default is to be helpful and validating rather than genuinely critical. Fix this by specifying the criteria explicitly ("evaluate against these 4 specific standards") and by asking the model to find weaknesses even if the output seems good ("assume there is at least one significant weakness — find it").


📋 Action Checklist: Implementing These Techniques

- [ ] Identify your 3 most complex recurring tasks — candidates for CoT
- [ ] Collect 3-5 examples of your best existing work for each style-dependent task — your few-shot library
- [ ] Create a self-critique criteria list for your highest-stakes output type
- [ ] Choose one large project task and practice plan-then-execute decomposition
- [ ] Try combining role + CoT on your next analytical task
- [ ] Document what works — build your technique-to-task matching intuition


🗣️ Script/Template: Chain-of-Thought

```
[Your task description]

Before giving your answer, think through this step by step:
1. [What needs to be established first]
2. [What follows from that]
3. [How those elements connect to the conclusion]

Show your reasoning for each step, then state your final answer.
```


🗣️ Script/Template: Few-Shot

```
[Task description and any relevant criteria]

[Input label]: [Example 1 input]
[Output label]: [Example 1 output]

[Input label]: [Example 2 input]
[Output label]: [Example 2 output]

[Input label]: [Example 3 input]
[Output label]: [Example 3 output]

[Input label]: [Your actual input]
[Output label]:
```


🗣️ Script/Template: Self-Critique Loop

```
[Original task prompt]

Now review your response against these criteria:
1. [Criterion 1]
2. [Criterion 2]
3. [Criterion 3]

For each criterion, note whether your response meets it. Identify the 2-3 most significant improvements needed.

Then rewrite the response to address those improvements.
```


📊 Research Breakdown: Key Numbers

- CoT prompting improves math word problem accuracy from ~18% to ~57% on benchmark models (Wei et al., 2022)
- Zero-shot CoT ("Let's think step by step") produces roughly 60% of the benefit of full few-shot CoT examples, with zero additional examples required
- 2-6 examples is the optimal few-shot range for most tasks; performance plateaus beyond 8 examples for large models
- Self-critique at inference time catches approximately 20-40% of errors that first-draft generation misses (varies significantly by task type and critique criteria specificity)


Summary

Advanced prompting techniques are not exotic add-ons to basic prompting — they are principled responses to specific failure modes of direct instruction. Chain-of-thought addresses reasoning failures. Few-shot addresses style and format failures. Self-critique addresses quality-control failures. Structured decomposition addresses scope overload.

Each technique works because it redirects the model's generation process in a specific way: making reasoning explicit, showing rather than describing, adding a quality-checking pass, or breaking complexity into manageable pieces.

The skill is knowing which technique to apply when — and how to combine them for tasks that need more than one kind of support. The technique selection guide in this chapter gives you a starting framework; your own experience with your specific task types will refine it over time.

Chapter 11 extends this skill set in a specific direction: building these techniques into reusable patterns that you can apply systematically, rather than having to reconstruct them from scratch each time.