Chapter 18 Exercises: Image Generation — Midjourney, DALL·E, and Stable Diffusion
These exercises build practical skill with AI image generation through structured experimentation. Most exercises can be completed with any of the three major platforms, though some are platform-specific. Exercises marked with a platform indicator are designed for that specific tool but can often be adapted.
Exercise 1: The Prompt Progression Test
Goal: Experience how prompt specificity affects output quality.
Generate the same scene using three levels of prompt specificity. Do this on whichever platform you have access to:
Level 1 (vague): a person working at a computer
Level 2 (moderate): a software developer working at a standing desk, modern home office, evening
Level 3 (specific): a female software developer in her 30s at a height-adjustable standing desk, modern home office with bookshelves, large monitor showing code, warm desk lamp light, early evening, candid documentary style, shallow depth of field --ar 16:9
Generate each prompt. Compare the results. What specific information in Level 3 made the most difference in output quality? Which elements do you think contributed least?
Exercise 2: Style Reference Exploration [Midjourney]
Goal: Build vocabulary for describing visual styles.
Choose five visually distinct photographic or artistic styles. For each, write a Midjourney prompt that uses specific style vocabulary rather than just naming the style:
Examples of styles to describe without naming them directly:
- Film noir photography (without using "film noir")
- Impressionist painting (without using "Impressionist")
- 1970s editorial magazine photography
- Brutalist architectural photography
- Japanese woodblock print aesthetic
For each attempt, run the prompt. If the result does not match your intended style, use /describe on a reference image to find better vocabulary. Iterate until you can reliably invoke the style through description.
Exercise 3: The /describe Reverse Engineering Exercise [Midjourney]
Goal: Learn to extract prompt vocabulary from reference images using /describe.
Find three images online that represent visual styles you like (landscapes, portraits, architectural, abstract — your choice). Upload each to Midjourney using /describe.
For each:
1. Read all four prompt suggestions Midjourney generates
2. Note which vocabulary terms appear repeatedly across suggestions (these are the most important signals)
3. Try generating with the most descriptive of the four suggested prompts
4. Compare the result to the original image
What vocabulary terms did you not know before? How accurately does Midjourney's /describe capture the original image's style?
Exercise 4: Aspect Ratio Impact Study
Goal: Understand how aspect ratio affects composition.
Take one prompt and generate it across four different aspect ratios:
- --ar 1:1 (square)
- --ar 16:9 (landscape)
- --ar 2:3 (portrait)
- --ar 9:16 (vertical, mobile)
Use this prompt or a similar one: mountain landscape at sunrise, golden morning light, dramatic clouds, landscape photography
Analyze: How does the composition change across ratios? Which ratio best suits the subject? What different use cases would each ratio serve?
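If you adapt this exercise to Stable Diffusion, where you set pixel width and height directly instead of using an --ar flag, a small helper can translate an aspect ratio into concrete dimensions. This is an illustrative sketch: the roughly one-megapixel budget and the rounding to multiples of 64 are assumptions that suit SDXL-class checkpoints, not requirements of the exercise.

```python
import math

def dims_for_ratio(w_ratio: int, h_ratio: int,
                   target_pixels: int = 1_048_576, multiple: int = 64):
    """Approximate width/height for an aspect ratio at a fixed pixel
    budget, rounded to a multiple (many Stable Diffusion checkpoints
    expect dimensions that are multiples of 64)."""
    unit = math.sqrt(target_pixels / (w_ratio * h_ratio))
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(w_ratio * unit), snap(h_ratio * unit)

# The four ratios from this exercise:
for ratio in [(1, 1), (16, 9), (2, 3), (9, 16)]:
    w, h = dims_for_ratio(*ratio)
    print(f"{ratio[0]}:{ratio[1]} -> {w}x{h}")
```

For example, 16:9 at this budget works out to 1344x768, while 1:1 is the familiar 1024x1024.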
Exercise 5: Chaos Parameter Exploration [Midjourney]
Goal: Understand and control variation in image generation.
Use the same prompt with three different chaos values:
abstract concept of growth and transformation, nature metaphor, professional editorial style --ar 16:9
Add --chaos 0, then --chaos 50, then --chaos 90. Midjourney accepts chaos values from 0 to 100; keep the values widely spaced so the contrast between levels is easy to see.
For each chaos level:
- How similar are the four generated images to each other?
- How well do all four serve the intended use case?
When would you use high chaos? Low chaos? Develop a personal rule for when to use which.
Exercise 6: Natural Language vs. Technical Prompt Comparison [DALL·E 3]
Goal: Understand DALL·E 3's natural language strength compared to technical prompting.
In ChatGPT, generate the same concept twice:
First: Use a direct, natural language description as you would explain it to a colleague: "I need an image for a presentation about how AI is changing healthcare — something that shows technology and human care working together, not cold or dystopian."
Second: Ask ChatGPT first: "Before generating an image, write me a technical, detailed image generation prompt for [same concept]. Show me the prompt, then generate from it."
Compare both the prompts ChatGPT produced (visible in the second approach) and the resulting images. What does seeing ChatGPT's prompt translation teach you about how it interprets natural language requests?
Exercise 7: Iterative Refinement Chain [DALL·E 3]
Goal: Practice the conversational refinement workflow in ChatGPT.
Start with a simple image request and go through at least six iterations of refinement, treating each as a conversation:
Starting request: "Generate an image for a book cover about productivity and focus"
After each generation, refine based on what is wrong or could be better:
- "The colors are too dark — make it lighter and more energetic"
- "Remove the person and make it more abstract"
- "Make it feel more like a premium business book, less self-help"
- And so on, based on what you actually see
Document each iteration's prompt and result. After six rounds, evaluate: how close did you get to an image you would actually use? What did you learn about what you actually wanted?
Exercise 8: Hands and Common Failures Audit
Goal: Understand and learn to mitigate AI image failure modes.
Generate 10 images of people using this prompt (or similar): diverse group of people collaborating at a modern office meeting table, candid professional photography --ar 16:9
For each image, check:
- Hand and finger anatomy (correct finger count? natural positions?)
- Background coherence (does the background dissolve or become incoherent?)
- Face consistency (do all faces look natural?)
- Lighting logic (is the lighting direction consistent across the image?)
- Text in the frame (any text visible? Is it legible?)
Count the frequency of each failure type. Now try with --no hands (Midjourney) or negative prompts focused on deformed hands. Does it reduce the hand failure rate?
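If you repeat this audit across platforms or prompt variants, tallying failures by hand gets tedious. Here is a minimal tally sketch; the observations list is made-up illustration, so replace it with what you actually see in your ten images.

```python
from collections import Counter

FAILURE_TYPES = ["hands", "background", "faces", "lighting", "text"]

# One entry per generated image: the set of failure types observed in it.
# These example observations are invented for illustration only.
observations = [
    {"hands"}, {"hands", "text"}, set(), {"faces"}, {"hands"},
    set(), {"background", "hands"}, set(), {"text"}, {"hands"},
]

# Count how many images exhibit each failure type.
counts = Counter(f for obs in observations for f in obs)
for failure in FAILURE_TYPES:
    rate = counts[failure] / len(observations)
    print(f"{failure:<12} {counts[failure]:>2}/{len(observations)}  ({rate:.0%})")
```

Running the same tally before and after adding negative prompts gives you a concrete before/after failure rate rather than an impression.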
Exercise 9: Professional Use Case Simulation — Mood Board [Midjourney]
Goal: Complete a realistic professional image generation task end to end.
You are a marketer pitching a new coffee brand to a client. The brief: "artisan quality, sustainable sourcing, modern but warm aesthetic, 25-35 demographic, premium but not pretentious."
Generate a mood board of 8-10 images that capture the visual direction. You can generate more and curate down. Include at least:
- 2-3 product/lifestyle images
- 2-3 environmental/atmospheric images
- 1-2 people/moment images
- 1-2 abstract/texture/color images
Write a brief (3-5 sentences) explaining your visual choices to the client. How accurately do the images capture the brief?
Exercise 10: Custom Presentation Graphic [DALL·E 3]
Goal: Create a genuinely usable presentation graphic for a real business concept.
Choose a concept you would actually need to illustrate in a presentation (if no current project, use: "the balance between speed and quality in creative work").
Create three different approaches to illustrating the concept:
1. A photographic/realistic approach
2. A conceptual illustration approach
3. An abstract metaphorical approach
Use ChatGPT's iterative refinement to develop each approach over 2-3 rounds. Evaluate which approach you would actually use in a professional presentation and why.
Exercise 11: Stable Diffusion First Run [Stable Diffusion]
Goal: Successfully run your first Stable Diffusion generation on a cloud service.
If you do not have a local Stable Diffusion setup, use one of these cloud options:
- DreamStudio (stability.ai) — free credits on signup
- Replicate.com — pay per generation, no setup required
- Civitai.com/generate — free tier available
Generate the same prompt you used in Exercise 1 (your Level 3 prompt). Experiment with:
- Changing the CFG Scale (classifier-free guidance) — how does it affect adherence to your prompt?
- Changing the number of steps — how does quality change with 20 vs. 30 vs. 50 steps?
- Trying a different checkpoint/model if available
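To build intuition for what the CFG Scale does: classifier-free guidance runs the model twice per denoising step, once without the prompt and once with it, then pushes the prediction along the prompt-conditioned direction by the scale factor. The toy sketch below uses scalar stand-ins for the real noise-prediction tensors; the formula is the standard one, but the numbers are arbitrary.

```python
def cfg_combine(uncond_pred: float, cond_pred: float,
                guidance_scale: float) -> float:
    """Classifier-free guidance: amplify the prompt-conditioned
    direction by guidance_scale. A scale of 1.0 recovers the
    conditioned prediction exactly; larger scales follow the prompt
    more strongly, often at the cost of variety and naturalness."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# Scalar stand-ins: 0.2 is the unconditional prediction, 0.5 the
# prompt-conditioned one. A common default scale is around 7.5.
print(cfg_combine(0.2, 0.5, 1.0))
print(cfg_combine(0.2, 0.5, 7.5))
```

This is why very high CFG values can look oversaturated or "overcooked": the prediction is pushed far past what the model would produce from the prompt alone.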
What differences do you notice compared to Midjourney or DALL·E 3 for the same prompt?
Exercise 12: Negative Prompt Mastery
Goal: Learn to use negative prompts effectively.
Generate this prompt without any negative prompts: elegant dinner party scene, sophisticated adults socializing, modern upscale restaurant, candid photography
Identify the issues in the output. Then develop negative prompts to address them. Common things to add to negative prompts: deformed hands, extra fingers, bad anatomy, blurry, low quality, ugly, distorted faces, oversaturated, watermark, text
Generate with your negative prompt additions. How much did the negative prompt improve the output? What remained problematic?
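One way to make this exercise systematic: keep a small mapping from the failure types you observe to negative-prompt terms, and assemble the negative prompt from only the issues that actually appeared. The terms below come from the list in this exercise, but the mapping itself is a hypothetical illustration, not a canonical recipe.

```python
# Hypothetical mapping from observed issue to negative-prompt terms.
NEGATIVE_TERMS = {
    "hands": "deformed hands, extra fingers, bad anatomy",
    "quality": "blurry, low quality",
    "faces": "distorted faces, ugly",
    "color": "oversaturated",
    "artifacts": "watermark, text",
}

def build_negative_prompt(observed_issues):
    """Join the negative-prompt terms for the issues you actually saw,
    skipping any issue without a known term."""
    parts = [NEGATIVE_TERMS[i] for i in observed_issues if i in NEGATIVE_TERMS]
    return ", ".join(parts)

print(build_negative_prompt(["hands", "quality"]))
# -> deformed hands, extra fingers, bad anatomy, blurry, low quality
```

Targeting only observed issues keeps the negative prompt short; long catch-all negative prompts can constrain the generation in unintended ways.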
Exercise 13: Copyright Considerations Review
Goal: Develop practical literacy about image rights for professional use.
For each of the following use cases, research the applicable policies and answer: Is this permitted? What restrictions apply?
- Using a Midjourney-generated image (generated on a paid account) in a printed marketing brochure for a client
- Using a ChatGPT/DALL·E-generated image as the cover of a self-published e-book sold commercially
- Using a Stable Diffusion-generated image on a company website
- Generating an image "in the style of [living photographer]" and using it commercially
- Using an AI-generated image in a news article
Sources to consult: Midjourney's terms of service, OpenAI's usage policies, your country's copyright law guidance on AI-generated works. Note the date of the terms you review — they change.
Exercise 14: Brand Consistency Challenge
Goal: Understand the limits of AI image consistency across a series.
Create a brief fictional brand (name, color palette, target audience, brand feeling). Then try to generate a series of 6 images that feel like they belong to the same brand campaign.
Techniques to try for consistency:
- Using identical style descriptors across all prompts
- Using Midjourney's --sref with your best image as style reference for subsequent generations
- Using a consistent --seed value
How consistent does the final series feel? What elements vary most? What would you need to do in post-production (traditional design tools) to unify the series further?
Exercise 15: Text-in-Image Comparison
Goal: Understand which platform handles text in images best.
Generate the same prompt on both Midjourney and DALL·E 3:
Vintage-style poster for a jazz concert, art deco design, featuring the text "MIDNIGHT JAZZ FESTIVAL" in large letters, warm golden palette, elegant typography
Compare: How accurately is the text rendered? How legible is it? How does the overall design quality compare?
Now, in DALL·E 3, try a simpler image with shorter text: Professional business card mockup, clean minimal design, text reading "Sarah Chen, Strategy Director"
Is the text correct? If not, try refining. Note how many iterations it takes to get correct, readable text.
Exercise 16: The 90-Minute Professional Deliverable
Goal: Complete a professional image generation task within a realistic time constraint.
Set a timer for 90 minutes. Your task: create a set of images for a hypothetical situation of your choice from this list:
A) An 8-image social media content set for a one-week campaign on any topic
B) A 5-image illustration set for a business presentation on any topic you choose
C) A 10-image mood board representing a visual direction for any brand or creative project
At the end of 90 minutes, evaluate:
- What quality level did you achieve?
- What would require additional post-processing in traditional tools?
- What worked smoothly? What frustrated you?
- How does this compare to how long the same deliverable would have taken without AI tools?
Reflection Questions
After completing several exercises:
- Which platform felt most natural to your working style? Why?
- What surprised you most about what AI image generation can and cannot do?
- What failure modes did you encounter most frequently? How did you learn to mitigate them?
- For your specific professional role, which use cases offer the clearest value? Which are overhyped for your needs?
- How did the copyright and rights question affect your thinking about using AI-generated images professionally?