Chapter 26 Quiz: A/B Testing Content and Offer Strategy


Instructions: Choose the best answer for each question. The answer key follows question 10.


1. The single most important rule of A/B testing, and the one most commonly violated, is:

A) Always run tests for at least 14 days
B) Change only ONE variable between Version A and Version B
C) Use a p-value threshold of 0.05 or lower
D) Test the most important variable first


2. In the context of A/B testing, a p-value of 0.05 means:

A) There is a 95% probability that Version B is actually better than Version A
B) If there were no real difference, there would be a 5% probability of observing a difference at least as large as the one measured
C) Version B performed 5% better than Version A
D) The test has 95% statistical power


3. Maya tested two prices for her sustainability guide: $17 and $27. The $17 price converted at 4.2% and the $27 price converted at 3.1%. Which price generated MORE revenue per visitor?

A) $17, because it converted at a higher rate
B) $27, because revenue per visitor (price × conversion rate) is $0.84 vs. $0.71
C) They are equal because the difference is within the margin of error
D) $17, because conversion rate is always the most important metric


4. Which of the following describes "peeking" in the context of A/B testing?

A) Using your email platform's built-in analytics to monitor test progress
B) Looking at intermediate test results and stopping the test early when Version B appears to be winning
C) Running a test without notifying your audience that they are in an experiment
D) Checking whether a p-value is below 0.05 before calculating statistical power


5. The Meridian Collective changed the description of their Discord from "Community Discord" to "Private Coaching Discord" and saw a 28% improvement in conversion. This is an example of testing:

A) Product pricing
B) Technical copy: removing one of the product's features
C) Framing/copy: same product, different description language
D) Bundle composition: adding a new feature to the offer


6. For creators with small audiences (under 10,000 followers), the chapter recommends which approach when standard A/B testing is not statistically feasible?

A) Abandon testing entirely until audience size grows
B) Use sequential testing (before/after) and/or qualitative research with community members
C) Test multiple variables simultaneously to increase learning per experiment
D) Use industry-wide best practices as a substitute for individual testing


7. Marcus Webb discovered that email subject lines with specific numbers ("Save $1,247") consistently outperformed vague subject lines ("Save money on taxes"). What kind of error would Marcus be making if he applied this finding indiscriminately to ALL email contexts?

A) A Type I error: incorrectly rejecting the null hypothesis
B) A Type II error: failing to detect a real effect
C) Overgeneralizing: applying a context-specific finding beyond its valid scope
D) Confirmation bias: only seeking evidence that confirms a prior belief


8. You want to test whether a "Buy Now" CTA button or an "Enroll Today" CTA button produces more landing page conversions. Your landing page currently converts at 3.0%, and you want to detect improvements of 0.5 percentage points or more. Which option best describes why you would calculate the required sample size BEFORE starting the test?

A) To determine which version is more likely to succeed and allocate more traffic to it
B) To know how long to run the test so you have enough data to draw valid statistical conclusions
C) To comply with A/B testing legal requirements for digital commerce
D) To pre-register the hypothesis with an academic institution for scientific validity


9. A chi-square test is most appropriate for which of the following A/B testing scenarios?

A) Comparing the average revenue generated by two different email campaigns
B) Comparing how long viewers watched two videos, measured as continuous time values
C) Comparing the number of people who clicked (vs. did not click) across two email subject line variants
D) Comparing three or more pricing tiers simultaneously


10. The chapter recommends that creators maintain an "iteration log" for A/B testing. The PRIMARY long-term value of this log is:

A) Legal protection in case a brand sponsor challenges your content performance claims
B) Building institutional knowledge about your specific audience that compounds over time into audience-specific principles
C) Providing documentation required by email platform terms of service
D) Showing potential brand partners that your content is data-driven


Answer Key

1. B: Changing only one variable is the fundamental requirement for valid causal inference. If you change several variables at once, you cannot tell which change caused any observed difference.
2. B: A p-value is the probability of observing data at least as extreme as what you saw if the null hypothesis (no real difference) were true. It is NOT the probability that your hypothesis is correct.
3. B: Revenue per visitor = price × conversion rate. $27 × 0.031 = $0.837; $17 × 0.042 = $0.714. The $27 price generates about 17% more revenue per visitor despite its lower conversion rate.
4. B: "Peeking" means checking results mid-test and stopping as soon as they look favorable. It inflates the false positive rate because p-values fluctuate as data accumulates: every extra look is another chance for random noise to dip below 0.05, and early significant results frequently regress toward no difference as more data comes in.
5. C: The Discord product did not change; only the language used to describe it did. This is a framing (copy) test, and it shows that how you describe an offer can matter as much as the offer itself.
6. B: Sequential testing and qualitative research are practical alternatives for small creators who cannot accumulate the sample sizes that simultaneous A/B tests require. Directional data from sequential tests is better than no data.
7. C: Overgeneralization applies a valid but context-specific finding beyond the conditions in which it was demonstrated. Marcus's finding about numbers in subject lines was observed with his personal finance audience and may not transfer to entertainment or lifestyle content.
8. B: Sample size calculation tells you how much data you need to draw valid statistical conclusions, and therefore how long the test must run. Skipping it risks either stopping too early (an underpowered test) or running unnecessarily long (wasted traffic and time).
9. C: The chi-square test is designed for categorical count data: how many people in each group took a categorical action (clicked vs. did not click, opened vs. did not open). It compares the observed counts with the counts expected under the null hypothesis.
10. B: The compounding value of an iteration log lies in the audience-specific principles that emerge over many tests. After 50+ tests, you develop a nuanced understanding of how YOUR audience responds that no general best-practice guide can provide.
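For readers who want to check the reasoning behind questions 3, 4, 8, and 9 numerically, here is a minimal Python sketch using only the standard library (Python 3.8+). It is an illustration under stated assumptions, not the chapter's own tooling. All function names are hypothetical. The click counts in the question 9 demo are invented for illustration. The sample-size function assumes a two-sided test at α = 0.05 with 80% power (question 8 does not state a power level). The chi-square function omits the Yates continuity correction that some statistics packages apply to 2×2 tables by default. The peeking simulation's traffic, base rate, and number of looks are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist
from typing import Tuple


def revenue_per_visitor(price: float, conversion_rate: float) -> float:
    """Expected revenue per visitor = price x conversion rate (question 3)."""
    return price * conversion_rate


def chi_square_2x2(clicks_a: int, total_a: int, clicks_b: int, total_b: int) -> Tuple[float, float]:
    """Pearson chi-square test on clicked vs. did-not-click counts (question 9).

    No continuity correction is applied. Returns (statistic, p-value).
    """
    observed = [[clicks_a, total_a - clicks_a], [clicks_b, total_b - clicks_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each row and column needs at least one observation")
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom; a chi-square(1) variable is the
    # square of a standard normal, so P(X >= x) = erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))


def sample_size_per_variant(baseline: float, target: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant for a two-sided two-proportion z-test
    (question 8), via the usual normal-approximation formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (baseline + target) / 2
    spread = (z_alpha * math.sqrt(2 * pooled * (1 - pooled))
              + z_beta * math.sqrt(baseline * (1 - baseline) + target * (1 - target)))
    return math.ceil(spread ** 2 / (target - baseline) ** 2)


def peeking_false_positive_rates(simulations: int = 300, visitors_per_look: int = 200,
                                 looks: int = 10, rate: float = 0.05,
                                 alpha: float = 0.05, seed: int = 1) -> Tuple[float, float]:
    """Simulate A/A tests (both variants share the same true rate) and return the
    share declared significant at the final look vs. at ANY interim look (question 4)."""
    rng = random.Random(seed)
    final_hits = any_hits = 0
    for _ in range(simulations):
        clicks_a = clicks_b = visitors = 0
        peeked_significant = False
        for _ in range(looks):
            clicks_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            clicks_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            visitors += visitors_per_look
            try:
                _, p = chi_square_2x2(clicks_a, visitors, clicks_b, visitors)
            except ValueError:  # no clicks yet in either variant
                p = 1.0
            peeked_significant = peeked_significant or p < alpha
        any_hits += peeked_significant
        final_hits += p < alpha  # p from the last look only
    return final_hits / simulations, any_hits / simulations


if __name__ == "__main__":
    # Question 3: Maya's two price points (figures from the quiz).
    for price, rate in [(17, 0.042), (27, 0.031)]:
        print(f"${price}: ${revenue_per_visitor(price, rate):.3f} per visitor")

    # Question 9: hypothetical click counts, invented for illustration.
    stat, p = chi_square_2x2(clicks_a=120, total_a=2000, clicks_b=160, total_b=2000)
    print(f"chi-square = {stat:.2f}, p = {p:.4f}")

    # Question 8: 3.0% baseline, smallest lift worth detecting is 0.5 points.
    print(sample_size_per_variant(0.030, 0.035), "visitors needed per variant")

    # Question 4: A/A tests, so every "significant" result is a false positive.
    final_rate, peeking_rate = peeking_false_positive_rates()
    print(f"false positives: final look only {final_rate:.1%}, any of 10 looks {peeking_rate:.1%}")
```

Under these assumptions, the question 8 scenario needs just under 20,000 visitors in each variant. Dividing that by the daily traffic each variant receives gives a rough test duration, which is the practical point of calculating sample size up front. In the simulation, the "any look" false positive rate should come out noticeably higher than the final-look rate, which illustrates why stopping at the first significant peek is unreliable.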