
Chapter 23: Association Rules and Market Basket Analysis

Finding Patterns in Transaction Data


Learning Objectives

By the end of this chapter, you will be able to:

  1. Compute support, confidence, and lift for itemsets
  2. Apply the Apriori algorithm for frequent itemset mining
  3. Use FP-Growth for efficient rule extraction on large datasets
  4. Filter and interpret rules using lift, conviction, and Zhang's metric
  5. Apply market basket analysis to product recommendations and cross-selling

The Unglamorous Workhorse of Retail Analytics

Core Principle --- Association rule mining does not predict anything. It counts co-occurrences in transaction data and surfaces patterns that occur more often than chance alone would predict. That is all it does. And that is enough to generate millions of dollars in cross-sell revenue for any retailer with decent transaction volume.

You have probably heard the beer-and-diapers story. The legend goes that a major retailer analyzed transaction data and discovered that men who bought diapers on Friday evenings also bought beer. The store moved the beer display next to the diapers and sales went up. The story is almost certainly apocryphal --- it has been attributed to Walmart, Osco Drug, and half a dozen other retailers, and nobody has produced the original analysis. But the story persists because the idea is sound: if you look at millions of transactions, you will find purchase patterns that no human merchandiser would have guessed. Some of those patterns are actionable.

Association rules formalize this intuition. Given a dataset of transactions (shopping baskets, streaming history, web clickstreams, insurance claim bundles), the algorithm finds rules of the form:

If a customer buys {bread, butter}, they also buy {milk} with confidence 68% and this combination occurs 3.2x more often than chance.

The antecedent is {bread, butter}. The consequent is {milk}. The metrics --- support, confidence, and lift --- tell you whether the pattern is frequent enough to matter, strong enough to act on, and genuinely more common than you would expect from the base rates of each item.

This chapter covers the two standard algorithms (Apriori and FP-Growth), the metrics you use to filter thousands of candidate rules down to the dozens that are worth acting on, and two case studies: ShopSmart's market basket analysis for product recommendations, and StreamFlow's analysis of viewing patterns that predict subscriber retention.


Part 1: Transaction Data and the Itemset Framework

What Transaction Data Looks Like

Transaction data is a list of baskets, where each basket is a set of items purchased (or watched, clicked, or claimed) in a single event. The data is binary at the transaction level: item X is either in the basket or it is not. Quantities do not matter for standard association rules.

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, fpgrowth, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Raw transaction data: a list of lists
transactions = [
    ['bread', 'butter', 'milk', 'eggs'],
    ['bread', 'butter', 'milk'],
    ['bread', 'butter'],
    ['bread', 'milk', 'eggs'],
    ['butter', 'milk', 'eggs'],
    ['bread', 'butter', 'milk', 'eggs', 'cheese'],
    ['milk', 'eggs'],
    ['bread', 'cheese'],
    ['bread', 'butter', 'milk'],
    ['butter', 'milk', 'eggs', 'cheese'],
]

# Convert to one-hot encoded DataFrame (mlxtend's required format)
te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)
basket_df = pd.DataFrame(te_array, columns=te.columns_)

print(basket_df.astype(int))
print(f"\n{len(transactions)} transactions, {len(te.columns_)} unique items")

The one-hot encoded DataFrame has one row per transaction and one column per item. Each cell is True/False (or 1/0). This is the input format for mlxtend's Apriori and FP-Growth implementations.

Practical Note --- Real transaction data rarely arrives in this format. You will typically have a long-form table with columns like transaction_id and product_name. The groupby-pivot pattern is the standard conversion:

# From long-form to basket format (df_long here is a tiny illustrative example)
df_long = pd.DataFrame({
    'transaction_id': [1, 1, 2, 2, 2],
    'product_name': ['bread', 'milk', 'bread', 'butter', 'milk'],
})
basket = (df_long.groupby(['transaction_id', 'product_name']).size()
          .unstack(fill_value=0).astype(bool))

Support, Confidence, and Lift --- The Three Core Metrics

Every association rule has an antecedent (the "if" part) and a consequent (the "then" part). The three core metrics evaluate different aspects of the rule.

Support measures how frequently the itemset appears in the dataset:

$$\text{support}(X) = \frac{\text{count of transactions containing } X}{\text{total transactions}}$$

Support for a rule {X} -> {Y} is the support of the union {X, Y}:

$$\text{support}(X \Rightarrow Y) = \frac{\text{count of transactions containing both } X \text{ and } Y}{\text{total transactions}}$$

Confidence measures how often the rule is correct when the antecedent is present:

$$\text{confidence}(X \Rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X)}$$

Lift measures how much more likely the consequent is when the antecedent is present, compared to its baseline frequency:

$$\text{lift}(X \Rightarrow Y) = \frac{\text{confidence}(X \Rightarrow Y)}{\text{support}(Y)} = \frac{\text{support}(X \cup Y)}{\text{support}(X) \times \text{support}(Y)}$$

Let us compute these by hand for our small example, then verify with mlxtend:

n = len(transactions)

# Count occurrences
bread_count = basket_df['bread'].sum()
butter_count = basket_df['butter'].sum()
milk_count = basket_df['milk'].sum()
bread_and_butter = (basket_df['bread'] & basket_df['butter']).sum()
bread_and_butter_and_milk = (
    basket_df['bread'] & basket_df['butter'] & basket_df['milk']
).sum()

print("=== Manual Calculation: {bread, butter} -> {milk} ===")
print(f"Transactions: {n}")
print(f"bread count: {bread_count}, support: {bread_count/n:.2f}")
print(f"butter count: {butter_count}, support: {butter_count/n:.2f}")
print(f"milk count: {milk_count}, support: {milk_count/n:.2f}")
print(f"bread AND butter count: {bread_and_butter}, support: {bread_and_butter/n:.2f}")
print(f"bread AND butter AND milk count: {bread_and_butter_and_milk}, "
      f"support: {bread_and_butter_and_milk/n:.2f}")

support_rule = bread_and_butter_and_milk / n
confidence_rule = bread_and_butter_and_milk / bread_and_butter
lift_rule = confidence_rule / (milk_count / n)

print(f"\nRule: {{bread, butter}} -> {{milk}}")
print(f"Support:    {support_rule:.2f}")
print(f"Confidence: {confidence_rule:.2f}")
print(f"Lift:       {lift_rule:.2f}")
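The same arithmetic generalizes to any rule. Here is a small reusable helper (a sketch; `rule_metrics` is not an mlxtend function) that recomputes the three metrics straight from a one-hot frame built with plain pandas:

```python
import pandas as pd

# Same ten baskets as above, one-hot encoded without mlxtend
transactions = [
    ['bread', 'butter', 'milk', 'eggs'], ['bread', 'butter', 'milk'],
    ['bread', 'butter'], ['bread', 'milk', 'eggs'],
    ['butter', 'milk', 'eggs'], ['bread', 'butter', 'milk', 'eggs', 'cheese'],
    ['milk', 'eggs'], ['bread', 'cheese'], ['bread', 'butter', 'milk'],
    ['butter', 'milk', 'eggs', 'cheese'],
]
basket_df = pd.DataFrame(
    [{item: True for item in t} for t in transactions]
).fillna(False).astype(bool)

def rule_metrics(df, antecedent, consequent):
    """Compute (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(df)
    both = df[list(antecedent | consequent)].all(axis=1).sum()
    support = both / n
    confidence = both / df[list(antecedent)].all(axis=1).sum()
    lift = confidence / (df[list(consequent)].all(axis=1).sum() / n)
    return support, confidence, lift

s, c, l = rule_metrics(basket_df, {'bread', 'butter'}, {'milk'})
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
# support=0.40 confidence=0.80 lift=1.00
```

Note the lift of exactly 1.0: in this toy data milk is so common (8 of 10 baskets) that knowing a basket contains bread and butter tells you nothing extra about milk. The rule looks strong on confidence alone and is worthless on lift, which is exactly the base-rate trap discussed next.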

Interpreting Lift

Lift is the single most important metric for filtering rules.

| Lift value | Interpretation |
|------------|----------------|
| lift = 1 | X and Y are independent --- knowing X tells you nothing about Y |
| lift > 1 | X and Y co-occur more often than expected --- positive association |
| lift < 1 | X and Y co-occur less often than expected --- negative association (substitutes) |

Key Filter --- In practice, lift > 1 is the minimum bar. Most practitioners filter to lift > 1.2 or higher depending on the dataset. A rule with high confidence but lift near 1.0 is misleading: the consequent is just popular, and the antecedent is not really driving the co-occurrence. Lift corrects for this base rate problem.

Here is an example that shows why confidence alone is insufficient:

# Suppose milk appears in 80% of all transactions.
# A rule {bread} -> {milk} with confidence 82% looks strong.
# But lift = 0.82 / 0.80 = 1.025 --- barely above chance.
# Bread does not meaningfully predict milk; milk is just everywhere.

# Contrast: a rule {artisan_cheese} -> {wine} with confidence 45%
# and wine support of 10%. Lift = 0.45 / 0.10 = 4.5.
# Artisan cheese genuinely predicts wine purchases, even though
# the confidence is lower.

Part 2: The Apriori Algorithm

The Combinatorial Problem

With 1,000 unique items, there are $2^{1000} - 1$ possible non-empty itemsets. You cannot evaluate all of them. The Apriori algorithm exploits a simple property to prune the search space:

Apriori Principle --- If an itemset is infrequent (below min_support), then all of its supersets are also infrequent.

If {bread, wine} has support below your threshold, then {bread, wine, cheese}, {bread, wine, eggs}, and every other superset also has support below the threshold. You can prune the entire branch.

The algorithm works bottom-up:

  1. Find all 1-item frequent itemsets (support >= min_support).
  2. Generate candidate 2-item itemsets from pairs of frequent 1-item sets.
  3. Scan the dataset to count support for each candidate. Keep those above min_support.
  4. Generate candidate 3-item itemsets from pairs of frequent 2-item sets.
  5. Repeat until no more frequent itemsets are found.
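The level-wise loop fits in a few lines of plain Python. This sketch (illustrative and unoptimized; `apriori_sketch` is not mlxtend's implementation) makes the pruning step explicit:

```python
from itertools import combinations

def apriori_sketch(transactions, min_support=0.3):
    """Level-wise frequent itemset mining; returns {frozenset: support}."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    items = {i for b in baskets for i in b}
    frequent = {frozenset([i]): support(frozenset([i]))
                for i in items if support(frozenset([i])) >= min_support}
    result, k = dict(frequent), 1
    while frequent:
        # Candidates: unions of frequent k-itemsets with exactly k+1 items...
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k + 1}
        # ...whose every k-subset is itself frequent (the Apriori prune)
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k))}
        frequent = {c: support(c) for c in candidates
                    if support(c) >= min_support}
        result.update(frequent)
        k += 1
    return result

freq = apriori_sketch([
    ['bread', 'butter', 'milk', 'eggs'], ['bread', 'butter', 'milk'],
    ['bread', 'butter'], ['bread', 'milk', 'eggs'],
    ['butter', 'milk', 'eggs'], ['bread', 'butter', 'milk', 'eggs', 'cheese'],
    ['milk', 'eggs'], ['bread', 'cheese'], ['bread', 'butter', 'milk'],
    ['butter', 'milk', 'eggs', 'cheese'],
], min_support=0.3)
print(len(freq))  # 14 frequent itemsets at min_support=0.3 on the toy data
```

The prune is the whole trick: {bread, cheese} falls below 0.3 support, so no candidate containing both items is ever counted.
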
# Apriori with mlxtend
frequent_itemsets = apriori(
    basket_df,
    min_support=0.3,   # item(set) must appear in at least 30% of transactions
    use_colnames=True
)

print(f"Frequent itemsets (min_support=0.3): {len(frequent_itemsets)}")
print(frequent_itemsets.sort_values('support', ascending=False).to_string(index=False))

Generating Rules from Frequent Itemsets

Frequent itemsets are the intermediate result. The business cares about rules --- directional statements of the form "if X then Y." From each frequent itemset, we generate all possible rules and filter by confidence and lift:

rules = association_rules(
    frequent_itemsets,
    metric="lift",
    min_threshold=1.0    # keep rules with lift >= 1.0 (at least independence)
)

# Select key columns and sort by lift
rules_display = rules[[
    'antecedents', 'consequents', 'support',
    'confidence', 'lift'
]].sort_values('lift', ascending=False)

print(rules_display.to_string(index=False))

Practical Note --- The min_support threshold is the most impactful parameter. Set it too high and you miss interesting niche patterns. Set it too low and the algorithm generates millions of itemsets, most of which are noise. Start with 0.01 (1% of transactions) for large retail datasets and adjust based on the number of rules generated. A useful heuristic: if you get more than 10,000 rules, raise min_support; if you get fewer than 50, lower it.

Apriori's Bottleneck: Candidate Generation

Apriori's weakness is candidate generation. At each level k, it generates candidate (k+1)-item sets and makes a full pass over the dataset to count their support. For datasets with thousands of items and millions of transactions, this is slow. Each level requires:

  1. Generating candidates (combinatorial)
  2. Scanning the full dataset to count support for each candidate

On a dataset with 50,000 items and 10 million transactions, Apriori can take hours. This is where FP-Growth comes in.


Part 3: FP-Growth --- The Faster Alternative

Why FP-Growth Exists

FP-Growth (Han, Pei, and Yin, 2000) eliminates candidate generation entirely. Instead of repeatedly scanning the dataset, it compresses the transaction database into a compact data structure called the FP-tree (Frequent Pattern tree) and mines frequent itemsets directly from the tree.

The speedup is significant:

| Metric | Apriori | FP-Growth |
|--------|---------|-----------|
| Database scans | k scans (one per itemset length) | 2 scans (build tree + mine) |
| Memory | Low (streams candidates) | Higher (stores the FP-tree) |
| Speed on large data | Slow (candidate generation dominates) | Fast (no candidate generation) |
| mlxtend function | apriori() | fpgrowth() |

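To make the tree concrete, here is a minimal sketch of the construction phase alone, showing the two scans (the recursive mining step is omitted, and `FPNode` is an illustrative class, not mlxtend's internal structure):

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 1
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Scan 1: count item frequencies and keep the frequent ones
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # Scan 2: insert each transaction with items sorted by descending
    # frequency, so shared prefixes collapse onto shared tree paths
    root = FPNode(None, None)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in frequent),
                           key=lambda i: (-frequent[i], i)):
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
    return root, frequent

baskets = [
    ['bread', 'butter', 'milk', 'eggs'], ['bread', 'butter', 'milk'],
    ['bread', 'butter'], ['bread', 'milk', 'eggs'],
    ['butter', 'milk', 'eggs'], ['bread', 'butter', 'milk', 'eggs', 'cheese'],
    ['milk', 'eggs'], ['bread', 'cheese'], ['bread', 'butter', 'milk'],
    ['butter', 'milk', 'eggs', 'cheese'],
]
root, frequent = build_fp_tree(baskets, min_count=3)
# Ten transactions compress into a tree whose root has only two children:
# 'milk' (on the 8 paths that contain milk) and 'bread' (on the other 2)
print({item: node.count for item, node in root.children.items()})
```

The compression is why only two scans are needed: once the tree is built, all support counts can be read off tree paths without touching the raw transactions again.
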
# FP-Growth with mlxtend --- same API as Apriori
frequent_fp = fpgrowth(
    basket_df,
    min_support=0.3,
    use_colnames=True
)

# Results should be identical to Apriori (same frequent itemsets)
print(f"FP-Growth frequent itemsets: {len(frequent_fp)}")
print(frequent_fp.sort_values('support', ascending=False).to_string(index=False))

When to Use Which

  • Small datasets (< 10,000 transactions, < 500 items): Either algorithm works. Apriori is fine.
  • Medium datasets (10,000-1,000,000 transactions): FP-Growth is 2-10x faster.
  • Large datasets (> 1,000,000 transactions): FP-Growth is the only practical choice with mlxtend. For truly massive datasets (billions of transactions), you will need distributed implementations (Spark MLlib's FPGrowth).
# Timing comparison on a larger simulated dataset
import time

np.random.seed(42)
n_transactions = 5000
n_items = 50
item_names = [f'item_{i:03d}' for i in range(n_items)]

# Simulate sparse transaction data (average 5 items per transaction)
large_transactions = []
for _ in range(n_transactions):
    n_in_basket = np.random.poisson(5)
    basket = list(np.random.choice(item_names, size=min(n_in_basket, n_items), replace=False))
    large_transactions.append(basket)

te_large = TransactionEncoder()
large_array = te_large.fit(large_transactions).transform(large_transactions)
large_df = pd.DataFrame(large_array, columns=te_large.columns_)

# Time Apriori
start = time.time()
freq_apriori = apriori(large_df, min_support=0.05, use_colnames=True)
apriori_time = time.time() - start

# Time FP-Growth
start = time.time()
freq_fpgrowth = fpgrowth(large_df, min_support=0.05, use_colnames=True)
fpgrowth_time = time.time() - start

print(f"Apriori:   {len(freq_apriori):>5} itemsets in {apriori_time:.3f}s")
print(f"FP-Growth: {len(freq_fpgrowth):>5} itemsets in {fpgrowth_time:.3f}s")
print(f"Speedup:   {apriori_time / fpgrowth_time:.1f}x")

Part 4: Beyond Lift --- Advanced Interestingness Metrics

The Problem with Confidence and Lift

Confidence and lift are necessary but not sufficient. Consider these pathological cases:

  1. High confidence, low lift: {anything} -> {milk} when milk is in 90% of baskets. Confidence is high because milk is everywhere.
  2. High lift, low support: {truffle_oil} -> {saffron} with lift 12.0 but support 0.001%. The pattern exists in 3 transactions out of 300,000. Not actionable.
  3. Lift is symmetric: lift(X -> Y) = lift(Y -> X). But the business action is directional: putting wine next to cheese is different from putting cheese next to wine (if the cheese aisle gets more foot traffic).

Conviction

Conviction measures the degree to which X and Y are associated directionally:

$$\text{conviction}(X \Rightarrow Y) = \frac{1 - \text{support}(Y)}{1 - \text{confidence}(X \Rightarrow Y)}$$

Conviction ranges from 0 to infinity:

  • conviction = 1: X and Y are independent.
  • conviction > 1: the rule predicts correctly more often than chance.
  • conviction = infinity: the consequent always appears when the antecedent does (confidence = 1).

Unlike lift, conviction is asymmetric: conviction(X -> Y) != conviction(Y -> X). This matches the directional nature of business decisions.
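Using hand-counted supports from the toy grocery data in Part 1 (bread 0.7, milk 0.8, bread-and-milk 0.5), the asymmetry is visible directly. Here `conviction` is a small illustrative helper, not an mlxtend function:

```python
def conviction(sup_x, sup_y, sup_xy):
    """Conviction of the rule X -> Y, from the three supports."""
    confidence = sup_xy / sup_x
    if confidence == 1.0:
        return float('inf')  # consequent always follows the antecedent
    return (1 - sup_y) / (1 - confidence)

# Toy grocery data from Part 1: support(bread)=0.7, support(milk)=0.8,
# support(bread AND milk)=0.5
print(f"{conviction(0.7, 0.8, 0.5):.2f}")  # bread -> milk: 0.70
print(f"{conviction(0.8, 0.7, 0.5):.2f}")  # milk -> bread: 0.80
```

Both values sit below 1 because bread and milk are mildly negatively associated in the toy data; the point is that the two directions of the same itemset produce different numbers.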

Zhang's Metric

Zhang's metric (Zhang, 2000) measures the interestingness of a rule while handling both positive and negative associations:

$$\text{Zhang}(X \Rightarrow Y) = \frac{\text{confidence}(X \Rightarrow Y) - \text{support}(Y)}{\max(\text{confidence}(X \Rightarrow Y)(1 - \text{support}(Y)),\ \text{support}(Y)(1 - \text{confidence}(X \Rightarrow Y)))}$$

Zhang's metric ranges from -1 to +1:

  • +1: perfect positive association
  • 0: independence
  • -1: perfect negative association

It is more robust than lift to base-rate effects and handles rare items better.

# Generate rules with additional metrics
rules_all = association_rules(
    frequent_itemsets,
    metric="lift",
    min_threshold=1.0
)

# mlxtend computes conviction and Zhang's metric automatically (v0.21+)
# If your version does not include them, compute manually:
if 'zhangs_metric' not in rules_all.columns:
    # Manual Zhang's metric
    conf = rules_all['confidence']
    sup_y = rules_all['consequent support']
    zhang_num = conf - sup_y
    zhang_denom = np.maximum(
        conf * (1 - sup_y),
        sup_y * (1 - conf)
    )
    rules_all['zhangs_metric'] = zhang_num / zhang_denom

print(rules_all[[
    'antecedents', 'consequents', 'support',
    'confidence', 'lift', 'conviction', 'zhangs_metric'
]].sort_values('lift', ascending=False).head(15).to_string(index=False))

A Practical Filtering Pipeline

In production, you layer multiple filters:

def filter_rules(rules_df, min_support=0.01, min_confidence=0.3,
                 min_lift=1.2, max_rules=100):
    """
    Production rule filtering pipeline.

    Parameters
    ----------
    rules_df : pd.DataFrame from association_rules()
    min_support : float, minimum rule support
    min_confidence : float, minimum confidence
    min_lift : float, minimum lift (the key filter)
    max_rules : int, maximum rules to return

    Returns
    -------
    pd.DataFrame, filtered and sorted rules
    """
    filtered = rules_df[
        (rules_df['support'] >= min_support) &
        (rules_df['confidence'] >= min_confidence) &
        (rules_df['lift'] >= min_lift)
    ].copy()

    # Sort by lift descending, then confidence descending as tiebreaker
    filtered = filtered.sort_values(
        ['lift', 'confidence'], ascending=[False, False]
    ).head(max_rules)

    return filtered[['antecedents', 'consequents', 'support',
                      'confidence', 'lift', 'conviction']].reset_index(drop=True)


filtered = filter_rules(rules_all, min_support=0.2, min_confidence=0.4, min_lift=1.0)
print(filtered.to_string(index=False))

Practical Rule --- Start with lift as the primary filter. A rule with lift < 1.0 is never actionable (it shows negative association). Then apply support and confidence thresholds to ensure the pattern is frequent enough and reliable enough to justify a business action. Conviction and Zhang's metric are useful for ranking among rules that pass the primary filters.


Part 5: Handling Real Transaction Data at Scale

Sparse Data and Memory

Real transaction datasets are extremely sparse. A grocery store may stock 30,000 SKUs, but the average basket has 15 items. That is a 0.05% fill rate. Storing this as a dense DataFrame (10 million rows x 30,000 columns) would require roughly 300 GB even with one-byte booleans, and well over 2 TB with an 8-byte integer dtype. That does not work.

# Approach 1: Sparse DataFrame (works up to ~1M transactions)
from scipy.sparse import csr_matrix

# Simulate a sparse transaction dataset
np.random.seed(42)
n_trans = 100_000
n_products = 5_000
avg_basket_size = 8

rows, cols = [], []
for i in range(n_trans):
    basket_size = np.random.poisson(avg_basket_size)
    items = np.random.choice(n_products, size=min(basket_size, n_products), replace=False)
    rows.extend([i] * len(items))
    cols.extend(items)

sparse_matrix = csr_matrix(
    (np.ones(len(rows), dtype=bool), (rows, cols)),
    shape=(n_trans, n_products)
)

print(f"Transactions: {n_trans:,}")
print(f"Products: {n_products:,}")
print(f"Dense size: {n_trans * n_products * 1 / 1e9:.2f} GB")
print(f"Sparse size: {sparse_matrix.data.nbytes / 1e6:.1f} MB")
print(f"Sparsity: {1 - sparse_matrix.nnz / (n_trans * n_products):.4%}")

Pre-filtering by Item Frequency

Before running Apriori or FP-Growth, remove items that appear in fewer than min_support of transactions. This reduces the number of columns and dramatically speeds up the algorithm:

# Pre-filter: remove items below min_support BEFORE running the algorithm
min_support_threshold = 0.01  # 1% of transactions

# On the dense boolean DataFrame (for mlxtend)
item_support = large_df.mean()
items_above_threshold = item_support[item_support >= min_support_threshold].index
filtered_df = large_df[items_above_threshold]

print(f"Items before filtering: {large_df.shape[1]}")
print(f"Items after filtering:  {filtered_df.shape[1]}")
print(f"Columns removed: {large_df.shape[1] - filtered_df.shape[1]}")

Aggregating by Time Window

Transaction data accumulates over time. Combining all history into one analysis can mask seasonal patterns. A practical approach is to mine rules on rolling windows and compare:

# Example: monthly rule mining
# For each month, mine rules and track which rules appear consistently

# Simulated concept: identify rules that are stable across time windows
# vs. rules that are seasonal
# monthly_rules = {}
# for month in months:
#     month_basket = baskets[baskets['month'] == month]
#     freq = fpgrowth(month_basket, min_support=0.01, use_colnames=True)
#     monthly_rules[month] = association_rules(freq, metric='lift', min_threshold=1.2)
#
# Stable rule: appears in 10+ of 12 months
# Seasonal rule: appears only in Nov-Dec (holiday baskets)
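One way to operationalize the stable-versus-seasonal split is to count how many windows each (antecedent, consequent) pair survives. A sketch, assuming `monthly_rules` maps a window label to an `association_rules()` output (`stable_rule_keys` is a hypothetical helper, shown here on two hand-made rule tables):

```python
from collections import Counter
import pandas as pd

def stable_rule_keys(monthly_rules, min_months=10):
    """Return (antecedents, consequents) pairs mined in >= min_months windows."""
    counts = Counter()
    for rules_df in monthly_rules.values():
        # A rule counts once per window, however many times it appears
        counts.update(set(zip(rules_df['antecedents'], rules_df['consequents'])))
    return {key for key, c in counts.items() if c >= min_months}

# Tiny illustration: bread -> butter appears in both windows,
# eggnog -> nutmeg only in the second
jan = pd.DataFrame({'antecedents': [frozenset({'bread'})],
                    'consequents': [frozenset({'butter'})]})
feb = pd.DataFrame({'antecedents': [frozenset({'bread'}), frozenset({'eggnog'})],
                    'consequents': [frozenset({'butter'}), frozenset({'nutmeg'})]})
stable = stable_rule_keys({'jan': jan, 'feb': feb}, min_months=2)
print(stable)  # {(frozenset({'bread'}), frozenset({'butter'}))}
```

The rules that fail the threshold are not discarded; they are the seasonal candidates, worth inspecting against the calendar before acting on them.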

Practical Note --- At ShopSmart, the analytics team found that approximately 60% of rules with lift > 1.5 were stable across all 12 months. The remaining 40% were seasonal (holiday baking ingredients, back-to-school supplies, summer grilling items). Both types are actionable, but the business response differs: stable rules drive permanent shelf placement; seasonal rules drive temporary end-cap displays.


Part 6: From Rules to Business Actions

Cross-Selling and Product Recommendations

The most direct application: if a customer has {X} in their cart, recommend {Y} where {X} -> {Y} has high lift and confidence.

def get_recommendations(rules_df, cart_items, top_n=5):
    """
    Given a customer's cart, return product recommendations
    based on association rules.

    Parameters
    ----------
    rules_df : pd.DataFrame with association rules
    cart_items : set of item names currently in cart
    top_n : int, number of recommendations to return

    Returns
    -------
    pd.DataFrame with columns ['item', 'lift', 'confidence', 'based_on'],
    sorted by lift (empty if no rules match the cart)
    """
    recommendations = []

    for _, rule in rules_df.iterrows():
        antecedent = rule['antecedents']
        consequent = rule['consequents']

        # Check if the antecedent is a subset of the cart
        if antecedent.issubset(cart_items):
            # Do not recommend items already in the cart
            new_items = consequent - cart_items
            for item in new_items:
                recommendations.append({
                    'item': item,
                    'lift': rule['lift'],
                    'confidence': rule['confidence'],
                    'based_on': antecedent
                })

    if not recommendations:
        return pd.DataFrame()

    rec_df = pd.DataFrame(recommendations)
    # Deduplicate: keep highest lift for each recommended item
    rec_df = rec_df.sort_values('lift', ascending=False).drop_duplicates(
        subset='item', keep='first'
    ).head(top_n)

    return rec_df


# Example: customer has bread and butter in cart
cart = {'bread', 'butter'}
recs = get_recommendations(rules_all, cart, top_n=3)
print(f"Cart: {cart}")
print("Recommendations:")
print(recs.to_string(index=False))

Store Layout Optimization

Association rules inform physical store layout. Items with high lift should be:

  1. Co-located if the goal is convenience (items frequently bought together placed near each other for basket completion).
  2. Separated if the goal is discovery (place the consequent item in a different aisle to drive foot traffic past impulse items).

The "right" strategy depends on the retailer's objective. Convenience stores optimize for speed (co-locate). Supermarkets often optimize for exposure (separate, so the customer walks past more shelves).

Bundle Pricing

Rules with high confidence suggest natural bundles:

# Find rules with high confidence for potential bundles
bundle_candidates = rules_all[
    (rules_all['confidence'] >= 0.6) &
    (rules_all['lift'] >= 1.3) &
    (rules_all['support'] >= 0.15)
].sort_values('confidence', ascending=False)

print("Bundle candidates (confidence >= 60%, lift >= 1.3):")
for _, row in bundle_candidates.iterrows():
    items = set(row['antecedents']) | set(row['consequents'])
    print(f"  Bundle: {items}")
    print(f"    Confidence: {row['confidence']:.0%}, Lift: {row['lift']:.2f}, "
          f"Support: {row['support']:.0%}")

Part 7: Common Pitfalls and Misinterpretations

Pitfall 1: Confusing Correlation with Causation

Association rules are correlational. The rule {diapers} -> {beer} does not mean buying diapers causes beer purchases. It means they co-occur in the same baskets. The causal mechanism (new parents buying both in one trip) requires domain knowledge, not the algorithm.

Pitfall 2: Ignoring Lift and Relying on Confidence

A rule with confidence 85% seems strong. But if the consequent appears in 80% of all transactions, the lift is 1.06. The rule is essentially saying "people buy popular items." This is the most common mistake practitioners make. Always check lift.

Pitfall 3: Setting min_support Too High

High min_support finds only rules among the most popular items --- items that everyone already knows about. The interesting rules often involve moderately popular items with strong lift. A practical approach: start with min_support = 0.01 and filter by lift.

Pitfall 4: Generating Too Many Rules and Not Filtering

With min_support = 0.001 and no lift filter, you can generate hundreds of thousands of rules. Nobody can act on hundreds of thousands of rules. A good filtering pipeline (Part 4 of this chapter) reduces the output to tens or hundreds of actionable rules.

Pitfall 5: Treating Rules as Symmetric

The rule {bread} -> {cheese} has different confidence and conviction from {cheese} -> {bread}. The business implications differ too: recommending cheese to bread buyers is different from recommending bread to cheese buyers (cheese buyers may be a more niche, higher-margin segment).
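The toy grocery data from Part 1 makes the point concrete: bread appears in 7 of 10 baskets, cheese in 3, and they co-occur in 2.

```python
# Hand-counted supports from the ten-basket example in Part 1
sup_bread, sup_cheese, sup_both = 0.7, 0.3, 0.2

conf_bread_to_cheese = sup_both / sup_bread   # ~0.29: weak rule
conf_cheese_to_bread = sup_both / sup_cheese  # ~0.67: much stronger rule
print(f"{conf_bread_to_cheese:.2f} vs {conf_cheese_to_bread:.2f}")
```

Same itemset, same (symmetric) lift, very different confidence: only the cheese-to-bread direction would clear a 0.6 confidence bar.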

Practical Note --- The best association rule analyses produce a short, curated list of rules accompanied by a recommended action for each one. Twenty rules with clear actions are worth more than two thousand rules in a spreadsheet that nobody reads.


Chapter Summary

Association rules are straightforward: count co-occurrences, compute support, confidence, and lift, and filter for patterns that are both frequent and genuinely above chance. The Apriori algorithm uses the downward closure property to prune the search space; FP-Growth compresses the database into a tree to avoid candidate generation entirely. Lift is the critical filter --- it corrects for base rates and separates genuinely interesting patterns from artifacts of popularity. Conviction and Zhang's metric add directional and bounded alternatives.

The real skill is not running the algorithm. It is choosing the right support threshold, filtering the output to actionable rules, and translating patterns into business decisions: product recommendations, store layout changes, bundle pricing, and cross-sell campaigns. In the next two case studies, you will see this applied to e-commerce transactions (ShopSmart) and streaming behavior (StreamFlow).


Next: Case Study 1 --- ShopSmart Market Basket Analysis