Case Study 1: StreamRec Technical Strategy — Prioritizing Modeling vs. Infrastructure

Context

The StreamRec recommendation team has completed 36 chapters' worth of progressive project milestones. The system works: two-tower retrieval, transformer-based ranking, feature store, deployment pipeline, monitoring, fairness audits, and causal evaluation are all operational. Hit@10 is 0.17, NDCG@10 is 0.09, p99 latency is 180ms, and the causal ATE of recommendations on 30-day engagement is 4.1% (Chapter 36).

The VP of Product has declared four priorities for the next twelve months:

  1. Reduce new-user 7-day churn from 42% to 35%
  2. Increase creator content diversity (reduce language-based exposure Gini from 0.48 to 0.30)
  3. Achieve sub-100ms p99 serving latency
  4. Establish causal evaluation as the standard for all recommendation experiments
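
Priority 2 is stated in terms of the exposure Gini coefficient. For reference, here is a minimal sketch of that metric, computed over illustrative per-creator impression counts (the production metric would aggregate logged exposures by creator language group; the function name and inputs are assumptions for illustration):

```python
def exposure_gini(exposures) -> float:
    """Gini coefficient of exposure counts.

    0.0 means perfectly equal exposure across creators; values
    near 1.0 mean exposure is concentrated on a few creators.
    """
    total = sum(exposures)
    if not exposures or total == 0:
        return 0.0
    xs = sorted(exposures)
    n = len(xs)
    # G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, 1-indexed
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n


print(round(exposure_gini([100, 100, 100, 100]), 2))  # 0.0 (equal exposure)
print(round(exposure_gini([970, 10, 10, 10]), 2))     # 0.72 (concentrated)
```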

The newly appointed staff data scientist — you — must write the technical strategy. The team has 9 people: 2 senior DS, 3 mid-level DS, 1 junior DS, 2 ML engineers, and 1 data engineer. There is no new headcount approved. Every hour invested in one priority is an hour not invested in another.

The central tension is this: should the team invest primarily in modeling improvements (better cold-start handling, diversity-aware ranking) or infrastructure improvements (latency optimization, causal evaluation pipeline)? Both are necessary. Neither is sufficient alone. The strategy must make choices.

The Analysis

Diagnosing the Bottleneck

The staff DS begins by mapping the current system's capabilities against the four priorities:

Priority 1: New-user churn
  Current state: Cold-start uses popularity fallback; no personalization until 5+ interactions
  Gap: Large (42% churn vs. 35% target)
  Primary constraint: Modeling (Bayesian cold-start, Ch. 20-22)

Priority 2: Creator diversity
  Current state: No diversity intervention in ranking or re-ranking pipeline
  Gap: Large (Gini 0.48 vs. 0.30 target)
  Primary constraint: Modeling + serving (exposure-aware re-ranking, Ch. 31, plus FAISS incremental updates)

Priority 3: Latency
  Current state: p99 = 180ms; two-tower retrieval (15ms), ranker (42ms), feature store (35ms), overhead (88ms)
  Gap: Medium (180ms vs. 100ms target)
  Primary constraint: Infrastructure (feature caching, model distillation, serving optimization)

Priority 4: Causal evaluation
  Current state: Implemented for the capstone evaluation; not standardized across the team
  Gap: Medium (one-off analysis vs. automated pipeline)
  Primary constraint: Infrastructure (doubly robust estimation pipeline, experiment report template)

The diagnosis reveals that Priorities 1 and 2 are primarily modeling bottlenecks, while Priorities 3 and 4 are primarily infrastructure bottlenecks. This is not surprising: the team invested heavily in infrastructure during the production systems phase (Chapters 24-30), so the infrastructure is reasonably mature. The next marginal unit of effort therefore creates more value when applied to modeling than to infrastructure.

Quantifying the Tradeoffs

The staff DS estimates the impact and effort for each priority:

Priority 1: Cold-start model
  Estimated effort: 12 person-weeks
  Expected impact: 4-7pp churn reduction
  Confidence: Medium
  Revenue proxy: $8-14M annual (assuming $200 LTV, 10M new users/year)

Priority 2: Diversity intervention
  Estimated effort: 10 person-weeks
  Expected impact: Gini reduction 0.48 → 0.30
  Confidence: Medium-High
  Revenue proxy: $2-5M annual (creator retention, content quality)

Priority 3: Latency optimization
  Estimated effort: 14 person-weeks
  Expected impact: 180ms → 95ms p99
  Confidence: High
  Revenue proxy: $3-6M annual (engagement lift from faster serving)

Priority 4: Causal evaluation
  Estimated effort: 6 person-weeks
  Expected impact: N/A (process improvement)
  Confidence: High
  Revenue proxy: Indirect (prevents misallocation of future investments)

The team has approximately 36 person-weeks of project capacity per quarter: 9 people, each contributing roughly 4 weeks of focused project time per quarter once operational work, on-call, and overhead are discounted. Over four quarters, that is roughly 144 person-weeks of available capacity: enough for all four priorities if they are sequenced correctly, but not enough to run all four in parallel.
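A quick feasibility and prioritization check over these estimates can be sketched as follows (the effort and revenue figures come from the tradeoff table; scoring Priority 4 at zero revenue is a simplifying assumption, since its value is indirect and justified qualitatively):

```python
# Effort and revenue-proxy midpoints from the tradeoff table ($M/year).
priorities = {
    "cold_start": {"effort_pw": 12, "revenue_m": (8 + 14) / 2},
    "diversity": {"effort_pw": 10, "revenue_m": (2 + 5) / 2},
    "latency": {"effort_pw": 14, "revenue_m": (3 + 6) / 2},
    "causal_eval": {"effort_pw": 6, "revenue_m": 0.0},
}

quarterly_capacity_pw = 36
annual_capacity_pw = 4 * quarterly_capacity_pw  # 144 person-weeks

total_effort_pw = sum(p["effort_pw"] for p in priorities.values())
# All four priorities fit within a year of capacity, but not within
# a single quarter; hence the need for sequencing.
assert total_effort_pw <= annual_capacity_pw
assert total_effort_pw > quarterly_capacity_pw

# Rank by revenue midpoint per person-week as a sanity check on the
# sequencing: cold start comes out on top.
ranked = sorted(
    priorities.items(),
    key=lambda kv: kv[1]["revenue_m"] / kv[1]["effort_pw"],
    reverse=True,
)
for name, p in ranked:
    print(f"{name}: {p['revenue_m'] / p['effort_pw']:.2f} $M/person-week")
```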

The Strategy Decision

The staff DS proposes the following quarterly sequencing:

Q1: Cold-start model (Priority 1) + Causal evaluation framework (Priority 4)

Rationale: Priority 1 has the highest expected revenue impact and directly addresses the VP's top concern. Priority 4 requires the least effort (6 person-weeks) and, once in place, improves the quality of evaluation for every subsequent priority. Starting these together means the cold-start model's impact can be measured with causal rigor from the start.
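The doubly robust estimator at the core of that framework (named in the gap analysis) can be sketched in a few lines. This is a minimal illustration with plug-in predictions passed as plain lists, and the function name is an assumption; a production pipeline would use cross-fitted models for the propensities and outcomes:

```python
def doubly_robust_ate(y, t, e, m1, m0):
    """Doubly robust estimate of the average treatment effect.

    Args:
        y:  observed outcomes.
        t:  treatment indicators (0/1).
        e:  estimated propensity scores P(T=1 | X).
        m1: outcome-model predictions E[Y | T=1, X].
        m0: outcome-model predictions E[Y | T=0, X].

    The estimate is consistent if either the propensity model or
    the outcome model is correctly specified (hence "doubly robust").
    """
    terms = [
        m1i - m0i
        + ti * (yi - m1i) / ei
        - (1 - ti) * (yi - m0i) / (1 - ei)
        for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0)
    ]
    return sum(terms) / len(terms)


# Toy check: with exact outcome models, the residual corrections
# vanish and the estimate recovers the true effect (0.4 here).
ate = doubly_robust_ate(
    y=[1.0, 0.6, 1.0, 0.6],
    t=[1, 0, 1, 0],
    e=[0.5, 0.5, 0.5, 0.5],
    m1=[1.0, 1.0, 1.0, 1.0],
    m0=[0.6, 0.6, 0.6, 0.6],
)
```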

Team allocation:
  - Cold-start model: 2 senior DS (Bayesian modeling), 1 mid-level DS (feature engineering), 1 MLE (serving integration). 12 person-weeks.
  - Causal evaluation: 1 mid-level DS (pipeline development), 1 senior DS (methodology review, 25% time). 6 person-weeks.
  - Remaining capacity: 1 mid-level DS + 1 MLE + 1 DE on operational work and preparing for Q2.

Q2: Diversity intervention (Priority 2) + begin latency work (Priority 3)

Rationale: The diversity intervention requires a re-ranking component in the serving path, which will initially increase latency by 10-15ms. Starting latency work in Q2 means the latency optimization can offset the re-ranker's latency cost, avoiding a regression on Priority 3 while pursuing Priority 2.
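The offset requirement is simple arithmetic, but writing it down makes "fast enough" concrete (illustrative figures from the text, taking the pessimistic end of the re-ranker's cost):

```python
current_p99_ms = 180.0
reranker_cost_ms = 15.0  # pessimistic end of the 10-15ms estimate

# Phase 1 must save at least the re-ranker's cost by the time the
# re-ranker ships, or deploying it regresses the current p99.
min_phase1_saving_ms = reranker_cost_ms

projected_p99_ms = current_p99_ms + reranker_cost_ms - min_phase1_saving_ms
assert projected_p99_ms <= current_p99_ms  # no regression on Priority 3
```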

Team allocation:
  - Diversity re-ranker: 2 senior DS (fairness modeling, exposure-aware re-ranking), 1 mid-level DS (evaluation). 10 person-weeks.
  - Latency Phase 1: 2 MLE (feature caching, serving optimization), 1 DE (batch pipeline optimization). 8 person-weeks.

Q3: Latency optimization Phase 2 (Priority 3) + cold-start model iteration

Rationale: Phase 2 of latency work — model distillation and FAISS tuning — requires the serving optimizations from Phase 1 to be in place. The cold-start model has been in production for two quarters; the causal evaluation framework provides data to identify specific failure modes and guide the second iteration.

Q4: Integration, consolidation, and roadmap for Year 2

Rationale: The final quarter focuses on paying down the technical debt accumulated during the three rapid-deployment quarters, conducting a comprehensive fairness audit of the combined system (cold-start + diversity + latency changes), and writing the Year 2 technical strategy.

The Key Insight: Sequencing Creates Dependencies, and Dependencies Create Risk

The staff DS identifies a critical dependency: the diversity re-ranker (Q2) adds latency, which means Priority 3 must progress fast enough in Q2 to offset the regression. If latency optimization falls behind, the team faces a choice: deploy the diversity re-ranker and accept temporarily higher latency (violating Priority 3), or delay the diversity re-ranker (delaying Priority 2).

The mitigation is to build the diversity re-ranker with a latency budget parameter:

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DiversityReranker:
    """Exposure-aware diversity re-ranker with latency budget.

    Re-ranks candidate items to improve creator diversity while
    respecting a configurable latency constraint.

    Attributes:
        target_gini: Target exposure Gini coefficient.
        latency_budget_ms: Maximum allowed latency for re-ranking.
        diversity_weight: Weight of diversity vs. relevance (0-1).
        fallback_mode: If True, skip re-ranking when over budget.
    """
    target_gini: float = 0.30
    latency_budget_ms: float = 15.0
    diversity_weight: float = 0.3
    fallback_mode: bool = True

    def rerank(
        self,
        candidates: List[Dict],
        user_context: Dict,
        current_latency_ms: float,
    ) -> List[Dict]:
        """Re-rank candidates for diversity within latency budget.

        If current pipeline latency plus estimated re-ranking time
        would exceed the total latency SLA, falls back to the
        original ranking (preserving latency at the cost of diversity).

        Args:
            candidates: Ranked candidate items with scores.
            user_context: User features for personalized diversity.
            current_latency_ms: Accumulated latency so far in pipeline.

        Returns:
            Re-ranked candidates respecting latency budget.
        """
        if self.fallback_mode and (
            current_latency_ms + self.latency_budget_ms > 100.0
        ):
            # Re-ranking would push the pipeline past the 100ms p99
            # SLA (Priority 3); fall back to the original order.
            return candidates

        # Greedy MMR-style re-ranking with diversity objective.
        reranked: List[Dict] = []
        remaining = list(candidates)

        while remaining:
            best_idx = 0
            best_score = float("-inf")

            for i, item in enumerate(remaining):
                relevance = item.get("score", 0.0)
                diversity = self._diversity_gain(item, reranked)
                combined = (
                    (1 - self.diversity_weight) * relevance
                    + self.diversity_weight * diversity
                )
                if combined > best_score:
                    best_score = combined
                    best_idx = i

            reranked.append(remaining.pop(best_idx))

        return reranked

    def _diversity_gain(
        self, candidate: Dict, selected: List[Dict]
    ) -> float:
        """Compute diversity gain of adding candidate to selected set.

        Measures how different the candidate's creator attributes are
        from the already-selected items.

        Args:
            candidate: Candidate item to evaluate.
            selected: Already-selected items.

        Returns:
            Diversity score between 0 and 1.
        """
        if not selected:
            return 1.0
        languages = {item.get("creator_language", "unknown") for item in selected}
        candidate_lang = candidate.get("creator_language", "unknown")
        return 1.0 if candidate_lang not in languages else 0.0

This design embodies Theme 6 (Simplest Model That Works): the re-ranker uses a greedy MMR-style algorithm with a latency-aware fallback. If the serving pipeline is already near its latency budget, the re-ranker gracefully degrades to the original ranking rather than violating the SLA. The team can replace the greedy algorithm with a more sophisticated optimization in a future iteration — but the fallback mechanism ensures that deploying the re-ranker never makes latency worse.

The Organizational Challenge

The technical strategy is only half the problem. The other half is organizational.

Senior DS A is the team's deep learning expert and is eager to work on the cold-start model. However, they have no experience with Bayesian methods. The staff DS pairs them with Senior DS B (who has Bayesian experience from Chapter 21) and assigns them to co-lead the cold-start project — creating a knowledge transfer opportunity embedded in the project work rather than a separate training activity.

Mid-level DS C has been asking to work on "more impactful projects." The staff DS assigns them to the causal evaluation framework — a high-visibility infrastructure project that is well-scoped enough for a mid-level DS to own, with senior DS review. This is a stretch assignment: it develops the mid-level DS's skills in causal inference (a gap in the team, per the team assessment) and produces a visible deliverable that supports their promotion case.

The data engineer is concerned that the latency optimization work will require changes to the feature pipeline that conflict with their planned migration to a new batch processing framework. The staff DS schedules a 1:1 to understand the conflict and discovers that the feature caching strategy for latency optimization (storing pre-computed features in Redis with a 15-minute refresh) actually reduces the feature pipeline's serving-path criticality — making the batch migration easier, not harder. This alignment was not obvious until the staff DS connected the two projects in a single conversation.
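The feature caching strategy can be sketched as a read-through cache with a fixed TTL. An in-process dict stands in for Redis here, and the class name and loader interface are assumptions for illustration:

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLFeatureCache:
    """Read-through feature cache with a fixed refresh interval.

    Serving reads hit the cache first; on a miss or an expired
    entry, the loader (e.g. a feature store client) is called and
    the result cached. The 15-minute default TTL matches the
    refresh cadence described in the text.
    """

    def __init__(self, loader: Callable[[str], Any], ttl_s: float = 15 * 60):
        self._loader = loader
        self._ttl_s = ttl_s
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]            # fresh cache hit
        value = self._loader(key)      # miss or stale: read through
        self._store[key] = (now, value)
        return value


# Usage: the second read is served from the cache.
calls = []

def feature_store_lookup(user_id: str) -> dict:
    calls.append(user_id)  # stands in for a feature store read
    return {"user_id": user_id}

cache = TTLFeatureCache(feature_store_lookup)
features = cache.get("u1")
features = cache.get("u1")  # loader not called again
```

Taking the feature store off the synchronous serving path this way is what makes the batch migration easier: the pipeline only needs to keep the cache warm, not answer every request.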

Outcome and Reflection

The strategy document produces a 12-month plan that:

  1. Sequences four priorities across four quarters, respecting dependencies and team capacity
  2. Allocates team members to projects based on both skill match and development goals
  3. Includes a latency-aware fallback mechanism that prevents the diversity intervention from regressing Priority 3
  4. Embeds knowledge transfer (Bayesian methods, causal inference) into project work rather than separate training
  5. Uses the causal evaluation framework (Priority 4) as the measurement backbone for all subsequent priorities

The staff DS presents the strategy to the VP of Product using a three-page summary: one page on the vision and key bets, one page on the quarterly roadmap with expected outcomes, and one page on risks and mitigations. The technical details — the diversity re-ranker algorithm, the feature caching strategy, the causal estimation methodology — are in an appendix, available if asked but not presented proactively.

The VP's response: "I like that you've sequenced churn first. When will we see the first results?" The staff DS: "The cold-start model deploys end of Q1. The causal evaluation framework will give us a rigorous impact estimate within two weeks of deployment. I'll present the Q1 results at the April leadership review."

This is what staff-level work looks like: not building the model, but ensuring that the right model gets built, by the right people, in the right order, with the right measurement, and with the right communication to the people who fund it.