Case Study 2: Cross-Domain Comparison — How Integration Principles Transfer to Credit Scoring, Pharma, and Climate
Context
The StreamRec capstone teaches integration through a specific system: a content recommendation platform. But the principles — component composition, architectural tradeoffs, multi-level evaluation, stakeholder communication, and technical debt management — are domain-independent. This case study examines how three other systems from the book's anchor examples face the same integration challenges with fundamentally different constraints.
The comparison reveals that while components change across domains, the craft of integration does not.
System 1: Meridian Financial Credit Scoring
Meridian Financial (Chapters 24, 28, 29, 31, 35) operates a credit scoring system that processes 15,000 applications per day. The system uses a gradient-boosted tree ensemble (XGBoost, 500 trees, 200 features) to score applicants on a 0-1 probability-of-default scale.
Architecture Mapping
| StreamRec Component | Meridian Equivalent | Key Difference |
|---|---|---|
| Two-tower retrieval | Not applicable | Every applicant is scored; no candidate-generation stage |
| MLP/transformer ranker | XGBoost credit scorer | Single model, no funnel architecture |
| Re-ranker (diversity, fairness) | Threshold policy + adverse action | Regulatory requirement, not optimization |
| Feature store | Feature platform (Hive + Redis) | More features (200 vs. ~50), slower refresh (daily) |
| Canary deployment | Parallel scoring period | Both models score all applicants; no traffic split |
| Fairness audit | ECOA compliance audit | Legally mandated (not voluntary), protected attributes defined by law |
| Monitoring (PSI drift) | Model risk monitoring (SR 11-7) | Quarterly validation, not continuous; regulatory reporting |
| Causal evaluation | Not standard | Causal impact of credit decisions is difficult to measure (no counterfactual for rejected applicants) |
Integration Challenges Unique to Credit Scoring
Challenge 1: Regulatory approval gates. Every model change at Meridian requires independent validation by the Model Risk Management (MRM) team — a separate group that does not report to the data science team. The MRM team reviews the model documentation, replicates the validation results, and issues a formal approval. This adds 10-15 business days to every deployment. The integration implication: the deployment pipeline must produce regulatory-grade documentation automatically (model cards, validation reports, fairness audit summaries) because manual documentation is both slower and more error-prone.
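The documentation requirement can be sketched as a pipeline step that assembles a machine-readable validation report from deployment artifacts. The schema, field names, and metric values below are illustrative assumptions, not Meridian's actual MRM format:

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class ValidationReport:
    """One regulatory-grade artifact emitted by the deployment pipeline (illustrative schema)."""
    model_version: str
    training_data_window: str
    auc: float
    psi_vs_prior_model: float
    fairness_summary: dict


def render_for_mrm(report: ValidationReport) -> str:
    """Serialize deterministically so MRM reviewers can diff successive versions."""
    return json.dumps(asdict(report), indent=2, sort_keys=True)


report = ValidationReport(
    model_version="xgb-2024-q3",       # hypothetical version tag
    training_data_window="2019-01 to 2024-06",
    auc=0.81,                          # invented metric values
    psi_vs_prior_model=0.04,
    fairness_summary={"adverse_impact_ratio": 0.92},
)
doc = render_for_mrm(report)
```

Serializing with `sort_keys` means two pipeline runs over the same artifacts produce byte-identical reports, which is what makes automated, diffable documentation feasible.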
Challenge 2: Adverse action requirements. When Meridian denies a credit application, federal law (ECOA, Regulation B) requires that the applicant receive specific reasons for the denial. This mandates feature-level explanations for every negative decision — not just aggregate interpretability. The integration implication: the SHAP computation from Chapter 35 is not a nice-to-have; it is a serving-path component that must execute within the decision latency and produce legally compliant reason codes.
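A minimal sketch of the reason-code step: given per-feature attributions (which in production would come from the SHAP computation), select the top factors pushing the score toward denial and map them to applicant-facing codes. The feature names, attribution values, and code wording are hypothetical:

```python
# Illustrative mapping from internal feature names to applicant-facing reason codes.
REASON_CODES = {
    "utilization": "Proportion of revolving balances to credit limits is too high",
    "delinquencies": "Number of recent delinquencies",
    "inquiries": "Too many recent credit inquiries",
    "history_length": "Length of credit history",
}


def adverse_action_reasons(attributions: dict[str, float], top_k: int = 3) -> list[str]:
    """Return reason codes for the top_k features pushing the score toward denial.

    Convention here: positive attribution = pushes probability of default up.
    """
    negative_factors = sorted(attributions.items(), key=lambda kv: kv[1], reverse=True)
    return [REASON_CODES[name] for name, value in negative_factors[:top_k] if value > 0]


# Hypothetical per-applicant attributions (in production: SHAP values).
attrs = {"utilization": 0.12, "delinquencies": 0.08, "inquiries": 0.03, "history_length": -0.05}
reasons = adverse_action_reasons(attrs)
```

The key integration point is that this function sits on the serving path: it must run for every denial, within the decision latency, not as an offline analysis.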
Challenge 3: The selective labels problem. Meridian can only observe default outcomes for applicants who were approved. Applicants who were denied might have repaid their loans — but the team will never know. This creates a fundamental limitation for causal evaluation: unlike StreamRec (where causal impact can be estimated via A/B testing), Meridian cannot randomly approve rejected applicants to measure the counterfactual. The integration implication: evaluation at Meridian relies more heavily on offline backtesting, stability analysis, and stress testing (Chapter 28) than on causal impact estimation.
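The selective labels problem can be demonstrated with a small synthetic simulation (all numbers invented): because only low-risk applicants are approved, the default rate the lender can actually observe understates the population rate it would need for unbiased evaluation.

```python
import random

random.seed(0)

# Synthetic population: each applicant has a true default probability.
population = [random.random() for _ in range(10_000)]

# The lender approves only applicants judged low-risk.
approved = [p for p in population if p < 0.4]


def realized_default_rate(probs):
    """Simulate outcomes: each applicant defaults with their true probability."""
    return sum(1 for p in probs if random.random() < p) / len(probs)


observed = realized_default_rate(approved)      # what the lender can measure
true_rate = realized_default_rate(population)   # unobservable in practice
```

The observed rate (approved applicants only) lands near 0.2 while the population rate is near 0.5 — any evaluation built on observed labels alone inherits this selection bias.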
ADR Example
ADR: Model Explainability in the Serving Path
- Context: ECOA requires adverse action reasons for every denial. SHAP computation for 500-tree XGBoost takes ~200ms per applicant on CPU, ~15ms on GPU.
- Decision: Pre-compute SHAP values for all feature combinations at each score decile. Serve pre-computed explanations via lookup table. Fall back to real-time SHAP for edge cases outside the pre-computed range.
- Alternative rejected: Real-time SHAP for every applicant. Rejected due to GPU infrastructure cost and latency variability at the p99 level.
- Consequence: Explanations are approximate (decile-level, not applicant-level) for 95% of decisions. Regulatory counsel confirmed this meets ECOA requirements because the reason codes (top-3 negative factors) are consistent within score deciles.
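A sketch of the decision's serving logic, under the assumption that pre-computed explanations are keyed by score decile; the lookup contents and the real-time fallback are placeholders, not the actual SHAP pipeline:

```python
# Pre-computed decile-level reason codes (placeholder contents).
DECILE_EXPLANATIONS = {
    d: [f"reason_code_{d}_1", f"reason_code_{d}_2", f"reason_code_{d}_3"]
    for d in range(10)
}


def compute_shap_realtime(score: float) -> list[str]:
    """Placeholder for the ~200ms CPU SHAP path used only for edge cases."""
    return ["realtime_fallback"]


def explain(score: float) -> list[str]:
    """Fast path: decile lookup. Slow path: real-time SHAP for out-of-range scores."""
    if 0.0 <= score <= 1.0:
        decile = min(int(score * 10), 9)  # clamp score == 1.0 into the top decile
        return DECILE_EXPLANATIONS[decile]
    return compute_shap_realtime(score)
```

The lookup turns a per-applicant computation into an O(1) table read, which is what removes the GPU cost and the p99 latency variability cited in the ADR.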
System 2: MediCore Pharma Clinical Trial Analysis
MediCore (Chapters 16, 17, 18, 19, 21, 32, 34) operates a multi-site clinical trial analysis platform that integrates data from 12 hospital sites to estimate treatment effects for a cardiovascular drug.
Architecture Mapping
| StreamRec Component | MediCore Equivalent | Key Difference |
|---|---|---|
| Feature store | Multi-site data integration platform | Federated — data cannot leave hospital sites |
| Training pipeline | Statistical analysis pipeline | Causal estimation, not prediction |
| Model evaluation | Treatment effect estimation | Primary output is ATE/CATE, not predictions |
| Fairness audit | Subgroup analysis | Required by FDA for regulatory submission |
| Privacy | Federated learning + DP | Not optional — mandated by HIPAA |
| Monitoring | Data quality monitoring | No live serving; batch analysis |
| Deployment | Regulatory submission package | "Deploy" means submit to FDA, not serve predictions |
Integration Challenges Unique to Pharma
Challenge 1: Federated data integration. Patient-level data cannot leave hospital sites due to HIPAA and institutional data use agreements. The analysis must be performed without centralizing data — using federated causal estimation (Chapter 32). The integration implication: every component (data validation, feature engineering, model training, subgroup analysis) must operate in a federated setting where the platform orchestrates computation across sites without accessing raw data.
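The orchestration pattern can be sketched with a deliberately simple difference-in-means estimator — a stand-in for Chapter 32's federated causal methods. Each site function runs locally and returns only aggregate counts and sums; the coordinator combines them without ever touching patient-level rows. Site data here are synthetic:

```python
def site_summary(outcomes_treated, outcomes_control):
    """Runs *inside* a hospital site; only aggregates cross the boundary."""
    return {
        "n_t": len(outcomes_treated), "sum_t": sum(outcomes_treated),
        "n_c": len(outcomes_control), "sum_c": sum(outcomes_control),
    }


def pooled_ate(summaries):
    """Runs at the coordinator, which never sees raw patient data."""
    n_t = sum(s["n_t"] for s in summaries)
    n_c = sum(s["n_c"] for s in summaries)
    mean_t = sum(s["sum_t"] for s in summaries) / n_t
    mean_c = sum(s["sum_c"] for s in summaries) / n_c
    return mean_t - mean_c


# Two synthetic sites, each contributing only its local summary.
summaries = [
    site_summary([1.2, 0.8, 1.0], [0.5, 0.7]),
    site_summary([0.9, 1.1], [0.6, 0.4, 0.5]),
]
ate = pooled_ate(summaries)
```

Every other pipeline stage (validation, feature engineering, subgroup analysis) must be decomposable into the same local-compute / aggregate-combine shape.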
Challenge 2: Pre-registered analysis plan. Clinical trials require a Statistical Analysis Plan (SAP) — the pharma equivalent of the ADR and TDD combined — that specifies every analysis, endpoint, subgroup, and statistical method before the data is unblinded. Deviations from the SAP require formal justification. The integration implication: the analysis pipeline must exactly implement the SAP, and any post-hoc analysis must be clearly labeled as exploratory (not confirmatory).
Challenge 3: Uncertainty is the primary deliverable. In StreamRec, the primary deliverable is a recommendation list. Uncertainty is supplementary (Track C). In pharma, uncertainty is the deliverable — the entire purpose of the analysis is to estimate a treatment effect with a confidence interval narrow enough to support (or reject) a regulatory claim. The integration implication: every component (Chapter 34's conformal prediction, Chapter 21's Bayesian hierarchical models, Chapter 18's doubly robust estimation) is calibrated to produce reliable uncertainty estimates, and the evaluation framework tests calibration (not just accuracy).
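What "testing calibration" means can be illustrated with a minimal coverage check: a procedure that claims 95% intervals should cover the true value in roughly 95% of repeated trials. The interval construction below is a textbook normal interval used as a stand-in, not the book's conformal or Bayesian machinery:

```python
import random

random.seed(42)


def interval(estimate, half_width=1.96):
    """Nominal 95% interval for an estimate with unit-variance sampling noise."""
    return (estimate - half_width, estimate + half_width)


true_effect = 0.0
trials = 2_000
covered = 0
for _ in range(trials):
    estimate = random.gauss(true_effect, 1.0)  # simulated sampling noise, sd = 1
    lo, hi = interval(estimate)
    covered += lo <= true_effect <= hi

coverage = covered / trials  # should land near 0.95 if the procedure is calibrated
```

A calibration test like this — empirical coverage vs. nominal coverage — is part of the evaluation suite, on equal footing with point-estimate accuracy.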
```python
from dataclasses import dataclass


@dataclass
class DomainIntegrationProfile:
    """Integration characteristics for a specific domain.

    Captures the key constraints and priorities that shape how
    components are integrated in each domain.

    Attributes:
        domain: Domain name.
        latency_regime: 'realtime' (<1s), 'interactive' (1-60s), 'batch' (minutes+).
        regulatory_burden: 'none', 'moderate', 'heavy'.
        primary_deliverable: What the system produces.
        causal_evaluation_feasible: Whether A/B testing or causal estimation is possible.
        fairness_mandate: 'voluntary', 'industry_norm', 'legally_required'.
        privacy_constraint: 'standard', 'regulated', 'federated'.
        deployment_meaning: What "deployment" means in this domain.
        retraining_cadence: How often the model is retrained.
    """

    domain: str
    latency_regime: str
    regulatory_burden: str
    primary_deliverable: str
    causal_evaluation_feasible: bool
    fairness_mandate: str
    privacy_constraint: str
    deployment_meaning: str
    retraining_cadence: str


# Compare all four domains
profiles = [
    DomainIntegrationProfile(
        domain="StreamRec (Recommendation)",
        latency_regime="realtime (<200ms)",
        regulatory_burden="none",
        primary_deliverable="Ranked item list",
        causal_evaluation_feasible=True,
        fairness_mandate="voluntary",
        privacy_constraint="standard",
        deployment_meaning="Serve to users via API",
        retraining_cadence="Weekly",
    ),
    DomainIntegrationProfile(
        domain="Meridian (Credit Scoring)",
        latency_regime="interactive (<5s)",
        regulatory_burden="heavy",
        primary_deliverable="Risk score + adverse action reasons",
        causal_evaluation_feasible=False,
        fairness_mandate="legally_required",
        privacy_constraint="regulated",
        deployment_meaning="Replace production model (with MRM approval)",
        retraining_cadence="Quarterly",
    ),
    DomainIntegrationProfile(
        domain="MediCore (Pharma)",
        latency_regime="batch (hours)",
        regulatory_burden="heavy",
        primary_deliverable="Treatment effect estimate + CI",
        causal_evaluation_feasible=True,
        fairness_mandate="legally_required",
        privacy_constraint="federated",
        deployment_meaning="Submit to FDA",
        retraining_cadence="Per trial (one-time)",
    ),
    DomainIntegrationProfile(
        domain="Climate Forecasting",
        latency_regime="batch (hours)",
        regulatory_burden="none",
        primary_deliverable="Spatiotemporal forecast + uncertainty",
        causal_evaluation_feasible=False,
        fairness_mandate="industry_norm",
        privacy_constraint="standard",
        deployment_meaning="Publish forecast to stakeholders",
        retraining_cadence="Monthly (with new satellite data)",
    ),
]

# Print comparison table
print(f"{'Domain':<30s} {'Latency':<20s} {'Regulation':<12s} "
      f"{'Fairness':<18s} {'Retrain':<12s}")
print("-" * 92)
for p in profiles:
    print(f"{p.domain:<30s} {p.latency_regime:<20s} "
          f"{p.regulatory_burden:<12s} {p.fairness_mandate:<18s} "
          f"{p.retraining_cadence:<12s}")
```

```text
Domain                         Latency              Regulation   Fairness           Retrain
--------------------------------------------------------------------------------------------
StreamRec (Recommendation)     realtime (<200ms)    none         voluntary          Weekly
Meridian (Credit Scoring)      interactive (<5s)    heavy        legally_required   Quarterly
MediCore (Pharma)              batch (hours)        heavy        legally_required   Per trial (one-time)
Climate Forecasting            batch (hours)        none         industry_norm      Monthly (with new satellite data)
```
System 3: Climate Forecasting
The climate forecasting system (Chapters 1, 4, 8, 9, 10, 23, 26, 34) produces 7-day regional temperature and precipitation forecasts using a temporal fusion transformer (Chapter 23) trained on satellite, weather station, and reanalysis data.
Architecture Mapping
| StreamRec Component | Climate Equivalent | Key Difference |
|---|---|---|
| Feature store | Data ingestion pipeline | Geospatial data (satellite, station, reanalysis), not user behavior |
| Training pipeline | Monthly retraining with new satellite data | Global model, not per-user |
| Ranking model | Temporal fusion transformer | Regression (temperature, precipitation), not ranking |
| Uncertainty quantification | Conformal prediction + deep ensemble | Core requirement, not optional (Track C) |
| Monitoring | Forecast skill monitoring | Compared to climatological baseline and persistence |
| Fairness audit | Regional equity analysis | Forecast accuracy should not systematically differ by region or population |
Integration Challenges Unique to Climate
Challenge 1: No A/B testing. You cannot show different weather to different people. Causal evaluation of forecast quality improvements must rely on backtesting (historical re-forecasting), comparison to operational baselines (NWS, ECMWF), and ablation studies. The integration implication: the evaluation pipeline must support temporal cross-validation with embargo periods (Chapter 23) and multi-baseline comparison.
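A sketch of the embargoed-split logic: each fold trains on history up to a cutoff, skips an embargo window to prevent leakage from autocorrelated weather, then evaluates on a held-out block. The day counts are illustrative:

```python
def embargoed_splits(n_days: int, n_folds: int, test_len: int, embargo: int):
    """Return (train_end, test_start, test_end) day indices for each fold.

    Training uses days [0, train_end); the embargo gap [train_end, test_start)
    is discarded so autocorrelated observations cannot leak across the split.
    """
    splits = []
    for k in range(1, n_folds + 1):
        test_end = n_days - (n_folds - k) * test_len
        test_start = test_end - test_len
        train_end = test_start - embargo
        if train_end <= 0:
            raise ValueError("not enough history for this fold configuration")
        splits.append((train_end, test_start, test_end))
    return splits


# Illustrative: 10 years of daily data, 3 folds, 1-year test blocks, 7-day embargo.
splits = embargoed_splits(n_days=3650, n_folds=3, test_len=365, embargo=7)
```

Each fold's forecasts can then be scored against both the climatological baseline and the persistence baseline, as the text requires.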
Challenge 2: Spatial fairness. Climate forecasts are used for agricultural planning, disaster preparedness, and insurance pricing. If the forecast is systematically less accurate in low-income rural areas (where weather station coverage is sparse), downstream decisions based on the forecast will be systematically worse for those communities. The integration implication: the evaluation pipeline must disaggregate forecast skill by region, population density, and station coverage — the climate equivalent of the creator fairness audit.
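The disaggregation step can be sketched as grouping forecast errors by region and reporting the skill gap between the best- and worst-served regions. Region names, forecasts, and observations are synthetic:

```python
from collections import defaultdict


def mae_by_region(records):
    """Mean absolute error per region; records are (region, forecast, observed) tuples."""
    errors = defaultdict(list)
    for region, forecast, observed in records:
        errors[region].append(abs(forecast - observed))
    return {region: sum(errs) / len(errs) for region, errs in errors.items()}


# Synthetic verification records: sparse rural coverage yields larger errors.
records = [
    ("urban_dense", 21.0, 20.0), ("urban_dense", 18.5, 18.0),
    ("rural_sparse", 25.0, 22.0), ("rural_sparse", 14.0, 17.5),
]
skill = mae_by_region(records)
gap = max(skill.values()) - min(skill.values())  # equity gap to monitor over time
```

In practice the same grouping would be repeated by population density and station coverage, with the gap tracked as a first-class evaluation metric alongside aggregate skill.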
Challenge 3: Massive data volume. A single day of global reanalysis data (ERA5) is approximately 1TB. Training on 40 years of historical data (roughly 1TB × 365 days × 40 years ≈ 15PB) requires distributed training (Chapter 26) and careful data pipeline engineering. The integration implication: the data pipeline is the dominant engineering challenge, not the model architecture.
The Cross-Domain Integration Matrix
The following matrix summarizes which integration principles from the StreamRec capstone apply universally and which are domain-specific.
| Principle | StreamRec | Credit Scoring | Pharma | Climate |
|---|---|---|---|---|
| Data contracts prevent silent bugs | Universal | Universal | Universal | Universal |
| Co-versioned artifacts | Universal | Universal | Universal | Universal |
| Multi-level evaluation | Universal | Universal | Universal | Universal |
| ADRs document architectural choices | Universal | Universal | SAP serves this role | Universal |
| Causal impact estimation | A/B test | Infeasible (selective labels) | RCT is the gold standard | Infeasible (no control) |
| Fairness audit | Voluntary, product-driven | Legally required, ECOA-driven | Legally required, FDA-driven | Ethical, equity-driven |
| Canary deployment | Standard | Parallel scoring | Not applicable (batch) | Not applicable (batch) |
| Latency budget | Critical (<200ms) | Important (<5s) | Irrelevant (batch) | Irrelevant (batch) |
| Retraining frequency | Weekly | Quarterly | One-time | Monthly |
| Privacy | Standard (opt-out) | Regulated (FCRA/GLBA) | Mandated (HIPAA), federated | Standard |
| Stakeholder communication | Product + engineering | Regulatory + risk + business | FDA + clinical + business | Scientific + public |
Lessons
- Data contracts, co-versioning, and multi-level evaluation are universal. Every domain benefits from explicit feature contracts, coupled artifact versioning, and evaluation that goes beyond model accuracy to include system reliability and domain-specific impact metrics. These are not recommendation-system-specific practices — they are ML engineering practices.
- The regulatory environment shapes the integration architecture more than the model architecture. Meridian's deployment pipeline is fundamentally different from StreamRec's not because of the model (XGBoost vs. two-tower) but because of the regulatory requirements (MRM approval, adverse action, quarterly validation). MediCore's entire analysis is shaped by the SAP and FDA submission requirements. The model is a component; the regulatory context is the architecture.
- Causal evaluation feasibility varies by domain, but the aspiration is universal. StreamRec can A/B test. Meridian cannot (selective labels). MediCore's randomized controlled trial is causal evaluation. Climate cannot randomize weather. In every domain, the team should identify the strongest feasible causal evaluation method and document why stronger methods are not available — this is an ADR in itself.
- Fairness is universal but the protected groups and criteria are domain-specific. StreamRec protects creators (exposure equity) and users (quality parity). Meridian protects applicants (disparate impact, ECOA). MediCore protects patient subgroups (treatment effect heterogeneity, FDA subgroup analysis). Climate protects communities (forecast equity by region and socioeconomic status). The `FairnessMetrics` class from Chapter 31 applies in all four domains — but the choice of protected attribute, fairness criterion, and acceptable threshold is deeply domain-specific.
- The hardest part is not building any individual component but making them all work together reliably. This statement is true for a recommendation system, a credit scoring platform, a clinical trial analysis pipeline, and a climate forecasting system. The components differ; the integration challenge does not.