Case Study 2: Cross-Domain Comparison — How Integration Principles Transfer to Credit Scoring, Pharma, and Climate
Context
The StreamRec capstone teaches integration through a specific system: a content recommendation platform. But the principles — component composition, architectural tradeoffs, multi-level evaluation, stakeholder communication, and technical debt management — are domain-independent. This case study examines how three other systems from the book's anchor examples face the same integration challenges with fundamentally different constraints.
The comparison reveals that while components change across domains, the craft of integration does not.
System 1: Meridian Financial Credit Scoring
Meridian Financial (Chapters 24, 28, 29, 31, 35) operates a credit scoring system that processes 15,000 applications per day. The system uses a gradient-boosted tree ensemble (XGBoost, 500 trees, 200 features) to score applicants on a 0-1 probability-of-default scale.
Architecture Mapping
| StreamRec Component | Meridian Equivalent | Key Difference |
|---|---|---|
| Two-tower retrieval | Not applicable | Every applicant is scored; no candidate-generation stage |
| MLP/transformer ranker | XGBoost credit scorer | Single model, no funnel architecture |
| Re-ranker (diversity, fairness) | Threshold policy + adverse action | Regulatory requirement, not optimization |
| Feature store | Feature platform (Hive + Redis) | More features (200 vs. ~50), slower refresh (daily) |
| Canary deployment | Parallel scoring period | Both models score all applicants; no traffic split |
| Fairness audit | ECOA compliance audit | Legally mandated (not voluntary), protected attributes defined by law |
| Monitoring (PSI drift) | Model risk monitoring (SR 11-7) | Quarterly validation, not continuous; regulatory reporting |
| Causal evaluation | Not standard | Causal impact of credit decisions is difficult to measure (no counterfactual for rejected applicants) |
Integration Challenges Unique to Credit Scoring
Challenge 1: Regulatory approval gates. Every model change at Meridian requires independent validation by the Model Risk Management (MRM) team — a separate group that does not report to the data science team. The MRM team reviews the model documentation, replicates the validation results, and issues a formal approval. This adds 10-15 business days to every deployment. The integration implication: the deployment pipeline must produce regulatory-grade documentation automatically (model cards, validation reports, fairness audit summaries) because manual documentation is both slower and more error-prone.
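The documentation requirement can be sketched as a pipeline step that assembles a machine-readable validation report from deployment artifacts. The schema, field names, and metric values below are illustrative assumptions, not Meridian's actual MRM format:

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class ValidationReport:
    """One regulatory-grade artifact emitted by the deployment pipeline (illustrative schema)."""
    model_version: str
    training_data_window: str
    auc: float
    psi_vs_prior_model: float
    fairness_summary: dict


def render_for_mrm(report: ValidationReport) -> str:
    """Serialize deterministically so MRM reviewers can diff successive versions."""
    return json.dumps(asdict(report), indent=2, sort_keys=True)


report = ValidationReport(
    model_version="xgb-2024-q3",       # hypothetical version tag
    training_data_window="2019-01 to 2024-06",
    auc=0.81,                          # invented metric values
    psi_vs_prior_model=0.04,
    fairness_summary={"adverse_impact_ratio": 0.92},
)
doc = render_for_mrm(report)
```

Serializing with `sort_keys` means two pipeline runs over the same artifacts produce byte-identical reports, which is what makes automated, diffable documentation feasible.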
Challenge 2: Adverse action requirements. When Meridian denies a credit application, federal law (ECOA, Regulation B) requires that the applicant receive specific reasons for the denial. This mandates feature-level explanations for every negative decision — not just aggregate interpretability. The integration implication: the SHAP computation from Chapter 35 is not a nice-to-have; it is a serving-path component that must execute within the decision latency and produce legally compliant reason codes.
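A minimal sketch of the reason-code step: given per-feature attributions (which in production would come from the SHAP computation), select the top factors pushing the score toward denial and map them to applicant-facing codes. The feature names, attribution values, and code wording are hypothetical:

```python
# Illustrative mapping from internal feature names to applicant-facing reason codes.
REASON_CODES = {
    "utilization": "Proportion of revolving balances to credit limits is too high",
    "delinquencies": "Number of recent delinquencies",
    "inquiries": "Too many recent credit inquiries",
    "history_length": "Length of credit history",
}


def adverse_action_reasons(attributions: dict[str, float], top_k: int = 3) -> list[str]:
    """Return reason codes for the top_k features pushing the score toward denial.

    Convention here: positive attribution = pushes probability of default up.
    """
    negative_factors = sorted(attributions.items(), key=lambda kv: kv[1], reverse=True)
    return [REASON_CODES[name] for name, value in negative_factors[:top_k] if value > 0]


# Hypothetical per-applicant attributions (in production: SHAP values).
attrs = {"utilization": 0.12, "delinquencies": 0.08, "inquiries": 0.03, "history_length": -0.05}
reasons = adverse_action_reasons(attrs)
```

The key integration point is that this function sits on the serving path: it must run for every denial, within the decision latency, not as an offline analysis.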
Challenge 3: The selective labels problem. Meridian can only observe default outcomes for applicants who were approved. Applicants who were denied might have repaid their loans — but the team will never know. This creates a fundamental limitation for causal evaluation: unlike StreamRec (where causal impact can be estimated via A/B testing), Meridian cannot randomly approve rejected applicants to measure the counterfactual. The integration implication: evaluation at Meridian relies more heavily on offline backtesting, stability analysis, and stress testing (Chapter 28) than on causal impact estimation.
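The selective labels problem can be demonstrated with a small synthetic simulation (all numbers invented): because only low-risk applicants are approved, the default rate the lender can actually observe understates the population rate it would need for unbiased evaluation.

```python
import random

random.seed(0)

# Synthetic population: each applicant has a true default probability.
population = [random.random() for _ in range(10_000)]

# The lender approves only applicants judged low-risk.
approved = [p for p in population if p < 0.4]


def realized_default_rate(probs):
    """Simulate outcomes: each applicant defaults with their true probability."""
    return sum(1 for p in probs if random.random() < p) / len(probs)


observed = realized_default_rate(approved)      # what the lender can measure
true_rate = realized_default_rate(population)   # unobservable in practice
```

The observed rate (approved applicants only) lands near 0.2 while the population rate is near 0.5 — any evaluation built on observed labels alone inherits this selection bias.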
ADR Example
ADR: Model Explainability in the Serving Path
- Context: ECOA requires adverse action reasons for every denial. SHAP computation for 500-tree XGBoost takes ~200ms per applicant on CPU, ~15ms on GPU.
- Decision: Pre-compute SHAP values for all feature combinations at each score decile. Serve pre-computed explanations via lookup table. Fall back to real-time SHAP for edge cases outside the pre-computed range.
- Alternative rejected: Real-time SHAP for every applicant. Rejected due to GPU infrastructure cost and latency variability at the p99 level.
- Consequence: Explanations are approximate (decile-level, not applicant-level) for 95% of decisions. Regulatory counsel confirmed this meets ECOA requirements because the reason codes (top-3 negative factors) are consistent within score deciles.
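A sketch of the decision's serving logic, under the assumption that pre-computed explanations are keyed by score decile; the lookup contents and the real-time fallback are placeholders, not the actual SHAP pipeline:

```python
# Pre-computed decile-level reason codes (placeholder contents).
DECILE_EXPLANATIONS = {
    d: [f"reason_code_{d}_1", f"reason_code_{d}_2", f"reason_code_{d}_3"]
    for d in range(10)
}


def compute_shap_realtime(score: float) -> list[str]:
    """Placeholder for the ~200ms CPU SHAP path used only for edge cases."""
    return ["realtime_fallback"]


def explain(score: float) -> list[str]:
    """Fast path: decile lookup. Slow path: real-time SHAP for out-of-range scores."""
    if 0.0 <= score <= 1.0:
        decile = min(int(score * 10), 9)  # clamp score == 1.0 into the top decile
        return DECILE_EXPLANATIONS[decile]
    return compute_shap_realtime(score)
```

The lookup turns a per-applicant computation into an O(1) table read, which is what removes the GPU cost and the p99 latency variability cited in the ADR.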
System 2: MediCore Pharma Clinical Trial Analysis
MediCore (Chapters 16, 17, 18, 19, 21, 32, 34) operates a multi-site clinical trial analysis platform that integrates data from 12 hospital sites to estimate treatment effects for a cardiovascular drug.
Architecture Mapping
| StreamRec Component | MediCore Equivalent | Key Difference |
|---|---|---|
| Feature store | Multi-site data integration platform | Federated — data cannot leave hospital sites |
| Training pipeline | Statistical analysis pipeline | Causal estimation, not prediction |
| Model evaluation | Treatment effect estimation | Primary output is ATE/CATE, not predictions |
| Fairness audit | Subgroup analysis | Required by FDA for regulatory submission |
| Privacy | Federated learning + DP | Not optional — mandated by HIPAA |
| Monitoring | Data quality monitoring | No live serving; batch analysis |
| Deployment | Regulatory submission package | "Deploy" means submit to FDA, not serve predictions |
Integration Challenges Unique to Pharma
Challenge 1: Federated data integration. Patient-level data cannot leave hospital sites due to HIPAA and institutional data use agreements. The analysis must be performed without centralizing data — using federated causal estimation (Chapter 32). The integration implication: every component (data validation, feature engineering, model training, subgroup analysis) must operate in a federated setting where the platform orchestrates computation across sites without accessing raw data.
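The orchestration pattern can be sketched with a deliberately simple difference-in-means estimator — a stand-in for Chapter 32's federated causal methods. Each site function runs locally and returns only aggregate counts and sums; the coordinator combines them without ever touching patient-level rows. Site data here are synthetic:

```python
def site_summary(outcomes_treated, outcomes_control):
    """Runs *inside* a hospital site; only aggregates cross the boundary."""
    return {
        "n_t": len(outcomes_treated), "sum_t": sum(outcomes_treated),
        "n_c": len(outcomes_control), "sum_c": sum(outcomes_control),
    }


def pooled_ate(summaries):
    """Runs at the coordinator, which never sees raw patient data."""
    n_t = sum(s["n_t"] for s in summaries)
    n_c = sum(s["n_c"] for s in summaries)
    mean_t = sum(s["sum_t"] for s in summaries) / n_t
    mean_c = sum(s["sum_c"] for s in summaries) / n_c
    return mean_t - mean_c


# Two synthetic sites, each contributing only its local summary.
summaries = [
    site_summary([1.2, 0.8, 1.0], [0.5, 0.7]),
    site_summary([0.9, 1.1], [0.6, 0.4, 0.5]),
]
ate = pooled_ate(summaries)
```

Every other pipeline stage (validation, feature engineering, subgroup analysis) must be decomposable into the same local-compute / aggregate-combine shape.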
Challenge 2: Pre-registered analysis plan. Clinical trials require a Statistical Analysis Plan (SAP) — the pharma equivalent of the ADR and TDD combined — that specifies every analysis, endpoint, subgroup, and statistical method before the data is unblinded. Deviations from the SAP require formal justification. The integration implication: the analysis pipeline must exactly implement the SAP, and any post-hoc analysis must be clearly labeled as exploratory (not confirmatory).
Challenge 3: Uncertainty is the primary deliverable. In StreamRec, the primary deliverable is a recommendation list. Uncertainty is supplementary (Track C). In pharma, uncertainty is the deliverable — the entire purpose of the analysis is to estimate a treatment effect with a confidence interval narrow enough to support (or reject) a regulatory claim. The integration implication: every component (Chapter 34's conformal prediction, Chapter 21's Bayesian hierarchical models, Chapter 18's doubly robust estimation) is calibrated to produce reliable uncertainty estimates, and the evaluation framework tests calibration (not just accuracy).
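What "testing calibration" means can be illustrated with a minimal coverage check: a procedure that claims 95% intervals should cover the true value in roughly 95% of repeated trials. The interval construction below is a textbook normal interval used as a stand-in, not the book's conformal or Bayesian machinery:

```python
import random

random.seed(42)


def interval(estimate, half_width=1.96):
    """Nominal 95% interval for an estimate with unit-variance sampling noise."""
    return (estimate - half_width, estimate + half_width)


true_effect = 0.0
trials = 2_000
covered = 0
for _ in range(trials):
    estimate = random.gauss(true_effect, 1.0)  # simulated sampling noise, sd = 1
    lo, hi = interval(estimate)
    covered += lo <= true_effect <= hi

coverage = covered / trials  # should land near 0.95 if the procedure is calibrated
```

A calibration test like this — empirical coverage vs. nominal coverage — is part of the evaluation suite, on equal footing with point-estimate accuracy.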
```python
from dataclasses import dataclass


@dataclass
class DomainIntegrationProfile:
    """Integration characteristics for a specific domain.

    Captures the key constraints and priorities that shape how
    components are integrated in each domain.

    Attributes:
        domain: Domain name.
        latency_regime: 'realtime' (<1s), 'interactive' (1-60s), 'batch' (minutes+).
        regulatory_burden: 'none', 'moderate', 'heavy'.
        primary_deliverable: What the system produces.
        causal_evaluation_feasible: Whether A/B testing or causal estimation is possible.
        fairness_mandate: 'voluntary', 'industry_norm', 'legally_required'.
        privacy_constraint: 'standard', 'regulated', 'federated'.
        deployment_meaning: What "deployment" means in this domain.
        retraining_cadence: How often the model is retrained.
    """

    domain: str
    latency_regime: str
    regulatory_burden: str
    primary_deliverable: str
    causal_evaluation_feasible: bool
    fairness_mandate: str
    privacy_constraint: str
    deployment_meaning: str
    retraining_cadence: str


# Compare all four domains
profiles = [
    DomainIntegrationProfile(
        domain="StreamRec (Recommendation)",
        latency_regime="realtime (<200ms)",
        regulatory_burden="none",
        primary_deliverable="Ranked item list",
        causal_evaluation_feasible=True,
        fairness_mandate="voluntary",
        privacy_constraint="standard",
        deployment_meaning="Serve to users via API",
        retraining_cadence="Weekly",
    ),
    DomainIntegrationProfile(
        domain="Meridian (Credit Scoring)",
        latency_regime="interactive (<5s)",
        regulatory_burden="heavy",
        primary_deliverable="Risk score + adverse action reasons",
        causal_evaluation_feasible=False,
        fairness_mandate="legally_required",
        privacy_constraint="regulated",
        deployment_meaning="Replace production model (with MRM approval)",
        retraining_cadence="Quarterly",
    ),
    DomainIntegrationProfile(
        domain="MediCore (Pharma)",
        latency_regime="batch (hours)",
        regulatory_burden="heavy",
        primary_deliverable="Treatment effect estimate + CI",
        causal_evaluation_feasible=True,
        fairness_mandate="legally_required",
        privacy_constraint="federated",
        deployment_meaning="Submit to FDA",
        retraining_cadence="Per trial (one-time)",
    ),
    DomainIntegrationProfile(
        domain="Climate Forecasting",
        latency_regime="batch (hours)",
        regulatory_burden="none",
        primary_deliverable="Spatiotemporal forecast + uncertainty",
        causal_evaluation_feasible=False,
        fairness_mandate="industry_norm",
        privacy_constraint="standard",
        deployment_meaning="Publish forecast to stakeholders",
        retraining_cadence="Monthly (with new satellite data)",
    ),
]

# Print comparison table
print(f"{'Domain':<30s} {'Latency':<20s} {'Regulation':<12s} "
      f"{'Fairness':<18s} {'Retrain':<12s}")
print("-" * 92)
for p in profiles:
    print(f"{p.domain:<30s} {p.latency_regime:<20s} "
          f"{p.regulatory_burden:<12s} {p.fairness_mandate:<18s} "
          f"{p.retraining_cadence:<12s}")
```

```text
Domain                         Latency              Regulation   Fairness           Retrain
--------------------------------------------------------------------------------------------
StreamRec (Recommendation)     realtime (<200ms)    none         voluntary          Weekly
Meridian (Credit Scoring)      interactive (<5s)    heavy        legally_required   Quarterly
MediCore (Pharma)              batch (hours)        heavy        legally_required   Per trial (one-time)
Climate Forecasting            batch (hours)        none         industry_norm      Monthly (with new satellite data)
```
System 3: Climate Forecasting
The climate forecasting system (Chapters 1, 4, 8, 9, 10, 23, 26, 34) produces 7-day regional temperature and precipitation forecasts using a temporal fusion transformer (Chapter 23) trained on satellite, weather station, and reanalysis data.
Architecture Mapping
| StreamRec Component | Climate Equivalent | Key Difference |
|---|---|---|
| Feature store | Data ingestion pipeline | Geospatial data (satellite, station, reanalysis), not user behavior |
| Training pipeline | Monthly retraining with new satellite data | Global model, not per-user |
| Ranking model | Temporal fusion transformer | Regression (temperature, precipitation), not ranking |
| Uncertainty quantification | Conformal prediction + deep ensemble | Core requirement, not optional (Track C) |
| Monitoring | Forecast skill monitoring | Compared to climatological baseline and persistence |
| Fairness audit | Regional equity analysis | Forecast accuracy should not systematically differ by region or population |
Integration Challenges Unique to Climate
Challenge 1: No A/B testing. You cannot show different weather to different people. Causal evaluation of forecast quality improvements must rely on backtesting (historical re-forecasting), comparison to operational baselines (NWS, ECMWF), and ablation studies. The integration implication: the evaluation pipeline must support temporal cross-validation with embargo periods (Chapter 23) and multi-baseline comparison.
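A sketch of the embargoed-split logic: each fold trains on history up to a cutoff, skips an embargo window to prevent leakage from autocorrelated weather, then evaluates on a held-out block. The day counts are illustrative:

```python
def embargoed_splits(n_days: int, n_folds: int, test_len: int, embargo: int):
    """Return (train_end, test_start, test_end) day indices for each fold.

    Training uses days [0, train_end); the embargo gap [train_end, test_start)
    is discarded so autocorrelated observations cannot leak across the split.
    """
    splits = []
    for k in range(1, n_folds + 1):
        test_end = n_days - (n_folds - k) * test_len
        test_start = test_end - test_len
        train_end = test_start - embargo
        if train_end <= 0:
            raise ValueError("not enough history for this fold configuration")
        splits.append((train_end, test_start, test_end))
    return splits


# Illustrative: 10 years of daily data, 3 folds, 1-year test blocks, 7-day embargo.
splits = embargoed_splits(n_days=3650, n_folds=3, test_len=365, embargo=7)
```

Each fold's forecasts can then be scored against both the climatological baseline and the persistence baseline, as the text requires.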
Challenge 2: Spatial fairness. Climate forecasts are used for agricultural planning, disaster preparedness, and insurance pricing. If the forecast is systematically less accurate in low-income rural areas (where weather station coverage is sparse), downstream decisions based on the forecast will be systematically worse for those communities. The integration implication: the evaluation pipeline must disaggregate forecast skill by region, population density, and station coverage — the climate equivalent of the creator fairness audit.
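The disaggregation step can be sketched as grouping forecast errors by region and reporting the skill gap between the best- and worst-served regions. Region names, forecasts, and observations are synthetic:

```python
from collections import defaultdict


def mae_by_region(records):
    """Mean absolute error per region; records are (region, forecast, observed) tuples."""
    errors = defaultdict(list)
    for region, forecast, observed in records:
        errors[region].append(abs(forecast - observed))
    return {region: sum(errs) / len(errs) for region, errs in errors.items()}


# Synthetic verification records: sparse rural coverage yields larger errors.
records = [
    ("urban_dense", 21.0, 20.0), ("urban_dense", 18.5, 18.0),
    ("rural_sparse", 25.0, 22.0), ("rural_sparse", 14.0, 17.5),
]
skill = mae_by_region(records)
gap = max(skill.values()) - min(skill.values())  # equity gap to monitor over time
```

In practice the same grouping would be repeated by population density and station coverage, with the gap tracked as a first-class evaluation metric alongside aggregate skill.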
Challenge 3: Massive data volume. A single day of global reanalysis data (ERA5) is approximately 1TB. Training on 40 years of historical data (roughly 1TB × 365 days × 40 years ≈ 15PB) requires distributed training (Chapter 26) and careful data pipeline engineering. The integration implication: the data pipeline is the dominant engineering challenge, not the model architecture.
The Cross-Domain Integration Matrix
The following matrix summarizes which integration principles from the StreamRec capstone apply universally and which are domain-specific.
| Principle | StreamRec | Credit Scoring | Pharma | Climate |
|---|---|---|---|---|
| Data contracts prevent silent bugs | Universal | Universal | Universal | Universal |
| Co-versioned artifacts | Universal | Universal | Universal | Universal |
| Multi-level evaluation | Universal | Universal | Universal | Universal |
| ADRs document architectural choices | Universal | Universal | SAP serves this role | Universal |
| Causal impact estimation | A/B test | Infeasible (selective labels) | RCT is the gold standard | Infeasible (no control) |
| Fairness audit | Voluntary, product-driven | Legally required, ECOA-driven | Legally required, FDA-driven | Ethical, equity-driven |
| Canary deployment | Standard | Parallel scoring | Not applicable (batch) | Not applicable (batch) |
| Latency budget | Critical (<200ms) | Important (<5s) | Irrelevant (batch) | Irrelevant (batch) |
| Retraining frequency | Weekly | Quarterly | One-time | Monthly |
| Privacy | Standard (opt-out) | Regulated (FCRA/GLBA) | Mandated (HIPAA), federated | Standard |
| Stakeholder communication | Product + engineering | Regulatory + risk + business | FDA + clinical + business | Scientific + public |
Lessons
- Data contracts, co-versioning, and multi-level evaluation are universal. Every domain benefits from explicit feature contracts, coupled artifact versioning, and evaluation that goes beyond model accuracy to include system reliability and domain-specific impact metrics. These are not recommendation-system-specific practices — they are ML engineering practices.
- The regulatory environment shapes the integration architecture more than the model architecture. Meridian's deployment pipeline is fundamentally different from StreamRec's not because of the model (XGBoost vs. two-tower) but because of the regulatory requirements (MRM approval, adverse action, quarterly validation). MediCore's entire analysis is shaped by the SAP and FDA submission requirements. The model is a component; the regulatory context is the architecture.
- Causal evaluation feasibility varies by domain, but the aspiration is universal. StreamRec can A/B test. Meridian cannot (selective labels). MediCore's randomized controlled trial is causal evaluation. Climate cannot randomize weather. In every domain, the team should identify the strongest feasible causal evaluation method and document why stronger methods are not available — this is an ADR in itself.
- Fairness is universal but the protected groups and criteria are domain-specific. StreamRec protects creators (exposure equity) and users (quality parity). Meridian protects applicants (disparate impact, ECOA). MediCore protects patient subgroups (treatment effect heterogeneity, FDA subgroup analysis). Climate protects communities (forecast equity by region and socioeconomic status). The `FairnessMetrics` class from Chapter 31 applies in all four domains — but the choice of protected attribute, fairness criterion, and acceptable threshold is deeply domain-specific.
- The hardest part is not building any individual component but making them all work together reliably. This statement is true for a recommendation system, a credit scoring platform, a clinical trial analysis pipeline, and a climate forecasting system. The components differ; the integration challenge does not.