Case Study 1: From Rule-Based Systems to Deep Learning at a Healthcare Company
Background
MedAssist Health Technologies (a composite based on real industry patterns) is a mid-sized healthcare technology company founded in 2008. The company's flagship product is a clinical decision support system (CDSS) that helps physicians in emergency departments triage patients and prioritize diagnostic testing. When we first examine MedAssist, the system has been in production for six years and serves over 200 hospitals across the United States.
The original system, built by a team of clinicians and software engineers, uses a classic rule-based architecture --- the direct descendant of the expert systems described in Section 1.1.1 of this chapter. Over 4,500 hand-crafted if-then rules, developed in collaboration with emergency medicine physicians, encode clinical knowledge about symptom patterns, risk factors, vital sign thresholds, and diagnostic protocols.
For example, a simplified version of a triage rule might look like:
```
IF patient.age > 65
   AND patient.chief_complaint CONTAINS "chest pain"
   AND patient.systolic_bp < 90
THEN priority = CRITICAL
   AND recommend = ["ECG", "Troponin", "CT Angiography"]
   AND alert = "Possible acute coronary syndrome with hemodynamic instability"
```
This rule-based approach served MedAssist well for several years. It was transparent (physicians could inspect and understand every rule), auditable (regulators could trace any recommendation to specific clinical logic), and controllable (clinicians could add, modify, or remove rules as medical knowledge evolved).
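To make the rule engine concrete, here is a minimal Python sketch of how such a rule might be represented as data plus a predicate; the `Patient` and `RuleResult` types and the function name are hypothetical illustrations, not MedAssist's actual implementation:

```python
# Hypothetical sketch: one triage rule as a data structure plus a predicate.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Patient:
    age: int
    chief_complaint: str
    systolic_bp: int

@dataclass
class RuleResult:
    priority: str
    recommend: list[str]
    alert: str

def acs_instability_rule(p: Patient) -> Optional[RuleResult]:
    """Fires for an elderly chest-pain patient who is hypotensive."""
    if p.age > 65 and "chest pain" in p.chief_complaint.lower() and p.systolic_bp < 90:
        return RuleResult(
            priority="CRITICAL",
            recommend=["ECG", "Troponin", "CT Angiography"],
            alert="Possible acute coronary syndrome with hemodynamic instability",
        )
    return None  # the rule does not fire for this patient

# Production meant evaluating roughly 4,500 such predicates per encounter and
# reconciling every result that fired, which is the maintenance burden described next.
```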
The Problem
By 2018, MedAssist was encountering several escalating challenges with its rule-based system:
1. Knowledge Maintenance Burden: With over 4,500 rules, the system had become difficult to maintain. Rules interacted in complex, sometimes contradictory ways. Adding a new rule required careful analysis of its interactions with existing rules, a process that could take weeks. The clinical advisory board estimated that approximately 15% of the rules were outdated based on new medical evidence, but updating them without breaking the system was a daunting task.
2. Limited Pattern Recognition: The rule-based system could only recognize patterns that had been explicitly encoded. It could not detect subtle, multivariate patterns in patient data that might indicate an emerging condition. For instance, a particular combination of slightly elevated heart rate, mildly abnormal lab values, and a specific demographic profile might indicate early sepsis, yet no single measurement crossed a threshold that would trigger an existing rule.
3. Inability to Learn from Data: MedAssist had accumulated a rich dataset of over 12 million patient encounters, including outcomes. The rule-based system could not leverage this data to improve its recommendations automatically; each improvement required manual rule engineering by expensive clinical informatics specialists.
4. Competitive Pressure: Newer competitors were marketing AI-powered clinical decision support tools that claimed superior predictive accuracy, and several of MedAssist's hospital clients were evaluating alternatives.
The Transition Strategy
MedAssist's leadership decided to modernize the system by incorporating machine learning, but they recognized that a complete replacement was neither feasible nor desirable. The transition needed to be incremental, safe, and transparent --- non-negotiable requirements in healthcare.
The company hired its first AI engineering team: two machine learning engineers, one data engineer, and one AI/ML platform engineer. They reported to a newly created Director of AI who had both clinical informatics and machine learning expertise.
Phase 1: Data Infrastructure (Months 1--4)
The team's first priority was building a proper data infrastructure --- the foundation layer of the AI stack.
Challenges encountered:
- Patient data was stored in multiple incompatible formats across different hospital systems (HL7 v2, FHIR, proprietary CSV exports).
- Data quality was inconsistent: missing values, coding errors, and temporal inconsistencies were common.
- HIPAA compliance requirements imposed strict constraints on data handling, storage, and access.
Solutions implemented:
- Built an ETL pipeline using Apache Spark to normalize data from multiple hospital systems into a common FHIR-based format.
- Implemented a data quality monitoring system that flagged anomalies and missing-data patterns.
- Established a de-identification pipeline that removed protected health information (PHI) while preserving the clinical utility of the data (sketched below).
- Set up a secure cloud environment (AWS with a HIPAA BAA) with proper access controls and audit logging.
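As an illustration of the de-identification step, here is a minimal PySpark sketch under assumed column names (`mrn`, `age`, and so on) and assumed storage paths; the inline salt is likewise a hypothetical stand-in for a proper key-management process:

```python
# Hypothetical sketch of one de-identification step in a PySpark ETL job.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("deidentify-encounters").getOrCreate()

encounters = spark.read.parquet("s3://normalized-fhir/encounters/")  # assumed path

deidentified = (
    encounters
    # Replace the medical record number with a salted one-way hash so records
    # can still be linked across encounters without exposing PHI.
    .withColumn("patient_key", F.sha2(F.concat(F.col("mrn"), F.lit("SALT")), 256))
    # Drop direct identifiers outright.
    .drop("mrn", "name", "address", "phone")
    # Generalize ages above 89 into a single 90+ bucket, per HIPAA Safe Harbor.
    .withColumn("age", F.when(F.col("age") > 89, F.lit(90)).otherwise(F.col("age")))
)

deidentified.write.mode("overwrite").parquet("s3://deidentified/encounters/")
```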
The resulting dataset contained 8.2 million de-identified patient encounters with complete demographic information, vital signs, lab results, chief complaints, diagnoses, and 30-day outcomes.
Phase 2: Parallel ML Development (Months 3--10)
With data infrastructure taking shape, the ML engineers began developing predictive models in parallel with the existing rule-based system.
Approach: The team chose to focus on a specific, high-impact clinical task: early sepsis prediction. Sepsis is a life-threatening condition where early detection dramatically improves patient survival. The existing rule-based system used the SIRS (Systemic Inflammatory Response Syndrome) criteria, which had a known high false-positive rate.
Model development:
1. Baseline: The existing SIRS-based rules achieved a sensitivity of 68% and specificity of 72% for sepsis detection within the first 6 hours of ED arrival.
2. Classical ML model: A gradient-boosted decision tree (XGBoost) trained on 47 features extracted from vital signs, lab results, demographics, and clinical notes achieved 81% sensitivity and 85% specificity.
3. Deep learning model: A recurrent neural network (LSTM) that processed temporal sequences of vital signs and lab values achieved 87% sensitivity and 88% specificity, detecting sepsis an average of 4.2 hours earlier than the rule-based system (a PyTorch sketch follows).
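As a rough illustration of the temporal model, the following PyTorch sketch shows the general shape of such an LSTM; the feature count, hidden size, and hourly windowing are assumptions, not the team's actual architecture:

```python
# Minimal PyTorch sketch of an LSTM that consumes a sequence of hourly
# vital-sign/lab vectors and outputs a sepsis risk score. Dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

class SepsisLSTM(nn.Module):
    def __init__(self, n_features: int = 24, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time_steps, n_features), e.g. hourly measurements
        _, (h_n, _) = self.lstm(x)                 # h_n: (1, batch, hidden)
        return torch.sigmoid(self.head(h_n[-1]))   # risk score in [0, 1]

model = SepsisLSTM()
risk = model(torch.randn(8, 12, 24))  # 8 patients, 12 hours, 24 features each
```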
Key engineering decisions:
- The team used scikit-learn for feature engineering and classical ML, and PyTorch for the deep learning models.
- All experiments were tracked in MLflow, enabling reproducibility and systematic comparison.
- Feature engineering was a critical step: temporal features (rate of change in vital signs), interaction features (combinations of abnormal values), and time-windowed aggregations all significantly improved model performance (see the sketch below).
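The sketch below illustrates this feature engineering and experiment tracking pattern, using hypothetical column names and synthetic stand-in data rather than MedAssist's actual pipeline:

```python
# Hypothetical sketch: temporal feature engineering plus MLflow tracking.
import mlflow
import numpy as np
import pandas as pd
from sklearn.metrics import recall_score
from xgboost import XGBClassifier

def add_temporal_features(vitals: pd.DataFrame) -> pd.DataFrame:
    """vitals: one row per patient per hour, sorted by (patient_id, hour)."""
    g = vitals.groupby("patient_id")
    vitals["hr_delta_1h"] = g["heart_rate"].diff()  # rate of change
    vitals["hr_mean_6h"] = g["heart_rate"].transform(
        lambda s: s.rolling(6, min_periods=1).mean())  # time-windowed aggregation
    vitals["shock_index"] = vitals["heart_rate"] / vitals["systolic_bp"]  # interaction
    return vitals

# Synthetic stand-ins for the engineered feature matrix and sepsis labels.
X = np.random.rand(1000, 47)
y = np.random.randint(0, 2, size=1000)

with mlflow.start_run(run_name="xgboost-sepsis-sketch"):
    model = XGBClassifier(n_estimators=400, max_depth=6, learning_rate=0.05)
    model.fit(X, y)
    mlflow.log_params({"n_estimators": 400, "max_depth": 6, "learning_rate": 0.05})
    mlflow.log_metric("train_sensitivity", recall_score(y, model.predict(X)))
```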
Phase 3: Hybrid Architecture (Months 8--14)
Rather than replacing the rule-based system entirely, MedAssist implemented a hybrid architecture that combined the strengths of both approaches:
```
             Patient Data
                   │
        ┌──────────┴───────────────┐
        ▼                          ▼
┌───────────────┐         ┌──────────────────┐
│  Rule-Based   │         │  ML Prediction   │
│    Engine     │         │     Engine       │
│ (4,500 rules) │         │ (Sepsis model +  │
│               │         │  other models)   │
└───────┬───────┘         └────────┬─────────┘
        │                          │
        └──────────┬───────────────┘
                   ▼
┌─────────────────────────────────────┐
│            Fusion Layer             │
│ - Rule-based alerts preserved       │
│ - ML predictions add new signals    │
│ - Conflict resolution logic         │
│ - Confidence calibration            │
└──────────────────┬──────────────────┘
                   ▼
┌─────────────────────────────────────┐
│     Clinical Decision Dashboard     │
│ - Prioritized recommendations       │
│ - Explanation for each alert        │
│ - Physician override capability     │
└─────────────────────────────────────┘
```
Design principles of the hybrid system:
1. Safety first: Rule-based alerts for critical, well-understood conditions (e.g., cardiac arrest indicators) were never overridden by ML predictions (see the fusion sketch below).
2. Additive value: ML models added new capabilities (early sepsis detection) rather than replacing existing ones.
3. Transparency: Every ML-generated alert included an explanation (the top contributing features and their values) so physicians could evaluate the recommendation.
4. Physician authority: All recommendations were advisory. Physicians could override any alert with a documented reason.
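A simplified sketch of how a fusion layer might encode the first two principles follows; the `Alert` structure and the 0.80 confidence threshold are illustrative assumptions:

```python
# Hypothetical sketch of fusion logic: rule alerts always pass through,
# ML alerts add new signals when sufficiently confident.
from dataclasses import dataclass

@dataclass
class Alert:
    source: str        # "rules" or "ml"
    condition: str
    confidence: float  # calibrated probability for ML; 1.0 for fired rules
    explanation: str

def fuse(rule_alerts: list[Alert], ml_alerts: list[Alert]) -> list[Alert]:
    # Principle 1: rule-based alerts for critical conditions always pass through.
    fused = list(rule_alerts)
    covered = {a.condition for a in rule_alerts}
    for alert in ml_alerts:
        # Principle 2: ML adds signals for conditions the rules did not flag;
        # low-confidence ML alerts are suppressed to limit alert fatigue.
        if alert.condition not in covered and alert.confidence >= 0.80:
            fused.append(alert)
    # Never let an ML "all clear" demote a rule alert; only rank for display.
    return sorted(fused, key=lambda a: a.confidence, reverse=True)
```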
Phase 4: Production Deployment and Monitoring (Months 12--18)
Deploying the hybrid system to production required careful orchestration:
Deployment strategy:
- Shadow mode (Months 12--14): The ML models ran in parallel with the production rule-based system but did not surface predictions to clinicians (sketched below). This allowed the team to monitor model behavior on live data without risk.
- Limited rollout (Months 14--16): The hybrid system was deployed to 12 pilot hospitals. Clinical staff were trained on interpreting ML-generated alerts, and feedback was collected systematically.
- Full rollout (Months 16--18): After demonstrating safety and efficacy at pilot sites, the system was deployed to all 200+ hospitals.
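The shadow-mode pattern can be summarized in a few lines of Python; the `rule_engine`, `ml_model`, and `encounter` interfaces here are hypothetical:

```python
# Hypothetical sketch of shadow mode: the ML model scores every live
# encounter, but only the rule engine's output reaches clinicians.
import logging

log = logging.getLogger("shadow")

def handle_encounter(encounter, rule_engine, ml_model):
    rule_output = rule_engine.evaluate(encounter)    # what clinicians see
    try:
        ml_score = ml_model.predict(encounter)       # logged, never surfaced
        log.info("shadow ml_score=%.3f encounter=%s", ml_score, encounter.id)
    except Exception:
        # A shadow-mode failure must never affect the production path.
        log.exception("shadow scoring failed")
    return rule_output
```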
Monitoring infrastructure:
- Real-time model performance tracking (sensitivity, specificity, alert rates).
- Data drift detection comparing incoming patient data distributions to the training data (see the sketch below).
- Alert fatigue monitoring (tracking how often clinicians acknowledged vs. dismissed alerts).
- An automated retraining pipeline triggered when model performance dropped below predefined thresholds.
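As one example of how drift detection might work, the sketch below applies a two-sample Kolmogorov-Smirnov test per feature; the significance threshold and the triggering policy are illustrative assumptions:

```python
# Hypothetical sketch of per-feature data drift detection.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(training: np.ndarray, live: np.ndarray,
                 feature_names: list[str], p_threshold: float = 0.01) -> list[str]:
    """Return names of features whose live distribution differs from training."""
    drifted = []
    for i, name in enumerate(feature_names):
        stat, p_value = ks_2samp(training[:, i], live[:, i])
        if p_value < p_threshold:  # distributions differ significantly
            drifted.append(name)
    return drifted

# If drift (or a sensitivity drop) crosses policy thresholds, the automated
# retraining pipeline runs and the candidate model re-enters shadow mode.
```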
Results
After 12 months of full deployment, the hybrid system demonstrated significant improvements:
| Metric | Rule-Based Only | Hybrid System | Improvement |
|---|---|---|---|
| Sepsis detection sensitivity | 68% | 86% | +18 percentage points |
| Sepsis detection specificity | 72% | 87% | +15 percentage points |
| Average early detection time | Baseline | 3.8 hours earlier | Significant |
| Alert fatigue (false alerts/day) | 23.4 | 11.7 | -50% |
| Physician override rate | 12% | 8% | -4 percentage points |
| Estimated lives impacted | Baseline | ~340 improved outcomes/year | Significant |
Lessons Learned
1. Incremental transition beats wholesale replacement. The hybrid approach allowed MedAssist to preserve the strengths of the rule-based system (transparency, regulatory compliance, proven clinical logic) while adding the pattern recognition capabilities of machine learning. Attempting a complete replacement would have been riskier, slower, and more expensive.
2. Data infrastructure is the prerequisite. The team spent nearly half of the first year on data engineering. This investment was essential --- without clean, well-structured, properly governed data, the ML models would have been unreliable.
3. Domain expertise is non-negotiable. The AI engineering team worked closely with clinical advisors throughout the project. Domain knowledge informed feature engineering, model evaluation criteria, deployment strategy, and user interface design. The best ML model in the world is useless if clinicians do not trust or understand it.
4. Monitoring and feedback loops are critical in healthcare. In healthcare, model performance can degrade due to changes in patient populations, clinical practices, or data collection processes. Continuous monitoring and the ability to retrain quickly were essential for maintaining system reliability.
5. Explainability is a requirement, not a luxury. In a clinical setting, physicians need to understand why the system is making a recommendation. The team invested significant effort in generating meaningful explanations for ML predictions, and this investment directly contributed to clinician trust and adoption.
Discussion Questions
1. What are the ethical implications of using ML-based predictions in clinical decision-making? How should the system handle cases where the ML prediction contradicts the rule-based recommendation?
2. The company chose to deploy the ML system in an advisory capacity, preserving physician authority. Under what circumstances, if any, might it be appropriate for an AI system to make autonomous clinical decisions?
3. How would you design a regulatory approval process for an ML-based clinical decision support system? What evidence should be required?
4. The team focused on sepsis prediction as their first ML use case. What criteria should guide the selection of the first ML application in a domain where rule-based systems already exist?
5. How might the transition strategy differ if MedAssist were a startup without an existing rule-based system? What advantages and disadvantages would a greenfield approach present?