Case Study 2: TurbineTech Seasonal Drift and Sensor Calibration


Background

TurbineTech manufactures industrial gas turbines for power generation. Their data science team deployed a predictive maintenance model eight months ago. The model predicts the probability of a bearing failure within 30 days based on vibration sensor data, temperature readings, operating hours, and maintenance history. It was trained on 18 months of historical data and achieved a recall of 0.91 on the holdout set --- it flagged 91% of the turbines that went on to fail.

The model runs daily. If the predicted failure probability exceeds 0.40, the turbine is flagged for inspection. Preventive maintenance costs $12,000 per turbine. An unplanned failure costs $340,000 in emergency repairs, downtime, and contractual penalties.

The model was deployed in March. It performed well through the spring and summer. In October, the maintenance team noticed something troubling: two bearing failures had occurred in turbines that the model had classified as "healthy" (predicted probability below 0.15). In November, a third failure was missed. The operations VP called an emergency meeting.

This case study examines two sources of drift that combine to degrade the model: seasonal temperature effects on sensor readings and a sensor calibration change that shifted the baseline of the primary vibration sensor.


Phase 1: The Data

TurbineTech's model uses the following features:

import numpy as np
import pandas as pd

np.random.seed(42)

feature_descriptions = {
    "vibration_amplitude_mm_s": "Primary vibration sensor (mm/s RMS)",
    "vibration_frequency_hz": "Dominant vibration frequency (Hz)",
    "bearing_temp_c": "Bearing temperature (Celsius)",
    "ambient_temp_c": "Ambient air temperature (Celsius)",
    "operating_hours": "Total hours since last overhaul",
    "load_pct": "Turbine load as percentage of rated capacity",
    "oil_pressure_bar": "Lubricating oil pressure (bar)",
    "oil_temp_c": "Lubricating oil temperature (Celsius)",
    "rotor_speed_rpm": "Rotor speed (revolutions per minute)",
    "days_since_maintenance": "Days since last scheduled maintenance",
}

for feature, desc in feature_descriptions.items():
    print(f"  {feature:35s} {desc}")

Training Data (March -- August, "Summer" Conditions)

The model was trained on data collected during spring and summer, when ambient temperatures range from 15 to 35 degrees Celsius.

n_train = 10000  # 10,000 daily turbine observations

training_data = pd.DataFrame({
    "vibration_amplitude_mm_s": np.random.normal(2.8, 0.6, n_train).clip(0.5, 6.0),
    "vibration_frequency_hz": np.random.normal(120, 15, n_train).clip(50, 250),
    "bearing_temp_c": np.random.normal(75, 8, n_train).clip(40, 120),
    "ambient_temp_c": np.random.normal(25, 5, n_train).clip(15, 35),
    "operating_hours": np.random.uniform(100, 25000, n_train).round(0),
    "load_pct": np.random.normal(72, 12, n_train).clip(20, 100),
    "oil_pressure_bar": np.random.normal(4.2, 0.3, n_train).clip(2.5, 6.0),
    "oil_temp_c": np.random.normal(68, 6, n_train).clip(40, 95),
    "rotor_speed_rpm": np.random.normal(3600, 50, n_train).clip(3400, 3800),
    "days_since_maintenance": np.random.uniform(1, 180, n_train).round(0),
})

# Failure labels: ~3% failure rate
failure_score = (
    0.5 * (training_data["vibration_amplitude_mm_s"] - 2.8)
    + 0.3 * (training_data["bearing_temp_c"] - 75) / 8
    + 0.2 * (training_data["operating_hours"] / 25000)
    + 0.15 * (training_data["days_since_maintenance"] / 180)
    - 0.1 * (training_data["oil_pressure_bar"] - 4.2) / 0.3
    + np.random.normal(0, 0.3, n_train)
)
training_labels = (failure_score > np.percentile(failure_score, 97)).astype(int)

print(f"Training data shape: {training_data.shape}")
print(f"Failure rate: {training_labels.mean():.3f}")
print(f"\nTraining feature summary:")
print(training_data.describe().round(2).to_string())

Phase 2: Two Sources of Drift

Source 1: Seasonal Temperature Effects

When winter arrives, ambient temperatures drop from a summer average of 25 degrees Celsius to a winter average of 2 degrees Celsius at TurbineTech's northern facility. This has cascading effects:

  • Bearing temperature decreases by 8--12 degrees Celsius (thermal equilibrium with colder ambient air)
  • Vibration amplitude decreases by 0.3--0.5 mm/s (thermal contraction of the turbine housing changes resonance characteristics)
  • Oil viscosity increases, slightly increasing oil pressure and decreasing oil temperature

The model learned "low vibration = healthy" during the summer. In winter, vibrations are lower for all turbines, including turbines that are developing bearing faults. The failure threshold that was 4.2 mm/s in summer is effectively 3.7 mm/s in winter, but the model does not know this.
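The effect can be illustrated with a toy calculation (the -0.5 mm/s seasonal shift used here is the upper end of the range quoted above, and the numbers are illustrative, not from TurbineTech's fleet):

```python
# Toy illustration: the same developing fault reads below the
# summer-calibrated alert level once the seasonal shift is applied.
SUMMER_ALERT_MM_S = 4.2   # alert level learned from summer data
seasonal_shift = -0.5     # assumed winter thermal-contraction effect

true_fault_vibration = 4.3            # turbine with a developing fault
summer_reading = true_fault_vibration
winter_reading = true_fault_vibration + seasonal_shift  # ~3.8 mm/s

print(summer_reading > SUMMER_ALERT_MM_S)  # True: flagged in summer
print(winter_reading > SUMMER_ALERT_MM_S)  # False: missed in winter
```

The same physical fault is flagged in one season and missed in the next, with no change in the model or the turbine.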

Source 2: Sensor Calibration Change

In September, the maintenance team replaced vibration sensors on 30% of turbines during a routine calibration cycle. The new sensors (Model VX-200) have a slightly different sensitivity than the old sensors (Model VX-150): they read approximately 0.15 mm/s lower for the same actual vibration. This is within the manufacturer's stated tolerance but creates a systematic bias in the data.

Nobody logged this change in the data pipeline. The feature vibration_amplitude_mm_s now has two populations: turbines with the old sensor (reading slightly high) and turbines with the new sensor (reading slightly low).
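Had sensor metadata been joined into the feature table, the split would have been easy to see. The sketch below assumes a hypothetical sensor_model column and simulates readings matching the offsets described above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical: 30% of turbines carry the new VX-200, which reads
# ~0.15 mm/s low relative to the VX-150 baseline.
n = 5000
sensor_model = np.where(rng.random(n) < 0.30, "VX-200", "VX-150")
reading = np.where(
    sensor_model == "VX-200",
    rng.normal(2.65, 0.6, n),   # new sensor: biased low
    rng.normal(2.80, 0.6, n),   # old sensor: original baseline
)
df = pd.DataFrame({"sensor_model": sensor_model,
                   "vibration_amplitude_mm_s": reading})

# Per-model means expose the offset that the pooled mean hides
print(df.groupby("sensor_model")["vibration_amplitude_mm_s"].mean().round(2))
```

A simple group-by on sensor model surfaces the two populations; without the metadata join, the pooled distribution just looks slightly shifted.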

Simulating the Combined Drift

n_winter = 3000

# Winter production data: both temperature and calibration effects
winter_data = pd.DataFrame({
    # Vibration: lower due to temperature (-0.4) and calibration shift (-0.15 for 30%)
    "vibration_amplitude_mm_s": np.where(
        np.random.random(n_winter) < 0.30,
        np.random.normal(2.25, 0.6, n_winter).clip(0.5, 6.0),  # New sensor: -0.55
        np.random.normal(2.40, 0.6, n_winter).clip(0.5, 6.0),  # Old sensor: -0.40
    ),
    "vibration_frequency_hz": np.random.normal(118, 15, n_winter).clip(50, 250),
    "bearing_temp_c": np.random.normal(65, 8, n_winter).clip(30, 110),  # Lower in winter
    "ambient_temp_c": np.random.normal(2, 6, n_winter).clip(-15, 15),   # Winter temps
    "operating_hours": np.random.uniform(100, 25000, n_winter).round(0),
    "load_pct": np.random.normal(78, 10, n_winter).clip(20, 100),  # Higher load in winter (heating demand)
    "oil_pressure_bar": np.random.normal(4.4, 0.3, n_winter).clip(2.5, 6.0),  # Slightly higher (viscosity)
    "oil_temp_c": np.random.normal(62, 6, n_winter).clip(35, 90),  # Lower in winter
    "rotor_speed_rpm": np.random.normal(3600, 50, n_winter).clip(3400, 3800),
    "days_since_maintenance": np.random.uniform(1, 180, n_winter).round(0),
})

print("Winter production data summary:")
print(winter_data.describe().round(2).to_string())

Phase 3: Monitoring Detects the Drift

The monitoring system runs weekly PSI checks. Here is what it finds:

def compute_psi(reference, production, n_bins=10, eps=1e-4):
    """Population Stability Index of production relative to reference.

    Bins are reference deciles; eps keeps empty bins from producing
    log(0) or division by zero.
    """
    bin_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    bin_edges[0] = -np.inf
    bin_edges[-1] = np.inf
    ref_counts = np.histogram(reference, bins=bin_edges)[0]
    prod_counts = np.histogram(production, bins=bin_edges)[0]
    ref_proportions = ref_counts / len(reference) + eps
    prod_proportions = prod_counts / len(production) + eps
    return np.sum(
        (prod_proportions - ref_proportions)
        * np.log(prod_proportions / ref_proportions)
    )


print("Winter Drift Report:")
print("=" * 60)
for col in training_data.columns:
    psi = compute_psi(training_data[col].values, winter_data[col].values)
    status = "stable" if psi < 0.10 else "investigate" if psi < 0.25 else "RETRAIN"
    flag = " <<<" if psi >= 0.25 else " <" if psi >= 0.10 else ""
    print(f"  {col:35s}  PSI={psi:.4f}  [{status}]{flag}")

Expected results:

Feature                     Approximate PSI   Status
vibration_amplitude_mm_s    0.30+             RETRAIN
bearing_temp_c              0.35+             RETRAIN
ambient_temp_c              2.50+             RETRAIN
oil_temp_c                  0.25+             RETRAIN
oil_pressure_bar            0.12              Investigate
load_pct                    0.10              Investigate
vibration_frequency_hz      < 0.05            Stable
operating_hours             < 0.05            Stable
rotor_speed_rpm             < 0.05            Stable
days_since_maintenance      < 0.05            Stable

Four features are in the "retrain" zone. ambient_temp_c has the highest PSI by far --- its entire range has shifted. The monitoring system fires a critical alert.


Phase 4: Diagnosing the Root Cause

The alert fires, and the team investigates. The diagnosis process separates expected drift from unexpected drift:

Expected drift (seasonal):

  • ambient_temp_c: Of course it is different --- it is winter. This is not a bug; it is physics.
  • bearing_temp_c, oil_temp_c, oil_pressure_bar: These follow ambient temperature. Expected.

Expected drift (load):

  • load_pct: Winter heating demand increases turbine load. Expected.

Unexpected drift (sensor calibration):

  • vibration_amplitude_mm_s: The mean dropped more than the seasonal effect alone explains. After correlating with maintenance records, the team discovers the sensor replacement.

# Decomposing the vibration drift
expected_seasonal_shift = -0.40   # mm/s, from temperature effect
observed_mean_shift = winter_data["vibration_amplitude_mm_s"].mean() - training_data["vibration_amplitude_mm_s"].mean()
unexplained_shift = observed_mean_shift - expected_seasonal_shift

print(f"Expected seasonal shift: {expected_seasonal_shift:.2f} mm/s")
print(f"Observed mean shift:     {observed_mean_shift:.2f} mm/s")
print(f"Unexplained shift:       {unexplained_shift:.2f} mm/s")
print(f"\nThe unexplained shift likely comes from the VX-200 sensor calibration.")

Key Lesson --- Not all drift requires the same response. Seasonal drift is predictable and can be addressed with feature engineering (temperature normalization). Sensor calibration drift is a data quality issue that should be fixed at the source. The monitoring system detected both, but the response is different for each.


Phase 5: The Response

TurbineTech implements a three-part fix:

Fix 1: Temperature Normalization (Feature Engineering)

Add temperature-adjusted features that remove the seasonal component:

def add_temperature_adjusted_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add features that normalize for ambient temperature effects."""
    df = df.copy()

    # Temperature-adjusted vibration
    # Baseline: at 25C ambient, no adjustment. Each degree below 25C
    # reduces expected vibration by ~0.017 mm/s (empirical coefficient).
    temp_coefficient = 0.017
    reference_temp = 25.0
    df["vibration_temp_adjusted"] = (
        df["vibration_amplitude_mm_s"]
        + temp_coefficient * (reference_temp - df["ambient_temp_c"])
    )

    # Temperature-adjusted bearing temp (relative to ambient)
    df["bearing_temp_delta"] = df["bearing_temp_c"] - df["ambient_temp_c"]

    # Temperature-adjusted oil temp (relative to ambient)
    df["oil_temp_delta"] = df["oil_temp_c"] - df["ambient_temp_c"]

    return df


# Apply to both training and winter data
training_adjusted = add_temperature_adjusted_features(training_data)
winter_adjusted = add_temperature_adjusted_features(winter_data)

# Check PSI on adjusted vibration
psi_raw = compute_psi(
    training_data["vibration_amplitude_mm_s"].values,
    winter_data["vibration_amplitude_mm_s"].values,
)
psi_adjusted = compute_psi(
    training_adjusted["vibration_temp_adjusted"].values,
    winter_adjusted["vibration_temp_adjusted"].values,
)

print(f"PSI (raw vibration):              {psi_raw:.4f}")
print(f"PSI (temperature-adjusted):       {psi_adjusted:.4f}")
print(f"Reduction:                        {(1 - psi_adjusted / psi_raw) * 100:.1f}%")

Temperature adjustment reduces the vibration PSI substantially, but it does not eliminate it entirely --- the remaining drift comes from the sensor calibration issue.

Fix 2: Sensor Calibration Correction

After identifying the VX-200 sensor offset, the team adds a correction to the data pipeline:

def correct_sensor_calibration(
    df: pd.DataFrame,
    sensor_type_col: str = "sensor_model",
    vibration_col: str = "vibration_amplitude_mm_s",
    calibration_offsets: dict | None = None,
) -> pd.DataFrame:
    """
    Apply sensor-specific calibration offsets.

    Parameters
    ----------
    calibration_offsets : dict
        Mapping from sensor model to offset (added to reading).
        Example: {"VX-200": 0.15} means VX-200 reads 0.15 mm/s low.
    """
    if calibration_offsets is None:
        calibration_offsets = {"VX-150": 0.0, "VX-200": 0.15}

    df = df.copy()
    for model, offset in calibration_offsets.items():
        mask = df[sensor_type_col] == model
        df.loc[mask, vibration_col] += offset
    return df


# In production, the pipeline now applies this correction before prediction
print("Sensor calibration correction applied to data pipeline.")
print("VX-200 sensors: +0.15 mm/s offset added to match VX-150 baseline.")
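To see how the two fixes compose, here is a simplified, standalone re-simulation of the vibration feature (distribution parameters borrowed from the earlier cells, PSI function condensed from Phase 3; the sensor assignment is simulated, since the real pipeline reads it from maintenance records):

```python
import numpy as np

rng = np.random.default_rng(42)

def psi(ref, prod, n_bins=10, eps=1e-4):
    """Condensed PSI: reference-decile bins, eps-smoothed proportions."""
    edges = np.quantile(ref, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    r = np.histogram(ref, bins=edges)[0] / len(ref) + eps
    p = np.histogram(prod, bins=edges)[0] / len(prod) + eps
    return np.sum((p - r) * np.log(p / r))

# Summer reference distribution for vibration (mm/s)
summer = rng.normal(2.8, 0.6, 10000)

# Winter readings: -0.40 seasonal shift for everyone, plus an extra
# -0.15 calibration offset on the 30% of turbines with VX-200 sensors
n = 3000
is_vx200 = rng.random(n) < 0.30
winter = rng.normal(2.8, 0.6, n) - 0.40 - 0.15 * is_vx200

corrected = winter + 0.15 * is_vx200   # Fix 2: undo calibration offset
adjusted = corrected + 0.40            # Fix 1: undo known seasonal component

print(f"PSI raw:                   {psi(summer, winter):.3f}")
print(f"PSI calibration-corrected: {psi(summer, corrected):.3f}")
print(f"PSI after both fixes:      {psi(summer, adjusted):.3f}")
```

With both corrections applied, the winter sample is drawn from the same distribution as the summer reference and the PSI falls back into the stable zone.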

Practical Note --- The correct fix for sensor calibration drift is in the data pipeline, not in the model. If you retrain the model on uncorrected data, it learns the mixed-sensor distribution. When the next batch of sensors is replaced, you have the same problem again. Fix the root cause.

Fix 3: Retrain with Multi-Season Data

With both fixes in place, the team retrains the model on data spanning March through November --- covering both summer and winter conditions:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score, roc_auc_score

# Combine summer and winter data (with corrections applied)
np.random.seed(42)
n_combined = 15000

# Simulate combined dataset spanning all seasons
combined_data = pd.DataFrame({
    "vibration_temp_adjusted": np.random.normal(2.8, 0.6, n_combined).clip(0.5, 6.0),
    "vibration_frequency_hz": np.random.normal(119, 15, n_combined).clip(50, 250),
    "bearing_temp_delta": np.random.normal(50, 5, n_combined).clip(30, 70),
    "ambient_temp_c": np.concatenate([
        np.random.normal(25, 5, n_combined // 2).clip(10, 38),
        np.random.normal(2, 6, n_combined // 2).clip(-15, 15),
    ]),
    "operating_hours": np.random.uniform(100, 25000, n_combined).round(0),
    "load_pct": np.random.normal(75, 11, n_combined).clip(20, 100),
    "oil_pressure_bar": np.random.normal(4.3, 0.3, n_combined).clip(2.5, 6.0),
    "oil_temp_delta": np.random.normal(43, 4, n_combined).clip(25, 60),
    "rotor_speed_rpm": np.random.normal(3600, 50, n_combined).clip(3400, 3800),
    "days_since_maintenance": np.random.uniform(1, 180, n_combined).round(0),
})

# Failure labels
failure_score = (
    0.5 * (combined_data["vibration_temp_adjusted"] - 2.8)
    + 0.3 * (combined_data["bearing_temp_delta"] - 50) / 5
    + 0.2 * (combined_data["operating_hours"] / 25000)
    + 0.15 * (combined_data["days_since_maintenance"] / 180)
    - 0.1 * (combined_data["oil_pressure_bar"] - 4.3) / 0.3
    + np.random.normal(0, 0.3, n_combined)
)
combined_labels = (failure_score > np.percentile(failure_score, 97)).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    combined_data, combined_labels, test_size=0.2,
    stratify=combined_labels, random_state=42,
)

model_v2 = GradientBoostingClassifier(
    n_estimators=300, learning_rate=0.05, max_depth=4, random_state=42,
)
model_v2.fit(X_train, y_train)
val_pred = model_v2.predict_proba(X_val)[:, 1]
val_pred_binary = (val_pred > 0.40).astype(int)

print(f"Retrained model performance:")
print(f"  AUC:       {roc_auc_score(y_val, val_pred):.4f}")
print(f"  Recall:    {recall_score(y_val, val_pred_binary):.4f}")
print(f"  Precision: {precision_score(y_val, val_pred_binary):.4f}")

Phase 6: Ongoing Monitoring Design

The team implements a monitoring system designed for the specific challenges of manufacturing data:

# TurbineTech monitoring configuration
turbinetech_alert_rules = [
    # Temperature-adjusted features should be stable year-round
    {
        "metric": "psi_vibration_temp_adjusted",
        "warning": 0.10,
        "critical": 0.20,  # Lower than default: vibration drift is high-cost
        "rationale": "Missed failures cost $340K. Use tighter thresholds.",
    },
    # Raw ambient temp will drift seasonally -- monitor but do not alert
    {
        "metric": "psi_ambient_temp_c",
        "warning": None,   # Expected to drift
        "critical": None,   # Not actionable
        "rationale": "Seasonal drift is expected. Monitor delta features instead.",
    },
    # Performance monitoring (labels arrive in 30 days)
    {
        "metric": "recall_30d",
        "warning": 0.85,
        "critical": 0.80,
        "rationale": "Missing >20% of failures is unacceptable.",
    },
    # Sensor population monitoring
    {
        "metric": "sensor_mix_shift",
        "warning": 0.10,
        "critical": 0.20,
        "rationale": "Sensor replacements change the data generation process.",
    },
]

print("TurbineTech Monitoring Configuration:")
print("=" * 65)
for rule in turbinetech_alert_rules:
    warn = rule["warning"] if rule["warning"] is not None else "disabled"
    crit = rule["critical"] if rule["critical"] is not None else "disabled"
    print(f"  {rule['metric']:35s}  warn={warn}  crit={crit}")
    print(f"    Rationale: {rule['rationale']}")
    print()
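The sensor_mix_shift metric in the configuration is left abstract; one reasonable definition (an assumption here, not TurbineTech's actual implementation) is the total variation distance between sensor-model proportions in the fleet:

```python
def sensor_mix_shift(baseline_counts: dict, current_counts: dict) -> float:
    """Total variation distance between two sensor-model mixes (0 to 1)."""
    models = set(baseline_counts) | set(current_counts)
    b_total = sum(baseline_counts.values())
    c_total = sum(current_counts.values())
    return 0.5 * sum(
        abs(baseline_counts.get(m, 0) / b_total
            - current_counts.get(m, 0) / c_total)
        for m in models
    )

# Fleet of 200 turbines: all VX-150 at baseline; after the September
# cycle, 30% carry VX-200
baseline = {"VX-150": 200}
current = {"VX-150": 140, "VX-200": 60}
print(round(sensor_mix_shift(baseline, current), 2))  # 0.3
```

Under this definition, the September replacement scores 0.30 --- above the 0.20 critical threshold, so the change would have been flagged the week it happened.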

Design Principle --- Manufacturing monitoring requires asymmetric cost awareness. A false alarm (unnecessary inspection at $12,000) is annoying but recoverable. A missed failure ($340,000) is catastrophic. Set tighter drift thresholds on the features that most directly predict failure, and accept more false alarms to catch true positives.
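A rough way to sanity-check the alert threshold against this cost asymmetry is a break-even calculation (a simplification that ignores model calibration error, inspection capacity, and alarm fatigue --- the two costs are the ones stated earlier in the case):

```python
# Flag for inspection whenever the expected cost of doing nothing
# (p * cost of failure) exceeds the cost of inspecting.
COST_INSPECTION = 12_000   # preventive maintenance per turbine
COST_FAILURE = 340_000     # unplanned failure

break_even_p = COST_INSPECTION / COST_FAILURE
print(f"Break-even probability: {break_even_p:.3f}")  # ~0.035
```

The break-even probability (~0.035) is far below the deployed 0.40 threshold, which suggests the operating point itself deserves review alongside the drift fixes --- a well-calibrated probability above 0.035 already justifies an inspection on pure expected cost.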


Phase 7: Lessons for Manufacturing ML

Lesson 1: Seasonal Effects Are Not Drift --- They Are Missing Training Data

If the model had been trained on a full year of data (all four seasons), ambient temperature and its downstream effects would not register as drift. The root cause was a training window that did not cover the full range of operating conditions. The fix is not better monitoring --- it is better training data.
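One cheap guard against this failure mode is a training-coverage check: what fraction of production values falls outside the range seen in training? A sketch using the simulation parameters from this case study:

```python
import numpy as np

rng = np.random.default_rng(1)

def out_of_range_fraction(train: np.ndarray, prod: np.ndarray) -> float:
    """Fraction of production values outside the training min/max range."""
    lo, hi = train.min(), train.max()
    return float(np.mean((prod < lo) | (prod > hi)))

summer_ambient = rng.normal(25, 5, 10000).clip(15, 35)   # training window
winter_ambient = rng.normal(2, 6, 3000).clip(-15, 15)    # production

frac = out_of_range_fraction(summer_ambient, winter_ambient)
print(f"Winter ambient temps outside training range: {frac:.1%}")
```

Nearly all winter ambient readings fall below anything the model saw in training --- a coverage gap, not a process change, and a signal that no amount of monitoring tuning can substitute for a full-year training window.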

Lesson 2: Sensor Changes Are Data Quality Issues, Not Model Issues

The VX-200 sensor offset is a data generation process change. The correct fix is in the data pipeline (calibration correction), not in the model. Retraining on uncorrected data learns the wrong thing. Always fix data quality issues at the source.

Lesson 3: Temperature Normalization Should Have Been a Feature from Day One

Features that are confounded by a known external variable (ambient temperature) should be engineered to remove that confound. vibration_temp_adjusted is more predictive and more stable than raw vibration_amplitude_mm_s. This is domain knowledge expressed as feature engineering.

Lesson 4: Manufacturing Models Need Tighter Monitoring Thresholds

The standard PSI thresholds (0.10 warning, 0.25 retrain) were developed for credit scoring, where the cost of a bad prediction is moderate. In manufacturing, missed predictions can cause hundreds of thousands of dollars in damage. Adjust thresholds based on the cost asymmetry in your domain.

Lesson 5: Communicate Sensor Changes to the Data Science Team

The sensor replacement was routine maintenance. Nobody thought to inform the data science team because "the new sensors are equivalent." They are equivalent within the manufacturer's tolerance --- but the model is sensitive to a 0.15 mm/s shift. Any change to the data generation process (sensors, calibration, data collection frequency, preprocessing logic) must be communicated to the team that depends on that data.
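Process fixes help, but an automated guard is cheaper than relying on memos. A minimal sketch, assuming the pipeline can see a sensor-model field for each incoming reading (a hypothetical schema, not part of the original feed):

```python
# Alert when sensor models seen in the feed differ from the approved set.
APPROVED_SENSOR_MODELS = {"VX-150"}

def check_sensor_models(models_seen: set) -> list:
    """Return sorted list of unapproved sensor models in the feed."""
    return sorted(models_seen - APPROVED_SENSOR_MODELS)

todays_models = {"VX-150", "VX-200"}
unexpected = check_sensor_models(todays_models)
if unexpected:
    print(f"ALERT: unapproved sensor models in feed: {unexpected}")
```

Had a check like this existed in September, the VX-200 rollout would have been visible to the data science team on day one instead of surfacing through missed failures in October.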


Summary Table

Phase                                | What Happened                                                         | How Monitoring Helped
Deployment (March)                   | Model trained on spring/summer data                                   | Baseline established
Summer (April--September)            | Model performs well                                                   | Weekly PSI checks: all green
Sensor replacement (September)       | VX-200 sensors installed on 30% of fleet                              | Not yet detected (effect is small alone)
Winter onset (October)               | Ambient temperature drops 23 degrees Celsius                          | PSI alerts fire: 4 features in "retrain" zone
Missed failures (October--November)  | Three bearing failures not predicted                                  | Performance monitoring (30-day delay) confirms recall drop
Root cause analysis                  | Seasonal + calibration effects identified                             | Feature-level PSI pinpoints which features drifted
Fix deployed (December)              | Temperature normalization + calibration correction + retrained model  | New baseline established with multi-season data

This case study supports Chapter 32: Monitoring Models in Production. Return to the chapter for the full framework.