Case Study 2: TurbineTech Seasonal Drift and Sensor Calibration
Background
TurbineTech manufactures industrial gas turbines for power generation. Their data science team deployed a predictive maintenance model eight months ago. The model predicts the probability of a bearing failure within 30 days based on vibration sensor data, temperature readings, operating hours, and maintenance history. It was trained on 18 months of historical data and achieved a recall of 0.91 on the holdout set --- that is, it correctly flagged 91% of the failures in the holdout data.
The model runs daily. If the predicted failure probability exceeds 0.40, the turbine is flagged for inspection. Preventive maintenance costs $12,000 per turbine. An unplanned failure costs $340,000 in emergency repairs, downtime, and contractual penalties.
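The 0.40 flag threshold can be put in context with a quick expected-cost check. A sketch (the dollar figures are from the case; the break-even rule is the standard expected-cost comparison, and the interpretation afterward is our own):

```python
# Inspecting is worthwhile whenever the expected cost of a missed failure
# exceeds the cost of an inspection: p * failure_cost > inspection_cost.
inspection_cost = 12_000
failure_cost = 340_000

break_even_p = inspection_cost / failure_cost
print(f"Break-even failure probability: {break_even_p:.3f}")
```

Taken literally, a pure expected-cost rule would inspect at any predicted probability above roughly 0.035, far below the deployed 0.40. The gap presumably reflects constraints the case does not spell out, such as inspection capacity and imperfect probability calibration.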
The model was deployed in March. It performed well through the spring and summer. In October, the maintenance team noticed something troubling: two bearing failures had occurred in turbines that the model had classified as "healthy" (predicted probability below 0.15). In November, a third failure was missed. The operations VP called an emergency meeting.
This case study examines two sources of drift that combine to degrade the model: seasonal temperature effects on sensor readings and a sensor calibration change that shifted the baseline of the primary vibration sensor.
Phase 1: The Data
TurbineTech's model uses the following features:
import numpy as np
import pandas as pd
np.random.seed(42)
feature_descriptions = {
"vibration_amplitude_mm_s": "Primary vibration sensor (mm/s RMS)",
"vibration_frequency_hz": "Dominant vibration frequency (Hz)",
"bearing_temp_c": "Bearing temperature (Celsius)",
"ambient_temp_c": "Ambient air temperature (Celsius)",
"operating_hours": "Total hours since last overhaul",
"load_pct": "Turbine load as percentage of rated capacity",
"oil_pressure_bar": "Lubricating oil pressure (bar)",
"oil_temp_c": "Lubricating oil temperature (Celsius)",
"rotor_speed_rpm": "Rotor speed (revolutions per minute)",
"days_since_maintenance": "Days since last scheduled maintenance",
}
for feature, desc in feature_descriptions.items():
    print(f" {feature:35s} {desc}")
Training Data (March -- August, "Summer" Conditions)
The model was trained on data collected during spring and summer, when ambient temperatures range from 15 to 35 degrees Celsius.
n_train = 10000 # 10,000 daily turbine observations
training_data = pd.DataFrame({
"vibration_amplitude_mm_s": np.random.normal(2.8, 0.6, n_train).clip(0.5, 6.0),
"vibration_frequency_hz": np.random.normal(120, 15, n_train).clip(50, 250),
"bearing_temp_c": np.random.normal(75, 8, n_train).clip(40, 120),
"ambient_temp_c": np.random.normal(25, 5, n_train).clip(15, 35),
"operating_hours": np.random.uniform(100, 25000, n_train).round(0),
"load_pct": np.random.normal(72, 12, n_train).clip(20, 100),
"oil_pressure_bar": np.random.normal(4.2, 0.3, n_train).clip(2.5, 6.0),
"oil_temp_c": np.random.normal(68, 6, n_train).clip(40, 95),
"rotor_speed_rpm": np.random.normal(3600, 50, n_train).clip(3400, 3800),
"days_since_maintenance": np.random.uniform(1, 180, n_train).round(0),
})
# Failure labels: ~3% failure rate
failure_score = (
0.5 * (training_data["vibration_amplitude_mm_s"] - 2.8)
+ 0.3 * (training_data["bearing_temp_c"] - 75) / 8
+ 0.2 * (training_data["operating_hours"] / 25000)
+ 0.15 * (training_data["days_since_maintenance"] / 180)
- 0.1 * (training_data["oil_pressure_bar"] - 4.2) / 0.3
+ np.random.normal(0, 0.3, n_train)
)
training_labels = (failure_score > np.percentile(failure_score, 97)).astype(int)
print(f"Training data shape: {training_data.shape}")
print(f"Failure rate: {training_labels.mean():.3f}")
print(f"\nTraining feature summary:")
print(training_data.describe().round(2).to_string())
Phase 2: Two Sources of Drift
Source 1: Seasonal Temperature Effects
When winter arrives, ambient temperatures drop from a summer average of 25 degrees Celsius to a winter average of 2 degrees Celsius at TurbineTech's northern facility. This has cascading effects:
- Bearing temperature decreases by 8--12 degrees Celsius (thermal equilibrium with colder ambient air)
- Vibration amplitude decreases by 0.3--0.5 mm/s (thermal contraction of the turbine housing changes resonance characteristics)
- Oil viscosity increases, slightly increasing oil pressure and decreasing oil temperature
The model learned "low vibration = healthy" during the summer. In winter, vibrations are lower for all turbines, including turbines that are developing bearing faults. The failure threshold that was 4.2 mm/s in summer is effectively 3.7 mm/s in winter, but the model does not know this.
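A small numeric sketch makes the effective-threshold shift concrete (the 0.5 mm/s contraction is the upper end of the range above; the 4.4 mm/s faulty-turbine reading is a hypothetical illustration):

```python
model_threshold = 4.2  # mm/s: decision boundary learned from summer data
winter_shift = -0.5    # mm/s: thermal contraction effect on the reading

# Hypothetical developing fault that would read 4.4 mm/s under summer conditions
summer_reading = 4.4
winter_reading = summer_reading + winter_shift  # 3.9 mm/s

print(f"Summer: {summer_reading} mm/s -> flagged: {summer_reading > model_threshold}")
print(f"Winter: {winter_reading:.1f} mm/s -> flagged: {winter_reading > model_threshold}")
```

The same physical fault is flagged in summer and missed in winter, which is exactly the failure mode the maintenance team observed in October.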
Source 2: Sensor Calibration Change
In September, the maintenance team replaced vibration sensors on 30% of turbines during a routine calibration cycle. The new sensors (Model VX-200) have a slightly different sensitivity than the old sensors (Model VX-150): they read approximately 0.15 mm/s lower for the same actual vibration. This is within the manufacturer's stated tolerance but creates a systematic bias in the data.
Nobody logged this change in the data pipeline. The feature vibration_amplitude_mm_s now has two populations: turbines with the old sensor (reading slightly high) and turbines with the new sensor (reading slightly low).
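Had the sensor model been logged alongside each reading, the offset would have been straightforward to quantify. A sketch, assuming a hypothetical sensor_model column reconstructed after the fact from maintenance records (the distributions mirror the simulation below):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Hypothetical reconstruction: tag each reading with its sensor model
sensor = np.where(rng.random(n) < 0.30, "VX-200", "VX-150")
reading = np.where(
    sensor == "VX-200",
    rng.normal(2.25, 0.6, n),  # new sensor reads ~0.15 mm/s low
    rng.normal(2.40, 0.6, n),  # old sensor baseline
)
df = pd.DataFrame({"sensor_model": sensor, "vibration_amplitude_mm_s": reading})

# Group means expose the calibration offset directly
print(df.groupby("sensor_model")["vibration_amplitude_mm_s"].mean().round(2))
```

Without the sensor tag, the same feature is an unlabeled mixture of two populations, which is much harder to diagnose.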
Simulating the Combined Drift
n_winter = 3000
# Winter production data: both temperature and calibration effects
winter_data = pd.DataFrame({
# Vibration: lower due to temperature (-0.4) and calibration shift (-0.15 for 30%)
"vibration_amplitude_mm_s": np.where(
np.random.random(n_winter) < 0.30,
np.random.normal(2.25, 0.6, n_winter).clip(0.5, 6.0), # New sensor: -0.55
np.random.normal(2.40, 0.6, n_winter).clip(0.5, 6.0), # Old sensor: -0.40
),
"vibration_frequency_hz": np.random.normal(118, 15, n_winter).clip(50, 250),
"bearing_temp_c": np.random.normal(65, 8, n_winter).clip(30, 110), # Lower in winter
"ambient_temp_c": np.random.normal(2, 6, n_winter).clip(-15, 15), # Winter temps
"operating_hours": np.random.uniform(100, 25000, n_winter).round(0),
"load_pct": np.random.normal(78, 10, n_winter).clip(20, 100), # Higher load in winter (heating demand)
"oil_pressure_bar": np.random.normal(4.4, 0.3, n_winter).clip(2.5, 6.0), # Slightly higher (viscosity)
"oil_temp_c": np.random.normal(62, 6, n_winter).clip(35, 90), # Lower in winter
"rotor_speed_rpm": np.random.normal(3600, 50, n_winter).clip(3400, 3800),
"days_since_maintenance": np.random.uniform(1, 180, n_winter).round(0),
})
print("Winter production data summary:")
print(winter_data.describe().round(2).to_string())
Phase 3: Monitoring Detects the Drift
The monitoring system runs weekly PSI checks. Here is what it finds:
def compute_psi(reference, production, n_bins=10, eps=1e-4):
    """Population Stability Index between a reference and a production sample.

    Bins are quantiles of the reference distribution; the outer edges are
    widened to +/- infinity so production values outside the reference
    range are still counted. eps keeps empty bins out of the log ratio.
    """
    bin_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    bin_edges[0] = -np.inf
    bin_edges[-1] = np.inf
    ref_counts = np.histogram(reference, bins=bin_edges)[0]
    prod_counts = np.histogram(production, bins=bin_edges)[0]
    ref_proportions = ref_counts / len(reference) + eps
    prod_proportions = prod_counts / len(production) + eps
    return np.sum(
        (prod_proportions - ref_proportions)
        * np.log(prod_proportions / ref_proportions)
    )
print("Winter Drift Report:")
print("=" * 60)
for col in training_data.columns:
    psi = compute_psi(training_data[col].values, winter_data[col].values)
    status = "stable" if psi < 0.10 else "investigate" if psi < 0.25 else "RETRAIN"
    flag = " <<<" if psi >= 0.25 else " <" if psi >= 0.10 else ""
    print(f" {col:35s} PSI={psi:.4f} [{status}]{flag}")
Expected results:
| Feature | Approximate PSI | Status |
|---|---|---|
| vibration_amplitude_mm_s | 0.30+ | RETRAIN |
| bearing_temp_c | 0.35+ | RETRAIN |
| ambient_temp_c | 2.50+ | RETRAIN |
| oil_temp_c | 0.25+ | RETRAIN |
| oil_pressure_bar | 0.12 | Investigate |
| load_pct | 0.10 | Investigate |
| vibration_frequency_hz | < 0.05 | Stable |
| operating_hours | < 0.05 | Stable |
| rotor_speed_rpm | < 0.05 | Stable |
| days_since_maintenance | < 0.05 | Stable |
Four features are in the "retrain" zone. ambient_temp_c has the highest PSI by far --- its entire range has shifted. The monitoring system fires a critical alert.
Phase 4: Diagnosing the Root Cause
The alert fires, and the team investigates. The diagnosis process separates expected drift from unexpected drift:
Expected drift (seasonal):
- ambient_temp_c: Of course it is different --- it is winter. This is not a bug; it is physics.
- bearing_temp_c, oil_temp_c, oil_pressure_bar: These follow ambient temperature. Expected.
Expected drift (load):
- load_pct: Winter heating demand increases turbine load. Expected.
Unexpected drift (sensor calibration):
- vibration_amplitude_mm_s: The mean dropped more than the seasonal effect alone explains. After correlating with maintenance records, the team discovers the sensor replacement.
# Decomposing the vibration drift
expected_seasonal_shift = -0.40 # mm/s, from temperature effect
observed_mean_shift = winter_data["vibration_amplitude_mm_s"].mean() - training_data["vibration_amplitude_mm_s"].mean()
unexplained_shift = observed_mean_shift - expected_seasonal_shift
print(f"Expected seasonal shift: {expected_seasonal_shift:.2f} mm/s")
print(f"Observed mean shift: {observed_mean_shift:.2f} mm/s")
print(f"Unexplained shift: {unexplained_shift:.2f} mm/s")
print(f"\nThe unexplained shift likely comes from the VX-200 sensor calibration.")
Key Lesson --- Not all drift requires the same response. Seasonal drift is predictable and can be addressed with feature engineering (temperature normalization). Sensor calibration drift is a data quality issue that should be fixed at the source. The monitoring system detected both, but the response is different for each.
Phase 5: The Response
TurbineTech implements a three-part fix:
Fix 1: Temperature Normalization (Feature Engineering)
Add temperature-adjusted features that remove the seasonal component:
def add_temperature_adjusted_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add features that normalize for ambient temperature effects."""
    df = df.copy()
    # Temperature-adjusted vibration.
    # Baseline: at 25C ambient, no adjustment. Each degree below 25C
    # reduces expected vibration by ~0.017 mm/s (empirical coefficient).
    temp_coefficient = 0.017
    reference_temp = 25.0
    df["vibration_temp_adjusted"] = (
        df["vibration_amplitude_mm_s"]
        + temp_coefficient * (reference_temp - df["ambient_temp_c"])
    )
    # Temperature-adjusted bearing temp (relative to ambient)
    df["bearing_temp_delta"] = df["bearing_temp_c"] - df["ambient_temp_c"]
    # Temperature-adjusted oil temp (relative to ambient)
    df["oil_temp_delta"] = df["oil_temp_c"] - df["ambient_temp_c"]
    return df
# Apply to both training and winter data
training_adjusted = add_temperature_adjusted_features(training_data)
winter_adjusted = add_temperature_adjusted_features(winter_data)
# Check PSI on adjusted vibration
psi_raw = compute_psi(
training_data["vibration_amplitude_mm_s"].values,
winter_data["vibration_amplitude_mm_s"].values,
)
psi_adjusted = compute_psi(
training_adjusted["vibration_temp_adjusted"].values,
winter_adjusted["vibration_temp_adjusted"].values,
)
print(f"PSI (raw vibration): {psi_raw:.4f}")
print(f"PSI (temperature-adjusted): {psi_adjusted:.4f}")
print(f"Reduction: {(1 - psi_adjusted / psi_raw) * 100:.1f}%")
Temperature adjustment reduces the vibration PSI substantially, but it does not eliminate it entirely --- the remaining drift comes from the sensor calibration issue.
Fix 2: Sensor Calibration Correction
After identifying the VX-200 sensor offset, the team adds a correction to the data pipeline:
def correct_sensor_calibration(
    df: pd.DataFrame,
    sensor_type_col: str = "sensor_model",
    vibration_col: str = "vibration_amplitude_mm_s",
    calibration_offsets: dict | None = None,
) -> pd.DataFrame:
    """
    Apply sensor-specific calibration offsets.

    Parameters
    ----------
    calibration_offsets : dict
        Mapping from sensor model to offset (added to reading).
        Example: {"VX-200": 0.15} means VX-200 reads 0.15 mm/s low.
    """
    if calibration_offsets is None:
        calibration_offsets = {"VX-150": 0.0, "VX-200": 0.15}
    df = df.copy()
    for model, offset in calibration_offsets.items():
        mask = df[sensor_type_col] == model
        df.loc[mask, vibration_col] += offset
    return df
# In production, the pipeline now applies this correction before prediction
print("Sensor calibration correction applied to data pipeline.")
print("VX-200 sensors: +0.15 mm/s offset added to match VX-150 baseline.")
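A quick usage sketch on a toy frame (the sensor_model column is assumed to be populated upstream; the inline offset application is equivalent to calling correct_sensor_calibration):

```python
import pandas as pd

offsets = {"VX-150": 0.0, "VX-200": 0.15}
toy = pd.DataFrame({
    "sensor_model": ["VX-150", "VX-200", "VX-200"],
    "vibration_amplitude_mm_s": [2.40, 2.25, 3.10],
})

# Same effect as correct_sensor_calibration(toy): add each model's offset
toy["vibration_amplitude_mm_s"] += toy["sensor_model"].map(offsets)
print(toy["vibration_amplitude_mm_s"].round(2).tolist())  # [2.4, 2.4, 3.25]
```

After correction, the VX-200 readings sit on the same baseline as the VX-150 readings, so the downstream model sees a single population again.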
Practical Note --- The correct fix for sensor calibration drift is in the data pipeline, not in the model. If you retrain the model on uncorrected data, it learns the mixed-sensor distribution. When the next batch of sensors is replaced, you have the same problem again. Fix the root cause.
Fix 3: Retrain with Multi-Season Data
With both fixes in place, the team retrains the model on data spanning March through November --- covering both summer and winter conditions:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score, roc_auc_score
# Combine summer and winter data (with corrections applied)
np.random.seed(42)
n_combined = 15000
# Simulate combined dataset spanning all seasons
combined_data = pd.DataFrame({
"vibration_temp_adjusted": np.random.normal(2.8, 0.6, n_combined).clip(0.5, 6.0),
"vibration_frequency_hz": np.random.normal(119, 15, n_combined).clip(50, 250),
"bearing_temp_delta": np.random.normal(50, 5, n_combined).clip(30, 70),
"ambient_temp_c": np.concatenate([
np.random.normal(25, 5, n_combined // 2).clip(10, 38),
np.random.normal(2, 6, n_combined // 2).clip(-15, 15),
]),
"operating_hours": np.random.uniform(100, 25000, n_combined).round(0),
"load_pct": np.random.normal(75, 11, n_combined).clip(20, 100),
"oil_pressure_bar": np.random.normal(4.3, 0.3, n_combined).clip(2.5, 6.0),
"oil_temp_delta": np.random.normal(43, 4, n_combined).clip(25, 60),
"rotor_speed_rpm": np.random.normal(3600, 50, n_combined).clip(3400, 3800),
"days_since_maintenance": np.random.uniform(1, 180, n_combined).round(0),
})
# Failure labels
failure_score = (
0.5 * (combined_data["vibration_temp_adjusted"] - 2.8)
+ 0.3 * (combined_data["bearing_temp_delta"] - 50) / 5
+ 0.2 * (combined_data["operating_hours"] / 25000)
+ 0.15 * (combined_data["days_since_maintenance"] / 180)
- 0.1 * (combined_data["oil_pressure_bar"] - 4.3) / 0.3
+ np.random.normal(0, 0.3, n_combined)
)
combined_labels = (failure_score > np.percentile(failure_score, 97)).astype(int)
X_train, X_val, y_train, y_val = train_test_split(
combined_data, combined_labels, test_size=0.2,
stratify=combined_labels, random_state=42,
)
model_v2 = GradientBoostingClassifier(
n_estimators=300, learning_rate=0.05, max_depth=4, random_state=42,
)
model_v2.fit(X_train, y_train)
val_pred = model_v2.predict_proba(X_val)[:, 1]
val_pred_binary = (val_pred > 0.40).astype(int)
print(f"Retrained model performance:")
print(f" AUC: {roc_auc_score(y_val, val_pred):.4f}")
print(f" Recall: {recall_score(y_val, val_pred_binary):.4f}")
print(f" Precision: {precision_score(y_val, val_pred_binary):.4f}")
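The 0.40 flag threshold was carried over from v1. Given the cost asymmetry, it is worth sweeping the threshold against expected cost on held-out data. A sketch on a toy validation set (the costs are from the case; the tuning step itself is an addition rather than part of the team's described fix, and y/probs are synthetic stand-ins for y_val/val_pred):

```python
import numpy as np

def expected_cost(y_true, probs, threshold,
                  inspection_cost=12_000, failure_cost=340_000):
    """Total cost: one inspection per flag plus one failure cost per miss."""
    flagged = probs > threshold
    n_missed = ((y_true == 1) & ~flagged).sum()
    return flagged.sum() * inspection_cost + n_missed * failure_cost

# Toy stand-in for (y_val, val_pred): ~3% failures, reasonably separated scores
rng = np.random.default_rng(42)
y = (rng.random(3000) < 0.03).astype(int)
probs = np.clip(0.6 * y + rng.normal(0.1, 0.1, 3000), 0, 1)

for t in (0.05, 0.20, 0.40):
    print(f"threshold={t:.2f}  expected cost=${expected_cost(y, probs, t):,}")
```

On real validation data, the sweep should be repeated whenever the model is retrained, since the score distribution shifts with each new version.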
Phase 6: Ongoing Monitoring Design
The team implements a monitoring system designed for the specific challenges of manufacturing data:
# TurbineTech monitoring configuration
turbinetech_alert_rules = [
# Temperature-adjusted features should be stable year-round
{
"metric": "psi_vibration_temp_adjusted",
"warning": 0.10,
"critical": 0.20, # Lower than default: vibration drift is high-cost
"rationale": "Missed failures cost $340K. Use tighter thresholds.",
},
# Raw ambient temp will drift seasonally -- monitor but do not alert
{
"metric": "psi_ambient_temp_c",
"warning": None, # Expected to drift
"critical": None, # Not actionable
"rationale": "Seasonal drift is expected. Monitor delta features instead.",
},
# Performance monitoring (labels arrive in 30 days)
{
"metric": "recall_30d",
"warning": 0.85,
"critical": 0.80,
"rationale": "Missing >20% of failures is unacceptable.",
},
# Sensor population monitoring
{
"metric": "sensor_mix_shift",
"warning": 0.10,
"critical": 0.20,
"rationale": "Sensor replacements change the data generation process.",
},
]
print("TurbineTech Monitoring Configuration:")
print("=" * 65)
for rule in turbinetech_alert_rules:
    warn = rule["warning"] if rule["warning"] is not None else "disabled"
    crit = rule["critical"] if rule["critical"] is not None else "disabled"
    print(f" {rule['metric']:35s} warn={warn} crit={crit}")
    print(f"   Rationale: {rule['rationale']}")
    print()
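The sensor_mix_shift metric is left abstract in the configuration above. One simple interpretation, sketched here (the function and the share figures are our illustration, echoing the September replacement of 30% of the fleet):

```python
def sensor_mix_shift(baseline_share: dict, current_share: dict) -> float:
    """Largest absolute change in any sensor model's share of the fleet."""
    models = set(baseline_share) | set(current_share)
    return max(abs(current_share.get(m, 0.0) - baseline_share.get(m, 0.0))
               for m in models)

baseline = {"VX-150": 1.00}                  # fleet before September
current = {"VX-150": 0.70, "VX-200": 0.30}   # after the replacement cycle
print(round(sensor_mix_shift(baseline, current), 2))  # 0.3
```

Under this definition the September replacement alone scores 0.30, tripping the 0.20 critical threshold months before the winter PSI alerts fired.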
Design Principle --- Manufacturing monitoring requires asymmetric cost awareness. A false alarm (unnecessary inspection at $12,000) is annoying but recoverable. A missed failure ($340,000) is catastrophic. Set tighter drift thresholds on the features that most directly predict failure, and accept more false alarms to catch true positives.
Phase 7: Lessons for Manufacturing ML
Lesson 1: Seasonal Effects Are Not Drift --- They Are Missing Training Data
If the model had been trained on a full year of data (all four seasons), ambient temperature and its downstream effects would not register as drift. The root cause was a training window that did not cover the full range of operating conditions. The fix is not better monitoring --- it is better training data.
Lesson 2: Sensor Changes Are Data Quality Issues, Not Model Issues
The VX-200 sensor offset is a data generation process change. The correct fix is in the data pipeline (calibration correction), not in the model. Retraining on uncorrected data learns the wrong thing. Always fix data quality issues at the source.
Lesson 3: Temperature Normalization Should Have Been a Feature from Day One
Features that are confounded by a known external variable (ambient temperature) should be engineered to remove that confound. vibration_temp_adjusted is more predictive and more stable than raw vibration_amplitude_mm_s. This is domain knowledge expressed as feature engineering.
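The 0.017 coefficient is described as empirical. One way it could be estimated is an ordinary least-squares fit of vibration on ambient temperature over healthy-labeled observations; a sketch on synthetic data with a known slope (the numbers are illustrative, not from the case):

```python
import numpy as np

rng = np.random.default_rng(7)
ambient = rng.uniform(-10, 35, 5000)

# Synthetic healthy-turbine readings with a known 0.017 mm/s per degree C slope
vibration = 2.8 + 0.017 * (ambient - 25.0) + rng.normal(0, 0.3, 5000)

# OLS fit recovers the temperature coefficient
slope = np.polyfit(ambient, vibration, 1)[0]
print(f"Estimated coefficient: {slope:.4f} mm/s per degree C")
```

In practice the fit should be restricted to turbines with no subsequent failure, so the coefficient captures the seasonal effect rather than fault-related vibration.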
Lesson 4: Manufacturing Models Need Tighter Monitoring Thresholds
The standard PSI thresholds (0.10 warning, 0.25 retrain) were developed for credit scoring, where the cost of a bad prediction is moderate. In manufacturing, missed predictions can cause hundreds of thousands of dollars in damage. Adjust thresholds based on the cost asymmetry in your domain.
Lesson 5: Communicate Sensor Changes to the Data Science Team
The sensor replacement was routine maintenance. Nobody thought to inform the data science team because "the new sensors are equivalent." They are equivalent within the manufacturer's tolerance --- but the model is sensitive to a 0.15 mm/s shift. Any change to the data generation process (sensors, calibration, data collection frequency, preprocessing logic) must be communicated to the team that depends on that data.
Summary Table
| Phase | What Happened | How Monitoring Helped |
|---|---|---|
| Deployment (March) | Model trained on spring/summer data | Baseline established |
| Summer (April--September) | Model performs well | Weekly PSI checks: all green |
| Sensor replacement (September) | VX-200 sensors installed on 30% of fleet | Not yet detected (effect is small alone) |
| Winter onset (October) | Ambient temperature drops 23 degrees Celsius | PSI alerts fire: 4 features in "retrain" zone |
| Missed failures (October--November) | Three bearing failures not predicted | Performance monitoring (30-day delay) confirms recall drop |
| Root cause analysis | Seasonal + calibration effects identified | Feature-level PSI pinpoints which features drifted |
| Fix deployed (December) | Temperature normalization + calibration correction + retrained model | New baseline established with multi-season data |
This case study supports Chapter 32: Monitoring Models in Production. Return to the chapter for the full framework.