Case Study 2: TurbineTech --- When Sensors Die Before Turbines Do


Background

TurbineTech operates 1,200 wind turbines across 14 wind farms in the western United States. Each turbine is instrumented with 847 sensors measuring vibration, temperature, pressure, rotational speed, oil quality, and structural strain. The sensor data streams into a centralized monitoring system at 10-second intervals --- roughly 7.3 million readings per turbine per day.

The predictive maintenance team built a bearing failure model using this sensor data. Bearing failure is the most expensive unplanned maintenance event: $456,000 per incident (7 days of downtime at $48,000/day plus $120,000 in emergency repairs). The model is supposed to predict bearing failure 72 hours in advance, giving the maintenance crew enough time to schedule a planned repair ($20,000 total).

The model works well for some failure modes but misses a specific, expensive class of failures: progressive degradation events, in which the bearing overheats slowly over days or weeks. These events account for nearly half of all bearing failures (38 of the roughly 80 per year) and an even larger share of the total maintenance cost, because they tend to cause cascading damage to adjacent components.

The root cause of the model's blind spot is missing data.


The Sensor Dropout Problem

Vibration sensors are mounted directly on the bearing housing. They are designed to measure the frequency and amplitude of vibrations that indicate wear, misalignment, or incipient failure. But the sensors themselves are physical devices operating in a harsh environment: extreme temperatures, constant vibration, moisture, and electrical interference.

When a bearing begins to degrade progressively:

  1. Days 1-7: Vibration amplitude increases slightly. The sensor detects this and reports normally. The readings are elevated but within the sensor's operating range.

  2. Days 7-14: Vibration increases further. The sensor begins to experience intermittent failures --- readings are missing for short periods (minutes to hours) as the vibration exceeds the sensor's operational tolerance. The sensor recovers when vibration temporarily subsides.

  3. Days 14-21: Temperature in the bearing housing rises as friction increases. The elevated temperature degrades the sensor's electronics and adhesive mounting. Readings become unreliable: some are missing, some are anomalously low (sensor underreporting due to thermal drift), some are correct.

  4. Days 21-28: The sensor fails completely. All readings are missing. The bearing continues to degrade without monitoring. Failure occurs days later.

The data pipeline treated these missing readings as data quality issues. The ETL process applied the standard operating procedure: drop rows with more than 20% missing sensor values. This procedure was written for a different problem (occasional radio interference causing data gaps) and was never revisited for the maintenance use case.

The result: the progressive degradation events --- the most expensive failure mode --- were systematically removed from the training data. The model never learned to recognize them because it never saw them.
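
The offending rule is simple to reconstruct. Below is a minimal sketch of the drop logic, assuming daily rows with one column per sensor; the function name and the toy DataFrame are illustrative, not TurbineTech's actual ETL code:

```python
import numpy as np
import pandas as pd

def drop_high_missing_rows(df, sensor_cols, threshold=0.20):
    """Drop rows where more than `threshold` of sensor values are missing.

    Written for transient radio interference; applied to degradation
    data it silently deletes the most informative rows.
    """
    missing_frac = df[sensor_cols].isna().mean(axis=1)
    return df[missing_frac <= threshold]

# A healthy row survives; a row with 2 of 4 sensors dark is deleted
df = pd.DataFrame({
    's1': [2.5, np.nan], 's2': [2.4, np.nan],
    's3': [2.6, 2.9],    's4': [2.5, 3.1],
})
kept = drop_high_missing_rows(df, ['s1', 's2', 's3', 's4'])
print(len(kept))  # 1
```

The rule is blind to *why* values are missing --- which is exactly the failure described in this case study.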


The Data

A single turbine's sensor data for a 30-day window before a progressive bearing failure:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# Simulate sensor data for one turbine over 30 days
# Turbine ID: WT-0847, Bearing failure on day 30
days = 30
readings_per_day = 8640  # Every 10 seconds
n_readings = days * readings_per_day

timestamps = pd.date_range(
    '2025-01-01', periods=n_readings, freq='10s'
)

# Simulate vibration sensor with progressive degradation
base_vibration = 2.5  # mm/s RMS (normal operating range: 1-4)
degradation_curve = np.concatenate([
    np.ones(7 * readings_per_day) * base_vibration,                          # Days 1-7: normal
    np.linspace(base_vibration, 5.5, 7 * readings_per_day),                  # Days 8-14: rising
    np.linspace(5.5, 9.0, 7 * readings_per_day),                             # Days 15-21: high
    np.linspace(9.0, 14.0, 9 * readings_per_day),                            # Days 22-30: critical
])
noise = np.random.normal(0, 0.3, n_readings)
vibration_true = degradation_curve + noise

# Simulate sensor dropout pattern
missing_probability = np.concatenate([
    np.ones(7 * readings_per_day) * 0.001,    # Days 1-7: 0.1% missing (normal)
    np.linspace(0.001, 0.05, 7 * readings_per_day),  # Days 8-14: rising to 5%
    np.linspace(0.05, 0.30, 7 * readings_per_day),   # Days 15-21: rising to 30%
    np.linspace(0.30, 0.95, 9 * readings_per_day),   # Days 22-30: rising to 95%
])
is_missing = np.random.random(n_readings) < missing_probability
vibration_observed = vibration_true.copy()
vibration_observed[is_missing] = np.nan

# Aggregate to daily level (as the model would see it).
# A day whose readings are more than 80% missing is reported as NaN
# rather than as the mean of a handful of survivors --- hence the
# NaNs on days 28-30 in the output below.
daily_obs = [
    vibration_observed[i * readings_per_day:(i + 1) * readings_per_day]
    for i in range(days)
]
daily_miss = [
    is_missing[i * readings_per_day:(i + 1) * readings_per_day]
    for i in range(days)
]

df_daily = pd.DataFrame({
    'date': pd.date_range('2025-01-01', periods=days, freq='D'),
    'day': range(1, days + 1),
    'vibration_mean': [
        np.nanmean(obs) if np.isnan(obs).mean() <= 0.8 else np.nan
        for obs in daily_obs
    ],
    'vibration_max': [
        np.nanmax(obs) if not np.all(np.isnan(obs)) else np.nan
        for obs in daily_obs
    ],
    'missing_rate': [miss.mean() for miss in daily_miss],
    'n_missing_readings': [miss.sum() for miss in daily_miss],
})

print(df_daily[['day', 'vibration_mean', 'missing_rate']].to_string(index=False))
 day  vibration_mean  missing_rate
   1           2.504         0.001
   2           2.478         0.001
   3           2.526         0.001
   4           2.504         0.002
   5           2.490         0.001
   6           2.510         0.000
   7           2.498         0.001
   8           2.871         0.005
   9           3.214         0.011
  10           3.587         0.017
  11           3.942         0.024
  12           4.308         0.030
  13           4.673         0.037
  14           5.147         0.048
  15           5.532         0.073
  16           5.978         0.097
  17           6.391         0.121
  18           6.835         0.153
  19           7.324         0.187
  20           7.688         0.221
  21           8.127         0.273
  22           8.498         0.319
  23           8.934         0.378
  24           9.289         0.441
  25           9.812         0.524
  26          10.341         0.602
  27          11.237         0.701
  28             NaN         0.839
  29             NaN         0.912
  30             NaN         0.953

The pattern is unmistakable. Vibration rises steadily from day 8 onward, and the missing rate rises in lockstep. By day 28, so few valid readings remain that no daily mean is reported. By day 30 (the failure date), the sensor is effectively dead --- 95.3% of readings are missing.
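
The lockstep claim is easy to check numerically. A quick Pearson correlation between the daily mean and the daily missing rate, using every other day from days 8-26 of the table above:

```python
import numpy as np

# Daily vibration mean and missing rate for days 8, 10, ..., 26
vibration = np.array([2.871, 3.587, 4.308, 5.147, 5.978,
                      6.835, 7.688, 8.498, 9.289, 10.341])
missing   = np.array([0.005, 0.017, 0.030, 0.048, 0.097,
                      0.153, 0.221, 0.319, 0.441, 0.602])

r = np.corrcoef(vibration, missing)[0, 1]
print(f"correlation: {r:.3f}")  # strongly positive
```

The correlation is strongly positive: the missing rate is tracking the degradation, not independent noise.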

The old ETL process dropped days 20-30, where the missing rate exceeds 20%. The model never saw the final 11 days of the degradation curve. It could detect the early rise in vibration (days 8-14) but had no training examples of the critical late-stage pattern where vibration and missingness accelerate together.


Visualizing the Degradation-Dropout Correlation

fig, axes = plt.subplots(3, 1, figsize=(12, 10), sharex=True)

# Panel 1: True vibration (what the sensor would report if it worked)
axes[0].plot(df_daily['day'], degradation_curve[::readings_per_day][:days],
             color='red', linewidth=2, label='True vibration')
axes[0].scatter(df_daily['day'], df_daily['vibration_mean'],
                color='blue', s=30, zorder=5, label='Observed (sensor)')
axes[0].axhspan(1, 4, alpha=0.1, color='green', label='Normal range')
axes[0].axhspan(7, 15, alpha=0.1, color='red', label='Danger range')
axes[0].set_ylabel('Vibration (mm/s RMS)')
axes[0].set_title('WT-0847: Bearing Degradation and Sensor Dropout')
axes[0].legend(loc='upper left')
axes[0].grid(True, alpha=0.3)

# Panel 2: Missing rate
axes[1].bar(df_daily['day'], df_daily['missing_rate'] * 100,
            color='orange', edgecolor='black', alpha=0.7)
axes[1].axhline(y=20, color='red', linestyle='--',
                label='Old ETL threshold (20%)')
axes[1].set_ylabel('Missing Rate (%)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Panel 3: Combined signal --- what the model SHOULD see
axes[2].plot(df_daily['day'], df_daily['missing_rate'],
             color='orange', linewidth=2, label='Missing rate')
axes[2].axvline(x=20, color='red', linestyle=':', alpha=0.5)
axes[2].annotate('Old ETL would\ndrop these days',
                 xy=(25, 0.6), fontsize=10, color='red',
                 ha='center')
axes[2].set_xlabel('Day')
axes[2].set_ylabel('Missing Rate')
axes[2].set_title('The Predictive Signal That Was Deleted')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('turbinetech_degradation_dropout.png', dpi=150, bbox_inches='tight')
plt.show()

Building Missingness Features for Predictive Maintenance

The team redesigned their feature engineering pipeline to treat missingness as a first-class signal:

def engineer_sensor_missingness_features(df, sensor_cols,
                                          windows=[1, 3, 7, 14]):
    """
    Create missingness-based features for sensor data.

    For each sensor and each time window, compute:
    1. Missing rate (fraction of readings missing)
    2. Missing trend (is the rate increasing?)
    3. Consecutive missing count (longest gap)
    4. Missing acceleration (is the trend accelerating?)

    Parameters
    ----------
    df : DataFrame with daily aggregated sensor data
    sensor_cols : list of sensor column names
    windows : list of lookback windows in days
    """
    features = pd.DataFrame(index=df.index)

    for sensor in sensor_cols:
        # Binary: is today's reading missing?
        features[f'{sensor}_missing_today'] = df[sensor].isna().astype(int)

        for w in windows:
            # Missing rate over window
            missing_rate = (
                df[sensor]
                .isna()
                .astype(int)
                .rolling(window=w, min_periods=1)
                .mean()
            )
            features[f'{sensor}_missing_rate_{w}d'] = missing_rate

            # Missing trend: recent w-day rate minus the rate over the
            # last 2w days (positive when missingness is rising)
            if w * 2 <= len(df):
                older_rate = (
                    df[sensor]
                    .isna()
                    .astype(int)
                    .rolling(window=w * 2, min_periods=1)
                    .mean()
                )
                features[f'{sensor}_missing_trend_{w}d'] = missing_rate - older_rate

        # Consecutive missing days (current streak)
        is_missing = df[sensor].isna().astype(int)
        consecutive = is_missing.copy()
        for i in range(1, len(consecutive)):
            if consecutive.iloc[i] == 1:
                consecutive.iloc[i] = consecutive.iloc[i-1] + 1
            else:
                consecutive.iloc[i] = 0
        features[f'{sensor}_consecutive_missing'] = consecutive

    # Cross-sensor missingness: how many sensors are missing simultaneously?
    features['n_sensors_missing'] = sum(
        df[sensor].isna().astype(int) for sensor in sensor_cols
    )
    features['pct_sensors_missing'] = (
        features['n_sensors_missing'] / len(sensor_cols)
    )

    # Cross-sensor missingness trend
    for w in windows:
        features[f'n_sensors_missing_trend_{w}d'] = (
            features['n_sensors_missing']
            .rolling(window=w, min_periods=1)
            .mean()
            .diff()
        )

    return features
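
Two of the building blocks above have compact pandas equivalents worth knowing. The rolling missing rate is a one-liner, and the consecutive-missing loop can be vectorized with a groupby trick --- a sketch on a toy series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0, np.nan, np.nan, np.nan])
is_na = s.isna()

# Rolling missing rate over a 3-day window
rate_3d = is_na.astype(int).rolling(window=3, min_periods=1).mean()
# Final value is 1.0: the last three readings are all missing

# Consecutive-missing streak without a Python loop:
# (~is_na).cumsum() is constant within each missing run, so a
# grouped cumsum counts the streak length and resets on valid values
streak = is_na.astype(int).groupby((~is_na).cumsum()).cumsum()
print(list(streak))  # [0, 1, 0, 1, 2, 3]
```

The vectorized streak matters in production: the loop version is O(n) Python-level iteration per sensor, painful at 847 sensors per turbine.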

The key features and their intuition:

  {sensor}_missing_rate_7d --- What fraction of this sensor's readings
  were missing in the last week? A rising rate signals degradation.

  {sensor}_missing_trend_7d --- Is the missing rate accelerating? A
  positive trend means the sensor is failing faster than before.

  {sensor}_consecutive_missing --- How many consecutive days has this
  sensor been offline? A long streak suggests permanent sensor failure.

  n_sensors_missing --- How many sensors are simultaneously offline?
  Multiple sensors failing together indicates a systemic problem, not
  random dropout.

  n_sensors_missing_trend_7d --- Is the number of offline sensors
  increasing? This is the fleet-level degradation signal.
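
The cross-sensor features reduce to row-wise operations over the missingness indicator matrix. A minimal sketch with three hypothetical sensor columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vibration_x':         [2.5,  np.nan, np.nan],
    'temperature_bearing': [65.0, 70.0,   np.nan],
    'oil_quality':         [0.9,  np.nan, np.nan],
})

# Count of simultaneously-missing sensors per row, and as a fraction
n_sensors_missing = df.isna().sum(axis=1)
pct_sensors_missing = n_sensors_missing / df.shape[1]
print(list(n_sensors_missing))  # [0, 2, 3]
```

Row 3, with every sensor dark at once, is the systemic-problem signature the fleet-level features are designed to catch.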

The Model Comparison

The team trained three versions of the failure prediction model:

results = {
    'V1: Original (dropped rows with >20% missing)': {
        'auc': 0.721,
        'recall_72h': 0.53,
        'precision_72h': 0.18,
        'progressive_failures_caught': '12/38 (31.6%)',
        'sudden_failures_caught': '31/42 (73.8%)',
    },
    'V2: Median imputation (no dropping)': {
        'auc': 0.784,
        'recall_72h': 0.62,
        'precision_72h': 0.22,
        'progressive_failures_caught': '19/38 (50.0%)',
        'sudden_failures_caught': '33/42 (78.6%)',
    },
    'V3: Missingness features + domain-aware imputation': {
        'auc': 0.892,
        'recall_72h': 0.81,
        'precision_72h': 0.34,
        'progressive_failures_caught': '33/38 (86.8%)',
        'sudden_failures_caught': '35/42 (83.3%)',
    },
}

print(f"{'Model':55s} {'AUC':>5s} {'Recall':>7s} {'Prec':>6s}")
print("-" * 76)
for name, r in results.items():
    print(f"{name:55s} {r['auc']:5.3f} {r['recall_72h']:7.0%} "
          f"{r['precision_72h']:6.0%}")
print()
print("Failure detection by type:")
for name, r in results.items():
    print(f"  {name}")
    print(f"    Progressive: {r['progressive_failures_caught']}")
    print(f"    Sudden:      {r['sudden_failures_caught']}")
Model                                                    AUC  Recall   Prec
----------------------------------------------------------------------------
V1: Original (dropped rows with >20% missing)          0.721     53%    18%
V2: Median imputation (no dropping)                    0.784     62%    22%
V3: Missingness features + domain-aware imputation     0.892     81%    34%

Failure detection by type:
  V1: Original (dropped rows with >20% missing)
    Progressive: 12/38 (31.6%)
    Sudden:      31/42 (73.8%)
  V2: Median imputation (no dropping)
    Progressive: 19/38 (50.0%)
    Sudden:      33/42 (78.6%)
  V3: Missingness features + domain-aware imputation
    Progressive: 33/38 (86.8%)
    Sudden:      35/42 (83.3%)

The V3 model's improvement on progressive failures --- from 31.6% to 86.8% --- is the headline number. These are the expensive failures ($456K each) that the original model was almost entirely blind to. The model did not need better algorithms or more sensor types. It needed to stop deleting the evidence of progressive degradation.


The Economic Impact

# Economic analysis
annual_failures = 80  # Estimated total bearing failures per year
progressive_pct = 0.475  # 38/80 = progressive
sudden_pct = 0.525  # 42/80 = sudden

cost_unplanned = 456_000  # Per failure
cost_planned = 20_000     # Per planned maintenance
cost_false_alarm = 5_000  # Inspection cost for false positive

# V1 model (original)
v1_progressive_caught = 12
v1_sudden_caught = 31
v1_total_caught = v1_progressive_caught + v1_sudden_caught
v1_missed = annual_failures - v1_total_caught
v1_false_positives = round(v1_total_caught / 0.18 * (1 - 0.18))  # FP implied by precision

v1_savings = (
    v1_total_caught * (cost_unplanned - cost_planned)  # Caught failures
    - v1_false_positives * cost_false_alarm              # False alarms
)

# V3 model (missingness features)
v3_progressive_caught = 33
v3_sudden_caught = 35
v3_total_caught = v3_progressive_caught + v3_sudden_caught
v3_missed = annual_failures - v3_total_caught
v3_false_positives = round(v3_total_caught / 0.34 * (1 - 0.34))  # FP implied by precision

v3_savings = (
    v3_total_caught * (cost_unplanned - cost_planned)  # Caught failures
    - v3_false_positives * cost_false_alarm              # False alarms
)

print("Annual Economic Impact")
print("=" * 60)
print(f"{'Metric':40s} {'V1':>12s} {'V3':>12s}")
print("-" * 60)
print(f"{'Failures caught':40s} {v1_total_caught:12d} {v3_total_caught:12d}")
print(f"{'Failures missed':40s} {v1_missed:12d} {v3_missed:12d}")
print(f"{'False positives':40s} {v1_false_positives:12d} {v3_false_positives:12d}")
print(f"{'Savings from caught failures':40s} "
      f"${v1_total_caught * (cost_unplanned - cost_planned):>11,} "
      f"${v3_total_caught * (cost_unplanned - cost_planned):>11,}")
print(f"{'Cost of false alarms':40s} "
      f"${v1_false_positives * cost_false_alarm:>11,} "
      f"${v3_false_positives * cost_false_alarm:>11,}")
print(f"{'Net annual savings':40s} ${v1_savings:>11,} ${v3_savings:>11,}")
print(f"{'Incremental value of V3 over V1':40s} "
      f"${v3_savings - v1_savings:>11,}")
Annual Economic Impact
============================================================
Metric                                           V1           V3
------------------------------------------------------------
Failures caught                                  43           68
Failures missed                                  37           12
False positives                                 196          132
Savings from caught failures         $18,748,000  $29,648,000
Cost of false alarms                    $980,000     $660,000
Net annual savings                   $17,768,000  $28,988,000
Incremental value of V3 over V1      $11,220,000

The missingness features add $11.2 million in annual value. Not from new sensors. Not from better algorithms. From treating missing data as a signal instead of an error.
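
The false-positive counts in the analysis above are backed out from precision rather than counted directly. Since precision = TP / (TP + FP), the implied count is FP = TP x (1 - precision) / precision --- a quick check of that algebra:

```python
def implied_false_positives(true_positives, precision):
    """Back out the FP count from a TP count and a reported precision."""
    return round(true_positives * (1 - precision) / precision)

# V1: 43 caught at 18% precision; V3: 68 caught at 34% precision
fp_v1 = implied_false_positives(43, 0.18)
fp_v3 = implied_false_positives(68, 0.34)
print(fp_v1, fp_v3)  # 196 132
```

V3 not only catches more failures --- it does so while generating fewer false alarms, because its precision nearly doubles.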


The Feature Importance Story

The V3 model's top features tell a coherent story about progressive bearing degradation:

Feature Importance (V3 model, top 15):
  1. vibration_x_missing_trend_7d          0.118
  2. vibration_x_missing_rate_7d           0.094
  3. vibration_x_mean                      0.082
  4. n_sensors_missing                     0.071
  5. temperature_bearing_missing_rate_7d   0.064
  6. vibration_x_consecutive_missing       0.058
  7. temperature_bearing_mean              0.052
  8. vibration_x_std_7d                    0.048
  9. pct_sensors_missing                   0.041
 10. temperature_bearing_trend_7d          0.038
 11. vibration_x_max_7d                    0.034
 12. oil_quality_missing_rate_7d           0.029
 13. n_sensors_missing_trend_7d            0.026
 14. rotational_speed_std                  0.023
 15. pressure_differential                 0.021

The top feature is vibration_x_missing_trend_7d --- is the vibration sensor's dropout rate accelerating? This captures the critical transition from "sensor is intermittently flaky" (early degradation) to "sensor is dying" (advanced degradation). It fires days before the actual failure and days before the vibration readings themselves become anomalous --- because the sensor begins to fail before the readings it manages to capture look dangerous.

Eight of the top 15 features are missingness-derived. The model has learned what the mechanical engineer knew intuitively: sensor health is a proxy for equipment health, and sensor death presages equipment death.


Implementation: The Production Pipeline

The team's final pipeline for the TurbineTech production system:

from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler

def build_turbinetech_pipeline(sensor_cols, windows=[1, 3, 7, 14]):
    """
    Production pipeline for TurbineTech bearing failure prediction.

    Steps:
    1. Engineer missingness features from raw sensor data
    2. Apply domain-aware imputation for sensor readings
    3. Scale features
    4. Train gradient boosted model
    """
    # Note: In production, steps 1-2 are applied in the feature
    # engineering layer before the model pipeline

    # Step 1 is handled by engineer_sensor_missingness_features()
    # Step 2 is handled by the imputation logic below

    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('model', GradientBoostingClassifier(
            n_estimators=500,
            max_depth=5,
            learning_rate=0.05,
            subsample=0.8,
            random_state=42,
        )),
    ])

    return pipeline


def impute_sensor_data(df, sensor_cols):
    """
    Domain-aware imputation for sensor data.

    Rules:
    1. For the first 1-2 days of a missing streak, forward-fill
       from the last known value (the sensor is intermittent).
    2. From the 3rd consecutive missing day onward, do NOT
       impute --- the sensor is presumed dead. Mark readings
       with -999 (an out-of-band sentinel value).
    """
    df = df.copy()

    for sensor in sensor_cols:
        consecutive_missing = 0
        last_known_value = None

        for i in range(len(df)):
            if pd.notna(df.iloc[i][sensor]):
                last_known_value = df.iloc[i][sensor]
                consecutive_missing = 0
            else:
                consecutive_missing += 1
                if consecutive_missing <= 2 and last_known_value is not None:
                    # Short gap: forward fill
                    df.iloc[i, df.columns.get_loc(sensor)] = last_known_value
                else:
                    # Long gap: sensor is dead, use sentinel
                    df.iloc[i, df.columns.get_loc(sensor)] = -999

    return df
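
The row loop above is equivalent to a two-step pandas idiom: forward-fill with a limit, then sentinel-fill whatever remains. A sketch on a single series (note that leading missing values also get the sentinel, since there is no last known value to carry forward --- matching the loop's behavior):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 2.0, np.nan])

# Fill at most 2 consecutive gaps from the last known value,
# then mark anything still missing (dead sensor) with the sentinel
imputed = s.ffill(limit=2).fillna(-999)
print(list(imputed))  # [1.0, 1.0, 1.0, -999.0, 2.0, 2.0]
```

The vectorized form is far faster on 847 sensors and harder to get subtly wrong than per-row index arithmetic.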

Production Tip --- The sentinel value approach (-999 for dead sensors) is deliberate. It gives the tree-based model a distinct value that it can split on, effectively creating a "sensor dead" branch in the decision tree. This is equivalent to a missing indicator but compatible with models that cannot handle NaN values natively. The value must be far outside the normal range of the feature to prevent interference with legitimate readings.
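
The "sensor dead branch" behavior is easy to confirm directly: a depth-1 decision tree given a feature containing sentinel values finds a split that isolates them. A sketch with toy labels, where the sentinel rows are the failures:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Vibration feature: three live readings, three dead-sensor sentinels
X = np.array([[2.5], [3.1], [2.8], [-999.0], [-999.0], [-999.0]])
y = np.array([0, 0, 0, 1, 1, 1])  # 1 = failure within 72h (toy labels)

tree = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
# The single split cleanly separates sentinel rows from live readings
print(tree.predict([[-999.0]]), tree.predict([[2.7]]))
```

Because scaling is a monotonic transform, the StandardScaler in the pipeline does not disturb this split --- the sentinel remains an extreme value the tree can isolate.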


Lessons Learned

1. Missingness is not random in physical systems. Sensors are physical devices subject to the same environmental stresses as the equipment they monitor. When equipment degrades, sensors degrade. The missingness pattern IS the degradation signal.

2. ETL rules written for one purpose can sabotage another. The 20% missingness threshold was a reasonable data quality rule for general analytics. It was catastrophic for predictive maintenance, where the rows with the highest missingness were the rows with the highest predictive value.

3. Missingness trend is more informative than missingness rate. A sensor that has always been 5% missing is behaving normally. A sensor that went from 0.1% to 30% missing in two weeks is screaming. The trend, not the level, is the failure signal.

4. Cross-sensor missingness reveals systemic problems. One sensor going offline might be a random failure. Five sensors going offline simultaneously on the same turbine indicates a systemic environmental problem --- exactly the kind of problem that precedes catastrophic failure.

5. Domain experts are the best imputation guide. The mechanical engineer knew, from decades of experience, that sensor dropout correlates with equipment degradation. No statistical test would have surfaced this insight. The data science team needed the domain expert to explain the data-generating process before they could engineer the right features.
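
Lesson 3 can be made concrete with a simplified, self-contained version of the missing-trend feature (recent-window rate minus the rate over a window twice as long). Two hypothetical sensors over four weeks of daily missing flags --- one steadily missing one day per week, one deteriorating in its final week:

```python
import pandas as pd

steady = pd.Series([1, 0, 0, 0, 0, 0, 0] * 4)          # flat ~14% missing
rising = pd.Series([0] * 21 + [1, 0, 1, 1, 0, 1, 1])   # last week: 5/7 missing

def missing_trend(flags, w=7):
    """Recent w-day missing rate minus the 2w-day rate (positive = worsening)."""
    recent = flags.rolling(w, min_periods=1).mean().iloc[-1]
    longer = flags.rolling(2 * w, min_periods=1).mean().iloc[-1]
    return recent - longer

print(missing_trend(steady), missing_trend(rising))  # 0.0 and ~0.36
```

The steady sensor, despite a higher absolute missing rate than a healthy one, produces a zero trend; the deteriorating sensor produces a large positive trend. The trend, not the level, carries the signal.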


Discussion Questions

  1. Sensor replacement strategy. If the predictive maintenance model relies on sensor dropout as a failure signal, there is a perverse incentive NOT to replace failing sensors (because replacement would eliminate the signal). How would you design the monitoring system to avoid this incentive trap?

  2. Transferability. TurbineTech deploys a new generation of sensors that are more robust and experience less dropout. How would the model need to be retrained? Would the missingness features become less important or entirely useless?

  3. Real-time vs. batch. The case study uses daily aggregated data. If TurbineTech moved to real-time prediction (every 10 minutes), how would the missingness features change? What additional features could you engineer at the 10-minute grain that are not available at the daily grain?

  4. The sentinel value. The imputation pipeline uses -999 as a sentinel for dead sensors. What are the risks of this approach? Under what circumstances could it cause problems, and what alternatives exist?

  5. Causal interpretation. The model has learned that sensor dropout predicts bearing failure. But is this causal? Could fixing the sensor dropout (e.g., with better sensor shielding) prevent the bearing failure? How would you design an experiment to test this?


This case study supports Chapter 8: Missing Data Strategies. Return to the chapter or continue to the Exercises.