Chapter 30 Key Takeaways: HR Analytics and People Data


The Core Principle

HR analytics makes people data visible so that judgment can be applied to the right questions. It does not replace the human judgment, empathy, and organizational wisdom required to act on what the data reveals. The best HR analysts are deeply committed to both rigorous analysis and responsible use.


Privacy-First Rules (Non-Negotiable)

Rule What It Means in Practice
Aggregate, never individual Reports show group statistics, never an identified person's salary or score
Minimum group size Do not publish statistics for groups smaller than 5 people
Suppress small cells In pivot tables, replace small-cell values with NaN / "-"
Access controls HR data should not be in shared drives accessible to general analytics teams
Purpose limitation Analyze for the stated purpose only; do not repurpose HR data for other uses without explicit authorization
# The minimum group size rule in code
MIN_GROUP_SIZE_FOR_REPORTING = 5

def safe_aggregate(df, group_col, metric_col, min_group=MIN_GROUP_SIZE_FOR_REPORTING):
    """Return group statistics, suppressing groups below minimum size."""
    result = df.groupby(group_col)[metric_col].agg(
        count="count", median="median"
    ).reset_index()
    result.loc[result["count"] < min_group, "median"] = None
    return result

Core HR Metrics at a Glance

Metric Formula What It Tells You
Turnover Rate Separations / Avg Headcount × 100 How much of the workforce left in a period
Voluntary Turnover Rate Voluntary Separations / Avg Headcount × 100 Pull/push factors — culture, pay, opportunity
Absence Rate Total Absence Days / (HC × Working Days) × 100 Engagement and workload signal
FTE Count FT + (PT count × PT hours / FT hours) Normalized workforce capacity
Compa-Ratio Employee Salary / Pay Band Midpoint Position within pay band
Early-Tenure Attrition Seps within 1 year / Avg Headcount × 100 Onboarding and expectation alignment
Supervisor Span of Control IC Headcount / Manager Headcount Management capacity

Turnover Rate Formula

def calculate_turnover_rate(separations_df, avg_headcount, period="annual"):
    """
    Turnover rate = (number of separations / average headcount) x 100

    Always separate voluntary from involuntary:
      - Voluntary = employees choosing to leave (culture/pay/opportunity signal)
      - Involuntary = company-initiated (performance management signal)
    """
    total = len(separations_df)
    voluntary = len(separations_df[
        separations_df["separation_reason"] == "Voluntary Resignation"
    ])
    return {
        "total_turnover_rate": round(total / avg_headcount * 100, 1),
        "voluntary_turnover_rate": round(voluntary / avg_headcount * 100, 1),
    }

Tenure Distribution Buckets

Standard buckets for tenure analysis:

Bucket Label Watch For
0–12 months New hires High concentration = onboarding problem
1–3 years Early career Should be largest bucket in a growing company
3–5 years Experienced Attrition here = career advancement gap
5–10 years Senior staff Institutional knowledge holders
10+ years Long-tenured Retirement cliff risk if cohort is large

Compensation Analysis: Use Median, Not Mean

# WRONG for compensation analysis — one CEO salary distorts the mean
avg_salary = df["annual_salary"].mean()

# RIGHT — median is robust to high-earner outliers
median_salary = df["annual_salary"].median()

# COMPLETE picture — show the distribution
salary_stats = df.groupby("department")["annual_salary"].agg(
    count="count",
    median="median",
    p25=lambda x: x.quantile(0.25),
    p75=lambda x: x.quantile(0.75),
)

The mean-to-median ratio (mean / median) is itself informative: ratios above 1.15 indicate significant right skew from high earners.


Compa-Ratio Interpretation

Range Label Typical Interpretation
Below 0.80 Significantly below band Possible underpayment; retention risk
0.80–0.85 Below midpoint Below expected range; review warranted
0.85–1.15 In range Competitive, within band expectations
1.15–1.25 Above midpoint Top performer premium or pay creep
Above 1.25 Above band Reclassification or pay review warranted

Headcount Projection Formula

def quarterly_hires_needed(
    headcount, annual_growth_rate, annual_attrition_rate
):
    """Quick estimate of quarterly hires needed."""
    backfill = headcount * annual_attrition_rate / 4
    growth = headcount * annual_growth_rate / 4
    return round(backfill + growth)

# Example: 250 employees, 18% attrition, 5% growth
q1_hires = quarterly_hires_needed(250, 0.05, 0.18)  # ≈ 14

FTE Calculation

def fte_total(full_time_count, part_time_count,
              ft_hours=2080, pt_hours=1040):
    """Total FTE = FT count + (PT count × PT hours / FT hours)."""
    return full_time_count + part_time_count * (pt_hours / ft_hours)

Standard values: 2,080 hours/year for FT (52 weeks × 40 hrs); 1,040 for half-time PT.


Pivot Table for HR Reporting

# Classic headcount pivot: department × job level
pivot = df.pivot_table(
    index="department",
    columns="job_level",
    values="employee_id",
    aggfunc="count",
    fill_value=0,
)
pivot["Total"] = pivot.sum(axis=1)
pivot.loc["Total"] = pivot.sum()

# IMPORTANT: Before sharing, check all cells >= MIN_GROUP_SIZE_FOR_REPORTING
small_cells = (pivot < MIN_GROUP_SIZE_FOR_REPORTING) & (pivot > 0)
pivot[small_cells] = None  # Suppress small cells

Dashboard Design for HR Data

Appropriate chart types for HR data:

Data Recommended Chart Why
Tenure distribution Histogram Shows the full distribution shape
Turnover by department Horizontal bar (color-coded by severity) Easy comparison, space for department names
Salary by department Bar with IQR error bars Shows spread without exposing individuals
Separations over time Bar chart or line Reveals trend and seasonal patterns
Headcount vs. target Line chart with reference line Instantly shows over/under vs. plan
Headcount by category Stacked bar Shows composition and total simultaneously

Avoid: Box plots for salary data in shared reports (quartile lines can approach individual values in small groups). Scatter plots that could reveal individual data points.


Benchmarks: Where to Find Real Ones

Metric Source
Turnover rates by industry SHRM Human Capital Benchmarking Report (annual)
Time-to-hire by role LinkedIn Talent Insights, SHRM
Absence rates Bureau of Labor Statistics (BLS) Economic News Release
Compensation ranges Bureau of Labor Statistics Occupational Employment Survey; Mercer; Willis Towers Watson
Span of control norms Deloitte Human Capital Trends; McKinsey Organization Practice

Always cite the source and year when using benchmarks. Never present a number as an industry benchmark without verifying it.


The South Warehouse Diagnostic Framework

When any department shows elevated turnover, test these hypotheses in order:

  1. When are they leaving? — Tenure-at-separation analysis reveals whether the problem is early (onboarding) or late (career ceiling)
  2. Why are they leaving? — Voluntary vs. involuntary split narrows the cause
  3. When during the year? — Seasonal index reveals cyclical vs. structural issues
  4. Is it specific to this location/team? — Comparison to equivalent departments elsewhere
  5. What does the cost estimate suggest? — Is the investment in a solution justified by the cost of the problem?

Functions Summary

from hr_analytics import (
    generate_employee_dataset,     # Create synthetic dataset for practice
    generate_separations_history,  # Create synthetic separations history
    calculate_turnover_rate,       # Overall and voluntary/involuntary rates
    turnover_by_department,        # Turnover rates per department
    turnover_by_tenure_bucket,     # When in tenure are people leaving?
    seasonal_turnover,             # Monthly and seasonal patterns
    tenure_distribution,           # Bucket breakdown + summary stats
    compensation_analysis,         # Group-level salary statistics
    compa_ratio,                   # Salary vs. pay band midpoint
    pay_band_analysis,             # Group-level band positioning
    absence_rate,                  # Company and department absence metrics
    headcount_projection,          # Quarterly hiring requirements
    fte_calculation,               # Full-time equivalent total
    hr_pivot_summary,              # Department × level headcount pivot
    build_hr_dashboard,            # 4-panel matplotlib dashboard
    anonymize_salaries,            # Aggregate + suppress small groups
    print_hr_summary,              # Formatted console summary
)

One-Paragraph Summary

Chapter 30 builds a Python toolkit for responsible HR analytics. Starting with the ethical foundation — aggregate-only reporting, minimum group sizes, privacy-by-default — the chapter builds functions for calculating turnover rates (overall, voluntary, involuntary, by department, by tenure bucket, and by season), analyzing compensation at the group level using median and interquartile range, computing compa-ratios against pay band midpoints, tracking absenteeism, projecting quarterly headcount needs, and visualizing HR data in ways that inform decisions without exposing individuals. The Acme Corp case study demonstrates the full diagnostic workflow: Priya identifies that South warehouse has 34% annual turnover (vs. 26% company warehouse average), determines through tenure-at-separation and seasonal analysis that the problem is concentrated in early-tenure Q3 hires, and builds an anonymized, actionable report that leads to an onboarding program and retention bonus — reducing South warehouse early-tenure attrition from 46% to 28% within one year. The Evergreen Gardens case study shows the same toolkit applied to a seasonal hiring sustainability question, demonstrating that people analytics serves strategic business questions, not just HR compliance.