Chapter 30 Key Takeaways: HR Analytics and People Data

The Core Principle

HR analytics makes people data visible so that judgment can be applied to the right questions. It does not replace the human judgment, empathy, and organizational wisdom required to act on what the data reveals. The best HR analysts are deeply committed to both rigorous analysis and responsible use.

Privacy-First Rules (Non-Negotiable)

Rule	What It Means in Practice
Aggregate, never individual	Reports show group statistics, never an identified person's salary or score
Minimum group size	Do not publish statistics for groups smaller than 5 people
Suppress small cells	In pivot tables, replace small-cell values with NaN / "-"
Access controls	HR data should not be in shared drives accessible to general analytics teams
Purpose limitation	Analyze for the stated purpose only; do not repurpose HR data for other uses without explicit authorization

# The minimum group size rule in code
MIN_GROUP_SIZE_FOR_REPORTING = 5

def safe_aggregate(df, group_col, metric_col, min_group=MIN_GROUP_SIZE_FOR_REPORTING):
    """Return group statistics, suppressing groups below minimum size."""
    result = df.groupby(group_col)[metric_col].agg(
        count="count", median="median"
    ).reset_index()
    result.loc[result["count"] < min_group, "median"] = None
    return result

Core HR Metrics at a Glance

Metric	Formula	What It Tells You
Turnover Rate	Separations / Avg Headcount × 100	How much of the workforce left in a period
Voluntary Turnover Rate	Voluntary Separations / Avg Headcount × 100	Pull/push factors — culture, pay, opportunity
Absence Rate	Total Absence Days / (HC × Working Days) × 100	Engagement and workload signal
FTE Count	FT + (PT count × PT hours / FT hours)	Normalized workforce capacity
Compa-Ratio	Employee Salary / Pay Band Midpoint	Position within pay band
Early-Tenure Attrition	Seps within 1 year / Avg Headcount × 100	Onboarding and expectation alignment
Supervisor Span of Control	IC Headcount / Manager Headcount	Management capacity

Turnover Rate Formula

def calculate_turnover_rate(separations_df, avg_headcount, period="annual"):
    """
    Turnover rate = (number of separations / average headcount) x 100

    Always separate voluntary from involuntary:
      - Voluntary = employees choosing to leave (culture/pay/opportunity signal)
      - Involuntary = company-initiated (performance management signal)
    """
    total = len(separations_df)
    voluntary = len(separations_df[
        separations_df["separation_reason"] == "Voluntary Resignation"
    ])
    return {
        "total_turnover_rate": round(total / avg_headcount * 100, 1),
        "voluntary_turnover_rate": round(voluntary / avg_headcount * 100, 1),
    }

Tenure Distribution Buckets

Standard buckets for tenure analysis:

Bucket	Label	Watch For
0–12 months	New hires	High concentration = onboarding problem
1–3 years	Early career	Should be largest bucket in a growing company
3–5 years	Experienced	Attrition here = career advancement gap
5–10 years	Senior staff	Institutional knowledge holders
10+ years	Long-tenured	Retirement cliff risk if cohort is large

Compensation Analysis: Use Median, Not Mean

# WRONG for compensation analysis — one CEO salary distorts the mean
avg_salary = df["annual_salary"].mean()

# RIGHT — median is robust to high-earner outliers
median_salary = df["annual_salary"].median()

# COMPLETE picture — show the distribution
salary_stats = df.groupby("department")["annual_salary"].agg(
    count="count",
    median="median",
    p25=lambda x: x.quantile(0.25),
    p75=lambda x: x.quantile(0.75),
)

The mean-to-median ratio (mean / median) is itself informative: ratios above 1.15 indicate significant right skew from high earners.

Compa-Ratio Interpretation

Range	Label	Typical Interpretation
Below 0.80	Significantly below band	Possible underpayment; retention risk
0.80–0.85	Below midpoint	Below expected range; review warranted
0.85–1.15	In range	Competitive, within band expectations
1.15–1.25	Above midpoint	Top performer premium or pay creep
Above 1.25	Above band	Reclassification or pay review warranted

Headcount Projection Formula

def quarterly_hires_needed(
    headcount, annual_growth_rate, annual_attrition_rate
):
    """Quick estimate of quarterly hires needed."""
    backfill = headcount * annual_attrition_rate / 4
    growth = headcount * annual_growth_rate / 4
    return round(backfill + growth)

# Example: 250 employees, 18% attrition, 5% growth
q1_hires = quarterly_hires_needed(250, 0.05, 0.18)  # ≈ 14

FTE Calculation

def fte_total(full_time_count, part_time_count,
              ft_hours=2080, pt_hours=1040):
    """Total FTE = FT count + (PT count × PT hours / FT hours)."""
    return full_time_count + part_time_count * (pt_hours / ft_hours)

Standard values: 2,080 hours/year for FT (52 weeks × 40 hrs); 1,040 for half-time PT.

Pivot Table for HR Reporting

# Classic headcount pivot: department × job level
pivot = df.pivot_table(
    index="department",
    columns="job_level",
    values="employee_id",
    aggfunc="count",
    fill_value=0,
)
pivot["Total"] = pivot.sum(axis=1)
pivot.loc["Total"] = pivot.sum()

# IMPORTANT: Before sharing, check all cells >= MIN_GROUP_SIZE_FOR_REPORTING
small_cells = (pivot < MIN_GROUP_SIZE_FOR_REPORTING) & (pivot > 0)
pivot[small_cells] = None  # Suppress small cells

Dashboard Design for HR Data

Appropriate chart types for HR data:

Data	Recommended Chart	Why
Tenure distribution	Histogram	Shows the full distribution shape
Turnover by department	Horizontal bar (color-coded by severity)	Easy comparison, space for department names
Salary by department	Bar with IQR error bars	Shows spread without exposing individuals
Separations over time	Bar chart or line	Reveals trend and seasonal patterns
Headcount vs. target	Line chart with reference line	Instantly shows over/under vs. plan
Headcount by category	Stacked bar	Shows composition and total simultaneously

Avoid: Box plots for salary data in shared reports (quartile lines can approach individual values in small groups). Scatter plots that could reveal individual data points.

Benchmarks: Where to Find Real Ones

Metric	Source
Turnover rates by industry	SHRM Human Capital Benchmarking Report (annual)
Time-to-hire by role	LinkedIn Talent Insights, SHRM
Absence rates	Bureau of Labor Statistics (BLS) Economic News Release
Compensation ranges	Bureau of Labor Statistics Occupational Employment Survey; Mercer; Willis Towers Watson
Span of control norms	Deloitte Human Capital Trends; McKinsey Organization Practice

Always cite the source and year when using benchmarks. Never present a number as an industry benchmark without verifying it.

The South Warehouse Diagnostic Framework

When any department shows elevated turnover, test these hypotheses in order:

When are they leaving? — Tenure-at-separation analysis reveals whether the problem is early (onboarding) or late (career ceiling)
Why are they leaving? — Voluntary vs. involuntary split narrows the cause
When during the year? — Seasonal index reveals cyclical vs. structural issues
Is it specific to this location/team? — Comparison to equivalent departments elsewhere
What does the cost estimate suggest? — Is the investment in a solution justified by the cost of the problem?

Functions Summary

from hr_analytics import (
    generate_employee_dataset,     # Create synthetic dataset for practice
    generate_separations_history,  # Create synthetic separations history
    calculate_turnover_rate,       # Overall and voluntary/involuntary rates
    turnover_by_department,        # Turnover rates per department
    turnover_by_tenure_bucket,     # When in tenure are people leaving?
    seasonal_turnover,             # Monthly and seasonal patterns
    tenure_distribution,           # Bucket breakdown + summary stats
    compensation_analysis,         # Group-level salary statistics
    compa_ratio,                   # Salary vs. pay band midpoint
    pay_band_analysis,             # Group-level band positioning
    absence_rate,                  # Company and department absence metrics
    headcount_projection,          # Quarterly hiring requirements
    fte_calculation,               # Full-time equivalent total
    hr_pivot_summary,              # Department × level headcount pivot
    build_hr_dashboard,            # 4-panel matplotlib dashboard
    anonymize_salaries,            # Aggregate + suppress small groups
    print_hr_summary,              # Formatted console summary
)

One-Paragraph Summary

Chapter 30 builds a Python toolkit for responsible HR analytics. Starting with the ethical foundation — aggregate-only reporting, minimum group sizes, privacy-by-default — the chapter builds functions for calculating turnover rates (overall, voluntary, involuntary, by department, by tenure bucket, and by season), analyzing compensation at the group level using median and interquartile range, computing compa-ratios against pay band midpoints, tracking absenteeism, projecting quarterly headcount needs, and visualizing HR data in ways that inform decisions without exposing individuals. The Acme Corp case study demonstrates the full diagnostic workflow: Priya identifies that South warehouse has 34% annual turnover (vs. 26% company warehouse average), determines through tenure-at-separation and seasonal analysis that the problem is concentrated in early-tenure Q3 hires, and builds an anonymized, actionable report that leads to an onboarding program and retention bonus — reducing South warehouse early-tenure attrition from 46% to 28% within one year. The Evergreen Gardens case study shows the same toolkit applied to a seasonal hiring sustainability question, demonstrating that people analytics serves strategic business questions, not just HR compliance.