Chapter 30 Key Takeaways: HR Analytics and People Data
The Core Principle
HR analytics makes people data visible so that judgment can be applied to the right questions. It does not replace the human judgment, empathy, and organizational wisdom required to act on what the data reveals. The best HR analysts are deeply committed to both rigorous analysis and responsible use.
Privacy-First Rules (Non-Negotiable)
| Rule | What It Means in Practice |
|---|---|
| Aggregate, never individual | Reports show group statistics, never an identified person's salary or score |
| Minimum group size | Do not publish statistics for groups smaller than 5 people |
| Suppress small cells | In pivot tables, replace small-cell values with NaN / "-" |
| Access controls | HR data should not be in shared drives accessible to general analytics teams |
| Purpose limitation | Analyze for the stated purpose only; do not repurpose HR data for other uses without explicit authorization |
# The minimum group size rule in code
MIN_GROUP_SIZE_FOR_REPORTING = 5
def safe_aggregate(df, group_col, metric_col, min_group=MIN_GROUP_SIZE_FOR_REPORTING):
"""Return group statistics, suppressing groups below minimum size."""
result = df.groupby(group_col)[metric_col].agg(
count="count", median="median"
).reset_index()
result.loc[result["count"] < min_group, "median"] = None
return result
Core HR Metrics at a Glance
| Metric | Formula | What It Tells You |
|---|---|---|
| Turnover Rate | Separations / Avg Headcount × 100 | How much of the workforce left in a period |
| Voluntary Turnover Rate | Voluntary Separations / Avg Headcount × 100 | Pull/push factors — culture, pay, opportunity |
| Absence Rate | Total Absence Days / (HC × Working Days) × 100 | Engagement and workload signal |
| FTE Count | FT + (PT count × PT hours / FT hours) | Normalized workforce capacity |
| Compa-Ratio | Employee Salary / Pay Band Midpoint | Position within pay band |
| Early-Tenure Attrition | Seps within 1 year / Avg Headcount × 100 | Onboarding and expectation alignment |
| Supervisor Span of Control | IC Headcount / Manager Headcount | Management capacity |
Turnover Rate Formula
def calculate_turnover_rate(separations_df, avg_headcount, period="annual"):
"""
Turnover rate = (number of separations / average headcount) x 100
Always separate voluntary from involuntary:
- Voluntary = employees choosing to leave (culture/pay/opportunity signal)
- Involuntary = company-initiated (performance management signal)
"""
total = len(separations_df)
voluntary = len(separations_df[
separations_df["separation_reason"] == "Voluntary Resignation"
])
return {
"total_turnover_rate": round(total / avg_headcount * 100, 1),
"voluntary_turnover_rate": round(voluntary / avg_headcount * 100, 1),
}
Tenure Distribution Buckets
Standard buckets for tenure analysis:
| Bucket | Label | Watch For |
|---|---|---|
| 0–12 months | New hires | High concentration = onboarding problem |
| 1–3 years | Early career | Should be largest bucket in a growing company |
| 3–5 years | Experienced | Attrition here = career advancement gap |
| 5–10 years | Senior staff | Institutional knowledge holders |
| 10+ years | Long-tenured | Retirement cliff risk if cohort is large |
Compensation Analysis: Use Median, Not Mean
# WRONG for compensation analysis — one CEO salary distorts the mean
avg_salary = df["annual_salary"].mean()
# RIGHT — median is robust to high-earner outliers
median_salary = df["annual_salary"].median()
# COMPLETE picture — show the distribution
salary_stats = df.groupby("department")["annual_salary"].agg(
count="count",
median="median",
p25=lambda x: x.quantile(0.25),
p75=lambda x: x.quantile(0.75),
)
The mean-to-median ratio (mean / median) is itself informative: ratios above 1.15 indicate significant right skew from high earners.
Compa-Ratio Interpretation
| Range | Label | Typical Interpretation |
|---|---|---|
| Below 0.80 | Significantly below band | Possible underpayment; retention risk |
| 0.80–0.85 | Below midpoint | Below expected range; review warranted |
| 0.85–1.15 | In range | Competitive, within band expectations |
| 1.15–1.25 | Above midpoint | Top performer premium or pay creep |
| Above 1.25 | Above band | Reclassification or pay review warranted |
Headcount Projection Formula
def quarterly_hires_needed(
headcount, annual_growth_rate, annual_attrition_rate
):
"""Quick estimate of quarterly hires needed."""
backfill = headcount * annual_attrition_rate / 4
growth = headcount * annual_growth_rate / 4
return round(backfill + growth)
# Example: 250 employees, 18% attrition, 5% growth
q1_hires = quarterly_hires_needed(250, 0.05, 0.18) # ≈ 14
FTE Calculation
def fte_total(full_time_count, part_time_count,
ft_hours=2080, pt_hours=1040):
"""Total FTE = FT count + (PT count × PT hours / FT hours)."""
return full_time_count + part_time_count * (pt_hours / ft_hours)
Standard values: 2,080 hours/year for FT (52 weeks × 40 hrs); 1,040 for half-time PT.
Pivot Table for HR Reporting
# Classic headcount pivot: department × job level
pivot = df.pivot_table(
index="department",
columns="job_level",
values="employee_id",
aggfunc="count",
fill_value=0,
)
pivot["Total"] = pivot.sum(axis=1)
pivot.loc["Total"] = pivot.sum()
# IMPORTANT: Before sharing, check all cells >= MIN_GROUP_SIZE_FOR_REPORTING
small_cells = (pivot < MIN_GROUP_SIZE_FOR_REPORTING) & (pivot > 0)
pivot[small_cells] = None # Suppress small cells
Dashboard Design for HR Data
Appropriate chart types for HR data:
| Data | Recommended Chart | Why |
|---|---|---|
| Tenure distribution | Histogram | Shows the full distribution shape |
| Turnover by department | Horizontal bar (color-coded by severity) | Easy comparison, space for department names |
| Salary by department | Bar with IQR error bars | Shows spread without exposing individuals |
| Separations over time | Bar chart or line | Reveals trend and seasonal patterns |
| Headcount vs. target | Line chart with reference line | Instantly shows over/under vs. plan |
| Headcount by category | Stacked bar | Shows composition and total simultaneously |
Avoid: Box plots for salary data in shared reports (quartile lines can approach individual values in small groups). Scatter plots that could reveal individual data points.
Benchmarks: Where to Find Real Ones
| Metric | Source |
|---|---|
| Turnover rates by industry | SHRM Human Capital Benchmarking Report (annual) |
| Time-to-hire by role | LinkedIn Talent Insights, SHRM |
| Absence rates | Bureau of Labor Statistics (BLS) Economic News Release |
| Compensation ranges | Bureau of Labor Statistics Occupational Employment Survey; Mercer; Willis Towers Watson |
| Span of control norms | Deloitte Human Capital Trends; McKinsey Organization Practice |
Always cite the source and year when using benchmarks. Never present a number as an industry benchmark without verifying it.
The South Warehouse Diagnostic Framework
When any department shows elevated turnover, test these hypotheses in order:
- When are they leaving? — Tenure-at-separation analysis reveals whether the problem is early (onboarding) or late (career ceiling)
- Why are they leaving? — Voluntary vs. involuntary split narrows the cause
- When during the year? — Seasonal index reveals cyclical vs. structural issues
- Is it specific to this location/team? — Comparison to equivalent departments elsewhere
- What does the cost estimate suggest? — Is the investment in a solution justified by the cost of the problem?
Functions Summary
from hr_analytics import (
generate_employee_dataset, # Create synthetic dataset for practice
generate_separations_history, # Create synthetic separations history
calculate_turnover_rate, # Overall and voluntary/involuntary rates
turnover_by_department, # Turnover rates per department
turnover_by_tenure_bucket, # When in tenure are people leaving?
seasonal_turnover, # Monthly and seasonal patterns
tenure_distribution, # Bucket breakdown + summary stats
compensation_analysis, # Group-level salary statistics
compa_ratio, # Salary vs. pay band midpoint
pay_band_analysis, # Group-level band positioning
absence_rate, # Company and department absence metrics
headcount_projection, # Quarterly hiring requirements
fte_calculation, # Full-time equivalent total
hr_pivot_summary, # Department × level headcount pivot
build_hr_dashboard, # 4-panel matplotlib dashboard
anonymize_salaries, # Aggregate + suppress small groups
print_hr_summary, # Formatted console summary
)
One-Paragraph Summary
Chapter 30 builds a Python toolkit for responsible HR analytics. Starting with the ethical foundation — aggregate-only reporting, minimum group sizes, privacy-by-default — the chapter builds functions for calculating turnover rates (overall, voluntary, involuntary, by department, by tenure bucket, and by season), analyzing compensation at the group level using median and interquartile range, computing compa-ratios against pay band midpoints, tracking absenteeism, projecting quarterly headcount needs, and visualizing HR data in ways that inform decisions without exposing individuals. The Acme Corp case study demonstrates the full diagnostic workflow: Priya identifies that South warehouse has 34% annual turnover (vs. 26% company warehouse average), determines through tenure-at-separation and seasonal analysis that the problem is concentrated in early-tenure Q3 hires, and builds an anonymized, actionable report that leads to an onboarding program and retention bonus — reducing South warehouse early-tenure attrition from 46% to 28% within one year. The Evergreen Gardens case study shows the same toolkit applied to a seasonal hiring sustainability question, demonstrating that people analytics serves strategic business questions, not just HR compliance.