Chapter 30 Exercises: HR Analytics and People Data

These exercises are organized across five tiers. Before beginning any exercise involving real or realistic employee data, review the privacy principles in Section 30.1 of the chapter.

Before you start: Generate the synthetic Acme Corp dataset using the functions in hr_analytics.py:

from hr_analytics import generate_employee_dataset, generate_separations_history
employee_df = generate_employee_dataset(seed=42)
separations_df = generate_separations_history(employee_df, seed=42)

Tier 1: Foundations (Direct Application)

Exercise 1.1 — Workforce Summary

Using the synthetic employee dataset, compute and print: 1. Total headcount 2. Number of full-time vs. part-time employees 3. Total FTE count 4. Number of departments 5. Number of distinct locations

Then compute and print the same metrics for each department separately.

Exercise 1.2 — Turnover Rate

Compute the overall 2023 turnover rate for Acme Corp using calculate_turnover_rate(). Then: 1. What is the total turnover rate? 2. What is the voluntary turnover rate? 3. What is the involuntary turnover rate? 4. Is the voluntary rate higher or lower than the SHRM national average (approximately 12–15% for US companies)?

Exercise 1.3 — Tenure Distribution

Use tenure_distribution() to compute Acme's tenure distribution. Answer: 1. What is the median tenure? 2. What percentage of the workforce has been there less than one year? 3. What percentage has been there more than five years? 4. Which tenure bucket (e.g., "1–2 years") has the most employees?

Create a bar chart showing employee count per tenure bucket. Label each bar with the count.

Exercise 1.4 — Compensation by Department

Use compensation_analysis() to compute median and quartile salary statistics by department. Answer: 1. Which department has the highest median salary? 2. Which department has the widest P25–P75 range (most spread)? 3. What is the company-wide median salary? 4. How many departments have a median salary above the company-wide median?

Exercise 1.5 — Absence Rate

Use absence_rate() to compute absence metrics. Answer: 1. What is the company-wide absence rate? 2. Which department has the highest absence rate? 3. What is the average number of absence days per employee? 4. Does the department with the highest absence rate also have the highest turnover? What might that suggest?

Tier 2: Applied (Modified and Extended)

Exercise 2.1 — Regional Turnover Comparison

Extend turnover_by_department() to also break down turnover by location (North, South, East, West). Write a function turnover_by_region(separations, headcount_by_region) that: 1. Calculates turnover rate per region 2. Identifies which region has the highest rate 3. Adds a status column: "High" (>25%), "Medium" (15–25%), "Low" (<15%)

Print the result as a clean table sorted by turnover rate descending.

Exercise 2.2 — Early Tenure Risk Metric

Early-tenure attrition (leaving within 6–12 months) is often more costly than turnover of longer-tenured employees because training costs are lost before the employee becomes fully productive.

Write a function early_tenure_attrition_rate(separations, headcount) that: 1. Filters separations to those with tenure_at_separation_years < 1.0 2. Calculates the early-tenure attrition as a percentage of total headcount 3. Separates voluntary from involuntary early exits 4. Returns these metrics as a dictionary

Apply it to the full 2023 separations and to each department separately.

Exercise 2.3 — Compa-Ratio Analysis

Using the following pay band definitions for Acme:

pay_bands = {
    "Individual Contributor": (45_000, 62_500, 80_000),
    "Senior IC": (65_000, 85_000, 105_000),
    "Lead": (75_000, 95_000, 115_000),
    "Manager": (85_000, 108_000, 130_000),
}

For each job level: 1. Compute the median salary from the employee dataset 2. Calculate the compa-ratio against the band midpoint 3. Identify whether the level is "Below Band", "In Band", or "Above Band" 4. Compute the percentage of employees at each level whose individual salary falls below the band minimum (use the MIN_GROUP_SIZE_FOR_REPORTING rule — suppress any group smaller than 5)

Exercise 2.4 — Headcount Trend Chart

The separations data covers 2022 and 2023. Build a monthly headcount trend chart showing the evolution of Acme's headcount over those two years.

To do this, you will need to: 1. Start with a baseline headcount (assume January 2022 = 230 employees) 2. Add a synthetic monthly hiring rate (assume Acme hired 5.5 employees per month on average in 2022, 6.0 in 2023) 3. Subtract monthly separations to get end-of-month headcount 4. Plot the result as a line chart with a horizontal reference line at 250 (current target headcount)

Exercise 2.5 — Pivot Table: Salary by Department and Job Level

Build a pivot table showing median salary by department and job level, with a totals row and column. Before displaying, apply the minimum group size rule: replace any cell value derived from fewer than 5 employees with NaN and display it as "-".

Your function signature should be:

def salary_pivot_with_suppression(df, min_group=5):
    # Returns pivot table of median salaries, suppressing small cells
    ...

Tier 3: Integration (Combining Multiple Concepts)

Exercise 3.1 — Full South Warehouse Diagnosis

Reproduce the South warehouse analysis from Section 30.11 in full. Your analysis must: 1. Compute the South warehouse turnover rate and compare it to all-warehouse and company-wide rates 2. Show the tenure-at-separation distribution for South warehouse vs. all departments 3. Show the monthly seasonal pattern for South warehouse 4. Calculate what percentage of South warehouse separations are voluntary 5. Estimate the cost of this turnover assuming $8,000 per separation (recruitment + training cost) 6. Present a one-paragraph written conclusion (as a Python comment or print statement) identifying the key findings

Exercise 3.2 — Manager-to-Individual Contributor Ratio

Build an analysis of Acme's manager-to-IC ratio (often called "span of control"): 1. Count managers and ICs per department 2. Compute the ratio of ICs to managers in each department 3. Identify departments where the ratio is above 8:1 (potentially understaffed management) or below 3:1 (potentially over-managed) 4. Compare the turnover rate of departments with different span-of-control ranges

Does span of control correlate with turnover in the Acme data?

Exercise 3.3 — Cohort Analysis for HR

Apply the cohort analysis concept from Chapter 28 to HR data. Build a "hiring cohort retention table": 1. Group employees by the quarter they were hired (e.g., Q1 2020, Q2 2020, etc.) 2. For each cohort, track what percentage are still employed at 6 months, 12 months, 18 months, and 24 months (use the separations data to determine who left and when) 3. Visualize as a heatmap with cohort as rows and retention period as columns 4. Identify which hiring cohort has the best 12-month retention rate

Exercise 3.4 — Cost of Turnover Report

Build a comprehensive cost-of-turnover model. Turnover cost estimates vary, but a reasonable framework: - Entry-level roles: 50% of annual salary - Individual Contributor roles: 75% of annual salary - Senior IC and Lead roles: 100% of annual salary - Manager and above: 150% of annual salary

Calculate: 1. Total 2023 turnover cost using the above multipliers and department median salaries 2. Cost by department 3. The cost that would be eliminated by reducing turnover from 18% to 15% (i.e., the business case for retention investment)

Exercise 3.5 — Compensation Equity Heatmap

Build a heatmap showing median compa-ratio by department and gender group (using the anonymized gender_anon column). Cells with fewer than 5 employees should be suppressed.

Steps: 1. Compute the compa-ratio for each employee against their job-level pay band midpoint 2. Pivot the result to (department × gender_anon) with median compa-ratio as values 3. Suppress cells with fewer than 5 employees 4. Display as a seaborn heatmap with a diverging color scale centered at 1.0

Write a note (as a comment) about what conclusions you can and cannot draw from this chart, and what additional analysis would be needed to make any compensation equity claims.

Tier 4: Challenge (Open-Ended Analysis)

Exercise 4.1 — HR Dashboard with Six Panels

Extend the four-panel HR dashboard to six panels. The two additional panels must add information not available in the original four. Candidates include: - Voluntary vs. involuntary turnover by department (stacked bar) - Tenure distribution by department (grouped bar or box plot equivalent) - Headcount trend over the analysis period - Compa-ratio distribution by job level - Absence rate by department vs. company average - Hiring cohort retention at 12 months

Choose two, build them, and write a brief explanation of what each panel shows and why it was worth including.

Exercise 4.2 — Attrition Prediction Features

Without building a machine learning model (that comes in a later chapter), build a dataset of "employee risk features" — characteristics that, based on the analysis in this chapter, might predict which employees are at higher risk of leaving: 1. Tenure (shorter = higher risk, especially 0-12 months) 2. Absence days YTD (higher = potentially disengaged) 3. Department turnover rate (higher department rate = higher environmental risk) 4. Compa-ratio (significantly below 0.85 = higher flight risk) 5. Manager tenure (shorter-tenured manager = higher team turnover in research)

Build a composite "risk score" (simple average of normalized components) for each employee. List the top 20 employees by risk score. Remember: this is an exercise. In a real organization, you would never use such a list to make employment decisions without substantial human judgment, privacy review, and legal counsel.

Exercise 4.3 — Headcount Planning Sensitivity Analysis

The headcount projection model in Section 30.7 assumes fixed growth and attrition rates. Build a sensitivity analysis that varies: - Attrition rate: 15%, 18%, 22%, 28% - Growth rate: 0%, 3%, 5%, 8%

For each combination (16 scenarios), compute the total new hires needed in the next 4 quarters. Present the results as a pivot table (attrition rates as rows, growth rates as columns) showing total annual hiring need.

Which scenario requires the most hiring? Which requires the least? What does this tell you about which variable is more important to control?

Exercise 4.4 — Benchmarking Report

Using only publicly available, verifiable sources (SHRM, BLS, Mercer, or similar), find three industry benchmarks relevant to Acme Corp (a mid-size logistics/distribution company): 1. A turnover rate benchmark 2. A time-to-hire benchmark 3. An absence rate benchmark

Write a short comparison (as print statements or a formatted table) showing Acme's metrics vs. the benchmarks and note the source and year for each benchmark. Include a sentence about what each comparison suggests for Acme's HR priorities.

Tier 5: Stretch (Beyond the Chapter)

Exercise 5.1 — Interactive HR Dashboard with Plotly

Recreate the HR dashboard using Plotly Express with interactive features: - Dropdown to filter by department or location - Hover tooltips showing exact values - A slider for filtering the tenure histogram by minimum/maximum tenure

Install: pip install plotly

Save as HTML: fig.write_html("acme_hr_interactive.html")

Exercise 5.2 — HR Data from Excel

Many HR systems export data to Excel rather than CSV. Build a function load_hr_from_excel(filepath) that: 1. Reads an Excel file with multiple sheets: "Employees", "Separations", "Job Openings" 2. Validates required columns in each sheet 3. Converts data types appropriately (dates, salary as float, etc.) 4. Returns a dictionary of three DataFrames

Test your function with a synthetic Excel file you create using pandas.ExcelWriter.

Exercise 5.3 — Text Analysis of Separation Reasons

In real HR systems, separations often include free-text fields ("exit interview notes"). Build a simple text frequency analysis: 1. Create 50 synthetic exit interview comments (mix of positive and negative themes) 2. Clean the text (lowercase, remove punctuation, remove stopwords) 3. Compute word frequencies and identify the top 20 most common words 4. Visualize as a word cloud using the wordcloud library (install: pip install wordcloud) or a bar chart of word frequencies

What themes emerge? How would you present this to HR leadership while protecting anonymity (since text can be identifying)?

Exercise 5.4 — Logistic Regression Attrition Model

Using scikit-learn's LogisticRegression, build a simple model to predict which employees might leave: 1. Create a binary target variable: left_in_2023 (1 if the employee appears in 2023 separations, 0 if currently employed) 2. Features: department (encoded), tenure_years, annual_salary, absence_days_ytd 3. Split 80/20 train/test, train the model, and evaluate accuracy and ROC-AUC 4. Print the most predictive features 5. Write a note (as a comment) about the ethical limitations of using this model in a real HR decision-making context

Install: pip install scikit-learn

Exercise 5.5 — HR Metrics API

Build a simple HTTP API for HR metrics using FastAPI (install: pip install fastapi uvicorn):

Endpoints to implement: - GET /summary — returns JSON with the workforce summary metrics - GET /turnover/{year} — returns turnover rate for the specified year - GET /compensation/{department} — returns aggregated (anonymized) compensation stats for the specified department - GET /headcount-projection — returns the 4-quarter headcount projection

All endpoints should enforce the minimum group size rule before returning data. Run locally with uvicorn app:app --reload and test with a browser or curl.