Chapter 32 Quiz

Twenty questions to check your grasp of predictive modeling for underwriting. Fifteen multiple-choice, five short-answer. Answers are in the collapsed key at the bottom — try the whole set before opening it. All figures are illustrative.

Multiple choice

1. The single most important statistical advantage a predictive model has over a classical one-way rate table is that it:
   a. uses more data
   b. runs faster
   c. estimates all rating factors simultaneously, disentangling correlated effects
   d. is always more accurate on novel risks
2. An insurance pricing GLM typically models the number of claims with which distribution?
   a. gamma
   b. Poisson
   c. normal (Gaussian)
   d. uniform
3. An insurance pricing GLM typically models the size of a claim (given one occurred) with which distribution?
   a. Poisson
   b. binomial
   c. gamma
   d. normal (Gaussian)
4. The "log link" in a frequency GLM is convenient for underwriters mainly because it makes the individual factor effects:
   a. add up on the original (untransformed) scale
   b. multiply on the original (untransformed) scale, reproducing a rate-table structure
   c. cancel out
   d. impossible to interpret
5. Compared with a GLM, a gradient boosting machine (GBM) generally offers:
   a. better interpretability but worse accuracy
   b. better accuracy (it finds interactions automatically) but worse interpretability
   c. identical accuracy and interpretability
   d. neither better accuracy nor better interpretability
6. The chapter's working rule of thumb is:
   a. always use a GBM; GLMs are obsolete
   b. always use a GLM; GBMs are untrustworthy
   c. GLM where you must explain the price; GBM where you must rank the risk
   d. use whichever model has the higher training accuracy
7. "Overfitting" means a model has:
   a. too few variables to be useful
   b. learned the noise in the training data rather than the signal, so it fails out-of-sample
   c. been filed with the regulator too early
   d. too small a Gini coefficient
8. The cardinal rule of model validation is:
   a. always use the largest possible training set
   b. never judge a model on the data it was trained on
   c. prefer the model with the most trees
   d. validate only on the most recent month of data
9. Lift measures a model's ability to:
   a. set an adequate overall price level
   b. separate good risks from bad (sort/rank risk)
   c. comply with state filing rules
   d. detect fraud
10. A lift chart whose ten deciles all run at roughly the same loss ratio (near 100%) indicates a model that:
   a. has excellent lift
   b. sorts risk almost perfectly
   c. has essentially no useful discriminating power
   d. is overfit
11. The Gini coefficient for a pricing model is best described as:
   a. the model's overall price adequacy
   b. a single number from ~0 (random) toward 1 (perfect) summarizing how well the model separates risk
   c. the loss ratio of the worst decile
   d. the number of trees in the ensemble
12. Strong lift, by itself, does not prove that:
   a. the model separates good risks from bad
   b. the worst decile is worse than the best
   c. the model's overall price level is adequate
   d. the model sorted the risks
13. Image-based underwriting with neural networks is most transformative because it:
   a. replaces the need for any human inspection entirely
   b. interprets data with no natural rows and columns (images, satellite tiles) at scale
   c. is cheaper than a GLM to file with regulators
   d. eliminates overfitting
14. The chapter argues that what usually decides a model's result is not the algorithm but:
   a. the programming language
   b. the feature engineering (the construction and selection of inputs)
   c. the number of deciles
   d. the choice of link function
15. A documented override of a model's recommendation is the most important professional artifact in model-era underwriting primarily because:
   a. it lets the underwriter ignore the model whenever convenient
   b. an undocumented override is indistinguishable, to an auditor, from caprice and, to a regulator, from bias
   c. it increases the model's Gini
   d. it is required to file the rate

Short answer

16. In two or three sentences, explain why an insurance GLM splits the price into a frequency model and a severity model, and what underwriting insight that split provides.
17. A vendor reports their model is "94% accurate." Name the two things you must ask before that number means anything to you, and say why each matters.
18. Name the three situations that justify overriding a model's recommendation, and state the one thing all three have in common.
19. Explain why "the algorithm selected it for predictive power" is not, by itself, a regulatory defense for including a variable in a pricing model. (Name the danger.)
20. The model scored Harbor Steel a 7/10 and the underwriter overrode it to a 6. State the specific category of override justification (from §32.7) the underwriter relied on, and give one concrete fact the model could not see that fits that category.

Answer key (try the quiz first)

**Multiple choice**

1. **c.** Simultaneous (multivariate) estimation, which disentangles correlated factors, is the core gain; speed and data volume are secondary. (§32.1, §32.2)
2. **b.** Poisson, the standard distribution for counts of rare events. (§32.2)
3. **c.** Gamma, for positive, right-skewed claim amounts. (§32.2)
4. **b.** On the log scale effects add, which is the same as multiplying on the original (untransformed) scale, reproducing a multiplicative rate table. (§32.2)
5. **b.** A GBM typically predicts better and finds interactions automatically, but is far less interpretable. (§32.3)
6. **c.** GLM to explain the price; GBM to rank the risk. (§32.3)
7. **b.** Overfitting is learning noise, not signal, so the model fails on data it hasn't seen. (§32.3)
8. **b.** Never judge a model on its training data; only out-of-sample performance is honest. (§32.6)
9. **b.** Lift measures separation/ranking of risk. (§32.6)
10. **c.** A flat lift chart means the model discriminates nothing useful. (§32.6)
11. **b.** The Gini summarizes separation in one number from ~0 (random) toward 1 (perfect). (§32.6)
12. **c.** Lift proves *ranking*, not *price adequacy*; a model can sort perfectly and still be priced too low. (§32.6, Ch.11)
13. **b.** Neural networks shine on unstructured data (images, satellite tiles), not because they replace inspection or eliminate overfitting. (§32.4)
14. **b.** Feature engineering; a mediocre algorithm on excellent features usually beats a brilliant algorithm on raw inputs. (§32.5)
15. **b.** Without documentation an override looks like caprice to an auditor and bias to a regulator, regardless of whether it was right. (§32.7)

**Short answer**

16. The GLM splits the price because risk has two dimensions: how *often* losses occur (frequency, modeled Poisson) and how *large* they are when they do (severity, modeled gamma). Multiplying the two gives the modeled pure premium. The split lets the underwriter see *which dimension* a characteristic drives (youth may raise frequency but not severity; a luxury vehicle the reverse), which is exactly the diagnosis underwriting wants. (§32.2)
17. (1) **Was it measured out-of-sample**, on data the model never saw (ideally a later time period)? In-sample accuracy is meaningless because the model bent itself to fit that data. (2) **Accurate at what, against what baseline?** Accuracy on a mostly-zero claim outcome can be high and useless; you want lift/Gini against the incumbent. Without both, "94% accurate" is noise. (§32.6)
18. (1) The model is **missing a material fact** (a signed contract, new management, a just-installed control). (2) The model is **out of its domain** (novel risk, thin data, extrapolating). (3) The model is **demonstrably wrong on this case** (a data error, a misread image, a miscoded class). Common to all three: the underwriter has **information or context the model did not have**, not merely a disagreement or a "feeling." (§32.7)
19. Because a model can *launder a prohibited factor* into a price through correlation: a legal-looking variable can be a **proxy** for a protected class, producing unfair discrimination the model never names (Ch.4, Ch.35). The filing must show the variables are risk-related and permitted, and several states require a documented disparate-impact test; "the algorithm chose it" does not satisfy that. (§32.1, §32.5)
20. The category is **"the model is missing a material fact"** (override justification one). A concrete fact the model could not see: the **signed roof-replacement contract** (or the new hot-work permit program, or the management change behind the loss history). These are corrective controls that exist but never reached the model's inputs, and that turn the roof from a decline-driver into a time-limited subjectivity. (§32.7, The Underwriting File)
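
Answers 9 to 12 and 17 can be checked with a few lines of arithmetic. The sketch below is a minimal illustration and not the chapter's own code: the 100-policy portfolio, the claim positions, and both score vectors are invented for the example, and the Gini is taken as one minus twice the area under a Lorenz curve of cumulative loss share against cumulative premium share, which is one common convention among several.

```python
# Hypothetical portfolio: 100 policies, premium 1.0 each, ten claims of 10.0.
# All numbers are invented for illustration; none come from the chapter.
N = 100
CLAIM_IDX = {40, 51, 62, 73, 84, 85, 96, 97, 98, 99}
premiums = [1.0] * N
losses = [10.0 if i in CLAIM_IDX else 0.0 for i in range(N)]

# Two hypothetical model scores for the same policies (higher = riskier).
sharp_scores = list(range(N))  # ranks most claimants high
flat_scores = [(i % 10) * 10 + i // 10 for i in range(N)]  # scatters them evenly


def ranked(scores):
    """Policy indices ordered from lowest to highest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


def decile_loss_ratios(scores, losses, premiums, n_bins=10):
    """Loss ratio per score bin, lowest-scored bin first.

    Assumes the number of policies is divisible by n_bins.
    """
    order = ranked(scores)
    size = len(order) // n_bins
    ratios = []
    for b in range(n_bins):
        idx = order[b * size:(b + 1) * size]
        ratios.append(sum(losses[i] for i in idx) / sum(premiums[i] for i in idx))
    return ratios


def gini(scores, losses, premiums):
    """1 - 2 * (area under the Lorenz curve), using the trapezoid rule."""
    total_loss, total_prem = sum(losses), sum(premiums)
    cum_loss = cum_prem = area = 0.0
    for i in ranked(scores):
        prev_loss, prev_prem = cum_loss, cum_prem
        cum_loss += losses[i] / total_loss
        cum_prem += premiums[i] / total_prem
        area += (cum_prem - prev_prem) * (prev_loss + cum_loss) / 2
    return 1 - 2 * area


# "Accuracy" of a useless rule that predicts no claim for every policy.
no_claim_accuracy = sum(1 for x in losses if x == 0.0) / N

sharp_lr = decile_loss_ratios(sharp_scores, losses, premiums)
flat_lr = decile_loss_ratios(flat_scores, losses, premiums)
gini_sharp = gini(sharp_scores, losses, premiums)
gini_flat = gini(flat_scores, losses, premiums)
overall_lr = sum(losses) / sum(premiums)  # identical whichever score sorts the book

print("no-claim rule accuracy:", no_claim_accuracy)
print("sharp decile loss ratios:", sharp_lr, "Gini:", round(gini_sharp, 3))
print("flat decile loss ratios:", flat_lr, "Gini:", round(gini_flat, 3))
print("overall loss ratio:", overall_lr)
```

On this made-up book, the rule that never predicts a claim is right 90% of the time while sorting nothing, which is why a bare accuracy figure says so little. The flat score shows the same loss ratio in every decile and a Gini near zero, while the sharper score's loss ratios climb across the deciles and its Gini is well above zero. The overall loss ratio is the same under both scores, which is the point of answer 12: ranking and price adequacy are separate questions.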