Prerequisites

  • Chapter 1
  • Chapter 3
  • Chapter 6
  • Chapter 8
  • Chapter 9

Learning Objectives

  • Decompose expected loss into frequency and severity, and describe the shape of each as a distribution rather than a single number.
  • Compute and interpret a loss ratio, distinguishing earned premium from written premium and incurred losses from paid losses.
  • Build a pure premium (loss cost) from frequency and severity and explain why it is the expected-loss core of every rate.
  • Apply trend and loss development to convert immature historical losses into an estimate of next year's cost.
  • Define credibility and decide, for a given body of experience, how much weight that experience deserves against the class.
  • Credibility-weight a risk's own loss ratio against the class loss ratio and state, honestly, what the blend can and cannot tell you.
  • Read a small loss history — two fires in five years — as a credibility problem rather than a verdict, and defend the reading.

Chapter 10: The Mathematics of Risk and Price: Frequency, Severity, Loss Ratios, and Credibility

"It is the mark of an instructed mind to rest satisfied with the degree of precision which the nature of the subject admits, and not to seek exactness where only an approximation of the truth is possible." — Aristotle, Nicomachean Ethics. No sentence describes the underwriter's relationship to the numbers better: you will work with means and ratios and credibility weights every day, and the whole skill is knowing how much precision the data can honestly carry — and refusing to pretend to more.

Overview

Here is the problem on your desk this chapter. You have the Harbor Steel loss runs in front of you, freshly ordered (Chapter 8) and the risk assessed (Chapter 9): two fire losses in five years, roughly \$180,000 in 2021 and roughly \$1.2 million in 2023. Your manager wants a number — what should this account cost — and your first instinct is to do the obvious arithmetic: add the losses, divide by the years, and call it the annual cost. That arithmetic is not wrong so much as dangerous. Two losses is a tiny sample. The \$1.2 million fire could be a once-in-twenty-years event that happened to land in your five-year window, or it could be the leading edge of a hot-work hazard that will recur. The naive average treats those two possibilities identically, and an underwriter who prices off it will either wildly overcharge a good account and lose it to a competitor, or wildly undercharge a bad one and book a loss that surfaces in three years.

This chapter is the math that turns losses into a defensible price — and, just as important, the humility the math demands. You are not an actuary; you will rarely build a rating plan from scratch (that is Chapter 11, and the actuaries own much of it). But every underwriter needs to read a loss run the way an actuary reads it: to see expected loss as frequency times severity rather than a lump sum, to compute a loss ratio and know exactly what it does and does not measure, to build a pure premium and load it into a rate, to trend and develop immature losses so last year's numbers speak to next year, and — the heart of the chapter — to apply credibility: the disciplined answer to the question how much should I trust this risk's own experience versus what I know about its class? Credibility theory is the mathematics of not fooling yourself with a small sample, and it is the most underrated idea in underwriting.

In this chapter, you will learn to:

  • Decompose expected loss into a frequency distribution and a severity distribution, and explain why each is a shape, not a point.
  • Compute a loss ratio correctly — earned premium, incurred losses — and read what it is telling you.
  • Build a pure premium (the loss cost) and locate it as the expected-loss core of any rate.
  • Apply trend and development to turn immature, dated losses into an estimate of future cost.
  • Define credibility (full and partial) and decide how much weight a body of experience deserves.
  • Credibility-weight a risk's own loss ratio against its class, and state what the blend can and cannot do.

Learning Paths

This is the most quantitative chapter in Part II, but the math is all in service of a decision, and every track needs it. Here is what each should weight.

  • 🏠 Personal Lines: Frequency and severity (§10.1) and credibility (§10.5–10.6) are why your own auto or homeowners experience barely moves your individual premium: a single household is almost zero credibility, so the class does the work. Watch how partial credibility explains the whole logic of personal-lines rating.
  • 🏢 Commercial Lines: This is your chapter. Everything here feeds the Harbor Steel pricing in Chapter 11. The loss-ratio and trend/development sections (§10.2, §10.4) are exactly what you do to a commercial loss run before you quote, and §10.5–10.6 are why a mid-size account is partially credible.
  • 📊 Analytics: The frequency/severity decomposition (§10.1) is the conceptual skeleton of the GLMs in Chapter 32 (Poisson frequency, gamma severity); credibility (§10.5–10.7) is Bayesian shrinkage by another name. The Bühlmann glimpse in §10.7 connects directly to the credibility you will see in code.
  • 📜 Certification: Pure premium, loss ratio, trend and development, and full/partial credibility (including the square-root rule and the full-credibility standard) are heavily tested across AINS, AU, and CPCU. The worked examples here map to the exam vocabulary.


10.1 Frequency and severity as distributions

Start with the single most useful mental move in all of insurance pricing: never look at a loss as one number. Frequency — how often losses happen — and severity — how bad each one is when it does — were introduced as the two dimensions of risk in Chapter 6. Here we put them to work as the building blocks of price. The reason to keep them separate is that they behave differently, they are driven by different hazards, and they respond to different controls. Frequency is usually a story about exposure and behavior — how many welders, how many truck-miles, how careless the housekeeping. Severity is usually a story about what is at stake when it goes wrong — the value of the building, the size of the verdict, whether a fire reaches the paint store next door. Two accounts with the same total losses can be completely different risks: one with many small claims (a frequency problem, often fixable with loss control) and one with a single catastrophic claim (a severity problem, often a matter of limits and structure). Lump them together and you cannot tell which account you are looking at.

The deeper point is that frequency and severity are not constants — they are distributions. A frequency distribution is the set of probabilities that a risk produces 0, 1, 2, 3, … losses in a period. A severity distribution is the set of probabilities that a given loss, once it occurs, costs various amounts. The expected loss for the period is, in the simplest case, the expected frequency multiplied by the expected severity. Here is the structure in a worked form.

EXPECTED LOSS = FREQUENCY × SEVERITY            [constructed teaching example]

  A small fabrication account, per year:
    expected frequency (claims/yr)   = 0.40        (a claim roughly every 2.5 years)
    expected severity (per claim)    = $90,000     (the average cost of a claim that occurs)
    ──────────────────────────────────────────
    expected annual loss             = 0.40 × $90,000 = $36,000

  Read it aloud: "We do not expect a loss most years; but averaged over many years and many
  similar shops, this exposure costs about $36,000 a year in losses." That $36,000 is the
  expected loss — the raw material of the price, before expenses and profit (Chapter 11).

That product — expected frequency times expected severity — is the spine of the chapter. But the word expected hides the thing that makes insurance hard, and an underwriter who forgets it gets burned. Frequency and severity each have a shape, and the shapes are very different from each other.

Frequency, for most lines, is low-count and lumpy. A single account might average 0.4 claims a year, but it cannot have 0.4 claims in any actual year — it has zero, or one, or (rarely) two. Across many similar accounts, claim counts tend to follow a distribution that actuaries usually model as Poisson (you will meet it by name in Chapter 32): most accounts have zero in a given year, some have one, a few have two, and the long tail of three-or-more is thin but real. The practical consequence for you is that zero losses in a year tells you almost nothing. A shop that "had a clean year" may simply have rolled a zero on a die that still comes up one every couple of years. Frequency stabilizes only over many exposures or many years — the law of large numbers from Chapter 1, applied to claim counts.
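
To make "zero losses in a year tells you almost nothing" concrete, here is a minimal sketch that assumes claim counts are exactly Poisson with the 0.40 expected frequency from the worked example above. The Poisson assumption and the independence of years are modeling simplifications, and the function name is illustrative only.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k claims in a period when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

EXPECTED_FREQUENCY = 0.40  # claims per year, from the worked example in this section

# Probability of 0, 1, 2, or 3 claims in a single year
probs = {k: poisson_pmf(k, EXPECTED_FREQUENCY) for k in range(4)}
for k, p in probs.items():
    print(f"P({k} claims in a year) = {p:.1%}")

# Assumption: years are independent, so the chance of five clean years
# in a row is the one-year zero-claim probability raised to the fifth power.
p_five_clean_years = probs[0] ** 5
print(f"P(five clean years in a row) = {p_five_clean_years:.1%}")
```

Under these assumptions roughly two years in three are claim-free, and even five straight clean years occur about 13.5% of the time for a risk whose true frequency has not changed at all. A clean year is the expected outcome, not evidence of a better risk.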

Severity is the more treacherous of the two, because it is heavily skewed — what statisticians call a long right tail. Most claims are modest; a small fraction are enormous; and the average is dragged upward by the rare huge one. A fire that smolders and is caught by the sprinklers costs a few thousand dollars; the one that reaches the finished-goods inventory and the roof costs over a million. If you describe severity by its mean alone, you will systematically misunderstand the risk, because the mean of a skewed distribution is not the typical claim — it sits well above the median, pulled up by the tail.

SEVERITY IS SKEWED — illustrative claim sizes for a property line   [constructed teaching example]

   number of claims
   (taller = more common)
     │
   █ │ █
   █ │ █ █
   █ │ █ █ █
   █ │ █ █ █ █ ▁                          ▁                          ▁
   └─┴─┴─┴─┴─┴──┴──────────────────────────┴──────────────────────────┴──►  claim size
     $2K       $25K        $90K          $250K                      $1.2M
                            ▲ mean is pulled out here by the tail →
   median claim ≈ $25K   ·   MEAN claim ≈ $90K   ·   the $1.2M tail event sets the whole average

  The "typical" claim is ~$25K, but the AVERAGE claim is ~$90K because a few large losses dominate.
  Price off the mean and you must also respect the tail; price off the median and you will be ruined
  by the losses you pretended were rare.
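
The gap between the typical claim and the average claim is easy to check by hand. The sketch below uses a hypothetical list of twenty claim amounts, constructed only so that the median and mean land on the illustrative \$25K and \$90K in the chart; it is not real loss data.

```python
import statistics

# Hypothetical claim amounts (dollars), built for illustration only:
# eighteen modest claims, one large loss, and one $1.2M tail event.
claims = [
    2_000, 3_000, 4_000, 5_000, 6_000, 8_000, 10_000, 12_000, 18_000, 24_000,
    26_000, 28_000, 30_000, 32_000, 33_000, 35_000, 36_000, 38_000,
    250_000, 1_200_000,
]

mean_claim = statistics.mean(claims)
median_claim = statistics.median(claims)

largest = max(claims)
share_of_total_from_largest = largest / sum(claims)
mean_without_largest = statistics.mean([c for c in claims if c != largest])

print(f"median claim          : ${median_claim:,.0f}")
print(f"mean claim            : ${mean_claim:,.0f}")
print(f"largest claim's share : {share_of_total_from_largest:.0%} of all loss dollars")
print(f"mean without largest  : ${mean_without_largest:,.0f}")
```

In this constructed sample one claim out of twenty carries two-thirds of the loss dollars, and removing it drops the mean from \$90,000 to about \$31,600. That sensitivity to a single observation is what a long right tail means in practice.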

📋 At the Desk When you read a loss run, separate the two dimensions on purpose. Count the claims and look at the pattern of counts (is it one bad year or a steady drip?); that is your frequency read. Then line the claim amounts up from smallest to largest and look at the shape: a cluster of small ones with one giant outlier is a severity story, and the giant outlier deserves its own paragraph in your file (what caused it, can it recur, was it capped by a limit?). The single most common rookie error is to compute total-losses-divided-by-years and stop. That one number blends a frequency signal and a severity signal that you needed to keep apart, and it throws away the information that would have told you which account you are pricing.

There is one more reason the distinction earns its keep: frequency and severity demand different fixes, and that is half of what you sell the insured. A frequency problem — many small claims — is usually a loss-control and housekeeping problem (Chapter 9), and it is also a candidate for a higher deductible, which sweeps the small attritional claims off your books and back onto the insured's, where they belong (Chapter 12). A severity problem — rare but huge — is a limits and reinsurance problem; no deductible fixes it, because the insured cannot retain a million-dollar loss, and your defense is the policy limit, the right terms, and the reinsurance behind you (Chapter 27). Diagnosing which problem an account has is not academic; it determines the entire structure of the deal.
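
A rough sketch of why a deductible helps a frequency problem and barely touches a severity problem. Both claim lists and the \$5,000 per-claim deductible are hypothetical, chosen only to illustrate the mechanics; real deductible pricing is taken up in Chapter 12.

```python
def insurer_paid(claims, deductible):
    """Insurer's share: the part of each claim above the per-claim deductible."""
    return sum(max(0, claim - deductible) for claim in claims)

# Hypothetical accounts, for illustration only
frequency_account = [1_500, 2_000, 2_500, 3_000, 3_000, 3_500, 4_000, 4_500, 6_000, 8_000]
severity_account = [1_200_000]

DEDUCTIBLE = 5_000  # assumed per-claim deductible

for name, claims in [("frequency", frequency_account), ("severity", severity_account)]:
    ground_up = sum(claims)
    paid = insurer_paid(claims, DEDUCTIBLE)
    reduction = 1 - paid / ground_up
    print(f"{name:>9}: ground-up ${ground_up:>9,}  insurer pays ${paid:>9,}  "
          f"(reduced {reduction:.1%})")
```

With these numbers the deductible removes about 89% of the frequency account's losses from the insurer's books and less than half a percent of the severity account's.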

🔍 Check Your Understanding

  1. Two restaurants each cost you \$36,000 a year in expected losses. One has expected frequency 1.2 claims at \$30,000 each; the other has expected frequency 0.12 claims at \$300,000 each. Which is the "severity" risk, which is the "frequency" risk, and which would a higher deductible help more?
  2. A welding shop had zero claims last year. Your colleague says "clean year, give them a credit." Using the idea that frequency is a low-count distribution, what is wrong with reading one zero-claim year as evidence the risk is good?


10.2 The loss ratio and what it really measures

The single most important number in day-to-day underwriting is the loss ratio, owned and formally defined back in Chapter 3 as a component of the combined ratio. We use it constantly here, so hold the definition in front of you: the loss ratio is incurred losses divided by earned premium, expressed as a percentage. If an account or a book brought in \$1,000,000 of earned premium and generated \$650,000 of incurred losses, the loss ratio is 65% — for every premium dollar earned, sixty-five cents went out the door in losses (and loss-adjustment expense, if you include it, which most working loss ratios do). Add the expense ratio and you have the combined ratio; above 100%, the underwriting lost money. Everything in this chapter ultimately feeds that judgment.

The loss ratio looks like trivial arithmetic, and that is exactly why it is dangerous. Both the numerator and the denominator have a correct form and several misleading forms, and using the wrong one is one of the most common ways an underwriter — or a whole book — fools itself. Take the denominator first.

Earned premium, not written premium. Written premium is the full premium on a policy the moment you bind it; earned premium is the portion that corresponds to coverage time that has actually elapsed. A one-year policy written on January 1 for \$12,000 is \$12,000 written on day one but only \$6,000 earned by June 30 — the other \$6,000 is "unearned," still owed back to the insured if they cancel. Why does this matter for a loss ratio? Because if your book is growing, you are writing lots of new premium that has not yet had time to earn or to produce its losses, and a loss ratio computed on written premium will look artificially wonderful — a flood of fresh premium in the denominator, the losses not yet arrived in the numerator. Growth can disguise a bad book for a year or two. The loss ratio that tells the truth uses earned premium.
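
The earning of premium can be sketched in a few lines. This assumes simple daily pro rata earning (premium earns evenly over the policy term), which is a common convention but not the only one; the function name and the 2023 dates are illustrative.

```python
from datetime import date

def earned_premium(written: float, effective: date, expiry: date, as_of: date) -> float:
    """Daily pro rata earning: premium earns evenly across the policy term."""
    term_days = (expiry - effective).days
    # Clamp the valuation date to the policy term so nothing earns before
    # inception or after expiry.
    elapsed_days = (min(max(as_of, effective), expiry) - effective).days
    return written * elapsed_days / term_days

written = 12_000.0
effective = date(2023, 1, 1)
expiry = date(2024, 1, 1)
as_of = date(2023, 7, 1)  # i.e., coverage through the end of June 30

earned = earned_premium(written, effective, expiry, as_of)
unearned = written - earned
print(f"earned  : ${earned:,.2f}")
print(f"unearned: ${unearned:,.2f}")
```

On a daily basis the earned figure comes out slightly under the round \$6,000 in the text, because January through June is 181 of the year's 365 days. The point is unchanged: only the elapsed share of the written premium belongs in the loss-ratio denominator.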

Incurred losses, not paid losses. Paid losses are the dollars actually disbursed so far. Incurred losses are paid losses plus case reserves on open claims plus an estimate for losses incurred but not yet reported (the actuary's IBNR). A loss ratio on paid losses alone understates the truth for any line where claims take time to settle, because the big payments are still in the future, sitting in reserves you have not counted. For a fast line (property) paid and incurred converge quickly; for a slow, "long-tail" line (liability, workers' comp) they can differ enormously for years. An underwriter who judges a liability book by its paid loss ratio in year one is reading a number that is almost guaranteed to deteriorate.

THE LOSS RATIO — same account, four different "answers"        [constructed teaching example]

  Account: $1,000,000 written premium, mid-year growth, a long-tail liability component.

                              numerator            denominator        loss ratio   honest?
  paid / written              $300,000 paid        $1,000,000 written     30%       NO — flatters twice
  paid / earned               $300,000 paid        $   750,000 earned     40%       better, still immature
  incurred / written          $620,000 incurred    $1,000,000 written     62%       denominator too big
  incurred / earned           $620,000 incurred    $   750,000 earned     83%       ← the one that tells the truth

  The same account is a 30% darling or an 83% problem depending only on which losses and which premium
  you put in the ratio. Always ask: incurred-over-earned, or you are not looking at the real number.
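
The four "answers" in the table can be reproduced directly, which is a useful habit whenever someone quotes you a loss ratio without saying which version it is. The sketch below uses only the figures from the constructed example; the variable names are illustrative.

```python
# Figures from the constructed example above
paid_losses = 300_000
incurred_losses = 620_000   # paid + case reserves + IBNR
written_premium = 1_000_000
earned_premium = 750_000

ratios = {
    "paid / written": paid_losses / written_premium,
    "paid / earned": paid_losses / earned_premium,
    "incurred / written": incurred_losses / written_premium,
    "incurred / earned": incurred_losses / earned_premium,
}

for label, ratio in ratios.items():
    print(f"{label:<20} {ratio:6.1%}")
```

The last line prints 82.7%, which the table rounds to 83%.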

⚠️ Underwriting Trap The most expensive loss-ratio trap is the immature long-tail book that looks brilliant. A new book of liability or workers' comp business reports a gorgeous paid loss ratio in its first year — the premium is in, the claims have barely begun, the big ones are years from settling. Management celebrates, the appetite opens, the book doubles. Then the losses develop: reserves strengthen, late claims report, and the true loss ratio climbs toward a number that was always going to be there. The discipline is to judge a young book on incurred-and-developed losses against earned premium, and to distrust any loss ratio that looks too good on a line you know is slow to settle. The losses you cannot see yet are still your losses.

There is also the question of whose loss ratio you are looking at, and over what window. A single account's loss ratio over one year is almost meaningless — one claim swings it wildly (that is the credibility problem, §10.5, arriving in force). A loss ratio is far more stable and more honest over a book of similar accounts, or over a multi-year window for one account. When you pull Harbor Steel's experience, you will not compute a one-year loss ratio and act on it; you will look at the multi-year pattern, weight it for credibility, and compare it to the class loss ratio for metal fabrication. The loss ratio is the destination; frequency, severity, trend, development, and credibility are how you get to one you can trust.

📋 At the Desk A working definition you can keep on a card: loss ratio = incurred losses ÷ earned premium, and the "permissible" or "target" loss ratio is 1 minus the expense-and-profit load. If your insurer must spend about 30% of premium on expenses and wants about 5% for profit and contingencies, then 100% − 35% = a 65% permissible loss ratio — the loss ratio at which the account exactly hits plan. That single number is the bridge between this chapter and pricing: when Chapter 11 builds a rate, it is really solving for the premium at which the expected loss ratio equals the permissible one. Anything you write that you expect to run above the permissible loss ratio is, by definition, planned underwriting loss — sometimes justified for a relationship or a growth play, but never by accident.
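
The card arithmetic can be sketched as follows. The 30% and 5% loads come from the example above; applying the result to the \$36,000 expected loss from §10.1 is an illustration added here, not a Harbor Steel figure, and the function names are hypothetical.

```python
def permissible_loss_ratio(expense_load: float, profit_load: float) -> float:
    """Share of each premium dollar available to pay losses at plan."""
    return 1.0 - (expense_load + profit_load)

def premium_at_plan(expected_loss: float, plr: float) -> float:
    """Premium at which the expected loss ratio equals the permissible one."""
    return expected_loss / plr

plr = permissible_loss_ratio(expense_load=0.30, profit_load=0.05)
premium = premium_at_plan(expected_loss=36_000, plr=plr)

print(f"permissible loss ratio: {plr:.0%}")
print(f"premium at plan       : ${premium:,.0f}")
print(f"check, loss ratio     : {36_000 / premium:.0%}")
```

An expected loss of \$36,000 needs about \$55,400 of premium to run at a 65% loss ratio. Charge less and you are planning for an underwriting loss on that account.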


10.3 Pure premium: the expected-loss core of any rate

We met the pure premium in Chapter 1 — the expected loss per exposure, the \$300 in the 100,000-home town. Now we build it deliberately, because it is the foundation stone of every rate you will ever quote. The pure premium (also called the loss cost, the term the rating bureaus like ISO and NCCI use when they publish it) is the expected loss per unit of exposure: the dollars of loss, on average, that one exposure unit is expected to generate in a period. It is, in its simplest form, exactly the frequency × severity product from §10.1 expressed per exposure unit. Two equivalent ways to compute it:

$$\text{Pure premium} = \text{Frequency} \times \text{Severity} = \frac{\text{Number of claims}}{\text{Number of exposures}} \times \frac{\text{Total losses}}{\text{Number of claims}} = \frac{\text{Total losses}}{\text{Number of exposures}}$$

The two middle terms cancel into the rightmost one — total losses divided by exposures — which is why the pure premium is sometimes called the "loss cost per exposure." Both routes give the same answer; keeping the frequency × severity form in view, though, preserves the diagnostic information we fought to keep in §10.1.

BUILDING A PURE PREMIUM (loss cost)        [constructed teaching example]

  A class of metal-fabrication shops, last year, exposure base = $1,000 of payroll for WC:

    total payroll in the class         = $200,000,000   (i.e., 200,000 units of $1,000 payroll)
    total incurred losses in the class = $  3,000,000
    ─────────────────────────────────────────────────
    pure premium (loss cost)           = $3,000,000 / 200,000 = $15.00 per $1,000 of payroll

  Interpretation: for every $1,000 of payroll in this class, expect $15 of loss per year.
  This is the loss cost a bureau (e.g., NCCI) would publish — the expected-loss core, BEFORE
  the insurer adds expenses and profit to turn it into a charged rate (Chapter 11).
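
Both routes to the pure premium can be checked side by side. The payroll and loss totals are from the example above; the claim count of 600 is an assumption added here so that frequency and severity can be shown separately.

```python
import math

total_payroll = 200_000_000
total_incurred_losses = 3_000_000
claim_count = 600  # ASSUMED for illustration; not given in the example

exposure_units = total_payroll / 1_000  # exposure base: $1,000 of payroll

# Route 1: frequency x severity
frequency = claim_count / exposure_units          # claims per exposure unit
severity = total_incurred_losses / claim_count    # dollars per claim
pure_premium_fs = frequency * severity

# Route 2: total losses / exposures
pure_premium_direct = total_incurred_losses / exposure_units

print(f"frequency   : {frequency:.4f} claims per $1,000 of payroll")
print(f"severity    : ${severity:,.0f} per claim")
print(f"pure premium: ${pure_premium_fs:.2f} (freq x sev), "
      f"${pure_premium_direct:.2f} (losses / exposures)")
assert math.isclose(pure_premium_fs, pure_premium_direct)
```

The claim count cancels, so any assumed value gives the same \$15.00. What changes is the diagnosis: 600 claims at \$5,000 each is a very different class from 6 claims at \$500,000 each.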

The pure premium is the honest part of the price — the part that reflects nothing but the expected cost of risk. Everything else added to it (the expense load to run the company, the profit-and-contingency load to reward the capital and absorb surprise) is real and necessary, but it is loading on top of this core. We build that full premium in Chapter 11; here, the discipline is to get the core right, because every error in the pure premium is multiplied by the loads and carried straight to the combined ratio.

The exposure base — the unit you divide by — is not a throwaway choice; it is a real underwriting decision, and the bureaus have spent a century choosing well. A good exposure base varies with the expected loss and is hard to manipulate. Property uses units of building value (loss scales with what is at risk); workers' comp uses payroll (loss scales with how many people are exposed and how much they earn); general liability often uses revenue or payroll or area; commercial auto uses the vehicle. Choose a base that does not track the loss and your rate fights you all year: too high for the low-exposure risks, too low for the high-exposure ones, an open door to adverse selection (Chapter 1) because the under-charged risks flock to you and the over-charged ones leave.

🤖 Model vs. Judgment A pure-premium model — a GLM (Chapter 32) fitting frequency and severity off thousands of accounts — will almost always beat your gut on the class pure premium. It has seen more shops than you ever will, and it separates the signal from the noise across the whole population. What it cannot do is see this shop's brand-new infrared-scanned electrical panel, or the fact that the one big loss in the data was a freak that management has since engineered out. The model gives you the class loss cost with authority; your job is to decide whether this risk is the class — and credibility theory (§10.5) is the formal language for exactly how much of that decision the data has earned the right to make.

🔍 Check Your Understanding

  1. A class produced \$8,000,000 of losses across 400,000 exposure units. What is the pure premium per exposure unit? If the insurer's permissible loss ratio is 65%, roughly what charged rate per unit does that imply (before credits and debits)? (Hint: charged rate ≈ pure premium ÷ permissible loss ratio.)
  2. Why is payroll a better workers'-comp exposure base than number of employees?


10.4 Trend and development: why last year's losses aren't this year's

You now have the loss ratio and the pure premium. But there is a quiet problem in both: the losses you are working from are old and immature, and if you price next year off them raw, you will be pricing the past. Two corrections fix this, and together they go by the name trend and development. They are easy to confuse and they do different jobs, so keep them crisp.

Trend adjusts for change over time in the cost and frequency of losses between the period the data comes from and the period you are pricing. The world does not stand still. Construction costs rise, so rebuilding after the same fire costs more this year than it did three years ago. That is severity trend (closely related to inflation, but not identical; "social inflation" and rising jury verdicts push liability severity faster than the consumer price index). Driving patterns, building codes, and safety technology change how often losses happen. That is frequency trend, which can run up or down. A loss from three years ago, restated to what it would cost in next year's dollars and next year's conditions, is "trended." Skip the trend and you under-price in any inflationary environment, which is most of them.

Development adjusts for the fact that losses are not fully known when you first see them. A claim reported last year is still open; its reserve is an estimate that will move; some claims from last year have not even been reported yet (IBNR). Historical losses are therefore immature — they will "develop" toward their ultimate value as claims settle and late ones report. Actuaries capture this with loss development factors (LDFs): a multiplier, estimated from how losses have historically grown from one valuation to the next, that scales an immature loss up to its estimated ultimate. A property loss develops a little (it settles fast); a liability or workers'-comp loss develops a lot, sometimes for a decade.

TREND AND DEVELOPMENT — restating an old, immature loss          [constructed teaching example]

  A liability claim from accident-year 2022, valued today at $400,000 (still open):

    (1) DEVELOP to ultimate:   $400,000 × 1.25  (illustrative LDF)   = $500,000
        "claims like this have historically grown ~25% from this maturity to closed"

    (2) TREND to the future:   $500,000 × (1.05)^3  (≈3 yrs @ 5%/yr) = $578,800
        "restate to the cost level of the period we are about to price"

    ───────────────────────────────────────────────────────────────
    trended, developed (ultimate, future-cost) loss               ≈ $578,800

  The raw $400,000 on the loss run is NOT the number to price from. The restated loss is ~45% higher
  than the raw figure (equivalently, the raw figure understates the true cost by ~31%). Order matters
  by convention but the lesson is one: the number on the page is not yet the number you charge.
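
The two-step restatement is simple enough to sketch. The 1.25 development factor and 5% annual trend are the illustrative values from the example, not actuarial selections, and the function name is hypothetical.

```python
def develop_and_trend(reported: float, ldf: float, annual_trend: float, years: float) -> float:
    """Develop an immature loss to ultimate, then trend it to the pricing period."""
    ultimate = reported * ldf                      # step 1: development
    return ultimate * (1 + annual_trend) ** years  # step 2: trend

reported = 400_000
restated = develop_and_trend(reported, ldf=1.25, annual_trend=0.05, years=3)

print(f"raw loss on the loss run : ${reported:,.0f}")
print(f"developed and trended    : ${restated:,.0f}")
print(f"restated vs. raw         : {restated / reported - 1:+.1%}")
print(f"raw as share of restated : {reported / restated:.1%}")
```

The restated loss is \$578,812.50, about 45% above the raw figure; put the other way, the raw figure captures only about 69% of the cost you should be pricing for.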

⚠️ Underwriting Trap Pricing off raw loss-run figures is the trap that sinks soft-market books (Chapter 11 will name the cycle). In a competitive market the temptation is to take the losses at face value — they look small, they are immature and not yet developed, and the broker is pushing for a number. You quote off the undeveloped, untrended losses, win the account, and three years later the reserves develop and the costs trend right through your premium. The discipline — and it is a discipline, because it always makes your number higher than the competitor who skipped it — is to trend and develop before you price. The underwriter who prices the past loses money in the future, every time.

For you as an underwriter, the point is not to compute development factors — the actuaries supply them, and building a development triangle is their craft, not yours. The point is to know they exist and demand them. When someone hands you a loss ratio or a pure premium, your first two questions are: "Is this trended to the policy period I'm pricing?" and "Are these losses developed to ultimate, or are they raw?" If the answer is no or "I'm not sure," the number is not yet usable, however precise it looks. This is also why a recent year of experience is paradoxically the least reliable: it is the most relevant to current conditions but the most immature and undeveloped. The art is to weight recent and older years sensibly — and that weighting question leads straight into the largest idea in the chapter.

📋 At the Desk A quick field heuristic for how much development to worry about: ask how long the line takes to pay. Property and physical-damage claims close in months — development is modest, and a recent year is close to usable. Liability, professional lines, and workers' comp pay for years — development is large, recent years are deeply immature, and you lean harder on older, more-developed years and on the class. When you build Harbor Steel's price in Chapter 11, the property losses (the two fires) need only modest development but real trend (rebuild costs have risen); the workers'-comp and liability pieces need both. Matching your skepticism to the line's payout speed is a sign you actually understand the numbers, not just the formulas.


10.5 Credibility: how much to trust this risk's own experience

Now the central question, the one the whole chapter has been circling. You have Harbor Steel's own loss experience — two fires in five years — and you have the class experience for metal fabrication, the loss cost the bureau publishes off thousands of similar shops. The two disagree: Harbor Steel's own raw experience, dominated by that \$1.2 million fire, looks worse than the class average. So which do you believe? Do you price Harbor Steel on its own bad experience, or on the class average, or somewhere in between — and if in between, exactly where? That question has a name and a mathematics, and it is the most useful piece of actuarial thinking an underwriter can own. It is called credibility.

Credibility is the weight you assign to a body of loss experience when you use it to predict the future — a number $Z$ between 0 and 1 that says how much you trust this experience versus a broader benchmark. $Z = 1$ is full credibility: the experience is voluminous enough to stand entirely on its own, and you price off it directly. $Z = 0$ is no credibility: the experience is too thin to trust at all, and you price off the class. Everything real lives in between, at partial credibility: $0 < Z < 1$, where you blend the risk's own experience with the class. The whole game is choosing $Z$ honestly.
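
Written as a formula, the blend at partial credibility is a weighted average. This is the standard credibility-weighting form; the labels in parentheses are descriptive placeholders, and the quantity being blended can be a loss ratio, a pure premium, or a frequency, as long as both terms measure the same thing:

$$\text{Credibility-weighted estimate} = Z \times (\text{risk's own experience}) + (1 - Z) \times (\text{class experience})$$

At $Z = 1$ the second term vanishes and the risk stands on its own; at $Z = 0$ the first term vanishes and the class does all the work.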

Why is a small sample untrustworthy? Return to §10.1. Frequency is a low-count, lumpy distribution; severity is heavily skewed. With only a few claims, a single large loss — or a single lucky clean stretch — swings the measured experience enormously, and most of that swing is noise, not signal about the underlying risk. Two fires in five years could come from a genuinely fire-prone operation, or from an ordinary shop that caught two bad breaks; the data alone cannot tell you which, because the sample is too small for the law of large numbers to have done its smoothing. Credibility is the formal, defensible way of saying "this sample is too small to fully believe — lean partway back toward what the class tells us."

THE CREDIBILITY DIAL                              [constructed teaching example]

  Z = 0 ──────────────────────────────────────────────────────────► Z = 1
  "ignore this risk's own         this risk's experience        "trust this risk's
   experience; use the class"     partly believed; blend         own experience fully"

  one household's auto        a mid-size commercial         a huge national fleet
  (≈ no credibility)          account (PARTIAL — the         or a whole state's class
   → price off the class       interesting case)             (FULL — stands alone)

  More exposure / more claims / less volatile line  →  Z moves toward 1.
  Fewer exposures / a skewed, large-loss line        →  Z stays near 0.

How is $Z$ actually set? Two traditions, and you should know both by name. The older, simpler one is classical (limited-fluctuation) credibility: pick a standard for full credibility, meaning a number of claims large enough that the experience is statistically stable, and then assign partial credibility by the square-root rule. A commonly cited benchmark is on the order of a thousand claims: the classic 1,082-claim standard corresponds to a 90% probability that observed claim frequency falls within 5% of its expected value, and standards for pure premium run higher because severity adds its own variability. The exact figure depends on the line and the tolerance chosen. The square-root rule is:

$$Z = \sqrt{\frac{n}{N}}$$

where $n$ is the number of claims (or exposures) you actually have and $N$ is the number needed for full credibility. The square root is the key: credibility does not grow linearly with data. To go from a little data to a meaningful $Z$ takes a surprising amount of experience, and the curve is steep at first and flat later. An account with $n = 10$ claims against a full-credibility standard of $N = 1{,}000$ gets $Z = \sqrt{10/1000} = \sqrt{0.01} = 0.10$ — just 10% credibility, ninety percent class. That is the humbling arithmetic the naive average ignores entirely.

THE SQUARE-ROOT RULE — credibility grows slowly        [constructed teaching example]
  (full-credibility standard N = 1,000 claims)

  claims you have (n)      Z = sqrt(n/N)        weight on the risk's OWN experience
       1                      0.03                  3%      (essentially the class)
      10                      0.10                 10%
      40                      0.20                 20%
     100                      0.32                 32%
     250                      0.50                 50%      (half risk, half class)
     500                      0.71                 71%
   1,000                      1.00                100%      (full credibility)

  Quadrupling your data only DOUBLES your credibility (sqrt of 4 is 2). This is why a single account
  is almost never fully credible, and why one or two claims tell you almost nothing on their own.
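
The table above can be reproduced in a few lines of Python. This is a minimal sketch, not a filed rating procedure: the function names `classical_z` and `claims_needed` are hypothetical, the default standard of 1,000 claims is the teaching value used in this section, and the cap at 1 reflects the convention that credibility never exceeds full.

```python
import math

def classical_z(n_claims: float, full_standard: float = 1000.0) -> float:
    """Square-root rule: Z = sqrt(n / N), capped at 1 (full credibility).

    full_standard defaults to the teaching value N = 1,000 from this
    section; it is not a figure taken from any filed plan.
    """
    if n_claims < 0 or full_standard <= 0:
        raise ValueError("n_claims must be >= 0 and full_standard > 0")
    return min(1.0, math.sqrt(n_claims / full_standard))

def claims_needed(target_z: float, full_standard: float = 1000.0) -> float:
    """Invert the rule: n = Z^2 * N claims are needed to reach a target Z."""
    if not 0.0 <= target_z <= 1.0:
        raise ValueError("target_z must be between 0 and 1")
    return target_z ** 2 * full_standard

# Reproduce the rows of the teaching table.
for n in (1, 10, 40, 100, 250, 500, 1000):
    print(f"{n:>5} claims -> Z = {classical_z(n):.2f}")
```

Inverting the rule makes the slow growth concrete: under this standard, `claims_needed(0.5)` is 250 claims, while two claims give a weight of only about 4.5%.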

📋 At the Desk The square-root rule is worth memorizing because it inoculates you against the most natural error in underwriting: over-reacting to a handful of losses. When an account has two big claims and you feel the urge to price as if it is a terrible risk, run the credibility number first. Two claims against any reasonable full-credibility standard is a $Z$ in the single-digit percents (about 4.5% against a 1,000-claim standard), which means the account's own experience deserves a small weight and the class deserves most of it. This does not mean ignore the two claims; they may carry qualitative signal the math misses (see §10.6 and the Underwriting File). It means: do not let a low-credibility sample stampede you into a price the data has not earned. Discipline here is the difference between an underwriter and a flincher.

The second tradition is Bühlmann (greatest-accuracy) credibility, the modern, statistically grounded version, which we glimpse in §10.7. It frames $Z$ as the answer to a precise question — how much of the variation you see is between risks (real, persistent signal) versus within a risk over time (noise) — and it connects credibility to Bayesian statistics and to the machine-learning idea of "shrinkage." For now, hold the intuition: credibility is high when the differences between risks are large and stable, and low when the year-to-year randomness within a risk swamps those differences. A line where good and bad risks are genuinely, persistently different (and where claims are frequent enough to reveal it) supports high credibility; a line dominated by rare, random large losses does not.

⚖️ Compliance Corner Credibility weighting is not just statistics — it is, in several lines, regulated and legally consequential. Workers'-comp experience rating (the X-mod, Chapter 22) is built on a credibility formula filed with and approved by the rating bureau and the state; an underwriter cannot simply invent the weight given to an employer's own losses. The credibility procedures embedded in filed rating plans are part of what makes pricing fair in the regulatory sense (Chapter 4): they ensure two similar employers are treated by the same rule, and that no employer is punished on the strength of a sample too small to mean anything. When you apply credibility inside a filed plan, you are applying the plan's credibility, not your own — and where you exercise genuine judgment is in schedule rating and risk selection (Chapter 11), not in overriding the filed credibility math.


10.6 Credibility weighting in practice: the blend of risk and class

Credibility theory earns its keep at the moment you actually combine the two estimates. The mechanics are a single, beautiful formula — the credibility-weighted estimate:

$$\text{Estimate} = Z \times (\text{the risk's own experience}) + (1 - Z) \times (\text{the class expectation})$$

In words: take the risk's own indicated number, multiply it by its credibility $Z$; take the class number, multiply it by the leftover weight $1 - Z$; add them. The result is your credibility-weighted estimate — the best blend of "what this risk's history says" and "what we know about its kind." When $Z$ is near 1, the risk's own experience dominates; when $Z$ is near 0, the class dominates; and the formula slides smoothly between them. This is the single most-used equation in experience rating, and you should be able to write it from memory.
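
It can help to rearrange the same formula so that it starts from the class and moves toward the risk. The line below is only an algebraic restatement of the equation above, with no new assumptions:

$$\text{Estimate} = (\text{the class expectation}) + Z \times \big[(\text{the risk's own experience}) - (\text{the class expectation})\big]$$

Read this way, $Z$ is the fraction of the gap between the risk's own experience and the class expectation that you are willing to close. At $Z = 0$ the estimate sits on the class; at $Z = 1$ it sits on the risk's own number.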

Work it through on loss ratios, the form you will meet most often. Suppose an account's own loss ratio over the experience period (properly trended and developed, per §10.4) comes to 95%, while the class loss ratio for its kind of business runs 62%. The account looks bad on its own numbers — but it is a mid-size account, and you assess its credibility at $Z = 0.30$ (it has enough claims to matter, but nowhere near full credibility). The blend:

CREDIBILITY-WEIGHTED LOSS RATIO                        [constructed teaching example]

  the account's OWN loss ratio (trended, developed)  = 95%     (looks like a problem)
  the CLASS loss ratio for this business              = 62%     (the benchmark)
  assessed credibility of the account                 = Z = 0.30

  weighted = Z × own + (1 − Z) × class
           = 0.30 × 95%   +   0.70 × 62%
           = 28.5%        +   43.4%
           = 71.9%   ≈ 72%

  Read it: the account's bad years pull the estimate UP from the 62% class — but only partway,
  because at 30% credibility its own (small-sample) experience earns only 30% of the vote. The
  defensible expected loss ratio is ~72%, NOT the 95% the raw history screamed and NOT the 62%
  class either. That blend is the number you take into pricing.
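
The blend in the box can be checked in code. The sketch below is illustrative only: `credibility_weighted` is a hypothetical helper name, and the inputs are the constructed teaching figures from the example, expressed as decimals.

```python
def credibility_weighted(own: float, class_value: float, z: float) -> float:
    """Return Z * own + (1 - Z) * class for a credibility Z in [0, 1]."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("credibility Z must be between 0 and 1")
    return z * own + (1.0 - z) * class_value

# Constructed teaching figures from the example above (not real account data).
own_lr = 0.95      # the account's own trended, developed loss ratio
class_lr = 0.62    # the class loss ratio
z = 0.30           # assessed credibility

blended = credibility_weighted(own_lr, class_lr, z)
print(f"credibility-weighted loss ratio: {blended:.1%}")  # 71.9%

# Sensitivity: how far the answer moves as the assessed Z changes.
for trial_z in (0.0, 0.1, 0.3, 0.5, 1.0):
    result = credibility_weighted(own_lr, class_lr, trial_z)
    print(f"Z = {trial_z:.1f} -> {result:.1%}")
```

The sensitivity loop is the practical point: because $Z$ is itself a judgment, it is worth seeing how much the answer depends on it before defending a single number.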

This single calculation is the antidote to the two opposite errors that destroy books of business. The first error is over-reacting to a risk's own bad experience — pricing Harbor Steel as a 95% loss ratio account because of two fires, charging a punitive premium, and losing it to a competitor who saw (correctly) that two fires is low credibility and the account is closer to the class. The second error is ignoring a risk's own experience entirely — pricing every account at the 62% class and missing the genuine signal in an account that has had eight claims a year for three years running, which at higher credibility really is worse than its class. Credibility weighting holds both errors off at once: it listens to the account's own experience exactly as much as the experience deserves to be listened to, and not a percentage point more.

🤖 Model vs. Judgment Here is where the math and the judgment meet most sharply, and where the chapter connects to the book's spine. The credibility formula gives you a number — 72% — and that number is more disciplined than any gut feel. But $Z$ itself, and the question of whether the class benchmark even fits this risk, are judgment calls the formula cannot make for you. Worse, credibility math treats the risk's experience as a random sample of a fixed underlying risk — and sometimes it is not. Sometimes the two fires are not noise around a stable mean; they are a signal that the underlying risk changed — a new process, lax housekeeping, a hot-work hazard that is still live. The formula, blind to causation, would shrink that signal toward the class and under-price the danger. The underwriter who read the loss run (§10.1, and Chapter 9's loss-control read) might see that the second fire was hot-work-related and is exactly the kind of recurring hazard credibility-shrinkage would wrongly dampen. When the qualitative story says "this is a changed risk," you override the blend — upward, with documented reasons. That override, defensible to your committee, is the judgment the algorithm cannot supply. (The full model-override payoff is Chapter 32.)

The David Okafor file makes the same point from the personal-lines side, and it is worth the detour because it shows credibility working where there is no individual experience at all. Recall David Okafor from Chapter 6: 45 years old, applying for \$1 million of term life, with mildly elevated cholesterol, a BMI of 28, a father who had a heart attack at 58 — but excellent blood pressure, a non-smoker, an active cyclist. What is David's "own loss experience"? He has none — he has not died, and you only get to find out once. In life underwriting, the individual's credibility for the event being priced is essentially zero, so the class does almost all the work: mortality tables (Chapter 17) built from millions of lives, sliced by age, sex, smoking status, and build, supply the expected mortality, and the underwriter's whole job is to place David in the right class — preferred, standard, or somewhere between — rather than to weight his (nonexistent) personal claim history. This is credibility at the $Z \approx 0$ end of the dial, and it explains a deep truth: the less an individual risk's own experience can tell you, the more carefully you must classify it — because classification is how you borrow the credibility of the class. Personal lines lives almost entirely at this end (one household is nearly zero credibility, which is why your own clean driving record barely moves your premium); large commercial accounts climb toward the middle of the dial; and only the largest, most claim-rich risks approach full credibility on their own.

🔍 Check Your Understanding

  1. An account's own trended/developed loss ratio is 110%; its class runs 70%; you assess $Z = 0.25$. What is the credibility-weighted loss ratio, and is the account's own bad experience earning a large or small share of the answer?
  2. The credibility formula would shrink Harbor Steel's two-fire experience toward the class. Give one concrete reason an underwriter might deliberately price above the credibility-weighted number anyway, and explain why that is a judgment call the formula cannot make.


10.7 (Advanced) A glimpse of Bühlmann and the full-credibility standard

This last section is for the reader who wants to see one level deeper — the analytics-track and certification-track reader especially. Skip it and you still have everything you need to underwrite; read it and the credibility you applied in §10.6 will rest on firmer ground. We look at two things: where the full-credibility standard comes from, and what Bühlmann credibility adds.

The classical full-credibility standard, the $N$ in the square-root rule, is not arbitrary. It falls out of asking: how many claims do I need before the observed result is, with high probability, within a small tolerance of the true mean? That is a confidence-interval question. Pick a confidence level and a tolerance (say, you want to be 90% confident the observed claim frequency is within 5% of the true one) and the mathematics of the Poisson frequency distribution hands you a required claim count. Tighten the tolerance to 2.5% instead of 5% and the required count quadruples, because it scales with the inverse square of the tolerance. This is why a specific number like "about 1,082 claims" shows up in textbooks for one common standard: it is the claim count that delivers 90% confidence of being within 5% for a Poisson claim frequency. A standard for pure premium is larger still, because it must also absorb the variability of severity. The exact figure is less important than the shape of the idea: full credibility is the sample size at which random fluctuation has shrunk inside a tolerance you chose, and partial credibility (the square-root rule) is the interpolation down from there.
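
The confidence-interval reasoning can be made concrete. The sketch below assumes claim counts are Poisson and uses the normal approximation, which is the usual textbook route to the classical standard. The function name `full_credibility_claims` is hypothetical, and the optional `severity_cv` argument applies the standard textbook adjustment for pure premium (scaling the frequency standard by one plus the squared coefficient of variation of severity).

```python
from statistics import NormalDist

def full_credibility_claims(confidence: float = 0.90,
                            tolerance: float = 0.05,
                            severity_cv: float = 0.0) -> float:
    """Expected claim count for full credibility under limited fluctuation.

    Assumes Poisson claim counts and a normal approximation.
    severity_cv = 0 gives the frequency-only standard; a positive value
    scales it by (1 + CV^2) to give a pure-premium standard.
    """
    if not 0.0 < confidence < 1.0 or tolerance <= 0.0 or severity_cv < 0.0:
        raise ValueError("invalid confidence, tolerance, or severity_cv")
    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z_score = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    return (z_score / tolerance) ** 2 * (1.0 + severity_cv ** 2)

print(round(full_credibility_claims()))                 # 1082
print(round(full_credibility_claims(tolerance=0.025)))  # 4329
```

Halving the tolerance multiplies the required count by four, and any positive severity variation pushes the pure-premium standard above the frequency-only figure.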

Classical credibility has a known weakness, though: it treats the full-credibility standard as a fixed cliff and the square-root rule as a rough bridge, without asking how different the risks in the class actually are from one another. Bühlmann credibility (also called greatest-accuracy or least-squares credibility) fixes this by deriving $Z$ from the structure of the variance itself. Its central formula is elegant:

$$Z = \frac{n}{n + K}, \qquad K = \frac{\text{Expected Process Variance}}{\text{Variance of Hypothetical Means}}$$

Unpack the two variances, because they carry the whole intuition. The Expected Process Variance (EPV) is the within-risk variance — how much a single risk's results bounce around year to year purely by chance, even though its underlying risk never changed. The Variance of Hypothetical Means (VHM) is the between-risk variance — how much the true underlying risk levels genuinely differ from one account to the next. Their ratio $K$ is the engine:

  • When risks are genuinely different from each other (large VHM) and each risk's results are stable (small EPV), $K$ is small, so $Z = n/(n+K)$ is high even for modest $n$ — the differences are real and worth pricing on. Listen to the risk's own experience.
  • When risks are basically alike (small VHM) and each one's results are noisy year to year (large EPV), $K$ is large, so $Z$ stays low — what looks like a "bad risk" is mostly random fluctuation around a common mean. Trust the class.

BÜHLMANN INTUITION — is the difference SIGNAL or NOISE?     [constructed teaching example]

  Z = n / (n + K),   K = EPV / VHM

   between-risk differences large,        →  K small   →  Z high   →  price on the risk's
   year-to-year noise small  (real signal)                            own experience

   between-risk differences small,        →  K large   →  Z low    →  shrink hard toward
   year-to-year noise large   (mostly noise)                          the class average

  Credibility is the formal answer to: "Is the difference I'm seeing between this risk and the
  class a real, persistent difference — or is it the random bounce of a small, skewed sample?"
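
To see $K$ do its work on numbers, here is a minimal sketch of one common way to estimate EPV and VHM from a small panel of risks, each observed for the same number of years (the nonparametric empirical Bayes estimator found in credibility textbooks). The data are invented for illustration, the function name `buhlmann_z` is hypothetical, and nothing here is drawn from a filed plan.

```python
from statistics import mean, variance

def buhlmann_z(history: dict[str, list[float]]) -> tuple[float, float, float]:
    """Estimate (EPV, VHM, Z) from equal-length annual histories per risk.

    EPV: average of each risk's sample variance over time (within-risk noise).
    VHM: sample variance of the risk means, less EPV / n (between-risk signal).
    Z = n / (n + K) with K = EPV / VHM; Z is set to 0 if VHM is not positive.
    """
    lengths = {len(v) for v in history.values()}
    if len(history) < 2 or len(lengths) != 1 or min(lengths) < 2:
        raise ValueError("need >= 2 risks with the same number (>= 2) of years")
    n = lengths.pop()
    epv = mean([variance(v) for v in history.values()])
    vhm = variance([mean(v) for v in history.values()]) - epv / n
    if vhm <= 0:
        return epv, vhm, 0.0
    return epv, vhm, n / (n + epv / vhm)

# Invented annual claim counts per risk, for illustration only.
distinct_risks = {"A": [2, 4, 2, 4], "B": [6, 8, 6, 8], "C": [10, 12, 10, 12]}
similar_risks = {"A": [1, 13, 3, 11], "B": [12, 2, 14, 4], "C": [5, 9, 1, 13]}

for label, panel in (("distinct", distinct_risks), ("similar", similar_risks)):
    epv, vhm, z = buhlmann_z(panel)
    print(f"{label}: EPV = {epv:.2f}, VHM = {vhm:.2f}, Z = {z:.2f}")
```

In the first panel the risks differ persistently and each one is stable, so the estimate gives high credibility. In the second, the differences between risk averages are smaller than chance alone would produce, the VHM estimate turns negative, and the conventional response is to give own experience no credibility at all.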

This is the same idea modern data science calls shrinkage or partial pooling, and it is exactly what the Poisson-and-gamma GLMs of Chapter 32 are doing when they regularize an individual account's estimate toward the population. An underwriter does not need to compute EPV and VHM by hand — the actuaries embed them in the filed rating plans and the models. But understanding what $K$ measures tells you when to expect the math to give an account much credibility and when not to: claim-rich, genuinely heterogeneous lines (large commercial fleets, large workers'-comp accounts) support real own-experience credibility; thin, catastrophe-driven, homogeneous lines do not, and there the class — and your judgment about whether this risk truly belongs to it — must carry the price.

📋 At the Desk You will almost never derive a credibility factor at the desk. What you will do, constantly, is sanity-check one. When a rating plan or a model hands you a credibility weight, ask the Bühlmann question in plain English: "Are the risks in this class really different from each other, or do they mostly differ by luck?" If the line is full of rare, random large losses (catastrophe property, umbrella), be suspicious of any plan that gives one account high own-experience credibility — there usually is not enough signal to justify it. If the line has frequent claims and persistent differences between operators (commercial auto fleets, comp), high credibility on a large account is appropriate. Matching your expectation of credibility to the structure of the line is the mark of an underwriter who understands the math rather than merely obeying it.


🗂️ The Underwriting File

The math on Harbor Steel: is "two fires in five years" credible signal, or small-sample noise? You now have the tools to ask the question precisely, and the answer is more interesting than either a panic or a shrug. Recall the frozen loss history from the file: a roughly \$180,000 electrical fire in 2021 and a roughly \$1.2 million hot-work/welding fire in 2023 — two property fires in the five-year window — plus several workers'-comp claims and a couple of minor auto claims. The naive move is to add the fire losses (\~\$1.38M), divide by five years (\~\$276K/yr), and treat that as Harbor Steel's annual property loss cost. Do not do that. Run the discipline of this chapter instead.

Frequency and severity, kept apart (§10.1). Two fires is a frequency of about 0.4 property fires per year — low-count, lumpy, exactly the regime where one or two events tell you little. The severity picture is the skewed tail we warned about: one small fire (\$180K) and one large one (\$1.2M), and that single large loss dominates the raw average. So before any weighting, you already know this is mostly a severity story about one event, not a frequency story about a fire-prone shop — and severity stories are about controls, limits, and terms, not about punishing the frequency.

Credibility (§10.5–10.6). Two claims is, on any reasonable full-credibility standard, a credibility in the single-digit percents: the square-root rule puts the account's own fire experience in the low-$Z$ range, which means the class loss cost for metal fabrication deserves most of the weight. Credibility-weight Harbor Steel's own (large, scary) raw experience against the class, and the defensible expected number lands much closer to the class than to the \$276K naive average. The math says: do not over-react to two fires.

But — the override the math cannot make (the §10.6 Model-vs-Judgment point). Credibility shrinkage assumes the two fires are random noise around a fixed underlying risk. The loss-control read from Chapter 9 suggests they may not be: the 2023 fire was hot-work/welding-related, and hot-work is precisely the kind of live, recurring hazard that a blind shrink-to-the-class would wrongly dampen. So the file's honest position is two-sided and must be recorded as such: the frequency of fires is low-credibility and should not stampede the price (the math), but the hot-work severity signal is real and controllable and must be addressed through terms and subjectivities (the judgment). The disposition for this chapter: losses partially credible; the hot-work signal is the watch-item. What this layer does not settle: the actual rate (Chapter 11), the deductible and roof terms that handle the severity (Chapter 12), and the final accept/modify decision (Chapter 13). We have not priced Harbor Steel — we have established how much its own history is allowed to move the price, and flagged the one part of that history the credibility math would underweight.
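
The arithmetic behind the frequency, severity, and credibility steps of the file can be laid out in a few lines. The sketch below uses only the two fire figures recorded in the file and, as an assumption for illustration, the 1,000-claim teaching standard from §10.5. It does not supply a class loss cost, because the file does not give one, so it stops short of the blend itself.

```python
import math

# Fire losses from the Harbor Steel file (approximate amounts, in dollars).
fire_losses = {2021: 180_000, 2023: 1_200_000}
years_observed = 5

total = sum(fire_losses.values())
naive_annual = total / years_observed           # the average the chapter warns against
frequency = len(fire_losses) / years_observed   # fires per year
mean_severity = total / len(fire_losses)        # average size of a fire
largest_share = max(fire_losses.values()) / total

# ASSUMPTION: N = 1,000 claims is the chapter's teaching standard, not a filed one.
FULL_STANDARD = 1_000
z = min(1.0, math.sqrt(len(fire_losses) / FULL_STANDARD))

print(f"naive annual average : ${naive_annual:,.0f}")
print(f"frequency            : {frequency:.1f} fires per year")
print(f"mean severity        : ${mean_severity:,.0f}")
print(f"largest loss share   : {largest_share:.0%} of total dollars")
print(f"square-root-rule Z   : {z:.3f}")
```

One fire accounts for roughly 87% of the dollars, and under this standard the account's own fire experience earns a weight of about 4.5%. Those two numbers are the quantitative form of the severity and credibility points made in prose; the hot-work override remains a judgment the arithmetic cannot make.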


Conclusion

Underwriting is not actuarial science, but it runs on the same arithmetic, and this chapter gave you the working core of it. Expected loss is frequency times severity, and each is a distribution — frequency low-count and lumpy, severity skewed by a long right tail — so you read a loss run by keeping the two apart and respecting the tail. The loss ratio, computed honestly as incurred losses over earned premium, is the truth-telling number, and the permissible loss ratio is the bridge to pricing. The pure premium (loss cost) is the expected-loss core of any rate, built per exposure and loaded in Chapter 11. Trend and development restate old, immature losses to the period you are actually pricing, and skipping them is how soft markets manufacture future losses. And credibility — the weight $Z$ you give a risk's own experience versus its class — is the discipline that keeps a handful of claims from stampeding your price; the square-root rule and the Bühlmann variance ratio both say the same thing, that a small, skewed sample earns only a small share of the vote.

Above all, the chapter advanced two of the book's themes at once. Pricing follows risk, but only after you have trended, developed, and credibility-weighted the experience into a number you can defend, because a price built on raw or low-credibility data is a price that follows noise, not risk. And underwriting is judgment, because credibility math, for all its discipline, is blind to causation: it cannot tell a random run of bad luck from a genuinely changed risk, and the underwriter who reads the hot-work signal in Harbor Steel's loss run sees exactly what the formula would wrongly shrink away. The math tells you how much the data has earned the right to say. Judgment decides the rest.

In Chapter 11 we take everything you have built here — the pure premium, the loss ratio, the credibility-weighted experience — and turn it into an actual price: manual rates and relativities, experience rating and schedule rating, the loads that sit on top of the pure premium, and the rate adequacy that survives a soft market. The credibility-weighted expected loss is the raw material. Pricing is where it becomes a premium.


Key Terms

  • Pure premium — the expected loss per unit of exposure (frequency × severity, or total losses ÷ exposures); the expected-loss core of a rate, before expenses and profit.
  • Loss cost — the rating-bureau term for the pure premium: the expected loss per exposure unit, published for a class before any insurer loads.
  • Credibility — the weight $Z$ (from 0 to 1) assigned to a body of loss experience when using it to predict the future; how much you trust this experience versus a broader benchmark.
  • Full credibility — $Z = 1$; the experience is voluminous enough to be used on its own without blending toward the class.
  • Partial credibility — $0 < Z < 1$; the experience is blended with the class because it is too thin to stand alone.
  • Credibility weighting — combining a risk's own experience with the class expectation as $Z \times \text{own} + (1-Z) \times \text{class}$.
  • Frequency distribution — the probabilities of a risk producing 0, 1, 2, … losses in a period; typically low-count and lumpy (often modeled as Poisson).
  • Severity distribution — the probabilities of various loss amounts given that a loss occurs; typically skewed with a long right tail, so the mean exceeds the median.
  • Expected loss — the average loss over many similar exposures or periods; expected frequency × expected severity.
  • Trend and development — the twin adjustments that restate historical losses for change over time (trend) and for immaturity as claims settle and late ones report (development) so past losses estimate future cost.

Spaced Review

  1. Decompose expected loss into its two factors and explain why an underwriter keeps them separate rather than working from total losses divided by years. Give one control that helps a frequency problem and one that helps a severity problem. (§10.1; frequency × severity from §6.3)
  2. Why does a loss ratio computed on written premium and paid losses flatter a fast-growing, long-tail book — and which form of the ratio tells the truth? (§10.2; loss ratio owned by §3.5)
  3. An account has two large claims in three years. Using credibility, explain why this does not by itself justify a punitive price — and then name the one circumstance in which an underwriter should override the credibility math and price higher anyway. (§10.5–10.6; the loss-control read from §9.5)
  4. (Recall — adverse selection, §1.4.) If an insurer used an exposure base that did not track expected loss, how would adverse selection exploit the resulting rate? (§10.3; §1.4)
  5. (The recurring pricing-discipline question.) You credibility-weight an account to an expected loss ratio of 72% and your insurer's permissible loss ratio is 65%. If you quote at the credibility-weighted expectation with no other change, would the account help or hurt the combined ratio, and why? (§10.2; combined ratio from §3.5)