

Prerequisites

  • Chapter 1
  • Chapter 6
  • Chapter 7
  • Chapter 8
  • Chapter 9
  • Chapter 10
  • Chapter 11
  • Chapter 20

Learning Objectives

  • Explain how data and information technology are changing the underwriting workflow, and state precisely what changes and what stays the same.
  • Identify the major alternative data sources — satellite and aerial imagery, IoT and telematics, public records, and third-party aggregators — and judge what each can and cannot reliably tell an underwriter.
  • Describe how pre-fill auto-populates a submission, and evaluate the accuracy, adverse-selection, and verification risks it introduces.
  • Explain real-time risk scoring and the modern underwriting workstation, and read a score as one input to a decision rather than the decision itself.
  • Distinguish the risks that straight-through processing should bind from the ones it must refer, and articulate the referral logic that protects the book.
  • Diagnose a data-quality failure — the garbage-in, garbage-out problem — and state the controls that keep a data-driven decision honest.

Chapter 31: Data-Driven Underwriting: How Information Technology Is Transforming Risk Assessment

"Data are not taken for museum purposes; they are taken as a basis for doing something." — W. Edwards Deming, statistician and quality-management pioneer. On an underwriting desk the line is a standing rebuke to a temptation the rest of this chapter will keep naming: the temptation to collect a field because the system can fetch it, score it because the model will accept it, and act on it because it is there — rather than because it actually bears on the decision you are being paid to make.

Overview

The submission you are about to open is not blank. Twenty years ago, an application arrived as a fax or a PDF with thirty fields, half of them left empty by a hurried producer, and your first hour was spent chasing the answers. Today the file opens already filled in. Before a human touched it, software pulled the building's age and square footage from a county assessor's record, a roof-condition flag from an aerial image, a flood-zone code from a peril database, the firm's years in business and a public-records lien check from a data aggregator, and a prior-loss indicator from an industry clearinghouse — and a model read all of it and returned a score. Your screen is no longer a form to be completed; it is a risk picture to be read, questioned, and corroborated. That shift — from gathering information to evaluating information someone else gathered — is the change this chapter is about, and it is the largest change in the day-to-day work of underwriting in a generation.

Be clear about what has and has not changed. What has changed is the speed and the surface area: more data, arriving faster, pre-attached, scored before you see it. What has not changed is the thing this whole book is about — the judgment that decides whether to accept a risk and on what terms. A pre-filled roof age is still only as good as the image it came from; a peril score still cannot read a broker's cover note about a signed replacement contract; a clearinghouse loss flag still cannot tell you whether the manager who caused those losses is still there. Data has made the underwriter faster and, on simple risks, nearly invisible. On the risks that matter, it has raised the stakes of judgment, because now you are judging not just the risk but the data about the risk — and a confident, wrong number is more dangerous than an honest blank.

This chapter walks the new workflow end to end. We survey the data revolution and the alternative sources feeding it. We look at pre-fill — how the submission auto-populates, and what it gets wrong. We sit at the modern underwriting workstation and watch a risk get scored in real time. We revisit straight-through processing from Chapter 20, now from the data side, and draw the line between what the machine should bind and what it must hand to you. And we close on the discipline that decides whether any of it is worth anything: data quality, because a data-driven decision built on bad data is just a faster way to be wrong.

In this chapter, you will learn to:

  • Explain how data and IT are reshaping the underwriting workflow, and name what changes and what does not.
  • Identify the major alternative data sources and judge what each can and cannot tell you.
  • Describe pre-fill (data enrichment) and evaluate its accuracy and verification risks.
  • Read a real-time risk score at the underwriting workstation as one input, not the verdict.
  • Draw the line between the risks straight-through processing should bind and the ones it must refer.
  • Diagnose a data-quality failure and state the controls that keep a data-driven decision honest.

Learning Paths

🏠 Personal Lines: This is your world first — pre-fill and instant scoring are most mature in personal auto and homeowners (§31.3, §31.4). Watch how the same enrichment that speeds a clean home risk can import a stale or mismatched record and quietly misprice it.

🏢 Commercial Lines: Pre-fill and STP arrived later and reach less deeply here (§31.5, §31.6); the Harbor Steel file shows why a complex commercial risk is enriched by data but still decided by an underwriter. The satellite-roof corroboration is the beat to study.

📊 Analytics: §31.2 and §31.7 are the core — the data supply chain and the data-quality problem are the ground truth your models in Chapter 32 stand on. "Garbage in, garbage out" is not a slogan here; it is the single biggest threat to a pricing model's validity.

📜 Certification: §31.3–§31.5 map to the automation, data-sources, and workflow material in the AINS/AU bodies of knowledge; §31.7 connects to data-governance and FCRA-adjacent compliance themes.


31.1 The data revolution in underwriting

Start with the decision, because the data only matters in service of it. An underwriter is paid to answer the same question they have always answered (Chapter 7): should we write this risk, and at what price and on what terms? For most of insurance history, the binding constraint on that decision was too little information. You wrote the risk on a thin application, a loss run if you were lucky, and an inspection that arrived three weeks after you had already bound. Underwriting was, in large part, a discipline of deciding well under information scarcity. The data revolution has inverted that constraint. The modern underwriter's problem is rarely too little information; it is too much information of uncertain quality, arriving too fast to read it all by hand. The craft has shifted accordingly — from gathering to filtering, from interrogating the applicant to interrogating the data.

Three forces drove the shift, and it is worth naming them because they tell you where it goes next. First, data became abundant and cheap. County records went digital and queryable; satellites and aircraft began photographing every building in the country on a regular cadence; sensors got cheap enough to put in cars, buildings, and machinery; and an industry of data aggregators grew up to package all of it and sell it to carriers by the query. Second, computing got cheap enough to use it in real time. The same risk that once took an analyst a day to assemble can now be assembled, scored, and priced in the seconds between a producer clicking "quote" and the screen refreshing. Third — and this is the part Chapter 32 takes up — the models got good enough to turn that data into a price automatically, at least for the risks that are simple and high-volume enough to be modeled well.

The result is a workflow that looks nothing like the one in the textbooks of a generation ago, and yet serves exactly the same purpose. Picture the two side by side:

THE UNDERWRITING WORKFLOW — THEN AND NOW                          [constructed teaching example]

  THEN (information-scarce)                    NOW (information-abundant)
  ──────────────────────                       ──────────────────────────
  producer keys a thin application       →     producer enters a name + address
  underwriter requests loss runs         →     system PRE-FILLS from third-party data
  underwriter waits for an inspection    →     aerial image + peril scores attach instantly
  underwriter assembles the risk picture →     a MODEL scores the risk before a human sees it
  underwriter decides (days/weeks)       →     simple risk: machine BINDS (seconds)
                                               complex risk: REFERS to an underwriter
  the gating constraint: TOO LITTLE DATA       the gating constraint: TOO MUCH DATA, UNCERTAIN QUALITY

Read the diagram for what it does not change. In both columns there is still a selection decision, still a price that must be adequate, still terms to be set, still a combined ratio that will tell the truth two years later. The data revolution changed the inputs and the speed of underwriting; it did not repeal the job. This is the first theme of the book — underwriting is judgment — restated for the data age: when the information was scarce, judgment filled the gaps; now that the information is abundant, judgment decides which of it to believe. The skill migrated; it did not disappear.

📋 At the Desk A useful way to hold the change: data has automated the assembly of the risk picture, not the reading of it. The hour you used to spend chasing the building's square footage is gone — genuinely gone, and good riddance. But that hour did not vanish into leisure; it moved downstream, into the harder work of asking whether the square footage the system fetched is this building's or the one next door's, whether the aerial image is current, and whether the loss flag the clearinghouse returned is the whole story. You are not doing less work; you are doing higher-judgment work on a faster clock. The underwriters who struggle in this transition are the ones who treat the assembled picture as finished. The ones who thrive treat it as a first draft written by a machine — fast, useful, and to be checked.

There is a quieter consequence worth flagging now and returning to in §31.6. As routine assembly automates, the mix of what lands on a human underwriter's desk changes. The clean, simple risks — the ones the data describes well and the model prices confidently — increasingly never reach a person at all; they are bound straight through. What reaches you is the residue: the risks the data describes poorly, the ones the model flags as uncertain, the novel exposures with no history, the accounts where the data and the broker's story disagree. That is not a demotion. It is a concentration. The human underwriter of the data age spends a larger share of their time on exactly the hard, judgment-dependent risks where they add the most value — and a smaller share rubber-stamping easy ones. The job gets harder per file even as the easy files disappear, which is precisely why the judgment this book teaches becomes more valuable, not less.


31.2 Alternative data sources: satellite, IoT, public records, telematics

The fuel of the revolution is data the underwriter did not collect and the applicant did not volunteer. The industry calls these alternative data sources — third-party information, often gathered for some other purpose entirely, that an underwriter can pull in to assess a risk. The traditional sources from Chapter 8 (the application, the loss run, the inspection, the MVR, CLUE, the credit-based insurance score) have not gone away; the alternative sources sit on top of them, enriching the picture. Four families matter most, and the discipline is the same for each: know what it can tell you, and know what it cannot.

Satellite and aerial imagery. Aircraft and satellites now photograph buildings across the country on a regular cadence, and computer-vision models read those images for underwriting-relevant features: roof material and shape, roof condition and apparent age, the presence of a pool or a trampoline (liability flags for a homeowners risk), tree overhang, the footprint and apparent square footage of a commercial building, debris or disrepair in the yard, distance to the nearest exposure. For a property underwriter this is genuinely transformative: a question that used to require a physical inspection — what condition is the roof in? — can now be answered, at least provisionally, from an image taken last quarter. What imagery cannot do is see inside, see intent, or read time. It can flag a roof that looks old; it cannot tell you the wiring behind the wall, whether a replacement contract is already signed, or whether the disrepair it sees was fixed the week after the photo was taken. Imagery is a powerful exterior, point-in-time signal, and it must be read as exactly that.

Internet of Things (IoT) sensors and telematics. Cheap connected sensors now report directly from the insured risk. In personal and commercial auto, telematics — the usage-based insurance data introduced in Chapter 14 — streams mileage, hard-braking, speed, time-of-day, and cornering from a device or an app. In property, water-leak sensors, temperature and smoke monitors, and (in commercial) connected fire-suppression and equipment-condition sensors can report conditions and even prevent losses. IoT is the richest data of all because it observes actual behavior and condition over time rather than a snapshot or a proxy. Its limits are practical: coverage is uneven (not every risk is sensored), the data raises real privacy questions (Chapter 8's FCRA and state-privacy themes return here, sharpened), and a sensor reports only what it measures — a pristine telematics score tells you nothing about the one trip the driver took in a borrowed car.
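The point that a sensor "reports only what it measures" can be made concrete. The sketch below is a minimal illustration. It assumes a hypothetical trip record with miles, hard-brake count, and a night flag; no real telematics vendor's schema is implied. It rolls the trips up into behavior rates and also reports coverage, meaning the share of the vehicle's odometer miles that the device actually observed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    """One device-recorded trip (hypothetical schema, for illustration only)."""
    miles: float
    hard_brakes: int
    night: bool  # True if the trip fell in a late-night window


def summarize_telematics(trips, odometer_miles):
    """Roll observed trips into behavior rates, plus how much driving they cover."""
    observed = sum(t.miles for t in trips)
    if observed == 0:
        # No observed driving: there is no behavior to report at all.
        return {"observed_miles": 0.0, "hard_brakes_per_100mi": None,
                "night_share": None, "coverage": 0.0}
    brakes = sum(t.hard_brakes for t in trips)
    night_miles = sum(t.miles for t in trips if t.night)
    return {
        "observed_miles": observed,
        "hard_brakes_per_100mi": 100 * brakes / observed,
        "night_share": night_miles / observed,
        # Trips the device missed lower coverage. A trip in a borrowed car lowers
        # nothing, because neither the device nor this odometer ever saw it.
        "coverage": observed / odometer_miles if odometer_miles else None,
    }


trips = [Trip(120.0, 1, False), Trip(80.0, 0, True)]
summary = summarize_telematics(trips, odometer_miles=400.0)
print(summary)
```

In this example the rates look clean. A coverage of 0.5, however, says that half of this vehicle's miles went unobserved. No metric here can register the borrowed car.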

Public records. The least glamorous and most reliable family. County assessor and recorder data (building age, square footage, sale history, owner of record), business registrations and years in business, court records, liens and judgments, building permits, and code-violation histories are increasingly digital and queryable. For commercial risks especially, public records are the backbone of pre-fill: they answer "what is this building and who owns it" cheaply and fairly reliably. Their limit is staleness and granularity — an assessor's record may be years out of date, may describe a renovation that never happened or miss one that did, and may attach to a parcel rather than to the specific structure you are insuring.

Third-party aggregators and scores. Around all of the above has grown an industry that packages data — combining public records, imagery-derived features, prior-loss indicators from industry clearinghouses, firmographic and demographic data, and derived scores — and sells it to carriers as a single enriched feed. This is the connective tissue of pre-fill (§31.3). The aggregator's value is convenience and breadth; its risk is that you are now relying on a vendor's matching, currency, and accuracy, often without seeing the underlying source — and when two aggregators disagree about the same building, as they sometimes do, you are reminded that a data feed is a claim, not a fact.

| Source family | What it tells you well | What it cannot tell you | Key limit to watch |
|---|---|---|---|
| Satellite / aerial imagery | Roof condition & shape, footprint, yard hazards, exterior exposures | Interior, wiring, intent, whether a flagged defect has since been fixed | Exterior, point-in-time only; image may be stale |
| IoT / telematics | Actual behavior & condition over time (driving, leaks, equipment) | Behavior it doesn't observe; context of an anomaly | Uneven coverage; privacy; measures only what it senses |
| Public records | Building age, size, ownership, permits, liens, years in business | Current condition; whether the record matches this structure | Staleness; parcel-vs-structure mismatch |
| Third-party aggregators | A broad enriched feed assembled from many sources | The provenance behind each field; certainty | You inherit the vendor's matching & currency errors |

⚖️ Compliance Corner Alternative data does not get a free pass on the rules you learned in Chapters 4 and 8. The Fair Credit Reporting Act (FCRA) governs information used to make insurance eligibility and pricing decisions when it comes from a consumer reporting agency — which can include some third-party data feeds — bringing notice, accuracy, and dispute obligations along with it. State law adds restrictions on what factors may be used at all, and the line against unfair discrimination (Chapter 4) applies no matter how the data arrived: a data source that turns out to be a proxy for a protected class is not laundered clean by being "alternative" or "algorithmic" — a problem Chapter 35 takes up in full. The disciplined rule is simple to state and hard to live: if you could not use a fact when a human collected it, you cannot use it because a vendor collected it for you. New data sources expand what you can see faster than the law clarifies what you may use, and the gap is the underwriter's and the carrier's responsibility to manage, not the vendor's.

🔍 Check Your Understanding

1. An aerial image flags Harbor Steel's roof as "poor condition, likely end-of-life." Name one thing this confirms about the file you already had, and two things the image cannot tell you that you still need.
2. A personal-auto applicant has a near-perfect telematics score from six months of app data. Why is this strong evidence and yet not proof of a low-risk driver? Which word in "alternative data" is doing the work?


31.3 Pre-fill: auto-populating the submission

Now the single most visible change to the daily workflow. Pre-fill — also called data enrichment — is the automatic population of a submission's fields from third-party data sources, so that the producer or applicant enters as little as a name and an address and the system fills in the rest. Type "Harbor Steel & Fabrication, 1400 Dockside Road, Port Hadley" and, in the seconds that follow, the workstation populates the year built (1994), the square footage (≈50,000), the construction type, the roof characteristics from the latest aerial image, the flood-zone and named-windstorm codes from a peril database, the firm's years in operation and entity type from business records, a prior-loss indicator from an industry clearinghouse, and a dozen fields more. A submission that used to take a producer twenty minutes to complete — badly — now arrives nearly complete in twenty seconds.
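Mechanically, pre-fill is a lookup-and-merge. The following is a minimal sketch built on hypothetical source functions, each of which takes an address and returns fields. Real aggregator APIs differ and are not shown, and the dates and the second square-footage figure are invented for illustration. The key design choice is that every field keeps its provenance: the source, the as-of date, and whether the value was measured or estimated. A second source that disagrees is recorded as a conflict instead of silently overwriting the first.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Field:
    value: object
    source: str      # where the value came from, e.g. "assessor" or "aerial"
    as_of: date      # when that source observed the value
    estimated: bool  # True if derived or estimated rather than measured


def prefill(name, address, lookups, today):
    """Populate a submission from name + address using each source lookup in turn.

    `lookups` maps a source name to a function(address) -> {field_name: Field}.
    The first value seen for a field is kept; any later disagreeing value is
    recorded in `conflicts` rather than overwriting it.
    """
    submission = {
        "name": Field(name, "producer", today, False),
        "address": Field(address, "producer", today, False),
    }
    conflicts = {}
    for lookup in lookups.values():
        for key, fld in lookup(address).items():
            if key not in submission:
                submission[key] = fld
            elif submission[key].value != fld.value:
                conflicts.setdefault(key, [submission[key]]).append(fld)
    return submission, conflicts


# Hypothetical source responses, loosely modeled on the Harbor Steel file.
def assessor(address):
    return {"year_built": Field(1994, "assessor", date(2022, 7, 1), False),
            "sq_ft": Field(50_000, "assessor", date(2022, 7, 1), False)}


def aerial(address):
    return {"roof_condition": Field("poor", "aerial", date(2024, 1, 15), True),
            "sq_ft": Field(49_600, "aerial", date(2024, 1, 15), True)}


sub, conflicts = prefill("Harbor Steel & Fabrication",
                         "1400 Dockside Road, Port Hadley",
                         {"assessor": assessor, "aerial": aerial},
                         today=date(2024, 4, 1))
print(sorted(sub))
print(conflicts["sq_ft"])
```

The disagreeing square footage is not averaged away. It stays on the file as an open question.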

The benefits are real and worth stating plainly, because the rest of this section is going to qualify them. Pre-fill makes submissions faster, which the producer loves and which wins business. It makes them more complete, filling fields a hurried human would have skipped. And, counter-intuitively, it can make them more honest: a producer can no longer "forget" to mention a prior loss the clearinghouse already knows about, or shave a few years off a building's age, because the system fills those fields from independent data rather than from the applicant's self-report. Pre-fill is, among other things, a tool against the adverse selection of Chapter 1 — it narrows the information gap between what the applicant knows and what the underwriter knows, which is the gap adverse selection exploits.

But pre-fill imports the limits of its sources along with their content, and three failure modes recur.

PRE-FILL — THREE WAYS IT GOES WRONG                              [constructed teaching example]

  1. WRONG MATCH        The address resolves to the parcel next door, or to a prior
                        tenant; the file is now full of confident facts about the
                        wrong building.

  2. STALE DATA         The assessor's record predates a renovation; the aerial image
                        predates a new roof; the "30-year-old roof" was replaced last
                        spring. The data is internally consistent and out of date.

  3. FALSE PRECISION    A square footage given to the square foot, a roof age to the
                        year — numbers that LOOK measured but are estimated or derived,
                        and carry an uncertainty the field does not show.
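Each of the three failure modes can be screened for, at least crudely, before a field reaches the price. The sketch below assumes that each pre-filled field carries three pieces of metadata: the address the source actually matched, an as-of date, and an estimated flag. All parameter names and the neighboring parcel address are hypothetical. The address comparison is deliberately naive; real address matching is considerably harder.

```python
from datetime import date


def normalize_address(addr):
    """Crude normalization: lowercase, drop commas and periods, collapse spaces."""
    return " ".join(addr.lower().replace(",", " ").replace(".", " ").split())


def prefill_flags(field_name, value, *, matched_address, submitted_address,
                  as_of, today, max_age_days, estimated):
    """Return human-readable flags for the three pre-fill failure modes."""
    flags = []
    # 1. WRONG MATCH: the source record belongs to a different address.
    if normalize_address(matched_address) != normalize_address(submitted_address):
        flags.append(f"{field_name}: WRONG MATCH? source record is for {matched_address!r}")
    # 2. STALE DATA: the source observed the value too long ago.
    age_days = (today - as_of).days
    if age_days > max_age_days:
        flags.append(f"{field_name}: STALE, observed {age_days} days ago")
    # 3. FALSE PRECISION: an estimate presented as if it were exact.
    if estimated:
        flags.append(f"{field_name}: ESTIMATED, treat {value} as approximate")
    return flags


flags = prefill_flags(
    "roof_age_years", 30,
    matched_address="1402 Dockside Road, Port Hadley",  # hypothetical parcel next door
    submitted_address="1400 Dockside Road, Port Hadley",
    as_of=date(2023, 3, 1), today=date(2024, 6, 1),
    max_age_days=365, estimated=True,
)
for f in flags:
    print(f)
```

A clean field returns an empty list. Any flag turns that field from an answer into a hypothesis to be confirmed.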

Each of these is dangerous in the same specific way: pre-filled data arrives wearing the costume of verified fact. A field that a human filled in by hand carries an implicit "the applicant says"; a field the system pre-filled carries an implicit "the data shows" — and the second feels more authoritative even when it is less reliable. The mismatch matters most when pre-fill feeds a model (Chapter 32) that prices the risk automatically: the model cannot tell a measured square footage from an estimated one, a current roof image from a three-year-old one, or a correct address match from a wrong one. It prices whatever it is fed, confidently. Garbage in, garbage out — the subject of §31.7 — begins right here, at pre-fill, which is why the best shops treat pre-filled fields not as answers but as hypotheses to be confirmed on anything that drives the price.

⚠️ Underwriting Trap The trap is automation bias on a pre-filled field — trusting a number more because a machine supplied it than you would if a person had. A producer who typed "roof age: 5 years" on a 30-year-old building invites your skepticism; the same false figure, pre-filled from a mismatched record, slides through because "the system pulled it." The disciplined move on any account where the field is material is the one-line verification: does the pre-filled roof age square with the aerial image, the inspection, and the broker's note? On Harbor Steel, the pre-filled roof flag, drawn from the aerial image, agrees with the loss runs and the broker's note, and that agreement is itself the finding, because corroboration from independent sources is worth more than any single source alone. When the sources disagree, that is not noise to be averaged away; it is a flag to be chased.
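The rule that "agreement is the finding" can be sketched as a small function that compares one field across several independent sources. This is an illustration only. It assumes each source's read has already been coded to a comparable value, and the source labels and values are hypothetical. Disagreement is returned as a conflict to chase, never averaged into a single number.

```python
from numbers import Number


def corroborate(readings, tolerance=0.0):
    """Compare one field across independent sources.

    `readings` maps a source label to that source's value for the field.
    Returns (status, readings), where status is "uncorroborated",
    "corroborated", or "conflict". Numbers agree if their spread is within
    `tolerance`; any other values must match exactly.
    """
    values = list(readings.values())
    if len(values) < 2:
        return "uncorroborated", readings
    if all(isinstance(v, Number) for v in values):
        agree = max(values) - min(values) <= tolerance
    else:
        agree = len(set(values)) == 1
    return ("corroborated" if agree else "conflict"), readings


# Hypothetical coding of three independent reads of one roof: they agree.
status, _ = corroborate({"aerial_image": "poor", "loss_runs": "poor", "broker_note": "poor"})
print(status)

# A pre-filled roof age that disagrees with the inspection: a flag to chase.
status, detail = corroborate({"prefill": 5, "inspection": 30}, tolerance=2)
print(status, detail)
```

A single source comes back as "uncorroborated" rather than "corroborated". One reading, however confident, is not agreement.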

📄 Read the Submission

FIGURE 31.1 — "The submission that filled itself in"              [the Underwriting File]

  THE SUBMISSION   Harbor Steel & Fabrication: the Meridian commercial-package
                   submission, opened in the workstation. Producer entered
                   name + address; the system pre-filled the rest.

  THE CONTEXT      Pre-fill returned: year built 1994; ≈50,000 sq ft;
                   joisted-masonry/metal frame; roof flagged "poor / likely
                   end-of-life" from prior-quarter aerial imagery; FPC 4;
                   named-windstorm + surge-adjacent peril codes; ~30 yrs in
                   business; a clearinghouse prior-loss indicator (two property
                   losses in 5 yrs). A real-time model scored the risk and
                   returned a number.

  WHAT IT SHOWS    The pre-fill CORROBORATES the manual read: the satellite roof
                   flag matches the loss runs and the broker's note; the peril
                   codes match the known coastal exposure. Independent data
                   agreeing with the file raises confidence in the picture.

  WHAT IT DOESN'T  It does not confirm the roof was NOT already replaced (the
                   image is a quarter old); it does not see the hot-work
                   controls, the new plant manager, or the signed roof contract
                   the broker references; and a single prior-loss flag is a
                   count, not a story.

  THE DECISION     Treat the pre-fill as a verified-enough RISK PICTURE for the
                   data-driven fields, and as a set of HYPOTHESES on the
                   price-driving ones (roof, losses) to confirm at inspection.
                   The model's score is logged as an input; the disposition is
                   unchanged from Chapter 30.

  THE LESSON       Pre-fill's best gift is not speed but CORROBORATION:
                   independent data that agrees with the file. Its worst trap
                   is false confidence in data that merely looks measured.


31.4 Real-time risk scoring and the underwriting workstation

The data does not sit in a folder anymore; it lives inside a tool. The underwriting workstation is the integrated software environment where the modern underwriter works: it pulls the submission, runs the pre-fill, calls the data sources, displays the assembled risk picture, runs the pricing and scoring models, surfaces the guidelines and referral rules, and records the decision and its rationale — all in one place, in real time. The fax-and-folder underwriter assembled a risk picture by hand from a dozen separate inputs over days; the workstation assembles it automatically in seconds and presents it as a single screen. It is the cockpit of data-driven underwriting, and learning to read it well is now a core skill of the job.

At the center of that screen, increasingly, is a number: real-time risk scoring — the automatic generation of a risk score, the moment a submission arrives, by a model reading the enriched data (the model itself is Chapter 32's subject; here we care about the score as an object on the desk). The score might be a 1-to-10 risk grade, a predicted loss ratio, a green/yellow/red triage flag, or a recommended action (bind / refer / decline). It arrives before the underwriter has read anything, and that ordering is the whole psychological problem. When the first thing you see is a 7-out-of-10 "decline", every subsequent fact you read is colored by it; you are no longer forming a judgment so much as looking for reasons to confirm or overturn a number someone else's model already produced. The score is enormously useful — it triages a flood of submissions, flags the risks that need attention, and prices the easy ones — and it is enormously easy to over-trust.
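The several forms a score can take are often just different views of one model output. The following is a minimal sketch. It assumes the model emits a predicted loss ratio and a confidence value between 0 and 1; both are hypothetical, as is every cut-off below. Notice that the grade, the color, and the recommended action carry no trace of anything the model could not see.

```python
import math


def triage(predicted_loss_ratio, confidence):
    """Turn one hypothetical model output into a grade, a color band, and an action."""
    # 1-to-10 grade, higher = worse; each 0.10 of predicted loss ratio is one step.
    grade = min(10, max(1, math.ceil(predicted_loss_ratio * 10)))
    band = "green" if grade <= 3 else "yellow" if grade <= 6 else "red"
    if confidence < 0.8:
        action = "refer"  # the model itself is unsure
    elif band == "green":
        action = "bind"
    elif band == "red":
        action = "decline"
    else:
        action = "refer"
    return {"grade": grade, "band": band, "action": action}


print(triage(0.25, 0.9))  # a clean, confidently scored risk
print(triage(0.68, 0.9))  # an adverse score that arrives before you read the file
```

The second call produces exactly the kind of "7, decline" that the paragraph warns about. It is a compression of the model's inputs, not a reading of the file.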

Hold the right frame: the score is an input, not the decision. It is a fast, data-driven opinion about the risk, formed from the fields the model could see, and it deserves exactly the weight that its inputs and its track record earn — no more. The underwriter's job is not to ignore the score (that wastes a genuinely powerful triage tool) and not to defer to it (that surrenders the judgment that is the reason the job exists), but to read it as one voice in the room: a confident, well-informed voice that has, crucially, never read the broker's cover note, never seen the signed roof contract, and cannot tell a real fact from a mismatched pre-fill. The score knows things you might miss; you know things the score literally cannot see.

🤖 Model vs. Judgment Here is the central tension of this entire part of the book, in its mildest and most everyday form. The workstation scores Harbor Steel and returns a number leaning toward decline — built from the construction, the coastal peril codes, the roof flag, and the two-loss clearinghouse indicator. Every one of those inputs is real and adverse; the score is not wrong about the facts it can see. What it cannot see is the context that changes their meaning: that both fires trace to causes a hot-work program and an electrical scan address, that the broker has attached a signed roof-replacement contract, that the loss history is a story about a problem being fixed. In this chapter the score is simply logged as an input — corroborated by the satellite roof read, consistent with the manual assessment, and carried forward. The actual override — the underwriter writing at a 6 the risk the model scores a 7, with documented reasons — is the work of Chapter 32, where we open the model itself and earn the right to disagree with it. The lesson here is narrower and prior: know that the number arrived first, and refuse to let its arriving first make it the answer.

How does a disciplined underwriter actually use the score? Three habits separate the professionals from the button-pushers. First, read the file before you over-weight the score — or at least read it as if the score did not exist, then compare; the score is a check on your read, not a replacement for it. Second, ask what the score could not see: the relationship facts, the in-flight corrective actions, the broker's intelligence, the one-off context. Those are precisely the things that, on a complex risk, most change the decision — and precisely the things the model lacks. Third, document the divergence: when your judgment and the score agree, the file is easy; when they diverge, the file must record why — what you saw that the model did not — because that record is your defense to the broker, the auditor, and the underwriting committee, and because "the system said so" is not a defense an underwriter can give (Chapter 13's documentation discipline, now load-bearing). We build the full override apparatus in Chapter 32; the habit of writing down the why starts now.
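The third habit, documenting the divergence, is simple enough to enforce in the record itself. The sketch below is hypothetical; the class, its field names, and the example accounts are illustrative, not any workstation's schema. It refuses to create a decision whose grade differs from the model's unless the underwriter has written down both why and what the model could not see.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    """One logged underwriting decision, with the model score kept as an input."""
    account: str
    model_grade: int
    underwriter_grade: int
    rationale: str = ""
    unseen_by_model: list = field(default_factory=list)

    def __post_init__(self):
        diverges = self.model_grade != self.underwriter_grade
        if diverges and not (self.rationale.strip() and self.unseen_by_model):
            raise ValueError(
                f"{self.account}: grade {self.underwriter_grade} differs from model "
                f"grade {self.model_grade}; record the rationale and what the model "
                "could not see"
            )


# Agreement: nothing extra is required.
agree = DecisionRecord("Account A (hypothetical)", model_grade=4, underwriter_grade=4)

# Divergence: allowed only with the "why" on file.
override = DecisionRecord(
    "Account B (hypothetical)", model_grade=7, underwriter_grade=6,
    rationale="Both losses trace to causes now addressed by documented controls.",
    unseen_by_model=["signed roof-replacement contract", "hot-work program"],
)
print(override)
```

With this design, "the system said so" never has to be the defense. When the grades differ, the record cannot exist without the reasons.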


31.5 Straight-through processing revisited (the automation frontier)

You met straight-through processing (STP) in Chapter 20: the binding of a policy end-to-end with no human underwriter, when the risk is simple enough and the data clean enough for the system to decide. There we saw it from the small-commercial side. Here we see it from the data side, because data is what makes STP possible — and what determines where its frontier honestly lies.

The logic is a direct application of everything in this chapter. If pre-fill (§31.3) can assemble a clean risk picture automatically, and a model (Chapter 32) can score and price it confidently, and the risk falls squarely inside a well-understood, high-volume class, then there is nothing left for a human to add, and the machine should simply bind it. A personal-auto renewal, a small BOP for a low-hazard class, a workers'-comp policy for a clean small office — these are STP's home turf, where automation writes faster, cheaper, and more consistently than a human, and where insisting on human review would add cost and delay without adding judgment. For these risks, theme five of the book — technology augments underwriters; it does not replace them — has a blunt corollary: for the simplest risks, technology genuinely does replace the underwriter, and that is fine, because there was no judgment to exercise. The underwriter's value was never in stamping easy files.

The whole craft of automated underwriting, then, is not in what STP binds but in where it draws the line — the referral logic that decides which risks the machine keeps and which it hands to a human. Draw the line too tight and you refer everything, defeating the purpose and drowning your underwriters in easy files. Draw it too loose and the machine binds risks it should not — the complex, the novel, the data-poor, the ones where the score is confident but wrong — and the losses arrive on schedule. The art is in the referral rules, and they are a direct map of the limits of the data and the model:

STRAIGHT-THROUGH PROCESSING — THE REFERRAL DECISION             [constructed teaching example]

  submission arrives → pre-fill → real-time score
                         │
            ┌────────────┴────────────┐
       data CLEAN & risk SIMPLE?   data MESSY or risk COMPLEX/NOVEL?
            │                            │
       in a known, well-modeled     missing/ conflicting data, large
       class, inside appetite,      limits, catastrophe exposure, thin
       limits modest, no flags      history, low model confidence, or a
            │                       guideline/ appetite flag
            ▼                            ▼
       ┌──────────┐                 ┌──────────────────────┐
       │  BIND    │                 │  REFER to underwriter │
       │ (STP)    │                 │  (judgment required)  │
       └──────────┘                 └──────────────────────┘

  The line is drawn by the REFERRAL RULES — and a good referral rule is just an
  honest statement of where the data and the model stop being trustworthy.

📋 At the Desk When you help tune a referral grid — and as you grow, you will — build the rules around the failure modes of automation, not around premium size alone. Refer on: missing or internally conflicting pre-fill (the system can't see the building clearly); large limits or catastrophe exposure (one wrong bind is existential); thin or novel exposure (the model has no pool to learn from — Chapter 1's law of large numbers, failing for lack of data); low model confidence (the model itself is telling you it's unsure); and any guideline or appetite flag. Notice that these are the same conditions that make a risk judgment-heavy in the first place. A well-built STP system binds exactly the risks where judgment adds nothing and refers exactly the ones where it adds the most. Harbor Steel — a large-limit, catastrophe-exposed, loss-flagged, multi-line commercial account — trips half of these triggers at once, which is exactly why no sane system would straight-through-bind it, and exactly why it is the running file for a book about judgment.
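The referral triggers above translate almost directly into rules. The following is a minimal sketch with hypothetical field names and thresholds. The limit, history, and confidence cut-offs are placeholders, not recommendations. Every trigger that fires is returned, so a referral always arrives with its reasons attached.

```python
from dataclasses import dataclass, field

# Illustrative thresholds only; a real referral grid is set by the carrier.
MAX_STP_LIMIT = 1_000_000
MIN_HISTORY_YEARS = 3
MIN_MODEL_CONFIDENCE = 0.80


@dataclass
class Submission:
    limit: float
    cat_exposed: bool
    years_of_history: int
    model_confidence: float
    missing_fields: list = field(default_factory=list)
    conflicting_fields: list = field(default_factory=list)
    guideline_flags: list = field(default_factory=list)


def referral_reasons(s):
    """Return every referral trigger the submission trips (empty list = none)."""
    reasons = []
    if s.missing_fields or s.conflicting_fields:
        reasons.append("missing or conflicting pre-fill")
    if s.limit > MAX_STP_LIMIT or s.cat_exposed:
        reasons.append("large limit or catastrophe exposure")
    if s.years_of_history < MIN_HISTORY_YEARS:
        reasons.append("thin or novel exposure")
    if s.model_confidence < MIN_MODEL_CONFIDENCE:
        reasons.append("low model confidence")
    if s.guideline_flags:
        reasons.append("guideline or appetite flag")
    return reasons


def route(s):
    """Bind straight through only when no trigger fires; otherwise refer, with reasons."""
    reasons = referral_reasons(s)
    return ("REFER", reasons) if reasons else ("BIND (STP)", reasons)


# Hypothetical accounts: a clean small office and a large coastal manufacturer.
small_office = Submission(limit=300_000, cat_exposed=False,
                          years_of_history=12, model_confidence=0.93)
large_coastal = Submission(limit=10_000_000, cat_exposed=True, years_of_history=30,
                           model_confidence=0.85, guideline_flags=["prior property losses"])
print(route(small_office))
print(route(large_coastal))
```

The clean small office trips nothing and binds. The large, catastrophe-exposed, loss-flagged account, loosely patterned on Harbor Steel, trips more than one rule and goes to a person.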

The honest limit of STP, then, is the honest limit of the data and the model, and §31.6 makes that limit explicit. But note the trajectory, because it is the shape of the profession's future (Chapter 36): the STP frontier keeps advancing. Risks that needed a human a decade ago — mid-size BOPs, some homeowners, simpler commercial auto — are increasingly bound straight through as the data improves and the models mature. The frontier moves up, toward more complex risks, year after year. It has not reached, and on the argument of this book will not soon reach, the Harbor Steels of the world — the large, novel, relationship-dependent, catastrophe-exposed accounts where judgment is the product. The underwriter's career bet is to stay on the right side of that advancing line, which means continually moving toward the harder judgment the machine cannot yet do.


31.6 When automation helps and when judgment is irreplaceable

We can now state the principle the whole chapter has been circling, because it is the principle that should govern every decision about whether to let the data and the model decide. Automation's advantage and its limit come from the same source: a model and a data feed see only what can be turned into a field. Where a risk is fully describable in fields the data captures well — a clean home, a small office, a standard auto — automation wins, and wins decisively: it is faster, cheaper, and more consistent than a human, who gets tired, gets talked into things, and decides the same risk differently on a Friday than on a Monday. But where the risk turns on things that resist becoming fields — context, relationship, intent, a corrective action in flight, a novel exposure with no history — judgment is irreplaceable, because there is nothing for the model to read.

Sort the work honestly, and the boundary is clear:

| Automation tends to win when… | Human judgment is irreplaceable when… |
|---|---|
| The risk is simple and standard (well inside a known class) | The risk is complex, novel, or one-of-a-kind (no pool to learn from) |
| The data is clean, complete, and well-captured | The data is thin, conflicting, or doesn't capture what matters |
| The exposure is high-volume, so the model has a large pool | The exposure is low-frequency / high-severity, where one bind is existential |
| Consistency is the goal (treat like risks alike) | Context is the goal (this risk differs from its class in ways fields miss) |
| Limits are modest and a wrong bind is survivable | Limits are large and a wrong bind threatens the book |
| The decision turns on measured facts | The decision turns on relationship, intent, or a story |

The two columns are not a hierarchy; they are a division of labor, and the future the book keeps pointing to (Chapter 36) is the underwriter who works both. For the left column, you trust the machine and spend your scarce judgment elsewhere; overriding it on the easy risks would be a waste, and worse, would introduce the very inconsistency automation exists to remove. For the right column, you take the controls and use the machine's output as one input among several. The error in both directions is real: the shop that hand-underwrites its simple BOPs is burning money and introducing noise; the shop that straight-through-binds its Harbor Steels is one storm or one mispriced catastrophe account away from a very bad year. The discipline is to know which column the risk in front of you is in, and to notice when a risk that looks like the left column (a routine-seeming account) actually belongs in the right (because the data is wrong, or the exposure is novel, or the broker just told you something the fields don't show).

🔍 Check Your Understanding
  1. Give one risk that automation should clearly bind straight through, and one that judgment must clearly own, and name the single feature that puts each on its side of the line.
  2. Why does the chapter say automation can be more consistent than a human underwriter, and why is that consistency a genuine advantage on simple risks but a danger on complex ones?

This is also where the combined ratio (Chapter 3) re-enters, because automation's payoff is finally a combined-ratio payoff or it is nothing. Done right, automation lowers the expense ratio (fewer human hours per policy) on exactly the high-volume risks where expense efficiency decides the result, while keeping the loss ratio honest by referring the risks that need a human to price them well. Done wrong, it trades a lower expense ratio for a higher loss ratio: the machine binds cheap and binds badly, and the expense savings are swamped by losses that a referral to an underwriter would have prevented. The combined ratio tells the truth about both effects, and it is the number against which any automation initiative must finally be judged. "We automated underwriting and cut costs" is good news only if the combined ratio fell, and the losses from badly bound risks do not show up for two or three years, the lag that has fooled many a quarterly-minded executive and that the disciplined underwriter never forgets.
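A small worked example makes the trade-off concrete. The sketch below assumes the simple form combined ratio = loss ratio + expense ratio, with both ratios taken over earned premium; the statutory (trade) basis instead divides underwriting expenses by written premium, which this sketch ignores. All figures are hypothetical, per 100 of earned premium, and the "year-one reported" row stands in for losses that have not yet fully emerged.

```python
def combined_ratio(earned_premium: float, incurred_losses: float,
                   underwriting_expenses: float) -> tuple[float, float, float]:
    """Return (loss ratio, expense ratio, combined ratio).

    Simplification: both ratios use earned premium, and incurred_losses is
    taken to include loss adjustment expense.
    """
    loss_ratio = incurred_losses / earned_premium
    expense_ratio = underwriting_expenses / earned_premium
    return loss_ratio, expense_ratio, loss_ratio + expense_ratio


# Hypothetical figures per 100 of earned premium: (premium, losses, expenses).
SCENARIOS = {
    "manual underwriting":           (100.0, 62.0, 33.0),
    "automation done right":         (100.0, 62.0, 27.0),
    "done wrong, year-one reported": (100.0, 55.0, 27.0),
    "done wrong, ultimate losses":   (100.0, 72.0, 27.0),
}

for label, (premium, losses, expenses) in SCENARIOS.items():
    lr, er, cr = combined_ratio(premium, losses, expenses)
    print(f"{label:32s} loss {lr:.0%}  expense {er:.0%}  combined {cr:.0%}")
```

In this illustration the badly automated book looks best of all in year one (82%), because its losses have not yet been reported; only once they develop to ultimate (99%) does it fall behind the manual book (95%) and the well-referred one (89%).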


31.7 The data-quality problem (garbage in, garbage out)

We end where every data-driven decision is finally decided: on the quality of the data. Every benefit in this chapter — the speed, the corroboration, the scoring, the automation — rests on an assumption the chapter has been quietly stress-testing throughout: that the data is good. When it is not, none of the machinery helps; it merely industrializes the error. Garbage in, garbage out is the oldest law of computing, and it is the single greatest threat to data-driven underwriting, because the data age replaces the honest blank — a field a human would have known to chase — with a confident wrong number that a model will price and a system will bind, at scale, in seconds.

Data quality has several dimensions, and an underwriter should be able to name them because each fails differently:

  • Accuracy — is the value correct? (Is the roof really 30 years old, or did a mismatched record say so?)
  • Currency — is it up to date? (Was the aerial image taken before or after the new roof went on?)
  • Completeness — is anything missing? (Did pre-fill silently leave a price-driving field blank, or fill it with a class default?)
  • Consistency — do the sources agree? (Does the assessor's square footage match the image's footprint match the application?)
  • Provenance — do you know where it came from? (Is this a measured value, an estimate, or a vendor's derived guess wearing the costume of a fact?)
  • Relevance — does it actually bear on this decision, or was it pulled because the system could?
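Several of these dimensions can be checked mechanically once each pre-filled value carries its source, an as-of date, and a provenance label. The sketch below is one hypothetical way to do that for a single field; `SourcedValue`, `quality_issues`, the provenance labels, the 365-day currency limit, and the 10% consistency tolerance are all assumptions for illustration. Accuracy and relevance are deliberately absent: accuracy needs ground truth such as an inspection, and relevance is a judgment about whether the field bears on the decision at all.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SourcedValue:
    value: Optional[float]   # None: the source returned nothing for this field
    source: str              # where it came from, e.g. "assessor"
    as_of: Optional[date]    # when the value was observed, if known
    provenance: str          # assumed labels: "measured", "estimated", "derived"


def quality_issues(field_name: str, readings: list[SourcedValue], today: date,
                   max_age_days: int = 365, tolerance: float = 0.10) -> list[str]:
    """Check completeness, currency, consistency, and provenance for one field."""
    issues = []
    present = [r for r in readings if r.value is not None]

    # Completeness: a blank stays visible as an issue instead of being defaulted.
    if not present:
        return [f"{field_name}: no source returned a value"]
    for r in readings:
        if r.value is None:
            issues.append(f"{field_name}: {r.source} returned no value")

    # Currency: old or undated observations may predate a change (a new roof).
    for r in present:
        if r.as_of is None:
            issues.append(f"{field_name}: {r.source} value has no as-of date")
        elif (today - r.as_of).days > max_age_days:
            issues.append(f"{field_name}: {r.source} value is {(today - r.as_of).days} days old")

    # Consistency: independent sources should agree within a relative tolerance.
    values = [r.value for r in present]
    low, high = min(values), max(values)
    if high > 0 and (high - low) / high > tolerance:
        issues.append(f"{field_name}: sources disagree ({low:g} to {high:g})")

    # Provenance: a field resting only on estimates or derived guesses is weaker.
    if all(r.provenance != "measured" for r in present):
        issues.append(f"{field_name}: no measured value, only estimated or derived ones")

    return issues


# Hypothetical square-footage readings for one building.
readings = [
    SourcedValue(50_000, "assessor", date(2022, 3, 1), "measured"),
    SourcedValue(41_000, "aerial footprint", date(2025, 3, 31), "derived"),
    SourcedValue(50_000, "application", date(2025, 6, 1), "estimated"),
]
for issue in quality_issues("square footage", readings, today=date(2025, 6, 30)):
    print(issue)
```

With these invented readings the checker flags two problems: the assessor record is more than a year old, and the aerial footprint disagrees with the other two sources by more than 10%.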

The reason data quality is more dangerous in the data age, not less, is the loss of the human friction that used to catch errors. In the fax-and-folder world, a wrong roof age sat in front of an underwriter who knew the building from the inspection and caught it. In the pre-fill-and-score world, the wrong roof age flows straight into a model that prices it and a system that may bind it, with no human in the loop to say "that's not right." Automation does not create bad data; it removes the friction that used to catch it and then acts on it faster. A bad data point that once produced one mispriced policy can now, fed through an automated pipeline, misprice an entire class of business before anyone notices — the modern, industrialized version of a small error.

⚠️ Underwriting Trap The most expensive data-quality failure is the silent default. When a price-driving field is missing, a poorly designed pipeline does not stop and ask; it fills the blank with a class average, an optimistic assumption, or a zero — and then prices the risk as if that guess were a fact. The submission looks complete; every field is populated; the model is confident; nothing flags. And the carrier has just priced a risk on a number nobody actually knows. The disciplined controls are unglamorous and non-negotiable: make missingness visible (a blank field must look different from a filled one, never get silently defaulted); require corroboration on the fields that drive the price (two independent sources agreeing, as Harbor Steel's roof read and aerial image do, is the gold standard); set confidence thresholds that refer a risk to a human when the data is too thin to trust; and audit the data feed itself, not just the decisions, because a vendor's match-rate or currency can degrade quietly and poison every downstream price at once.
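The contrast between a silent default and the controls in the trap can be sketched in a few functions. Everything here is hypothetical: `CLASS_AVERAGE_ROOF_AGE`, `silent_default`, `corroborated_roof_age`, `audit_feed`, the three-year agreement tolerance, and the 90% match-rate floor are invented for illustration, not drawn from any real pipeline or vendor.

```python
from typing import Optional

CLASS_AVERAGE_ROOF_AGE = 12.0   # hypothetical class default a naive pipeline might use


def silent_default(roof_age: Optional[float]) -> float:
    """The trap: a blank quietly becomes the class average and is priced as a fact."""
    return roof_age if roof_age is not None else CLASS_AVERAGE_ROOF_AGE


def corroborated_roof_age(sources: dict[str, Optional[float]],
                          tolerance_years: float = 3.0) -> tuple[Optional[float], Optional[str]]:
    """Return (value, None) only if at least two sources agree; else (None, referral reason)."""
    known = {name: v for name, v in sources.items() if v is not None}
    if len(known) < 2:
        missing = [name for name, v in sources.items() if v is None]
        return None, "refer: roof age not corroborated (missing: " + (", ".join(missing) or "none") + ")"
    spread = max(known.values()) - min(known.values())
    if spread > tolerance_years:
        return None, f"refer: sources disagree on roof age by {spread:g} years"
    return sum(known.values()) / len(known), None


def audit_feed(match_rate_by_month: dict[str, float], floor: float = 0.90) -> list[str]:
    """Audit the feed itself: months where the vendor's match rate fell below the floor."""
    return [month for month, rate in match_rate_by_month.items() if rate < floor]


print(silent_default(None))                                                # 12.0
print(corroborated_roof_age({"aerial image": 28.0, "inspection": None}))
print(corroborated_roof_age({"aerial image": 28.0, "inspection": 30.0}))  # (29.0, None)
print(audit_feed({"2025-01": 0.96, "2025-02": 0.95, "2025-03": 0.81}))    # ['2025-03']
```

The design choice is that a missing or contested price-driving field never turns into a number: it comes back as a referral reason a human can read, while the feed audit watches for the quiet vendor degradation that would otherwise poison every downstream price at once.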

The governance implication is the chapter's last and most durable lesson, and it lands squarely on the underwriter and the carrier, not the vendor. A carrier that buys a data feed and wires it into automated pricing has outsourced the data but kept the risk: if the feed is wrong, it is the carrier's losses, the carrier's mispriced book, the carrier's regulator. So the data-driven carrier needs the same disciplines a careful underwriter has always had, scaled up to the data pipeline — verification of what matters, healthy skepticism of a confident number, corroboration from independent sources, and a clear-eyed sense of what the data does and does not establish. The technology changed the form of the discipline; it did not relax the need for it. Indeed it raised it, because the cost of a lapse now compounds at machine speed. Garbage in, garbage out is not a reason to distrust data-driven underwriting; it is the reason data-driven underwriting needs underwriters more than ever — to be the judgment that asks, of the confident number on the screen, the oldest question in the trade: says who, and how do they know?


🗂️ The Underwriting File

The submission fills itself in — and the satellite confirms the roof. Back to Harbor Steel, now opened in the underwriting workstation rather than as a PDF. You enter the name and the Port Hadley address, and in the seconds that follow the file pre-fills: year built 1994; roughly 50,000 square feet; joisted-masonry/metal frame; fire protection class 4; named-windstorm and surge-adjacent peril codes from the catastrophe database; about thirty years in business; an entity-and-ownership record consistent with the single owner-operator you already knew about; and — the field you slow down on — a roof flagged "poor condition, likely end-of-life" by a computer-vision read of last quarter's aerial imagery. An industry-clearinghouse indicator returns two property losses in five years. And a real-time risk score, generated before you read a word, returns a number leaning toward decline.

Here is the layer this chapter adds, and it is a quieter one than it first appears. The pre-fill and the satellite imagery do not change the disposition you reached through Chapters 9 through 30 — they corroborate it. The aerial roof flag agrees with the loss runs, the inspection read, and Meridian's cover note; the peril codes agree with the coastal exposure you have been pricing all along; the clearinghouse loss count matches the two fires already in the file. Independent data, gathered for other purposes by parties with no stake in this submission, agreeing with your manual read — that is the most valuable thing pre-fill gives you, and it is worth more than any single source alone. You log the agreement, and your confidence in the risk picture goes up a notch.

But hold the discipline this chapter taught. The imagery is a quarter old, so it cannot rule out that the roof has been replaced since it was taken; that is a question the inspection will settle, and it matters because a warranted replacement is already a subjectivity on the quote. The score is logged as an input, not acted on: it has never read the broker's note about the signed roof contract, never seen the hot-work controls or the new plant manager, and cannot tell the two-loss count from the story of those two losses. So the running disposition is unchanged from Chapter 30 — quote-with-conditions, cat-load confirmed, within the zone aggregate — now standing on a risk picture the data confirms rather than merely asserts. The one open thread you flag for the next chapter is the score itself: the model recommends declining, and your file does not. Chapter 32 opens that model and earns the override — writing at a 6 the risk the machine scores a 7, with the documented reasons that make it a defensible underwriting judgment rather than a stubborn one. (Tindall Stores, the post-breach cyber submission from Chapter 24, is enriched in parallel: its data is pre-filled too, and the same lesson applies — the enriched feed tells you the breach happened; only judgment tells you whether the company actually fixed it.) The data has confirmed the read. The argument with the model is next.


Conclusion

Data and information technology have changed underwriting more in the last two decades than in the previous two centuries — but they have changed the inputs and the speed, not the job. The submission now arrives pre-filled from public records, imagery, sensors, and aggregators; the modern underwriting workstation assembles the risk picture in seconds and scores it in real time before a human reads a word; and for simple, high-volume, well-described risks, straight-through processing binds with no underwriter at all. Every one of those advances is real and worth having. And every one of them rests on a single fragile assumption — that the data is good — which is why the chapter ends on data quality and the oldest law of computing: garbage in, garbage out.

The through-line is the book's first theme, restated for the data age: underwriting is judgment. When information was scarce, judgment filled the gaps; now that information is abundant, judgment decides which of it to believe. The score on the screen is an input, not the answer; the pre-filled field is a hypothesis, not a fact; the automation should bind exactly the risks where judgment adds nothing and refer exactly the ones where it adds the most. The underwriter of the data age is not a button-pusher who trusts the assembled picture, and not a Luddite who refuses the tools, but the professional who reads the machine's first draft fast, corroborates what drives the price, knows what the data cannot see, and documents the why whenever their judgment and the score diverge.

That divergence — the moment the model says one thing and the underwriter, reading what the model cannot see, says another — is the central drama of modern underwriting, and we have so far only logged it. In the next chapter we open the model itself: how a generalized linear model splits a price into frequency and severity, what gradient boosting buys and what it costs, how to tell whether a model is any good, and how, with that understanding, you earn the right to write Harbor Steel at a 6 when the algorithm says 7. The data has confirmed the read. Now we learn to argue with the machine that priced it.


Key Terms

  • Data enrichment / pre-fill — the automatic population of a submission's fields from third-party data sources, so an applicant or producer enters minimal information and the system fills in the rest; fast and often more honest, but it imports its sources' errors and arrives wearing the costume of verified fact.
  • Real-time risk scoring — the automatic generation of a risk score (a grade, a predicted loss ratio, or a bind/refer/decline flag) the moment a submission arrives, by a model reading the enriched data; a powerful triage input that must be read as an input, not the decision.
  • The underwriting workstation — the integrated software environment in which the modern underwriter works: it pulls the submission, runs pre-fill, calls data sources, displays the assembled risk picture, runs the scoring and pricing models, surfaces guidelines and referral rules, and records the decision.
  • Alternative data sources — third-party information (satellite/aerial imagery, IoT and telematics, public records, aggregator feeds), often gathered for another purpose, that an underwriter draws in to enrich a risk assessment; each must be read for what it can and cannot reliably show.

Spaced Review

  1. Pre-fill returns a Harbor Steel roof-condition flag that agrees with the loss runs, the inspection, and the broker's note. Why is that agreement worth more than the roof read from any single one of those sources, and what single thing does the aerial image still fail to settle? (§31.3, §31.7)
  2. A real-time score and your read of the file diverge on a complex account. State the three habits §31.4 prescribes for using the score, and explain why documenting the why of the divergence is a compliance matter and not just good housekeeping. (§31.4; recall Chapter 13 documentation)
  3. From Chapter 1: the law of large numbers needs a large pool of similar, independent exposures. Use it to explain why a novel exposure with thin history is exactly the kind of risk a straight-through system should refer to a human rather than bind. (§31.5, §31.6; recall §1.2)
  4. From Chapter 6: risk is frequency × severity shaped by hazards and controls. A telematics feed gives a driver a near-perfect score. Which dimension does telematics observe well, and what can it still miss that a human would weigh? (§31.2; recall Chapter 6)
  5. (The recurring pricing-discipline question.) A carrier automates small-commercial underwriting and reports a lower expense ratio in year one. Would you call that a win yet? Explain what the combined ratio would have to show, and why the verdict cannot be reached for two or three years. (§31.6; recall Chapter 3)