Chapter 32: Game Balancing — The Spreadsheet Is Your Friend

Claude (Anthropic)

In This Chapter

What Balance Actually Means
The Spreadsheet Mindset
Cost-Benefit Analysis
The Fundamental Triangle
Balancing for the Mean vs. the Master
Mathematical Tools
Simulation
The Meta and Evolution
Asymmetric Balance
Difficulty Curves in Single-Player
Economic Balance (Brief)
Communicating Balance Changes
Common Nerf/Buff Mistakes
GDScript / Spreadsheet Integration
Progressive Project Update: The Balance Pass
Common Pitfalls
Summary

There is a photograph that every aspiring designer should tape to their monitor. It is a screenshot, leaked years ago, from inside Blizzard's offices. Two designers stand at a whiteboard, and on the table behind them is a laptop. The laptop's screen is open to an enormous spreadsheet — dozens of columns, hundreds of rows, color-coded cells, formulas cascading down the sides. The spreadsheet tracks every unit in StarCraft II: HP, damage, armor, attack speed, build time, mineral cost, gas cost, supply, range, movement speed, counters, synergies. It is the document that the dream of StarCraft II runs on. It is ugly. It is a spreadsheet. It is also the reason the game has survived, essentially intact, for more than a decade as a competitive title.

If you want to ship a balanced game, you will learn to love spreadsheets. Not because spreadsheets are fun — they are not — but because balance is an emergent property of numbers, and numbers belong in cells. Designers who rely on their intuition ship unbalanced games. Designers who rely on spreadsheets ship balanced ones. The gap between the two populations is approximately the gap between cargo-cult engineering and actual engineering. You can sit at a table and argue about whether the Marauder is too strong. You cannot argue for very long when the spreadsheet shows that the Marauder's damage-per-mineral against armored targets exceeds every comparable unit's by twenty-six percent.

This chapter is about the craft of balancing — the boring, essential, unglamorous work that turns a playable prototype into a game people will fight each other to win at. We will start with what balance actually means (it is not what most people think), move through the math and the mindset and the tools, cover the asymmetries that make the job ten times harder than it looks, and end with you doing a balance pass on your own progressive project based on the playtest data you gathered in Chapter 31. If you have skipped Chapter 31 and have no playtest data, go back. Balancing without data is gardening in the dark.

One promise up front: this chapter will not make balancing easy. It cannot. Balance is one of the hardest problems in game design because it is high-dimensional, emergent, and adversarial — your players will find edges you did not know existed. What this chapter can do is make balancing tractable. You will end with a repeatable process, a set of tools, a shared vocabulary with the rest of your team, and the sober understanding that balance is not something you achieve once and move on from. It is something you tune, and re-tune, and keep tuning, for the entire life of the game.

What Balance Actually Means

Balance is the most misused word in game design. Designers throw it around as if it were a binary — balanced or not — and as if its meaning were self-evident. It is not. Before you can balance a game, you need to understand what you are aiming at.

The naive definition: a balanced game is one in which every option is equally good. This is the definition most players hold implicitly, and it is wrong. If every option were equally good, it would not matter which one you picked, and the game would become a lottery. Mathematically, if each of ten strategies has an equal fifty-percent win rate against every other strategy, the game has no strategic content at all. You could pick strategies by dice roll and perform identically. There is no skill in choosing. There is no evolution of the meta. There is nothing to learn. Games are supposed to reward knowing. "Perfectly equal" is the opposite of what you want.

The working definition, which you should internalize: a balanced game is one in which every option is viable, interesting, and situationally optimal. Let me unpack each of those.

Viable means the option is worth picking. If the Protoss Carrier costs eight thousand resources and does marginally better damage than a Void Ray at half the price, the Carrier is not viable. It is a trap. A viable option wins you games in some situations. A non-viable option is one that a halfway-serious player should not pick, ever. Non-viable options are not options — they are spreadsheet entries masquerading as choices. A balance pass often consists mostly of turning non-viable options into viable ones.

Interesting means the option creates meaningful decisions during play. A one-button insta-win mechanic is viable in the narrow sense (it wins games) but uninteresting — it collapses the decision space. An option is interesting when using it well requires judgment: timing, positioning, resource management, predicting the opponent. Interesting options are the ones designers love to watch in the hands of skilled players. Each one should generate a distinct style of play.

Situationally optimal is the key phrase. The goal of balance is that every option is the best choice in some situation. Tanks are the best choice against massed light units. Liberators are the best choice against ground armies that cannot look up. Mutalisks are the best choice for harassment and map control. No unit is the best choice everywhere. Each unit owns a niche. Rock beats scissors, scissors beats paper, paper beats rock — and when you expand the triangle to a hundred-point graph of mutual counters, you get something like a real, balanced strategy game.

Meaningful choice is the target, not symmetric choice. Meaningful choice means the player's decision matters, informed by knowledge, pressured by context, rewarded by skill. Meaningful choice is the whole reason to play a strategic game. Lose sight of that, and you will spend years nerfing and buffing numbers while the underlying game gets less interesting.

One more thing to set straight. Balance is not about fairness in the colloquial sense. Players will call a game "unfair" when they lose to a strategy they did not expect. Your job is not to eliminate that feeling — your job is to create a game where the player who loses to the unexpected strategy can, after the match, understand what happened and adjust. Balance is not a feeling. It is a structural property of the design. Players will feel unbalanced things, but player feelings are data, not verdicts.

💡 Intuition: Think of balance as the shape of the payoff matrix, not the magnitude of any single entry. A balanced game is one whose matrix has no dominant row and no dominated column — every strategy is the best response to some other strategy, and every strategy has at least one counter. The goal is no free lunches anywhere.
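One way to make the matrix view concrete is to scan a win-rate table for rows that another row beats across the board. The sketch below assumes a hypothetical square table in which entry [i][j] is strategy i's win rate against strategy j; the function name and the sample numbers are illustrative only.

```python
def dominated_rows(payoff):
    """Return (weak, strong) index pairs where row `strong` is at least as
    good as row `weak` in every column and strictly better in at least one."""
    pairs = []
    for weak, weak_row in enumerate(payoff):
        for strong, strong_row in enumerate(payoff):
            if weak == strong:
                continue
            never_worse = all(s >= w for s, w in zip(strong_row, weak_row))
            sometimes_better = any(s > w for s, w in zip(strong_row, weak_row))
            if never_worse and sometimes_better:
                pairs.append((weak, strong))
    return pairs


# Hypothetical win rates: entry [i][j] is strategy i's win rate against j.
rock_paper_scissors = [
    [0.5, 0.0, 1.0],
    [1.0, 0.5, 0.0],
    [0.0, 1.0, 0.5],
]
lopsided = [
    [0.50, 0.60, 0.70],
    [0.40, 0.50, 0.55],
    [0.30, 0.45, 0.50],
]

print(dominated_rows(rock_paper_scissors))  # no row is dominated
print(dominated_rows(lopsided))             # rows 1 and 2 are dominated
```

An empty result is the "no free lunches" shape. Any pair in the output names a strategy that a player never has a reason to pick over the stronger one.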

The Spreadsheet Mindset

Every designer who has shipped a balanced multiplayer title lives in spreadsheets. Every one. StarCraft II, Magic: The Gathering, Dota 2, League of Legends, Overwatch, Hearthstone, Smash Bros. Ultimate, Counter-Strike — if you could see the internal tools, you would see columns on columns on columns. Unit stats. Cost breakdowns. DPS curves. Counter matrices. Win rates by rank. Pick rates. Ban rates. Synergy scores. Everything tracked. Everything compared.

The reason is simple. The human brain cannot hold twenty variables at once and compare them across forty units. The brain reaches for heuristics, simplifications, and the three most recent games it watched. The spreadsheet holds the real data, and the designer queries it. The spreadsheet is an external memory that makes the designer's pattern-recognition useful instead of misleading.

The practical setup for a small team: one master workbook, version-controlled, with tabs for each category of content. Tabs you will want for a typical game:

Units / Heroes / Characters. One row per entity, with columns for every stat.
Abilities. Every ability its own row: cost, cooldown, damage, duration, scaling, range, radius.
Items / Equipment. Cost, stats provided, unique effects.
Progression Curve. Level, XP required, stat growth per level.
Economy. Resource costs, production rates, conversion ratios (callback to Chapter 24).
Derived Metrics. DPS, effective HP, time-to-kill, cost-per-damage, cost-per-HP.
Matchup Matrix. The triangle of who beats whom and by how much.

The derived-metrics tab is where the spreadsheet earns its keep. You do not balance raw HP or raw damage in isolation; you balance the combinations that players actually experience. A unit with 100 HP and armor that halves incoming physical damage has an effective HP of 200 against physical attacks, and that is the number the enemy feels when they attack. A unit with 10 DPS from basic attacks and a signature ability on a 3-second cooldown has an effective DPS you can only compute by adding the ability's damage, divided by its cooldown, to the basic-attack figure. The spreadsheet computes these derived quantities automatically, which means when you change one input cell, twenty output cells update, and you see the cascading effect of your change before you commit to it.
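A derived-metrics tab boils down to a few formulas per row. The sketch below mirrors that in Python, assuming damage reduction is stored as a fraction and that a unit's ability deals a fixed amount of damage once per cooldown. The field names and the sample unit are hypothetical.

```python
def derived_row(unit):
    """Compute the derived columns for one row of a (hypothetical) unit sheet."""
    # Effective HP: raw HP divided by the fraction of damage that gets through.
    ehp = unit["hp"] / (1.0 - unit["damage_reduction"])
    # Effective DPS: basic attacks plus the ability's damage spread over its cooldown.
    dps = unit["basic_dps"] + unit["ability_damage"] / unit["ability_cooldown"]
    return {
        "name": unit["name"],
        "ehp": ehp,
        "dps": dps,
        "cost_per_ehp": unit["cost"] / ehp,
        "cost_per_dps": unit["cost"] / dps,
    }


# Made-up unit, chosen only so the arithmetic is easy to follow.
guard = {
    "name": "Guard",
    "hp": 100,
    "damage_reduction": 0.5,
    "basic_dps": 10,
    "ability_damage": 30,
    "ability_cooldown": 3,
    "cost": 100,
}

row = derived_row(guard)
print(row)
```

Changing any single input, such as the cooldown, and re-running shows the same cascade a spreadsheet would: every derived column that depends on it moves at once.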

StarCraft II designer David Kim has talked publicly about the team's process: every balance change started in a spreadsheet. The change was modeled. Derived metrics were recomputed. If the model suggested the change would push a unit from "acceptable" to "overpowered," the number was adjusted before the build was ever patched. The spreadsheet filtered out the obvious failures, which let the iteration time on the remaining candidates be spent on playtesting and tournament data rather than on changes that were wrong on paper.

Magic: The Gathering's design team at Wizards of the Coast, the group behind the "New World Order" guidelines for limiting complexity at common rarity, brings the same discipline to power level: each card is rated on a power-level spreadsheet before it sees daylight. The rating considers mana cost, power/toughness ratio, the abilities on the card, and a comparison to historical cards at the same cost. Cards that exceed the power-level ceiling for their cost are flagged and either nerfed on paper or re-costed before the set is printed. This is not mystical. This is a spreadsheet, applied rigorously, stopping broken cards from shipping.

You will build your own spreadsheet for your own game. The principle is the same regardless of genre: the spreadsheet is the source of truth for numeric design, and the game reads from the spreadsheet (via CSV export or similar) rather than hard-coding values in scripts. We will cover the technical side of that later in the chapter. The mindset side is what matters now: if a designer on your team proposes a change, and they cannot point you to the cell in the spreadsheet where the change was modeled, they are proposing a hunch. Hunches ship unbalanced games.

📊 Rule of Thumb: If your game has more than ten numeric values that drive behavior, you need a spreadsheet. If it has more than fifty, you need several. By the time it has hundreds, the spreadsheet is the design document.

Cost-Benefit Analysis

At the core of balance math is a single question: for what the player pays, how much do they get?

Every ability, item, and unit in your game has a cost. The cost might be time (cast time, cooldown), mana, gold, an action slot, a deck slot, an equipment slot, or an opportunity cost (by picking this you did not pick that). Every ability, item, and unit also has a benefit — damage, healing, utility, crowd control, mobility, information. Balance begins by making the cost-to-benefit ratio consistent across comparable options.

Take a concrete example. In a fighting game, you have two special moves:

Move A: 12 damage, 80-frame startup, 20-frame recovery, uses 25% of the super meter.
Move B: 15 damage, 90-frame startup, 30-frame recovery, uses 50% of the super meter.

Which is stronger? At a glance, you cannot tell — both have tradeoffs. The spreadsheet answers in three columns:

Damage per frame of total animation: A = 12/100 = 0.12, B = 15/120 = 0.125. Close to equal.
Damage per meter cost: A = 12/25 = 0.48, B = 15/50 = 0.30. A is far more efficient per meter.
Damage per frame of startup: A = 12/80 = 0.15, B = 15/90 = 0.167. B is slightly better per startup frame.

The decision criterion depends on what is scarce in the match. If meter is scarce, A dominates. If you have meter to burn and want the biggest single hit, B deals 25% more damage and is slightly better per startup frame, at twice the meter cost. If neither resource is constrained, the two moves are nearly identical per frame of animation, so the cheaper A is the default choice unless B has some unique utility.

This analysis is how professional balance happens. You do not ask "is this move too strong?" You ask "is this move's damage-per-cost within acceptable range compared to its siblings?" The spreadsheet computes the ratios. The designer looks at the columns. The outliers glare back.
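The three ratio columns for Move A and Move B can be reproduced in a few lines. This is a minimal sketch using the numbers from the example above; the dictionary layout and key names are assumptions made for illustration.

```python
# Stats for the two special moves from the example above.
moves = {
    "A": {"damage": 12, "startup": 80, "recovery": 20, "meter": 25},
    "B": {"damage": 15, "startup": 90, "recovery": 30, "meter": 50},
}


def ratios(move):
    """Damage per unit of each cost the move charges."""
    total_frames = move["startup"] + move["recovery"]
    return {
        "per_frame": move["damage"] / total_frames,
        "per_meter": move["damage"] / move["meter"],
        "per_startup": move["damage"] / move["startup"],
    }


table = {name: ratios(stats) for name, stats in moves.items()}
for name, row in table.items():
    print(name, {column: round(value, 3) for column, value in row.items()})
```

Adding a third move is one more dictionary entry, which is the point: the comparison scales with the roster instead of with the designer's memory.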

A more advanced version of the same analysis is opportunity cost. In Hearthstone, a card's cost is mana. But its real cost is also the deck slot it takes (you have thirty) and the turn on which it is played (turn five, turn seven, endgame). A five-mana card that reliably wins you the game on turn five is balanced differently than a five-mana card that reliably wins you the game on turn fifteen, because by turn fifteen you have already won or lost for other reasons. Balance thinking includes asking "when is this used?" not just "what does this do?"

🎯 Tradeoff Spotlight: Cost-benefit math flags dominated options as well as dominant ones, so read both ends of every ratio column. A dominant option is strictly better than its comparables; a dominated option is strictly worse. Both are unbalanced. The dominated option is usually the more important problem to solve, because it represents a design choice that was supposed to be interesting and became useless. Spend your balance time making weak things viable at least as often as you spend it nerfing strong things.

The Fundamental Triangle

Symmetric competitive games are built on the triangle. The simplest version is rock-paper-scissors: rock beats scissors, scissors beats paper, paper beats rock. No option dominates; each has exactly one counter. Meaningful choice exists because you must anticipate what your opponent will pick.

The triangle is scaffolding. In real games, the geometry is more complex — a twenty-sided graph of mutual counters, with asymmetric edge weights — but the principle remains. Every option must have a counter, and every option must be a counter to something.

Street Fighter's footsies meta is a perfect example. At the neutral-game level, you have three primary tools: pokes (long-range normal attacks), throws, and defensive movement (blocking and walking just out of range). Pokes counter throws (the thrower gets hit before grabbing). Throws counter defensive play (the turtle gets grabbed). Defensive walking counters pokes (the whiffed poke is punishable). Add jumping as a fourth layer, countered by anti-airs, and you have a dynamic rock-paper-scissors that plays out continuously in a match. If any one tool is too strong (pokes that hit too fast to react to, throws that cannot be teched, anti-airs that work from full screen), the triangle collapses, and the meta devolves into spamming the dominant tool.

StarCraft's unit-counter system is the grand macro-scale version. Marines counter Zerglings. Zerglings counter Siege Tanks (swarming before the tanks set up). Siege Tanks counter Marines (splash damage at range). Colossi counter Marines. Vikings counter Colossi. Cyclones counter Vikings. Every unit has a role, and the role is defined by what it beats and what beats it. The triangle goes both directions, and the interactions are deep — some units counter hard (splash vs. clump), some soft (slight advantage at even cost) — but the structure is a directed graph of counters, maintained through hundreds of patches over a decade.

Hearthstone's deck-archetype meta runs the same math at a higher level of abstraction. Aggro decks beat Control (too fast to stabilize). Control beats Midrange (outvalues in the long game). Midrange beats Aggro (efficient removal plus beefy bodies). New decks are often designed explicitly to slot into one corner of the triangle — the team asks "what role does this deck play? what does it beat? what beats it?" before any card is costed.

When you are balancing your own symmetric-competitive game, your first map should be the counter graph. Write down every major option. For each, ask: what does it beat? and what beats it? If an option has no answer to the first question, it is too weak. If it has no answer to the second, it is too strong. These are binary problems, easy to diagnose, harder to fix — but the diagnosis is the beginning.
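The two diagnostic questions translate directly into a check over a counter graph. The sketch below assumes the graph is stored as a hypothetical mapping from each option to the list of options it beats; the "dynamite" entry is a made-up example of an option nobody thought to counter.

```python
def audit_counter_graph(beats):
    """Return (too_weak, too_strong) for a mapping of option -> options it beats.

    too_weak:   options that beat nothing.
    too_strong: options that nothing beats.
    """
    options = set(beats)
    for targets in beats.values():
        options.update(targets)
    beaten = {target for targets in beats.values() for target in targets}
    too_weak = sorted(o for o in options if not beats.get(o))
    too_strong = sorted(o for o in options if o not in beaten)
    return too_weak, too_strong


triangle = {
    "rock": ["scissors"],
    "paper": ["rock"],
    "scissors": ["paper"],
}
# Same triangle plus a hypothetical option that was shipped without a counter.
with_dynamite = dict(triangle, dynamite=["rock", "paper", "scissors"])

print(audit_counter_graph(triangle))
print(audit_counter_graph(with_dynamite))
```

The check is deliberately binary. It says nothing about how hard a counter is, only whether one exists, which is the first thing to establish before tuning edge weights.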

A common mistake is designing an option without considering its counter before shipping. You build a cool new unit, give it impressive numbers, ship it, and discover in the field that it has no effective counter — because you never identified one. The player base finds this out before you do, adopts the unit, and your meta collapses around it. Always identify the counter during the design phase, not after launch.

Balancing for the Mean vs. the Master

Here is a hard truth that took the League of Legends team years to articulate publicly: a champion that is balanced at the pro level is often overpowered at the casual level, and vice versa. These are not the same game.

Consider League's Riven. Pro players use her combo cancels and animation-buffer tricks to achieve effective damage-per-second that casual players cannot replicate. At the pro level, she is a threat but manageable. At the bronze level, a Riven main who has practiced one combo sequence can dominate the lane, because no one at bronze level knows how to punish the combo's cooldown windows. The same kit is "okay" at one tier and "overpowered" at another.

Azir is the inverse. His complex kit — the sand soldiers, the ultimate wall, the shuffle — rewards enormous skill. At pro level, Azir can solo-carry games, because the pro player extracts every drop of value from every mechanic. At bronze level, Azir's win rate craters, because bronze-level players cannot execute the kit's complicated pieces. The same champion is "a nightmare" at one tier and "a free win" for the other team at another.

How do you balance something that is simultaneously both problems? Riot's published strategy is to balance per-elo — their data breaks down win rates and pick rates by rank, and their targeted changes often affect the aspects of a kit that scale with skill, without touching the parts that do not.

If you nerf Riven's animation cancels, you hurt her at pro level much more than at bronze. If you nerf Azir's ultimate damage, you hurt his casual viability much more than his pro viability. If you nerf his sand-soldier base damage, you affect both levels roughly equally. These are different surgical tools, and a good balance team picks the right one for the right target.

There is no bronze-proof champion. There is no diamond-proof champion. The best you can do is bring each champion's win rate at each tier into an acceptable range — a fifty-to-fifty-two percent win rate across all tiers, with variance of no more than a few points between pro and casual. That is the Riot target. Ninety percent of the time they get close. Ten percent of the time some outlier breaks the pattern, and balance enters crisis mode.
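A per-tier audit of this kind is easy to script once win rates are broken out by rank. The sketch below is an illustration under stated assumptions: the 50-to-52 percent band comes from the paragraph above, the three-point spread limit is one reading of "a few points," and the champion and its numbers are invented.

```python
def audit_tiers(win_rates, low=0.50, high=0.52, max_spread=0.03):
    """Check one champion's win rate per tier against a target band.

    Returns (tiers outside the band, spread between best and worst tier,
    whether the spread is within max_spread).
    """
    out_of_band = [tier for tier, rate in win_rates.items()
                   if rate < low or rate > high]
    spread = max(win_rates.values()) - min(win_rates.values())
    return out_of_band, round(spread, 4), spread <= max_spread


# Invented data for a hypothetical champion: strong in low ranks, weak in pro play.
duelist = {"bronze": 0.55, "gold": 0.51, "pro": 0.48}

out_of_band, spread, spread_ok = audit_tiers(duelist)
print(out_of_band, spread, spread_ok)
```

A result like this one points at the skill-scaling parts of the kit rather than the base numbers, since the problem has opposite signs at the two ends of the ladder.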

For indie designers: if you do not yet have a pro scene, you are balancing mostly for casual play. This is a gift, not a problem. Your data is simpler, your players are more forgiving, and you can iterate on a single target audience. When (if) a pro scene emerges, you will discover that your balance held at one level and not the other — and you will have real work ahead. Until then, trust your casual data and your spreadsheet.

⚠️ Common Pitfall: Balance-around-streamers. Do not rewrite your game because a popular YouTuber complained about a matchup. Streamers play at a specific skill level, often narrow, often personal. Their experience is real data, but it is one data point. Aggregate data beats anecdote every time.

Mathematical Tools

A small set of numeric tools covers most of what a designer needs. Learn to calculate each in your sleep.

Damage per second (DPS). The sustained rate at which an attacker deals damage under standard conditions. For a weapon with 40 damage per hit and a 0.8 second attack interval, DPS = 40 / 0.8 = 50. For a bow that fires three arrows per shot at 12 damage each with a 1.2 second cooldown, DPS = (3 × 12) / 1.2 = 30. DPS is the common language of offensive comparison, and the first number to put in your spreadsheet for any attacker.

Effective HP (EHP). The damage a defender absorbs before dying, accounting for armor, resistance, damage reduction, and evasion. For a unit with 100 HP and 25% damage reduction, EHP against matching damage types = 100 / (1 − 0.25) ≈ 133. For a unit with 200 HP and 30 armor on a flat-subtraction system, where each point of armor removes one point of damage from every hit, EHP depends on the attacker. Against a 10-damage attack, each hit does zero damage and EHP is effectively infinite; against a 40-damage attack, each hit does 10 damage, so it takes 200 / 10 = 20 hits, or 20 × 40 = 800 raw damage. Flat-subtraction armor is brutal for a reason.
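Because flat-subtraction EHP changes with the attacker, it helps to sweep it across several attack values. This is a minimal sketch of that calculation; the function name is an assumption, and it counts whole hits, so the last hit may overkill.

```python
import math


def ehp_flat_armor(hp, armor, attack_damage):
    """Raw damage an attacker must deal to kill a flat-armor defender."""
    per_hit = max(0, attack_damage - armor)
    if per_hit == 0:
        return math.inf  # this attack cannot hurt the defender at all
    hits_needed = math.ceil(hp / per_hit)
    return hits_needed * attack_damage


# The 200-HP, 30-armor unit from the text, against a range of attack sizes.
for attack in (10, 35, 40, 60, 100):
    print(attack, ehp_flat_armor(200, 30, attack))
```

The sweep shows why small, fast attackers suffer most under this armor model: EHP falls steeply as per-hit damage climbs past the armor value.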

Time to kill (TTK). How long it takes attacker X to kill defender Y. TTK = EHP(Y) / DPS(X). If your Marine does 10 DPS and your Zergling has 35 EHP, TTK = 3.5 seconds. TTK is the most player-facing of the derived metrics, because players feel it directly — a one-shot kill feels different from a four-second slugfest. When you balance a matchup, you are balancing a TTK target, and the TTK should land in a range that feels like a fight rather than a coin flip.
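A TTK target is easiest to enforce as a band that every pairing is checked against. In the sketch below, the Marine and Zergling figures come from the paragraph above; the other two units and the 2-to-6-second band are assumptions chosen for illustration.

```python
def time_to_kill(defender_ehp, attacker_dps):
    """Seconds for the attacker to burn through the defender's effective HP."""
    return defender_ehp / attacker_dps


def flag_ttk(attackers_dps, defenders_ehp, target=(2.0, 6.0)):
    """Return (attacker, defender, ttk) for every pairing outside the target band."""
    low, high = target
    flagged = []
    for attacker, dps in attackers_dps.items():
        for defender, ehp in defenders_ehp.items():
            ttk = time_to_kill(ehp, dps)
            if not low <= ttk <= high:
                flagged.append((attacker, defender, ttk))
    return flagged


attackers_dps = {"Marine": 10.0, "Sniper": 50.0}    # Sniper is a made-up unit
defenders_ehp = {"Zergling": 35.0, "Brute": 200.0}  # Brute is a made-up unit

print(time_to_kill(35.0, 10.0))
print(flag_ttk(attackers_dps, defenders_ehp))
```

Pairings that fall below the band read as coin flips; pairings above it read as slogs. Either may be intentional, but each should be a decision rather than an accident.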

Opportunity cost. What did you not do by choosing this option? A level-up ability that gives +10% damage forever has a different opportunity cost from a consumable that gives +10% damage for five seconds. Opportunity cost is what makes slot-based and cooldown-based design interesting. If a card costs a deck slot, its cost is the best card that would have gone in that slot instead.

Expected value (EV). The average outcome of a random event, weighted by probability. A 50% chance of 10 damage has an EV of 5 damage. A 30% chance of a critical hit at 2.5× damage has an effective damage multiplier of (0.3 × 2.5) + (0.7 × 1.0) = 1.45× — meaning on average, a unit with 30% crit chance deals 45% more damage than the same unit without. When designing RNG-driven abilities, always compute EV, because EV is what the player experiences over a long enough session.

Standard deviation and variance. When outcomes have a distribution, EV tells you the center and standard deviation tells you the spread. A 5%-chance-of-200-damage ability has the same EV as a 100%-chance-of-10-damage ability (EV = 10), but completely different play feel. High variance is exciting once in a while and frustrating as a core loop. Low variance is consistent. Mix them intentionally; do not let high variance sneak in and make your game feel arbitrary.
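Both numbers fall out of the same outcome table. The sketch below computes expected value and standard deviation for the two abilities just described, assuming each ability is written as a list of (probability, damage) pairs whose probabilities sum to 1.

```python
import math


def ev_and_sd(outcomes):
    """Expected value and standard deviation of a list of (probability, value) pairs."""
    ev = sum(p * x for p, x in outcomes)
    variance = sum(p * (x - ev) ** 2 for p, x in outcomes)
    return ev, math.sqrt(variance)


steady = [(1.0, 10)]               # always 10 damage
swingy = [(0.05, 200), (0.95, 0)]  # 5% chance of 200 damage, otherwise nothing

steady_ev, steady_sd = ev_and_sd(steady)
swingy_ev, swingy_sd = ev_and_sd(swingy)
print(steady_ev, steady_sd)
print(swingy_ev, round(swingy_sd, 1))
```

The two abilities share an expected value of 10, but the second has a standard deviation of roughly 43.6, more than four times its mean. Putting both columns side by side in the spreadsheet makes that difference in feel visible before anyone plays a match.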

Synergy coefficients. When two abilities or units interact multiplicatively, their combined value can exceed the sum of their parts. A crit-chance stat and a crit-damage stat combine multiplicatively: each point of one raises the value of every point of the other. A second healer can add more to a party's total damage than a second damage dealer would, because the party survives to keep attacking through a longer fight. Synergy coefficients are why an ability that looks balanced in isolation can be broken in combination, and why you have to model pairs, not just singletons.
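The crit example can be checked numerically. The sketch below uses the same expected-multiplier formula as the EV discussion, with made-up stat values, and measures how much the combined upgrade exceeds the sum of the two upgrades taken separately.

```python
def crit_multiplier(crit_chance, crit_damage):
    """Expected damage multiplier: non-crits deal 1x, crits deal crit_damage x."""
    return 1.0 + crit_chance * (crit_damage - 1.0)


# Made-up stat lines: a baseline, each upgrade alone, then both together.
base = crit_multiplier(0.30, 2.0)
more_chance = crit_multiplier(0.60, 2.0)
more_damage = crit_multiplier(0.30, 3.0)
both = crit_multiplier(0.60, 3.0)

gain_chance = more_chance - base
gain_damage = more_damage - base
gain_both = both - base
synergy = gain_both - (gain_chance + gain_damage)

print(round(gain_chance, 2), round(gain_damage, 2), round(gain_both, 2))
print(round(synergy, 2))
```

Each upgrade alone adds 0.30 to the multiplier, but together they add 0.90. The extra 0.30 is the synergy term, and it only shows up when the pair is modeled together.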

A worked example, to cement the habit. In your progressive project, you have a starter sword that does 10 damage at 1 attack per second, and a heavy hammer that does 25 damage at 0.4 attacks per second. Which is better?

Sword DPS = 10 × 1 = 10.
Hammer DPS = 25 × 0.4 = 10.

Equal DPS, and the naive formula TTK = HP / DPS gives 5 seconds for both weapons against a 50-HP enemy. But hits are discrete, so the weapons are not equivalent. Assume the first swing lands at t = 0 and later swings follow at the attack interval (1 second for the sword, 2.5 seconds for the hammer). Against a 50-HP enemy:

Sword TTK = 4 seconds (five 10-damage hits, at t = 0, 1, 2, 3, 4).
Hammer TTK = 2.5 seconds (two 25-damage hits, at t = 0 and t = 2.5; the first swing removes 25/50 = 50% of the enemy's HP).

Against a 20-HP enemy:

Sword TTK = 1 second (two hits).
Hammer TTK = 0 seconds (one hit, which kills immediately).

Against a 30-HP enemy:

Sword TTK = 2 seconds (three hits).
Hammer TTK = 2.5 seconds (two hits, with 20 points of the second swing wasted as overkill).

The hammer dominates one-shot-kill matchups and any fight where enemy HP sits at or just below a multiple of 25; the sword wins where the hammer's last swing is mostly overkill. That is interesting and balanced. If your numeric design produces this kind of differential, you are on the right track. If it instead makes the hammer always better or always worse, rebalance until it does not.
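Because hits are discrete, HP / DPS is only an approximation of time to kill. The sketch below counts whole hits instead, assuming the first swing lands at t = 0 and later swings follow at the attack interval; with a windup before the first hit, every figure would shift by one interval. The function name is hypothetical.

```python
import math


def discrete_ttk(enemy_hp, damage, attacks_per_second):
    """Seconds until the killing blow, assuming the first swing lands at t = 0."""
    hits = math.ceil(enemy_hp / damage)
    return (hits - 1) / attacks_per_second


weapons = {
    "sword": {"damage": 10, "attacks_per_second": 1.0},
    "hammer": {"damage": 25, "attacks_per_second": 0.4},
}

# Sweep enemy HP to see where each weapon pulls ahead.
for enemy_hp in (20, 30, 50):
    times = {name: discrete_ttk(enemy_hp, w["damage"], w["attacks_per_second"])
             for name, w in weapons.items()}
    print(enemy_hp, times)
```

Sweeping enemy HP in steps of five or ten across your actual enemy roster shows exactly which breakpoints favor which weapon, and that is usually a better tuning guide than the shared DPS figure.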

Simulation

When the spreadsheet is not enough — when the interactions are too complex for closed-form math — you simulate. Simulation is Monte Carlo for game design: run the scenario a thousand times with random inputs, aggregate the outcomes, and use the distribution as your balance signal.

The StarCraft II team has an internal bot league. Bots — programmed AI opponents — are matched against each other in thousands of self-play matches per day. The win-rate data for each matchup shapes the balance team's priorities. If Terran-Zerg at the top skill bracket holds at 51/49, they leave it alone. If it drifts to 55/45 after a patch, they investigate. Bot data does not replace human data; it supplements it, especially in matchups that human pro players are not exploring.

Hearthstone uses a battle simulator for its Battlegrounds mode. The simulator computes probabilistic outcomes of combat between two compositions and is used both in-game (to show the player their win chance) and in development (to detect combinations whose win rate exceeds the target range). When a new minion is designed, the simulator runs thousands of matches with the new minion in random compositions, and the win-rate data tells the team whether the minion is over- or under-tuned before it reaches the live patch.

Small studios can do this with Python. A Monte Carlo simulator for a combat system is a few dozen lines of code. You loop over matchups, roll simulated attacks and defenses, track damage, count wins. After ten thousand matches per pairing, you have statistically significant win rates for every matchup in your game. Compare to the target distribution, flag the outliers, change the spreadsheet, re-run.

Here is a minimal Python sketch, the kind of thing a designer should be able to write in an afternoon:

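import random

def simulate_duel(a_stats, b_stats, n_trials=10000, max_turns=1000):
    """Estimate how often A beats B. Draws (nobody dies) count against A."""
    def roll_damage(attacker, defender):
        # Flat-subtraction armor, then a chance to double the result on a crit.
        dmg = max(0, attacker['dmg'] - defender.get('armor', 0))
        if random.random() < attacker.get('crit_chance', 0):
            dmg *= 2
        return dmg

    a_wins = 0
    for _ in range(n_trials):
        a_hp, b_hp = a_stats['hp'], b_stats['hp']
        # Randomize who swings first so A gets no built-in first-strike edge.
        a_turn = random.random() < 0.5
        # Cap the turns: two units that cannot hurt each other would loop forever.
        for _ in range(max_turns):
            if a_turn:
                b_hp -= roll_damage(a_stats, b_stats)
            else:
                a_hp -= roll_damage(b_stats, a_stats)
            if a_hp <= 0 or b_hp <= 0:
                break
            a_turn = not a_turn
        if b_hp <= 0:
            a_wins += 1
    return a_wins / n_trials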

Run this for each pair of units in your game. Output is a win-rate matrix. Outliers — cells above 65% or below 35% — are the cells to investigate. Twenty lines of code have now told you more about the health of your balance than thirty hours of playtesting would. Playtesting is still essential (Chapter 31, always), but simulation catches gross imbalances before human time is wasted on them.
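Building the matrix and flagging its outliers is a thin layer on top of any duel function. The sketch below accepts the duel function as a parameter so that it stays self-contained; `toy_duel` is a deterministic placeholder invented for this example, not the simulator, and the three units are made up. The 35% and 65% thresholds are the ones named above.

```python
from itertools import combinations


def win_rate_matrix(units, duel_fn):
    """units: name -> stats; duel_fn(a, b) -> estimated win rate of a over b."""
    matrix = {}
    for a, b in combinations(units, 2):
        rate = duel_fn(units[a], units[b])
        matrix[(a, b)] = rate
        matrix[(b, a)] = 1.0 - rate  # assumes the duel function never reports draws
    return matrix


def outliers(matrix, low=0.35, high=0.65):
    """Pairings whose win rate falls outside the acceptable band."""
    return {pair: rate for pair, rate in matrix.items()
            if rate < low or rate > high}


def toy_duel(a, b):
    """Placeholder duel: compares hp * dmg. Swap in a real simulator here."""
    power_a = a["hp"] * a["dmg"]
    power_b = b["hp"] * b["dmg"]
    return power_a / (power_a + power_b)


# Made-up roster for illustration.
units = {
    "knight": {"hp": 100, "dmg": 10},
    "archer": {"hp": 60, "dmg": 15},
    "giant": {"hp": 300, "dmg": 10},
}

flagged = outliers(win_rate_matrix(units, toy_duel))
for pair, rate in flagged.items():
    print(pair, round(rate, 2))
```

With this roster every flagged pairing involves the giant, which is the kind of pattern to look for: an outlier that appears in many cells usually points at one unit, not at many matchups.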

For GDScript specifically, you can run the same kind of simulation inside the engine, using a headless mode that skips rendering. This has the advantage of running the actual game logic rather than a simplified model, which catches emergent bugs that the simplified model misses. Blizzard's bot league does this; so can you, at a fraction of the scale.

🧪 Practitioner Note: Use simulation to rule things out, not to rule them in. If simulation says a unit is balanced, that is a necessary but not sufficient condition. The unit might still be unbalanced in ways the simulator does not model — positioning, player psychology, meta-interactions. If simulation says a unit is unbalanced, believe it.

The Meta and Evolution

A game's balance is never static. It cannot be, because the players are not static. The community learns, strategies evolve, new interactions are discovered, and the "meta" — the metagame, the dominant strategies at any given moment — shifts like weather. Balance work is never done. The game at month one is not the game at month twelve.

The meta is the emergent consequence of balance, player skill, and time. Early in a game's life, the meta is a guess — the top players try what looks strong, and the community follows. A few weeks in, data accumulates: win rates by unit, pick rates, ban rates. A dominant strategy emerges, and a counter-strategy emerges in response. Then a counter to the counter. Then something unexpected from a region of the design space nobody considered. Then the tournament results force a re-evaluation. The cycle never stops.

Your job as the designer is to shape the evolution without trying to control it. You set the conditions — the unit stats, the ability costs, the economic tempo — and then you watch what the community does with them. When the meta converges into a single dominant strategy, you patch to disrupt it. When it fragments into incomprehensible chaos, you patch to re-center it. When it becomes stale, you patch to shake it up. The patch cadence becomes a rhythm that shapes the game's lifespan.

Hearthstone patches weekly in response to the meta. Small nerfs, small buffs, targeted at whatever has dominated the past seven days. Blizzard's stated goal is to prevent any single deck from exceeding a 60% win rate, and they hit it most of the time. The cost of weekly patches is that the meta never settles — players complain that the game they learned last week is not the game they are playing now — but the benefit is that degenerate states never last.

League of Legends patches every two weeks on a fixed cadence. The biweekly rhythm lets pro teams prepare for tournaments (they know patch 13.X will be stable for the next two weeks) while still allowing Riot to respond to problems. The patch size is tuned so that each patch is meaningful but not game-breaking — no more than twenty or so champions touched per patch, most changes in the single-digit percent range.

Path of Exile uses a quarterly league cycle. Every three months, Grinding Gear Games releases a "league" — a massive expansion with new mechanics, new items, and large balance changes. Between leagues, the meta stabilizes. Within a league, the dominant build is identified in the first week and other players either adopt it or find alternatives. Quarterly is the slow end of the cadence spectrum, and it produces a game that feels stable-then-upended, stable-then-upended. Players either love or hate this.

Smash Bros. Ultimate patched on an irregular cadence for a few years after launch, then stopped. For competitive players, this is a mixed bag — the meta has had years to settle, every matchup is known, every interaction documented, but the imbalances that exist at this point (Steve, anybody?) will never be fixed. Stopped patches are a design choice with consequences.

The pro scene is the fastest mirror on your balance. A tournament weekend, in a game with a serious competitive scene, functions as thousands of hours of top-level playtesting compressed into three days. The units you thought were niche get picked by a top team; the units you thought were dominant get banned out of every match. By Monday morning, the balance team has more information than a month of ladder data could produce. This is why balance-for-the-pro-scene is a coherent design philosophy: the pros are your best data, even when they are not your target audience.

But remember the per-elo problem. The pro scene's data is about the pro scene. It is not about what happens at gold rank, where your casual players live. A healthy balance process watches both, and weights them based on the game's design target. A primarily casual game weights the casual data higher. A primarily competitive game weights the pro data higher. Your game is probably somewhere in between, and your weighting should reflect that.
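One way to make that weighting explicit is to blend the two win rates into a single number. This is a minimal sketch, not an established formula: the class name, the function name, and the idea of a single `pro_weight` knob are all assumptions made for illustration.

```gdscript
# BlendedWinRate.gd: hypothetical helper, not part of the chapter's project files.
class_name BlendedWinRate
extends RefCounted

# pro_weight is the share of attention given to pro data
# (0.0 = casual data only, 1.0 = pro data only). Values outside 0..1 are clamped.
static func blended_win_rate(pro_win_rate: float, casual_win_rate: float, pro_weight: float) -> float:
    var w := clampf(pro_weight, 0.0, 1.0)
    return w * pro_win_rate + (1.0 - w) * casual_win_rate
```

For a unit winning 56% of pro games and 49% of casual games, a mostly casual game with `pro_weight = 0.3` sees a blended rate of about 51.1%. The same unit in a competitive-first game with `pro_weight = 0.8` sees about 54.6%, which may be enough to justify a nerf.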

Asymmetric Balance

Balancing symmetric games is hard. Balancing asymmetric games — where different players have fundamentally different tools — is brutal.

In a symmetric game, every player starts with the same options. If unit X beats unit Y by a small margin, the player who picks X has an advantage, but both players had the opportunity to pick X. The decision is symmetric; the outcome asymmetry reflects the players' different choices. A 50% win rate emerges when neither player has a pick advantage.

In an asymmetric game, players are given different roles or different toolkits, and the target is "different but fair." This is the Heroes of the Storm specialist-role problem, the Rainbow Six Siege attackers-vs-defenders problem, the Dead by Daylight killer-vs-survivor problem, the Natural Selection marines-vs-aliens problem. The players are playing different games against each other, and the balance target is that both sides feel competitive while playing completely different strategies.

Dead by Daylight is the canonical example of the difficulty. One killer versus four survivors. The killer's toolkit is offensive: tracking, chasing, hooking. The survivors' toolkit is cooperative: generator repair, flashlight saves, body-blocking. The target win rate is contentious; Behaviour has aimed for roughly a 60% kill rate for killers at base, but the actual rate depends enormously on skill, rank, killer choice, map, and patch. After nearly a decade of balancing, the community still argues about whether the current state is fair. This is typical of asymmetric games. You are aiming at a target that recedes as the players improve.

Rainbow Six Siege solves part of the problem through structural symmetry at the match level. Attack and defense swap; a team plays both sides across a match, and the aggregate win rate is what matters. If attack has a 48% win rate per round and defense has 52%, the gap largely cancels at the match level as long as each team plays a similar number of rounds on each side; it only bites when the side split is uneven. Ubisoft has tuned round-win-rates to approach 50/50, with the operator-level balance adjusting the margins.
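The side-swap arithmetic is easy to check. The sketch below assumes two evenly matched teams, so that only the side (not the team) decides the per-round win chance; the class and function names are hypothetical, and this is not a model of Siege's actual match rules.

```gdscript
# SideSwapModel.gd: hypothetical helper for checking side-swap arithmetic.
class_name SideSwapModel
extends RefCounted

# Expected fraction of rounds won by a team that plays `attack_rounds` on attack
# and `defense_rounds` on defense, given the attack side's per-round win rate.
static func expected_round_share(attack_win_rate: float, attack_rounds: int, defense_rounds: int) -> float:
    var total := attack_rounds + defense_rounds
    if total <= 0:
        return 0.0
    var defense_win_rate := 1.0 - attack_win_rate
    var expected_wins := attack_rounds * attack_win_rate + defense_rounds * defense_win_rate
    return expected_wins / total
```

With a 48% attack win rate, a team that plays six rounds on each side expects exactly half the rounds. A team that plays seven on attack and five on defense expects about 49.7%. The per-round imbalance only leaks into the match result through the uneven split.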

Asymmetric balance requires more data per balance decision than symmetric balance, because every matchup pair has to be evaluated in both directions. You cannot assume Killer-A versus Survivor-set-B has the same win rate as Killer-B versus Survivor-set-A. You need data on each, and the data matrix grows multiplicatively with each added option.
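To see how quickly the matrix outgrows your data, it helps to tally results per ordered pair and list the cells that are still under-sampled. This is a sketch under assumptions: the record shape (`a`, `b`, `a_won`), the `"a|b"` key format, and the class name are all invented for illustration.

```gdscript
# MatchupMatrix.gd: hypothetical tally of directional matchup data.
class_name MatchupMatrix
extends RefCounted

# results: Array of Dictionaries like {"a": "killer_a", "b": "survivor_set_b", "a_won": true}.
# Returns {"a|b": {"games": int, "a_wins": int}} keyed by the ordered pair, so
# killer_a vs survivor_set_b and killer_b vs survivor_set_a are tracked separately.
static func tally(results: Array) -> Dictionary:
    var cells: Dictionary = {}
    for r in results:
        var key := "%s|%s" % [r["a"], r["b"]]
        if not cells.has(key):
            cells[key] = {"games": 0, "a_wins": 0}
        cells[key]["games"] += 1
        if r["a_won"]:
            cells[key]["a_wins"] += 1
    return cells

# Lists every cell of the options_a x options_b matrix with fewer than min_games games.
static func undersampled(cells: Dictionary, options_a: Array, options_b: Array, min_games: int) -> Array:
    var missing: Array = []
    for a in options_a:
        for b in options_b:
            var key := "%s|%s" % [a, b]
            var games: int = cells.get(key, {"games": 0})["games"]
            if games < min_games:
                missing.append(key)
    return missing
```

Three killers against four survivor loadouts is already twelve cells. Adding one more killer adds four more cells, each of which needs its own sample before you can say anything about it.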

For indie designers: if you can ship a symmetric game, do. Asymmetric design is not wrong — some of the most interesting games ever made are asymmetric — but it is a harder problem, and your first game is not the place to take on that harder problem. Spelunky is symmetric co-op. Stardew Valley is symmetric co-op. Celeste is single-player. These are indie successes that sidestepped the asymmetric-balance problem entirely.

Difficulty Curves in Single-Player

Not every game has matchups. Single-player games balance a different problem: the progression curve. The player gets stronger over time; the challenges get harder over time; the gap between the two — the difficulty curve — is what the player feels.

A well-tuned single-player curve has three properties. First, challenges scale approximately with player growth, so a hundred hours in, the combat is no easier than it was five hours in, because the enemies scale with the player. Second, the curve has bosses and spikes — deliberate moments of heightened challenge that reward mastery. Third, the curve has valleys — moments of relative ease after a hard fight, where the player exhales and enjoys the power fantasy before the next climb.

Callback to Chapter 25 (Progression Systems): the XP/power curve is the spine here. If the player gains 10% damage per level and the enemy HP scales at 12% per level, the player is falling behind at a rate of about 1.8% per level (1.12 / 1.10 ≈ 1.018). Over thirty levels, that compounds to roughly 72% more hits per kill than at level one. The game gets harder even as the player feels more powerful. You can tune this intentionally (a hard-mode slope) or accidentally (a balance bug), but you have to be aware of it.
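The compounding is worth computing rather than eyeballing. A minimal sketch, assuming both growth rates are applied multiplicatively once per level; the class and function names are hypothetical.

```gdscript
# ScalingGap.gd: hypothetical helper for comparing two per-level growth rates.
class_name ScalingGap
extends RefCounted

# Ratio of enemy HP growth to player damage growth after `levels` level-ups.
# 1.0 means hits-to-kill is unchanged; 1.5 means enemies take 50% more hits.
static func hits_to_kill_multiplier(player_growth: float, enemy_growth: float, levels: int) -> float:
    return pow((1.0 + enemy_growth) / (1.0 + player_growth), levels)
```

`ScalingGap.hits_to_kill_multiplier(0.10, 0.12, 30)` comes out at about 1.72. A two-point gap that looks harmless in a single row of the spreadsheet adds most of an extra kill's worth of hits to every enemy by level thirty.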

Bosses are your difficulty spikes. A boss at the end of an area should ask the player to demonstrate everything the area taught them. Hollow Knight's False Knight tests the player on the basic combat vocabulary (jump, nail swing, soul-cast) that the first area introduced. Dark Souls's Ornstein and Smough test the player on stamina management, spacing, and crowd control, everything that Sen's Fortress and the earlier game trained them on. A well-designed boss is a final exam for the area. A badly-designed boss is a brick wall disconnected from everything that came before.

FromSoftware's bosses are instructive here. A fair summary of the approach is that the boss is hard until you respect it, then easy. The curve is not linear; it is a threshold. The player fails for an hour, then figures out the timing, then wins. The fight is hard, and then, on one specific attempt, it becomes easy, because the player has internalized the pattern. This is mastery-as-difficulty, and it creates some of the most satisfying completion moments in all of gaming.

Contrast the bullet-sponge boss — a boss with so much HP that the fight becomes attrition, not pattern-recognition. Bullet sponges fail because they teach nothing. The player's victory is a function of damage output and patience, not skill. Any boss that takes more than a few minutes of honest fighting risks becoming a sponge. If your boss is getting complaints about "it takes forever," your problem is not damage — your problem is that the fight's pattern is not rich enough to sustain that length.

Valleys are as important as spikes. After a boss, the player should exhale. A few minutes of easy encounters, a shop, a fast-travel node, a narrative beat. Hollow Knight gives you a bench and a breath after every major fight. Celeste gives you a chapter-ending flag and a transition scene. The valleys are where the player consolidates what they just learned; where they feel powerful before the next ramp. A game that is all spikes exhausts. A game that is all valleys bores. The rhythm matters.

And the curve must be tunable. Almost every modern single-player game ships with difficulty options, and for good reason — the same curve that one player finds perfect another finds impossible or trivial. Celeste's assist mode, Resident Evil's adaptive difficulty, Halo's difficulty presets, Dark Souls's refusal to include them (its design decision, deliberate and debated) — each of these is a balance decision about where on the curve your audience sits. We covered assist modes and accessibility in Chapter 11 (Flow) and Chapter 29 (UI/UX); difficulty options live in the intersection of those design concerns.

Economic Balance (Brief)

Callback to Chapter 24 (Game Economy Design): economic balance is its own problem, and it interacts with combat balance in ways you should not ignore.

The basic question is the same — are all the options viable, interesting, and situationally optimal — but the variables are different: income rate, expenditure rate, price structure, scarcity, inflation. An economy balanced in isolation can still destabilize the rest of the game.

Example: you balance your combat system so that every weapon is viable. But your economy offers the Heavy Sword at 50 gold and the Light Dagger at 500 gold. Every rational player buys the sword. The dagger is economically dominated even if it is combat-viable — nobody ever uses it, because the gold is better spent on other things. You never see the dagger-balance problem because the economy is gatekeeping the dagger before the player ever tries it.

The fix is to price-balance as carefully as you stat-balance. Every option's price should reflect its value, and "value" includes the combat numbers, the opportunity cost, and the role in the meta. Monster Hunter's weapon economy is elaborate for this reason: different weapons require different materials, gathered from different monsters, at different difficulty levels. The economy routes the player through every weapon type, not because the player chose to, but because the economy made each a reasonable next step.
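A first-pass price check can be automated. The sketch below uses DPS per gold as a stand-in for "value", which is a deliberate simplification: it ignores opportunity cost and meta role. The item numbers, the `min_ratio` threshold, and the class name are all assumptions.

```gdscript
# PriceCheck.gd: hypothetical first-pass check for economically dominated items.
class_name PriceCheck
extends RefCounted

# items: {"heavy_sword": {"dps": 12.0, "cost_gold": 50}, ...} (illustrative numbers).
# Returns the ids whose DPS-per-gold is below `min_ratio` of the best option's.
static func overpriced(items: Dictionary, min_ratio: float) -> Array:
    var per_gold: Dictionary = {}
    var best := 0.0
    for item_id in items:
        # Treat free items as costing 1 gold so the division is always defined.
        var cost: float = maxf(float(items[item_id]["cost_gold"]), 1.0)
        var value: float = float(items[item_id]["dps"]) / cost
        per_gold[item_id] = value
        best = maxf(best, value)
    var flagged: Array = []
    for item_id in per_gold:
        if per_gold[item_id] < best * min_ratio:
            flagged.append(item_id)
    return flagged
```

With a Heavy Sword at 12 DPS for 50 gold and a Light Dagger at 10 DPS for 500 gold, `overpriced(items, 0.5)` flags the dagger. A flag here is a prompt to look at the row, not a verdict; an item can be expensive on purpose.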

Inflation and deflation are the slow-burn economic problems. Currency inflation happens when the player has more gold than things to spend it on — a common late-game state. Inflation trivializes shops and makes drops meaningless. Deflation is the opposite, where the player is always short on currency and can never afford the interesting purchases. Inflation and deflation are functions of rates; you tune them with spreadsheets. We covered this in Chapter 24. The connection to balance is that an inflating economy ruins combat balance: if gold is meaningless, every purchase is equivalent, and the economic dimension of choice collapses.
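Because both problems are functions of rates, a crude projection is often enough to spot them. This is a sketch assuming constant hourly income and spending, which real games rarely have; the class name and parameters are hypothetical.

```gdscript
# GoldProjection.gd: hypothetical projection of a player's gold balance over time.
class_name GoldProjection
extends RefCounted

# Returns the projected gold balance at the end of each hour of play.
# income_per_hour and spend_per_hour are designer estimates, not measured values.
# The balance is clamped at zero: a player cannot spend gold they do not have.
static func project(start_gold: float, income_per_hour: float, spend_per_hour: float, hours: int) -> Array:
    var balances: Array = []
    var gold := start_gold
    for _hour in range(hours):
        gold = maxf(gold + income_per_hour - spend_per_hour, 0.0)
        balances.append(gold)
    return balances
```

A balance that climbs without limit is the inflation case: the player is running out of things to buy. A balance pinned at zero is the deflation case: the interesting purchases stay out of reach.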

Communicating Balance Changes

Patch notes are not documentation. Patch notes are community management.

When you change a number, you are also making a statement to your community about what you value, what you noticed, and how you are thinking. A patch note that says "Riven's Q cooldown increased from 11 to 12" is information. A patch note that says "Riven was dominating high-elo play more than we wanted and her carry potential needed to come down; we are targeting her early-game trading patterns without hurting her teamfight kit" is design communication.

League of Legends pioneered the modern patch-note style. Every change has a rationale paragraph. The rationale explains what the designer saw, what they thought, and what they want the change to do. The format is now standard across the live-service genre, and for good reason — it turns every patch into a design conversation.

The reasons to write patch notes carefully:

You respect your players' intelligence. Players can tell when a change is motivated by data versus vibes. A well-explained change builds trust even if players disagree with it. An unexplained change feels arbitrary, and the community fills the vacuum with conspiracy theories.

You train your player base. When you explain why Riven's cooldown went up, players learn to think about balance in those terms. Over time, your community develops a shared vocabulary — early-game, teamfight, carry potential — and discussions become more productive.

You commit to a philosophy. Patch notes document your balance ideology over time. A team that consistently explains "we target power level, not play patterns" is saying something different from a team that consistently says "we target specific degenerate strategies." Neither is wrong. Both are choices. The patch notes make the choice visible.

For an indie developer, the lesson is simple: write patch notes like the live-service games write them. Every change gets a rationale. Every rationale points to the data or observation that prompted it. When a player asks in your Discord "why did you nerf X?" the answer should already be in the patch notes.

A minor change to a minor unit, well-explained, builds more community trust than a major change, badly-explained. The tone of your patch notes is the tone of your ongoing relationship with your players.

💬 Playbook: A patch note has four parts: what changed (number X from Y to Z), why it changed (the observation or data), what you expect (the predicted outcome), and what you will watch (the metric you will monitor to see if the change worked). If any part is missing, the note is incomplete.
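If you keep patch notes as structured data, the four-part rule can be checked mechanically. A minimal sketch: the key names (`what`, `why`, `expect`, `watch`) and the class name are assumptions chosen to mirror the playbook above.

```gdscript
# PatchNote.gd: hypothetical completeness check for a four-part patch note.
class_name PatchNote
extends RefCounted

const REQUIRED_PARTS := ["what", "why", "expect", "watch"]

# Returns the names of the parts that are missing or blank in `note`.
# An empty result means the note is complete.
static func missing_parts(note: Dictionary) -> Array:
    var missing: Array = []
    for part in REQUIRED_PARTS:
        if str(note.get(part, "")).strip_edges().is_empty():
            missing.append(part)
    return missing
```

A note with only `what` and `why` filled in returns `["expect", "watch"]`. That gap is the most common one in practice: the change is explained, but nobody wrote down how to tell whether it worked.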

Common Nerf/Buff Mistakes

The surest way to destabilize your balance is to overreact to short-term data. These are the recurring failure patterns.

The over-nerf cycle. A unit is dominant. You nerf it. The nerf lands harder than expected — the unit is now weak. Next patch, you un-nerf slightly. You over-correct the other direction. A cycle develops where the unit oscillates above and below the balance line. Over-nerf cycles happen because the first nerf is usually motivated by player complaints rather than data, and player complaints overshoot reality. Fix: make smaller changes. A 3% nerf is almost never worse than a 10% nerf, and it is far easier to extend into a 6% nerf next patch than to walk back a 10%.

Power creep. Each patch makes the game slightly more powerful — new abilities are marginally stronger than old ones; new heroes are kitted with slightly more options; new items are marginally more efficient. Over twenty patches, the baseline has drifted upward by 30%, and the old content feels anemic. Power creep is the default state of games without design discipline, because buffs feel good and nerfs feel bad, and the path of least resistance is to buff. Fix: commit to a power ceiling, and measure new additions against it. If a new hero needs 10% more baseline damage to feel "exciting," something is wrong with the baseline, not with the hero.

Single-unit balancing. You evaluate a unit in isolation, change it, and ship. The change lands — but it breaks the synergy with its teammate, because you did not model the synergy. Now the pair is underpowered. Fix: always model in context. Never balance a unit without looking at its three most-used partners.

Tournament-tuned, ladder-broken. A change that helps the tournament meta (slows down the degenerate pro strategy) can wreck the ladder meta (suddenly the casual-friendly units are much stronger because their pro counters are gone). Fix: check both before shipping. When in doubt, which meta is your game actually for?

Chasing the last patch. You nerf X. Next week, Y is dominant, because X was Y's only counter. You nerf Y. Next week, Z is dominant. You are now playing whack-a-mole with your own meta, and every patch produces a new "dominant" unit because the stability of the meta was precisely what you kept punching holes in. Fix: identify the structural problem (what is the counter-triangle supposed to look like?) and patch toward the structure, not away from the most recent complaint.

No changelog. You patch silently. The community has to guess. Trust erodes. Conspiracy theories spread. Fix: never silent patch. Even minor bug fixes get notes.
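The fix for single-unit balancing, looking at a unit's three most-used partners, presumes you can list them. The sketch below assumes you log how often each pair of units is fielded together; the `"unit_a|unit_b"` key format and the class name are invented for illustration.

```gdscript
# PartnerLookup.gd: hypothetical lookup of a unit's most-used partners.
class_name PartnerLookup
extends RefCounted

# pair_counts: {"unit_a|unit_b": games_played_together}.
# Returns up to `limit` partner ids for `unit_id`, most-used first.
static func top_partners(unit_id: String, pair_counts: Dictionary, limit: int = 3) -> Array:
    var partners: Array = []
    for key in pair_counts:
        var ids: PackedStringArray = str(key).split("|")
        if ids.size() != 2 or not ids.has(unit_id):
            continue
        var other: String = ids[1] if ids[0] == unit_id else ids[0]
        partners.append([other, pair_counts[key]])
    # Sort by games played together, largest first.
    partners.sort_custom(func(a, b): return a[1] > b[1])
    var result: Array = []
    for entry in partners.slice(0, limit):
        result.append(entry[0])
    return result
```

Run it before every change to a unit, and put the returned partners in the same spreadsheet view as the unit you are editing.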

GDScript / Spreadsheet Integration

Here is the technical piece. The spreadsheet is the source of truth; the game reads from the spreadsheet. You do this by exporting the spreadsheet to CSV (every modern spreadsheet tool supports this) and reading the CSV at runtime or build-time in GDScript.

A minimal Godot implementation for a unit-stats CSV:

# BalanceLoader.gd — load unit_stats.csv into a dictionary
extends Node

var unit_stats: Dictionary = {}

func _ready() -> void:
    load_unit_stats("res://balance/unit_stats.csv")

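func load_unit_stats(path: String) -> void:
    var file := FileAccess.open(path, FileAccess.READ)
    if file == null:
        push_error("Balance file not found: " + path)
        return
    # Start from an empty table so rows deleted from the CSV also disappear on reload.
    unit_stats.clear()
    # The first line is the header row: id, then one column per stat.
    var headers: PackedStringArray = file.get_csv_line()
    while not file.eof_reached():
        var row: PackedStringArray = file.get_csv_line()
        # Skip blank or truncated rows (the final read at end-of-file is usually blank).
        if row.size() < headers.size():
            continue
        var unit_id := row[0].strip_edges()
        var stats: Dictionary = {}
        for i in range(1, headers.size()):
            stats[headers[i].strip_edges()] = _parse_value(row[i])
        unit_stats[unit_id] = stats
    file.close()

# Convert a CSV cell to int or float when it looks numeric; otherwise keep the text.
func _parse_value(s: String) -> Variant:
    var text := s.strip_edges()
    if text.is_valid_int():
        return text.to_int()
    if text.is_valid_float():
        return text.to_float()
    return text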

func get_stat(unit_id: String, stat: String) -> Variant:
    if not unit_stats.has(unit_id):
        return null
    return unit_stats[unit_id].get(stat, null)

Your unit_stats.csv in the balance/ directory looks like this:

id,hp,damage,attack_speed,armor,move_speed,cost_gold
slime_green,20,3,1.0,0,40,0
slime_red,35,5,0.8,1,45,0
goblin_archer,25,6,0.7,0,60,10
orc_warrior,80,14,0.5,3,45,30
boss_forest_01,400,22,0.6,5,35,0
player_starter,100,10,1.0,0,80,0
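A spreadsheet export can silently lose a column or pick up a stray text cell, so it is worth validating the table once after loading. This is a sketch, not part of the chapter's loader: the class name is hypothetical, and the required-column list simply mirrors the sample header above.

```gdscript
# BalanceValidator.gd: hypothetical sanity check for a loaded unit-stats table.
class_name BalanceValidator
extends RefCounted

# Columns every row is expected to carry; this mirrors the sample CSV header.
const REQUIRED := ["hp", "damage", "attack_speed", "armor", "move_speed", "cost_gold"]

# unit_stats has the shape BalanceLoader builds: {unit_id: {column: value}}.
# Returns human-readable problems; an empty array means the table passed.
static func validate(unit_stats: Dictionary) -> Array:
    var problems: Array = []
    for unit_id in unit_stats:
        var stats: Dictionary = unit_stats[unit_id]
        for column in REQUIRED:
            if not stats.has(column):
                problems.append("%s: missing '%s'" % [unit_id, column])
            elif not (stats[column] is int or stats[column] is float):
                problems.append("%s: '%s' is not a number" % [unit_id, column])
    return problems
```

Calling `BalanceValidator.validate(unit_stats)` at the end of `load_unit_stats` and passing each problem to `push_error` turns a mysterious in-game zero into a named row and column.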

Now, in any enemy or item script, you fetch stats through the loader rather than hard-coding them:

# EnemyBase.gd
extends CharacterBody2D

@export var unit_id: String = "slime_green"
var hp: int
var damage: int
var attack_speed: float

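func _ready() -> void:
    # Assumes BalanceLoader.gd is registered as an autoload named "BalanceLoader".
    var loader = get_node("/root/BalanceLoader")
    # get_stat returns null for an unknown unit, and null cannot be assigned to a typed int.
    if loader.get_stat(unit_id, "hp") == null:
        push_error("No balance row for unit: " + unit_id)
        return
    hp = loader.get_stat(unit_id, "hp")
    damage = loader.get_stat(unit_id, "damage")
    attack_speed = loader.get_stat(unit_id, "attack_speed")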

The moment you wire this up, your balance iteration loop collapses from "change a number in code, recompile, test" to "change a number in the spreadsheet, save, reload the scene." Designers who do not code can now tune the game. Artists can now adjust pacing. QA can experiment with scenarios. The bottleneck of "only the programmer can tune" disappears.

You can go further with hot-reload during development. A file-watcher node monitors the CSV for changes and reloads the loader on save:

# BalanceHotReload.gd — editor/debug builds only
extends Node

var _last_mtime: int = 0
const BALANCE_PATH := "res://balance/unit_stats.csv"

func _process(_delta: float) -> void:
    if not OS.is_debug_build():
        return
    var mtime = FileAccess.get_modified_time(BALANCE_PATH)
    if mtime != _last_mtime and _last_mtime != 0:
        get_node("/root/BalanceLoader").load_unit_stats(BALANCE_PATH)
        print("[Balance] Hot-reloaded from spreadsheet.")
    _last_mtime = mtime

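With hot-reload, your iteration is: export the spreadsheet to CSV, switch to the running game, see the new numbers. No engine reload. One caveat: EnemyBase.gd copies its stats once in _ready(), so enemies already in the scene keep their old numbers until they respawn or re-read the loader, while newly spawned enemies pick up the new values immediately. This is the workflow that real balance teams use, scaled down to your project's size.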
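If you want enemies that are already on screen to pick up new numbers, one option is to have the loader announce each reload with a signal. The sketch below is an assumption layered on the chapter's scripts: it presumes you add a `stats_reloaded` signal to BalanceLoader.gd and emit it at the end of `load_unit_stats`; neither exists in the listing above.

```gdscript
# Assumed addition to BalanceLoader.gd (not in the original listing):
#   signal stats_reloaded
#   ...and as the last line of load_unit_stats():
#   stats_reloaded.emit()

# EnemyBase.gd variant that re-reads its stats whenever the loader reloads.
extends CharacterBody2D

@export var unit_id: String = "slime_green"
var hp: int
var damage: int
var attack_speed: float

func _ready() -> void:
    var loader = get_node("/root/BalanceLoader")
    loader.stats_reloaded.connect(_apply_stats)
    _apply_stats()

func _apply_stats() -> void:
    var loader = get_node("/root/BalanceLoader")
    if loader.get_stat(unit_id, "hp") == null:
        push_error("No balance row for unit: " + unit_id)
        return
    # Note: this resets current hp to the table value, which is fine for tuning
    # sessions but would heal a damaged enemy mid-fight.
    hp = loader.get_stat(unit_id, "hp")
    damage = loader.get_stat(unit_id, "damage")
    attack_speed = loader.get_stat(unit_id, "attack_speed")
```

The trade-off is that every stat-consuming script now has to know about the signal. For a small project, respawning the enemies after a reload may be simpler.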

🔧 Pro Tip: Split your CSVs by domain. One unit_stats.csv, one ability_stats.csv, one item_stats.csv, one progression.csv. Small files are easier to diff in source control, easier to merge when two designers are tuning different areas, and faster to reload.

Progressive Project Update: The Balance Pass

You finished Chapter 31 with a document of playtest findings. It probably contains observations like: "the first boss feels too punishing — 6/10 testers died more than four times before winning"; "the slime enemy in Area 2 feels too easy — testers killed them without engaging the dodge mechanic"; "the fire sword is always picked over the ice sword — 9/10 testers preferred fire"; "level 3 feels like a wall — 3/10 testers quit during it."

Your balance pass, this chapter, takes those findings and acts on them. Here is the structured process.

Step one: convert findings to hypotheses. Every finding is an observation, not a solution. The finding "first boss too hard" is compatible with several solutions: reduce boss HP, reduce boss damage, extend the invulnerability window on player dodge, improve the tell on the boss's biggest attack, add a checkpoint mid-fight. The hypothesis step identifies which variable you think is the actual problem. Write it in the spreadsheet as a comment on the row you plan to change.

Step two: model the change. Open the spreadsheet. Change the value. Look at the derived metrics (TTK, EHP, cost ratios). Does the change produce the effect you expected? If you reduced boss HP from 400 to 350, TTK against the player's sword drops from 40 seconds to 35 seconds — is that the right target, or is 32 seconds too easy? Adjust until the model predicts the outcome you want.

Step three: implement the change. With the CSV-loader pattern above, this is saving the file. In a non-CSV game, it is editing the relevant script. Either way, the change lives in one place, and the source of truth is documented.

Step four: test. You do not have to run a full formal playtest for every change — an informal self-test (you, playing, watching the metric you care about) is often enough to confirm the change is in the right ballpark. Keep a running log of each change and its immediate effect.

Step five: document the change. Write a patch note for your own game, even if no player has ever seen it. "Chapter 32 balance pass: boss HP reduced from 400 to 350 because playtest showed average attempts-to-clear at 4.3, targeting 2.5." The patch notes are your design diary. A year from now, when you wonder why the boss has 350 HP, the note will tell you.
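The model in step two can be as small as one function. This sketch assumes `attack_speed` means attacks per second and ignores armor, misses, and downtime; those assumptions reproduce the 40-second and 35-second figures above, but your own derived metrics may need more terms. The class and function names are hypothetical.

```gdscript
# TtkModel.gd: hypothetical time-to-kill model for a balance pass.
class_name TtkModel
extends RefCounted

# Seconds for an attacker to kill a target, ignoring armor, misses, and downtime.
# attack_speed is treated as attacks per second.
static func time_to_kill(target_hp: float, attacker_damage: float, attack_speed: float) -> float:
    var dps := attacker_damage * attack_speed
    if dps <= 0.0:
        return INF
    return target_hp / dps
```

With the starter player's 10 damage at 1.0 attacks per second, a 400 HP boss takes 40 seconds and a 350 HP boss takes 35. To hit a 32-second target instead, solve the other way: 32 seconds at 10 DPS is 320 HP.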

The concrete deliverables for this chapter's progressive project update are:

A balance/ directory in your Godot project containing at least one CSV (unit_stats.csv) and the BalanceLoader.gd autoload.
A docs/balance-passes.md file with one entry per change you made, following the four-part patch-note format.
Updated enemy and player scripts that read from the loader rather than hard-coding values.
At minimum, five balance changes based on Chapter 31 playtest data, with rationale for each.

This is not busywork. It is the transformation of your game from a prototype that "works" into a design that was tuned. The difference shows up in every playtest from now on.

Common Pitfalls

Balancing by gut. A designer with ten years of experience and strong intuition can get close to balanced on gut alone. Most designers cannot. If you are not Mark Rosewater or David Kim, trust the spreadsheet over your instinct, and compare the two when they disagree — the disagreement is where you learn.

Reacting to the most recent playtest. One playtester said the boss was too hard, so you halve its HP. The next playtester says the boss was trivial. You are ping-ponging on sample-size-of-one. Aggregate first. Five playtesters with consistent complaints is data. One playtester with a complaint is an observation.

Ignoring synergy. You balance the fire sword alone, verify it is fine, ship. A playtester discovers it synergizes with the fire-amulet to double-damage enemies with the burning debuff, and the combination is broken. Always check how a change interacts with the top three partners of the thing you changed.

Tournament-only balancing. Your balance changes fix the competitive meta but push casual players out of the game. The competitive scene is a feedback source, not the whole audience. Check casual data after every change.

No changelog. You silent-patch. The next time a player is surprised by a change, they lose trust in your process. Never silent-patch. Even nothing-changes-but-a-bug-fix deserves a line.

Spreadsheet idolatry. The opposite failure: trusting the spreadsheet so much that you ignore playtest data that contradicts it. The spreadsheet models what you programmed it to model. It does not model player psychology, habit patterns, or surprise. If the spreadsheet says a unit is balanced and players universally say it feels terrible, the players are right and the spreadsheet is wrong. Update the model to include the missing variable, and rebalance from the new model.
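The "aggregate first" rule from the recent-playtest pitfall can be written down as a gate. This is a sketch: the five-tester minimum comes from the text above, while the 60% agreement threshold and the class name are assumptions you should tune to your own sample sizes.

```gdscript
# PlaytestAggregate.gd: hypothetical gate for deciding when a complaint is actionable.
class_name PlaytestAggregate
extends RefCounted

# reports: one bool per tester, true if that tester raised the complaint.
# Returns true only when enough testers were sampled and enough of them agree.
static func is_actionable(reports: Array, min_testers: int = 5, min_share: float = 0.6) -> bool:
    if reports.size() < min_testers:
        return false
    var complaints := reports.count(true)
    return float(complaints) / reports.size() >= min_share
```

One tester calling the boss too hard returns `false`: that is an observation. Four out of five returns `true`: that is data, and it earns a row in the balance pass.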

Summary

Balance is not a feeling. It is the structural property of a design where every option is viable, interesting, and situationally optimal. You achieve it through spreadsheets, cost-benefit math, simulation, per-skill-level analysis, and the relentless application of data to hypothesis.

The spreadsheet is your primary tool because it is external memory. The brain cannot hold twenty variables across forty units. The spreadsheet can, and it runs derived-metric formulas that tell you immediately whether a change to one value cascades into chaos or into coherence. Designers who ship balanced games live in spreadsheets. You should too.

The core math is a small set: DPS, EHP, TTK, opportunity cost, expected value, variance, synergy. Learn to compute each without a calculator. The moment you can look at a unit's stats and compute its effective DPS in your head, you are thinking like a balance designer.

Simulation is the extension of the spreadsheet when the interactions outrun closed-form math. Monte Carlo, bot matches, self-play — all of these are industrial-scale playtest data at nearly zero marginal cost. Use them to rule failures out, not success in.

The meta evolves because the players learn. Your patch cadence shapes the evolution. Too fast and the game never settles; too slow and it stagnates. Pick a cadence that matches your game's scope and your team's bandwidth, and write patch notes that explain your reasoning — the notes are community management, not just documentation.

Asymmetric balance is harder than symmetric. Single-player difficulty curves are a different problem than multiplayer matchups. Economic balance interacts with combat balance in non-obvious ways. Each of these is a chapter on its own; what they share is the spreadsheet-plus-data discipline that runs through everything.

You have playtest data from Chapter 31. You have the tools now to act on it. Your progressive project deserves a balance pass — not a random tuning, but a hypothesis-driven, spreadsheet-modeled, documented pass that turns your prototype into something that was tuned. Do the pass. Then test again. Then do the next pass. This is the work.

Next chapter, Chapter 33, turns to ethics — including the dark version of the skills you just learned, where balance becomes exploitation and tuning becomes predation. Balance for skill expression is the craft. Balance for extracting the maximum possible money from vulnerable players is the same craft turned against the player. You should know the difference, because you will make the choice, over and over, in every game you ever ship.

The spreadsheet is neutral. You are not. What goes in the cells is a design decision, and every design decision carries a value judgment about what the game is for and who it is for. That judgment is yours, and it is the hardest part of the job.