Case Study: The Carbon Footprint of GPT-4
"We do not disclose the details of our model architecture, training compute, or training methods." — OpenAI, GPT-4 Technical Report, 2023
Overview
In March 2023, OpenAI released GPT-4, the most capable large language model of its time. The model's technical report described its performance in extraordinary detail: benchmark scores, exam results, multilingual capabilities, and reasoning ability. What the report conspicuously omitted was any information about the model's environmental footprint — the energy consumed, the carbon emitted, or the water used during training.
This omission is the starting point for this case study. Using publicly available information, informed estimation, and the CarbonEstimator class from Section 34.2, we reconstruct what we can about the environmental cost of GPT-4's development. More importantly, we analyze what the omission itself reveals about the governance of AI's environmental impact.
Skills Applied:
- Applying the CarbonEstimator class to estimate training emissions under uncertainty
- Evaluating the significance of non-disclosure in environmental governance
- Connecting computational cost to environmental justice frameworks
- Assessing proposals for mandatory environmental reporting
The Situation
What We Know
OpenAI's GPT-4 Technical Report (March 2023) is notable for what it does not say. While acknowledging that GPT-4 is a "large-scale, multimodal model," the report provides no information about:
- The number of parameters in the model
- The type or number of GPUs/TPUs used for training
- The duration of training
- The total computational cost (in FLOPs or GPU-hours)
- The energy consumed during training
- The carbon emissions produced
- The water consumed for cooling
- The data center location(s)
OpenAI explicitly justified the omission: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
The stated reasons — competitive pressure and safety — are legitimate concerns. But the environmental consequence of non-disclosure is that the carbon footprint of one of the most resource-intensive AI training runs in history is unknown to the public, to regulators, and to the research community.
What We Can Estimate
Independent researchers have attempted to estimate GPT-4's carbon footprint based on publicly available clues:
Model architecture. Reports from The Information (2023) and other sources suggest GPT-4 uses a Mixture of Experts (MoE) architecture with approximately 1.8 trillion total parameters spread across 16 expert modules, of which roughly 220 billion are active per forward pass. While unconfirmed by OpenAI, this estimate is widely cited.
Training infrastructure. Microsoft, OpenAI's primary computing partner, built a dedicated supercomputer cluster for GPT-4 training, reportedly comprising approximately 10,000-25,000 NVIDIA A100 GPUs (later reports suggest partial use of H100s as well). The cluster is housed in Microsoft Azure data centers, primarily in the US.
Training duration. Estimates range from approximately 90-120 days of continuous training, with multiple training runs (including failed runs, restarts, and hyperparameter optimization).
Compute cost. Various estimates suggest GPT-4's training required approximately 2.15 × 10^25 FLOPs (floating-point operations), which at A100 throughput rates would require thousands of GPU-years of computation.
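The "thousands of GPU-years" figure can be sanity-checked with a short calculation. The peak throughput below is the A100's published dense BF16 figure; the 40% sustained-utilization value is an assumption for illustration, not a disclosed number:

```python
# Convert the ~2.15e25 FLOP estimate into A100 time.
TOTAL_FLOPS = 2.15e25
A100_PEAK_FLOPS = 312e12      # A100 peak dense BF16 throughput (FLOP/s)
UTILIZATION = 0.40            # assumed sustained fraction of peak

gpu_seconds = TOTAL_FLOPS / (A100_PEAK_FLOPS * UTILIZATION)
gpu_years = gpu_seconds / (3600 * 24 * 365)
print(f"~{gpu_years:,.0f} A100-years of compute")
```

Under these assumptions the answer comes out on the order of several thousand A100-years, consistent with the range quoted above.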
Estimation Exercise
Using the CarbonEstimator class, we can bracket the likely range:
Lower bound estimate (conservative):
```python
CarbonEstimator(
    gpu_type="A100",
    num_gpus=10000,
    training_hours=2160,  # 90 days
    cloud_region="us-east",
    pue=1.1,
    gpu_utilization=0.6,
)
```
This produces approximately:
- Energy: ~6,048 MWh
- Carbon: ~2,056 tonnes CO2
Upper bound estimate (aggressive):
```python
CarbonEstimator(
    gpu_type="A100",
    num_gpus=25000,
    training_hours=2880,  # 120 days
    cloud_region="us-east",
    pue=1.2,
    gpu_utilization=0.7,
)
```
This produces approximately:
- Energy: ~16,934 MWh
- Carbon: ~5,758 tonnes CO2
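A minimal sketch of the arithmetic such an estimator performs. The ~280 W average per-GPU draw and ~0.34 kg CO2/kWh us-east grid intensity below are assumptions chosen to reproduce the upper-bound output above; the actual CarbonEstimator class (Section 34.2) may use different hardware and grid tables:

```python
def estimate_emissions(num_gpus, gpu_watts, training_hours,
                       gpu_utilization, pue, grid_kg_per_kwh):
    """Return (energy in MWh, carbon in tonnes CO2) for one training run."""
    # Energy drawn by the GPUs themselves, scaled by average utilization
    gpu_kwh = num_gpus * (gpu_watts / 1000) * training_hours * gpu_utilization
    # PUE scales IT energy up to total facility energy (cooling, overhead)
    facility_kwh = gpu_kwh * pue
    return facility_kwh / 1000, facility_kwh * grid_kg_per_kwh / 1000

energy_mwh, carbon_t = estimate_emissions(
    num_gpus=25_000, gpu_watts=280, training_hours=2_880,
    gpu_utilization=0.7, pue=1.2, grid_kg_per_kwh=0.34,
)
print(f"~{energy_mwh:,.0f} MWh, ~{carbon_t:,.0f} tonnes CO2")
```

The same formula with the lower-bound parameters brackets the range from below; every input carries uncertainty, which is why the exercise reports a range rather than a point estimate.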
With failed runs and hyperparameter search (multiplier): Industry estimates suggest that successful training runs account for only 30-60% of total compute; the remainder is consumed by failed runs, restarts, and hyperparameter optimization. Applying a 2x multiplier to account for this:
- Lower estimate: ~4,100 tonnes CO2
- Upper estimate: ~11,500 tonnes CO2
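The multiplier step is simple arithmetic on the final-run estimates; the 2x factor is an assumption drawn from the 30-60% range cited above:

```python
# Scale final-run emissions up to whole-program emissions.
FINAL_RUN_TONNES = {"lower": 2_056, "upper": 5_758}
MULTIPLIER = 2.0   # assumption: the final run is ~50% of total compute
totals = {k: round(v * MULTIPLIER, -2) for k, v in FINAL_RUN_TONNES.items()}
print(totals)
```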
These estimates are consistent with the independent range cited in Section 34.3.3: 3,000 to 13,000 tonnes CO2.
Contextualizing the Numbers
At the midpoint of our range (~7,000 tonnes CO2):
- Transatlantic flights: ~4,375 one-way flights (NYC to London)
- Average US cars (annual): ~1,400 cars
- Average US households (annual electricity): ~1,667 households
- Trees needed to offset (one year): ~318,000 trees
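The equivalences above follow from dividing the midpoint by per-unit emission factors. The factors used here are commonly cited approximations chosen for illustration, not figures from the report:

```python
# Derive the headline equivalences from per-unit emission factors.
MIDPOINT_TONNES = 7_000
FACTORS = {                                   # tonnes CO2 per unit
    "one-way NYC-London flights": 1.6,        # per passenger
    "US cars (annual)": 5.0,                  # per car per year
    "US households (annual electricity)": 4.2,
    "tree-years of offset": 0.022,            # ~22 kg absorbed per tree-year
}
equivalents = {name: round(MIDPOINT_TONNES / t) for name, t in FACTORS.items()}
for name, n in equivalents.items():
    print(f"{name}: ~{n:,}")
```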
These are substantial numbers. They represent a real environmental cost — borne primarily by communities near power plants and data centers — for a product whose benefits accrue primarily to OpenAI, Microsoft, and their users.
Analysis
The Non-Disclosure Problem
The most significant finding of this exercise is not the estimated carbon number — it is the fact that estimation is necessary at all.
In virtually every other industry with significant environmental impact, disclosure is either legally required or normalized through industry standards:
- Manufacturing: Companies report emissions under the EPA's Greenhouse Gas Reporting Program (for facilities exceeding 25,000 metric tons/year) and increasingly under voluntary frameworks like the Greenhouse Gas Protocol.
- Energy: Power plants report fuel consumption, generation, and emissions.
- Aviation: Airlines report per-flight emissions, and regulators maintain comprehensive databases.
- Construction: Environmental impact assessments are required for major projects.
AI model training — which can produce emissions comparable to a small factory — has no equivalent reporting requirement. The Accountability Gap identified throughout this textbook extends directly to environmental impact: organizations produce significant carbon emissions through model training and face no systematic obligation to measure, report, or mitigate them.
The Competitive Argument
OpenAI's justification for non-disclosure — competitive pressure — deserves scrutiny. The company argues that disclosing training details would give competitors an advantage. But environmental impact reporting does not require disclosing model architecture, training data, or algorithmic innovations. Reporting energy consumption and carbon emissions (which depend on hardware configuration, training duration, and data center location) reveals almost nothing about the intellectual property that gives a model its competitive edge.
The parallel to corporate financial reporting is instructive. Public companies are required to disclose detailed financial information — revenue, expenses, profits, liabilities — despite the competitive sensitivity of this information. The disclosure requirements exist because society has determined that the public interest in accountability outweighs the competitive cost of transparency. The same logic could be applied to environmental impact.
The Environmental Justice Dimension
The Power Asymmetry in GPT-4's environmental impact follows the pattern identified in Section 34.6:
Who benefits: OpenAI (revenue and valuation), Microsoft (cloud computing revenue and competitive positioning), GPT-4 users worldwide (productivity, creativity, access to information), and shareholders.
Who bears the costs: Communities near the power plants that generate electricity for Microsoft's data centers (air pollution, climate impact), communities near data centers (water consumption, noise, land use), communities in the Global South where hardware components are mined and e-waste is processed, and future generations who bear the consequences of climate change.
These two groups do not overlap. The benefits are concentrated among the most technologically connected and economically privileged. The costs are distributed among the least powerful and least connected.
Inference Emissions: The Ongoing Cost
The CarbonEstimator focuses on training emissions, but for a widely deployed model like GPT-4, inference emissions — the energy cost of running the model to serve user queries — may exceed training emissions within months.
Estimates suggest that GPT-4 processes hundreds of millions of queries per day. Each query requires a forward pass through the model, consuming energy proportional to the model's size and the length of the response. While a single query consumes a small amount of energy, the aggregate across hundreds of millions of daily queries is substantial.
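A back-of-envelope calculation shows why inference can overtake training so quickly. Both inputs are illustrative assumptions: ~200 million queries per day and ~3 Wh per query (a commonly cited ballpark, not a disclosed figure):

```python
# Days until cumulative inference energy matches the training estimate.
QUERIES_PER_DAY = 200e6
KWH_PER_QUERY = 0.003                 # assumed ~3 Wh per query
TRAINING_MWH = 16_934                 # upper-bound training estimate above

daily_mwh = QUERIES_PER_DAY * KWH_PER_QUERY / 1000
days_to_match_training = TRAINING_MWH / daily_mwh
print(f"~{daily_mwh:.0f} MWh/day; matches training in "
      f"~{days_to_match_training:.0f} days")
```

Under these assumptions, serving queries consumes as much energy as the entire training run in roughly a month.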
This ongoing cost is significant for environmental governance because it means that the environmental footprint of GPT-4 is not a one-time cost but a continuous, growing expense that compounds with every user, every query, and every new application.
What Would Change With Disclosure?
If OpenAI were required to disclose GPT-4's environmental footprint, several outcomes would follow:
Accountability. The public, regulators, and investors would know the environmental cost of the product. This creates a basis for accountability that currently does not exist.
Comparison. Researchers and policymakers could compare the environmental efficiency of different models. Is GPT-4 more or less efficient per unit of capability than competing models? Without disclosure, this comparison is impossible.
Incentives. Disclosure creates incentives for efficiency. If environmental cost is visible, companies face reputational and regulatory pressure to reduce it. If it is invisible, there is no pressure.
Research direction. The AI research community could prioritize efficiency if carbon cost were reported alongside performance. Green AI's proposal to report compute budgets (Section 34.4.1) would become practical rather than aspirational.
Policy foundation. Regulators cannot govern what is not measured. Mandatory disclosure would create the data foundation necessary for carbon pricing, emissions caps, or other governance mechanisms applied to AI training.
Discussion Questions
1. The disclosure debate. OpenAI argues that competitive pressure justifies non-disclosure of training details. Evaluate this argument: Is environmental impact reporting meaningfully different from the technical details that give a model its competitive advantage? Could a reporting framework be designed that discloses environmental information without revealing proprietary model details?
2. Inference versus training. If inference emissions exceed training emissions within months of deployment, should environmental governance focus on training (a one-time cost) or inference (an ongoing cost)? How would you design a governance framework that addresses both?
3. The benefit-cost weighing. GPT-4 provides significant benefits to millions of users. Its environmental costs are real but difficult to compare directly to those benefits. How should society make this trade-off? Who should make the decision, and through what institutional mechanisms?
4. The estimation gap. This case study demonstrates that independent estimation can bracket the likely range of environmental impact. But estimation is no substitute for disclosure. What are the limits of estimation? What can only actual disclosure reveal?
Your Turn: Mini-Project
Option A: Your Own Estimate. Using the CarbonEstimator class, produce carbon footprint estimates for three recently released AI models (choose from Claude, Gemini, Llama, Mistral, or others). Document your parameter assumptions, justify your choices, and present your estimates with uncertainty ranges. Compare the estimates and discuss what they reveal about the environmental landscape of AI development.
Option B: Disclosure Framework Design. Draft a proposed environmental disclosure standard for AI model training. Specify: (1) what metrics must be reported, (2) how they should be measured and verified, (3) who should receive the reports, (4) what enforcement mechanisms should apply, and (5) how the standard would handle proprietary concerns. Write a two-page policy proposal.
Option C: Inference Calculator. Extend the CarbonEstimator class to estimate inference emissions. Your extension should take as input: model size (parameters), queries per day, average tokens per query, and hardware type. Estimate the annual carbon footprint of serving GPT-4-scale model queries at current estimated volumes. Compare inference to training emissions.
References
- OpenAI. "GPT-4 Technical Report." arXiv:2303.08774, March 2023.
- Strubell, Emma, Ananya Ganesh, and Andrew McCallum. "Energy and Policy Considerations for Deep Learning in NLP." Proceedings of the 57th Annual Meeting of the ACL (2019): 3645-3650.
- Patterson, David, et al. "Carbon Emissions and Large Neural Network Training." arXiv:2104.10350, 2021.
- Li, Pengfei, et al. "Making AI Less 'Thirsty': Uncovering and Addressing the Secret Water Footprint of AI Models." arXiv:2304.03271, 2023.
- Schwartz, Roy, et al. "Green AI." Communications of the ACM 63, no. 12 (2020): 54-63.
- International Energy Agency. "Electricity 2024: Analysis and Forecast to 2026." IEA, 2024.
- The Information. "OpenAI's GPT-4: A Mixture of Experts Model." July 2023.