Case Study 31-1: Training GPT-4 and the Carbon Accounting Problem

Overview

When OpenAI released GPT-4 in March 2023, the announcement included detailed descriptions of the model's capabilities — its performance on standardized tests, its reasoning improvements over GPT-3.5, its multimodal abilities. It did not include the model's energy consumption, carbon footprint, or water use. This absence was not unusual for an AI product release; it was standard practice. But in the context of a technology that is rapidly scaling to consume meaningful fractions of national electricity grids, the absence of environmental disclosure from major AI companies represents one of the most significant accountability gaps in the AI ecosystem.

This case study examines what is known and unknown about the carbon cost of large language model training — using GPT-4 and its peers as the central examples — and what genuine carbon accountability for AI companies would require.


What We Know About Large Language Model Training Carbon Costs

The Patterson et al. Baseline

The most comprehensive academic study of large language model training carbon costs is Patterson et al. (2021), "Carbon Emissions and Large Neural Network Training," published by researchers from Google, UC Berkeley, and the University of Toronto. The paper analyzed the training carbon costs of several major models using a methodology based on published or estimated FLOP counts, hardware specifications, and grid carbon intensity data. Key findings:

GPT-3: Estimated at 552 metric tons CO2e, based on an estimated 3.14 × 10²³ training FLOPs, using Pacific Northwest grid carbon intensity where Microsoft (which provided OpenAI's computing infrastructure) operates major data centers.

Google's GLaM: Estimated at substantially lower carbon cost due to a sparse mixture-of-experts architecture that activates only a subset of parameters per token, demonstrating that architectural choice significantly affects training carbon cost.

The hardware efficiency dimension: Patterson et al. demonstrated that cloud computing hardware typically has significantly lower carbon intensity than on-premise hardware, and that renewable energy sourcing can reduce training carbon costs by orders of magnitude.

The paper is important and valuable, but several limitations constrain its use as a comprehensive accountability framework:

All figures are estimates based on external analysis rather than disclosed internal measurements; the difference between the estimated figure (552 metric tons for GPT-3) and the true figure could be substantial.

FLOP estimates themselves require inference about model architecture from publicly available information; architecture details that affect efficiency are not always public.

Grid carbon intensity varies within regions and by time of day; using average regional intensity may differ significantly from the actual carbon intensity at the specific data center during the specific training period.
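The estimation chain behind these figures can be made explicit. A minimal sketch using the values Patterson et al. report for GPT-3 — roughly 1,287 MWh of training energy and a grid intensity of 429 gCO2e/kWh — both of which are themselves external estimates, not disclosed measurements:

```python
# Sketch of a Patterson-et-al.-style training carbon estimate.
# All inputs are external estimates, not disclosed internal measurements.

def training_carbon_tco2e(energy_mwh: float, grid_gco2_per_kwh: float) -> float:
    """Carbon footprint of a training run in metric tons CO2e."""
    energy_kwh = energy_mwh * 1_000
    grams = energy_kwh * grid_gco2_per_kwh
    return grams / 1_000_000  # grams -> metric tons

# Patterson et al. (2021) estimates for GPT-3:
gpt3_energy_mwh = 1_287   # ~1,287 MWh of training energy (estimated)
grid_intensity = 429      # gCO2e/kWh assumed for the hosting region

print(training_carbon_tco2e(gpt3_energy_mwh, grid_intensity))  # ~552 tCO2e
```

The calculation itself is trivial; the accountability problem is that every input to it must currently be reconstructed from outside the company.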

What We Know About GPT-4

OpenAI has not published the following information about GPT-4:

- The number of training FLOPs performed
- The model's parameter count
- The data centers used for training
- The duration of training
- The energy consumed during training
- The carbon emissions from training
- The water consumed during training

What is publicly known comes primarily from leaked information, industry analysis, and OpenAI's own limited disclosures. Reports from industry sources suggest GPT-4 has substantially more parameters than GPT-3 (which had 175 billion) and that its training required significantly more computation — estimates from analysts range from 2× to 10× GPT-3's training cost. If these estimates are approximately correct, GPT-4's training carbon cost is likely in the range of 1,000–5,000 metric tons CO2e — roughly equivalent to the annual carbon footprint of 200–1,000 people at the global-average per-capita emissions rate of about 5 metric tons. This is a significant environmental impact by any standard, and it is entirely unaccounted for in public disclosure.

The Inference Carbon Problem

Training carbon costs, while large for individual model training runs, are typically exceeded over a model's deployment lifetime by inference carbon costs — the cumulative energy consumed by billions of user queries.

ChatGPT achieved 100 million users within two months of launch — faster than any consumer technology in history. By mid-2023, estimates suggested ChatGPT was processing hundreds of millions of queries daily. At the estimated 0.001–0.01 kWh per inference (depending on model size and hardware), and at a grid carbon intensity of approximately 200g CO2/kWh (roughly the US average for data centers using some renewable energy), daily inference carbon costs range from tens to hundreds of metric tons CO2 per day — exceeding the training carbon cost within weeks or months of deployment.
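The daily inference arithmetic implied by these figures — all of them estimates — is straightforward. A sketch with a hypothetical query volume in the "hundreds of millions" range:

```python
# Back-of-envelope daily inference carbon, using the estimated figures
# from the text: 0.001-0.01 kWh per query, ~200 gCO2/kWh grid intensity.

GRID_GCO2_PER_KWH = 200  # rough US data-center average (estimate)

def daily_inference_tco2(queries_per_day: float, kwh_per_query: float) -> float:
    """Daily inference carbon in metric tons CO2."""
    grams = queries_per_day * kwh_per_query * GRID_GCO2_PER_KWH
    return grams / 1_000_000  # grams -> metric tons

queries = 200_000_000  # hypothetical "hundreds of millions" of daily queries
low = daily_inference_tco2(queries, 0.001)   # efficient end of the range
high = daily_inference_tco2(queries, 0.01)   # inefficient end of the range
print(low, high)  # ~40 to ~400 tCO2 per day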

OpenAI does not disclose inference energy consumption or carbon emissions. Microsoft, which operates the Azure infrastructure on which ChatGPT runs and which has made substantial renewable energy commitments, includes Azure's energy consumption in its aggregate sustainability reporting but does not provide model- or application-level breakdowns sufficient to calculate ChatGPT-specific carbon costs.


The Voluntary Disclosure Landscape

What Major AI Companies Actually Disclose

An examination of major AI companies' environmental disclosure reveals a systematic pattern of partial, aggregated, and unverifiable reporting:

OpenAI: As a privately held company (as of 2024), OpenAI is not subject to SEC disclosure requirements, and it publishes no systematic environmental disclosure. Its website includes some general language about sustainability goals but no specific energy, carbon, or water data.

Google/DeepMind: Google publishes the most detailed environmental reporting among major AI companies, including annual environmental reports with data center power usage effectiveness (PUE), total energy consumption, renewable energy percentage, and aggregate carbon emissions. Google has not, however, published model-level or application-level carbon cost data. Its 2023 environmental report noted that data center electricity consumption increased 17% year-over-year, attributing the increase primarily to AI workloads. Google has committed to 24/7 carbon-free energy by 2030, a more demanding standard than annual average renewable energy matching.

Microsoft: Microsoft publishes aggregate Scope 1, 2, and 3 emissions data and has made notable commitments (carbon negative by 2030, historical emissions removal by 2050). Its 2023 environmental report acknowledged that data center electricity demand grew substantially due to AI, and that this growth creates challenges for its 2030 carbon commitments. Microsoft has been more transparent than most AI companies about the tension between AI growth and sustainability commitments.

Meta: Meta reports aggregate data center energy use and renewable energy percentage. It has made claims of net zero operations since 2020, primarily through renewable energy certificates and carbon offsets — a weaker form of environmental credit than 24/7 carbon-free energy.

Amazon Web Services: Amazon reports aggregate energy consumption and renewable energy percentage across AWS and has committed to 100% renewable energy by 2025 and net zero by 2040. AWS hosts AI inference workloads for many companies including Anthropic (which received major AWS investment) and provides the infrastructure for many AI-as-a-service applications.

The Aggregation Problem

Even companies that disclose at the aggregate level — total data center energy consumption, total carbon emissions — do not provide the model-level or application-level disclosure that would enable meaningful assessment of specific AI systems' environmental impact. A sustainability-conscious organization considering whether to deploy a large language model for its employees cannot determine from public information how much carbon its queries will generate. A regulator trying to assess whether AI's environmental impact justifies intervention cannot determine from public information what that impact is.

The aggregation of all data center activities — web search, cloud storage, video streaming, AI inference, enterprise computing — into single corporate-level figures makes it impossible to attribute specific environmental costs to specific AI applications. This is convenient for companies that want to report improving efficiency metrics while growing AI workloads rapidly; it is inconvenient for accountability.


What Transparency Would Actually Require

A Framework for AI Carbon Disclosure

What would genuine carbon accountability for AI companies look like? A proposed framework:

Model-level disclosure: For each major AI model trained, publish the energy consumed (in kWh), the carbon intensity of the computing infrastructure (in gCO2/kWh by location and time period), and the resulting carbon footprint (in metric tons CO2e). This is analogous to ingredient labeling for food — not competitively sensitive trade secrets about model architecture, but straightforward physical measurement.

Inference-level disclosure: For widely deployed AI applications, publish aggregate monthly inference energy consumption and carbon cost. This requires instrumentation of inference infrastructure for energy measurement — technically feasible and done internally by hyperscalers for cost management purposes.

Water disclosure: Publish water consumption at the facility level for data centers involved in AI training and inference. Include the geographic context (is this facility in a water-stressed region?) to enable meaningful assessment of local water impact.

Supply chain disclosure: Publish information about hardware supply chains, including the carbon embedded in chip manufacturing (Scope 3), responsible sourcing commitments for specialty minerals, and e-waste management practices.

Verification: Third-party verification of disclosed figures, using standardized methodology developed with input from environmental regulators, academic researchers, and civil society — analogous to financial auditing for financial reporting.
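To make the framework concrete, a model-level disclosure could be as simple as a structured record. A hypothetical sketch — the field names are illustrative, not drawn from any existing standard:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCarbonDisclosure:
    """Hypothetical model-level disclosure record (illustrative only)."""
    model_name: str
    training_energy_kwh: float       # metered, not estimated
    grid_intensity_gco2_kwh: float   # location- and time-specific
    water_consumed_liters: float
    data_center_locations: list[str] = field(default_factory=list)
    third_party_verifier: str = ""   # analogous to a financial auditor

    @property
    def training_tco2e(self) -> float:
        """Derived carbon footprint in metric tons CO2e."""
        return self.training_energy_kwh * self.grid_intensity_gco2_kwh / 1e6

# Example with made-up numbers:
d = ModelCarbonDisclosure("example-model", 1_000_000, 300, 5_000_000,
                          ["us-west"], "Example Auditor LLC")
print(d.training_tco2e)  # 300.0 tCO2e
```

The point of the sketch is that none of these fields requires revealing model architecture; they are physical measurements plus provenance.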

Regulatory Pressure Toward Disclosure

The regulatory trajectory is toward greater mandatory disclosure, though the timeline and specifics remain uncertain.

The EU Corporate Sustainability Reporting Directive (CSRD) requires double materiality disclosure — both financial risk and environmental impact — from large companies operating in the EU, phased in beginning 2024. AI companies with significant European operations will face CSRD requirements that mandate more comprehensive environmental disclosure than current voluntary reporting provides.

The SEC's climate disclosure rules, adopted in 2024 and subject to legal challenge, would require Scope 1 and 2 disclosure and, for companies that have made climate commitments, disclosure of progress toward those commitments. If implemented, these rules would make voluntary carbon commitments from AI companies more auditable.

Proposed EU AI Act implementing acts could establish specific energy and environmental requirements for AI systems, including disclosure obligations that go beyond current voluntary practice.


The GPT-4 Carbon Cost in Context

Comparisons and Proportionality

To assess the significance of GPT-4's training carbon cost, it helps to place it in several contexts:

Individual model comparison: A training carbon cost of 1,000–5,000 metric tons CO2e for GPT-4 — a reasonable estimate range given its likely scale relative to GPT-3 — represents roughly 0.00001% of global annual CO2 emissions, which are on the order of 37 billion metric tons. In absolute terms, this is the annual carbon footprint of 200–1,000 people at the global-average per-capita rate of about 5 metric tons. For a technology deployed at global scale, this is not disproportionate relative to many comparable activities.

Industry trajectory: The relevant concern is not the carbon cost of a single model but the trajectory: dozens of companies training hundreds of models across the AI ecosystem, with frontier model sizes growing rapidly, and inference scaling as AI is deployed across more applications for more users. The aggregate is what matters environmentally, and the aggregate is unknown and growing.

Infrastructure investment: Microsoft's $10 billion investment in OpenAI, and the broader hyperscaler investment in AI infrastructure (data centers, power, cooling) measured in hundreds of billions of dollars, is creating physical infrastructure that will consume energy for decades. The carbon cost of this infrastructure over its operational lifetime — not just the carbon cost of training any single model — is the relevant environmental accounting.

The Honest Assessment

The honest assessment is that we do not know GPT-4's carbon cost because OpenAI has not disclosed it, that the methodology for estimating it is uncertain, that the inference carbon cost over GPT-4's deployment lifetime will substantially exceed the training cost, and that the regulatory framework for requiring disclosure is still developing. In the meantime, a significant global technology with meaningful environmental impact is being deployed without the transparency that would allow democratic assessment of whether that impact is justified.


Business Implications

Organizations deploying AI applications — internally or as products — bear environmental responsibility for the inference energy consumed by their AI use. This responsibility is currently invisible in most corporate carbon accounting: a company that deploys ChatGPT for its employees does not include those inference energy costs in its Scope 2 or Scope 3 emissions reporting.

As disclosure requirements become more comprehensive, this omission will become more visible and harder to defend. Organizations that anticipate mandatory Scope 3 disclosure should begin now to develop methodologies for measuring and disclosing the carbon cost of their AI use, rather than building AI-intensive business models that will be difficult to account for when disclosure becomes mandatory.
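For an organization trying to get ahead of mandatory disclosure, even a crude internal estimate is a starting point. A sketch of an annual Scope 3 line item for AI use — every input below is a placeholder assumption, to be replaced with measured or vendor-disclosed figures as they become available:

```python
# Rough annual Scope 3 line item for organizational AI use.
# Every input is a placeholder assumption, not a measured value.

employees = 5_000
queries_per_employee_per_day = 20
kwh_per_query = 0.003       # mid-range of the 0.001-0.01 kWh estimates
grid_gco2_per_kwh = 200     # provider's grid intensity (estimate)
workdays = 250

annual_kwh = (employees * queries_per_employee_per_day
              * workdays * kwh_per_query)
annual_tco2e = annual_kwh * grid_gco2_per_kwh / 1_000_000

print(round(annual_tco2e, 1))  # annual tCO2e from AI use (~15 with these inputs)
```

A figure of this size is small relative to most companies' total Scope 3 emissions, but the exercise forces the organization to identify which inputs it cannot currently obtain from its AI vendors — which is precisely the disclosure gap this case study describes.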


Discussion Questions

  1. OpenAI is a private company with no mandatory environmental disclosure requirements in the United States. Should AI companies be required to disclose the carbon cost of training their models regardless of their public/private status? What regulatory mechanism could achieve this?

  2. If we cannot currently measure GPT-4's carbon cost with precision, can we make rational decisions about whether its deployment is environmentally justified? What information would be sufficient for such a determination?

  3. An individual user making 10 ChatGPT queries per day generates perhaps 2g of CO2 per day (using the per-query estimates above) — an individually trivial amount. At 100 million daily users, that is roughly 200 metric tons of CO2 per day. How should we think about individual vs. collective environmental responsibility for AI's carbon footprint?

  4. Microsoft's carbon negative commitment creates explicit tension with its AI growth strategy, which it has acknowledged. How should corporate leaders navigate genuine conflicts between environmental commitments and growth objectives? What would integrity require?

  5. The argument is sometimes made that AI's climate benefits (grid optimization, materials discovery, climate modeling) will outweigh its climate costs. What evidentiary standard should be required before accepting this argument as a justification for AI's current carbon footprint?