Key Takeaways: Chapter 34 — Environmental Data Ethics and Climate
Core Takeaways
- Data systems are physical systems with material costs. Data centers consume approximately 2% of global electricity. AI model training requires thousands of megawatt-hours of energy, hundreds of thousands of liters of water, and hardware manufactured from rare earth minerals mined under often exploitative conditions. The "cloud" is not immaterial — it is someone else's computer, and that computer has a carbon footprint, a water footprint, and a hardware lifecycle that ends in electronic waste.
- Carbon intensity varies dramatically by region — making location a governance choice. The same computation can produce roughly 750x more carbon in a coal-powered region (South Africa: ~900 gCO2/kWh) than in a hydro-powered region (Quebec: ~1.2 gCO2/kWh). When a company decides where to train a model, it is making a decision about carbon emissions — one shaped by tax incentives, land costs, and proximity to customers rather than environmental impact.
- The CarbonEstimator class makes invisible costs visible. The Python tool built in this chapter takes training parameters — GPU type, number, duration, and location — and produces estimated carbon emissions with familiar equivalents (flights, car miles, household electricity years, trees needed to offset). Making environmental cost calculable and contextualizable is the first step toward governance.
- Training large models produces emissions comparable to hundreds of transatlantic flights. Pre-training a large language model on 2,048 H100 GPUs for 30 days in Virginia produces approximately 270 tonnes of CO2 — equivalent to ~169 one-way transatlantic flights, ~64 years of average US household electricity, or ~12,000 tree-years of carbon absorption. These are not trivial numbers.
- Carbon emissions should be reported alongside model performance metrics. Strubell et al.'s (2019) most lasting contribution was the argument that environmental cost should be part of how we evaluate AI research quality. A model that achieves marginal accuracy gains at massive environmental cost should face scrutiny that currently does not exist. Green AI advocates for efficiency as a first-class metric, not an afterthought.
- Technical efficiency alone cannot solve the environmental challenge due to the rebound effect. More efficient models reduce per-unit computation costs, potentially leading to increased demand that offsets or exceeds the savings. If AI becomes cheaper, organizations may train more models and deploy AI in more contexts — increasing total emissions even as per-model emissions decline. Governance mechanisms (carbon pricing, reporting requirements, emissions caps) are necessary to ensure efficiency gains translate into actual reductions.
- Data systems play a dual role — as environmental costs and environmental tools. Satellite observation, sensor networks, climate modeling, conservation AI, and renewable energy optimization depend on the same computational infrastructure that produces environmental harm. This duality complicates simple narratives: we cannot argue for eliminating data infrastructure without acknowledging the environmental monitoring it enables, nor can we ignore environmental costs by pointing to environmental benefits.
- Environmental monitoring and indigenous knowledge must be governed under data sovereignty principles. Indigenous communities possess ecological knowledge developed over millennia that is increasingly recognized as essential for environmental management. The integration of this knowledge into data systems raises governance questions about control, credit, and benefit-sharing. Without explicit governance under CARE Principles, the digitization of indigenous knowledge risks replicating the extractive dynamics of data colonialism.
- The environmental costs of data infrastructure are not distributed equitably. Data center siting burdens communities with less political power. E-waste processing exposes workers in the Global South to toxic materials. Rare earth mining displaces communities and contaminates environments in developing countries. Climate impacts fall disproportionately on the Global South. The benefits of AI accrue to wealthy companies; the costs are externalized to the least powerful.
- Environmental data justice requires transparency, accountability, participation, and equitable distribution. Organizations should disclose the environmental costs of their data operations. They should be accountable through carbon pricing and extended producer responsibility. Affected communities should participate in decisions about data center siting and operations. And the benefits and costs of data systems should be distributed more equitably than they are today.
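The CarbonEstimator described above can be sketched in a few lines. This is a minimal reconstruction, not the chapter's exact implementation: the per-GPU power draw, regional intensity values (Virginia especially), and equivalence conversion factors are illustrative assumptions, so the output will not exactly reproduce the ~270-tonne figure cited for the 2,048-GPU example.

```python
from dataclasses import dataclass

# Illustrative regional carbon intensities (gCO2/kWh); chapter values for
# Quebec and South Africa, an assumed value for Virginia
CARBON_INTENSITY = {"quebec": 1.2, "south_africa": 900.0, "virginia": 350.0}

# Assumed average power draw per GPU under training load (kW)
GPU_POWER_KW = {"H100": 0.7, "A100": 0.4}

@dataclass
class CarbonEstimator:
    """Estimate training emissions from GPU type, count, duration, and region."""
    gpu_type: str
    gpu_count: int
    hours: float
    region: str
    pue: float = 1.58  # industry-average data center overhead

    def energy_kwh(self) -> float:
        # Total facility energy: per-GPU draw x count x hours, scaled by PUE
        return GPU_POWER_KW[self.gpu_type] * self.gpu_count * self.hours * self.pue

    def emissions_tonnes(self) -> float:
        # kWh x gCO2/kWh gives grams; divide by 1e6 to get tonnes
        return self.energy_kwh() * CARBON_INTENSITY[self.region] / 1e6

    def equivalents(self) -> dict:
        # Rough contextual equivalents (assumed conversion factors)
        t = self.emissions_tonnes()
        return {
            "transatlantic_flights": t / 1.6,  # ~1.6 tCO2 per one-way flight
            "us_household_years": t / 4.2,     # ~4.2 tCO2 of electricity per year
            "tree_years": t / 0.0225,          # ~22.5 kg CO2 absorbed per tree-year
        }

# Chapter example: 2,048 H100 GPUs for 30 days in Virginia
run = CarbonEstimator("H100", 2048, 30 * 24, "virginia")
print(f"{run.emissions_tonnes():.0f} tCO2, "
      f"~{run.equivalents()['transatlantic_flights']:.0f} flights")
```

The same object supports step 2 of the framework below: swap only the `region` field and re-estimate to compare siting choices.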
Key Concepts
| Term | Definition |
|---|---|
| Carbon footprint | The total greenhouse gas emissions produced by an activity, expressed in equivalent tonnes of CO2 (tCO2e). For AI, this includes training energy, data center overhead, and potentially embodied carbon in hardware. |
| Carbon intensity | Grams of CO2 emitted per kilowatt-hour of electricity generated (gCO2/kWh). Varies dramatically by region based on the energy mix of the local grid. |
| PUE (Power Usage Effectiveness) | The ratio of total data center energy to computing equipment energy. A PUE of 1.1 means 10% overhead for cooling and infrastructure; the industry average of ~1.58 means 58% overhead. |
| Green AI | An approach to AI research that prioritizes computational efficiency alongside accuracy, advocating for reporting compute budgets and treating efficiency as a first-class metric. |
| Red AI | The trend toward ever-larger models that achieve marginal accuracy improvements at enormous and growing computational cost. |
| Model compression | Techniques (pruning, quantization, knowledge distillation) that reduce model size and computational requirements with minimal performance loss. |
| Knowledge distillation | Training a smaller "student" model to mimic a larger "teacher" model, transferring capability at reduced computational cost. |
| Carbon-aware scheduling | Timing training runs to coincide with periods of low grid carbon intensity (e.g., peak solar or wind generation). |
| Rebound effect (Jevons paradox) | The phenomenon where efficiency gains reduce costs and increase demand, potentially offsetting or exceeding the environmental savings. |
| E-waste | Electronic waste from retired hardware, containing toxic materials (lead, mercury, cadmium) and disproportionately processed in the Global South. |
| Environmental data justice | The principle that environmental costs of data infrastructure should be transparently reported, equitably distributed, and governed with participation from affected communities. |
| Water footprint | The volume of fresh water consumed by data center cooling, estimated at ~700,000 liters for training GPT-3 and ~500 ml per typical ChatGPT conversation. |
| CarbonEstimator | The Python dataclass tool from this chapter that estimates training emissions from GPU type, count, duration, and region. |
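Carbon-aware scheduling, one of the mitigation techniques defined above, can be illustrated with a short sketch. The hourly forecast here is a made-up stand-in; in practice the numbers would come from a grid-data provider rather than a hard-coded list.

```python
def pick_greenest_window(forecast: list[float], run_hours: int) -> int:
    """Return the start hour whose window has the lowest mean carbon intensity.

    forecast: hourly grid carbon intensity (gCO2/kWh) for the coming day.
    """
    best_start, best_mean = 0, float("inf")
    for start in range(len(forecast) - run_hours + 1):
        window_mean = sum(forecast[start:start + run_hours]) / run_hours
        if window_mean < best_mean:
            best_start, best_mean = start, window_mean
    return best_start

# Hypothetical 24-hour forecast: intensity dips midday with peak solar
forecast = [420, 410, 400, 390, 380, 350, 300, 250,
            200, 160, 140, 130, 135, 150, 190, 240,
            300, 360, 400, 420, 430, 440, 435, 425]
start = pick_greenest_window(forecast, run_hours=4)
print(f"Schedule the 4-hour job to start at hour {start}")  # hour 10, midday
```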
Key Debates
- Should AI model developers be required to disclose training carbon footprints? Arguments for: transparency enables governance, accountability, and efficiency incentives. Arguments against: competitive sensitivity, measurement complexity, potential to stifle innovation. The comparison to financial reporting requirements (which mandate disclosure despite competitive sensitivity) suggests disclosure is feasible.
- Does Green AI address the fundamental problem? Efficiency improvements are necessary but may be insufficient due to the rebound effect. Are governance mechanisms (carbon pricing, emissions caps) needed alongside technical solutions?
- How should we weigh environmental costs against environmental benefits? Climate models require enormous computation. Conservation AI protects ecosystems. Should these applications be exempt from carbon constraints? Who decides?
- Is it ethical to train large models in high-carbon regions when low-carbon alternatives exist? A 91% reduction in emissions is available simply by training in Quebec instead of Virginia. Should companies be required to choose the lowest-carbon option, or is this a voluntary sustainability decision?
Applied Framework: Environmental Impact Assessment for AI
When evaluating the environmental impact of any AI system, work through these six steps:
| Step | Action | Key Question |
|---|---|---|
| 1. Measure | Calculate energy and carbon using tools like CarbonEstimator | How much energy does this system consume, and what carbon does it produce? |
| 2. Compare | Evaluate alternatives (different hardware, regions, model sizes) | Could the same result be achieved with less environmental cost? |
| 3. Contextualize | Express in familiar equivalents | How does this compare to flights, household electricity, trees needed? |
| 4. Mitigate | Apply Green AI techniques and carbon-aware practices | What can be done to reduce the impact? |
| 5. Distribute | Ask who bears the costs | Are environmental costs equitably distributed, or externalized to vulnerable communities? |
| 6. Report | Disclose alongside performance metrics | Is the environmental cost visible to stakeholders and the public? |
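Steps 1 through 3 of the framework (measure, compare, contextualize) compose naturally in code. A minimal sketch, using the regional intensity figures quoted earlier in the chapter; the helper function and the 1 GWh workload are illustrative assumptions.

```python
# Regional carbon intensities (gCO2/kWh) from the chapter's examples
INTENSITY = {"quebec": 1.2, "south_africa": 900.0}

def compare_regions(energy_kwh: float) -> dict:
    """Step 2: estimate emissions (tonnes CO2) for the same workload by region."""
    return {region: energy_kwh * g / 1e6 for region, g in INTENSITY.items()}

# Step 1 gives us a measured workload; here, a hypothetical 1 GWh training run
tonnes = compare_regions(1_000_000)

# Step 3: contextualize, ordered from cleanest to dirtiest siting choice
for region, t in sorted(tonnes.items(), key=lambda kv: kv[1]):
    print(f"{region}: {t:.1f} tCO2 (~{t / 1.6:.0f} one-way transatlantic flights)")
```

Steps 4 through 6 are organizational rather than computational: mitigate, ask who bears the cost, and publish the result alongside accuracy numbers.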
Looking Ahead
From the environmental impacts of data infrastructure, we turn to a uniquely vulnerable population. In Chapter 35, "Children, Teens, and Digital Vulnerability," we examine how data systems affect young people — from COPPA and the UK Age Appropriate Design Code to the mental health effects of algorithmic social media and the pandemic-era expansion of educational surveillance technology. VitraMed's pediatric health data introduces additional ethical complexity as Mira confronts the governance of data about children who cannot consent for themselves.
Use this summary as a study reference and a quick-access card for key vocabulary. The Environmental Impact Assessment framework applies to any AI system — from a fine-tuned classification model to a frontier language model — and should become standard practice in responsible AI development.