Exercises: Environmental Data Ethics and Climate

These exercises progress from concept checks to challenging applications. Estimated completion time: 3-4 hours. Exercises in Part C require Python.

Difficulty Guide:
- Star-1 Foundational (5-10 min each)
- Star-2 Intermediate (10-20 min each)
- Star-3 Challenging (20-40 min each)
- Star-4 Advanced/Research (40+ min each)

Part A: Conceptual Understanding (Star-1)

Test your grasp of core concepts from Chapter 34.

A.1. Section 34.1.1 states that data centers consumed approximately 460 terawatt-hours of electricity globally in 2024. Express this as a percentage of global electricity consumption and identify one country whose total electricity consumption is comparable.

A.2. Explain the concept of Power Usage Effectiveness (PUE) as described in Section 34.1.1. Why does a PUE of 1.58 (industry average) versus 1.10 (Google's reported average) matter for environmental impact calculations? Calculate how much additional energy a data center with PUE 1.58 uses compared to one with PUE 1.10 for the same computational workload.
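As a starting point for the calculation, the sketch below assumes the common definition of PUE as total facility energy divided by the energy delivered to IT equipment. The function names and the 1,000 kWh workload are hypothetical, and the PUE values are deliberately different from those in the exercise.

```python
def facility_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Total facility energy for an IT load, assuming PUE = total energy / IT energy."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_energy_kwh * pue


def extra_energy_fraction(pue_high: float, pue_low: float) -> float:
    """Fractional extra facility energy at pue_high relative to pue_low, same IT load."""
    return (pue_high - pue_low) / pue_low


# Illustrative values only (not the figures from the exercise).
it_load_kwh = 1000.0
print(facility_energy_kwh(it_load_kwh, 2.0))   # facility energy at PUE 2.0
print(facility_energy_kwh(it_load_kwh, 1.25))  # facility energy at PUE 1.25
print(f"{extra_energy_fraction(2.0, 1.25):.0%} more energy at PUE 2.0")
```

Because the IT load cancels out, the comparison depends only on the ratio of the two PUE values.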

A.3. Section 34.1.2 presents a table of carbon intensity by region. Explain why the same computation can produce vastly different carbon emissions depending on where it is performed. Then explain Dr. Adeyemi's observation that "location is a governance choice."

A.4. Define the "rebound effect" (Jevons paradox) as described in Section 34.4.3. Explain how it applies to Green AI efficiency improvements and why it means that technical efficiency alone cannot solve the environmental challenge of AI.

A.5. Section 34.5.2 raises governance questions about the integration of indigenous ecological knowledge into data systems. List three specific questions the section identifies and explain why each is important for environmental data justice.

A.6. Section 34.6.1 identifies four patterns of environmental injustice related to data infrastructure. Name each pattern and provide one specific example for each.

A.7. What is the Green AI movement (Section 34.4.1), and what three practices does it advocate? How does Green AI differ from what its authors call "Red AI"?

Part B: Applied Analysis (Star-2)

Analyze scenarios, arguments, and real-world situations using concepts from Chapter 34.

B.1. A technology company announces that it has achieved "carbon neutrality" for its data center operations by purchasing renewable energy credits (RECs) and carbon offsets. Using concepts from Sections 34.1 and 34.6, critically evaluate this claim. Consider: (a) Does carbon neutrality through offsets address water consumption and e-waste? (b) Do RECs guarantee that the data center actually runs on renewable energy? (c) Does carbon neutrality address the environmental justice concerns about who bears the costs?

B.2. Mira asks: "VitraMed is training models that predict patient health risks. The models save lives. But training them produces carbon emissions. How do we weigh the health benefits against the environmental costs?" (Section 34.2.4). Develop a framework for making this trade-off. Your framework should include at least four criteria and should address Dr. Adeyemi's follow-up question: "Who gets to make that weighing? And who bears the consequences?"

B.3. Section 34.3.3 notes that OpenAI did not disclose the carbon footprint of training GPT-4. Argue for or against mandatory carbon disclosure requirements for AI model training, modeled on existing environmental reporting requirements in other industries. In your argument, address: (a) what information should be disclosed, (b) to whom, (c) what enforcement mechanisms should apply, and (d) what counterarguments exist.

B.4. The chapter describes the "dual role" of data systems (Section 34.5) — contributing to environmental harm through energy consumption while enabling environmental monitoring and climate science. Analyze a specific AI application for climate or environmental purposes (e.g., wildfire prediction, renewable energy optimization, species identification from camera traps) and evaluate whether its environmental benefits outweigh its computational costs. Be specific about both sides of the ledger.

B.5. Section 34.4.2 describes several technical approaches to reducing AI's environmental footprint: model compression, efficient architectures, transfer learning, data efficiency, and carbon-aware scheduling. For each technique, explain: (a) how it reduces environmental impact, (b) what trade-offs it involves (if any), and (c) whether it addresses the rebound effect.

B.6. Eli observes that "Indigenous peoples managed these ecosystems sustainably for thousands of years. Now their knowledge is being digitized... and used to train models that indigenous communities may never benefit from" (Section 34.5.2). Apply the CARE Principles from Chapter 32 to the specific case of digitizing indigenous ecological knowledge for AI-powered environmental monitoring. What governance mechanisms would ensure that this process serves indigenous communities rather than extracting from them?

Part C: Python & Computational Exercises (Star-2 to Star-3)

These exercises require Python and use the CarbonEstimator class from Section 34.2.

C.1. (Star-2) Basic Estimation. Using the CarbonEstimator class from Section 34.2.2, calculate the carbon footprint for the following training scenarios and compare them:

(a) Fine-tuning a model on 4 A100 GPUs for 48 hours in us-east
(b) The same fine-tuning on 4 A100 GPUs for 48 hours in canada-central
(c) Fine-tuning on 4 H100 GPUs for 24 hours in us-east (half the total GPU-hours, on newer hardware)

For each, report total energy (kWh), total carbon (kg CO2), and the number of transatlantic flights equivalent. Write a paragraph interpreting the differences.
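To sanity-check results by hand, the arithmetic behind an estimate of this kind can be sketched as follows. This is not the chapter's CarbonEstimator class: the function name, the 0.4 kW per-GPU power draw, the PUE of 1.10, and both grid intensities are placeholder assumptions chosen only to show the shape of the calculation.

```python
def estimate_training_carbon(num_gpus, gpu_power_kw, hours, pue, grid_gco2_per_kwh):
    """Return (energy_kwh, carbon_kg) for a training run.

    Assumes every GPU draws its full rated power for the whole run;
    real utilization is usually lower.
    """
    energy_kwh = num_gpus * gpu_power_kw * hours * pue
    carbon_kg = energy_kwh * grid_gco2_per_kwh / 1000.0  # grams -> kilograms
    return energy_kwh, carbon_kg


# Placeholder inputs: 0.4 kW per GPU, PUE 1.10, and two made-up grid intensities.
for label, intensity in [("higher-carbon grid", 400.0), ("lower-carbon grid", 100.0)]:
    energy, carbon = estimate_training_carbon(4, 0.4, 48, 1.10, intensity)
    print(f"{label}: {energy:.1f} kWh, {carbon:.1f} kg CO2")
```

Energy is identical in both cases; only the grid intensity changes the carbon figure, which is the effect scenarios (a) and (b) isolate.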

C.2. (Star-2) Regional Comparison. Use the compare_regions method to compare the carbon emissions of training a model on 256 H100 GPUs for 14 days across all available regions. Identify the three lowest-carbon regions and the three highest-carbon regions. Calculate the ratio between the highest and lowest. Write a paragraph explaining what this ratio means for governance decisions about data center location.

C.3. (Star-3) Extending the Estimator. The CarbonEstimator class does not account for embodied carbon in GPU manufacturing. Modify the class to include an optional include_embodied_carbon parameter. When enabled, it should add an estimated 175 kg CO2 per GPU to the total emissions (representing the manufacturing footprint amortized over a 4-year GPU lifespan, applied proportionally to training duration). Recalculate Examples 1 and 2 from the chapter with this modification. How significant is embodied carbon relative to operational carbon?
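One possible reading of the amortization rule is sketched below as standalone functions, not as a modification of the chapter's class. It assumes that 175 kg CO2 is the full manufacturing footprint of one GPU and that a training run is charged the fraction of a 4-year lifespan it occupies. If you read 175 kg as an annual share instead, the proration changes accordingly. All names other than include_embodied_carbon are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap days


def embodied_carbon_kg(num_gpus, training_hours,
                       per_gpu_kg=175.0, lifespan_years=4.0):
    """Embodied carbon charged to one run, prorated by share of GPU lifespan.

    Assumption: per_gpu_kg is the whole manufacturing footprint of one GPU.
    """
    lifetime_hours = lifespan_years * HOURS_PER_YEAR
    share_of_life = training_hours / lifetime_hours
    return num_gpus * per_gpu_kg * share_of_life


def total_carbon_kg(operational_kg, num_gpus, training_hours,
                    include_embodied_carbon=False):
    """Operational carbon, plus prorated embodied carbon when requested."""
    if not include_embodied_carbon:
        return operational_kg
    return operational_kg + embodied_carbon_kg(num_gpus, training_hours)


# Hypothetical run: 8 GPUs for 100 hours with 50 kg of operational carbon.
print(total_carbon_kg(50.0, 8, 100))
print(total_carbon_kg(50.0, 8, 100, include_embodied_carbon=True))
```

Keeping the flag off by default preserves the original behavior, so earlier results remain reproducible.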

C.4. (Star-3) Water Footprint Extension. Add a water_footprint method to the CarbonEstimator class that estimates water consumption based on energy usage. Use the approximation that data center cooling requires approximately 1.8 liters of water per kWh of energy consumed (a rough average across cooling technologies). The method should return total liters and a comparison to familiar equivalents (bathtubs at 190 liters each, Olympic swimming pools at 2,500,000 liters). Test with Example 2.
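The conversion itself can be sketched as a standalone function before wiring it into the class. The constants come from the exercise text; the function signature, the dictionary keys, and the 1,000 kWh input are hypothetical choices for illustration.

```python
LITERS_PER_KWH = 1.8             # rough cooling-water average from the exercise
BATHTUB_LITERS = 190             # per the exercise
OLYMPIC_POOL_LITERS = 2_500_000  # per the exercise


def water_footprint(energy_kwh):
    """Estimate cooling water for a given energy use, with familiar equivalents."""
    liters = energy_kwh * LITERS_PER_KWH
    return {
        "liters": liters,
        "bathtubs": liters / BATHTUB_LITERS,
        "olympic_pools": liters / OLYMPIC_POOL_LITERS,
    }


# Hypothetical energy figure, not Example 2 from the chapter.
for name, value in water_footprint(1000.0).items():
    print(f"{name}: {value:.5f}")
```

As a class method, the same logic would take its energy figure from whatever the estimator already computes rather than from an argument.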

C.5. (Star-3) Carbon-Aware Scheduling Simulation. Write a function optimal_training_window that takes a list of hourly carbon intensity values (representing a 24-hour cycle of grid carbon intensity that varies with renewable energy generation) and a training duration in hours, and returns the optimal start time that minimizes total carbon emissions. Test with the following scenario: a grid that has carbon intensity of 400 gCO2/kWh at night (hours 0-6 and 20-23), 200 gCO2/kWh during peak solar (hours 10-16), and 300 gCO2/kWh during transition hours (hours 7-9 and 17-19). What is the optimal start time for a 6-hour training run? If more than one start time ties for the minimum, state how your function breaks the tie.
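The sketch below sets up the test scenario and scores a single candidate start time, leaving the search over start times to the exercise. It treats the day as hours 0-23 (so the late-night band is hours 20-23), assumes constant power draw so that emissions are proportional to the summed intensity, and assumes a run may wrap past midnight into the same daily cycle. Both helper names are hypothetical.

```python
def build_profile():
    """Hourly grid carbon intensity (gCO2/kWh) for the exercise scenario."""
    profile = []
    for hour in range(24):
        if 10 <= hour <= 16:
            profile.append(200)   # peak solar
        elif 7 <= hour <= 9 or 17 <= hour <= 19:
            profile.append(300)   # transition hours
        else:
            profile.append(400)   # night
    return profile


def window_intensity_sum(profile, start_hour, duration_hours):
    """Sum of hourly intensities over a run, wrapping around midnight.

    With constant power draw, total emissions are proportional to this sum.
    """
    n = len(profile)
    return sum(profile[(start_hour + offset) % n]
               for offset in range(duration_hours))


profile = build_profile()
print(window_intensity_sum(profile, 0, 6))   # a run starting at midnight
print(window_intensity_sum(profile, 22, 6))  # a run that wraps past midnight
```

An optimal_training_window function can then compare this score across all candidate start hours.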

C.6. (Star-3) GPT-4 Estimation Challenge. Section 34.3.3 states that independent estimates of GPT-4's training carbon footprint range from 3,000 to 13,000 tonnes of CO2. Using the CarbonEstimator class, determine a combination of parameters (GPU type, number of GPUs, training hours, and cloud region) that would produce a carbon estimate in this range. Explain your parameter choices and discuss the uncertainties involved.

Part D: Synthesis & Critical Thinking (Star-3)

These questions require you to integrate multiple concepts from Chapter 34 and think beyond the material presented.

D.1. The chapter describes the environmental costs of AI and the environmental benefits of AI (climate modeling, conservation, renewable energy optimization). Write a balanced analysis (300-500 words) that avoids both the narrative that "AI will solve climate change" and the narrative that "AI is destroying the planet." What governance frameworks would help maximize the environmental benefits while minimizing the costs?

D.2. Section 34.6 argues that the environmental costs of data infrastructure are not distributed equitably. Connect this argument to the data colonialism framework from Chapter 32 and the labor analysis from Chapter 33. How do the environmental costs of data infrastructure compound the inequalities identified in those chapters? Use specific examples to show how a single community might simultaneously bear the costs of digital redlining (Chapter 32), exploitative labor practices (Chapter 33), and environmental harm from data infrastructure (Chapter 34).

D.3. The CarbonEstimator class makes the environmental cost of model training visible. Dr. Adeyemi asks: "Who gets to make that weighing? And who bears the consequences?" (Section 34.2.4). Apply this question to the following scenario: A company in San Francisco trains a large language model in a data center in India (where carbon intensity is approximately 700 gCO2/kWh) because cloud computing costs are lower there. Analyze the Power Asymmetry and Accountability Gap in this scenario.

D.4. Strubell et al. (2019) argued that "carbon emissions should be reported alongside accuracy metrics" in machine learning publications (Section 34.3.1). Design a comprehensive "environmental impact card" that should accompany every published AI model. What metrics should be included? Who should verify them? Should models with high environmental costs be held to a higher standard of demonstrated benefit?

Part E: Research & Extension (Star-4)

These are open-ended projects for students seeking deeper engagement.

E.1. Data Center Environmental Justice Investigation. Research a specific data center project that has faced community opposition (examples: Meta's data center in Gallatin, Tennessee; Google's data center in The Dalles, Oregon; Microsoft's data centers in Arizona). Write a 1,000-word report covering: (a) the environmental concerns raised by the community, (b) the company's response, (c) the regulatory framework governing the project, (d) the environmental justice dimensions (who benefits, who bears the costs), and (e) the outcome.

E.2. E-Waste Supply Chain Analysis. Research the lifecycle of a GPU used for AI training — from mineral extraction through manufacturing, use, and disposal. Write a 1,000-word analysis covering: (a) the minerals required and where they are mined, (b) the manufacturing process and its environmental footprint, (c) the operational lifespan and energy consumption, (d) what happens when the GPU is retired, and (e) who bears the environmental and health costs at each stage of the lifecycle.

E.3. Carbon Reporting in Practice. Research how at least three major AI companies (Google, Microsoft, Meta, OpenAI, Amazon) report (or fail to report) the carbon footprint of their AI operations. Write a comparative analysis (800-1,200 words) evaluating: (a) what each company discloses, (b) what methodology they use, (c) what is notably absent from their reports, and (d) how their reporting compares to environmental reporting requirements in other industries (e.g., the SEC's proposed climate disclosure rules).

E.4. Environmental Monitoring and Indigenous Rights. Research a specific case where environmental monitoring technology (satellite observation, sensor networks, AI-powered analysis) has intersected with indigenous land rights. Write a 1,000-word case study analyzing: (a) what technology was deployed, (b) how it affected indigenous communities, (c) whether indigenous governance principles were respected, (d) what the outcome was, and (e) what lessons it offers for environmental data governance.

Solutions

Selected solutions are available in appendices/answers-to-selected.md.