
Chapter 22: Birdsong Monitoring and Environmental Surveillance

Opening: The Microphone in the Forest

Jordan Ellis is walking across Hartwell University's main campus on a Tuesday morning in October, backpack over one shoulder, late for Dr. Osei's class. The path cuts through a small woodland grove — the kind of carefully managed "natural area" that universities install between buildings to signal their relationship to the environment.

Jordan pauses. Something is mounted on a tree at approximately chest height: a small weatherproof housing, roughly the size of a hardcover book, bolted to the trunk with a data cable running up into the canopy and disappearing into a conduit along the building wall. A small LED blinks green. Jordan has walked this path hundreds of times and never noticed it before. They photograph it with their phone.

In class, Jordan asks Dr. Osei about it. The professor nods — she knows the device.

"That's part of the campus passive acoustic monitoring network," she explains. "It's picking up bird vocalizations for a long-term biodiversity study. The biology department has been running it for three years."

"So it's recording sound constantly?"

"Continuously. Audio files are uploaded to a server every hour."

Jordan thinks about this. "And it's picking up anything else? Not just birds?"

Dr. Osei pauses, and that pause is itself informative. "It's indiscriminate, yes. Whatever sounds are present. The algorithm filters for bird vocalizations. But the raw audio is all there."

Jordan thinks about the conversations they have had on that path. Arguments. Confessions. Phone calls taken privately because they didn't want to talk inside. The certainty that they were alone, or at least unmonitored, in a semi-wild space.

The bird-listening network was listening to everything.


22.1 Bioacoustics: The Science of Listening to Nature

Bioacoustics is the study of sound production and reception in living organisms — how animals produce sound, what purposes sound serves, and how sound communicates information within and between species. Its surveillance applications arise naturally from this scientific interest: if you want to understand an ecosystem, listening to it is one of the most efficient methods available.

Sound carries information that visual observation often cannot. A bird singing at the top of a tree is easy to hear and difficult to see. A whale communicating across hundreds of kilometers of ocean is acoustically present but visually absent. A bat navigating a cave uses echolocation inaudible to human ears. If you want to know which species are present in a habitat, in what abundance, at what times, and engaging in what behaviors, listening is often more efficient than looking.

What Sound Reveals

The acoustic signatures of biological systems reveal a remarkable range of ecological information:

  • Species presence and identity: Most bird and frog species have distinctive vocalizations. An audio recording of a rainforest canopy, processed by appropriate software, can identify which species are present without requiring a human observer to see each individual animal.

  • Abundance and density: Acoustic monitoring cannot directly count animals, but call rate (how often a vocalization is detected per hour) is correlated with population density in many species. Changes in call rate over time may indicate population trends.

  • Behavior: Birdsong specifically encodes behavioral information. Dawn song peaks (the "dawn chorus") indicate territorial establishment. Alarm calls indicate the presence of predators. Begging calls from nestlings indicate breeding activity. Contact calls maintain flock cohesion.

  • Ecosystem health: The acoustic complexity of a habitat — how many distinct acoustic "niches" in frequency and time are occupied — correlates with biodiversity. A biodiverse rainforest sounds different from a disturbed or recovering one. Ecoacousticians have developed indices (the Acoustic Complexity Index, the Bioacoustic Index) to quantify this relationship; a sketch of one such index follows this list.

  • Human disturbance: When human-generated noise (traffic, construction, machinery) intrudes on natural habitats, it displaces or masks animal vocalizations. Acoustic monitoring can document this displacement, providing evidence of ecological impact from noise pollution.
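
These indices are straightforward to compute. Below is a minimal Python sketch of the Acoustic Complexity Index in one common formulation — frame-to-frame intensity change in each frequency bin, normalized by that bin's total intensity. The function name and parameters are illustrative rather than drawn from any particular toolkit.

Code Sketch: A Minimal Acoustic Complexity Index (Python)

import numpy as np
from scipy import signal

def acoustic_complexity_index(audio: np.ndarray, sample_rate: int) -> float:
    """ACI: per-bin intensity change over time, normalized by bin totals."""
    # Spectrogram: rows are frequency bins, columns are time frames.
    _, _, spec = signal.spectrogram(audio, fs=sample_rate, nperseg=512)
    change = np.abs(np.diff(spec, axis=1)).sum(axis=1)  # variation per bin
    total = spec.sum(axis=1) + 1e-12                    # avoid divide-by-zero
    return float((change / total).sum())

# Smoke test: a pulsed, birdlike tone scores higher than a steady drone.
rate = 22_050
t = np.arange(5 * rate) / rate
noise = 0.01 * np.random.default_rng(0).standard_normal(t.size)
pulsed = np.sin(2 * np.pi * 4000 * t) * (np.sin(2 * np.pi * 6 * t) > 0) + noise
steady = np.sin(2 * np.pi * 4000 * t) + noise
print(acoustic_complexity_index(pulsed, rate) > acoustic_complexity_index(steady, rate))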

💡 Intuition: The Spectrogram as Surveillance Record

A spectrogram is a visual representation of sound — time on the horizontal axis, frequency on the vertical axis, and intensity represented by color or darkness. When a bird sings, its vocalization appears in the spectrogram as a distinctive shape — species-specific, recognizable, identifiable by trained analysts or machine learning systems. Now consider: a human voice also produces a distinctive spectrogram pattern. The spectrogram does not distinguish between a bird calling to establish territory and a human calling to arrange a protest. Both are patterns in frequency and time. Both are equally visible to the algorithm.
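
To make the intuition concrete, here is a minimal sketch of producing a spectrogram with standard Python scientific libraries; the file name is a placeholder for any PAM recording.

Code Sketch: From Audio File to Spectrogram (Python)

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

rate, audio = wavfile.read("grove_recording.wav")  # placeholder PAM clip
if audio.ndim > 1:                                 # mix stereo down to mono
    audio = audio.mean(axis=1)

freqs, times, spec = signal.spectrogram(audio, fs=rate, nperseg=1024)

plt.pcolormesh(times, freqs, 10 * np.log10(spec + 1e-12))  # intensity in dB
plt.xlabel("Time (s)")        # horizontal axis: time
plt.ylabel("Frequency (Hz)")  # vertical axis: frequency
plt.show()

Whatever is in the file — dawn chorus or a phone call — lands in the same image.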


22.2 Passive Acoustic Monitoring: The Architecture of Environmental Listening

Passive Acoustic Monitoring (PAM) refers to the deployment of autonomous recording devices in the environment to continuously capture sounds without active intervention. "Passive" distinguishes this from "active" acoustic methods like sonar, which transmit sound and record the echo. PAM devices simply listen.

The Hardware of PAM

A typical terrestrial PAM unit consists of:

  • Microphone array: One or more directional or omnidirectional microphones, weatherproofed, tuned to the frequency ranges of interest (for bird monitoring: typically 100 Hz to 12 kHz; for bat monitoring: up to 150 kHz ultrasound)
  • Recording system: Digital audio recorder capturing at appropriate sampling rates (44.1 kHz for audible frequencies; several hundred kHz for ultrasonic bat work)
  • Storage: SD cards or internal flash memory capable of holding weeks of recordings
  • Power: Solar panels, battery packs, or mains power depending on location
  • Data transmission: Increasingly, cellular or WiFi connectivity for real-time data upload; historically, physical retrieval of storage media
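
Those components imply substantial data volumes, which is why storage and transmission dominate PAM design. A back-of-envelope calculation, assuming continuous uncompressed 16-bit mono recording at the audible-range sampling rate listed above:

Code Sketch: Raw Audio Storage Requirements (Python)

BYTES_PER_SAMPLE = 2      # 16-bit samples
SAMPLE_RATE = 44_100      # samples per second (audible-range setting)
SECONDS_PER_DAY = 86_400

bytes_per_day = BYTES_PER_SAMPLE * SAMPLE_RATE * SECONDS_PER_DAY
print(f"{bytes_per_day / 1e9:.1f} GB per day")  # ~7.6 GB/day
print(f"{bytes_per_day * 7 / 1e9:.1f} GB per week per continuously recording unit")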

Process Diagram: A PAM System in Operation

Environment produces sound
        ↓
Microphone captures sound waves → Converted to digital audio signal
        ↓
Onboard computer records audio in time-stamped files
        ↓
[Option A: Offline processing]
Researcher retrieves storage media
Data transferred to analysis computer
        ↓
[Option B: Online processing]
Cellular/WiFi upload → Cloud server receives files in near-real time
        ↓
Algorithm processes audio:
  - Species identification
  - Call rate calculation
  - Acoustic index calculation
        ↓
Data entered into database
        ↓
Analysis: Population trends, species distribution, ecosystem health

PAM devices have become dramatically cheaper and smaller over the past decade. A commercially available AudioMoth recorder — a popular open-source PAM device developed by researchers at the University of Southampton — costs approximately $50 USD and is small enough to fit in a shirt pocket. Thousands have been deployed in research projects worldwide.

The Scale of Environmental Acoustic Surveillance

The cumulative scale of PAM deployment is remarkable. Individual research projects deploy dozens to hundreds of devices. The Ocean Biodiversity Information System (OBIS) and the Integrated Ocean Observing System (IOOS) include acoustic monitoring buoys at hundreds of sites across global oceans. The Rainforest Connection project has deployed acoustic monitors in more than 30 countries, using the sound data to detect illegal chainsaw operations in near-real time.

Acoustic buoys in ocean environments monitor whale populations across entire ocean basins. The Cornell Lab of Ornithology's Bioacoustics Research Program has maintained acoustic monitoring arrays in the Pacific and Atlantic for decades, detecting the vocalizations of endangered whale species — including species whose acoustic repertoires were entirely unknown before passive monitoring began.

The aggregated infrastructure of environmental acoustic surveillance is enormous — thousands of devices, monitoring forests, oceans, wetlands, and urban habitats, continuously, generating petabytes of audio data annually.

📊 Real-World Application: The Rainforest Connection Guardian Project

Rainforest Connection, a San Francisco–based nonprofit, deploys modified Android smartphones mounted in rainforest canopies to monitor for the sounds of illegal logging — chainsaws, trucks, and other mechanical sounds that indicate unauthorized clearing. When the system detects these sounds, it sends real-time alerts to rangers who can respond immediately. The project has been deployed in Ecuador, Sumatra, the Democratic Republic of Congo, and elsewhere. It represents a form of environmental surveillance with direct enforcement implications — acoustic monitoring used not just to document ecological change but to trigger active intervention. The line between environmental monitoring and enforcement-oriented surveillance is crossed.


22.3 The Cornell Lab and eBird: Citizen Science as Surveillance Network

The largest bird observation database in the world is not produced by government agencies, academic researchers, or satellite systems. It is produced by ordinary birdwatchers — millions of them — through a smartphone application.

eBird, developed by the Cornell Lab of Ornithology and launched in 2002, is a citizen science platform through which amateur and professional birdwatchers submit observations of bird species: what they saw, how many, at what location, on what date. As of 2024, the eBird database contains more than 1.4 billion bird observations from more than 800,000 contributors globally, growing at a rate of more than 100 million observations per year.

eBird as Surveillance Infrastructure

eBird is, functionally, a distributed surveillance network. Its nodes are human observers — birdwatchers — who carry smartphone cameras and GPS receivers, systematically observe and record the presence of species at specific locations, and upload their observations to a central database. The Cornell Lab processes this data, corrects for observer bias (more observers in some areas than others), and produces population trend estimates for hundreds of bird species globally.

The word "surveillance" is not typically applied to eBird, and participants would likely object to the characterization. They are engaging in a recreational activity they love, contributing to science, and sharing observations with a community of enthusiasts. They do not think of themselves as nodes in a surveillance network.

But the structural features are identical to surveillance systems we do apply that label to:

  • Systematic, repeated observation of a defined population (bird species)
  • Spatially and temporally explicit records of presence
  • Central data aggregation for pattern analysis
  • Trend detection over time
  • Data used to inform management decisions (protected area designation, hunting regulations, conservation priorities)
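
To see how generic these features are, consider a hypothetical observation record with those exact fields — an illustration of the structure, not eBird's actual schema or API:

Code Sketch: A Hypothetical Citizen-Observation Record (Python)

from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Observation:
    species: str           # member of the defined population observed
    count: int             # how many individuals
    latitude: float        # spatially explicit ...
    longitude: float
    observed_at: datetime  # ... and temporally explicit
    observer_id: str       # which node in the network reported it

def monthly_trend(records: list[Observation], species: str) -> Counter:
    """Central aggregation for pattern analysis: detections per month."""
    return Counter(r.observed_at.strftime("%Y-%m")
                   for r in records if r.species == species)

Nothing in this structure is specific to birds.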

What distinguishes eBird from a surveillance system in the pejorative sense is the absence of coercive power — eBird's subjects (birds) cannot be harmed by having their presence documented. Structurally, however, the architecture is identical.

🔗 Connection: Citizen Science and the Synoptic Gaze

eBird inverts the panoptic model. Rather than one central observer watching many subjects, it is many distributed observers — the 800,000 eBird contributors — watching a distributed subject population (the world's birds). This is the synoptic model (many watching) rather than the panoptic (one watching many). The chapter on synopticism (Chapter 4) introduced this concept; eBird is a clean empirical example of synoptic monitoring at scale. The question is whether the synoptic model is more or less problematic than the panoptic one — a question that gets complicated when the surveillance turns from birds to people.

From eBird to People-Watching

The methodological toolkit of eBird — geolocation, time-stamping, repeated observation, pattern analysis, population modeling — is directly transferable to human surveillance. Replace "bird species" with "political organizations." Replace "eBird contributors" with "police informants." Replace "Cornell Lab" with a fusion center. The architecture is the same.

This is not a theoretical concern. Law enforcement agencies have used citizen-produced observation data — photographs submitted to databases, videos posted to social media, witness reports compiled through apps — as inputs to surveillance and enforcement systems. The line between "citizen science" and "crowdsourced surveillance" is thinner than it appears.


22.4 Machine Learning and BirdNET: The Algorithm That Listens

Manually processing the enormous volumes of audio data produced by PAM systems is impossible — a single AudioMoth recorder duty-cycled to record half of each day produces approximately 84 hours of audio per week, and a network of 50 recorders produces 4,200 hours per week. No research team has the human capacity to listen to that volume of data.

The solution is machine learning — specifically, acoustic classification systems trained to identify species from their vocalizations.

BirdNET

BirdNET is an AI-based bird sound recognition system developed by the Cornell Lab of Ornithology and the Chemnitz University of Technology. Released in 2019 and continuously improved, BirdNET can identify more than 6,000 bird species from audio recordings — covering most of the world's bird fauna. The system processes a 3-second audio clip in approximately 0.1 seconds, enabling the analysis of thousands of hours of PAM data in practical timeframes.

BirdNET is also available as a free smartphone app, enabling any user to record bird sounds and receive immediate species identification. As of 2024, the app has been downloaded more than 3 million times.

How BirdNET Works (Process Diagram):

Input: Audio file or stream (WAV, MP3, etc.)
        ↓
Preprocessing: Convert to spectrogram
  (Visual representation of frequency over time)
        ↓
Segmentation: Divide spectrogram into 3-second windows
        ↓
Neural network analysis:
  Convolutional neural network processes each spectrogram segment
  Trained on 500,000+ labeled bird sound recordings
        ↓
Output: Species identification with confidence score
  (e.g., "American Robin: 0.94 confidence")
        ↓
Filtering: Confidence threshold applied
  (Low-confidence identifications flagged for human review)
        ↓
Database entry: Species, time, location, confidence score
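
The windowing, thresholding, and review-queue logic in that diagram can be sketched in a few lines. The classify() function below is a toy stand-in for BirdNET's trained convolutional network — a placeholder, not the real model; everything around it mirrors the pipeline.

Code Sketch: A Schematic Detection Pipeline (Python)

import numpy as np
from scipy import signal

WINDOW_SECONDS = 3
CONFIDENCE_THRESHOLD = 0.7   # below this, flag for human review

def classify(spec: np.ndarray, freqs: np.ndarray) -> tuple[str, float]:
    """Toy stand-in for the trained CNN: labels by dominant frequency."""
    energy = spec.sum(axis=1)
    peak_hz = float(freqs[energy.argmax()])
    confidence = float(energy.max() / (energy.sum() + 1e-12))
    return f"vocalization near {peak_hz:.0f} Hz", confidence

def process_recording(audio: np.ndarray, rate: int):
    detections, review_queue = [], []
    step = WINDOW_SECONDS * rate
    for start in range(0, len(audio) - step + 1, step):  # 3-second windows
        freqs, _, spec = signal.spectrogram(audio[start:start + step], fs=rate)
        label, conf = classify(spec, freqs)
        record = {"label": label, "confidence": round(conf, 2),
                  "offset_s": start // rate}
        (detections if conf >= CONFIDENCE_THRESHOLD
         else review_queue).append(record)
    return detections, review_queue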

Evidence Evaluation: When to Trust BirdNET

BirdNET's accuracy varies significantly by context:

Condition                                            | Performance
Common North American species, clear recordings      | High (>90% accuracy)
Rare or poorly represented species                   | Lower (variable)
High background noise                                | Reduced (false positives increase)
Multiple species calling simultaneously              | Reduced
Geographic regions underrepresented in training data | Significantly reduced

This variability matters for using BirdNET data in research contexts — a point that generalizes to all AI-based surveillance systems. Machine learning systems trained on particular datasets perform well on data similar to their training data and less well on novel or underrepresented cases. Understanding the training data composition is essential for evaluating system performance.

🎓 Advanced: Transfer Learning and Acoustic Surveillance

BirdNET's architecture — a convolutional neural network trained on spectrogram images — is identical to the architecture used in many human surveillance applications. Facial recognition systems analyze visual patterns in face images; BirdNET analyzes visual patterns in sound spectrograms. The mathematical operations are the same; only the training data differs. This means that the technical infrastructure of environmental acoustic AI — the same model architectures, training pipelines, and deployment platforms — can be directly repurposed for human acoustic surveillance. A system trained to identify bird species can, with different training data, be trained to identify individual human voices. This technical continuity between environmental and human AI surveillance is not a future concern but a present reality.
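
A minimal PyTorch sketch of this continuity, with an illustrative architecture and made-up class counts — not BirdNET's actual network: the convolutional feature extractor is unchanged between tasks; only the final layer and the training labels decide what the system recognizes.

Code Sketch: One Architecture, Two Surveillance Tasks (Python)

import torch.nn as nn

class SpectrogramCNN(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(        # shared feature extractor
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)  # task-specific output layer

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

bird_model = SpectrogramCNN(n_classes=6000)   # trained on bird vocalizations
voice_model = SpectrogramCNN(n_classes=200)   # same architecture, voice labels
# "Transfer": reuse the learned features; only the head and labels change.
voice_model.features.load_state_dict(bird_model.features.state_dict())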


22.5 Beyond Birds: The Full Spectrum of Environmental Acoustic Surveillance

Passive acoustic monitoring is not limited to birds. The technology is used to monitor a remarkable range of biological and environmental phenomena.

Marine Acoustic Monitoring

The ocean is acoustically rich. Whales, dolphins, fish, shrimp, and other marine animals produce sounds across a wide frequency range. The sounds of shipping, sonar, seismic exploration, and offshore wind construction add a human acoustic layer. Acoustic buoys and fixed hydrophone arrays monitor marine acoustic environments for:

  • Endangered whale populations: The North Atlantic right whale, with fewer than 400 individuals remaining, is monitored by an extensive acoustic network that detects individual whale calls and estimates population location and movement
  • Fish populations: Many commercially important fish species produce spawning calls; passive acoustic monitoring can detect spawning aggregations
  • Marine mammal behavior: Feeding, communication, and navigation behaviors are encoded in vocalizations
  • Human disturbance: Shipping noise levels, sonar pulses, and seismic survey sounds are documented to assess their impact on marine life

Bat Monitoring

Bats use ultrasonic echolocation for navigation and prey detection — calls above the threshold of human hearing. Bat acoustic monitors (using ultrasonic microphones) can detect bat activity, identify species from their echolocation pulse characteristics, and monitor population trends. Because many bat species are of conservation concern and because bats serve crucial ecosystem functions (insect pest control, pollination, seed dispersal), bat acoustic monitoring is now a required component of environmental impact assessments for wind energy development and other land use changes in many jurisdictions.

Soundscape Ecology

Soundscape ecology — pioneered by Bernie Krause and elaborated by researchers including Almo Farina and Bryan Pijanowski — treats the acoustic environment of a habitat as a holistic indicator of ecological condition. Rather than monitoring individual species, soundscape ecologists analyze the total acoustic output of a landscape, measuring its complexity, diversity, and structure.

The central concept is the acoustic niche hypothesis: in a biodiverse ecosystem, different species occupy different acoustic frequencies and times of day, much as species occupy different spatial and dietary niches. A degraded ecosystem loses acoustic diversity as species disappear, and this loss is measurable before it becomes visually apparent.

Soundscape ecology has policy implications: acoustic data can serve as early warning systems for habitat degradation, and acoustic restoration targets can guide ecological restoration efforts.

🌍 Global Perspective: Indigenous and Traditional Acoustic Knowledge

Before formal bioacoustics emerged as a scientific discipline, indigenous communities around the world had developed sophisticated frameworks for interpreting environmental sound. The Kaluli people of Papua New Guinea understand the forest through its acoustic character — specific bird calls signal the presence of certain food sources, seasonal changes, and spiritual presences. Indigenous rangers in Australia incorporate bird call interpretation into land management practices. Passive acoustic monitoring technology intersects with this knowledge in complex ways: it may validate and quantify what indigenous observers have long known, or it may marginalize traditional knowledge by privileging data that can be processed by algorithms over knowledge that cannot be reduced to a dataset.


22.6 When Environmental Listening Turns on People: ShotSpotter

The transition from environmental acoustic surveillance to human acoustic surveillance is not hypothetical. It has happened. The most significant example is ShotSpotter (now rebranded as SoundThinking) — a gunshot detection system that deploys acoustic sensors in urban neighborhoods to identify and locate the sound of gunfire.

How ShotSpotter Works

ShotSpotter deploys arrays of acoustic sensors — small devices mounted on light poles, buildings, and other urban infrastructure — in neighborhoods designated as "high-crime" by police departments. The sensors continuously record audio. When the system detects a loud impulsive sound, an algorithm evaluates whether it resembles a gunshot, based on acoustic characteristics (frequency spectrum, duration, decay rate). If the algorithm classifies the sound as a probable gunshot, it:

  1. Triangulates the location of the sound source using time-difference-of-arrival (TDOA) analysis — the same technique used in cetacean acoustic monitoring to locate whale positions (a sketch of this computation follows the list)
  2. Sends an alert to police dispatch with a location estimate (typically within 25 meters)
  3. Provides a brief audio clip of the detected sound for review
  4. In many cities, triggers an automatic police dispatch to the indicated location
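
The triangulation in step 1 is simple geometry. A minimal sketch, using a brute-force grid search rather than the refined solvers production systems use; sensor positions and measured delays are made-up illustrative values:

Code Sketch: TDOA Localization by Grid Search (Python)

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air (~1,500 m/s in seawater for whale work)

sensors = np.array([[0.0, 0.0], [120.0, 0.0], [60.0, 100.0]])  # x, y (m)
measured = np.array([0.05, -0.02])  # arrival delays vs. sensor 0 (seconds)

# Candidate source locations on a grid around the sensor array.
xs, ys = np.meshgrid(np.linspace(-200, 300, 400), np.linspace(-200, 300, 400))
points = np.stack([xs.ravel(), ys.ravel()], axis=1)            # (N, 2)

# Distance from each candidate to each sensor → predicted delay differences.
dist = np.linalg.norm(points[:, None, :] - sensors[None, :, :], axis=2)
predicted = (dist[:, 1:] - dist[:, :1]) / SPEED_OF_SOUND       # (N, 2)

# The candidate that best explains the measured delays is the estimate.
error = ((predicted - measured) ** 2).sum(axis=1)
x, y = points[error.argmin()]
print(f"Estimated source: ({x:.0f} m, {y:.0f} m)")

Swap the speed of sound and the sensor coordinates and the identical code locates a whale from hydrophone delays.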

The system is deployed in more than 100 U.S. cities, with the highest concentrations in major metropolitan areas including New York, Chicago, and San Francisco.

Process Diagram: ShotSpotter Alert Sequence

Urban acoustic sensor array (continuous recording)
        ↓
Impulsive sound detected
        ↓
Algorithm evaluation:
  Frequency spectrum analysis
  Duration/decay profile
  Comparison to trained gunshot model
        ↓
[If confidence > threshold]
        ↓
Multiple sensors detect same event:
  TDOA analysis → Location estimate
        ↓
Human review (in some implementations) OR
Automatic alert generation
        ↓
Alert transmitted to police dispatch
  - Location
  - Audio clip
  - Confidence score
        ↓
Police dispatch → Officers respond

The ShotSpotter Controversy

ShotSpotter has become one of the most contested urban surveillance technologies in the United States, with criticisms falling into several categories:

Accuracy and False Positives

Multiple independent analyses have found ShotSpotter's accuracy lower than the company claims. A 2021 investigation by the MacArthur Justice Center in Chicago found that 89% of ShotSpotter alerts in Chicago over a 21-month period resulted in no evidence of a shooting — no spent casings, no physical evidence, and no victim. The company disputes this characterization, arguing that many gunshots go unreported and leave no physical evidence at the scene.

The core problem is definitional: ShotSpotter's algorithm is trained to identify sounds that resemble gunshots. Many sounds resemble gunshots — fireworks, backfiring vehicles, construction equipment. In an urban environment with continuous acoustic complexity, the false positive rate is difficult to eliminate entirely.

⚠️ Common Pitfall: Algorithmic Authority in Life-or-Death Contexts

When an algorithm classifies an ambiguous sound as a probable gunshot, the resulting police dispatch is a real-world event: officers respond, arriving at a location with heightened alertness and weapons drawn. In several documented cases, ShotSpotter alerts have led to police stops of innocent people in the vicinity of an alert location. The algorithm's classification creates police action; the police action has consequences for people present regardless of whether a gunshot actually occurred. This is the problem of treating algorithmic output as authoritative in high-stakes contexts — the same problem arises in predictive policing, facial recognition, and automated content moderation.

Geographic Concentration and Racial Disparity

ShotSpotter is deployed almost exclusively in neighborhoods that are predominantly Black or Latino. This geographic concentration reflects both crime pattern data and historical patterns of police resource allocation — neighborhoods that have historically received more policing are designated "high-crime" in part because of that heavier policing, creating a feedback loop in which surveillance follows surveillance.

The result is that residents of predominantly Black and Latino neighborhoods are acoustically monitored in ways that residents of white neighborhoods are not. The ambient sounds of daily life — arguments, celebrations, vehicles — are captured and processed in some neighborhoods and not others, creating an asymmetry in acoustic surveillance that maps almost precisely onto existing racial inequalities in policing.

🔗 Connection: Social Sorting Through Acoustic Geography

The geographic concentration of ShotSpotter deployments is a precise example of social sorting — using surveillance technology to divide populations into different categories receiving different treatment. Chapter 2's discussion of social sorting established this as a fundamental surveillance mechanism. ShotSpotter extends that sorting into the acoustic domain: neighborhoods deemed "high-crime" are acoustically monitored; neighborhoods deemed "low-crime" are not. The sorting criterion is ostensibly crime rate, but the deployment pattern follows race and class geography with enough precision that the distinction is largely academic.

Evidentiary Use and Algorithmic Manipulation

A 2021 investigation by the independent technology publication Motherboard documented that ShotSpotter analysts had, in at least one case, altered their audio classification retroactively at the request of a prosecutor — reclassifying a sound as a gunshot after the fact to support a criminal prosecution. This incident raised fundamental questions about the integrity of algorithmic evidence in criminal proceedings: if the classification can be changed after the fact by human analysts, it is not the objective machine output it is presented as. It is a human judgment, with all the susceptibility to bias and error that human judgment involves, dressed in the authority of algorithmic objectivity.


22.7 The Continuity Between Environmental and Human Acoustic Surveillance

The ShotSpotter case is not an anomaly. It is the clearest expression of a structural continuity that runs through this entire chapter: the infrastructure built to listen to the natural world is technically indistinguishable from the infrastructure built to listen to people.

Consider the following parallels:

Environmental Acoustic Surveillance           | Human Acoustic Surveillance
PAM units on trees in forests                 | Acoustic sensors on poles in urban neighborhoods
Continuous audio recording                    | Continuous audio recording
Algorithm identifies species from spectrogram | Algorithm identifies gunshots from spectrogram
Triangulation to locate whale position        | Triangulation to locate gunshot position
Alert sent to conservation ranger             | Alert sent to police dispatcher
Data aggregated in central database           | Data aggregated in central database
Used to inform management decisions           | Used to inform policing decisions

The only differences are:

  1. The target population (birds vs. people)
  2. The institutional context (ecological research vs. law enforcement)
  3. The power relationship (no coercive authority over birds; significant coercive authority over people)

The Jordan Scenario: Campus Listening

Jordan's realization about the campus PAM network — that it listens indiscriminately — is not just conceptually interesting. It has practical implications.

The Hartwell University Biology Department's PAM units are designed to listen for birds. The algorithm they use (based on BirdNET) processes audio to identify bird species. The raw audio files — uploaded hourly to a university server — contain everything the microphone recorded during that hour. In the grove between the science building and the social sciences wing, on a Tuesday morning in October, those raw files contain:

  • Dawn chorus activity of five resident bird species
  • A bicycle passing on the path
  • Two undergraduates arguing about a class assignment
  • Jordan's phone call to their mother, half of which was spent discussing a family financial difficulty
  • A facilities worker discussing their employment situation with a colleague

The raw audio is on a university server. The biology department has a data management policy — but the policy governs access for research purposes. It does not clearly address subpoenas, law enforcement requests, or Title IX investigations. Hartwell is a fictional university, but the scenario it represents is real: passive acoustic monitoring systems on campuses, in parks, and along nature trails capture ambient conversation as an incidental byproduct of environmental monitoring, with governance frameworks that have not anticipated this feature.

Best Practice: Privacy-by-Design in Environmental Acoustic Monitoring

Several approaches can reduce the privacy risks of PAM systems while preserving their ecological value:

  1. On-device processing: Instead of uploading raw audio, process audio on the device and upload only species identification data — never the raw sound file. This eliminates the raw audio archive that could be accessed by non-research parties.

  2. Frequency filtering: Configure microphones to record only above human voice frequency ranges (above 3 kHz for most bird monitoring), which would capture most bird vocalizations while excluding intelligible human speech; see the sketch after this list.

  3. Privacy impact assessments: Before deploying PAM systems in areas with regular human presence, conduct formal privacy impact assessments that address data retention, access controls, and legal requests.

  4. Clear data governance policies: Establish explicit policies governing law enforcement requests, retention periods, and access controls for raw audio, not just processed ecological data.
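
Approach 2 amounts to one filter applied before anything is stored. A minimal sketch using SciPy, assuming a 3 kHz cutoff; file names are placeholders:

Code Sketch: High-Pass Filtering Out the Speech Band (Python)

import numpy as np
from scipy import signal
from scipy.io import wavfile

CUTOFF_HZ = 3000  # most intelligible speech energy sits below this

rate, audio = wavfile.read("pam_raw.wav")  # placeholder raw PAM clip
if audio.ndim > 1:                         # mix stereo down to mono
    audio = audio.mean(axis=1)

sos = signal.butter(8, CUTOFF_HZ, btype="highpass", fs=rate, output="sos")
filtered = signal.sosfilt(sos, audio.astype(np.float64))

# Only the filtered audio is stored; the speech band never reaches the archive.
wavfile.write("pam_filtered.wav", rate,
              np.clip(filtered, -32768, 32767).astype(np.int16))

The trade-off is the one the text flags: species that vocalize below the cutoff are lost along with the speech.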


22.8 Camera Traps and GPS Collaring: The Visual and Spatial Dimensions of Wildlife Surveillance

Acoustic monitoring is the most privacy-relevant form of environmental surveillance, but it is not the only form. Two additional technologies — camera traps and GPS collaring — extend environmental monitoring into visual and spatial domains with direct parallels to human surveillance.

Camera Traps

Camera traps are motion-triggered cameras deployed in wildlife habitats to photograph or video animals without human presence. A researcher can deploy 50 camera traps across a forest landscape, leave them for a month, retrieve them, and have tens of thousands of photographs of wildlife activity. The technology has transformed wildlife ecology: population estimates, behavioral observations, and distribution maps that previously required enormous field effort can now be generated with modest investment.

The structural parallel to CCTV is exact: motion-triggered cameras, deployed in specific locations, recording all motion events, producing a visual archive that can be analyzed for behavioral patterns. Camera traps photograph animals. CCTV cameras photograph people. The technology is identical.

Camera traps also photograph people — hikers, poachers, rangers, and others who pass through their field of view. In some applications, this is intentional: anti-poaching camera networks use human detection algorithms to identify potential poachers entering protected areas. The surveillance of animals and the surveillance of people occur on the same device.

GPS Collaring

Radio telemetry has been central to wildlife ecology since the 1960s, with GPS-based tracking following in the 1990s. Animals are captured, fitted with transmitters or GPS collars, and released; their movements are then recorded remotely. GPS collars on large mammals — wolves, elephants, whales fitted with acoustic tags — report location data at intervals ranging from minutes to days, enabling researchers to track individual animals' home ranges, migration routes, and behavioral patterns.

The structural parallel to location tracking of people is precise: GPS device attached to subject, continuous location data transmitted to central server, analysis of movement patterns, behavioral inference from location data. The difference between tracking a wolf and tracking a person is institutional and ethical, not technological.
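
The analysis layer is just as generic, as a minimal sketch with made-up collar fixes shows: step lengths between GPS pings are the starting point for home-range and migration analysis — and the computation is unchanged if the input is a phone's location history.

Code Sketch: Step Lengths from GPS Fixes (Python)

import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two fixes, in meters."""
    R = 6_371_000  # Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

fixes = [(44.051, -71.232), (44.058, -71.221), (44.063, -71.240)]  # collar pings
steps = [haversine_m(*a, *b) for a, b in zip(fixes, fixes[1:])]
print(f"Total path length: {sum(steps):.0f} m")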

📝 Note: The Ethics of Environmental Surveillance vs. Human Surveillance

The chapter has repeatedly noted that environmental surveillance raises fewer ethical concerns than human surveillance, and it is worth examining why. The primary reason is consent: animals cannot consent to being monitored, and this fact makes consent-based ethical frameworks inapplicable. We do not require wolves to consent to GPS collaring because wolves cannot enter agreements or understand the implications of data collection.

But "consent is inapplicable" is not the same as "no ethical obligations apply." Environmental surveillance raises genuine ethical questions: Does the stress of capture for collaring harm the animal? Does the collar itself interfere with behavior or survival? Do the management decisions informed by surveillance data serve the animal's interests or only human interests? Are there indigenous land rights considerations in where and how monitoring is conducted?

The gap between environmental surveillance ethics (consent inapplicable, but other obligations apply) and human surveillance ethics (consent is central) is precisely the gap that function creep exploits. When environmental monitoring infrastructure is turned on humans, it crosses from a domain where consent is inapplicable to a domain where it is essential — but the infrastructure does not change. Only the subject does.


22.9 The Acoustic Landscape of Power

To close this chapter, it is worth stepping back to ask a structural question: what does it mean that we have built a global listening infrastructure — for birds, for whales, for gunshots, for weather — and that this infrastructure is technically indistinguishable from the infrastructure we might build to listen to people?

The answer has two parts.

First: The normalization of listening as an environmental management practice has prepared cultural ground for the normalization of listening as a social management practice. If we are comfortable with thousands of microphones in forests, parks, and ocean buoys continuously recording everything they detect, we have already accepted the principle that recording ambient sound without individual notification is acceptable. ShotSpotter extends this principle from the forest to the urban neighborhood. The step is smaller than it appears.

Second: The technical continuity means that the same organizations that build and operate environmental monitoring infrastructure are potential vendors for human surveillance applications. The expertise, the hardware, the software, and in some cases the actual devices are the same. The boundary between "environmental" and "human" surveillance is a policy boundary, not a technical one. Policy boundaries are changeable in ways that technical ones are not.

Jordan's discomfort when they learn that the campus bird-monitoring network records everything is not naive or excessive. It is an accurate reading of a structural situation: the infrastructure of environmental listening exists, it is indiscriminate, and its governance has not caught up with its capabilities.

The next chapter examines weather surveillance — the oldest and most fully institutionalized form of environmental monitoring. As we will see, weather observation networks provide yet another template for the relationship between legitimate environmental monitoring and the surveillance of human populations.


22.10 Summary

Passive acoustic monitoring began as a scientific tool for understanding bird populations and ecosystem health. It has grown into a global surveillance infrastructure of thousands of devices, automated analysis systems, and centralized data repositories. Machine learning systems like BirdNET have made it possible to process enormous audio archives automatically, enabling population monitoring at scales previously impossible.

This infrastructure is technically continuous with human acoustic surveillance. ShotSpotter is not a metaphor for the risks of environmental monitoring technology — it is a direct application of the same hardware and algorithmic approaches to urban law enforcement. The transition from environmental to human surveillance does not require new technology; it requires only a change in subject and institutional context.

The ethical implications are significant. Environmental acoustic monitoring systems deployed in areas of human activity capture human conversation as an incidental byproduct. The governance frameworks for this data — who can access it, for what purposes, under what legal authorities — are largely undeveloped. The gap between monitoring capability and governance adequacy is growing.


Key Terms

Passive Acoustic Monitoring (PAM): The deployment of autonomous recording devices to continuously capture environmental sounds without active intervention.

Bioacoustics: The scientific study of sound production and reception in living organisms, including applications to population monitoring and ecosystem assessment.

Spectrogram: A visual representation of sound that plots time against frequency, with intensity shown by color or brightness; the input format for most machine learning acoustic classification systems.

BirdNET: An AI-based acoustic classification system developed by Cornell Lab of Ornithology and Chemnitz University of Technology, capable of identifying more than 6,000 bird species from audio recordings.

eBird: A citizen science platform through which birdwatchers submit bird observations to the Cornell Lab of Ornithology; as of 2024, containing more than 1.4 billion observations — the world's largest biodiversity dataset.

Soundscape ecology: The study of the relationships between living organisms and their acoustic environment; measures ecosystem health through the complexity and structure of a habitat's total sound.

ShotSpotter: An urban acoustic sensor system that detects and localizes gunshot sounds and alerts police dispatch; deployed in more than 100 U.S. cities primarily in majority-Black and majority-Latino neighborhoods.

Time-difference-of-arrival (TDOA): A technique for determining the location of a sound source by comparing the time at which the sound arrives at multiple sensors at known positions; used in both marine mammal acoustic monitoring and ShotSpotter gunshot localization.