Chapter 25: Urban Sensors and Smart City Infrastructure

DataField.Dev

In This Chapter

Opening: Jordan Maps the Smart District
25.1 What Is a "Smart City"?
25.2 The Sensor Layer: What Is Watching You
25.3 Sidewalk Toronto: A Cautionary Tale
25.4 Smart Streetlights, V2I, and the Integrated Sensing Network
25.5 The Fusion Center: When All This Data Comes Together
25.6 Who Owns Smart City Data?
25.7 Privacy by Design in Urban Infrastructure
25.8 Python Analysis: Understanding Urban Pedestrian Data
25.9 Jordan's Synthesis: From List to System
25.10 Summary
Key Terms

Opening: Jordan Maps the Smart District

It is a Saturday afternoon in early November. Jordan Ellis has been assigned a project for Dr. Osei's class: document every surveillance technology you encounter during one hour in Hartwell's newly designated "Smart Mobility District" — a six-block area downtown where the city has installed what the marketing materials call "next-generation urban intelligence infrastructure."

Jordan carries a notebook and their phone, which has the camera running. They start at the corner of Main and Elm.

At the intersection: a traffic signal with a camera embedded in its housing. A small box mounted on the signal arm, probably a traffic sensor. On the light pole — a secondary device Jordan doesn't immediately recognize, with a cellular antenna. On the building across the street: a CCTV dome camera covering the intersection approach.

Jordan writes it down and walks half a block.

A parking lot entrance: a camera with a LIDAR unit alongside it, pointed at the parking approach. An LED display showing available spaces. Under the canopy of a nearby bus shelter: a screen with real-time bus arrival information, and mounted above it, a small box that Marcus once explained is a WiFi probe detector — it captures the MAC addresses of smartphones that broadcast probe requests looking for familiar WiFi networks.

Jordan stops at the bus shelter and looks up at the box. Their phone is in their pocket. Its WiFi is on. Right now, Jordan realizes, this device is capturing a hardware identifier from their phone. They are being passively logged at this bus stop.

They continue walking, counting. By the end of the hour, Jordan has documented:

- 14 CCTV cameras (dome, fixed, PTZ)
- 6 traffic sensors (inductive loops, radar, or camera-based)
- 4 license plate readers (two on roadway gantries, two on light poles)
- 2 LIDAR-equipped parking sensors
- 3 WiFi probe detectors (bus shelters)
- 2 ShotSpotter-type acoustic sensors (mounted on lamp posts)
- 1 air quality sensor (mounted on a building)
- 1 weather/climate station (rooftop)

In six blocks. On a Saturday afternoon.

Jordan looks at the list. None of these devices required Jordan's consent. None of them sent Jordan a notification. The city's privacy policy — which Jordan had to specifically search for online — mentions "smart city infrastructure" in one paragraph and describes it as "data collected in aggregate form for transportation management and public safety purposes."

Jordan is not aggregate. Jordan is a person with a face, a phone, a car, and a history of political involvement. And every one of these devices was watching them walk down the street.

25.1 What Is a "Smart City"?

The phrase "smart city" entered mainstream urban planning discourse in the early 2010s, promoted by technology companies — IBM's "Smarter Cities" initiative, Cisco's "Smart+Connected Communities," Siemens' "Crystal" concept — as a vision of urban management transformed by data, connectivity, and automated analysis.

The core claim of smart city advocates is efficiency: by instrumenting urban infrastructure with sensors and connecting those sensors to data platforms, cities can manage traffic more smoothly, use energy more efficiently, respond to emergencies more rapidly, and allocate public resources more effectively. The technology, in this framing, is neutral infrastructure — analogous to paving roads or installing sewers.

This framing has been accepted to varying degrees by city governments, which have deployed billions of dollars of smart city technology with varying levels of understanding of what they were installing, how data would be managed, and who would own it.

The critique of smart city discourse — articulated by scholars including Shannon Mattern, Adam Greenfield, and Rob Kitchin — is that the "efficiency" framing obscures the political dimensions of smart city infrastructure. Every design choice about which data to collect, how to process it, who can access it, and for what purposes encodes values and power relationships. A traffic sensor that counts vehicles and a license plate reader that records their identities are both "traffic infrastructure" — but they have vastly different implications for privacy, accountability, and civil liberties.

💡 Intuition: Infrastructure Is Not Neutral

When the city installs a sewage pipe, the pipe has no opinion about whose waste it carries. When the city installs a sensor that records license plates, it is making a series of choices: to record vehicles rather than just count them; to store this data for some period; to allow specific parties to access it; to use it for specific enforcement purposes. These choices are political, not technical. The hardware itself is neutral — the license plate reader is just a camera. But the choices about what to read, what to store, who can access it, and what to do with it are choices that determine whose interests the infrastructure serves. Smart city technology makes infrastructure political in ways that sewage pipes do not.

25.2 The Sensor Layer: What Is Watching You

The "sensor layer" of a smart city refers to the physical devices embedded in urban infrastructure that collect data about the environment and its inhabitants. Jordan's six-block audit identified the core components:

Traffic Sensors

Traffic sensors count, classify, and in some cases identify vehicles and pedestrians. Several technologies are deployed:

Inductive loops: Metal coils embedded in pavement that detect the electromagnetic signature of passing vehicles. They count vehicles and can detect their presence at intersections. They cannot identify, photograph, or track vehicles.

Video-based traffic sensing: Cameras feed traffic images to software that counts vehicles, measures speeds, detects congestion, and classifies road users (cars, trucks, motorcycles, cyclists, pedestrians). Modern systems use computer vision to do this automatically without human review.

LIDAR sensors: Light Detection and Ranging — sensors that emit laser pulses and measure return times to create three-dimensional maps of objects passing through the detection zone. Used for vehicle and pedestrian counting and classification, and increasingly for autonomous vehicle testing infrastructure.

Radar sensors: Measure speed and count vehicles using microwave radar. Common in speed enforcement and traffic flow measurement.

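Bluetooth and WiFi detectors: Sensors that detect the device identifiers (MAC addresses) broadcast by vehicle Bluetooth systems or driver smartphones. By matching the same identifier at two points, they can estimate travel times between those points. This is a common method for mapping traffic congestion without directly knowing who the vehicles belong to, although it depends on recognizing the same device more than once.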

The key distinction for privacy purposes is between sensors that count (traffic loops, many LIDAR systems) and sensors that identify (license plate readers, cameras with facial recognition capability). Counting sensors generate data about patterns; identifying sensors generate data about specific individuals.
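The travel-time method used by Bluetooth and WiFi detectors can be sketched in a few lines. This is a minimal illustration, assuming two hypothetical roadside detectors that each log an identifier and a timestamp; the sample records and the function name `estimate_travel_times` are invented for this example and do not come from any real system.

```python
from datetime import datetime
from statistics import median

# Hypothetical detections: (device identifier, time seen). Invented sample data.
detector_a = [
    ("aa:11", datetime(2024, 1, 8, 8, 0, 0)),
    ("bb:22", datetime(2024, 1, 8, 8, 1, 0)),
    ("cc:33", datetime(2024, 1, 8, 8, 2, 0)),  # never seen at B
]
detector_b = [
    ("aa:11", datetime(2024, 1, 8, 8, 4, 0)),
    ("bb:22", datetime(2024, 1, 8, 8, 7, 0)),
    ("dd:44", datetime(2024, 1, 8, 8, 9, 0)),  # never seen at A
]

def estimate_travel_times(upstream, downstream):
    """Seconds between the first upstream sighting and each later
    downstream sighting of the same identifier."""
    first_seen = {}
    for ident, ts in upstream:
        first_seen.setdefault(ident, ts)
    times = []
    for ident, ts in downstream:
        if ident in first_seen and ts > first_seen[ident]:
            times.append((ts - first_seen[ident]).total_seconds())
    return times

travel_times = estimate_travel_times(detector_a, detector_b)
median_seconds = median(travel_times)
print(f"Matched devices: {len(travel_times)}, median travel time: {median_seconds:.0f} s")
```

The sketch only produces a result for devices seen at both detectors. That is the point worth noticing: a "counting" application of this kind still relies on re-recognizing an individual device, which places it closer to the identifying side of the distinction than its description suggests.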

License Plate Readers (LPRs)

License plate readers are perhaps the most pervasive and underappreciated surveillance technology in American cities. An LPR is a camera system designed to automatically read the alphanumeric characters on a vehicle license plate and compare them against a database.

LPRs are deployed in multiple configurations:

- Fixed LPRs on light poles and signal gantries, monitoring specific roadway locations 24 hours a day
- Mobile LPRs mounted on police vehicles, reading plates as the vehicle drives through neighborhoods
- Parking enforcement LPRs used by both government parking enforcement and private parking operators
- Toll collection systems that use LPRs to charge vehicles without stopping

What an LPR captures:

- License plate number
- Date, time, and GPS location of capture
- Photograph of the vehicle and surroundings
- Sometimes: speed estimate, direction of travel

What this enables:

- Checking plates against hot lists of stolen vehicles, vehicles with outstanding warrants, or vehicles of interest to law enforcement
- Building a historical record of where a specific vehicle (and its driver) has been over time
- Generating alerts when a specific vehicle enters or exits a defined area

The retention problem:

The most significant privacy concern with LPRs is not the real-time hot list check — it is the historical database. Many LPR systems retain all captured plate reads, not just those that match a hot list, for extended periods. The retention policies vary widely:

Some agencies retain data for 48 hours
Many retain for 60-90 days
Some retain indefinitely
Some share data with third-party aggregators (Vigilant Solutions / Motorola, LEARN, etc.)

A historical database of license plate captures is, in effect, a historical location record for every vehicle that has traveled in an area equipped with LPRs. Over time, these records can reconstruct detailed patterns of daily life: where you work, where you worship, who you visit, which medical offices you attend, which political meetings you go to.
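The effect of a retention policy can be made concrete with a small sketch. It assumes a hypothetical policy in which hot-list matches are kept and every other read is deleted once it is older than a configurable window; the plates, locations, and the function name `apply_retention` are all invented for illustration and do not describe any particular agency's system.

```python
from datetime import datetime, timedelta

# Hypothetical plate reads; all plates, locations, and times are invented.
reads = [
    {"plate": "ABC123", "seen": datetime(2024, 3, 1, 9, 0), "location": "Main & Elm"},
    {"plate": "XYZ789", "seen": datetime(2024, 3, 28, 17, 30), "location": "Main & Elm"},
    {"plate": "HOT001", "seen": datetime(2024, 3, 2, 12, 0), "location": "5th & Oak"},
]
hot_list = {"HOT001"}  # plates flagged by law enforcement (hypothetical)

def apply_retention(reads, hot_list, now, max_age):
    """Keep hot-list matches; drop every other read older than max_age."""
    cutoff = now - max_age
    return [r for r in reads if r["plate"] in hot_list or r["seen"] >= cutoff]

now = datetime(2024, 3, 30)
kept_48h = apply_retention(reads, hot_list, now, timedelta(hours=48))
kept_90d = apply_retention(reads, hot_list, now, timedelta(days=90))
print(len(kept_48h), "reads kept under a 48-hour policy")
print(len(kept_90d), "reads kept under a 90-day policy")
```

Under the 48-hour window, the month-old read of a vehicle that matched nothing is gone; under the 90-day window it remains available for later queries. The hardware and the capture are identical in both cases, and only the policy parameter differs.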

📊 Real-World Application: LPRs and Medical Appointments

In 2022, following the Supreme Court's Dobbs v. Jackson Women's Health Organization decision, which eliminated the federal constitutional right to abortion, reproductive rights advocates raised specific concern about LPR data. License plate readers near abortion clinics record the plates of every vehicle that parks or drives past. In states that criminalized abortion after Dobbs, this LPR data could potentially be used to identify individuals who traveled to out-of-state abortion providers — particularly if the data was retained and accessible to law enforcement. The surveillance infrastructure was not built for this purpose; but the data it generates is available for this purpose, and the legal framework governing law enforcement access to LPR databases is incomplete in most states.

WiFi Probing

WiFi probing is a surveillance technique that exploits a feature of how smartphones look for familiar wireless networks. When a smartphone's WiFi is enabled, it periodically broadcasts "probe requests" — signals asking nearby access points "Are you a network I know?" These probe requests include the phone's MAC address (Media Access Control address) — a hardware identifier unique to each WiFi chip.

In the original implementation, probe requests included the device's real, permanent MAC address. This made it possible for a fixed sensor that detects probe requests to track a specific device across multiple locations over time — effectively tracking a person's movements through any area equipped with such sensors, without requiring them to connect to any network.

Most modern smartphones (iOS 14+, Android 10+) now use randomized MAC addresses for probe requests — generating a random identifier for each probe rather than the device's real hardware address. This limits the tracking utility of WiFi probing for individual devices.
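One way to see randomization in captured data is the "locally administered" bit of a MAC address. Addresses assigned by a manufacturer normally have this bit clear, while randomized addresses are generated with it set. The sketch below checks that bit; the two sample addresses are made up for illustration, and the result is a heuristic rather than proof, since any software can set the bit.

```python
def is_locally_administered(mac: str) -> bool:
    """True if the locally administered bit (0x02 in the first octet) is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)

# Invented sample addresses, one with the bit clear and one with it set.
samples = ["3c:22:fb:10:20:30", "da:a1:19:10:20:30"]

for mac in samples:
    if is_locally_administered(mac):
        label = "likely randomized"
    else:
        label = "likely a fixed hardware address"
    print(mac, "->", label)
```

A sensor operator could use a check like this to estimate what share of observed devices are still trackable by a permanent identifier.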

However, older devices do not randomize addresses, and commercial WiFi analytics systems used in retail environments (and in smart city pedestrian monitoring) often use probe requests to estimate pedestrian counts and dwell times. The aggregate data is useful for planning purposes, but even in an era of randomization the same sensors retain a meaningful ability to track individual devices that have not been updated.

⚠️ Common Pitfall: "My MAC Address Is Randomized, So I'm Safe"

MAC address randomization is an important privacy protection against WiFi probe tracking. But it is not a complete defense. First, older devices do not randomize. Second, randomization is device-specific and may not be consistently applied across all probe transmission types. Third, WiFi probing is one layer of a multi-layer tracking system: even if your MAC address is randomized, your physical appearance (captured by nearby cameras), your vehicle (captured by LPRs if you drove), and your phone's cellular connection (traceable through your carrier) all provide additional identifying layers, and sophisticated urban tracking systems typically combine WiFi probing with them.

25.3 Sidewalk Toronto: A Cautionary Tale

No single project better illustrates both the promise and the failure modes of smart city governance than the Sidewalk Toronto project — Alphabet's attempt to build a "neighborhood from scratch" incorporating comprehensive smart city technology on Toronto's waterfront.

The Project

In 2017, Sidewalk Labs (a subsidiary of Alphabet, Google's parent company) was selected to develop the Quayside district of Toronto's eastern waterfront — a 12-acre brownfield site. Sidewalk Labs' vision was comprehensive: buildings instrumented with sensors tracking occupancy, air quality, and energy use; streets equipped with programmable pavement surfaces that could be reconfigured for different uses; a modular timber construction system; underground delivery robots; a data platform aggregating all information from the development.

The potential benefits were real: better energy efficiency, more responsive infrastructure management, improved pedestrian and cycling conditions. Sidewalk Labs presented Quayside as a proof of concept for data-driven urban management that could be replicated in cities worldwide.

The Collapse

Sidewalk Toronto collapsed in 2020, with Sidewalk Labs citing "unprecedented economic uncertainty" due to COVID-19 as the primary cause. But the project's difficulties predated the pandemic — it had been under sustained and intense criticism since 2017, driven by a specific set of governance failures.

The data ownership problem: Sidewalk Labs initially proposed that data collected from the development (from all sensors, from all residents and visitors, from all commercial and residential activities) would be governed by a "Civic Data Trust," an independent entity that would control access to the data on behalf of the community. The governance structure of this Trust was never clearly defined, and critics including Ann Cavoukian, Ontario's former Information and Privacy Commissioner and a privacy consultant to Sidewalk Labs, raised concerns that it would be ineffective at preventing Alphabet from accessing data for commercial purposes.

The consent impossibility: Sidewalk Toronto would have been a publicly accessible district — anyone walking through would be in range of its sensors. There was no mechanism by which a resident of Toronto could consent to or refuse being monitored when walking through the Quayside waterfront. This is structurally identical to the problem Jordan encounters in the Smart Mobility District — but scaled to an entire neighborhood designed by a technology company.

The scope creep: Sidewalk Labs' proposals kept expanding. The initial 12-acre pilot became a proposal for 190 acres. The data platform kept acquiring new potential applications. The boundaries between the "neighborhood" and the surrounding city were ambiguous. Each expansion generated new concerns about what the project's eventual scale and data collection would look like.

The power asymmetry: Sidewalk Labs had enormous resources, technical expertise, and corporate sophistication. The Toronto community and government had less. The negotiations over data governance were structurally unequal — a pattern that recurs whenever technology companies negotiate smart city contracts with municipal governments.

🔗 Connection: Sidewalk Toronto and Vendor Lock-In

One of the concerns that drove opposition to Sidewalk Toronto was vendor lock-in: if the development's physical infrastructure was designed to operate with Sidewalk Labs' proprietary data platform, Toronto would have no practical way to replace the platform if the company's practices proved problematic, if the company was acquired, or if it simply decided to change its terms. This is the smart city equivalent of a city's water supply depending on a single private company with proprietary infrastructure. Vendor lock-in transforms a governance relationship (city contracting with a service provider) into a dependency relationship (city unable to function without the provider). The negotiating power dynamics of this dependency — who has more to lose if the relationship ends — favor the technology company.

25.4 Smart Streetlights, V2I, and the Integrated Sensing Network

Individual smart city sensors are significant. The integration of multiple sensor types into networked systems that share data creates qualitatively different surveillance capabilities.

Smart Streetlights

The smart streetlight is a paradigm case of infrastructure dual-use. Cities have installed "smart" LED streetlights for genuine efficiency reasons: LED bulbs last longer, use less power, and can be dimmed automatically during low-traffic hours, saving significant energy costs. The "smart" part refers to networking — each light communicates with a central management system, enabling individual control and monitoring.

But smart streetlights frequently also include:

- Cameras (for traffic monitoring, sometimes with facial recognition capability)
- Acoustic sensors (for ShotSpotter or similar gunshot detection)
- Air quality sensors
- WiFi probe detectors
- Environmental sensors (temperature, humidity, light)

A single smart streetlight thus simultaneously provides illumination, monitors the acoustic environment, monitors the visual environment (with potential facial recognition), monitors pedestrian WiFi devices, and measures environmental conditions. This is the fusion of Chapter 22's acoustic surveillance, Chapter 23's environmental monitoring, and CCTV from Chapter 8 — in a single device, on every street corner.

The San Diego, California smart streetlight program is a well-documented example of this dual-use trajectory. The city installed 3,000 smart streetlights marketed as traffic management and public safety tools. In 2020, it emerged that the streetlights had been used by the San Diego Police Department to review footage from incidents including protest activities related to the George Floyd demonstrations — purposes not disclosed in the original program description. The city council subsequently adopted restrictions on police access to streetlight data.

V2I (Vehicle-to-Infrastructure)

V2I (Vehicle-to-Infrastructure) communication refers to systems that enable data exchange between vehicles and roadway infrastructure. In smart city contexts, V2I enables:

- Real-time traffic signal timing adjustment based on vehicle queue length and wait times
- Emergency vehicle preemption (traffic signals detect approaching emergency vehicles and clear their path)
- Speed feedback (roadside signs display a driver's current speed, detected by radar)
- Parking guidance (sensors detect available spaces and direct drivers)
- Automatic toll collection
- In emerging deployments: data exchange between connected vehicles and smart infrastructure that includes vehicle identity, speed, direction, and destination

As connected and autonomous vehicles become more common, V2I capability will expand dramatically. The data generated — including vehicle identity, location, speed, and route — is highly sensitive from a privacy perspective. Governance frameworks for this data, including who can retain it and for how long, are largely undeveloped.

25.5 The Fusion Center: When All This Data Comes Together

Individual sensors are surveillance nodes. Fusion centers are the infrastructure that connects them.

Fusion centers are intelligence-sharing hubs that aggregate data from multiple agencies and sensor systems. Originally established after 9/11 to improve information sharing between federal, state, and local law enforcement, fusion centers have expanded far beyond their original counterterrorism mission to serve as hubs for everyday policing, traffic enforcement, and — in cities with smart city infrastructure — integration of sensor data from diverse systems.

In a city with comprehensive smart city infrastructure, a fusion center might integrate:

- LPR data from all cameras in the city
- CCTV footage from public cameras (and in some cases, private cameras through voluntary partnerships)
- ShotSpotter or acoustic sensor alerts
- Social media monitoring feeds
- Criminal justice databases (warrants, records, probation)
- DMV records for vehicle and driver identification
- In some cities: cell site simulator ("Stingray") data
- Commercial data sources (location data purchased from data brokers)

The result is a "common operating picture" — a real-time or near-real-time integrated view of activity across the city. Analysts at the fusion center can, in principle, answer questions like: "Where is this vehicle right now?" (LPRs), "Where has it been in the last 30 days?" (LPR history), "Who was driving it at this time?" (DMV plus camera feed), "Where does the driver live and work?" (LPR patterns), and "Has anyone in the driver's social network been arrested?" (criminal justice databases).

This integrated capability — which emerges from the combination of individually understandable data sources — represents a qualitative expansion of surveillance power beyond what any individual system provides. The aggregate is not merely the sum of the parts.
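Why the aggregate exceeds the sum of its parts becomes clearer with a toy example. The sketch below assumes two small, entirely invented tables, one of plate reads and one of vehicle registrations, and joins them with pandas. Neither table alone says where a named person goes; the joined table does. The column names and records are hypothetical and do not reflect any real fusion center schema.

```python
import pandas as pd

# All records below are invented for illustration.
lpr_reads = pd.DataFrame({
    "plate": ["ABC123", "ABC123", "ABC123", "XYZ789"],
    "location": ["Clinic lot", "Office garage", "Clinic lot", "Main & Elm"],
    "seen": pd.to_datetime([
        "2024-03-04 09:05", "2024-03-04 10:10",
        "2024-03-11 09:02", "2024-03-05 18:40",
    ]),
})
registrations = pd.DataFrame({
    "plate": ["ABC123", "XYZ789"],
    "owner": ["Resident A", "Resident B"],
})

# Join plate reads to registered owners, then count visits per owner and place.
linked = lpr_reads.merge(registrations, on="plate", how="left")
visits = (linked.groupby(["owner", "location"])
                .size()
                .reset_index(name="visits")
                .sort_values(["owner", "location"])
                .reset_index(drop=True))
print(visits)
```

A single `merge` on a shared key is all the linkage requires. The technical barrier to fusion is low, which is why the governance rules about which datasets may be joined, and by whom, carry so much weight.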

🌍 Global Perspective: Surveillance Cities Around the World

Smart city surveillance infrastructure varies dramatically by national context. Hangzhou, China's "City Brain" system integrates traffic cameras, facial recognition, and AI traffic management into one of the most comprehensive urban surveillance platforms in the world — with minimal public accountability or independent oversight. Amsterdam's smart city program explicitly incorporates "privacy by design" principles and public accountability mechanisms, including a data register that describes what data is collected and by whom across the city's sensor network. Singapore's "Smart Nation" initiative sits between these poles — extensive sensor deployment with strong government coordination but limited public oversight mechanisms. The range of approaches reflects different national values, political systems, and civil society capacities — a reminder that the technology does not determine the governance, even if it enables it.

25.6 Who Owns Smart City Data?

Perhaps the most consequential and least-resolved question in smart city governance is: who owns the data generated by public infrastructure?

The intuitive answer — the city owns data generated in city-owned infrastructure — turns out to be legally complex and frequently incorrect in practice.

The Vendor Lock-In Problem in Data

When cities contract with private companies to install and manage smart city infrastructure, the data management arrangements in those contracts determine who actually controls the data. In many contracts, particularly early-generation smart city agreements:

The vendor operates the data platform
The city receives agreed-upon data products (reports, dashboards, alerts) but not raw data
The vendor retains rights to aggregate or de-identified data derived from the city system
The vendor's platform architecture makes it difficult for the city to switch providers without losing historical data

This means that a city might have thousands of sensors generating data about its residents while having limited ability to access the underlying data, audit how it is used, or provide it to researchers or accountability organizations.

The Sidewalk Toronto controversy foregrounded this problem explicitly — who would control the data from the Quayside development? — but similar arrangements have been adopted by cities across North America and Europe without comparable public scrutiny.

Municipal Transparency and the FOIA Problem

Freedom of Information Act (FOIA) requests, and the state public records laws that play the same role for state and local agencies, provide a mechanism for public accountability of government data systems. Smart city data held by city agencies is, in principle, subject to such requests, though law enforcement exemptions substantially limit what can be obtained about surveillance-related data.

But smart city data held by private vendors under contract — common in the arrangements described above — may be treated as a trade secret, exempt from FOIA disclosure. This creates a governance gap: the city installed the infrastructure with public money; the data generated is held by a private company; the public cannot access it through the standard accountability mechanism.

⚠️ Common Pitfall: "The Data Is Anonymized So It's Not a Privacy Problem"

Municipal smart city programs frequently describe their data as "anonymized" or "aggregate" to address privacy concerns. This description is often inaccurate. License plate reader data is inherently identified — it records license plate numbers, which link directly to registered owners. WiFi probe data with non-randomized MAC addresses is identified. CCTV footage with facial recognition capability is identified. Even genuinely aggregate data — pedestrian counts by location and time — can be combined with other data sources (commercial location data, retail transaction records) to make inferences about specific individuals. "Anonymized" is a claim that requires specific technical verification, not a property that attaches automatically to data because a government program says so.

25.7 Privacy by Design in Urban Infrastructure

Can smart city infrastructure be designed to provide the efficiency and safety benefits its advocates claim while avoiding the privacy harms its critics document? This is the question "privacy by design" (PbD) asks — and the answer is: partially, but only if the design commitment is genuine and the governance is robust.

The seven principles of privacy by design (Ann Cavoukian, 1990s; adopted by the International Conference of Data Protection and Privacy Commissioners):

1. Proactive, not reactive: Build in privacy protection before problems occur, not after
2. Privacy as the default: Data collection should require active choice, not be the default
3. Privacy embedded into design: Privacy protections should be integral to the system architecture, not add-ons
4. Full functionality: Privacy should not be achieved by sacrificing legitimate functionality (positive-sum, not zero-sum)
5. End-to-end security: Full lifecycle protection for data from collection to deletion
6. Visibility and transparency: Users and subjects should be able to understand what data is collected and how it is used
7. Respect for user privacy: Keep it user-centric; enable individual rights and preferences

Applied to smart city infrastructure, PbD would imply:

- Traffic management by counting, not identifying, vehicles (traffic loops, not LPRs)
- Aggregate pedestrian analytics, not individual tracking
- On-device processing where possible (as in the Rainforest Connection Guardian system from Chapter 22)
- Time-limited retention with automatic deletion
- Public data registries disclosing all sensors and their purposes
- Community governance mechanisms with real authority over data use
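As a rough illustration of "aggregate, not individual" analytics, the sketch below reduces detections to counts per location and hour and withholds any count below a minimum threshold. It is a sketch under stated assumptions: the threshold of 5, the function name `aggregate_counts`, and the sample detections are all hypothetical, and small-cell suppression is a general disclosure-control practice rather than something this chapter's sources prescribe.

```python
from collections import Counter

MIN_REPORTABLE = 5  # hypothetical suppression threshold

def aggregate_counts(detections, min_reportable=MIN_REPORTABLE):
    """Reduce (location, hour) detections to counts; suppress small cells.

    No device identifier is accepted or stored, only where and when
    a detection happened.
    """
    counts = Counter(detections)
    return {cell: (n if n >= min_reportable else None)
            for cell, n in counts.items()}

# Invented detections: each tuple is one pedestrian passing a sensor.
detections = (
    [("Main & Elm", 8)] * 12
    + [("Main & Elm", 3)] * 2
    + [("University Ave & 3rd St", 8)] * 7
)

report = aggregate_counts(detections)
for (location, hour), n in sorted(report.items()):
    shown = n if n is not None else f"<{MIN_REPORTABLE} (suppressed)"
    print(f"{location} at {hour:02d}:00: {shown}")
```

The two people passing Main & Elm at 3 a.m. are the kind of small group that "aggregate" data can expose, so the sketch reports that cell as suppressed instead of publishing the exact figure.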

The Limits of Privacy by Design

Privacy by design, as typically implemented, faces several limitations:

Technical PbD is necessary but insufficient. You can design a sensor that aggregates rather than identifies — but if the governance context rewards identification, the design will be modified over time. San Diego's streetlights were arguably designed with limited identification capability; they were repurposed for identification anyway because the technical capability (cameras) was present and the governance allowed it.

PbD is a vendor commitment, not a community decision. Cavoukian resigned from Sidewalk Toronto's privacy advisory role in 2018, citing her concerns that the data governance framework was insufficient to enforce PbD principles. She designed a framework; the company's implementation did not meet it. PbD implemented by the technology vendor serves the vendor's interests first.

PbD does not address who the system serves. Even a perfectly privacy-protective smart city system can be optimized for some users (drivers, residents, businesses) at the expense of others (pedestrians, cyclists, low-income residents, unhoused people). The distribution of benefits and burdens from smart city infrastructure is a political question that privacy design cannot resolve.

25.8 Python Analysis: Understanding Urban Pedestrian Data

This section demonstrates how publicly available urban sensor data can be analyzed using Python. Many cities publish pedestrian count data from their sensor networks as part of open data initiatives — offering a rare opportunity to examine smart city surveillance data directly.

The following code loads publicly available pedestrian count data, produces basic visualizations, and illustrates what can and cannot be inferred from aggregate sensor data.

"""
Urban Pedestrian Data Analysis
Chapter 25: Smart City Surveillance

This script analyzes publicly available pedestrian foot traffic data
from a city's sensor network. Many cities publish this data through
open data portals (e.g., data.cityofX.gov).

Required libraries: pandas, numpy, matplotlib, seaborn
Install with: pip install pandas numpy matplotlib seaborn
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from datetime import datetime

# ---------------------------------------------------------------
# Step 1: Load sample pedestrian count data
# In practice, download from your city's open data portal.
# Example format: sensor_id, timestamp, count, location, direction
# ---------------------------------------------------------------

# Create sample data for demonstration
# (In a real analysis, replace this with pd.read_csv('your_file.csv'))
np.random.seed(42)  # fixed seed so the simulated counts are reproducible

# One reading per hour for every hour of January 2024. The end point is
# 23:00 on the 31st so the final day is complete, and the lowercase 'h'
# alias is used because uppercase 'H' is deprecated in recent pandas.
dates = pd.date_range(start='2024-01-01 00:00', end='2024-01-31 23:00', freq='h')

# Simulate two sensor locations
locations = ['Main & Elm (Downtown)', 'University Ave & 3rd St']

records = []
for location in locations:
    for timestamp in dates:
        hour = timestamp.hour
        # Pedestrian traffic peaks during morning commute, lunch, and evening
        base_count = 20
        if 7 <= hour <= 9:    # Morning rush
            base_count = 120
        elif 11 <= hour <= 14:  # Lunch
            base_count = 90
        elif 16 <= hour <= 19:  # Evening
            base_count = 110
        elif 22 <= hour or hour <= 5:  # Late night / early morning
            base_count = 5

        # Weekend modifier
        if timestamp.weekday() >= 5:
            base_count = int(base_count * 0.6)

        # Add some randomness
        count = max(0, int(np.random.normal(base_count, base_count * 0.2)))

        records.append({
            'timestamp': timestamp,
            'location': location,
            'pedestrian_count': count
        })

df = pd.DataFrame(records)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['hour'] = df['timestamp'].dt.hour
df['day_of_week'] = df['timestamp'].dt.day_name()
df['is_weekend'] = df['timestamp'].dt.weekday >= 5

print("Dataset overview:")
print(f"  Total records: {len(df):,}")
print(f"  Date range: {df['timestamp'].min().date()} to {df['timestamp'].max().date()}")
print(f"  Locations monitored: {df['location'].nunique()}")
print(f"  Average hourly count: {df['pedestrian_count'].mean():.1f} pedestrians\n")

# ---------------------------------------------------------------
# Step 2: Hourly patterns — when are people moving?
# ---------------------------------------------------------------

hourly_avg = (df.groupby(['hour', 'location'])['pedestrian_count']
              .mean()
              .reset_index())

fig, ax = plt.subplots(figsize=(12, 5))

for location in locations:
    location_data = hourly_avg[hourly_avg['location'] == location]
    ax.plot(location_data['hour'],
            location_data['pedestrian_count'],
            marker='o', linewidth=2, markersize=4,
            label=location)

ax.set_xlabel('Hour of Day (24h)', fontsize=12)
ax.set_ylabel('Average Pedestrian Count', fontsize=12)
ax.set_title('Average Pedestrian Traffic by Hour', fontsize=14, fontweight='bold')
ax.set_xticks(range(0, 24))
ax.set_xticklabels([f'{h:02d}:00' for h in range(24)], rotation=45, ha='right')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('pedestrian_hourly_patterns.png', dpi=150, bbox_inches='tight')
print("Saved: pedestrian_hourly_patterns.png")
plt.show()

# ---------------------------------------------------------------
# Step 3: Weekday vs. weekend comparison
# ---------------------------------------------------------------

day_avg = (df.groupby(['hour', 'is_weekend', 'location'])['pedestrian_count']
           .mean()
           .reset_index())
day_avg['Day Type'] = day_avg['is_weekend'].map({True: 'Weekend', False: 'Weekday'})

fig, axes = plt.subplots(1, 2, figsize=(14, 5), sharey=True)

for i, location in enumerate(locations):
    loc_data = day_avg[day_avg['location'] == location]
    for day_type, group in loc_data.groupby('Day Type'):
        axes[i].plot(group['hour'], group['pedestrian_count'],
                     marker='o', linewidth=2, markersize=4,
                     label=day_type)
    axes[i].set_title(location, fontsize=11)
    axes[i].set_xlabel('Hour of Day')
    axes[i].set_ylabel('Average Pedestrian Count')
    axes[i].set_xticks(range(0, 24, 3))
    axes[i].legend()
    axes[i].grid(True, alpha=0.3)

fig.suptitle('Weekday vs. Weekend Pedestrian Patterns', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('pedestrian_weekday_weekend.png', dpi=150, bbox_inches='tight')
print("Saved: pedestrian_weekday_weekend.png")
plt.show()

# ---------------------------------------------------------------
# Step 4: What aggregate data can and cannot tell us
# ---------------------------------------------------------------

print("\n--- What this data CAN tell us ---")
print("  1. Peak usage times (for scheduling maintenance, staffing, transit)")
print("  2. Weekday vs. weekend demand differences")
print("  3. Seasonal trends (if data covers multiple months)")
print("  4. Impact of events (if correlated with event calendar)")
print("  5. Whether infrastructure investments changed foot traffic")

print("\n--- What this data CANNOT tell us ---")
print("  1. Who specifically is present (individual identity)")
print("  2. Where people are going (no direction inference from counts alone)")
print("  3. Why traffic changed (correlation only — cannot infer cause)")
print("  4. Demographic composition of pedestrians")
print("  5. Whether any specific individual was present on a specific date")

print("\n--- What aggregate data MIGHT reveal with additional analysis ---")
print("  1. Commercial activity patterns (correlated with retail data)")
print("  2. Approximate demographics (if correlated with census geography)")
print("  3. Routine individual schedules (if sensors identify devices)")
print("     --> This is where aggregate environmental data becomes surveillance")

# ---------------------------------------------------------------
# Step 5: The surveillance threshold — when counting becomes tracking
# ---------------------------------------------------------------

print("\n--- The surveillance threshold ---")
print("""
Pedestrian count data (this analysis):
    - Counts people passing a sensor
    - No individual identification
    - Aggregate patterns only
    - Privacy risk: LOW (with proper governance)

WiFi probe data (Chapter 25 main text):
    - Records device hardware identifiers
    - Can track specific devices across multiple sensors
    - Individual movement patterns reconstructable
    - Privacy risk: HIGH (without randomization enforcement)

LPR data (Chapter 25 main text):
    - Records specific vehicle plate numbers
    - Links to registered owner identity
    - Historical location tracking enabled
    - Privacy risk: HIGH (data retention dependent)

The transition from COUNT to IDENTIFY is the critical privacy threshold.
""")

What This Analysis Demonstrates:

The code above shows what publicly available pedestrian count data can and cannot tell us. The key insight — illustrated in the "surveillance threshold" section — is that aggregate count data occupies a fundamentally different privacy position than identified data:

Pedestrian count data tells you how many people were at a location, not who they were
WiFi probe data with real MAC addresses tells you which specific device was at a location
LPR data tells you which specific vehicle (and by extension, whose vehicle) was at a location

The difference between "how many" and "who specifically" is the difference between environmental monitoring and surveillance. Smart city infrastructure often contains both types of sensors, some that count and some that identify, and the same governance framework often covers both, which obscures this critical distinction.
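The count-versus-identify distinction can be made concrete with a small sketch. The sightings below are invented for illustration: the `device_id` values, the "Bus Shelter A" sensor name, and the timestamps are assumptions, not data from any real deployment. The same six sightings are processed twice, once as a counting sensor would store them and once as an identifying sensor would.

```python
import pandas as pd

# Hypothetical raw sightings (all values invented for illustration).
sightings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-08 08:05", "2024-01-08 08:12", "2024-01-08 08:20",
        "2024-01-08 08:06", "2024-01-08 08:21",
        "2024-01-08 08:07",
    ]),
    "sensor": [
        "Main & Elm", "Bus Shelter A", "University Ave & 3rd St",
        "Main & Elm", "University Ave & 3rd St",
        "Main & Elm",
    ],
    "device_id": ["dev-1", "dev-1", "dev-1", "dev-2", "dev-2", "dev-3"],
})

# COUNT: keep only how many sightings each sensor saw in each hour.
# The device identifier is discarded at this step.
counts = (
    sightings
    .assign(hour=sightings["timestamp"].dt.floor("h"))
    .groupby(["sensor", "hour"])
    .size()
    .reset_index(name="pedestrian_count")
)

# IDENTIFY: keep the device identifier, so each device's sightings
# can be placed in time order to form a route through the district.
trails = (
    sightings
    .sort_values("timestamp")
    .groupby("device_id")["sensor"]
    .agg(list)
    .to_dict()
)

print(counts)
print(trails)
```

The `counts` table has no column from which a route could be rebuilt, while `trails` maps each device to the ordered list of sensors it passed. Nothing about the sensors' physical placement differs between the two outputs; only the decision to retain the identifier does.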

25.9 Jordan's Synthesis: From List to System

Jordan returns to the dorm after their hour of observation and looks at their list again. Fourteen cameras. Six traffic sensors. Four LPRs. Two acoustic sensors. Three WiFi detectors.

When Jordan describes each device individually to Marcus, Marcus nods — he knows all of them, has talked about all of them. When Jordan describes them as a system — a network of interlocking sensors, all feeding data into city infrastructure, some of it held by the city, some by vendors, some shared with law enforcement — Marcus's enthusiasm for the technology quiets a little.

"That's different," Marcus says. "When you describe it like that."

It is different. Each individual device has an explanation. The signal camera is for traffic management. The LPR is for parking enforcement. The acoustic sensor is for gunshot detection. The WiFi detector is for pedestrian flow analysis. Each explanation is legitimate. But the combination — the system — creates something that no individual device's explanation encompasses.

Yara, who is over for dinner, cuts through to what Jordan is thinking: "It's not the camera. It's that the camera knows everything the LPR knows, and everything the WiFi sensor knows, and everything the shot detector knows. And none of them are required to tell you that."

Jordan nods. They were in that Smart Mobility District. They were documented by LPRs (they passed in a ride-share on the way downtown). Their phone's MAC address was captured at three bus shelters (they checked — their phone has randomization but it wasn't working correctly on their older model). Their face is in the CCTV footage from multiple angles.

And they never consented to any of it. And there is no law that says they had to be told.

The panopticon has become pavement.

25.10 Summary

The "smart city" is a surveillance infrastructure presented in the language of efficiency and service. Its sensor layer — traffic sensors, license plate readers, acoustic monitors, WiFi probes, CCTV cameras, environmental sensors — constitutes a comprehensive monitoring network that documents the movements, behaviors, and activities of urban residents without their specific consent.

The Sidewalk Toronto case demonstrates that the governance challenges of smart city infrastructure are not primarily technical but political — involving questions of data ownership, vendor lock-in, community consent, and the distribution of benefits and harms that technology companies are ill-equipped to resolve on behalf of communities.

License plate readers are perhaps the most pervasive and underappreciated example of smart city surveillance — building historical location databases for vehicles (and their drivers) that enable reconstruction of detailed life patterns. WiFi probing is a passive surveillance technology embedded in public infrastructure that most people do not know exists. Fusion centers integrate these data streams into capabilities that exceed what any individual component can produce.

Privacy by design offers real technical tools for reducing smart city surveillance harms, but only when those tools are implemented in substance rather than in name, enforced by robust governance, and accompanied by community authority over design decisions. Chapter 39 will examine privacy by design in more detail, including its limitations and the conditions under which it can be effective.
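As one illustration of what a privacy-by-design measure might look like in code, the sketch below aggregates raw detections to hourly counts before release and blanks out any hour with very few people, a technique commonly called small-cell suppression. It is not drawn from the chapter: the function name `publish_hourly_counts`, the column names, and the threshold of 10 are all assumptions chosen for the example.

```python
import pandas as pd


def publish_hourly_counts(raw: pd.DataFrame, min_count: int = 10) -> pd.DataFrame:
    """Aggregate raw detections to hourly counts and suppress small cells.

    Assumes `raw` has 'timestamp' and 'location' columns. The default
    threshold of 10 is an illustrative choice, not an established standard.
    """
    hourly = (
        raw
        .assign(hour=raw["timestamp"].dt.floor("h"))
        .groupby(["location", "hour"])
        .size()
        .reset_index(name="pedestrian_count")
    )
    # Replace counts below the threshold with a missing value so that
    # near-empty hours cannot single out one or two people.
    hourly["pedestrian_count"] = hourly["pedestrian_count"].where(
        hourly["pedestrian_count"] >= min_count
    )
    return hourly


# Hypothetical input: 12 detections in the 08:00 hour, 2 in the 03:00 hour.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-08 08:00"] * 12 + ["2024-01-08 03:00"] * 2
    ),
    "location": ["Main & Elm"] * 14,
})

published = publish_hourly_counts(raw)
print(published)
```

The 08:00 hour is published with its count of 12, while the 03:00 hour is published as missing. A measure like this only protects anyone if the raw detections are also discarded or tightly controlled, which is a governance decision rather than a coding one.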

Key Terms

Smart city: An urban governance model using sensor networks, data platforms, and automated analysis to manage infrastructure and services — marketed as efficiency-improving but functioning as a comprehensive surveillance architecture.

License plate reader (LPR): A camera system that automatically reads license plate numbers and compares them to databases; generates historical vehicle location records when data is retained.

WiFi probing: A technique in which fixed sensors capture the MAC addresses broadcast by smartphones looking for familiar wireless networks, enabling device tracking across multiple locations.

Fusion center: An intelligence-sharing hub aggregating data from multiple agencies and sensor systems; originally counterterrorism-focused, now increasingly used for everyday urban surveillance.

LIDAR (Light Detection and Ranging): A sensor that uses laser pulses to create three-dimensional maps of objects passing through a detection zone; used for vehicle and pedestrian counting and classification.

V2I (Vehicle-to-Infrastructure): Communication systems enabling data exchange between vehicles and roadway infrastructure; enables traffic management but generates location and identity data about vehicles and drivers.

Vendor lock-in: A dependency relationship in which a city's smart city infrastructure is so deeply integrated with a private vendor's proprietary platform that replacement is practically infeasible, giving the vendor disproportionate power in the relationship.

Privacy by design (PbD): A framework for building privacy protections into the architecture of systems from the beginning rather than adding them later; effective only when implemented genuinely and backed by robust governance.