Case Study 25-1: Adversarial Attacks in the Wild — Real-World AI Security Failures
Overview
The academic literature on adversarial attacks has produced thousands of papers demonstrating that machine learning models can be fooled by carefully crafted inputs. But translating laboratory demonstrations to real-world deployments requires bridging a significant gap: physical-world conditions are noisy, variable, and much more challenging for adversarial attack construction than the controlled conditions of a research lab. This case study examines documented cases in which adversarial attacks or AI security failures have occurred or been convincingly demonstrated in real-world deployment conditions — and what these cases teach about the security design of AI systems.
The cases covered here range from formally demonstrated attacks against deployed systems to documented security failures involving AI components. They collectively illustrate that the adversarial robustness problem is not a theoretical concern but a practical challenge for every organization deploying AI in environments with adversarial actors.
Case 1: Tesla Autopilot and Lane-Marking Attacks
The Attack
In 2020, security researchers at McAfee Advanced Threat Research demonstrated a practical adversarial attack against the Mobileye EyeQ3 camera system used in 2016 Tesla Model S and Model X vehicles. The attack targeted the speed limit sign recognition function that feeds the vehicles' driver-assist features.
The researchers discovered that extending the middle stroke of the "3" on a 35 MPH speed limit sign with a small strip of black tape, approximately 2 inches long, caused the system to read the "3" as an "8" and misclassify the sign as 85 MPH. The misclassification was not random; it was consistent and reproducible, and in the researchers' demonstration the misread sign caused the test vehicle's cruise control to begin accelerating toward the phantom limit.
More significantly, an earlier demonstration by Tencent's Keen Security Lab had shown that placing a few small, inconspicuous stickers on the road surface could cause Tesla's Autopilot to misread the lane geometry and steer into the adjacent lane. The attack exploited Autopilot's reliance on visual lane detection to maintain lane position.
What Happened Next
Tesla issued a statement in response to the speed limit sign attack noting that Autopilot is a driver-assist feature that requires driver attention and is not designed to be the sole arbiter of speed limit information. The company stated that the misclassification required specific conditions unlikely to be present in normal driving.
This response illustrates a common pattern in responses to AI security demonstrations: the attack is acknowledged but characterized as requiring conditions unlikely in real-world deployment. This response may be technically accurate while being insufficient from a security perspective. Security vulnerabilities need not be continuously present to be dangerous; an attacker who can reproduce the conditions of an attack can exploit it reliably.
Tesla subsequently updated its autopilot system to rely less heavily on camera-based sign recognition and to cross-reference navigation map data for speed limit information. This defense-in-depth approach — relying on multiple independent information sources rather than a single AI sensor — is a sound security response to adversarial attack risk.
Lessons
Defense in depth is essential for safety-critical AI. A system that relies on a single AI component to make safety-critical decisions — speed limit recognition, lane following, obstacle detection — is vulnerable to attacks on that component. Redundant, independent information sources that cross-validate AI conclusions reduce this vulnerability.
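The arbitration logic behind such cross-validation can be sketched as a simple function. This is a minimal illustration with invented names and thresholds, not a representation of any production system: when two independent sources disagree beyond a tolerance, the camera reading is treated as untrusted and the more conservative value wins.

```python
def resolve_speed_limit(camera_mph: int, map_mph: int, tolerance: int = 5) -> int:
    """Cross-validate a camera-derived speed limit against map data.

    If the two independent sources agree within `tolerance`, trust the
    camera, since it can see temporary limits the map may not record.
    On disagreement, treat the camera reading as potentially manipulated
    and fall back to the more conservative of the two values.
    """
    if abs(camera_mph - map_mph) <= tolerance:
        return camera_mph
    return min(camera_mph, map_mph)

# An adversarially induced 85 MPH reading is rejected because it
# disagrees with the map's 35 MPH entry.
print(resolve_speed_limit(85, 35))  # -> 35
```

The design choice here is that disagreement degrades gracefully toward safety rather than toward the attacker's target, which is the essence of using redundant sources as a defense.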
Adversarial robustness cannot be assumed. The Tesla autopilot system was not designed with this specific attack in mind, but the vulnerability was discoverable by security researchers with moderate effort. Systems deployed in adversarial environments — which, for autonomous vehicles, includes public roads — require adversarial robustness testing before deployment.
Responsible disclosure and response matter. The McAfee and Keen Security Lab researchers engaged in responsible disclosure — notifying Tesla before public disclosure and allowing time for response. Tesla's response included system updates. This disclosure-response cycle is the mechanism by which AI security research improves deployed systems.
Case 2: Facial Recognition Bypasses — Samsung, Apple, and Clearview AI
The Context
Facial recognition for device authentication — "face unlock" — became a mainstream feature in smartphones beginning around 2017. Apple's Face ID, introduced with the iPhone X, uses a sophisticated structured light system with depth sensing and infrared mapping. Samsung's earlier face unlock implementations on devices like the Galaxy S8 used 2D camera images without depth sensing.
The 2D Photo Attack
Shortly after the Samsung Galaxy S8's face unlock feature was released in 2017, researchers and users demonstrated that the feature could be bypassed using a photograph of the registered user. Holding a 2D photograph of the device owner in front of the camera was sufficient to unlock the device. Samsung acknowledged the limitation and issued a warning that face recognition "can be unlocked by someone or something that looks like you" and recommended using fingerprint recognition for sensitive authentication.
This case is not technically an adversarial attack in the strict mathematical sense — it did not require computing an adversarial perturbation. It is, however, an instructive failure of AI-based authentication: a system deployed for security authentication was bypassed by the simplest imaginable impersonation technique.
The Printed Glasses Attack Against Face ID
Apple's Face ID uses 3D facial mapping and was designed to resist photograph attacks. In 2019, researchers at Tencent's Xuanwu Lab demonstrated a more sophisticated attack against Face ID that targeted sleeping or unconscious users.
Standard Face ID requires eye tracking — the eyes must appear open and directed at the device. The researchers demonstrated that attaching printed paper with black-and-white patterns (representing stylized glasses) to the eye area of a sleeping person's face could fool the eye-tracking component and enable unauthorized access. The attack required physical access to an unconscious user and the ability to attach printed material to their face — conditions that limit its practical applicability but that demonstrate how adversarial manipulation of specific AI components can bypass broader security systems.
Clearview AI Data Breach
Clearview AI is a facial recognition company that built a database of billions of facial images scraped from social media and other public internet sources (roughly 3 billion at the time of the events described here, a figure the company later said had grown past 30 billion) and offered facial recognition searches to law enforcement and commercial clients. In 2020, Clearview AI disclosed a data breach in which its entire client list — including government agencies, law enforcement organizations, and commercial customers — was stolen by an attacker.
The Clearview breach was not an adversarial attack on the AI model itself, but it illustrates the security risks of building and operating AI systems that collect and process sensitive personal data at scale. A facial recognition database of billions of images, together with a client list showing who has queried which faces, is extraordinarily sensitive data. The breach exposed clients who may not have wanted their use of facial recognition technology disclosed, and potentially exposed query histories that could reveal which individuals were under surveillance.
From a security perspective, the Clearview case illustrates that AI systems create new data assets with new security profiles. A facial recognition company's database of query logs is not just a record of business transactions; it is a map of who has been surveilled and by whom. The security requirements for such data must reflect its sensitivity.
Case 3: AI Medical Imaging Vulnerabilities
Background
AI systems for medical image analysis — reading X-rays, CT scans, MRIs, and pathology slides — have demonstrated performance comparable to specialist physicians in several domains. Radiology AI has been deployed for screening mammography, lung cancer detection, diabetic retinopathy screening, and other clinical applications. The deployment of AI in medical diagnostics creates adversarial attack surfaces with direct implications for patient safety.
Demonstrated Attacks on Medical AI
Multiple research groups have demonstrated adversarial attacks on medical imaging AI. In 2019, researchers published results showing that adversarial perturbations could cause deep learning models trained on chest X-rays to misclassify pneumonia as normal or vice versa, with perturbations invisible to radiologists. The perturbations required to cause misclassification were smaller than the variation between different X-ray machines, suggesting that real-world exploitability was feasible.
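The perturbation mechanics behind these demonstrations can be sketched with the classic Fast Gradient Sign Method (FGSM) applied to a toy linear "scan classifier." The published attacks target deep networks, and every number below is invented for illustration; the point is only the mechanism of stepping along the sign of the loss gradient.

```python
import numpy as np

def fgsm_perturb(x, w, b, y_true, eps):
    """Fast Gradient Sign Method against a logistic-regression scorer.

    For the logistic loss, the gradient with respect to the input x is
    (p - y_true) * w, where p is the predicted probability.  Stepping a
    distance eps along the sign of that gradient maximally increases the
    loss under an L-infinity budget.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(abnormal)
    grad = (p - y_true) * w                 # dLoss/dx for logistic loss
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Toy 4-"pixel" scan initially classified as abnormal (score > 0).
w = np.array([1.0, -2.0, 0.5, 1.5])
b = -1.0
x = np.array([0.9, 0.1, 0.6, 0.7])
x_adv = fgsm_perturb(x, w, b, y_true=1, eps=0.25)

print(x @ w + b > 0)      # True  -> flagged abnormal
print(x_adv @ w + b > 0)  # False -> perturbed scan now reads as normal
```

In high-dimensional images the same mechanism flips predictions with per-pixel changes far smaller than this toy example needs, which is why the perturbations can stay below the noise floor of the imaging hardware.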
More provocatively, research published in 2019 demonstrated that a GAN (generative adversarial network) could inject or remove cancer indicators in CT scans, adding the appearance of a tumor to a healthy patient's scan or removing the indicators of a genuine tumor. In a blinded evaluation, radiologists could not reliably distinguish the manipulated scans from unmanipulated ones.
These demonstrations illustrate a scenario with direct criminal applications: insurance fraud (manipulating scans to create or eliminate diagnoses that affect coverage decisions), medical malpractice (manipulating records to obscure or create conditions), and direct harm to specific individuals whose diagnoses are corrupted.
Why These Attacks Matter for Healthcare AI Deployment
The medical imaging attack research illustrates that AI systems deployed in high-stakes clinical contexts must be designed with adversarial threats in mind. A radiologist reviewing AI-flagged findings might reasonably rely on the AI system's analysis, particularly for subtle findings that are difficult to detect. An adversarially manipulated scan that passes through the AI analysis undetected reaches the radiologist as an AI-reviewed scan and may receive less careful independent scrutiny, a form of automation bias.
The healthcare AI security challenge is compounded by the regulatory context: FDA-cleared AI medical devices undergo review for clinical performance but not for adversarial robustness. The regulatory framework for healthcare AI is still developing, and adversarial robustness requirements for medical AI are not yet standardized.
Case 4: NLP Systems — Text Adversarial Attacks
Spam Filters and Email Classification
Email spam filters are among the earliest deployed machine learning classifiers. They are also among the earliest targets of adversarial attack: spammers have been crafting emails designed to evade spam filters since spam filters were first deployed. Early evasion techniques were simple: adding random words to spam emails to shift the statistical features used by naive Bayes classifiers; replacing characters with visually similar characters to evade keyword detection.
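The "good word" evasion of a naive Bayes filter described above can be reproduced in miniature. The per-word likelihoods below are invented for illustration; a real filter learns them from training corpora, but the arithmetic of the attack is the same.

```python
import math

# Hypothetical per-word likelihoods P(word | class), invented for
# illustration; real filters estimate these from labeled email.
P_SPAM = {"free": 0.20, "winner": 0.15, "meeting": 0.01, "report": 0.01}
P_HAM  = {"free": 0.02, "winner": 0.01, "meeting": 0.15, "report": 0.12}

def spam_log_odds(words, prior_spam=0.5):
    """Naive Bayes log-odds of spam; positive means 'classify as spam'."""
    score = math.log(prior_spam / (1.0 - prior_spam))
    for w in words:
        if w in P_SPAM:
            score += math.log(P_SPAM[w] / P_HAM[w])
    return score

spammy = ["free", "winner"]
padded = spammy + ["meeting", "report"]  # "good word" injection

print(spam_log_odds(spammy) > 0)  # True  -> caught by the filter
print(spam_log_odds(padded) > 0)  # False -> evades the filter
```

Because naive Bayes sums independent per-word evidence, every appended "ham-like" word subtracts from the spam score, so a spammer only needs enough padding to cancel the spammy terms.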
Modern spam filter evasion techniques are more sophisticated and increasingly automated. Machine learning can be used to generate spam emails that are optimized to pass through spam filters — effectively, adversarial examples for NLP classifiers trained on email text.
Sentiment Analysis Manipulation
Sentiment analysis — AI classification of text as expressing positive, negative, or neutral sentiment — is widely deployed for customer feedback analysis, social media monitoring, and content moderation. Adversarial attacks on sentiment analysis have been demonstrated in research contexts, showing that small, targeted word substitutions can reliably change a classifier's sentiment assessment without changing the text's meaning to a human reader.
The practical implication is that platforms using sentiment analysis for content moderation can be gamed: bad actors who understand the sentiment classifier's decision boundaries can craft content that evades detection by choosing words that preserve the intended meaning (to human readers) while defeating the classifier's identification.
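A greedy synonym-substitution attack of this kind can be sketched against a toy lexicon scorer. The weights and synonym sets below are invented; real attacks search the substitution space against a learned model, but the strategy of swapping in meaning-preserving words with weaker classifier penalties is the same.

```python
# Toy lexicon-based sentiment scorer and a greedy substitution attack.
# Weights and synonym sets are invented for illustration.
WEIGHTS = {"terrible": -3, "awful": -3, "bad": -2, "poor": -2,
           "subpar": -1, "lackluster": -1}
SYNONYMS = {"terrible": ["awful", "subpar"], "bad": ["poor", "lackluster"]}

def sentiment(words):
    """Sum of per-word weights; more negative means more negative tone."""
    return sum(WEIGHTS.get(w, 0) for w in words)

def evade(words):
    """Swap each word for a meaning-preserving synonym that the
    classifier penalizes less, pushing the score toward neutral."""
    out = list(words)
    for i, w in enumerate(out):
        for alt in SYNONYMS.get(w, []):
            if WEIGHTS.get(alt, 0) > WEIGHTS.get(w, 0):
                out[i] = alt
                break
    return out

review = ["terrible", "bad", "service"]
adv = evade(review)
print(sentiment(review), sentiment(adv))  # -5 -2
```

To a human reader the rewritten review still reads as negative, but the classifier's score has moved most of the way to neutral, which is exactly the gap bad actors exploit.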
Overarching Lessons for Business Professionals
Adversarial Robustness Is a Design Requirement, Not a Post-Deployment Fix
The cases reviewed here share a common pattern: AI systems were designed for accuracy on clean data and deployed in environments where adversarial actors have incentives to attack them. Adversarial robustness must be a design requirement from the outset, not a retrofit in response to demonstrated attacks.
For each AI system deployment, the design process should ask: who has incentives to attack this system, what attacks are technically feasible given the deployment context, and what would the consequences of a successful attack be? These questions must be answered before deployment, not after.
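One way to make those three questions non-optional is to encode them as a gate in the deployment checklist. The sketch below is purely illustrative; all names and fields are hypothetical, and real threat-modeling processes are documents and reviews, not code.

```python
from dataclasses import dataclass, field

@dataclass
class ThreatAssessment:
    """Pre-deployment threat model for an AI component (illustrative)."""
    system: str
    adversaries: list = field(default_factory=list)       # who would attack?
    feasible_attacks: list = field(default_factory=list)  # what is feasible here?
    consequences: list = field(default_factory=list)      # what is the impact?

    def ready_for_review(self) -> bool:
        """All three questions must have answers before deployment."""
        return bool(self.adversaries and self.feasible_attacks
                    and self.consequences)

tm = ThreatAssessment("speed-limit recognizer")
tm.adversaries.append("targeted attackers with physical road access")
tm.feasible_attacks.append("physical sign perturbation (tape, stickers)")
tm.consequences.append("vehicle accelerates to an unsafe speed")
print(tm.ready_for_review())  # True
```

An empty field blocks the gate, which forces the deployment team to answer each question explicitly rather than by omission.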
Independent Validation Is Essential for Safety-Critical AI
The cases involving Tesla autopilot and medical imaging AI both illustrate the risks of deploying AI in safety-critical applications without independent validation of adversarial robustness. The regulatory frameworks for autonomous vehicles and medical AI are still developing in their adversarial robustness requirements, but organizations should not wait for regulatory requirements to impose adversarial robustness testing.
Security Research on AI Systems Is Valuable and Should Be Encouraged
The adversarial attacks documented in this case study were all discovered and responsibly disclosed by security researchers — not first exploited by malicious actors. The security research community's work on AI adversarial robustness serves a vital function: identifying vulnerabilities before they are exploited. Organizations deploying AI should support security research on their systems, including through bug bounty programs and responsible disclosure policies.
AI Security Failures Can Have Cascading Consequences
The Clearview AI data breach illustrates how AI systems create new data assets whose compromise has consequences beyond the immediate breach. A facial recognition database is not just a collection of images; it is a surveillance infrastructure whose security failure reveals not only the data but the entire history of who was surveilled and by whom. The security requirements for AI systems must reflect the full scope of what their compromise reveals.