Case Study 2: AI-Powered Phishing Studies and Model Extraction Attacks on ML APIs

Overview

This case study explores two offensive applications of AI that every penetration tester must understand. The first examines research demonstrating that AI-generated phishing campaigns are significantly more effective than traditional human-crafted campaigns—a finding with profound implications for social engineering assessments and organizational defense. The second analyzes real-world model extraction attacks against commercial ML APIs, where attackers systematically query services to steal proprietary models worth millions of dollars. Together, these cases demonstrate that AI is not merely a target—it is also a powerful weapon that changes the calculus of both offense and defense.


Part A: AI-Powered Phishing — Research and Implications

The Research Landscape

Multiple peer-reviewed studies between 2023 and 2025 measured the effectiveness of AI-generated phishing compared to traditional methods. The results were consistent and alarming.

Study 1: GPT-4 Spear Phishing (2023)

A team of security researchers conducted a controlled experiment comparing phishing emails generated by GPT-4 against those written by experienced human social engineers:

  • Methodology: 1,000 participants across multiple organizations received either AI-generated or human-generated spear phishing emails. The study used A/B testing with randomized assignment.
  • Email Customization: The AI was given the same OSINT data (from LinkedIn profiles, company websites, social media) that the human social engineers used.
  • Results:
      • Human-written phishing: 12% click-through rate
      • GPT-4-generated phishing: 19.2% click-through rate (a 60% relative improvement)
      • AI-generated emails also achieved higher credential submission rates on phishing landing pages

Study 2: Automated Spear Phishing at Scale (2024)

A follow-up study examined AI's ability to generate personalized phishing at scale:

  • Methodology: An automated pipeline scraped publicly available information about targets, fed it to an LLM, and generated personalized phishing emails for 5,000 targets.
  • Time Comparison:
      • Human social engineer: ~30 minutes per personalized email
      • AI pipeline: ~30 seconds per personalized email (60x faster)
  • Quality Assessment: Blind evaluators rated AI-generated emails as more professional, contextually appropriate, and persuasive than the human baseline in 68% of comparisons.

Study 3: Vishing and Deepfake Audio (2024-2025)

Research extended beyond text to voice-based attacks:

  • AI voice cloning from as little as 3 minutes of sample audio produced convincing impersonations
  • Real-time voice conversion tools enabled live vishing with the cloned voice
  • In a controlled experiment, 73% of participants who received AI-voice-cloned calls from their "manager" complied with the request (compared to 48% for calls from an unknown voice using the same script)

The $25 Million Hong Kong Deepfake Fraud

In January 2024, a finance worker at a multinational corporation's Hong Kong office was tricked into transferring $25 million (HKD 200 million) through a deepfake video conference call. The attackers:

  1. Cloned the appearance and voice of the company's CFO and other executives using publicly available video footage
  2. Set up a multi-participant video call where all participants except the victim were deepfaked
  3. Instructed the victim to make a series of transfers to accounts controlled by the attackers
  4. Overcame the victim's initial concerns about the unusual request: the realistic deepfake video and voices of multiple "colleagues" were reassuring enough that the transfers proceeded

The fraud was only discovered days later when the victim followed up with the real CFO through a separate channel.

🔴 Impact Assessment: This incident demonstrated that deepfake technology has crossed the threshold from theoretical risk to practical tool for financial fraud. The use of multi-participant deepfake video calls represents a significant escalation in sophistication.

Why AI-Generated Phishing Is More Effective

Analysis of the research reveals several factors contributing to AI phishing effectiveness:

Superior Grammar and Tone: Traditional phishing emails often contain grammatical errors, awkward phrasing, or inconsistent tone that alert trained users. AI-generated emails are grammatically flawless, maintain consistent professional tone, and adapt to the target organization's communication style.

Better Personalization: AI excels at incorporating OSINT data into natural-sounding personalization:

Traditional phishing:
"Dear Customer, Your account has been suspended. Click here to verify."

AI-generated spear phishing:
"Hi Sarah, I noticed the quarterly review deck you presented at
Tuesday's all-hands hasn't been uploaded to the shared drive yet.
Marketing needs it for the board prep meeting on Friday. Can you
upload it through our new document portal? Thanks! - Mike"

The AI version references real events, real colleagues, real deadlines, and uses the casual tone appropriate for internal communication—all derived from publicly available information.

Contextual Urgency: AI generates more sophisticated urgency that does not trigger the "too-good-to-be-true" alarm:

  • References to real upcoming events (earnings calls, product launches, compliance deadlines)
  • Appropriate escalation chains (copying relevant managers in CC)
  • Time-sensitive but not panic-inducing language

Reduced Traditional Indicators: Security awareness training teaches users to look for signs like poor grammar, generic greetings, and suspicious urgency. AI-generated phishing has none of these traditional indicators, effectively bypassing the training most organizations provide.

Implications for Penetration Testing

💡 For the Ethical Hacker: AI-powered phishing capabilities change social engineering assessments in several ways:

  1. Higher Baseline Effectiveness: Expect higher success rates when using AI to generate phishing content. Adjust success thresholds in your reports accordingly.

  2. Scalability: AI enables personalized spear phishing at scale, which was previously impractical. A penetration tester can now generate individually customized emails for every employee in a target organization.

  3. Vishing Enhancement: AI voice cloning adds a powerful dimension to vishing assessments. With appropriate authorization, demonstrate the risk of deepfake voice attacks.

  4. Reporting Impact: Include AI-powered phishing as a threat scenario in your reports. Many organizations' security awareness programs do not address AI-generated social engineering.

  5. Defense Recommendations: Traditional phishing indicators are insufficient. Recommend process-based defenses (verification procedures for financial transfers, out-of-band confirmation for unusual requests) rather than relying solely on user detection.

Defensive Strategies

🔵 Blue Team Perspective — Defending Against AI-Powered Phishing:

  1. Process-Based Controls: Require multi-factor verification for sensitive actions (financial transfers, credential resets, data access changes) regardless of how legitimate the request appears

  2. Updated Training: Security awareness programs must evolve beyond "look for typos." Train employees on AI-generated phishing characteristics and emphasize verification procedures

  3. AI-Based Detection: Deploy AI-powered email security tools that can detect AI-generated content, analyze behavioral patterns, and flag anomalous communication

  4. Voice Verification Protocols: Establish code words or callback procedures for voice-based authorization. Never trust a phone call alone for high-value actions

  5. Deepfake Detection: For video calls involving sensitive decisions, implement verification protocols (pre-shared codes, in-person confirmation for critical authorizations)


Part B: Model Extraction Attacks on Commercial ML APIs

The Threat of Model Stealing

Machine learning models represent significant intellectual property. Training a state-of-the-art model can cost millions of dollars in compute, data collection, and engineering effort. When these models are exposed via APIs—as cloud ML services, embedded intelligence in products, or SaaS features—they become targets for extraction.

Model extraction attacks systematically query an ML API to reconstruct a functionally equivalent copy of the model, effectively stealing the intellectual property without ever accessing the model's code, weights, or training data directly.

Landmark Research: Stealing ML Models via Prediction APIs (2016)

The seminal paper by Tramer et al. at USENIX Security 2016 demonstrated practical extraction attacks against production ML APIs:

Targets:

  • Amazon Machine Learning
  • Google Prediction API
  • BigML

Method: The researchers queried each API with carefully chosen inputs and used the responses (predicted labels and confidence scores) to train substitute models. They demonstrated that:

  • Logistic regression models could be extracted exactly with d + 1 queries, where d is the number of input features
  • Decision tree models could be extracted by probing at decision boundaries
  • Neural networks required more queries but were still extractable with sufficient API access

Key Finding: For models that returned full probability distributions (confidence scores for all classes), extraction was dramatically easier than for models returning only the top predicted label. The additional information in the confidence scores revealed the model's internal decision surface.
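The equation-solving case can be sketched concretely. The snippet below is a minimal illustration under stated assumptions, not the paper's exact procedure: it posits a hypothetical scoring API (`target_api`) that returns the raw confidence score of a logistic regression model, and recovers the weights with exactly d + 1 queries by inverting the sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logit(p):
    return np.log(p / (1 - p))

# Hypothetical target: a logistic regression model hidden behind a
# scoring API that returns the raw confidence score.
rng = np.random.default_rng(0)
d = 5
true_w = rng.normal(size=d)
true_b = 0.3

def target_api(x):
    return sigmoid(true_w @ x + true_b)

# d + 1 queries suffice: the zero vector yields the bias, and each
# unit vector yields one weight (plus the bias, which we subtract).
b_hat = logit(target_api(np.zeros(d)))
w_hat = np.array([logit(target_api(np.eye(d)[i])) - b_hat
                  for i in range(d)])

print(np.allclose(w_hat, true_w))  # True: exact recovery in 6 queries
```

Note that the attack collapses if the API returns only the top label: without the confidence score there is nothing to invert, which is exactly why reduced output is an effective defense.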

Real-World Extraction Scenarios

Scenario 1: ShopStack's Fraud Detection Model

During a penetration test of ShopStack's e-commerce platform, the assessment team evaluated the fraud detection API:

# Extraction reconnaissance
# The API returns a risk score (0-1) for each transaction

# Step 1: Understand the API's behavior
test_transaction = {
    "amount": 50.00,
    "merchant_category": "retail",
    "location": "US",
    "time_of_day": 14,
    "device_type": "mobile",
    "is_new_device": False
}
response = api.check_transaction(test_transaction)
# Returns: {"risk_score": 0.12, "decision": "approve"}

# Step 2: Systematically probe each feature
# Vary one feature while holding others constant
for amount in [1, 10, 50, 100, 500, 1000, 5000, 10000]:
    test_transaction["amount"] = amount
    response = api.check_transaction(test_transaction)
    print(f"Amount: {amount}, Risk: {response['risk_score']}")

# Step 3: Map decision boundaries
# Binary search for the exact threshold between approve/decline
low, high = 0, 10000
while high - low > 1:
    mid = (low + high) / 2
    test_transaction["amount"] = mid
    response = api.check_transaction(test_transaction)
    if response["decision"] == "approve":
        low = mid
    else:
        high = mid
# Decision boundary lies between low and high (interval width < 1)

# Step 4: Train a substitute model on collected data
# After 10,000 queries, achieve 94% agreement with the target

Finding: ShopStack's fraud detection API returned detailed risk scores, enabling efficient extraction. A substitute model trained on 10,000 queries achieved 94% decision agreement with the production model, allowing the assessment team to:

  • Predict which transactions would be flagged
  • Craft transactions that would evade detection
  • Understand the relative importance of each feature
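Step 4 is summarized rather than shown above; the sketch below fills it in on synthetic data. Everything here is illustrative: target_score stands in for ShopStack's real API, and a simple linear-logistic substitute is fit by inverting the returned risk scores and solving least squares. In practice X and y would be the probe transactions and scores collected in steps 1-3.

```python
import numpy as np

rng = np.random.default_rng(1)

def target_score(X):
    # Stand-in for the fraud API: a hidden linear model squashed to [0, 1]
    w_hidden = np.array([0.8, -0.3, 0.5])
    return 1 / (1 + np.exp(-(X @ w_hidden)))

X = rng.normal(size=(10_000, 3))   # 10,000 probe "transactions"
y = target_score(X)                # risk scores returned by the API

# Invert the sigmoid and solve least squares for substitute weights
logits = np.log(y / (1 - y))
w_hat, *_ = np.linalg.lstsq(X, logits, rcond=None)

# Fidelity: fraction of approve/decline decisions (threshold 0.5) that agree
substitute = 1 / (1 + np.exp(-(X @ w_hat)))
agreement = np.mean((substitute > 0.5) == (y > 0.5))
print(f"decision agreement: {agreement:.1%}")
```

Against this synthetic linear target the agreement is essentially perfect; against a real non-linear model with coarser scores it plateaus lower, as in the 94% figure from the assessment.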

Scenario 2: Medical Image Classification API

Researchers have demonstrated extraction attacks against medical imaging APIs:

  1. Submit a diverse set of medical images to the classification API
  2. Record the predicted diagnosis and confidence scores
  3. Train a substitute model on these input-output pairs
  4. The substitute model achieves near-equivalent diagnostic accuracy—without any of the original training data, architecture, or domain expertise

The implications are severe: a competitor could effectively steal years of medical AI research through systematic API querying.

The Economics of Model Extraction

| Factor | Original Model | Extracted Model |
|---|---|---|
| Training data collection | Months to years | Not needed |
| Data labeling costs | $100K - $10M+ | Not needed |
| Compute costs for training | $10K - $10M+ | $100 - $10K for substitute training |
| API query costs | N/A | $1K - $50K depending on API pricing |
| Time to develop | Months to years | Days to weeks |
| Domain expertise required | Extensive | Minimal |

The economic incentive for extraction is clear: for a fraction of the cost, an attacker can obtain a functionally equivalent model.

Advanced Extraction Techniques

Active Learning for Efficient Extraction: Rather than random querying, active learning selects inputs that are most informative about the model's decision boundary:

import numpy as np

class ActiveExtractionAttack:
    """Use active learning to extract a model efficiently."""

    def __init__(self, target_api, substitute_model):
        self.target = target_api
        self.substitute = substitute_model
        self.query_budget = 0

    def select_informative_queries(self, candidate_inputs, n_select):
        """
        Select inputs where the substitute model is most uncertain.
        These inputs are near the decision boundary and will be
        most informative for improving the substitute model.
        """
        predictions = self.substitute.predict_proba(candidate_inputs)
        # Uncertainty = entropy of prediction distribution
        uncertainties = -np.sum(
            predictions * np.log(predictions + 1e-10), axis=1
        )
        # Select top-N most uncertain
        top_indices = np.argsort(uncertainties)[-n_select:]
        return candidate_inputs[top_indices]

    def extract(self, num_rounds=10, queries_per_round=1000):
        """
        Iterative extraction with active learning.
        """
        all_inputs, all_labels = [], []

        for round_num in range(num_rounds):
            # Generate candidate inputs (generate_candidates is an
            # assumed helper that samples from the input domain)
            candidates = generate_candidates(10000)

            # Select most informative queries
            queries = self.select_informative_queries(
                candidates, queries_per_round
            )

            # Query the target
            labels = [self.target.predict(q) for q in queries]
            self.query_budget += len(queries)

            all_inputs.extend(queries)
            all_labels.extend(labels)

            # Retrain substitute
            self.substitute.fit(
                np.array(all_inputs), np.array(all_labels)
            )

            # Measure fidelity: agreement with the target on a held-out
            # set (measure_fidelity is an assumed helper)
            fidelity = self.measure_fidelity()
            print(f"Round {round_num}: {self.query_budget} queries, "
                  f"fidelity: {fidelity:.2%}")

Transfer Learning for Extraction: Using a pre-trained model from the same domain as a starting point for the substitute model dramatically reduces the number of queries needed:

  1. Start with a publicly available model trained on a related task
  2. Fine-tune it using input-output pairs from the target API
  3. The pre-trained model's existing knowledge accelerates convergence

Defenses Against Model Extraction

Model Extraction Defenses:

API-Level Controls:

  • Rate Limiting: Restrict the number of queries per API key per time period
  • Query Quotas: Implement hard limits on total queries per customer
  • Output Perturbation: Add controlled noise to confidence scores (called "prediction perturbation")
  • Reduced Output: Return only the top-1 predicted class, not full probability distributions
  • Watermarking: Embed statistical signatures in the model's outputs that can identify extraction
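The output-side controls can be illustrated with a small server-side helper. This is a hypothetical sketch, not any specific product's API: it shows how perturbation, score rounding, and top-1-only responses each strip information that an extractor relies on.

```python
import numpy as np

def harden_output(probs, rng=None, noise_scale=0.01, decimals=2,
                  top1_only=False):
    """Hypothetical server-side hardening of a prediction response."""
    probs = np.asarray(probs, dtype=float)
    if rng is not None:
        # Output perturbation: controlled noise on the scores
        probs = np.clip(probs + rng.normal(scale=noise_scale,
                                           size=probs.shape), 1e-6, None)
    probs = probs / probs.sum()  # keep a valid distribution
    if top1_only:
        # Reduced output: label only, no scores at all
        return {"label": int(np.argmax(probs))}
    # Coarsened scores leak less of the decision surface
    return {"label": int(np.argmax(probs)),
            "scores": np.round(probs, decimals).tolist()}

raw = np.array([0.072341, 0.901877, 0.025782])
print(harden_output(raw, rng=np.random.default_rng(42)))
print(harden_output(raw, top1_only=True))  # {'label': 1}
```

Each option trades utility for protection: legitimate customers who need calibrated scores are hurt by the same coarsening that slows extraction, so these knobs are usually tiered by customer trust level.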

Monitoring and Detection:

  • Query Pattern Analysis: Monitor for systematic probing patterns (evenly spaced inputs, grid searches, boundary-walking)
  • Anomaly Detection: Flag accounts with unusual query patterns (high volume, low diversity, or suspiciously uniform distributions)
  • Honeypot Inputs: Include specific inputs that trigger unique responses, allowing detection of extracted models
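Query pattern analysis can be prototyped cheaply. The heuristic below is inspired by the PRADA line of work cited in the references but is not that system's algorithm: it flags query batches whose nearest-neighbor spacing is suspiciously uniform, as grid sweeps and boundary-walking tend to be.

```python
import numpy as np

def nn_distance_dispersion(queries):
    """std/mean of nearest-neighbor distances in a query batch.
    Benign traffic is irregular; systematic probing (grids, boundary
    walks) produces near-uniform spacing, i.e. a ratio near zero."""
    q = np.asarray(queries, dtype=float)
    dists = np.linalg.norm(q[:, None, :] - q[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)   # ignore self-distances
    nearest = dists.min(axis=1)
    return nearest.std() / nearest.mean()

rng = np.random.default_rng(0)
benign = rng.normal(size=(100, 2))                  # organic-looking queries
grid = np.stack(np.meshgrid(np.linspace(0, 1, 10),
                            np.linspace(0, 1, 10)),
                axis=-1).reshape(-1, 2)             # 10x10 probing sweep

print(nn_distance_dispersion(benign))  # clearly positive
print(nn_distance_dispersion(grid))    # ~0: perfectly even spacing
```

A production detector would threshold a statistic like this per API key over a sliding window, combining it with volume and diversity signals rather than relying on any single heuristic.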

Legal and Contractual:

  • Terms of Service: Explicitly prohibit model extraction in API terms of service
  • Audit Rights: Retain the right to audit customer usage patterns
  • Watermarking for Attribution: If an extracted model is discovered, watermarks can prove provenance


Combined Analysis: AI as Both Weapon and Target

The Dual-Use Dilemma

These case studies illustrate the dual-use nature of AI in security:

| Dimension | AI as Weapon | AI as Target |
|---|---|---|
| Phishing | AI generates more effective phishing | AI email filters must detect AI-generated phishing |
| Social Engineering | Deepfake voice/video enables impersonation | AI detection systems must identify deepfakes |
| Model Security | AI helps find vulnerabilities | AI models are themselves valuable targets |
| Automation | AI enables attacks at scale | AI-based defenses must match the scale |

Implications for Security Assessment Methodology

Penetration testers must now assess both dimensions:

  1. Test defenses against AI-powered attacks: Can the organization's phishing defenses detect AI-generated emails? Can their voice verification procedures resist deepfake calls?

  2. Test AI systems as targets: Are the organization's ML models protected against extraction? Are their LLM applications resistant to prompt injection?

  3. Test AI-powered security tools: Are the organization's AI-based security tools (SIEM, EDR, email filters) effective? Can they be evaded by AI-powered attackers?

📊 The AI Security Assessment Matrix:

Every organization deploying AI needs assessment across four quadrants:

  • AI defending against traditional attacks (AI-powered SIEM, WAF)
  • AI defending against AI-powered attacks (AI phishing detection)
  • Traditional defenses protecting AI assets (access controls on ML APIs)
  • AI-specific defenses protecting AI assets (adversarial training, extraction detection)


Discussion Questions

  1. AI-generated phishing eliminates many traditional phishing indicators. How should organizations redesign their security awareness training to address this shift?

  2. Model extraction attacks can be performed entirely through legitimate API access. Is this fundamentally different from reverse engineering traditional software? Should the legal framework treat it differently?

  3. A penetration tester uses AI to generate a highly effective phishing email during an authorized social engineering assessment. The email is so convincing that an employee submits actual credentials and sensitive data. What are the ethical obligations of the tester regarding the data obtained?

  4. If model extraction is easy and cheap, what business models for ML-as-a-Service remain viable? How should organizations price and protect their ML APIs?

  5. Design a comprehensive defense strategy for ShopStack that addresses both AI-powered phishing threats and model extraction risks for their fraud detection system.


References

  1. Tramer, F. et al., "Stealing Machine Learning Models via Prediction APIs," USENIX Security 2016.
  2. Heiding, F. et al., "Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models," IEEE S&P Workshop, 2024.
  3. Hazell, J., "Large Language Models Can Be Used To Effectively Scale Spear Phishing Campaigns," arXiv, 2023.
  4. Chen, D. et al., "Hong Kong finance worker duped into paying $25 million by deepfake video call," CNN, February 2024.
  5. Orekondy, T. et al., "Knockoff Nets: Stealing Functionality of Black-Box Models," CVPR 2019.
  6. Chandrasekaran, V. et al., "Exploring Connections Between Active Learning and Model Extraction," USENIX Security 2020.
  7. Juuti, M. et al., "PRADA: Protecting Against DNN Model Stealing Attacks," IEEE European Symposium on Security and Privacy, 2019.
  8. MITRE ATLAS, "ML Model Access Techniques," https://atlas.mitre.org/