Chapter 33 Further Reading: AI and Machine Learning Security

Essential Standards and Frameworks

OWASP Top 10 for LLM Applications

OWASP Foundation, 2024 https://owasp.org/www-project-top-10-for-large-language-model-applications/

The authoritative reference for LLM-specific security risks. Covers prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency, overreliance, and model theft. Essential reading for anyone assessing LLM-based applications.

NIST AI Risk Management Framework (AI RMF 1.0)

National Institute of Standards and Technology, 2023 https://www.nist.gov/artificial-intelligence/ai-risk-management-framework

The primary US government framework for managing AI risks. Provides a structured approach to identifying, assessing, and mitigating AI-specific risks. Useful for framing assessment findings in regulatory and compliance contexts.

MITRE ATLAS (Adversarial Threat Landscape for AI Systems)

MITRE Corporation https://atlas.mitre.org/

The MITRE ATT&CK equivalent for AI systems. Provides a knowledge base of adversarial tactics, techniques, and case studies specific to ML systems. Invaluable for structuring AI security assessments and mapping findings to a standardized taxonomy.

EU AI Act

European Parliament, Regulation (EU) 2024/1689 https://eur-lex.europa.eu/eli/reg/2024/1689/oj

The world's first comprehensive AI regulation. Establishes risk-based requirements for AI systems, including mandatory security testing for high-risk applications. Understanding this regulation is essential for assessments involving European organizations or global deployments.

Books

Adversarial Machine Learning by Anthony D. Joseph, Blaine Nelson, Benjamin I.P. Rubinstein, and J.D. Tygar

Cambridge University Press, 2019 ISBN: 978-1107043466

The foundational academic textbook on adversarial ML. Covers evasion attacks, poisoning attacks, privacy attacks, and defenses with mathematical rigor. Best suited for readers who want deep technical understanding of the theoretical underpinnings.

Not with a Bug, But with a Sticker: Attacks on Machine Learning Systems and What To Do About Them by Ram Shankar Siva Kumar and Hyrum Anderson

Wiley, 2023 ISBN: 978-1119883982

An accessible and practical guide to ML security written by Microsoft's AI Red Team leads. Covers real-world attack scenarios, assessment methodologies, and defensive strategies. Highly recommended for penetration testers entering the AI security space.

AI Security and Privacy by Tianwei Zhang, Yang Liu, and Niyato Dusit

Springer, 2024 ISBN: 978-981997103-6

Comprehensive academic coverage of AI security and privacy topics including adversarial examples, data poisoning, model privacy, fairness attacks, and secure AI systems design. Strong on formal definitions and theoretical analysis.

Prompt Engineering for Generative AI by James Phoenix and Mike Taylor

O'Reilly Media, 2024 ISBN: 978-1098153434

While focused on effective prompt engineering, this book provides the foundation for understanding prompt injection from both offensive and defensive perspectives. Understanding how prompts work is essential for testing LLM applications.

Foundational Research Papers

"Explaining and Harnessing Adversarial Examples"

Goodfellow, Shlens, and Szegedy, ICLR 2015 https://arxiv.org/abs/1412.6572

The paper that introduced FGSM and demonstrated that adversarial examples are a systemic vulnerability, not an edge case. Essential reading for understanding the foundations of adversarial ML.

"Towards Evaluating the Robustness of Neural Networks"

Carlini and Wagner, IEEE S&P 2017 https://arxiv.org/abs/1608.04644

Introduced the C&W attack, one of the strongest adversarial attacks. Also demonstrated that many proposed defenses (including defensive distillation) were ineffective. Critical for understanding the state of adversarial robustness evaluation.

"Stealing Machine Learning Models via Prediction APIs"

Tramer et al., USENIX Security 2016 https://arxiv.org/abs/1609.02943

The seminal paper on model extraction attacks. Demonstrated practical extraction against production ML APIs including Amazon ML and Google Prediction API. Foundational for understanding model theft risks.

"Membership Inference Attacks Against Machine Learning Models"

Shokri et al., IEEE S&P 2017 https://arxiv.org/abs/1610.05820

Introduced the shadow model approach for membership inference. Demonstrated that ML models leak information about their training data. Essential for understanding ML privacy risks.

"Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"

Greshake et al., AISec 2023 https://arxiv.org/abs/2302.12173

The landmark paper on indirect prompt injection. Demonstrated practical attacks against LLM applications with retrieval, web browsing, and email integration. Required reading for LLM security assessment.

"Universal and Transferable Adversarial Attacks on Aligned Language Models"

Zou et al., 2023 https://arxiv.org/abs/2307.15043

Demonstrated that gradient-based optimization could generate adversarial suffixes that reliably bypass LLM safety training across multiple models. A pivotal paper showing the limitations of alignment-based defenses.

Tools and Practical Resources

Adversarial Robustness Toolbox (ART)

IBM Research https://github.com/Trusted-AI/adversarial-robustness-toolbox

The most comprehensive library for adversarial ML research and testing. Implements dozens of attacks (FGSM, PGD, C&W, DeepFool, etc.) and defenses (adversarial training, input preprocessing, certified defenses). Supports PyTorch, TensorFlow, Keras, scikit-learn, and more.

TextAttack

QData Lab, University of Virginia https://github.com/QData/TextAttack

A Python framework for adversarial attacks on NLP models. Includes character-level, word-level, and sentence-level attacks with automated evaluation of attack quality. Essential for testing text classification and sentiment analysis models.

Garak

NVIDIA https://github.com/leondz/garak

An LLM vulnerability scanner that automates prompt injection testing, jailbreak attempts, and safety evaluation across multiple LLM providers. Provides structured vulnerability reports and supports custom probes.

Counterfit

Microsoft https://github.com/Azure/counterfit

An open-source command-line tool for assessing the security of ML models. Provides a framework for running adversarial attacks against models deployed in cloud environments or locally.

AI Exploit Framework

Various Contributors https://github.com/protectai/ai-exploits

A collection of real-world AI/ML vulnerability proof-of-concepts and exploit techniques. Useful for understanding practical exploitation of ML systems.

Online Training and Labs

NVIDIA AI Red Team Learning Resources

https://developer.nvidia.com/

NVIDIA provides resources on AI red teaming, including tutorials on using Garak for LLM security assessment and adversarial robustness testing.

HuggingFace ML Security Course

https://huggingface.co/learn

HuggingFace offers free courses on ML fundamentals that provide the base knowledge needed for AI security assessment. Understanding model architectures, training procedures, and inference pipelines is prerequisite knowledge for effective AI security testing.

PortSwigger Web Security Academy — LLM Attacks

https://portswigger.net/web-security/llm-attacks

PortSwigger's free labs on LLM security, including prompt injection, insecure output handling, and tool exploitation. Practical, hands-on exercises in a guided environment.

Damn Vulnerable LLM Agent (DVLA)

https://github.com/WithSecureLabs/damn-vulnerable-llm-agent

An intentionally vulnerable LLM-powered application for practicing prompt injection, tool exploitation, and other LLM-specific attack techniques.

Conferences and Community

AAAI Conference on Artificial Intelligence — Security Track

The premier AI research conference includes security-focused papers and workshops covering adversarial ML, AI safety, and trustworthy AI.

NeurIPS — ML Safety Workshop

The annual NeurIPS conference hosts workshops dedicated to ML safety and security, featuring cutting-edge research on adversarial robustness, alignment, and AI security.

IEEE Symposium on Security and Privacy (Oakland)

The top academic security conference regularly features papers on adversarial ML, model privacy, and AI security. Many of the foundational papers referenced in this chapter were published here.

USENIX Security Symposium

Another top-tier security venue that publishes significant AI security research, particularly on practical attacks and real-world deployment security.

AI Village at DEF CON

https://aivillage.org/

The AI security community at DEF CON. Features talks, workshops, and hands-on challenges focused on attacking and defending AI systems. An excellent entry point for security professionals entering the AI security space.