Chapter 25: Further Reading — Cybersecurity and AI Systems

Foundational Research on Adversarial Attacks

1. Szegedy, Christian, et al. "Intriguing Properties of Neural Networks." International Conference on Learning Representations (ICLR), 2014. The paper that formally identified adversarial examples — the finding that neural network classifiers can be reliably fooled by small, mathematically engineered perturbations to inputs. This is the foundational paper for the adversarial machine learning field.

2. Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and Harnessing Adversarial Examples." ICLR, 2015. Introduced the Fast Gradient Sign Method (FGSM) — an efficient technique for generating adversarial examples — and offered a theoretical explanation for why neural networks are vulnerable to adversarial perturbation. Also introduced the concept of adversarial training as a defense.

3. Carlini, Nicholas, and David Wagner. "Evaluating the Robustness of Neural Networks: An Adversarial Approach." IEEE Symposium on Security and Privacy, 2017. The Carlini-Wagner attack paper, which demonstrated that many proposed adversarial defenses were ineffective against adaptive attackers. This paper established methodological standards for adversarial robustness evaluation that the field still uses.

4. Eykholt, Kevin, et al. "Robust Physical-World Attacks on Deep Learning Visual Classification." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. The original stop sign adversarial attack paper. Demonstrated that adversarial attacks work against real deployed classifiers using physical objects in the real world, including the iconic "STOP sign classified as speed limit" result.

Data Poisoning and Backdoor Attacks

5. Chen, Xinyun, et al. "Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning." arXiv, 2017. One of the foundational papers on backdoor attacks in deep learning. Demonstrated that training data poisoning can insert hidden backdoor behaviors activated by specific trigger patterns. Essential reading for understanding the supply chain security risk.

6. Goldblum, Micah, et al. "Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022. A comprehensive survey of data poisoning and backdoor attacks across machine learning domains, with coverage of defenses. Useful for understanding the state of the field.

Model Privacy and Inference Attacks

7. Carlini, Nicholas, et al. "Extracting Training Data from Large Language Models." USENIX Security, 2021. Demonstrates that large language models memorize and reproduce verbatim training data — including personal information — when prompted in specific ways. This paper established that LLM training data privacy cannot be assumed to be protected by the abstraction of training.

8. Shokri, Reza, et al. "Membership Inference Attacks Against Machine Learning Models." IEEE Symposium on Security and Privacy, 2017. The foundational paper on membership inference attacks — determining whether specific data was in a model's training set. Demonstrated that membership inference is feasible against real deployed machine learning models.

9. Fredrikson, Matt, Somesh Jha, and Thomas Ristenpart. "Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures." ACM Conference on Computer and Communications Security, 2015. Demonstrated that model inversion — reconstructing training data from model outputs — is feasible against real models using only API access to confidence scores.

Offensive AI

10. FBI Internet Crime Complaint Center. "Internet Crime Report." Annual. The FBI's annual IC3 report is the most comprehensive source for documented cybercrime losses in the United States. Business email compromise data, phishing statistics, and emerging threat categories are reported annually. Available at ic3.gov. The most recent available report should be used.

11. SlashNext. "The State of Phishing 2023." An industry threat intelligence report documenting AI-enabled phishing trends, including quantitative analysis of phishing volume changes following the release of generative AI tools. Freely available from the SlashNext website.

12. Brundage, Miles, et al. "The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation." University of Oxford and University of Cambridge Technical Report, 2018. A widely cited analysis of how AI could be misused for digital, physical, and political attacks. Covers phishing, social engineering, autonomous cyber weapons, and information operations. Freely available online. Predates the generative AI era but provides a useful analytical framework.

Defensive AI Security

13. National Institute of Standards and Technology. "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations." NIST AI 100-2, 2024. NIST's authoritative taxonomy of adversarial machine learning attacks and defenses. Provides standardized terminology and a comprehensive overview of the threat and defense landscape. Freely available at nist.gov.

14. National Institute of Standards and Technology. "AI Risk Management Framework (AI RMF 1.0)." NIST AI 100-1, 2023. The NIST AI RMF provides a framework for managing AI risks including security risks. The RMF's Map, Measure, Manage, Govern structure is the primary US government framework for AI risk management. Freely available at nist.gov.

LLM Security

15. Perez, Fábio, and Ian Ribeiro. "Ignore Previous Prompt: Attack Techniques for Language Models." arXiv, 2022. An early systematic study of prompt injection attacks against language models. Demonstrated that simple prompt injection techniques could reliably override system prompts in various language models. The starting point for understanding prompt injection as a security challenge.

16. Greshake, Kai, et al. "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injections." arXiv, 2023. Extended prompt injection research to indirect attacks — injecting malicious instructions through external content retrieved by LLM agents (web pages, emails, documents). This is the attack vector most relevant to enterprise LLM deployments.*

Regulatory and Policy

17. European Union Agency for Cybersecurity (ENISA). "Cybersecurity of AI and Standardisation." ENISA, 2023. ENISA's analysis of AI cybersecurity requirements and the standards landscape. Provides context for understanding the EU AI Act's security requirements and the state of AI security standardization. Freely available at enisa.europa.eu.

18. Cybersecurity and Infrastructure Security Agency (CISA). "Guidelines for Secure AI System Development." CISA, 2024. Co-authored by CISA and international cybersecurity agencies, this guidance provides practical recommendations for AI security including secure design, supply chain security, and incident response. Freely available at cisa.gov. Directly applicable to organizations building and deploying AI systems.