Chapter 33 Quiz: AI and Machine Learning Security

Test your understanding of AI/ML security concepts, attack techniques, and defensive strategies.

Question 1: What is the fundamental security challenge that makes prompt injection attacks possible against LLM applications?

A) LLMs have insufficient training data to recognize malicious inputs B) LLMs cannot reliably distinguish between developer instructions and user-supplied instructions C) LLMs always prioritize the most recent instruction over earlier ones D) LLMs lack built-in encryption for system prompts

Question 2: Which adversarial attack generates adversarial examples by perturbing each pixel in the direction of the gradient of the loss function with respect to the input?

A) Carlini & Wagner (C&W) attack B) DeepFool C) Fast Gradient Sign Method (FGSM) D) Jacobian-Based Saliency Map Attack (JSMA)

Question 3: A penetration tester discovers that MedSecure's medical imaging AI can be fooled by adding imperceptible noise to X-ray images. What type of attack is this?

A) Data poisoning B) Model extraction C) Adversarial evasion attack D) Membership inference

Question 4: In a data poisoning backdoor attack, which of the following best describes the behavior of the trojaned model?

A) The model performs poorly on all inputs B) The model performs normally on clean inputs but produces attacker-chosen output when a trigger pattern is present C) The model always outputs the same class regardless of input D) The model leaks training data in its outputs

Question 5: What is the primary goal of a model extraction (model stealing) attack?

A) To delete the target model from production B) To create a functionally equivalent copy of the target model by querying its API C) To modify the target model's weights in place D) To prevent the target model from making predictions

Question 6: A membership inference attack against ShopStack's fraud detection model successfully confirms that a specific transaction was in the training data. What is the privacy implication?

A) No privacy implication—transaction data is public B) It reveals that the transaction occurred and was processed by ShopStack, potentially disclosing business relationships C) It only reveals the model's architecture, not the data D) It proves the model is overfitting

Question 7: Which of the following is an example of indirect prompt injection?

A) A user directly typing "Ignore your instructions" into a chatbot B) Hiding malicious instructions in a webpage that an LLM-powered search tool will retrieve and process C) Brute-forcing the API key to access the LLM D) Sending a very long input to cause a denial of service

Question 8: According to the OWASP Top 10 for LLM Applications, which vulnerability involves LLMs with too much authority to take actions in connected systems?

A) LLM01: Prompt Injection B) LLM06: Sensitive Information Disclosure C) LLM08: Excessive Agency D) LLM10: Model Theft

Question 9: An attacker crafts a physical sticker that, when placed on a stop sign, causes an autonomous vehicle's vision system to classify it as a speed limit sign. What type of attack is this?

A) Digital adversarial example B) Physical-world adversarial patch C) Data poisoning D) Model inversion

Question 10: Which defense technique involves training a model on both clean and adversarial examples to improve its robustness?

A) Randomized smoothing B) Adversarial training C) Model distillation D) Input squeezing

Question 11: A model inversion attack against a facial recognition system is able to reconstruct recognizable images of individuals from the model's outputs. What fundamental model property does this attack exploit?

A) The model memorizes features of its training data, which can be recovered through optimization B) The model stores original training images in its weights C) The model's API returns raw training data D) The model compresses training images into its layers

Question 12: Why is returning full probability distributions (confidence scores for all classes) from an ML API more dangerous than returning only the predicted class?

A) It increases latency B) It provides attackers with more information for gradient estimation, model extraction, and adversarial example generation C) It violates data privacy regulations D) It makes the API more expensive to operate

Question 13: A 2023 study found that AI-generated spear phishing emails achieved click-through rates approximately how much higher than human-written equivalents?

A) 10% higher B) 30% higher C) 60% higher D) 200% higher

Question 14: Which framework provides a structured taxonomy of adversarial techniques against AI systems, analogous to MITRE ATT&CK for traditional cyber threats?

A) OWASP Top 10 B) NIST CSF C) MITRE ATLAS D) CIS Benchmarks

Question 15: When testing an LLM application for insecure output handling, a penetration tester makes the LLM generate <script>alert('XSS')</script> in its response, which executes in the user's browser. This demonstrates which vulnerability combination?

A) Prompt injection leading to cross-site scripting via insecure output handling B) SQL injection via the LLM C) Model extraction via output analysis D) Data poisoning of the training set

Question 16: Which of the following is NOT an effective defense against model extraction attacks?

A) Rate limiting API queries B) Returning only the predicted class label instead of full probability distributions C) Monitoring for unusual query patterns D) Making the model's architecture publicly available

Question 17: In the context of AI-powered offensive tools, "AI-enhanced fuzzing" refers to:

A) Using AI to generate random inputs without any strategy B) Using ML models to learn input grammars and target code paths more efficiently than random fuzzing C) Using AI to encrypt fuzzing payloads D) Using AI to slow down fuzzing operations for stealth

Question 18: The EU AI Act mandates security testing for which category of AI systems?

A) All AI systems regardless of risk level B) Only AI systems used by government agencies C) High-risk AI systems D) Only generative AI systems

Answer Key

1: B — The fundamental challenge is that LLMs process developer instructions (system prompt) and user input as a single text stream, with no reliable mechanism to enforce the priority of one over the other. This makes prompt injection a systemic, not implementation-specific, vulnerability.

2: C — FGSM generates adversarial examples in a single step by computing the sign of the gradient of the loss with respect to the input and adding a scaled perturbation in that direction. It is computationally efficient but generally produces weaker adversarial examples than iterative methods.

3: C — This is an adversarial evasion attack (also called an inference-time attack). The attacker crafts a perturbation to the input that causes the deployed model to make an incorrect prediction, without modifying the model itself.

4: B — A backdoor (trojan) attack creates a model that behaves normally on clean inputs, passing standard accuracy tests. Only when the specific trigger pattern is present does the model produce the attacker's desired output. This makes backdoors particularly difficult to detect.

5: B — Model extraction aims to create a substitute model that functionally replicates the target model's behavior. This is done by systematically querying the API and training a new model on the observed input-output pairs.

6: B — Confirming that a specific transaction was in the training data reveals that the transaction occurred and was processed by ShopStack, potentially disclosing customer relationships, transaction amounts, and other sensitive business information that was expected to be private.

7: B — Indirect prompt injection involves planting malicious instructions in external data sources (webpages, documents, emails) that the LLM will process. The attacker does not directly interact with the LLM; instead, the LLM encounters the malicious instructions while performing its normal function.

8: C — LLM08: Excessive Agency describes the risk when LLMs are connected to tools or systems that allow them to take actions (send emails, execute queries, modify data) without adequate access controls, human oversight, or scope limitations.

9: B — This is a physical-world adversarial patch attack. The sticker is optimized to cause targeted misclassification when captured by a camera and processed by the vision model, demonstrating that adversarial attacks can transcend the digital domain.

10: B — Adversarial training augments the training set with adversarial examples, teaching the model to correctly classify both clean and perturbed inputs. This is the most studied and widely deployed robustness technique, though it typically incurs a small accuracy penalty on clean data.

11: A — Models, particularly deep neural networks, memorize features and patterns from their training data. Model inversion exploits this by optimizing an input to maximize the model's confidence for a target class, effectively reconstructing characteristic features of the training data.

12: B — Full probability distributions give attackers precise information about decision boundaries, enabling gradient estimation for black-box adversarial attacks, more efficient model extraction, and better membership inference. Returning only the top-1 class label significantly reduces information leakage.

13: C — Research has shown that AI-generated spear phishing emails achieved approximately 60% higher click-through rates compared to human-crafted equivalents, due to better grammar, more convincing personalization, and more effective social engineering.

14: C — MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) provides a knowledge base of adversarial techniques targeting AI/ML systems, structured similarly to the MITRE ATT&CK framework used for traditional cybersecurity threats.

15: A — This is a two-step vulnerability: first, prompt injection causes the LLM to generate malicious HTML/JavaScript; then, insecure output handling (rendering the LLM's response as raw HTML) allows the script to execute in the user's browser, resulting in XSS.

16: D — Making the model's architecture public does not defend against extraction—it actually makes extraction easier by telling the attacker what architecture to use for the substitute model. All other options are legitimate defenses: rate limiting restricts query volume, returning only labels reduces information per query, and monitoring can detect extraction attempts.

17: B — AI-enhanced fuzzing uses ML models to learn the structure of valid inputs, identify code paths that are more likely to contain bugs, and generate test cases more efficiently than purely random fuzzing. This has been shown to significantly improve bug discovery rates.

18: C — The EU AI Act establishes a risk-based framework where high-risk AI systems (used in areas like healthcare, law enforcement, education, and critical infrastructure) are subject to mandatory requirements including security testing, risk assessment, and documentation.