Chapter 13 Quiz: Neural Networks Demystified
Multiple Choice
Question 1. An artificial neuron performs four operations in sequence. What is the correct order?
- (a) Activate, sum, weight, receive inputs
- (b) Receive inputs, apply activation, weight, sum
- (c) Receive inputs, weight the inputs, sum and add bias, apply activation function
- (d) Sum inputs, weight the sum, apply bias, activate
Question 2. In a neural network, what does the term "weight" represent?
- (a) The physical size of the network in memory
- (b) The importance assigned to each input — a multiplier that determines how much influence an input has on the output
- (c) The number of layers in the network
- (d) The accuracy of the model on the training data
Question 3. Why are activation functions necessary in neural networks?
- (a) They speed up computation by simplifying the math.
- (b) They introduce non-linearity, enabling the network to learn complex, non-linear patterns.
- (c) They prevent the network from using too much memory.
- (d) They ensure the network always produces a positive output.
Question 4. A neural network must classify a customer review as one of five sentiment categories (very negative, negative, neutral, positive, very positive). Which activation function is most appropriate for the output layer?
- (a) ReLU
- (b) Sigmoid
- (c) Softmax
- (d) Linear (no activation)
Question 5. Professor Okonkwo compares gradient descent to hiking down a foggy mountain. In this analogy, what does the "mountain" represent?
- (a) The training dataset
- (b) The neural network's architecture
- (c) The loss function — a surface where the height at each point represents the error for a particular set of weights
- (d) The GPU's processing capacity
Question 6. What is overfitting?
- (a) Using a model that is too simple for the complexity of the data
- (b) Training a model on too much data, causing it to become confused
- (c) A model that performs well on training data but poorly on new, unseen data because it has memorized rather than generalized
- (d) A model that requires too many GPUs to train efficiently
Question 7. Which of the following is NOT a technique for preventing overfitting?
- (a) Dropout
- (b) Early stopping
- (c) Data augmentation
- (d) Increasing the number of hidden layers
Question 8. A Convolutional Neural Network (CNN) is best suited for which type of data?
- (a) Structured, tabular data in rows and columns
- (b) Images and spatial data
- (c) Time series data only
- (d) Small datasets with fewer than 100 examples
Question 9. What was the key innovation of the Transformer architecture introduced in the 2017 paper "Attention Is All You Need"?
- (a) Convolutional filters that detect edges in images
- (b) Recurrent connections that process sequences one element at a time
- (c) Attention mechanisms that allow the model to process all elements of a sequence simultaneously and determine which parts are most relevant to each other
- (d) A new type of activation function that replaced ReLU
Question 10. Transfer learning allows companies to deploy deep learning more cost-effectively because:
- (a) It eliminates the need for any training data.
- (b) It uses a pre-trained model as a starting point, dramatically reducing the data and compute required for a new task.
- (c) It replaces GPUs with standard CPUs.
- (d) It converts deep learning models into traditional ML models automatically.
Question 11. According to the Deep Learning Decision Framework, which of the following scenarios is LEAST appropriate for deep learning?
- (a) Classifying thousands of product images into categories
- (b) Predicting customer churn using structured demographic and transaction data, where a gradient-boosted tree already achieves strong performance
- (c) Analyzing natural language in customer reviews to extract specific complaints
- (d) Building a visual search feature that matches customer-uploaded photos to a product catalog
Question 12. Why are GPUs critical for deep learning?
- (a) GPUs are cheaper than CPUs.
- (b) GPUs can perform millions of simple mathematical operations in parallel, which matches the computational pattern of neural network training.
- (c) GPUs provide better data security than CPUs.
- (d) Neural networks can only run on NVIDIA hardware.
Question 13. The "start simple" principle in data science recommends:
- (a) Always using deep learning because it is the most advanced approach
- (b) Starting with the simplest model that could work and adding complexity only when the data proves it necessary
- (c) Avoiding machine learning entirely and using rule-based systems
- (d) Using deep learning for prototyping and traditional ML for production
Question 14. In the Athena Retail Group case, Ravi approved deep learning for image-based product categorization but rejected it for churn prediction. What was the primary reason for this distinction?
- (a) The churn prediction model was more expensive to build.
- (b) For images (unstructured data), deep learning was necessary because traditional ML cannot process raw images well. For churn prediction (structured tabular data), the existing gradient-boosted tree was competitive, and the marginal accuracy gain did not justify the additional cost and reduced interpretability.
- (c) The board of directors refused to fund two deep learning projects.
- (d) The churn prediction dataset was too small for any model to work.
Question 15. What is the difference between an epoch and a batch in neural network training?
- (a) An epoch is one training example; a batch is a group of epochs.
- (b) An epoch is one complete pass through the entire training dataset; a batch is a subset of examples processed together before a single weight update.
- (c) An epoch is the final training step; a batch is the first training step.
- (d) There is no difference — they are synonyms.
True or False
Question 16. True or False: A single artificial neuron is computationally equivalent to a weighted sum with a threshold.
Question 17. True or False: The word "deep" in deep learning refers to the philosophical depth of the model's understanding.
Question 18. True or False: The universal approximation theorem guarantees that a neural network will find the optimal solution for any problem, given enough training time.
Question 19. True or False: For structured, tabular data, gradient-boosted trees (such as XGBoost) frequently match or outperform deep neural networks at a fraction of the cost.
Question 20. True or False: Transfer learning eliminates the need for any task-specific training data.
Short Answer
Question 21. Explain the "vanishing gradient problem" in one to two sentences. Why did it matter historically, and how has it been addressed?
Question 22. A vendor claims their deep learning model achieves "99 percent accuracy." What single question should you ask before accepting this claim, and why?
Question 23. Using the factory assembly line analogy, explain why deeper neural networks (more layers) can learn more complex patterns than shallow networks.
Question 24. Describe one scenario where a business leader should choose deep learning over traditional ML, and one scenario where traditional ML is clearly the better choice. In each case, explain which factor from the Deep Learning Decision Framework is most decisive.
Question 25. Explain why the following statement is misleading: "Neural networks work just like the human brain."
Answers are available in the Appendix, "Answers to Selected Exercises."