Chapter 33 Quiz: Introduction to Machine Learning for Business

Instructions: Choose the best answer for each question. Answer key with explanations follows all questions.


Questions

Q1. The most accurate definition of machine learning is:

a) A system that thinks and reasons like a human being
b) A collection of algorithms that find patterns in historical data and apply them to new data
c) A software program that automatically writes code based on business rules
d) A statistical technique that works only with large amounts of data


Q2. A customer churn model trained on data from 2022 is applied to customers in 2025. The model's performance has declined significantly from its original test set results. This is most likely caused by:

a) A software bug in the scikit-learn library
b) The model overfitting to the original training data
c) Changes in customer behavior and product features since 2022 (model drift)
d) The test set being too small in the original evaluation


Q3. Which of the following is an example of a supervised regression problem?

a) Grouping customers into segments based on purchasing behavior
b) Predicting whether a customer will churn (yes or no)
c) Flagging anomalous transactions for fraud review
d) Predicting the dollar amount of a customer's next order


Q4. You are building a churn prediction model and your dataset has 95% non-churners and 5% churners. A model that always predicts "no churn" achieves 95% accuracy. Which evaluation metric best captures how well a model actually identifies churners?

a) Accuracy
b) R²
c) ROC AUC
d) RMSE


Q5. Priya builds a churn model and gets 98% accuracy on the training set and 64% accuracy on the test set. What does this indicate?

a) The model is underfitting
b) The model is overfitting
c) The model is well-calibrated
d) The test set was too large


Q6. In scikit-learn, the .fit() method is used to:

a) Make predictions on new data
b) Transform data into a different format
c) Train the model on labeled training data
d) Evaluate the model against a test set


Q7. A StandardScaler is fit on the entire dataset (training + test combined) before the train/test split. This is an example of:

a) Cross-validation
b) Feature engineering
c) Regularization
d) Data leakage


Q8. The purpose of the stratify=y parameter in train_test_split() is to:

a) Shuffle the data randomly before splitting
b) Ensure both the training and test sets have the same class proportions as the original dataset
c) Scale the features to zero mean and unit variance
d) Prevent the model from seeing the test labels during training


Q9. In the context of a churn prediction model:
- A customer the model predicts will churn, who then does not churn, is a:
- A customer the model predicts will not churn, who then does churn, is a:

a) True Positive; True Negative
b) False Negative; False Positive
c) False Positive; False Negative
d) True Negative; False Positive


Q10. Which metric answers the question: "Of all the customers who actually churned, what fraction did the model correctly identify?"

a) Precision
b) Recall
c) Specificity
d) F1 Score


Q11. A company has 500 labeled training examples for a supervised classification problem. The best approach to model evaluation is:

a) Evaluate on the training data since the full dataset is small
b) Use cross-validation to get a reliable performance estimate without wasting much data on a held-out test set
c) Use a 50/50 train/test split
d) Skip evaluation and deploy the model immediately


Q12. Which of the following is NOT a valid reason to use machine learning?

a) You have thousands of labeled examples and need to identify patterns that humans cannot articulate
b) A business executive read about ML in a magazine and wants to use it
c) A rule-based system misses important patterns because the decision boundary is complex
d) You need to make predictions at a scale that human analysts cannot match


Q13. The Pipeline object in scikit-learn is primarily used to:

a) Speed up model training by parallel processing
b) Chain preprocessing and modeling steps together so that transformers are fit on training data only
c) Automatically select the best algorithm for a given dataset
d) Store model results in a database


Q14. What is the key difference between model parameters and hyperparameters?

a) Parameters are used in classification; hyperparameters are used in regression
b) Parameters are learned from training data; hyperparameters are set by the practitioner before training
c) Parameters are stored externally; hyperparameters are computed internally
d) There is no meaningful difference between the two terms


Q15. A model with R² = 0.82 on a sales forecasting task means:

a) The model is correct 82% of the time
b) The model explains 82% of the variation in the sales data; 18% remains unexplained
c) The average prediction error is 18% of the actual value
d) The model outperforms the baseline by 82 percentage points


Q16. In cross-validation with k=5, how many times is the model trained?

a) 1
b) 5
c) 10
d) It varies depending on the dataset size


Q17. An account manager asks you: "Why does the model think this customer is going to churn?" You cannot answer this question because the model is a large ensemble of 500 decision trees. This highlights the importance of:

a) Using more training data
b) Applying regularization
c) Interpretability requirements in model selection
d) Cross-validation


Q18. The "base rate" or "majority class baseline" is important because:

a) It gives the model a starting point for learning
b) It defines the minimum performance a model must exceed to be considered useful
c) It is used to set the learning rate in gradient descent
d) It is the threshold used to convert probabilities to class labels


Q19. Maya is consulting for a company that wants an ML model to predict sales pipeline closures. She discovers that the two most predictive fields — budget confirmation and decision timeline — are captured in only 28% of historical records. Her most appropriate recommendation is:

a) Build the model anyway and accept higher error rates
b) Use a deep learning model, which handles missing data better
c) Improve data collection processes first; revisit ML when data quality meets a minimum threshold
d) Use unsupervised learning instead, which does not require labeled examples


Q20. Which of the following statements about reinforcement learning is correct?

a) It is the most commonly used type of ML for business prediction problems
b) It requires a large labeled dataset to train
c) An agent learns by taking actions and receiving rewards or penalties based on outcomes
d) It is equivalent to supervised classification on sequential data


Answer Key

Q1. Answer: b Machine learning is specifically about finding patterns in historical data and applying them to new cases. It is not AGI (a), not a code-writing tool (c), and does not exclusively require large data (d) — small datasets can sometimes support effective models.

Q2. Answer: c Model drift occurs when the real-world relationship between features and outcomes changes after the model was trained. A 2022 model applied in 2025 is susceptible to this if customer behavior, product features, or market conditions have changed. Overfitting (b) would cause poor initial performance, not degradation over time.

Q3. Answer: d Regression predicts a continuous numeric output. Predicting a dollar amount is regression. Grouping customers (a) is unsupervised clustering. Predicting churn yes/no (b) is binary classification. Fraud flagging (c) is anomaly detection or classification.

Q4. Answer: c With severe class imbalance, accuracy is misleading because the majority-class baseline already achieves high accuracy. ROC AUC measures the model's ability to discriminate between classes across all thresholds and is less sensitive to class imbalance. R² (b) and RMSE (d) are regression metrics.
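As a quick sketch of why accuracy misleads under imbalance (the labels and scores below are illustrative, not from the chapter):

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Illustrative labels: 9 non-churners (0) and 1 churner (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# A "model" that always predicts no churn still scores 90% accuracy...
acc_always_no = accuracy_score(y_true, [0] * 10)

# ...but its constant scores carry no ranking information: ROC AUC is 0.5.
auc_constant = roc_auc_score(y_true, [0.0] * 10)

# A model whose score ranks the one churner highest earns a perfect AUC.
auc_ranked = roc_auc_score(
    y_true, [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.9])
print(acc_always_no, auc_constant, auc_ranked)  # 0.9 0.5 1.0
```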

Q5. Answer: b A large gap between training performance (98%) and test performance (64%) is the classic signature of overfitting. The model has memorized the training data, including noise, and fails to generalize. Underfitting (a) would produce poor performance on both sets.

Q6. Answer: c .fit() trains the model — it finds the parameter values that best match the training data. .predict() makes predictions (a). .transform() applies a preprocessing step (b). .score() evaluates performance (d).
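The estimator API can be sketched on a toy dataset (the data here is illustrative, not from the chapter):

```python
from sklearn.linear_model import LogisticRegression

# Toy labeled dataset: one numeric feature, binary target.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)           # train: learn parameters from labeled data
preds = model.predict(X)  # apply the learned model
acc = model.score(X, y)   # evaluate: mean accuracy for classifiers
print(acc)  # 1.0 on this trivially separable toy data
```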

Q7. Answer: d Fitting the scaler on the combined dataset means information from the test set (its mean and standard deviation) influenced the preprocessing. This is data leakage. It leads to an overly optimistic performance estimate because the preprocessing was tuned with knowledge of the test set.
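The leak-free order of operations looks roughly like this (dataset is a made-up placeholder):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data: 100 rows with one feature.
X = [[float(i)] for i in range(100)]
y = [0] * 50 + [1] * 50

# Split FIRST, then fit the scaler on the training portion only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics from train only
X_test_scaled = scaler.transform(X_test)        # same statistics reused, not refit
```

Because the scaler never sees the test rows during fitting, the test set remains a fair proxy for unseen data.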

Q8. Answer: b stratify=y ensures class proportions are preserved in both the training and test splits. This is important for classification problems, especially with class imbalance. Without it, a random split might accidentally produce a test set with very few positive examples.
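A small sketch of the effect (the 90/10 label split is illustrative):

```python
from sklearn.model_selection import train_test_split

# Imbalanced labels: 90 negatives, 10 positives.
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Both splits preserve the original 10% positive rate.
print(sum(y_tr) / len(y_tr), sum(y_te) / len(y_te))  # 0.1 0.1
```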

Q9. Answer: c A customer predicted to churn who does not = False Positive (FP): we were wrong in predicting the positive class. A customer predicted not to churn who does churn = False Negative (FN): we missed an actual churner. This distinction drives precision and recall calculations.
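Tallying the four outcomes by hand makes the definitions concrete (labels below are illustrative):

```python
# 1 = churn (positive class), 0 = no churn
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # predicted churn, stayed
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # predicted stay, churned
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
print(tp, fp, fn, tn)  # 2 1 2 3
```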

Q10. Answer: b Recall = TP / (TP + FN). It answers: of all actual positives (customers who actually churned), what fraction did we catch? Precision (a) answers: of all predicted positives, what fraction were correct?
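Both formulas follow directly from the confusion-matrix counts; a minimal pure-Python sketch (illustrative data):

```python
actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # 4 actual churners
predicted = [1, 1, 0, 1, 0, 0, 0, 0]   # 3 churn predictions

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

recall = tp / (tp + fn)     # caught 2 of the 4 actual churners
precision = tp / (tp + fp)  # 2 of the 3 churn predictions were right
print(recall, precision)    # 0.5 0.666...
```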

Q11. Answer: b With only 500 examples, holding out 20% (100 examples) for testing leaves only 400 for training — a meaningful sacrifice on a small dataset. Cross-validation uses all data for both training and evaluation across different folds, producing more reliable performance estimates on small datasets.
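In scikit-learn this is a one-liner; here is a sketch on a synthetic stand-in for the 500-example dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a small labeled dataset of 500 rows.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 5-fold CV: every row is used for training in 4 folds and evaluation in 1.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # average of the 5 fold estimates
```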

Q12. Answer: b "A business executive read about it in a magazine" is business pressure, not a technical justification. ML should be adopted because it solves a specific problem better than alternatives (a, c, d), not because of trend-following.

Q13. Answer: b The Pipeline's primary value is ensuring that preprocessing steps like scaling or imputation are fit only on training data and then applied (not refit) on test data. This prevents data leakage. It does not parallelize training (a), auto-select algorithms (c), or store to databases (d).
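A minimal sketch of that pattern, assuming a synthetic dataset in place of real business data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),      # transformer step
    ("model", LogisticRegression()),  # final estimator
])
pipe.fit(X_tr, y_tr)          # scaler statistics come from X_tr alone
acc = pipe.score(X_te, y_te)  # scaler is applied, NOT refit, to X_te
```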

Q14. Answer: b Parameters (like linear regression coefficients) are learned by the training algorithm from the data. Hyperparameters (like regularization strength, tree depth, number of trees) are set by the practitioner before training begins and control how the learning happens.
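The distinction is visible directly in code (toy data, illustrative only):

```python
from sklearn.linear_model import Ridge

# Hyperparameter: chosen by the practitioner BEFORE training.
model = Ridge(alpha=1.0)

X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]
model.fit(X, y)

# Parameters: learned FROM the data during .fit().
print(model.coef_, model.intercept_)
```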

Q15. Answer: b R² represents the proportion of variance in the target variable explained by the model. R² = 0.82 means the model captures 82% of the variation in sales; the remaining 18% is unexplained. It does not mean 82% of predictions are correct (a) or that error is 18% (c).
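The definition can be checked by hand (numbers below are illustrative):

```python
# R² = 1 - (sum of squared residuals) / (total sum of squares)
actual    = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 17.0]

mean_y = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_y) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot
print(r2)  # 0.85: 85% of the variance explained, 15% unexplained
```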

Q16. Answer: b In k-fold cross-validation with k=5, the model is trained 5 times — once for each fold serving as the test set while the other 4 folds form the training set. This produces 5 performance estimates that are then averaged.
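The fold structure can be inspected directly (25 rows used purely for illustration):

```python
from sklearn.model_selection import KFold

X = list(range(25))
folds = list(KFold(n_splits=5).split(X))

print(len(folds))  # 5 train/evaluate partitions -> 5 training runs
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))  # 20 for training, 5 held out each time
```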

Q17. Answer: c When interpretability is a business or regulatory requirement, model selection must account for it. A complex ensemble that cannot explain individual predictions may be technically superior but operationally unsuitable. This is a fundamental trade-off in applied ML.

Q18. Answer: b The base rate baseline defines the floor of acceptable performance. If your model does not meaningfully outperform the strategy of always predicting the majority class, it has not learned anything useful from the data.
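Computing the baseline takes two lines (the 95/5 split echoes the churn example from Q4 and is illustrative):

```python
# Majority-class baseline on an imbalanced label set.
y = [0] * 95 + [1] * 5  # 95% non-churners

majority = max(set(y), key=y.count)
baseline_accuracy = y.count(majority) / len(y)
print(baseline_accuracy)  # 0.95 -- the floor a real model must beat
```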

Q19. Answer: c A model trained on data where the most predictive features are available only 28% of the time will produce unreliable scores. The appropriate response is to fix the data collection process first. More sophisticated algorithms (b, d) do not fix a fundamental data availability problem.

Q20. Answer: c Reinforcement learning is defined by an agent-environment interaction loop with reward signals. It does not require labeled datasets (b), is not the most common business ML paradigm (a), and is distinct from supervised sequential modeling (d).