Quiz: Chapter 24
Recommender Systems: Collaborative Filtering, Content-Based, and Hybrid Approaches
Instructions: Answer all questions. Multiple-choice questions have one correct answer unless otherwise stated. Short-answer questions should be answered in 2-4 sentences.
Question 1 (Multiple Choice)
In user-based collaborative filtering, the prediction for user A on item X is computed by:
- A) Averaging the ratings of all users on item X
- B) Averaging the ratings of the most similar users on item X, weighted by their similarity to user A
- C) Computing the cosine similarity between item X and items user A has rated
- D) Multiplying user A's latent factor vector by item X's latent factor vector
Answer: B) Averaging the ratings of the most similar users on item X, weighted by their similarity to user A. User-based CF identifies the k users most similar to the target user (by their rating patterns), then predicts the target's rating as the similarity-weighted average of those neighbors' ratings on the target item. Choice A ignores similarity entirely. Choice C describes item-based CF. Choice D describes matrix factorization (SVD).
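The similarity-weighted neighbor average described above can be sketched in a few lines. This is a toy illustration, not a production implementation: the matrix, function name, and cosine-over-co-rated-items choice are all assumptions for the example.

```python
# Sketch of a user-based CF prediction over a small dense ratings
# matrix, with np.nan marking missing ratings. Illustrative only.
import numpy as np

def predict_user_based(ratings, target_user, target_item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings."""
    target = ratings[target_user]
    sims = []
    for u in range(ratings.shape[0]):
        if u == target_user or np.isnan(ratings[u, target_item]):
            continue
        # Cosine similarity computed over co-rated items only.
        mask = ~np.isnan(target) & ~np.isnan(ratings[u])
        if mask.sum() == 0:
            continue
        a, b = target[mask], ratings[u][mask]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            sims.append((a @ b / denom, ratings[u, target_item]))
    neighbors = sorted(sims, reverse=True)[:k]   # top-k most similar users
    num = sum(s * r for s, r in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return num / den if den > 0 else np.nan
```

A real system would also mean-center each user's ratings before computing similarity, which this sketch omits for brevity.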
Question 2 (Multiple Choice)
Which of the following is the primary advantage of item-based CF over user-based CF in production?
- A) Item-based CF produces more accurate predictions
- B) Item-based CF can handle new users with no history
- C) Item-item similarity is more stable over time than user-user similarity
- D) Item-based CF does not require a user-item matrix
Answer: C) Item-item similarity is more stable over time than user-user similarity. An item's profile (who interacts with it) changes slowly as new users discover it, while a user's preferences can shift rapidly. This stability means item-item similarity can be precomputed and cached, reducing the computational cost of serving recommendations. Amazon's 2003 paper on item-based CF cited this stability as a key advantage. Choice B is wrong because neither item-based nor user-based CF can handle users with no history.
Question 3 (Multiple Choice)
A content-based recommender uses TF-IDF vectors of product descriptions to compute item similarity. What is the main limitation of this approach?
- A) It cannot handle new items that have no interaction history
- B) It tends to recommend only items similar to what the user has already liked (filter bubble)
- C) It requires explicit ratings from users
- D) It cannot scale beyond 1,000 items
Answer: B) It tends to recommend only items similar to what the user has already liked (filter bubble). Content-based filtering recommends items whose features match the user's profile, which is built from past preferences. It cannot discover that a user who likes action movies might enjoy a documentary, because the features are dissimilar. Choice A is wrong because content-based filtering actually solves the new-item cold start problem --- new items with content features can be recommended immediately.
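The TF-IDF approach named in the question can be sketched with scikit-learn. The product descriptions below are invented for illustration; the point is that similarity is driven entirely by shared vocabulary, which is exactly what produces the filter-bubble behavior.

```python
# Minimal content-based item similarity via TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = [
    "wireless bluetooth headphones with noise cancelling",
    "bluetooth wireless earbuds noise cancelling microphone",
    "stainless steel kitchen knife set",
]
tfidf = TfidfVectorizer().fit_transform(descriptions)
sim = cosine_similarity(tfidf)  # 3x3 item-item similarity matrix

# The two audio products score high against each other; the knife
# set, sharing no tokens, scores near zero against both.
```

Note that the knife set could never be recommended to a headphone buyer here, no matter how many headphone buyers also like it, because the features share no terms.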
Question 4 (Multiple Choice)
In SVD-based matrix factorization for recommenders, what do the latent factors represent?
- A) The explicit features of each item (genre, price, category)
- B) Hidden dimensions of preference discovered automatically from rating patterns
- C) The principal components of the user demographic data
- D) The TF-IDF weights of item descriptions
Answer: B) Hidden dimensions of preference discovered automatically from rating patterns. SVD decomposes the user-item matrix into user-factor and item-factor matrices. The factors are not predefined --- they emerge from the data. A factor might capture something interpretable like "prefers action movies" or something abstract that has no clean label. The key property is that these factors compress the sparse user-item matrix into a dense, low-rank representation that generalizes better.
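The decomposition described above can be seen on a toy matrix. The block structure below (two groups of users, two groups of items) is contrived so that two latent factors suffice; real data is sparser and messier.

```python
# Sketch: truncated SVD of a toy user-item matrix. The latent
# factors emerge from the rating patterns; they are not predefined.
import numpy as np

R = np.array([  # rows: users, cols: items (0 stands in for unrated)
    [5, 5, 1, 0],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                         # keep two latent factors
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # low-rank reconstruction

# Each user is now a 2-d factor vector; its dot product with an
# item's factor vector approximates that user's rating.
```

The rank-2 reconstruction preserves the preference structure: user 0's predicted score for item 0 stays well above their score for item 2.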
Question 5 (Multiple Choice)
A recommender model achieves RMSE = 0.82 on held-out ratings and NDCG@10 = 0.45. A second model achieves RMSE = 0.91 and NDCG@10 = 0.52. Which model should you deploy for a top-10 product recommendation widget?
- A) Model 1, because lower RMSE indicates better predictions
- B) Model 2, because higher NDCG@10 indicates better ranking of relevant items
- C) Either model, because the metrics are measuring the same thing
- D) Neither model, because NDCG below 0.60 is unacceptable
Answer: B) Model 2, because higher NDCG@10 indicates better ranking of relevant items. The task is top-10 recommendation, which is a ranking problem. NDCG@10 directly measures ranking quality: whether relevant items appear near the top of the list. RMSE measures rating prediction accuracy, which is a different task. A model can predict ratings poorly but still rank items correctly (or vice versa). Choice D is wrong because the acceptability of an NDCG score depends on the domain, catalog size, and sparsity --- 0.52 may be excellent for a very sparse dataset.
Question 6 (Short Answer)
Explain the cold start problem for new items. Why can collaborative filtering not handle it, and how does content-based filtering solve it?
Answer: The new-item cold start problem occurs when an item is added to the catalog with no interaction history. Collaborative filtering relies on the co-occurrence of items in user histories to compute item-item similarity or to include the item in the user-item matrix. Without interactions, the item has no row/column in the interaction matrix and no neighbors. Content-based filtering solves this by using the item's features (text descriptions, categories, metadata) to compute similarity with existing items. A new movie with known genre, cast, and plot keywords can be matched to the user's profile immediately. This is why production systems use hybrid approaches: CF for established items, content-based for new ones.
Question 7 (Multiple Choice)
Which statement about implicit feedback is correct?
- A) A missing entry in the user-item matrix means the user dislikes the item
- B) Implicit feedback is less abundant than explicit feedback
- C) A missing entry in the user-item matrix is ambiguous --- the user may not have seen the item
- D) Implicit feedback can be directly used with the surprise library's SVD implementation
Answer: C) A missing entry in the user-item matrix is ambiguous --- the user may not have seen the item. This is the fundamental difference between explicit and implicit feedback. In explicit feedback, a missing rating means the user has not rated the item. In implicit feedback, a missing interaction could mean the user saw it and was not interested, or the user never saw it at all. This ambiguity is why implicit feedback algorithms (like ALS) treat missing entries differently from zero entries, using confidence weights. Choice A is wrong because it assumes missing = dislike, which conflates "not seen" with "not liked." Choice B is wrong because implicit feedback is far more abundant.
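The confidence-weighting idea mentioned above (from Hu, Koren, and Volinsky's implicit-feedback ALS paper) can be shown in two lines. The click counts and the value of alpha are illustrative assumptions.

```python
# Sketch of ALS-style implicit-feedback preprocessing: every cell
# gets a binary preference p and a confidence c, so a missing entry
# is a low-confidence zero rather than a definite dislike.
import numpy as np

clicks = np.array([  # raw interaction counts (implicit feedback)
    [3, 0, 1],
    [0, 5, 0],
], dtype=float)
alpha = 40.0                               # tunable scaling constant
preference = (clicks > 0).astype(float)    # 1 if any interaction at all
confidence = 1.0 + alpha * clicks          # unseen cells keep c = 1
```

The key point: the zero cells are not discarded (as surprise's explicit-rating SVD would require) but are kept with minimal confidence, which is why choice D is wrong.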
Question 8 (Multiple Choice)
NDCG@10 = 0.75 for a recommender system. What does this mean?
- A) 75% of recommended items are relevant
- B) The system achieves 75% of the ideal ranking quality for the top 10 positions
- C) 75% of users received at least one relevant recommendation in the top 10
- D) The average precision across all users is 0.75
Answer: B) The system achieves 75% of the ideal ranking quality for the top 10 positions. NDCG compares the actual Discounted Cumulative Gain (which penalizes relevant items appearing at lower positions) to the ideal DCG (where all relevant items are ranked at the very top). An NDCG of 0.75 means the system's ranking is 75% as good as the best possible ranking. Choice C describes Hit Rate, not NDCG. Choice D describes MAP.
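The DCG-over-ideal-DCG ratio described above is short enough to write out. This sketch assumes binary relevance and the common log2(position + 1) discount; graded-relevance variants replace the gain of 1 with a function of the rating.

```python
# Minimal NDCG@k with binary relevance. `ranked` is the recommended
# list in order; `relevant` is the ground-truth set of relevant items.
import math

def ndcg_at_k(ranked, relevant, k=10):
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Placing the single relevant item first yields 1.0; pushing it to position 2 discounts the gain by 1/log2(3), roughly 0.63.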
Question 9 (Short Answer)
A colleague proposes evaluating a recommender by training on all data from January through November and testing on December data. Why might this overstate the model's real-world performance compared to a leave-one-out per-user evaluation?
Answer: The temporal split overstates performance because it leaks future information during training. Items that become popular in December may already have many interactions in the training data from earlier months, making them easy to recommend. In production, the model must recommend items in real time without knowing what will be popular next month. Additionally, temporal patterns (holiday shopping, seasonal content) in December may not reflect year-round behavior. A per-user leave-one-out evaluation more faithfully simulates the production task: given this user's history up to now, predict what they will interact with next. However, the temporal split does correctly simulate the deployment scenario of "trained on past data, tested on future data," which per-user random splits do not.
Question 10 (Multiple Choice)
A weighted hybrid recommender uses the formula: score = 0.7 * CF_score + 0.3 * content_score. For a new user with 0 interactions, the CF score is undefined. What is the correct behavior?
- A) Set CF_score = 0 and proceed with the weighted formula
- B) Fall back to pure content-based scoring (or popularity if no content profile exists)
- C) Set CF_score = 3.0 (the midpoint of a 1-5 scale) as a default
- D) Skip the user entirely and show no recommendations
Answer: B) Fall back to pure content-based scoring (or popularity if no content profile exists). A switching hybrid handles cold start by changing the recommendation strategy based on available data, not by substituting arbitrary defaults into a formula designed for users with interaction history. Setting CF_score = 0 (choice A) would systematically penalize items with high content scores. Setting CF_score = 3.0 (choice C) adds noise that dilutes the content signal. Showing nothing (choice D) is the worst outcome --- the new user sees a blank page and churns.
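The switching behavior described above can be sketched as a scoring function. The stubbed signature and fallback order are illustrative assumptions, not a prescribed API.

```python
# Sketch of a switching hybrid: blend only when CF is defined,
# otherwise fall back to content-based, then to popularity.
def hybrid_score(user_history, cf_score, content_score, popularity):
    if user_history:                   # CF is defined: weighted blend
        return 0.7 * cf_score + 0.3 * content_score
    if content_score is not None:      # new user with a content profile
        return content_score
    return popularity                  # total cold start
```

Note that the 0.7/0.3 formula is only ever evaluated when `cf_score` exists, so no arbitrary default (0 or 3.0) ever contaminates the blend.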
Question 11 (Short Answer)
Explain why a popularity baseline is a surprisingly strong benchmark for recommender systems. Under what conditions will a personalized recommender struggle to outperform it?
Answer: A popularity baseline recommends the most globally popular items to every user. It is strong because popular items are popular for a reason --- they appeal to broad audiences. In domains with concentrated preferences (e.g., blockbuster movies, top-40 music), most users will engage with popular items regardless of personalization. A personalized recommender struggles to beat popularity when: (1) the user-item matrix is extremely sparse, giving insufficient data for personalization; (2) user preferences are homogeneous (everyone likes the same things); (3) the catalog is small, so popular items dominate; or (4) the evaluation metric is Hit Rate@K with large K, where popularity has a natural advantage because popular items are likely to be in anyone's test set.
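The baseline itself is almost trivially short, which is part of why it is such a useful benchmark. The interaction log below is invented for illustration.

```python
# The popularity baseline in a few lines: count interactions per
# item and recommend the same global top-K list to every user.
from collections import Counter

interactions = [("u1", "a"), ("u1", "b"), ("u2", "a"),
                ("u3", "a"), ("u3", "c"), ("u4", "b")]
counts = Counter(item for _, item in interactions)
top_k = [item for item, _ in counts.most_common(2)]  # same for everyone
```

Any personalized model that cannot beat this three-line baseline on the target metric is not earning its complexity.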
Question 12 (Multiple Choice)
Which combination of techniques best addresses both the new-user and new-item cold start problems simultaneously?
- A) User-based CF + item-based CF
- B) Content-based filtering + popularity baseline
- C) SVD + content-based filtering
- D) NMF + ALS
Answer: B) Content-based filtering + popularity baseline. Content-based filtering handles the new-item cold start (items with features but no interactions can be recommended). The popularity baseline handles the new-user cold start (users with no history receive globally popular items). Neither user-based CF, item-based CF, SVD, nor NMF can handle new users or new items without interaction data. Choice C partially works (content-based handles new items), but SVD alone cannot handle new users.
Question 13 (Short Answer)
You are building a recommender for an e-commerce site. The product catalog changes daily (new items added, old items removed). How does this affect your choice between collaborative filtering and content-based methods? What maintenance does a production recommender require?
Answer: High catalog turnover favors content-based or hybrid methods because new items arrive daily without interaction history. Pure CF cannot recommend new products until enough users interact with them, creating a chicken-and-egg problem: users cannot discover new items, so new items never accumulate interactions. Content-based filtering can recommend new items immediately based on their product descriptions and categories. Maintenance requirements include: retraining the model periodically (daily or weekly) to incorporate new interactions, removing delisted items from the candidate pool, monitoring recommendation diversity (to avoid stale recommendations), and tracking whether recommendation quality degrades over time (model decay). The item-feature index for content-based filtering must also be updated as the catalog changes.
Question 14 (Multiple Choice)
MAP@10 differs from NDCG@10 primarily because:
- A) MAP uses binary relevance (relevant or not), while NDCG can use graded relevance
- B) MAP is always higher than NDCG for the same recommendation list
- C) MAP penalizes irrelevant items at the top of the list more than NDCG does
- D) MAP considers only the first relevant item, while NDCG considers all relevant items
Answer: A) MAP uses binary relevance (relevant or not), while NDCG can use graded relevance. MAP computes precision at each rank position where a relevant item appears, then averages. It treats relevance as binary. NDCG uses a gain function that can assign different scores to different levels of relevance (e.g., a 5-star rated item is more relevant than a 3-star rated item). Both metrics penalize relevant items appearing at lower positions, but they use different discount functions: NDCG uses a logarithmic discount, while MAP uses a precision-based formulation.
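The precision-based formulation of MAP mentioned above can be sketched directly. This assumes binary relevance, per the answer, and normalizes each user's average precision by min(|relevant|, k).

```python
# Sketch of MAP@k: average precision per user at the ranks where
# relevant items appear, then the mean across users.
def average_precision_at_k(ranked, relevant, k=10):
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)        # precision at this rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(all_ranked, all_relevant, k=10):
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)
```

Contrast with the NDCG discount: here a relevant item at rank 3 contributes hits/3 rather than gain/log2(4), so the two metrics penalize late placement differently.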
Question 15 (Short Answer)
A recommender system for a video streaming platform consistently recommends action movies to users who have watched one action movie. Users complain that recommendations are "boring" and "all the same." Diagnose the problem and propose two specific technical solutions.
Answer: This is the filter bubble (or diversity) problem. The system, likely content-based or a CF model trained on limited history, has over-specialized to the user's narrow initial signal. Two solutions: (1) Add an exploration component --- inject a fraction (e.g., 20%) of recommendations from outside the user's dominant genre, either randomly or by using an epsilon-greedy or Thompson sampling strategy. This balances exploitation (recommending what the model thinks the user will like) with exploration (discovering new preferences). (2) Apply a diversity re-ranking step --- after generating the initial ranked list, re-rank to maximize intra-list diversity (e.g., ensure no more than 3 of the top 10 recommendations share the same genre) using a maximal marginal relevance (MMR) algorithm.
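The MMR re-ranking step from solution (2) can be sketched as follows. The genre-overlap similarity, the lambda value, and the toy catalog are all illustrative assumptions; production systems would use embedding similarity and a tuned trade-off.

```python
# Sketch of maximal marginal relevance (MMR) re-ranking: at each
# step pick the candidate maximizing relevance minus redundancy
# with what has already been selected.
def mmr_rerank(candidates, relevance, sim, lambda_=0.7, k=3):
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda c: lambda_ * relevance[c]
                   - (1 - lambda_) * max((sim(c, s) for s in selected),
                                         default=0.0))
        selected.append(best)
        pool.remove(best)
    return selected

genres = {"a": "action", "b": "action", "c": "comedy"}
relevance = {"a": 1.0, "b": 0.9, "c": 0.8}
genre_sim = lambda x, y: 1.0 if genres[x] == genres[y] else 0.0
picked = mmr_rerank(["a", "b", "c"], relevance, genre_sim, k=2)
```

Pure relevance ranking would return the two action movies; MMR swaps the slightly-less-relevant comedy into the second slot, which is exactly the diversity behavior the complaining users are asking for.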
This quiz covers Chapter 24: Recommender Systems. Return to the chapter for full context.