Chapter 27 Quiz: Building a Complete Analytics System
Instructions
- 40 questions total
- Mix of multiple choice, true/false, and short answer
- Time limit: 50 minutes
- Passing score: 70%
Section 1: System Requirements and Design (12 questions)
Question 1
In a stakeholder analysis for a football analytics system, which group can typically tolerate the LONGEST response time?
A) Head coach during games B) Recruiting coordinator reviewing prospects C) Athletic director reviewing quarterly reports D) Analytics staff during game preparation
Question 2
True or False: Non-functional requirements are less important than functional requirements in analytics systems.
Question 3
What is the recommended maximum API response time for real-time coaching dashboards?
A) 5 seconds B) 1 second C) 500 milliseconds D) 50 milliseconds
Question 4
Which architectural pattern is BEST suited for decoupling data ingestion from processing?
A) Monolithic architecture B) Message queue / event-driven architecture C) Direct database writes D) Synchronous API calls
Question 5
True or False: A single database can efficiently serve both real-time queries and historical analytics in a production system.
Question 6
Short Answer: List three non-functional requirements that are critical for a game-day analytics system.
Question 7
In a microservices architecture, what component typically handles communication between services?
A) Shared database B) Message broker (Kafka, RabbitMQ) C) Direct HTTP calls only D) File system
Question 8
Which storage technology is most appropriate for caching frequently accessed game state data?
A) PostgreSQL B) Redis C) S3 D) SQLite
Question 9
True or False: Role-based access control (RBAC) should restrict access to recruiting data to recruiting staff and analytics personnel only.
Question 10
What is the primary purpose of the "repository pattern" in system design?
A) Store data in multiple databases B) Abstract data access from business logic C) Improve query performance D) Handle user authentication
Question 11
Short Answer: Explain why horizontal scaling is preferred over vertical scaling for game-day analytics loads.
Question 12
A "service registry" in a microservices architecture is used for:
A) Registering user accounts B) Managing service discovery and dependency injection C) Storing service logs D) Scheduling background jobs
Section 2: Data Pipeline (10 questions)
Question 13
When ingesting data from external APIs, what should be the FIRST validation step?
A) Calculate EPA values B) Check data freshness C) Validate schema (required fields present) D) Compare to historical averages
Question 14
True or False: Data ingestion pipelines should immediately stop processing when a single record fails validation.
Question 15
What is "backpressure" in the context of data pipelines?
A) Storing data in reverse order B) Mechanism to slow producers when consumers can't keep up C) Compressing data for storage D) Prioritizing certain data types
Question 16
When calculating EPA, what determines the expected points BEFORE a play?
A) The play result B) Down, distance, and field position C) Player statistics D) Win probability
Question 17
Short Answer: Why is it important to log data quality issues even when they don't prevent processing?
Question 18
An "idempotent" data ingestion process means:
A) Data is processed faster B) Processing the same data multiple times produces the same result C) Data is compressed D) Processing happens in parallel
Question 19
True or False: Data transformations should be applied during ingestion rather than at query time for performance.
Question 20
What is the purpose of a "data quality score" in an analytics pipeline?
A) Ranking data sources by cost B) Measuring the completeness and accuracy of ingested data C) Prioritizing which data to process first D) Determining storage requirements
Question 21
When processing play-by-play data, EPA for a touchdown should approximately equal:
A) 1.0 B) 3.0 C) 7.0 D) Variable based on field position
Question 22
Which approach is BEST for handling late-arriving data in a streaming pipeline?
A) Reject all late data B) Use watermarks and late data handling windows C) Process late data with the next batch D) Store late data in a separate table
Section 3: Analytics Implementation (10 questions)
Question 23
In a win probability model, which feature typically has the LARGEST impact on predictions?
A) Home field advantage B) Score differential (adjusted for time) C) Current down and distance D) Weather conditions
Question 24
True or False: Win probability for the home and away teams should always sum to exactly 1.0.
Question 25
Short Answer: Describe how Win Probability Added (WPA) is calculated for a single play.
Question 26
For fourth-down decisions, the "expected win probability" of going for it equals:
A) The conversion probability B) Win probability if successful minus win probability if failed C) (P(convert) × WP if convert) + (P(fail) × WP if fail) D) Win probability after the decision
Question 27
A "leverage index" of 2.5 indicates:
A) The team is losing by 2.5 touchdowns B) The situation is 2.5x more important than average C) There are 2.5 quarters remaining D) The conversion probability is 25%
Question 28
True or False: A well-calibrated win probability model should show that teams with 80% win probability actually win approximately 80% of the time.
Question 29
When generating opponent scouting reports, which analysis should be broken down by game situation?
A) Player heights and weights B) Run/pass tendencies by down and field position C) Historical win/loss records D) Stadium capacity
Question 30
The success rate metric considers a first-down play successful if it gains:
A) Any positive yards B) At least 40% of the needed yards C) 10 or more yards D) More than the defense expected
Question 31
What is the purpose of caching model predictions in a real-time system?
A) Reduce storage costs B) Improve latency for repeated queries C) Ensure predictions are consistent D) Track model accuracy
Question 32
True or False: EPA can be negative for a play that gains positive yards.
Section 4: Operations and Deployment (8 questions)
Question 33
In a Docker deployment, which file defines multi-container applications?
A) Dockerfile B) docker-compose.yml C) package.json D) requirements.txt
Question 34
True or False: Health check endpoints should only verify database connectivity.
Question 35
What is the primary purpose of Kubernetes Horizontal Pod Autoscaler (HPA)?
A) Automatically deploy new code B) Scale pods up/down based on metrics C) Manage database connections D) Handle SSL certificates
Question 36
Short Answer: List four metrics that should be monitored for a production analytics system.
Question 37
The "circuit breaker" pattern in distributed systems is used to:
A) Physically disconnect servers B) Prevent cascading failures by stopping calls to failing services C) Encrypt data in transit D) Balance load across servers
Question 38
True or False: Production systems should log all API requests including full request bodies for debugging.
Question 39
What is the purpose of a "blue-green deployment" strategy?
A) Color-coding different environments B) Zero-downtime deployments by switching between two identical environments C) Deploying to multiple geographic regions D) Running tests before deployment
Question 40
When should automated alerts be triggered for a game-day analytics system?
A) Only when the system is completely down B) When latency, error rate, or data freshness exceed thresholds C) Every hour during games D) Only after receiving user complaints
Answer Key
Section 1: System Requirements and Design
1. C) Athletic director reviewing quarterly reports - Executive reports have the longest acceptable response times as they are used for strategic planning rather than real-time decisions.
2. False - Non-functional requirements (performance, reliability, security) are equally critical, especially for real-time systems where game-day uptime is essential.
3. C) 500 milliseconds - Real-time coaching dashboards should respond quickly enough that users don't perceive delay, typically under 500ms.
4. B) Message queue / event-driven architecture - Message queues decouple producers from consumers, allowing each to scale independently.
5. False - Production systems typically use separate storage solutions optimized for different query patterns (OLTP vs. OLAP).
6. Sample Answer: Three critical non-functional requirements: (1) 99.9% uptime during games; (2) Response time under 500ms for dashboard queries; (3) Support for 50+ concurrent users during peak game-day loads.
7. B) Message broker (Kafka, RabbitMQ) - Message brokers enable asynchronous, decoupled communication between services.
8. B) Redis - Redis provides sub-millisecond latency for key-value lookups, ideal for caching frequently accessed data.
9. True - Recruiting data is sensitive competitive information that should be restricted to personnel who need it.
10. B) Abstract data access from business logic - The repository pattern provides a clean separation between data persistence and business logic.
11. Sample Answer: Horizontal scaling (adding more servers) is preferred because: (1) It allows adding capacity on-demand for game-day spikes; (2) It's more cost-effective than continually upgrading individual servers; (3) It provides better fault tolerance through redundancy; (4) It enables geographic distribution for lower latency.
12. B) Managing service discovery and dependency injection - Service registries allow services to find each other and manage dependencies.
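The repository pattern from Question 10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: `Play`, `PlayRepository`, `InMemoryPlayRepository`, and `average_yards` are hypothetical names chosen for the example, and the in-memory dictionary stands in for a real database.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Play:
    """Hypothetical record type used only for this example."""
    game_id: str
    play_id: int
    yards_gained: int


class PlayRepository(ABC):
    """Data-access interface that the business logic depends on."""

    @abstractmethod
    def add(self, play: Play) -> None: ...

    @abstractmethod
    def plays_for_game(self, game_id: str) -> List[Play]: ...


class InMemoryPlayRepository(PlayRepository):
    """Stand-in backend; a database-backed class could replace it unchanged."""

    def __init__(self) -> None:
        self._plays: Dict[Tuple[str, int], Play] = {}

    def add(self, play: Play) -> None:
        self._plays[(play.game_id, play.play_id)] = play

    def plays_for_game(self, game_id: str) -> List[Play]:
        return [p for p in self._plays.values() if p.game_id == game_id]


def average_yards(repo: PlayRepository, game_id: str) -> float:
    """Business logic: it only sees the interface, never the storage details."""
    plays = repo.plays_for_game(game_id)
    if not plays:
        return 0.0
    return sum(p.yards_gained for p in plays) / len(plays)
```

Because `average_yards` accepts any `PlayRepository`, the same function can run against an in-memory store in tests and a persistent store in production.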
Section 2: Data Pipeline
13. C) Validate schema (required fields present) - Schema validation should happen first to ensure basic data structure before more complex checks.
14. False - Pipelines should log errors and continue processing valid records; stopping for single failures would make systems fragile.
15. B) Mechanism to slow producers when consumers can't keep up - Backpressure prevents system overload by coordinating flow rates.
16. B) Down, distance, and field position - Expected points before a play depends on the game situation, not the play result.
17. Sample Answer: Logging quality issues is important because: (1) Allows trend analysis to detect degrading data sources; (2) Provides context for debugging analytics anomalies; (3) Enables proactive outreach to data providers; (4) Creates an audit trail for data lineage.
18. B) Processing the same data multiple times produces the same result - Idempotency makes retries safe, so a pipeline with at-least-once delivery can still produce effectively exactly-once results.
19. True - Pre-computing transformations during ingestion improves query performance at the cost of some storage.
20. B) Measuring the completeness and accuracy of ingested data - Quality scores quantify how reliable the data is.
21. D) Variable based on field position - EPA for a touchdown is approximately 7 minus the expected points before the play, which depend on field position (along with down and distance).
22. B) Use watermarks and late data handling windows - Modern streaming systems use watermarks to handle out-of-order data gracefully.
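Several of the answers above (Questions 13, 14, 18, and 20) can be combined into one small sketch: validate the schema first, log and skip bad records instead of halting, write with a keyed upsert so replays are idempotent, and report a simple quality score. Everything named here is an assumption for illustration: `REQUIRED_FIELDS` is a hypothetical schema, `ingest` is a hypothetical function, a plain dictionary stands in for the data store, and the score measures completeness only, not accuracy.

```python
import logging
from typing import Dict, Iterable, Tuple

logger = logging.getLogger("ingest")

# Hypothetical schema: the fields a play record must carry to be accepted.
REQUIRED_FIELDS = ("game_id", "play_id", "down", "distance", "yardline")


def ingest(records: Iterable[dict], store: Dict[Tuple[str, int], dict]) -> float:
    """Validate and upsert records; return the fraction that passed validation."""
    total = 0
    valid = 0
    for record in records:
        total += 1
        # Schema check comes first, before any derived metrics are computed.
        missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
        if missing:
            # Log and continue: one bad record should not stop the pipeline.
            logger.warning("skipping record, missing fields: %s", missing)
            continue
        # Keyed upsert: replaying the same record overwrites the same entry
        # instead of creating a duplicate, which makes the load idempotent.
        store[(record["game_id"], record["play_id"])] = record
        valid += 1
    # Completeness-style quality score in [0, 1]; an empty batch scores 1.0.
    return valid / total if total else 1.0
```

Running `ingest` twice on the same batch leaves `store` unchanged after the first run, which is the property that makes retries safe.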
Section 3: Analytics Implementation
23. B) Score differential (adjusted for time) - Score differential, especially late in games, is the strongest predictor of win probability.
24. True - This is the basic property of probabilities for mutually exclusive, exhaustive outcomes.
25. Sample Answer: WPA = Win Probability after the play - Win Probability before the play. It quantifies how much a single play changed the team's likelihood of winning.
26. C) (P(convert) × WP if convert) + (P(fail) × WP if fail) - Expected value is the probability-weighted average of all outcomes.
27. B) The situation is 2.5x more important than average - Leverage index measures how much more impactful than average the current situation is.
28. True - This is the definition of calibration - predicted probabilities should match observed frequencies.
29. B) Run/pass tendencies by down and field position - Situational tendencies are crucial for game planning.
30. B) At least 40% of the needed yards - Success rate uses different thresholds by down (40% on 1st, 60% on 2nd, 100% on 3rd/4th).
31. B) Improve latency for repeated queries - Caching avoids redundant computations for identical inputs.
32. True - EPA can be negative for positive-yard plays if the situation worsened (e.g., 2nd & 10 becomes 3rd & 8).
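The formulas behind Questions 25, 26, and 30 are short enough to write out directly. The sketch below is illustrative only: the function names `wpa`, `expected_wp_go`, and `is_successful` are hypothetical, the win probabilities would come from a model that is not shown here, and the thresholds are the 40% / 60% / 100% values stated in the answer to Question 30.

```python
def wpa(wp_before: float, wp_after: float) -> float:
    """Win Probability Added: WP after the play minus WP before it."""
    return wp_after - wp_before


def expected_wp_go(p_convert: float, wp_if_convert: float, wp_if_fail: float) -> float:
    """Probability-weighted win probability of going for it on fourth down."""
    p_fail = 1.0 - p_convert
    return p_convert * wp_if_convert + p_fail * wp_if_fail


# Share of the needed yards (in percent) a play must gain to count as a success.
SUCCESS_PCT_BY_DOWN = {1: 40, 2: 60, 3: 100, 4: 100}


def is_successful(down: int, yards_to_go: int, yards_gained: int) -> bool:
    """Success-rate check; integer arithmetic avoids float rounding at the boundary."""
    return 100 * yards_gained >= SUCCESS_PCT_BY_DOWN[down] * yards_to_go
```

For example, with a hypothetical 60% conversion chance, a win probability of 0.70 after converting, and 0.40 after failing, `expected_wp_go(0.6, 0.70, 0.40)` works out to 0.6 × 0.70 + 0.4 × 0.40 = 0.58, which would then be compared against the expected win probability of punting or kicking.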
Section 4: Operations and Deployment
33. B) docker-compose.yml - Docker Compose defines and runs multi-container applications.
34. False - Health checks should verify all critical dependencies (database, cache, external APIs, disk space, etc.).
35. B) Scale pods up/down based on metrics - HPA automatically adjusts replica count based on CPU, memory, or custom metrics.
36. Sample Answer: Four metrics to monitor: (1) API response latency (avg, p95, p99); (2) Error rate percentage; (3) Database connection pool utilization; (4) Data freshness (time since last update).
37. B) Prevent cascading failures by stopping calls to failing services - Circuit breakers allow systems to degrade gracefully.
38. False - Logging full request bodies can expose sensitive data and create storage issues; log judiciously.
39. B) Zero-downtime deployments by switching between two identical environments - Blue-green allows instant rollback and testing before switching traffic.
40. B) When latency, error rate, or data freshness exceed thresholds - Proactive alerting catches issues before users notice.
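The circuit breaker from Question 37 can be illustrated with a deliberately small sketch. It is not a production implementation and not the chapter's code: `CircuitBreaker`, `CircuitOpenError`, and the parameters `max_failures`, `reset_after`, and `clock` are hypothetical names, and the single-trial-call recovery shown here is just one simple way to handle the half-open state.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self._clock = clock             # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Open: fail fast instead of hammering the failing service.
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow one trial call (half-open). A single
            # failure now reopens the circuit immediately.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        return result
```

Wrapping calls to an external data provider in `breaker.call(...)` lets a dashboard fall back to cached data when `CircuitOpenError` is raised, rather than stacking up slow, doomed requests.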
Scoring Guide
| Score | Grade | Feedback |
|---|---|---|
| 36-40 | A | Excellent systems understanding, ready for production work |
| 32-35 | B | Strong grasp of concepts, review deployment topics |
| 28-31 | C | Satisfactory, focus on data pipeline design |
| 24-27 | D | Needs improvement in architecture concepts |
| <24 | F | Re-study chapter material thoroughly |