Chapter 26 Quiz: Real-Time Analytics Systems

Instructions

  • 35 questions total
  • Mix of multiple choice, true/false, and short answer
  • Time limit: 45 minutes
  • Passing score: 70%

Section 1: System Architecture (10 questions)

Question 1

In a real-time analytics system, the "ingestion layer" is responsible for:

A) Storing historical data
B) Receiving and queuing incoming data streams
C) Rendering visualizations
D) Training machine learning models


Question 2

Which data structure is most appropriate for a message queue in event-driven architectures?

A) Array
B) Stack
C) Queue (FIFO)
D) Binary tree


Question 3

True or False: Real-time analytics systems should process all events synchronously in the main application thread.


Question 4

What is the primary purpose of an in-memory cache (like Redis) in a real-time system?

A) Permanent data storage
B) Fast access to frequently used data
C) Running machine learning models
D) Generating reports


Question 5

Short Answer: Explain the difference between "push" and "pull" data delivery in real-time systems. Which is typically better for live sports analytics?


Question 6

In a microservices architecture for real-time analytics, what component typically handles communication between services?

A) Database
B) Message broker (e.g., Kafka, RabbitMQ)
C) Web browser
D) CSV files


Question 7

True or False: Horizontal scaling (adding more servers) is generally more effective than vertical scaling (upgrading server specs) for handling variable game-day loads.


Question 8

The "processing layer" in a real-time architecture typically handles:

A) User authentication
B) Data validation, feature engineering, and model scoring
C) Physical server maintenance
D) Network routing


Question 9

What protocol is commonly used for bidirectional real-time communication between server and client dashboards?

A) HTTP GET
B) FTP
C) WebSocket
D) SMTP


Question 10

True or False: Event ordering is guaranteed when using multiple parallel processors without additional coordination.


Section 2: Data Processing (10 questions)

Question 11

When processing a stream of play-by-play events, what type of validation should be performed first?

A) Statistical validation
B) Schema/field validation (required fields present)
C) Historical comparison
D) Model prediction


Question 12

A "data quality score" of 0.95 means:

A) 95% of events are exactly correct
B) 5% of events failed validation
C) The system is 95% efficient
D) 95% of expected data is present and valid


Question 13

Short Answer: Why is it important to validate that scores only increase (never decrease) when processing game events?


Question 14

True or False: Data from external APIs should be trusted without validation since they are professional services.


Question 15

When a tracking data frame has only 20 players detected (instead of 22), the system should:

A) Reject the frame entirely
B) Log a warning but continue processing
C) Crash and restart
D) Fabricate data for missing players


Question 16

What is "backpressure" in streaming data systems?

A) Historical data analysis
B) Mechanism to slow data production when consumers can't keep up
C) Data compression technique
D) Security measure


Question 17

True or False: Real-time systems should always prioritize accuracy over latency.


Question 18

When converting raw tracking coordinates to analytical features, what should be done first?

A) Run machine learning prediction
B) Standardize coordinate system (e.g., offense moving left-to-right)
C) Generate visualizations
D) Archive raw data


Question 19

A "late-arriving event" is:

A) An event from a future game
B) An event that arrives after subsequent events have been processed
C) An invalid event
D) A halftime event


Question 20

What is the purpose of event "idempotency" in stream processing?

A) Faster processing
B) Ensuring reprocessing the same event doesn't cause duplicate effects
C) Better compression
D) Improved security


Section 3: Win Probability and Decision Support (8 questions)

Question 21

Win probability models should update:

A) Only at quarter breaks
B) After every play or significant event
C) Once per game
D) Every hour


Question 22

A "leverage index" measures:

A) How close the game is (importance of current situation)
B) The quarterback's performance
C) Database storage requirements
D) Network bandwidth


Question 23

Short Answer: What three options should a fourth-down decision support system analyze, and what factors affect each?


Question 24

True or False: Win probability should always equal exactly 50% at the start of a game.


Question 25

When the home team's win probability is 0.75, the away team's win probability is:

A) 0.75
B) 0.25
C) 0.50
D) Cannot be determined


Question 26

"Win Probability Added (WPA)" for a play is calculated as:

A) Home score minus away score
B) Win probability after the play minus win probability before
C) Total yards gained
D) Time remaining divided by score differential


Question 27

True or False: A well-calibrated win probability model should show that teams with 70% win probability actually win about 70% of the time.


Question 28

Fourth-down conversion probability primarily depends on:

A) Weather conditions
B) Distance to gain
C) Jersey colors
D) Stadium capacity


Section 4: Implementation and Deployment (7 questions)

Question 29

What is the recommended maximum latency for a real-time sports analytics system?

A) 5 seconds
B) 500 milliseconds
C) 100 milliseconds or less
D) 10 minutes


Question 30

True or False: Docker containers are useful for deploying real-time analytics systems because they ensure consistent environments across development and production.


Question 31

Short Answer: What metrics should be monitored for a production real-time analytics system?


Question 32

When a real-time system experiences a spike in latency, the first diagnostic step should be:

A) Restart all servers
B) Check queue depth and processing backlog
C) Delete all data
D) Disable all features


Question 33

"Horizontal Pod Autoscaling" in Kubernetes:

A) Makes pods taller
B) Automatically adjusts the number of pod replicas based on load
C) Changes database schemas
D) Rotates log files


Question 34

True or False: Production systems should expose health check endpoints for monitoring.


Question 35

The primary purpose of a "circuit breaker" pattern in real-time systems is:

A) Electrical safety
B) Preventing cascading failures when a dependency fails
C) Improving database queries
D) Generating reports


Answer Key

Section 1: System Architecture

  1. B) Receiving and queuing incoming data streams - The ingestion layer handles initial data reception and buffering.

  2. C) Queue (FIFO) - FIFO queues ensure events are processed in order.

  3. False - Real-time systems typically use asynchronous processing, often with multiple threads or workers, to maintain responsiveness.

  4. B) Fast access to frequently used data - In-memory caches provide sub-millisecond access times for hot data.

  5. Sample Answer: Push delivery sends data to consumers as it becomes available (server initiates), while pull delivery requires consumers to request data. Push (using WebSockets or server-sent events) is typically better for live sports because it minimizes latency - updates reach dashboards immediately without waiting for polling intervals.

  6. B) Message broker (e.g., Kafka, RabbitMQ) - Message brokers decouple services and handle reliable message delivery.

  7. True - Horizontal scaling allows adding capacity on demand and is more cost-effective for variable loads like game days.

  8. B) Data validation, feature engineering, and model scoring - The processing layer transforms raw data into analytical outputs.

  9. C) WebSocket - WebSockets enable full-duplex, low-latency communication between server and client.

  10. False - Parallel processing can result in out-of-order delivery; coordination (sequence numbers, ordering guarantees) is needed.
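
The sequence-number coordination mentioned in the event-ordering answer above can be sketched as a small reorder buffer. This is a minimal illustration, not a production design: the `ReorderBuffer` class and its method names are hypothetical, and it assumes every event carries a gap-free integer sequence number assigned at the source.

```python
class ReorderBuffer:
    """Release events in sequence order even if workers deliver them out of order.

    Hypothetical helper: assumes the source assigns gap-free integer
    sequence numbers starting at `first_seq`.
    """

    def __init__(self, first_seq=1):
        self._next_seq = first_seq  # next sequence number allowed out
        self._pending = {}          # seq -> event, held until its turn

    def push(self, seq, event):
        """Accept one event and return the events that are now in order."""
        # Drop anything already released or already waiting (a duplicate).
        if seq < self._next_seq or seq in self._pending:
            return []
        self._pending[seq] = event
        ready = []
        # Drain every consecutive event starting at the expected number.
        while self._next_seq in self._pending:
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready
```

If event 2 arrives before event 1, `push(2, ...)` returns an empty list and holds the event; the later `push(1, ...)` releases both in order. A real system would also need a timeout or gap policy so a lost event does not stall the buffer indefinitely.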

Section 2: Data Processing

  11. B) Schema/field validation (required fields present) - Basic validation should occur first, before more complex checks.

  12. D) 95% of expected data is present and valid - Quality scores measure overall data completeness and correctness.

  13. Sample Answer: A decreasing score indicates a data error, such as a duplicate event, incorrect sequencing, or corrupted data. In football, points are added during play, not removed, so a score that drops should be flagged. Detecting decreasing scores identifies data quality issues immediately and prevents incorrect analytics.

  14. False - All external data should be validated. APIs can have bugs, network issues can corrupt data, and formats can change unexpectedly.

  15. B) Log a warning but continue processing - Missing players are a data quality issue but shouldn't halt processing; the system should degrade gracefully.

  16. B) Mechanism to slow data production when consumers can't keep up - Backpressure prevents system overload by coordinating producer/consumer rates.

  17. False - The tradeoff depends on the use case. For real-time decision support, timely (slightly less accurate) results are often more valuable than perfect results that arrive too late.

  18. B) Standardize coordinate system - Consistent coordinate systems are essential before any analysis.

  19. B) An event that arrives after subsequent events have been processed - Late events can occur due to network delays or data source timing.

  20. B) Ensuring reprocessing the same event doesn't cause duplicate effects - Idempotency makes retries safe and gives effectively exactly-once results even when events are delivered more than once.
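
The schema-first validation, non-decreasing score check, and idempotency answers above can be combined in one small sketch. The field names in `REQUIRED_FIELDS` and the `GameState` class are assumptions made for illustration; a real feed will define its own schema.

```python
# Hypothetical event schema; real feeds define their own required fields.
REQUIRED_FIELDS = ("event_id", "game_id", "home_score", "away_score")


class GameState:
    """Track the score of one game while validating incoming events."""

    def __init__(self):
        self.home_score = 0
        self.away_score = 0
        self.seen_ids = set()  # event IDs already applied (for idempotency)
        self.warnings = []     # logged problems; processing continues

    def apply(self, event):
        """Apply one event dict; return True only if the state changed."""
        # 1. Schema check comes first: are the required fields present?
        missing = [f for f in REQUIRED_FIELDS if f not in event]
        if missing:
            self.warnings.append(f"missing fields: {missing}")
            return False
        # 2. Idempotency: reprocessing a known event has no effect.
        if event["event_id"] in self.seen_ids:
            return False
        # 3. Sanity check: scores should never go down.
        if (event["home_score"] < self.home_score
                or event["away_score"] < self.away_score):
            self.warnings.append(f"score decreased in {event['event_id']}")
            return False
        self.seen_ids.add(event["event_id"])
        self.home_score = event["home_score"]
        self.away_score = event["away_score"]
        return True
```

Rejected events are logged rather than raised as exceptions, so one bad record does not halt the stream.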

Section 3: Win Probability and Decision Support

  21. B) After every play or significant event - Win probability should reflect the current game state in real time.

  22. A) How close the game is (importance of current situation) - High leverage means the current play has significant impact on win probability.

  23. Sample Answer: The three options are: (1) Go for it - affected by distance to gain, down, offensive capabilities; (2) Field goal - affected by distance/accuracy, score differential, time remaining; (3) Punt - affected by field position, punter ability, coverage team. Each option's expected win probability is (success probability × win probability if successful) + (failure probability × win probability if failed).

  24. False - Home field advantage typically gives the home team a starting win probability around 53-57%.

  25. B) 0.25 - The two win probabilities must sum to 1.00, because ties are not possible in college football.

  26. B) Win probability after the play minus win probability before - WPA quantifies a play's impact on game outcome.

  27. True - This is the definition of calibration: predicted probabilities should match observed frequencies.

  28. B) Distance to gain - Conversion rates drop significantly as distance increases.
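
The WPA definition and the fourth-down expected-value formula in the answers above can be written out as a short sketch. The function names are hypothetical and the probabilities in the usage note are invented for illustration only; they are not taken from any model or data set.

```python
def wpa(wp_before, wp_after):
    """Win Probability Added: win probability after the play minus before."""
    return wp_after - wp_before


def expected_wp(p_success, wp_if_success, wp_if_fail):
    """Expected win probability of one option.

    success probability * WP if successful
      + failure probability * WP if failed
    """
    return p_success * wp_if_success + (1.0 - p_success) * wp_if_fail


def best_option(options):
    """Pick the option with the highest expected win probability.

    `options` maps a name to (p_success, wp_if_success, wp_if_fail).
    Returns (best_name, {name: expected_wp}).
    """
    scored = {name: expected_wp(*vals) for name, vals in options.items()}
    return max(scored, key=scored.get), scored
```

For example, with made-up inputs `{"go": (0.5, 0.60, 0.40), "field_goal": (0.8, 0.55, 0.42), "punt": (1.0, 0.45, 0.45)}`, the go-for-it option evaluates to 0.5 × 0.60 + 0.5 × 0.40 = 0.50. Treating the punt as a certain outcome is a simplification; a fuller model would consider a distribution of resulting field positions.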

Section 4: Implementation and Deployment

  29. C) 100 milliseconds or less - Real-time systems should provide near-instant feedback.

  30. True - Docker provides consistent environments, reproducible deployments, and easy scaling.

  31. Sample Answer: Key metrics include: (1) Processing latency (avg, p95, p99); (2) Throughput (events per second); (3) Error rate; (4) Queue depth/backlog; (5) Memory usage; (6) CPU utilization; (7) Active connections; (8) Data quality scores.

  32. B) Check queue depth and processing backlog - High latency often indicates that processing can't keep up with the input rate.

  33. B) Automatically adjusts the number of pod replicas based on load - HPA scales capacity based on metrics like CPU or custom metrics.

  34. True - Health endpoints enable load balancers and monitoring systems to detect and respond to issues.

  35. B) Preventing cascading failures when a dependency fails - Circuit breakers stop calling failed services, allowing systems to degrade gracefully.
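
The circuit breaker answer above can be illustrated with a minimal sketch. The `CircuitBreaker` and `CircuitOpenError` names, the failure threshold, and the cooldown are all assumptions chosen for the example. Production libraries typically add finer-grained half-open handling and thread safety.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Stop calling a failing dependency until a cooldown has passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cooldown in seconds
        self._clock = clock               # injectable for testing
        self._failures = 0
        self._opened_at = None            # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Fail fast instead of waiting on a dependency known to be down.
                raise CircuitOpenError("dependency marked unavailable")
            # Cooldown elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each request to the dependency, for example `breaker.call(fetch_odds, game_id)` where `fetch_odds` is a hypothetical client function, and fall back to cached values when `CircuitOpenError` is raised.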


Scoring Guide

Score    Grade   Feedback
32-35    A       Excellent real-time systems understanding
28-31    B       Good grasp, review deployment concepts
25-27    C       Satisfactory, focus on data processing
21-24    D       Needs improvement in core concepts
<21      F       Re-study chapter material