Chapter 27: Computer Vision and Video Analysis

Introduction

The intersection of computer vision and basketball analytics represents one of the most rapidly evolving frontiers in sports technology. While traditional analytics relied on manually recorded statistics and structured tracking data, computer vision enables the automatic extraction of rich, detailed information directly from video footage. This chapter explores how modern computer vision techniques are transforming basketball analysis, from automated player tracking to sophisticated pose estimation for biomechanical assessment.

Computer vision in basketball encompasses a broad range of applications: detecting and tracking players and the ball, recognizing specific actions and plays, analyzing shooting form through pose estimation, and generating insights that were previously impossible to capture at scale. The democratization of these technologies means that what was once available only to professional teams with million-dollar camera systems is increasingly accessible to college programs, high schools, and even individual players.

This chapter assumes familiarity with basic programming concepts and some exposure to Python. While we will discuss machine learning and deep learning concepts, our focus is on practical application rather than theoretical foundations. Readers seeking deeper understanding of the underlying algorithms should consult the references in the Further Reading section.


27.1 Overview of Tracking Technology

27.1.1 The Evolution of Player Tracking

The history of automated player tracking in basketball spans several decades, with each generation bringing increased accuracy and decreased cost.

Early Systems (1990s-2000s)

The earliest tracking systems relied on GPS or RFID tags worn by players. While groundbreaking at the time, these systems had significant limitations:

  • GPS systems lacked the precision needed for indoor sports
  • RFID systems required extensive infrastructure installation
  • Both approaches could only track players, not the ball
  • Sampling rates were insufficient for capturing rapid movements

Optical Tracking Systems (2010s)

The introduction of optical tracking systems, particularly SportVU (later acquired by STATS; the NBA has since moved to Second Spectrum as its league-wide tracking provider), revolutionized basketball analytics. These systems use multiple cameras mounted in arena rafters to track players and the ball at 25 frames per second.

Key characteristics of optical tracking:

  • Six synchronized cameras capture the entire court
  • Computer vision algorithms identify and track each player
  • The ball is tracked separately using its distinctive shape and color
  • Position data is accurate to within a few inches
  • No wearable devices required for players

Modern Hybrid Systems

Current state-of-the-art systems often combine multiple technologies:

  • High-resolution cameras (60+ fps) for detailed analysis
  • Machine learning for improved tracking accuracy
  • Real-time processing capabilities
  • Integration with broadcast video
  • Portable systems for practice facilities

27.1.2 Data Generated by Tracking Systems

Modern tracking systems generate enormous volumes of data. A single NBA game produces:

Data Type           Approximate Volume
Player positions    ~2.5 million data points
Ball positions      ~250,000 data points
Derived metrics     ~10,000 calculated values
Raw video           ~500 GB (all cameras)
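
These magnitudes can be sanity-checked with a quick calculation. The sketch below assumes ten players and the ball sampled at 25 Hz over roughly 2.75 hours of wall-clock capture; the actual figures vary by provider, game length, and what counts as a "data point."

"""
Back-of-the-envelope check of tracking data volumes.
Assumptions (illustrative only): 10 players and the ball sampled at 25 Hz
over ~2.75 hours of wall-clock capture.
"""

SAMPLING_RATE_HZ = 25      # optical tracking frame rate
CAPTURE_HOURS = 2.75       # approximate wall-clock length of a game broadcast
PLAYERS_ON_COURT = 10

samples_per_object = SAMPLING_RATE_HZ * CAPTURE_HOURS * 3600

player_points = samples_per_object * PLAYERS_ON_COURT  # ~2.5 million
ball_points = samples_per_object                        # ~250,000

print(f"Player position samples: {player_points:,.0f}")
print(f"Ball position samples:   {ball_points:,.0f}")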

This data enables analysis at multiple levels:

  • Frame-level: Individual positions at each moment
  • Event-level: Specific plays, shots, passes
  • Possession-level: Offensive and defensive sets
  • Game-level: Aggregate statistics and patterns
  • Season-level: Longitudinal trends and development

27.1.3 Challenges in Basketball Tracking

Basketball presents unique challenges for computer vision systems:

Occlusion: Players frequently obstruct views of other players and the ball. A single camera perspective inevitably misses key moments.

Speed: The ball can travel over 50 mph on fast passes, and players can accelerate and decelerate rapidly, requiring high frame rates to capture accurately (a short calculation below makes this concrete).

Visual Similarity: Players on the same team wear identical uniforms, making individual identification challenging without additional features like jersey numbers.

Environmental Variability: Different arenas have different lighting conditions, court colors, and camera positions, requiring systems to generalize across venues.

Three-Dimensional Inference: Cameras capture 2D projections of 3D space, requiring sophisticated algorithms to infer depth and height.
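
To make the speed challenge concrete, the short calculation below (illustrative numbers only) shows how far a 50 mph pass travels between consecutive frames at common capture rates.

"""
Approximate ball displacement between frames at different capture rates.
Illustrative only: assumes a constant 50 mph pass.
"""

BALL_SPEED_MPH = 50
FEET_PER_MILE = 5280

ball_speed_ft_per_s = BALL_SPEED_MPH * FEET_PER_MILE / 3600  # ~73 ft/s

for frame_rate in (25, 30, 60, 120):
    gap = ball_speed_ft_per_s / frame_rate
    print(f"{frame_rate:>3} fps: ball moves ~{gap:.1f} ft between frames")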


27.2 Pose Estimation for Basketball

27.2.1 Fundamentals of Pose Estimation

Pose estimation refers to the task of identifying the positions of a person's body parts (joints or keypoints) from images or video. In basketball, pose estimation enables:

  • Analysis of shooting form
  • Assessment of defensive stance
  • Evaluation of jumping mechanics
  • Detection of potential injury risk movements
  • Comparison of technique across players

A typical pose estimation system identifies 17-25 keypoints on the human body:

Keypoint Map (COCO format):
0: Nose
1: Left Eye
2: Right Eye
3: Left Ear
4: Right Ear
5: Left Shoulder
6: Right Shoulder
7: Left Elbow
8: Right Elbow
9: Left Wrist
10: Right Wrist
11: Left Hip
12: Right Hip
13: Left Knee
14: Right Knee
15: Left Ankle
16: Right Ankle

27.2.2 OpenPose

OpenPose, developed at Carnegie Mellon University, was one of the first real-time multi-person pose estimation systems. It uses a bottom-up approach that first detects all body parts in an image and then associates them with individual people.

Key Features:

  • Detects body, hand, and face keypoints
  • Works with multiple people simultaneously
  • Real-time performance on GPU hardware
  • Well-documented and widely used

Architecture Overview:

OpenPose uses a two-branch, multi-stage CNN:

  1. Confidence Maps Branch: Predicts the probability of each keypoint being present at each location
  2. Part Affinity Fields (PAFs) Branch: Predicts associations between body parts to enable multi-person parsing

Basketball Applications:

"""
Example: Basic OpenPose integration for basketball analysis
Note: Requires OpenPose installation and compatible GPU
"""

import cv2
import numpy as np
from openpose import pyopenpose as op

def setup_openpose():
    """Initialize OpenPose with basketball-appropriate settings."""
    params = {
        "model_folder": "models/",
        "model_pose": "BODY_25",  # More keypoints than COCO
        "net_resolution": "-1x368",  # Balance speed/accuracy
        "scale_number": 3,  # Multi-scale detection
        "scale_gap": 0.25
    }

    op_wrapper = op.WrapperPython()
    op_wrapper.configure(params)
    op_wrapper.start()

    return op_wrapper

def analyze_shooting_form(op_wrapper, frame):
    """
    Extract keypoints relevant to shooting form analysis.

    Returns:
        dict: Keypoint positions and calculated angles
    """
    datum = op.Datum()
    datum.cvInputData = frame
    op_wrapper.emplaceAndPop(op.VectorDatum([datum]))

    if datum.poseKeypoints is None:
        return None

    # Extract keypoints for first detected person
    keypoints = datum.poseKeypoints[0]

    # Calculate shooting arm angle (assuming right-handed)
    shoulder = keypoints[2][:2]  # Right shoulder
    elbow = keypoints[3][:2]     # Right elbow
    wrist = keypoints[4][:2]     # Right wrist

    elbow_angle = calculate_angle(shoulder, elbow, wrist)

    return {
        "keypoints": keypoints,
        "elbow_angle": elbow_angle,
        "shoulder_position": shoulder,
        "release_point": wrist
    }

def calculate_angle(p1, p2, p3):
    """Calculate angle at p2 formed by p1-p2-p3."""
    v1 = np.array(p1) - np.array(p2)
    v2 = np.array(p3) - np.array(p2)

    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.arccos(np.clip(cos_angle, -1, 1))

    return np.degrees(angle)

27.2.3 MediaPipe

Google's MediaPipe has become the preferred choice for many basketball applications due to its ease of use, cross-platform support, and efficient performance on CPU.

Advantages over OpenPose:

  • Runs efficiently without GPU
  • Simpler installation and setup
  • Built-in hand and face detection
  • Active development and support
  • Mobile device compatibility

MediaPipe Pose provides 33 landmarks:

"""
MediaPipe pose estimation for basketball analysis.
"""

import cv2
import mediapipe as mp
import numpy as np

class BasketballPoseAnalyzer:
    """Analyze basketball movements using MediaPipe pose estimation."""

    def __init__(self):
        self.mp_pose = mp.solutions.pose
        self.mp_draw = mp.solutions.drawing_utils
        self.pose = self.mp_pose.Pose(
            static_image_mode=False,
            model_complexity=2,  # 0, 1, or 2 (higher = more accurate)
            smooth_landmarks=True,
            min_detection_confidence=0.5,
            min_tracking_confidence=0.5
        )

        # Define landmark indices for basketball analysis
        self.LANDMARKS = {
            'nose': 0,
            'left_shoulder': 11,
            'right_shoulder': 12,
            'left_elbow': 13,
            'right_elbow': 14,
            'left_wrist': 15,
            'right_wrist': 16,
            'left_hip': 23,
            'right_hip': 24,
            'left_knee': 25,
            'right_knee': 26,
            'left_ankle': 27,
            'right_ankle': 28
        }

    def process_frame(self, frame):
        """
        Process a single frame and extract pose landmarks.

        Args:
            frame: BGR image from OpenCV

        Returns:
            results: MediaPipe pose results object
        """
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = self.pose.process(rgb_frame)
        return results

    def get_landmark_positions(self, results, frame_shape):
        """
        Convert normalized landmarks to pixel coordinates.

        Args:
            results: MediaPipe pose results
            frame_shape: (height, width) of the frame

        Returns:
            dict: Landmark names mapped to (x, y) pixel coordinates
        """
        if not results.pose_landmarks:
            return None

        h, w = frame_shape[:2]
        positions = {}

        for name, idx in self.LANDMARKS.items():
            landmark = results.pose_landmarks.landmark[idx]
            positions[name] = (int(landmark.x * w), int(landmark.y * h))

        return positions

    def calculate_body_angles(self, positions):
        """
        Calculate key angles for basketball movement analysis.

        Args:
            positions: dict of landmark positions

        Returns:
            dict: Calculated angles in degrees
        """
        if positions is None:
            return None

        angles = {}

        # Right elbow angle (for shooting form)
        angles['right_elbow'] = self._angle_between_points(
            positions['right_shoulder'],
            positions['right_elbow'],
            positions['right_wrist']
        )

        # Left elbow angle
        angles['left_elbow'] = self._angle_between_points(
            positions['left_shoulder'],
            positions['left_elbow'],
            positions['left_wrist']
        )

        # Right knee angle (for defensive stance, jumping)
        angles['right_knee'] = self._angle_between_points(
            positions['right_hip'],
            positions['right_knee'],
            positions['right_ankle']
        )

        # Left knee angle
        angles['left_knee'] = self._angle_between_points(
            positions['left_hip'],
            positions['left_knee'],
            positions['left_ankle']
        )

        # Hip angle (trunk flexion)
        mid_shoulder = (
            (positions['left_shoulder'][0] + positions['right_shoulder'][0]) // 2,
            (positions['left_shoulder'][1] + positions['right_shoulder'][1]) // 2
        )
        mid_hip = (
            (positions['left_hip'][0] + positions['right_hip'][0]) // 2,
            (positions['left_hip'][1] + positions['right_hip'][1]) // 2
        )
        mid_knee = (
            (positions['left_knee'][0] + positions['right_knee'][0]) // 2,
            (positions['left_knee'][1] + positions['right_knee'][1]) // 2
        )
        angles['trunk_flexion'] = self._angle_between_points(
            mid_shoulder, mid_hip, mid_knee
        )

        return angles

    def _angle_between_points(self, p1, p2, p3):
        """Calculate angle at p2 formed by line segments p1-p2 and p2-p3."""
        v1 = np.array([p1[0] - p2[0], p1[1] - p2[1]])
        v2 = np.array([p3[0] - p2[0], p3[1] - p2[1]])

        cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-6)
        angle = np.arccos(np.clip(cos_angle, -1, 1))

        return np.degrees(angle)

    def analyze_shooting_phase(self, positions, angles):
        """
        Determine the phase of a shooting motion.

        Args:
            positions: Landmark positions
            angles: Calculated body angles

        Returns:
            str: Phase name ('preparation', 'loading', 'release', 'follow_through')
        """
        if positions is None or angles is None:
            return 'unknown'

        elbow_angle = angles.get('right_elbow', 180)
        wrist_y = positions['right_wrist'][1]
        shoulder_y = positions['right_shoulder'][1]

        # Simple phase detection based on arm position
        if wrist_y > shoulder_y:  # Wrist below shoulder
            return 'preparation'
        elif elbow_angle < 90:  # Elbow bent, ball loaded
            return 'loading'
        elif elbow_angle > 150:  # Arm extended
            return 'follow_through'
        else:
            return 'release'

    def draw_skeleton(self, frame, results):
        """Draw pose skeleton on frame for visualization."""
        if results.pose_landmarks:
            self.mp_draw.draw_landmarks(
                frame,
                results.pose_landmarks,
                self.mp_pose.POSE_CONNECTIONS,
                self.mp_draw.DrawingSpec(color=(0, 255, 0), thickness=2, circle_radius=2),
                self.mp_draw.DrawingSpec(color=(0, 0, 255), thickness=2)
            )
        return frame
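
A minimal usage sketch for the class above follows. It assumes a local single-player clip (the filename is a placeholder) and simply prints the right elbow angle and detected shooting phase for each frame.

"""
Usage sketch for BasketballPoseAnalyzer (illustrative; the video path is a placeholder).
"""

import cv2

analyzer = BasketballPoseAnalyzer()
cap = cv2.VideoCapture("shot_attempt.mp4")  # hypothetical shooting clip

while True:
    ret, frame = cap.read()
    if not ret:
        break

    results = analyzer.process_frame(frame)
    positions = analyzer.get_landmark_positions(results, frame.shape)
    angles = analyzer.calculate_body_angles(positions)

    if angles is not None:
        phase = analyzer.analyze_shooting_phase(positions, angles)
        print(f"elbow={angles['right_elbow']:.0f} deg, phase={phase}")

    frame = analyzer.draw_skeleton(frame, results)
    cv2.imshow("Pose", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()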

27.2.4 Comparing Pose Estimation Systems

Feature          OpenPose          MediaPipe       DeepLabCut
Keypoints        25 (BODY_25)      33              Custom
GPU Required     Yes (efficient)   No              Yes (training)
Real-time        Yes               Yes             Yes (inference)
Multi-person     Yes               Limited         Yes
3D Pose          Limited           Yes             With calibration
Customization    Moderate          Limited         High
Best For         Research          Applications    Specialized analysis

27.3 Action Recognition in Basketball

27.3.1 The Action Recognition Task

Action recognition involves identifying what activity is occurring in a video sequence. In basketball, relevant actions include:

Player Actions:

  • Shooting (jump shot, layup, dunk, free throw)
  • Passing (chest pass, bounce pass, overhead pass)
  • Dribbling (crossover, between legs, behind back)
  • Defensive movements (slide, contest, block attempt)
  • Rebounding (box out, jump, secure)

Team Actions:

  • Pick and roll execution
  • Fast break
  • Zone defense rotation
  • Inbound plays

27.3.2 Approaches to Action Recognition

Rule-Based Systems

Early approaches relied on hand-crafted rules based on tracking data:

"""
Rule-based action detection example. The helpers find_nearest_player and
merge_nearby_detections are sketched after this listing.
"""

import numpy as np


def detect_shot_attempt(ball_positions, player_positions, hoop_position):
    """
    Detect shot attempts using trajectory analysis.

    Args:
        ball_positions: List of (x, y, z, timestamp) tuples
        player_positions: Dict mapping player_id to position list
        hoop_position: (x, y, z) coordinates of the basket

    Returns:
        list: Detected shot events with timestamps
    """
    shots = []

    # Parameters (would be tuned empirically)
    MIN_BALL_HEIGHT = 8.0  # feet - ball must reach this height
    MAX_DISTANCE_TO_HOOP = 30.0  # feet
    MIN_UPWARD_VELOCITY = 5.0  # feet/second

    for i in range(1, len(ball_positions) - 1):
        prev_pos = ball_positions[i - 1]
        curr_pos = ball_positions[i]
        next_pos = ball_positions[i + 1]

        # Calculate vertical velocity
        dt = curr_pos[3] - prev_pos[3]
        if dt <= 0:
            continue

        vertical_velocity = (curr_pos[2] - prev_pos[2]) / dt

        # Check if ball is moving upward with sufficient velocity
        if vertical_velocity < MIN_UPWARD_VELOCITY:
            continue

        # Check if ball will reach minimum height
        if curr_pos[2] < MIN_BALL_HEIGHT:
            continue

        # Check distance to hoop
        distance_to_hoop = np.sqrt(
            (curr_pos[0] - hoop_position[0])**2 +
            (curr_pos[1] - hoop_position[1])**2
        )

        if distance_to_hoop > MAX_DISTANCE_TO_HOOP:
            continue

        # Find nearest player (likely shooter)
        shooter_id = find_nearest_player(curr_pos, player_positions, curr_pos[3])

        shots.append({
            'timestamp': curr_pos[3],
            'position': curr_pos[:3],
            'shooter_id': shooter_id,
            'distance_to_hoop': distance_to_hoop
        })

    return merge_nearby_detections(shots)
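
The example above assumes two helpers. The sketches below are one possible implementation, assuming player positions are stored as (x, y, z, timestamp) tuples like the ball positions; a production system would interpolate positions and use more careful event grouping.

"""
Illustrative implementations of the helpers assumed above.
Player positions are assumed to be (x, y, z, timestamp) tuples.
"""

import numpy as np

def find_nearest_player(ball_pos, player_positions, timestamp, max_time_diff=0.1):
    """Return the player_id whose sample nearest in time is closest to the ball."""
    nearest_id, nearest_dist = None, float('inf')

    for player_id, positions in player_positions.items():
        # Samples recorded close in time to the ball observation
        candidates = [p for p in positions if abs(p[3] - timestamp) <= max_time_diff]
        if not candidates:
            continue
        pos = min(candidates, key=lambda p: abs(p[3] - timestamp))
        dist = np.sqrt((pos[0] - ball_pos[0])**2 + (pos[1] - ball_pos[1])**2)
        if dist < nearest_dist:
            nearest_id, nearest_dist = player_id, dist

    return nearest_id

def merge_nearby_detections(shots, min_gap_seconds=1.0):
    """Collapse bursts of detections within min_gap_seconds into one shot event."""
    merged = []
    for shot in sorted(shots, key=lambda s: s['timestamp']):
        if merged and shot['timestamp'] - merged[-1]['timestamp'] < min_gap_seconds:
            continue  # part of the same attempt already recorded
        merged.append(shot)
    return merged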

Deep Learning Approaches

Modern action recognition typically uses deep neural networks trained on large datasets:

Two-Stream Networks: Process RGB frames and optical flow separately, then fuse predictions

3D Convolutional Networks (C3D, I3D): Apply 3D convolutions to capture spatiotemporal patterns

Transformer-Based Models: Use attention mechanisms to focus on relevant frames and spatial regions

"""
Simplified action recognition using pre-trained models.
Note: This example uses PyTorch and a pre-trained model.
"""

import cv2
import torch
import torchvision.transforms as transforms
from torchvision.models.video import r3d_18

class BasketballActionRecognizer:
    """Recognize basketball actions using a pre-trained video model."""

    def __init__(self, model_path=None):
        # Load pre-trained model (would fine-tune on basketball data)
        self.model = r3d_18(pretrained=True)

        # Replace final layer for basketball actions
        self.action_classes = [
            'jump_shot', 'layup', 'dunk', 'free_throw',
            'pass', 'dribble', 'rebound', 'block',
            'screen', 'cut', 'other'
        ]

        num_classes = len(self.action_classes)
        self.model.fc = torch.nn.Linear(self.model.fc.in_features, num_classes)

        if model_path:
            self.model.load_state_dict(torch.load(model_path))

        self.model.eval()

        self.transform = transforms.Compose([
            transforms.ToPILImage(),
            transforms.Resize((112, 112)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.43216, 0.394666, 0.37645],
                               std=[0.22803, 0.22145, 0.216989])
        ])

    def preprocess_clip(self, frames):
        """
        Preprocess a clip of frames for the model.

        Args:
            frames: List of BGR frames from OpenCV

        Returns:
            torch.Tensor: Preprocessed clip tensor
        """
        processed = []
        for frame in frames:
            rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            processed.append(self.transform(rgb_frame))

        # Stack frames: (T, C, H, W) -> (C, T, H, W)
        clip = torch.stack(processed).permute(1, 0, 2, 3)
        return clip.unsqueeze(0)  # Add batch dimension

    def predict(self, frames):
        """
        Predict action from a clip of frames.

        Args:
            frames: List of 16+ BGR frames

        Returns:
            tuple: (predicted_action, confidence, all_probabilities)
        """
        with torch.no_grad():
            clip = self.preprocess_clip(frames[:16])  # Model expects 16 frames
            outputs = self.model(clip)
            probabilities = torch.nn.functional.softmax(outputs, dim=1)

            confidence, predicted = torch.max(probabilities, 1)

            return (
                self.action_classes[predicted.item()],
                confidence.item(),
                {self.action_classes[i]: probabilities[0][i].item()
                 for i in range(len(self.action_classes))}
            )

27.3.3 Temporal Action Detection

Beyond classifying isolated clips, temporal action detection identifies when actions occur within longer videos:

"""
Temporal action detection with sliding window approach.
"""

class TemporalActionDetector:
    """Detect and localize actions in continuous video."""

    def __init__(self, recognizer, window_size=16, stride=4, threshold=0.7):
        self.recognizer = recognizer
        self.window_size = window_size
        self.stride = stride
        self.threshold = threshold

    def detect_actions(self, video_path):
        """
        Detect all actions in a video with their temporal locations.

        Args:
            video_path: Path to video file

        Returns:
            list: Detected actions with start/end times
        """
        cap = cv2.VideoCapture(video_path)
        fps = cap.get(cv2.CAP_PROP_FPS)

        frames = []
        frame_idx = 0
        detections = []

        while True:
            ret, frame = cap.read()
            if not ret:
                break

            frames.append(frame)

            # Process when we have enough frames
            if len(frames) >= self.window_size:
                action, confidence, probs = self.recognizer.predict(
                    frames[-self.window_size:]
                )

                if confidence >= self.threshold and action != 'other':
                    start_frame = frame_idx - self.window_size + 1
                    detections.append({
                        'action': action,
                        'confidence': confidence,
                        'start_frame': start_frame,
                        'end_frame': frame_idx,
                        'start_time': start_frame / fps,
                        'end_time': frame_idx / fps
                    })

            frame_idx += 1

            # Slide window
            if len(frames) > self.window_size:
                frames = frames[self.stride:]

        cap.release()

        # Merge overlapping detections of same action
        return self._merge_detections(detections)

    def _merge_detections(self, detections):
        """Merge overlapping detections of the same action type."""
        if not detections:
            return []

        # Sort by start time
        detections.sort(key=lambda x: x['start_time'])

        merged = [detections[0]]

        for det in detections[1:]:
            last = merged[-1]

            # Check if same action and overlapping
            if (det['action'] == last['action'] and
                det['start_time'] <= last['end_time'] + 0.5):
                # Extend the previous detection
                last['end_frame'] = max(last['end_frame'], det['end_frame'])
                last['end_time'] = max(last['end_time'], det['end_time'])
                last['confidence'] = max(last['confidence'], det['confidence'])
            else:
                merged.append(det)

        return merged
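
A brief usage sketch follows; the file path is a placeholder and, in practice, the recognizer would load a fine-tuned checkpoint.

"""
Usage sketch: detect and print actions in a game clip (paths are placeholders).
"""

recognizer = BasketballActionRecognizer(model_path=None)  # ideally a fine-tuned checkpoint
detector = TemporalActionDetector(recognizer, window_size=16, stride=4, threshold=0.7)

events = detector.detect_actions("quarter1.mp4")
for event in events:
    print(f"{event['action']:<12} {event['start_time']:6.1f}s - "
          f"{event['end_time']:6.1f}s (conf {event['confidence']:.2f})")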

27.4 Automated Play Classification

27.4.1 Understanding Basketball Plays

Basketball plays are coordinated sequences of player movements designed to create scoring opportunities. Automated play classification enables:

  • Scouts to quickly identify opponent tendencies
  • Coaches to review play execution efficiency
  • Analysts to quantify strategic patterns across games

27.4.2 Feature Extraction for Play Classification

Plays can be represented using various features extracted from tracking data:

"""
Feature extraction for play classification.
"""

import numpy as np
from scipy.spatial.distance import cdist

class PlayFeatureExtractor:
    """Extract features from player tracking data for play classification."""

    # Court dimensions (NBA)
    COURT_LENGTH = 94.0  # feet
    COURT_WIDTH = 50.0   # feet
    THREE_POINT_DISTANCE = 23.75  # feet (corner is 22 ft)

    def __init__(self):
        self.hoop_positions = {
            'left': np.array([5.25, 25.0]),
            'right': np.array([88.75, 25.0])
        }

    def extract_possession_features(self, tracking_data, offensive_team):
        """
        Extract features from a single possession.

        Args:
            tracking_data: DataFrame with columns [frame, player_id, team, x, y]
            offensive_team: Team identifier for offense

        Returns:
            dict: Feature dictionary
        """
        features = {}

        # Separate offensive and defensive players
        offense = tracking_data[tracking_data['team'] == offensive_team]
        defense = tracking_data[tracking_data['team'] != offensive_team]

        # Determine which hoop the offense is attacking: over a possession,
        # the offense spends most of its time in the half containing its target basket
        avg_x = offense.groupby('frame')['x'].mean().mean()
        target_hoop = 'left' if avg_x < self.COURT_LENGTH / 2 else 'right'
        hoop_pos = self.hoop_positions[target_hoop]

        # Spatial features
        features.update(self._spatial_features(offense, hoop_pos))

        # Movement features
        features.update(self._movement_features(offense))

        # Spacing features
        features.update(self._spacing_features(offense, defense))

        # Temporal features
        features.update(self._temporal_features(offense))

        return features

    def _spatial_features(self, offense, hoop_pos):
        """Calculate spatial distribution features."""
        features = {}

        # Average distance to hoop over possession
        offense_copy = offense.copy()
        offense_copy['dist_to_hoop'] = np.sqrt(
            (offense_copy['x'] - hoop_pos[0])**2 +
            (offense_copy['y'] - hoop_pos[1])**2
        )

        features['avg_dist_to_hoop'] = offense_copy['dist_to_hoop'].mean()
        features['min_dist_to_hoop'] = offense_copy['dist_to_hoop'].min()

        # Court region distribution
        features['pct_in_paint'] = (offense_copy['dist_to_hoop'] < 8).mean()
        features['pct_at_three'] = (offense_copy['dist_to_hoop'] > self.THREE_POINT_DISTANCE).mean()

        # Side distribution (left/right of court)
        features['pct_left_side'] = (offense_copy['y'] < self.COURT_WIDTH / 2).mean()

        return features

    def _movement_features(self, offense):
        """Calculate movement and velocity features."""
        features = {}

        # Per-frame displacements for each player (feet per frame);
        # multiply by the frame rate to convert to feet per second
        velocities = []
        for player_id in offense['player_id'].unique():
            player_data = offense[offense['player_id'] == player_id].sort_values('frame')

            if len(player_data) < 2:
                continue

            dx = player_data['x'].diff()
            dy = player_data['y'].diff()
            speed = np.sqrt(dx**2 + dy**2)
            velocities.extend(speed.dropna().tolist())

        if velocities:
            features['avg_speed'] = np.mean(velocities)
            features['max_speed'] = np.max(velocities)
            features['speed_variance'] = np.var(velocities)
        else:
            features['avg_speed'] = 0
            features['max_speed'] = 0
            features['speed_variance'] = 0

        # Total distance covered
        features['total_distance'] = sum(velocities)

        return features

    def _spacing_features(self, offense, defense):
        """Calculate spacing and separation features."""
        features = {}

        # Get positions at each frame
        frames = offense['frame'].unique()

        off_spreads = []
        def_separations = []

        for frame in frames:
            off_frame = offense[offense['frame'] == frame][['x', 'y']].values
            def_frame = defense[defense['frame'] == frame][['x', 'y']].values

            if len(off_frame) >= 2:
                # Offensive spread (average pairwise distance)
                off_dists = cdist(off_frame, off_frame)
                np.fill_diagonal(off_dists, np.nan)
                off_spreads.append(np.nanmean(off_dists))

            if len(off_frame) > 0 and len(def_frame) > 0:
                # Closest defender distance for each offensive player
                separations = cdist(off_frame, def_frame).min(axis=1)
                def_separations.extend(separations)

        features['avg_offensive_spread'] = np.mean(off_spreads) if off_spreads else 0
        features['avg_defender_distance'] = np.mean(def_separations) if def_separations else 0
        features['min_defender_distance'] = np.min(def_separations) if def_separations else 0

        return features

    def _temporal_features(self, offense):
        """Calculate time-based features."""
        features = {}

        frames = sorted(offense['frame'].unique())
        features['possession_length'] = len(frames)

        # Assuming 25 fps
        features['possession_duration'] = len(frames) / 25.0

        return features


class PlayClassifier:
    """Classify basketball plays from extracted features."""

    def __init__(self):
        self.play_types = [
            'pick_and_roll',
            'isolation',
            'post_up',
            'spot_up',
            'transition',
            'off_screen',
            'handoff',
            'cut',
            'putback',
            'miscellaneous'
        ]

        # In practice, this would be a trained model
        self.model = None

    def train(self, features_list, labels):
        """
        Train the classifier on labeled possessions.

        Args:
            features_list: List of feature dictionaries
            labels: List of play type labels
        """
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.preprocessing import StandardScaler

        # Convert to arrays
        X = self._features_to_array(features_list)
        y = np.array([self.play_types.index(l) for l in labels])

        self.scaler = StandardScaler()
        X_scaled = self.scaler.fit_transform(X)

        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(X_scaled, y)

    def predict(self, features):
        """
        Predict play type from features.

        Args:
            features: Feature dictionary for a possession

        Returns:
            tuple: (play_type, confidence)
        """
        X = self._features_to_array([features])
        X_scaled = self.scaler.transform(X)

        proba = self.model.predict_proba(X_scaled)[0]
        predicted_idx = np.argmax(proba)

        return self.play_types[predicted_idx], proba[predicted_idx]

    def _features_to_array(self, features_list):
        """Convert list of feature dicts to numpy array."""
        feature_names = sorted(features_list[0].keys())
        return np.array([[f[name] for name in feature_names] for f in features_list])
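
Tying the two classes together, the sketch below assumes a hypothetical labeled dataset, possessions, holding (tracking_dataframe, offensive_team, play_label) tuples; how that dataset is built is not prescribed here.

"""
End-to-end sketch: train on labeled possessions, then classify a new one.
`possessions` and `new_tracking_df` are hypothetical inputs prepared elsewhere.
"""

extractor = PlayFeatureExtractor()
classifier = PlayClassifier()

# Hypothetical labeled data: (tracking_dataframe, offensive_team, play_label) tuples
features_list = [extractor.extract_possession_features(df, team)
                 for df, team, _ in possessions]
labels = [label for _, _, label in possessions]

classifier.train(features_list, labels)

# Classify a new possession for the (hypothetical) 'HOME' team
new_features = extractor.extract_possession_features(new_tracking_df, 'HOME')
play_type, confidence = classifier.predict(new_features)
print(f"Predicted play: {play_type} ({confidence:.0%})")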

27.5 Ball and Player Detection

27.5.1 Object Detection Fundamentals

Object detection involves both locating objects within an image (localization) and classifying what each object is. For basketball, primary detection targets include:

  • Players (with team/jersey identification)
  • The basketball
  • The court and its markings
  • The hoop and backboard
  • Referees

27.5.2 Modern Detection Architectures

YOLO (You Only Look Once) family models are popular for real-time detection:

"""
Basketball object detection using YOLOv8.
"""

from ultralytics import YOLO
import cv2
import numpy as np

class BasketballDetector:
    """Detect players and ball in basketball footage."""

    def __init__(self, model_path='yolov8n.pt'):
        """
        Initialize detector with YOLO model.

        Args:
            model_path: Path to YOLO weights (use custom trained for best results)
        """
        self.model = YOLO(model_path)

        # Class mappings for a basketball-trained model
        self.class_names = {
            0: 'player',
            1: 'ball',
            2: 'referee',
            3: 'hoop'
        }

        # For standard COCO model, basketball is class 32
        self.coco_person_class = 0
        self.coco_ball_class = 32  # sports ball

    def detect(self, frame, conf_threshold=0.5):
        """
        Detect objects in a frame.

        Args:
            frame: BGR image
            conf_threshold: Minimum confidence threshold

        Returns:
            list: Detection dictionaries with bbox, class, confidence
        """
        results = self.model(frame, conf=conf_threshold, verbose=False)

        detections = []
        for result in results:
            boxes = result.boxes
            for box in boxes:
                x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
                conf = box.conf[0].cpu().numpy()
                cls = int(box.cls[0].cpu().numpy())

                detections.append({
                    'bbox': [int(x1), int(y1), int(x2), int(y2)],
                    'confidence': float(conf),
                    'class_id': cls,
                    'class_name': self.model.names[cls]
                })

        return detections

    def detect_and_track(self, video_path, output_path=None):
        """
        Detect and track objects through a video.

        Args:
            video_path: Input video path
            output_path: Optional output video path

        Returns:
            dict: Tracking results with trajectories
        """
        cap = cv2.VideoCapture(video_path)
        fps = cap.get(cv2.CAP_PROP_FPS)
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

        if output_path:
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

        trajectories = {'players': {}, 'ball': []}
        frame_idx = 0

        while True:
            ret, frame = cap.read()
            if not ret:
                break

            # Run tracking (uses ByteTrack internally)
            results = self.model.track(frame, persist=True, verbose=False)

            for result in results:
                boxes = result.boxes
                if boxes.id is None:
                    continue

                for box, track_id in zip(boxes, boxes.id):
                    x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
                    cls = int(box.cls[0].cpu().numpy())
                    track_id = int(track_id.cpu().numpy())

                    center = ((x1 + x2) / 2, (y1 + y2) / 2)

                    # Store trajectory
                    if self.model.names[cls] == 'person':
                        if track_id not in trajectories['players']:
                            trajectories['players'][track_id] = []
                        trajectories['players'][track_id].append({
                            'frame': frame_idx,
                            'bbox': [x1, y1, x2, y2],
                            'center': center
                        })
                    elif 'ball' in self.model.names[cls].lower():
                        trajectories['ball'].append({
                            'frame': frame_idx,
                            'bbox': [x1, y1, x2, y2],
                            'center': center
                        })

                    # Draw on frame
                    cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)),
                                (0, 255, 0), 2)
                    cv2.putText(frame, f'ID:{track_id}', (int(x1), int(y1)-10),
                               cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

            if output_path:
                out.write(frame)

            frame_idx += 1

        cap.release()
        if output_path:
            out.release()

        return trajectories

27.5.3 Ball Detection Challenges

The basketball presents unique detection challenges:

  1. Small Size: The ball occupies a tiny portion of wide-angle footage
  2. Motion Blur: Fast-moving ball is blurred in standard frame rates
  3. Occlusion: Players frequently occlude the ball
  4. Color Similarity: Ball color may match court or uniform elements
  5. Deformation: Ball shape appears elliptical during fast motion

Specialized Ball Detection Strategies:

"""
Specialized basketball detection using color and shape analysis.
"""

class BasketballBallDetector:
    """Detect basketball using color segmentation and shape analysis."""

    def __init__(self):
        # Basketball color range in HSV (orange/brown)
        self.lower_orange = np.array([5, 100, 100])
        self.upper_orange = np.array([25, 255, 255])

        # Expected ball radius range (pixels, depends on camera)
        self.min_radius = 10
        self.max_radius = 50

    def detect_by_color(self, frame):
        """
        Detect basketball using color segmentation.

        Args:
            frame: BGR image

        Returns:
            list: Ball candidates as (x, y, radius, circularity) tuples
        """
        # Convert to HSV
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

        # Create mask for basketball color
        mask = cv2.inRange(hsv, self.lower_orange, self.upper_orange)

        # Morphological operations to clean up mask
        kernel = np.ones((5, 5), np.uint8)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

        # Find contours
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)

        candidates = []
        for contour in contours:
            # Fit minimum enclosing circle
            (x, y), radius = cv2.minEnclosingCircle(contour)

            # Check radius constraints
            if radius < self.min_radius or radius > self.max_radius:
                continue

            # Check circularity
            area = cv2.contourArea(contour)
            expected_area = np.pi * radius * radius
            circularity = area / expected_area if expected_area > 0 else 0

            if circularity > 0.6:  # Reasonably circular
                candidates.append((int(x), int(y), int(radius), circularity))

        return candidates

    def detect_by_motion(self, prev_frame, curr_frame, next_frame):
        """
        Detect ball using motion analysis between frames.

        The ball typically shows consistent motion distinct from players.
        """
        # Convert to grayscale
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
        curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
        next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

        # Calculate frame differences
        diff1 = cv2.absdiff(prev_gray, curr_gray)
        diff2 = cv2.absdiff(curr_gray, next_gray)

        # Areas with motion in both differences likely contain moving objects
        motion_mask = cv2.bitwise_and(diff1, diff2)

        # Threshold
        _, motion_mask = cv2.threshold(motion_mask, 30, 255, cv2.THRESH_BINARY)

        # Combine with color detection for better accuracy
        color_candidates = self.detect_by_color(curr_frame)

        # Filter candidates by motion
        validated = []
        for x, y, radius, circ in color_candidates:
            # Check if candidate region shows motion
            region = motion_mask[max(0, y-radius):y+radius,
                                max(0, x-radius):x+radius]
            if region.size > 0 and region.mean() > 50:
                validated.append((x, y, radius, circ))

        return validated

27.5.4 Player Identification

Beyond detecting players, identifying individuals requires additional techniques:

Jersey Number Recognition:

"""
Jersey number recognition for player identification.
"""

class JerseyNumberRecognizer:
    """Recognize jersey numbers from player detections."""

    def __init__(self):
        # Load OCR model (using EasyOCR as example)
        import easyocr
        self.reader = easyocr.Reader(['en'])

    def extract_jersey_region(self, frame, player_bbox):
        """
        Extract the jersey region from a player bounding box.

        Args:
            frame: Full frame image
            player_bbox: [x1, y1, x2, y2] bounding box

        Returns:
            numpy.ndarray: Cropped jersey region
        """
        x1, y1, x2, y2 = [int(c) for c in player_bbox]

        # Jersey number typically in upper-middle portion
        width = x2 - x1
        height = y2 - y1

        jersey_x1 = x1 + int(width * 0.2)
        jersey_x2 = x2 - int(width * 0.2)
        jersey_y1 = y1 + int(height * 0.15)
        jersey_y2 = y1 + int(height * 0.45)

        return frame[jersey_y1:jersey_y2, jersey_x1:jersey_x2]

    def recognize_number(self, jersey_image):
        """
        Recognize the jersey number from cropped image.

        Args:
            jersey_image: Cropped jersey region

        Returns:
            tuple: (number_string, confidence) or (None, 0)
        """
        if jersey_image.size == 0:
            return None, 0

        # Preprocess for better OCR
        gray = cv2.cvtColor(jersey_image, cv2.COLOR_BGR2GRAY)

        # Enhance contrast
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        enhanced = clahe.apply(gray)

        # Run OCR
        results = self.reader.readtext(enhanced, allowlist='0123456789')

        if not results:
            return None, 0

        # Get highest confidence result
        best_result = max(results, key=lambda x: x[2])
        text, confidence = best_result[1], best_result[2]

        # Validate as jersey number (typically 0-99)
        try:
            number = int(text)
            if 0 <= number <= 99:
                return str(number), confidence
        except ValueError:
            pass

        return None, 0
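
Combining the detector from Section 27.5.2 with the recognizer above might look like the following sketch (the image path is a placeholder).

"""
Usage sketch: detect players in a still frame and read their jersey numbers.
"""

import cv2

detector = BasketballDetector()
recognizer = JerseyNumberRecognizer()

frame = cv2.imread("broadcast_frame.jpg")  # hypothetical still frame

for det in detector.detect(frame):
    if det['class_name'] != 'person':
        continue
    jersey = recognizer.extract_jersey_region(frame, det['bbox'])
    number, conf = recognizer.recognize_number(jersey)
    if number is not None:
        print(f"Player #{number} at {det['bbox']} (OCR confidence {conf:.2f})")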

27.6 Camera Calibration for Court Mapping

27.6.1 The Importance of Calibration

Camera calibration is essential for converting pixel coordinates from video to real-world court coordinates. This enables:

  • Accurate distance and speed calculations
  • Consistent analysis across different camera angles
  • Integration of multi-camera footage
  • Overlay of analytics visualizations on video

27.6.2 Homography Estimation

A homography is a transformation that maps points from one plane to another. For basketball, we map from the image plane to the court plane:
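
In homogeneous coordinates this mapping is a single 3x3 matrix H (the standard projective relation, sketched here rather than any library-specific form):

$$
\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix}
= H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix},
\qquad
(x_{\mathrm{court}},\ y_{\mathrm{court}}) = \left(\frac{x'}{w'},\ \frac{y'}{w'}\right)
$$

where (u, v) are pixel coordinates and H is defined up to scale. Because H has eight degrees of freedom, at least four point correspondences with no three collinear are required to estimate it, which is why the calibrator below insists on a minimum of four points.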

"""
Court calibration using homography estimation.
"""

class CourtCalibrator:
    """Calibrate camera view to basketball court coordinates."""

    # NBA court dimensions in feet
    COURT_LENGTH = 94.0
    COURT_WIDTH = 50.0

    # Key court points (in feet from bottom-left corner)
    COURT_POINTS = {
        'center_court': (47.0, 25.0),
        'left_free_throw': (19.0, 25.0),
        'right_free_throw': (75.0, 25.0),
        'left_three_corner_bottom': (5.25, 3.0),
        'left_three_corner_top': (5.25, 47.0),
        'right_three_corner_bottom': (88.75, 3.0),
        'right_three_corner_top': (88.75, 47.0),
        'left_basket': (5.25, 25.0),
        'right_basket': (88.75, 25.0),
        'half_court_bottom': (47.0, 0.0),
        'half_court_top': (47.0, 50.0)
    }

    def __init__(self):
        self.homography_matrix = None
        self.inverse_homography = None

    def calibrate_from_points(self, image_points, court_point_names):
        """
        Calculate homography from corresponding point pairs.

        Args:
            image_points: List of (x, y) pixel coordinates
            court_point_names: List of court point names (must match COURT_POINTS keys)

        Returns:
            bool: True if calibration successful
        """
        if len(image_points) < 4:
            raise ValueError("Need at least 4 point correspondences")

        # Get court coordinates
        court_points = [self.COURT_POINTS[name] for name in court_point_names]

        # Convert to numpy arrays
        src_pts = np.array(image_points, dtype=np.float32)
        dst_pts = np.array(court_points, dtype=np.float32)

        # Calculate homography
        self.homography_matrix, mask = cv2.findHomography(src_pts, dst_pts,
                                                          cv2.RANSAC, 5.0)

        if self.homography_matrix is None:
            return False

        # Calculate inverse for court-to-image mapping
        self.inverse_homography = np.linalg.inv(self.homography_matrix)

        return True

    def calibrate_interactive(self, frame):
        """
        Interactive calibration by clicking on court points.

        Args:
            frame: Video frame showing the court
        """
        print("Click on the following court points in order:")
        points_needed = ['center_court', 'left_free_throw', 'right_free_throw',
                        'half_court_bottom', 'half_court_top']

        clicked_points = []

        def mouse_callback(event, x, y, flags, param):
            if event == cv2.EVENT_LBUTTONDOWN:
                clicked_points.append((x, y))
                print(f"Point {len(clicked_points)}: ({x}, {y})")

        cv2.namedWindow('Calibration')
        cv2.setMouseCallback('Calibration', mouse_callback)

        display_frame = frame.copy()

        for i, point_name in enumerate(points_needed):
            print(f"\nClick on: {point_name}")

            while len(clicked_points) <= i:
                cv2.imshow('Calibration', display_frame)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    cv2.destroyAllWindows()
                    return False

            # Draw clicked point
            cv2.circle(display_frame, clicked_points[-1], 5, (0, 255, 0), -1)
            cv2.putText(display_frame, point_name,
                       (clicked_points[-1][0] + 10, clicked_points[-1][1]),
                       cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

        cv2.destroyAllWindows()

        return self.calibrate_from_points(clicked_points, points_needed)

    def image_to_court(self, pixel_coords):
        """
        Convert pixel coordinates to court coordinates.

        Args:
            pixel_coords: (x, y) or array of pixel coordinates

        Returns:
            Court coordinates in feet
        """
        if self.homography_matrix is None:
            raise ValueError("Calibration required first")

        pts = np.array(pixel_coords, dtype=np.float32)
        if pts.ndim == 1:
            pts = pts.reshape(1, 1, 2)
        elif pts.ndim == 2:
            pts = pts.reshape(-1, 1, 2)

        transformed = cv2.perspectiveTransform(pts, self.homography_matrix)

        return transformed.reshape(-1, 2)

    def court_to_image(self, court_coords):
        """
        Convert court coordinates to pixel coordinates.

        Args:
            court_coords: (x, y) or array of court coordinates in feet

        Returns:
            Pixel coordinates
        """
        if self.inverse_homography is None:
            raise ValueError("Calibration required first")

        pts = np.array(court_coords, dtype=np.float32)
        if pts.ndim == 1:
            pts = pts.reshape(1, 1, 2)
        elif pts.ndim == 2:
            pts = pts.reshape(-1, 1, 2)

        transformed = cv2.perspectiveTransform(pts, self.inverse_homography)

        return transformed.reshape(-1, 2)

    def draw_court_overlay(self, frame, color=(0, 255, 0), thickness=2):
        """
        Draw court lines on video frame using calibration.

        Args:
            frame: Video frame
            color: Line color (BGR)
            thickness: Line thickness

        Returns:
            Frame with court overlay
        """
        if self.inverse_homography is None:
            return frame

        overlay = frame.copy()

        # Court boundary
        boundary = np.array([
            [0, 0], [self.COURT_LENGTH, 0],
            [self.COURT_LENGTH, self.COURT_WIDTH], [0, self.COURT_WIDTH]
        ])
        boundary_px = self.court_to_image(boundary).astype(np.int32)
        cv2.polylines(overlay, [boundary_px], True, color, thickness)

        # Center line
        center_line = np.array([
            [self.COURT_LENGTH/2, 0],
            [self.COURT_LENGTH/2, self.COURT_WIDTH]
        ])
        center_px = self.court_to_image(center_line).astype(np.int32)
        cv2.polylines(overlay, [center_px], False, color, thickness)

        # Center circle (approximate with polygon)
        angles = np.linspace(0, 2*np.pi, 36)
        center_circle = np.array([
            [self.COURT_LENGTH/2 + 6*np.cos(a),
             self.COURT_WIDTH/2 + 6*np.sin(a)]
            for a in angles
        ])
        circle_px = self.court_to_image(center_circle).astype(np.int32)
        cv2.polylines(overlay, [circle_px], True, color, thickness)

        # Three-point lines (simplified)
        # Left arc
        left_arc = self._generate_three_point_arc(5.25)
        left_arc_px = self.court_to_image(left_arc).astype(np.int32)
        cv2.polylines(overlay, [left_arc_px], False, color, thickness)

        # Right arc
        right_arc = self._generate_three_point_arc(88.75)
        right_arc_px = self.court_to_image(right_arc).astype(np.int32)
        cv2.polylines(overlay, [right_arc_px], False, color, thickness)

        return overlay

    def _generate_three_point_arc(self, basket_x):
        """Generate points along three-point arc."""
        # NBA three-point distance: 23.75 feet (22 in corners)
        center_y = self.COURT_WIDTH / 2
        radius = 23.75

        # Arc only, not corner sections
        angles = np.linspace(-np.pi/2 + 0.4, np.pi/2 - 0.4, 30)

        if basket_x < self.COURT_LENGTH / 2:
            # Left basket
            arc = [[basket_x + radius * np.cos(a), center_y + radius * np.sin(a)]
                   for a in angles]
        else:
            # Right basket
            arc = [[basket_x - radius * np.cos(a), center_y + radius * np.sin(a)]
                   for a in angles]

        return np.array(arc)
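
A usage sketch follows; the pixel coordinates are made-up values standing in for landmarks clicked on a broadcast frame (or produced by automatic line detection).

"""
Usage sketch for CourtCalibrator (illustrative pixel coordinates).
"""

calibrator = CourtCalibrator()

# Four non-collinear landmarks visible in a hypothetical camera view
image_points = [(330, 460), (950, 460), (640, 620), (640, 360)]
court_point_names = ['left_free_throw', 'right_free_throw',
                     'half_court_bottom', 'half_court_top']

if calibrator.calibrate_from_points(image_points, court_point_names):
    # Convert a detected player's foot position (in pixels) to court feet
    court_xy = calibrator.image_to_court((700, 500))
    print(f"Court position (feet): {court_xy[0]}")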

27.6.3 Automatic Court Detection

Advanced systems can automatically detect court lines to estimate calibration:

"""
Automatic court line detection for calibration.
"""

class AutomaticCourtDetector:
    """Automatically detect basketball court lines."""

    def __init__(self):
        self.court_color_lower = np.array([0, 0, 180])  # Adjust for court color
        self.court_color_upper = np.array([180, 50, 255])

    def detect_court_lines(self, frame):
        """
        Detect court lines using edge detection and Hough transform.

        Args:
            frame: Video frame

        Returns:
            list: Detected lines as [(x1, y1, x2, y2), ...]
        """
        # Convert to grayscale
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Edge detection
        edges = cv2.Canny(gray, 50, 150, apertureSize=3)

        # Hough line detection
        lines = cv2.HoughLinesP(edges, 1, np.pi/180, threshold=100,
                                minLineLength=100, maxLineGap=10)

        if lines is None:
            return []

        detected_lines = []
        for line in lines:
            x1, y1, x2, y2 = line[0]
            detected_lines.append((x1, y1, x2, y2))

        return detected_lines

    def find_court_corners(self, lines):
        """
        Find court corners from detected lines.

        Args:
            lines: List of detected lines

        Returns:
            list: Corner points
        """
        # Separate horizontal and vertical lines
        horizontal = []
        vertical = []

        for x1, y1, x2, y2 in lines:
            angle = np.arctan2(y2 - y1, x2 - x1)
            if abs(angle) < np.pi/6:  # Nearly horizontal
                horizontal.append((x1, y1, x2, y2))
            elif abs(angle) > np.pi/3:  # Nearly vertical
                vertical.append((x1, y1, x2, y2))

        # Find intersections
        corners = []
        for h_line in horizontal:
            for v_line in vertical:
                intersection = self._line_intersection(h_line, v_line)
                if intersection:
                    corners.append(intersection)

        return corners

    def _line_intersection(self, line1, line2):
        """Calculate intersection point of two lines."""
        x1, y1, x2, y2 = line1
        x3, y3, x4, y4 = line2

        denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        if abs(denom) < 1e-6:
            return None

        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom

        x = x1 + t * (x2 - x1)
        y = y1 + t * (y2 - y1)

        return (x, y)

27.7 Video Analysis Workflows

27.7.1 End-to-End Processing Pipeline

A complete basketball video analysis pipeline integrates multiple components:

"""
Complete video analysis pipeline for basketball.
"""

class BasketballVideoAnalyzer:
    """End-to-end basketball video analysis pipeline."""

    def __init__(self, config=None):
        """
        Initialize analysis pipeline.

        Args:
            config: Configuration dictionary
        """
        self.config = config or {}

        # Initialize components
        self.detector = BasketballDetector()
        self.pose_analyzer = BasketballPoseAnalyzer()
        self.court_calibrator = CourtCalibrator()
        self.action_recognizer = BasketballActionRecognizer()

        # Storage for analysis results
        self.results = {
            'frames': [],
            'detections': [],
            'poses': [],
            'actions': [],
            'tracking': {'players': {}, 'ball': []}
        }

    def analyze_video(self, video_path, output_path=None,
                      calibration_frame=None):
        """
        Run complete analysis on a video.

        Args:
            video_path: Input video path
            output_path: Optional output video path
            calibration_frame: Frame number for calibration (or None for first)

        Returns:
            dict: Complete analysis results
        """
        cap = cv2.VideoCapture(video_path)
        fps = cap.get(cv2.CAP_PROP_FPS)
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

        print(f"Processing video: {total_frames} frames at {fps} fps")

        if output_path:
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

        # Calibration
        if calibration_frame is not None:
            cap.set(cv2.CAP_PROP_POS_FRAMES, calibration_frame)
            ret, frame = cap.read()
            if ret:
                self.court_calibrator.calibrate_interactive(frame)
            cap.set(cv2.CAP_PROP_POS_FRAMES, 0)

        frame_buffer = []
        frame_idx = 0

        while True:
            ret, frame = cap.read()
            if not ret:
                break

            # Store frame info
            frame_data = {
                'index': frame_idx,
                'timestamp': frame_idx / fps
            }

            # Object detection
            detections = self.detector.detect(frame)
            self.results['detections'].append(detections)

            # Pose estimation for each detected person
            frame_poses = []
            for det in detections:
                if det['class_name'] == 'person':
                    x1, y1, x2, y2 = det['bbox']
                    person_crop = frame[y1:y2, x1:x2]

                    if person_crop.size > 0:
                        pose_results = self.pose_analyzer.process_frame(person_crop)
                        if pose_results and pose_results.pose_landmarks:
                            positions = self.pose_analyzer.get_landmark_positions(
                                pose_results, person_crop.shape
                            )
                            angles = self.pose_analyzer.calculate_body_angles(positions)
                            frame_poses.append({
                                'bbox': det['bbox'],
                                'positions': positions,
                                'angles': angles
                            })

            self.results['poses'].append(frame_poses)

            # Buffer frames for action recognition
            frame_buffer.append(frame.copy())
            if len(frame_buffer) >= 16:
                action, confidence, probs = self.action_recognizer.predict(
                    frame_buffer[-16:]
                )
                if confidence > 0.5:
                    self.results['actions'].append({
                        'frame': frame_idx,
                        'action': action,
                        'confidence': confidence
                    })
                frame_buffer = frame_buffer[-8:]  # Overlap for continuity

            # Convert positions to court coordinates if calibrated
            if self.court_calibrator.homography_matrix is not None:
                for det in detections:
                    if det['class_name'] == 'person':
                        center = (
                            (det['bbox'][0] + det['bbox'][2]) / 2,
                            det['bbox'][3]  # Use bottom of bbox (feet position)
                        )
                        court_pos = self.court_calibrator.image_to_court(center)
                        det['court_position'] = court_pos[0].tolist()

            # Visualization
            annotated = self._draw_annotations(frame, detections, frame_poses)

            if output_path:
                out.write(annotated)

            # Progress update
            if frame_idx % 100 == 0:
                print(f"Processed frame {frame_idx}/{total_frames}")

            frame_idx += 1
            self.results['frames'].append(frame_data)

        cap.release()
        if output_path:
            out.release()

        # Post-processing
        self._post_process_results()

        return self.results

    def _draw_annotations(self, frame, detections, poses):
        """Draw detection and pose annotations on frame."""
        annotated = frame.copy()

        # Draw detections
        for det in detections:
            x1, y1, x2, y2 = det['bbox']
            color = (0, 255, 0) if det['class_name'] == 'person' else (0, 0, 255)
            cv2.rectangle(annotated, (x1, y1), (x2, y2), color, 2)

            label = f"{det['class_name']}: {det['confidence']:.2f}"
            cv2.putText(annotated, label, (x1, y1 - 10),
                       cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

        # Draw court overlay if calibrated
        if self.court_calibrator.homography_matrix is not None:
            annotated = self.court_calibrator.draw_court_overlay(
                annotated, color=(255, 255, 0), thickness=1
            )

        return annotated

    def _post_process_results(self):
        """Post-process analysis results."""
        # Aggregate action detections
        action_counts = {}
        for action_data in self.results['actions']:
            action = action_data['action']
            action_counts[action] = action_counts.get(action, 0) + 1

        self.results['summary'] = {
            'total_frames': len(self.results['frames']),
            'action_counts': action_counts,
            'average_players_detected': np.mean([
                sum(1 for d in dets if d['class_name'] == 'person')
                for dets in self.results['detections']
            ])
        }

    def export_results(self, output_path):
        """Export results to JSON file."""
        import json

        # Convert numpy arrays and scalars to native Python types for JSON
        def convert_to_serializable(obj):
            if isinstance(obj, np.ndarray):
                return obj.tolist()
            elif isinstance(obj, (np.integer, np.floating)):
                return obj.item()
            elif isinstance(obj, dict):
                return {k: convert_to_serializable(v) for k, v in obj.items()}
            elif isinstance(obj, list):
                return [convert_to_serializable(i) for i in obj]
            return obj

        serializable_results = convert_to_serializable(self.results)

        with open(output_path, 'w') as f:
            json.dump(serializable_results, f, indent=2)
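
A minimal driver for the analyzer above might look like the following sketch. The file paths are placeholders, and passing calibration_frame=0 assumes that calibrate_interactive() prompts the user to click court landmarks on the first frame.

# Illustrative driver; file paths are placeholders.
analyzer = BasketballVideoAnalyzer()

results = analyzer.analyze_video(
    'game_clip.mp4',
    output_path='game_clip_annotated.mp4',
    calibration_frame=0   # interactive court calibration on the first frame
)
analyzer.export_results('game_clip_results.json')

print(results['summary'])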

27.7.2 Batch Processing

For processing multiple games:

"""
Batch processing for multiple games.
"""

import os
from concurrent.futures import ProcessPoolExecutor, as_completed

class BatchVideoProcessor:
    """Process multiple basketball videos in batch."""

    def __init__(self, num_workers=4):
        self.num_workers = num_workers

    def process_directory(self, input_dir, output_dir,
                          video_extensions=('.mp4', '.avi', '.mov')):
        """
        Process all videos in a directory.

        Args:
            input_dir: Directory containing videos
            output_dir: Directory for output files
            video_extensions: Tuple of valid video extensions

        Returns:
            dict: Processing results for each video
        """
        os.makedirs(output_dir, exist_ok=True)

        # Find all video files
        video_files = [
            os.path.join(input_dir, f)
            for f in os.listdir(input_dir)
            if f.lower().endswith(video_extensions)
        ]

        print(f"Found {len(video_files)} videos to process")

        results = {}

        with ProcessPoolExecutor(max_workers=self.num_workers) as executor:
            future_to_video = {
                executor.submit(
                    self._process_single_video,
                    video_path,
                    output_dir
                ): video_path
                for video_path in video_files
            }

            for future in as_completed(future_to_video):
                video_path = future_to_video[future]
                try:
                    result = future.result()
                    results[video_path] = result
                    print(f"Completed: {os.path.basename(video_path)}")
                except Exception as e:
                    print(f"Error processing {video_path}: {e}")
                    results[video_path] = {'error': str(e)}

        return results

    def _process_single_video(self, video_path, output_dir):
        """Process a single video file."""
        video_name = os.path.splitext(os.path.basename(video_path))[0]

        analyzer = BasketballVideoAnalyzer()

        output_video = os.path.join(output_dir, f"{video_name}_analyzed.mp4")
        output_json = os.path.join(output_dir, f"{video_name}_results.json")

        results = analyzer.analyze_video(video_path, output_video)
        analyzer.export_results(output_json)

        return {
            'output_video': output_video,
            'output_json': output_json,
            'summary': results.get('summary', {})
        }
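
In use, the batch processor can be pointed at a directory of game films. This is a sketch only: the directory names are placeholders, and the __main__ guard matters because ProcessPoolExecutor starts worker processes that re-import the module on some platforms.

# Illustrative batch run; directory names are placeholders.
if __name__ == '__main__':
    processor = BatchVideoProcessor(num_workers=4)
    batch_results = processor.process_directory('raw_games', 'analyzed_games')

    failures = [v for v, r in batch_results.items() if 'error' in r]
    print(f"Processed {len(batch_results)} videos, {len(failures)} failed")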

27.8 Integration with Tracking Data

27.8.1 Combining Video and Tracking Data

When both video and tracking data are available, integration enables rich analysis:

"""
Integration of video analysis with tracking data.
"""

import pandas as pd

class TrackingVideoIntegrator:
    """Integrate tracking data with video analysis."""

    def __init__(self, tracking_fps=25, video_fps=30):
        self.tracking_fps = tracking_fps
        self.video_fps = video_fps

    def align_tracking_to_video(self, tracking_data, video_timestamps):
        """
        Align tracking data timestamps to video frames.

        Args:
            tracking_data: DataFrame with 'timestamp' column
            video_timestamps: List of video frame timestamps

        Returns:
            DataFrame with added 'video_frame' column
        """
        tracking_data = tracking_data.copy()

        # Find nearest video frame for each tracking sample
        video_times = np.array(video_timestamps)

        tracking_data['video_frame'] = tracking_data['timestamp'].apply(
            lambda t: np.argmin(np.abs(video_times - t))
        )

        return tracking_data

    def synchronize_events(self, video_events, tracking_events,
                           tolerance_seconds=0.5):
        """
        Match events detected in video with tracking data events.

        Args:
            video_events: List of events from video analysis
            tracking_events: List of events from tracking data
            tolerance_seconds: Maximum time difference for matching

        Returns:
            list: Matched event pairs
        """
        matches = []
        used_tracking = set()

        for v_event in video_events:
            v_time = v_event.get('timestamp', v_event.get('time'))
            best_match = None
            best_diff = tolerance_seconds

            for i, t_event in enumerate(tracking_events):
                if i in used_tracking:
                    continue

                t_time = t_event.get('timestamp', t_event.get('time'))
                diff = abs(v_time - t_time)

                if diff < best_diff:
                    best_diff = diff
                    best_match = i

            if best_match is not None:
                used_tracking.add(best_match)
                matches.append({
                    'video_event': v_event,
                    'tracking_event': tracking_events[best_match],
                    'time_difference': best_diff
                })
            else:
                matches.append({
                    'video_event': v_event,
                    'tracking_event': None,
                    'time_difference': None
                })

        return matches

    def create_unified_dataset(self, video_detections, tracking_data,
                               court_calibrator):
        """
        Create unified dataset combining video and tracking information.

        Args:
            video_detections: Frame-by-frame detection results
            tracking_data: Tracking system data
            court_calibrator: Calibrated court mapper

        Returns:
            DataFrame: Unified dataset
        """
        unified_records = []

        for frame_idx, detections in enumerate(video_detections):
            frame_time = frame_idx / self.video_fps

            # Get tracking data for this time
            tracking_frame = tracking_data[
                np.abs(tracking_data['timestamp'] - frame_time) < 0.05
            ]

            for det in detections:
                if det['class_name'] != 'person':
                    continue

                # Convert video detection to court coordinates
                foot_position = (
                    (det['bbox'][0] + det['bbox'][2]) / 2,
                    det['bbox'][3]
                )
                video_court_pos = court_calibrator.image_to_court(foot_position)[0]

                # Find matching player in tracking data
                best_match = None
                best_dist = 5.0  # 5 feet tolerance

                for _, track_row in tracking_frame.iterrows():
                    dist = np.sqrt(
                        (video_court_pos[0] - track_row['x'])**2 +
                        (video_court_pos[1] - track_row['y'])**2
                    )
                    if dist < best_dist:
                        best_dist = dist
                        best_match = track_row

                record = {
                    'frame': frame_idx,
                    'timestamp': frame_time,
                    'video_bbox': det['bbox'],
                    'video_court_x': video_court_pos[0],
                    'video_court_y': video_court_pos[1],
                    'detection_confidence': det['confidence']
                }

                if best_match is not None:
                    record.update({
                        'player_id': best_match.get('player_id'),
                        'tracking_x': best_match['x'],
                        'tracking_y': best_match['y'],
                        'position_diff': best_dist
                    })

                unified_records.append(record)

        return pd.DataFrame(unified_records)
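
A brief usage sketch follows. It assumes the tracking export loads into a DataFrame with timestamp, player_id, x, and y columns (matching the column names used above), and that analysis_results is the dictionary returned by BasketballVideoAnalyzer.analyze_video() in Section 27.7, with court_calibrator taken from that same run; the CSV file name is a placeholder.

# Illustrative usage; file name and column layout are assumptions.
tracking_df = pd.read_csv('tracking_export.csv')   # timestamp, player_id, x, y

integrator = TrackingVideoIntegrator(tracking_fps=25, video_fps=30)

video_times = [f['timestamp'] for f in analysis_results['frames']]
aligned = integrator.align_tracking_to_video(tracking_df, video_times)

unified = integrator.create_unified_dataset(
    analysis_results['detections'], aligned, court_calibrator
)
print(unified.head())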

27.8.2 Enhancing Tracking with Video

Video analysis can fill gaps in tracking data:

"""
Using video to enhance tracking data.
"""

class TrackingEnhancer:
    """Enhance tracking data using video analysis."""

    def fill_tracking_gaps(self, tracking_data, video_detections,
                           court_calibrator, max_gap_frames=10):
        """
        Fill gaps in tracking data using video detections.

        Args:
            tracking_data: DataFrame with tracking data (may have gaps)
            video_detections: Video detection results
            court_calibrator: Court calibration object
            max_gap_frames: Maximum gap size to fill

        Returns:
            DataFrame: Enhanced tracking data
        """
        enhanced = tracking_data.copy()

        for player_id in tracking_data['player_id'].unique():
            player_data = enhanced[enhanced['player_id'] == player_id]
            frames = player_data['frame'].values

            # Find gaps
            gaps = []
            for i in range(len(frames) - 1):
                if frames[i + 1] - frames[i] > 1:
                    gaps.append((frames[i], frames[i + 1]))

            # Fill each gap
            for gap_start, gap_end in gaps:
                if gap_end - gap_start > max_gap_frames:
                    continue

                # Get video detections in gap
                for frame_idx in range(gap_start + 1, gap_end):
                    if frame_idx >= len(video_detections):
                        continue

                    detections = video_detections[frame_idx]

                    # Interpolate the expected position between the gap endpoints
                    prev_pos = player_data[player_data['frame'] == gap_start][['x', 'y']].values[0]
                    next_pos = player_data[player_data['frame'] == gap_end][['x', 'y']].values[0]

                    alpha = (frame_idx - gap_start) / (gap_end - gap_start)
                    expected_pos = prev_pos * (1 - alpha) + next_pos * alpha

                    # Find the detection closest to the expected position
                    best_det = None
                    best_dist = 10.0  # feet

                    for det in detections:
                        if det['class_name'] != 'person':
                            continue

                        foot_pos = (
                            (det['bbox'][0] + det['bbox'][2]) / 2,
                            det['bbox'][3]
                        )
                        court_pos = court_calibrator.image_to_court(foot_pos)[0]

                        dist = np.sqrt(
                            (court_pos[0] - expected_pos[0])**2 +
                            (court_pos[1] - expected_pos[1])**2
                        )

                        if dist < best_dist:
                            best_dist = dist
                            best_det = court_pos

                    if best_det is not None:
                        # Add filled data point
                        new_row = pd.DataFrame([{
                            'frame': frame_idx,
                            'player_id': player_id,
                            'x': best_det[0],
                            'y': best_det[1],
                            'source': 'video_fill'
                        }])
                        enhanced = pd.concat([enhanced, new_row], ignore_index=True)

        return enhanced.sort_values(['player_id', 'frame'])
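
As a usage sketch, the enhancer takes the gappy tracking frame together with the video detections and calibrator from the earlier examples; tracking_df is assumed to carry frame, player_id, x, and y columns. The 'source' column distinguishes filled points from original samples.

# Illustrative gap filling; inputs come from the pipelines above.
enhancer = TrackingEnhancer()
filled = enhancer.fill_tracking_gaps(
    tracking_df, analysis_results['detections'], court_calibrator,
    max_gap_frames=10
)
if 'source' in filled.columns:
    n_filled = (filled['source'] == 'video_fill').sum()
    print(f"Filled {n_filled} missing tracking samples from video")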

27.9 Future Directions in Basketball CV

27.9.1 Emerging Technologies

Real-Time 3D Reconstruction

Future systems will create full 3D models of games in real time:
- Volumetric capture using multiple synchronized cameras
- Neural radiance fields (NeRF) for novel view synthesis
- Real-time rendering for immersive viewing experiences

Federated Learning for Privacy

As video analysis extends to youth and amateur basketball:
- Models trained across multiple locations without sharing raw video
- Privacy-preserving analytics
- Compliance with regulations around minor athletes

Edge Computing

Moving processing closer to cameras:
- Reduced latency for real-time feedback
- Lower bandwidth requirements
- Offline capability for practice facilities

27.9.2 Advanced Applications

Predictive Analytics

Combining CV with predictive models (a toy sketch of the first item follows the list):
- Real-time shot probability based on defender position and shooter form
- Play success prediction during execution
- Injury risk detection from movement patterns
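
A minimal, purely illustrative sketch of the first item, combining CV-derived features (defender distance, release angle from the pose analyzer, shot distance from court coordinates) in a logistic model. The coefficients are invented for the example; a real model would be fit to labeled shot outcomes.

# Toy logistic shot-probability model; coefficients are placeholders.
import numpy as np

def shot_probability(defender_distance_ft, release_angle_deg, shot_distance_ft):
    """Illustrative make probability from CV-derived features."""
    z = (-0.5
         + 0.25 * defender_distance_ft       # more space helps
         + 0.03 * (release_angle_deg - 45)   # deviation from a ~45 degree arc
         - 0.08 * shot_distance_ft)          # longer shots are harder
    return 1.0 / (1.0 + np.exp(-z))

# Example: lightly contested 20-foot jumper with a 50-degree release angle
print(f"{shot_probability(4.0, 50.0, 20.0):.2f}")   # ~0.28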

Automated Coaching

Systems that provide actionable feedback (a rule-based sketch follows the list):
- Real-time technique correction
- Personalized practice recommendations
- Strategic suggestions during games
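
A rule-based sketch of the first item. The angle keys ('right_elbow', 'right_knee') are assumptions about the dictionary returned by calculate_body_angles() earlier in the chapter, and the thresholds are placeholders rather than validated coaching targets.

# Illustrative rule-based feedback from pose-derived joint angles (degrees).
def shooting_form_feedback(angles):
    """Return simple coaching cues from measured joint angles."""
    feedback = []
    elbow = angles.get('right_elbow')   # assumed key name
    knee = angles.get('right_knee')     # assumed key name

    if elbow is not None and elbow < 70:
        feedback.append("Extend the shooting elbow more at release.")
    if knee is not None and knee > 160:
        feedback.append("Bend the knees more to generate lift.")

    return feedback or ["Form within target ranges."]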

Enhanced Broadcasting

Improving viewer experience:
- Automatic highlight generation
- Augmented reality statistics overlays
- Alternative camera angle reconstruction

27.9.3 Challenges Ahead

Despite rapid progress, significant challenges remain:

Generalization: Models trained on professional footage may not perform well on different court types, camera angles, or player populations.

Interpretability: Deep learning models often function as "black boxes," making it difficult to understand why specific predictions are made.

Data Quality: Video quality varies significantly across levels of basketball, affecting analysis reliability.

Computational Cost: State-of-the-art models require significant computational resources, limiting accessibility.

Ethical Considerations: Surveillance concerns, consent issues, and fair use of performance data require ongoing attention.


27.10 Practical Implementation Considerations

27.10.1 Hardware Requirements

Application                  | Minimum Hardware | Recommended Hardware
-----------------------------|------------------|-----------------------
Pose estimation (MediaPipe)  | Modern CPU       | Any GPU
Object detection (YOLO)      | CPU (slow)       | GPU with 4GB+ VRAM
Action recognition           | GPU required     | GPU with 8GB+ VRAM
Real-time analysis           | High-end GPU     | Multiple GPUs or cloud

27.10.2 Software Stack

A typical basketball CV system uses the following stack; a minimal import-level sketch appears after the lists:

Framework Layer:
- PyTorch or TensorFlow (deep learning)
- OpenCV (image processing)
- NumPy (numerical operations)

Model Layer:
- Ultralytics YOLO (detection)
- MediaPipe (pose estimation)
- Torchvision (action recognition)

Application Layer:
- Custom basketball-specific code
- Integration with team databases
- Visualization tools
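
The sketch below shows how these layers typically fit together at the import level. Package names are as distributed on PyPI, the model weight identifiers are examples, and exact names may vary across library versions.

# Framework layer
import cv2
import numpy as np
import torch

# Model layer: pre-trained building blocks
from ultralytics import YOLO                  # detection
import mediapipe as mp                        # pose estimation
from torchvision.models.video import r3d_18   # action recognition backbone

# Application layer: basketball-specific glue code
detector = YOLO("yolov8n.pt")                 # generic COCO-pretrained weights
pose = mp.solutions.pose.Pose(static_image_mode=False)
action_backbone = r3d_18(weights="KINETICS400_V1")   # weight name may differ by version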

27.10.3 Best Practices

  1. Start Simple: Begin with pre-trained models before attempting custom training (see the sketch after this list)
  2. Validate Thoroughly: Test systems across multiple games, venues, and conditions
  3. Document Limitations: Clearly communicate what the system can and cannot do
  4. Iterate Incrementally: Add complexity gradually based on actual needs
  5. Maintain Human Oversight: Use CV to augment, not replace, human analysis
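
As an example of starting simple, an off-the-shelf detector can be sanity-checked on a single frame before any basketball-specific training. The image path is a placeholder, and API details may vary slightly across Ultralytics releases.

# Quick check with a generic pre-trained detector (no custom training yet).
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")               # small COCO-pretrained model
frame = cv2.imread("sample_frame.jpg")   # placeholder image path

results = model(frame)
for box in results[0].boxes:
    print(model.names[int(box.cls)], float(box.conf))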

Summary

Computer vision has transformed basketball analytics from manually intensive processes to automated, scalable systems. This chapter covered the fundamental technologies enabling this transformation:

  • Tracking systems capture player and ball positions at high frequency, generating massive datasets for analysis
  • Pose estimation tools like MediaPipe and OpenPose enable detailed biomechanical analysis
  • Action recognition automates the identification of specific plays and movements
  • Object detection provides the foundation for player and ball tracking from video
  • Camera calibration enables conversion between image and court coordinates
  • Integration workflows combine multiple data sources for comprehensive analysis

While significant challenges remain, the trajectory is clear: basketball analysis will increasingly rely on automated video processing to generate insights previously impossible to capture. Practitioners who understand both the capabilities and limitations of these technologies will be well-positioned to leverage them effectively.

The next chapter explores how these computer vision outputs feed into predictive models and decision-support systems, completing the picture of modern basketball analytics infrastructure.


References

  1. Cao, Z., et al. (2019). OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  2. Lugaresi, C., et al. (2019). MediaPipe: A Framework for Building Perception Pipelines. arXiv preprint arXiv:1906.08172.

  3. Carreira, J., & Zisserman, A. (2017). Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. CVPR.

  4. Redmon, J., et al. (2016). You Only Look Once: Unified, Real-Time Object Detection. CVPR.

  5. Second Spectrum. (2023). NBA Tracking Technology Documentation.

  6. Hartley, R., & Zisserman, A. (2003). Multiple View Geometry in Computer Vision. Cambridge University Press.