footy-track System Overview¶
This document consolidates the architecture, pipeline stages, data models, time conventions, and module overview for footy-track (Footy Scan). It is the single authoritative reference for contributors and agents.
Architecture¶
footy-track is a three-stage video analysis pipeline:
Video Input
│
▼
┌─────────────────┐
│ InputConsumer │ Decodes frames, stamps each with GameTime + video metadata
└────────┬────────┘
│ Frames (image + timestamp)
▼
┌─────────────────────────────────────────────────────┐
│ Processor │
│ ┌─────────────┐ ┌──────────┐ ┌───────────────┐ │
│ │ Detection │→ │ Tracking │→ │ Event Extract │ │
│ └─────────────┘ └──────────┘ └───────────────┘ │
└────────────────────┬────────────────────────────────┘
│ Per-frame and per-event records (ContinuousTime)
▼
┌──────────────────┐
│ OutputProducer │ Writes JSON / CSV to disk, message bus, or API
└──────────────────┘
Each component is independently replaceable. Timestamps flow through all stages as ContinuousTime (see Time Conventions).
Pipeline Stages¶
Stage 1 — Detection¶
Locates objects (players, ball, referee, coach, substitutes) in each frame.
- Interface: `ObjectDetector.predict_from_path(image_path) → FrameDetections`
- Implementations (in `detectors/`):
    - `UltralyticsObjectDetector` — YOLO11-based detector; maps COCO class indices to `person` and `ball`.
    - `UltralyticsSam3Detector` — SAM 3 text-prompted segmentation; uses natural-language prompts per object category with per-prompt confidence thresholds; derives bounding boxes from segmentation masks with optional padding; applies custom centre-distance NMS to suppress duplicates.
- Output: `FrameDetections` — a list of `ObjectDetection` with normalized `[x, y, w, h]` boxes in `[0, 1]`.
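The interface contract above can be sketched as follows. This is a minimal illustration, not the actual source: the real models are Pydantic classes in `schema.py`, so the plain dataclasses and the `DummyDetector` here are simplified stand-ins.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path

# Simplified stand-ins for the schema.py models (fields per the tables below).
@dataclass
class ObjectDetection:
    label: str
    confidence: float
    x: float  # top-left x, normalized [0, 1]
    y: float  # top-left y, normalized [0, 1]
    w: float  # box width, normalized [0, 1]
    h: float  # box height, normalized [0, 1]

@dataclass
class FrameDetections:
    uri: Path
    width: int
    height: int
    detections: list = field(default_factory=list)

class ObjectDetector(ABC):
    """Sketch of the detector interface: one frame in, detections out."""

    @abstractmethod
    def predict_from_path(self, image_path: Path) -> FrameDetections: ...

class DummyDetector(ObjectDetector):
    """Toy implementation returning one fixed detection, for illustration only."""

    def predict_from_path(self, image_path: Path) -> FrameDetections:
        return FrameDetections(
            uri=image_path,
            width=1920,
            height=1080,
            detections=[ObjectDetection("person", 0.9, 0.1, 0.2, 0.05, 0.15)],
        )

frame = DummyDetector().predict_from_path(Path("frame_0001.jpg"))
```

Because every implementation returns the same `FrameDetections` shape, swapping YOLO for SAM 3 (or anything else) requires no changes downstream.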
Stage 2 — Tracking (planned)¶
Associates detections across frames to produce persistent object IDs.
- Intended implementation: Hungarian-algorithm assignment (via the `lap` library).
- Not yet implemented as a standalone module; currently the detection output is consumed frame-by-frame.
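The core of the planned tracking step is a minimum-cost assignment between existing tracks and new detections. The following stand-in uses brute-force enumeration instead of the Hungarian algorithm (which `lap` would provide efficiently), and assumes a square cost matrix — it only illustrates the idea of track-to-detection association.

```python
from itertools import permutations

def assign_detections(cost):
    """Minimum-cost assignment of tracks (rows) to detections (columns).

    Brute-force stand-in for the planned Hungarian-algorithm assignment
    via the `lap` library. Assumes a square cost matrix for simplicity.
    Returns (track_idx, detection_idx) pairs.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return [(i, best_perm[i]) for i in range(n)]

# Example cost matrix, e.g. centre distance between 3 tracks and 3 detections.
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.9, 0.1],
]
matches = assign_detections(cost)
```

In the real module the costs would come from motion or appearance cues, and matched pairs would carry a persistent track ID forward.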
Stage 3 — Classification / Event Extraction¶
Two sub-tasks:
- Broadcast classification — decides whether a frame is a broadcast/camera view suitable for detection. Implemented in `classifier.py`; `UltralyticsClassifier` uses a fine-tuned YOLO classification model (`yolo11n-cls`) and returns a `BroadcastClassification` (Yes/No + confidence).
- Labelling / Roboflow upload — `labelling.py` orchestrates classification-filtered, pre-annotated uploads to Roboflow for dataset curation. This feeds the training loop, not the live inference pipeline.
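The classification-filtered flow amounts to a simple gate in front of the detector. This sketch uses a toy classifier callable and hypothetical names (`filter_broadcast_frames` is not the actual `labelling.py` API, which works with `BroadcastClassification` objects rather than tuples):

```python
def filter_broadcast_frames(frames, classify, threshold=0.5):
    """Keep only frames the broadcast classifier accepts with enough confidence.

    `classify` is any callable returning a ("Yes"/"No", confidence) pair;
    the real pipeline uses UltralyticsClassifier / BroadcastClassification.
    """
    kept = []
    for frame in frames:
        label, confidence = classify(frame)
        if label == "Yes" and confidence >= threshold:
            kept.append(frame)
    return kept

# Toy data: (filename, is_broadcast_view) pairs stand in for real frames.
frames = [("f1.jpg", True), ("f2.jpg", False), ("f3.jpg", True)]
kept = filter_broadcast_frames(
    frames,
    classify=lambda f: ("Yes" if f[1] else "No", 0.9),
)
```

The same gate serves both the live pipeline (skip detection on non-pitch frames) and the Roboflow upload path (only curate useful frames).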
Data Models¶
All models live in `src/footy_track/schema.py` and are Pydantic `BaseModel` subclasses (frozen by default where noted).
ObjectDetection¶
Single detected object in a frame.
| Field | Type | Description |
|---|---|---|
| `label` | `str` | Class name (see Detection Classes) |
| `confidence` | `float` | Model confidence in `[0, 1]` |
| `x` | `float` | Top-left x, normalized `[0, 1]` |
| `y` | `float` | Top-left y, normalized `[0, 1]` |
| `w` | `float` | Box width, normalized `[0, 1]` |
| `h` | `float` | Box height, normalized `[0, 1]` |
| `model` | `str \| None` | Model identifier that produced this detection |
Subclasses: Person (label locked to "person"), Ball (label locked to "ball").
FrameDetections¶
All detections for a single frame.
| Field | Type | Description |
|---|---|---|
| `uri` | `Path` | Path to the source image |
| `width` | `int` | Image width in pixels |
| `height` | `int` | Image height in pixels |
| `detections` | `list[ObjectDetection]` | All detections in this frame |
`FrameDetectionsWithMeta` extends this with an optional `clock: str` field for raw broadcast clock text.
BroadcastClassification¶
Result of classifying whether a frame is a broadcast view.
| Field | Type | Description |
|---|---|---|
| `label` | `EnumBroadcastClassification` | `"Yes"` or `"No"` |
| `confidence` | `float \| None` | Model confidence |
FrameClassifications¶
Wraps a BroadcastClassification for a specific frame URI.
| Field | Type | Description |
|---|---|---|
| `uri` | `Path` | Path to the source image |
| `classification` | `BroadcastClassification` | Classification result |
Provides `.to_fiftyone_sample()` for FiftyOne dataset integration.
Detection Classes¶
Defined in `constants.py` and referenced by `DETECTION_CLASSES` in `schema.py`:

| Constant | Value | Meaning |
|---|---|---|
| `PERSON_TAG` | `"person"` | Generic person (pre-role assignment) |
| `BALL_TAG` | `"ball"` | Ball (role unspecified) |
| `IN_PLAY_BALL_TAG` | `"in_play_ball"` | Ball on the pitch |
| `OUT_OF_PLAY_BALL_TAG` | `"out_of_play_ball"` | Ball off the pitch |
| `PLAYER_TAG` | `"player"` | Outfield player or goalkeeper on a team |
| `PLAYER_SUB_TAG` | `"player_sub"` | Substitute waiting off the pitch |
| `REFEREE_TAG` | `"referee"` | Match official |
| `COACH_TAG` | `"coach"` | Coach on the sideline |
Time Conventions¶
Full reference: `docs/timings.md`
Two time systems are used throughout the pipeline:
ContinuousTime¶
- Seconds from first-half kickoff (`0.0`).
- Never resets — continuous across both halves and stoppage time.
- Canonical format for all stored records and downstream consumers.
GameTime¶
- The referee / broadcast clock. Resets to `0:00` at the start of each half.
- Used by video sources and broadcasters.
- Must be converted to ContinuousTime before storage.
Conversion (via GameMetadata)¶
| Half | Formula |
|---|---|
| First | ContinuousTime = GameTime_seconds |
| Second | ContinuousTime = GameTime_seconds + half_start_continuous |
half_start_continuous is the ContinuousTime at which the second half kicked off (accounts for first-half stoppage time).
Example: First half ran 48.5 minutes → half_start_continuous = 2910.0 s. A second-half GameTime of 05:00 → ContinuousTime = 300 + 2910 = 3210.0 s.
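The conversion table reduces to one small function. This is a sketch of the rule, not the actual `GameMetadata` API (whose field names may differ); it reproduces the worked example above.

```python
def to_continuous_time(game_time_seconds: float, half: int,
                       half_start_continuous: float) -> float:
    """Convert a GameTime reading (seconds since the current half's kickoff)
    to ContinuousTime (seconds since first-half kickoff).

    `half_start_continuous` is the ContinuousTime at which the second half
    kicked off, so it already accounts for first-half stoppage time.
    """
    if half == 1:
        return game_time_seconds
    return game_time_seconds + half_start_continuous

# Worked example from the text: first half ran 48.5 minutes.
half_start = 48.5 * 60.0  # 2910.0 s
ct = to_continuous_time(5 * 60.0, half=2, half_start_continuous=half_start)
```

Doing this conversion once, at ingest, is what lets every downstream record carry a single monotonic timestamp.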
Module Overview¶
src/footy_track/
├── schema.py Pydantic data models for all pipeline objects
├── constants.py Detection class labels and Roboflow project names
├── classifier.py Broadcast-frame classification (YOLO cls)
├── labelling.py Roboflow dataset upload handlers (classification + detection)
├── logging.py Logging configuration
├── utils.py Project-root resolution and shared helpers
├── detectors/
│ ├── base.py Abstract ObjectDetector interface
│ ├── ultralytics.py YOLO and SAM3 detector implementations
│ ├── utils.py Device selection, bbox conversion, FiftyOne helpers, IoU
│ └── constants.py Detector-specific constants
└── scripts/
├── classify_frames.py Run broadcast classifier over a directory
├── detect_objects.py Run object detector over a directory
├── extract_frames.py Extract frames from a video file
├── split_video.py Split video into clips
├── upload_classifier_frames.py Upload frames to Roboflow classification project
└── upload_object_detector_frames.py Upload frames to Roboflow detection project
detectors/utils.py key functions¶
| Function | Purpose |
|---|---|
| `_available_device()` | Selects a `torch.device` in preference order MPS → CUDA → CPU |
| `ultralytics_result_to_detections()` | Converts Ultralytics `Results` to `list[ObjectDetection]` |
| `top_left_wh_to_yolo_xywh()` | Converts a normalized top-left box to YOLO centre format |
| `visualise_detections_on_image()` | Draws detections on an image for debugging |
| `detection_to_fiftyone()` | Converts `ObjectDetection` to a FiftyOne `Detection` |
| `calculate_iou()` | IoU between two `ObjectDetection` boxes |
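Two of these helpers are simple enough to sketch. These are illustrations of the likely behaviour on plain `[x, y, w, h]` tuples, not the actual implementations (which operate on `ObjectDetection` models):

```python
def top_left_wh_to_yolo_xywh(x, y, w, h):
    """Normalized top-left [x, y, w, h] -> YOLO centre format [cx, cy, w, h]."""
    return (x + w / 2, y + h / 2, w, h)

def calculate_iou(a, b):
    """IoU of two normalized top-left [x, y, w, h] boxes."""
    # Overlap along each axis (clamped to zero when boxes are disjoint).
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

cx, cy, w, h = top_left_wh_to_yolo_xywh(0.1, 0.2, 0.4, 0.2)
iou = calculate_iou((0.0, 0.0, 0.5, 0.5), (0.25, 0.25, 0.5, 0.5))
```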
metaflow/¶
Contains Metaflow workflow definitions for batch processing pipelines (e.g., processing large video archives). See metaflow/docker/ for containerisation support.
Key Design Decisions¶
- Normalized coordinates: All bounding boxes use `[0, 1]`-normalized `[x, y, w, h]` (top-left origin) internally. Conversion to pixel space or COCO/YOLO formats happens at output boundaries.
- Pluggable detectors: New detectors implement `ObjectDetector.predict_from_path` and return `FrameDetections` — the rest of the pipeline is unchanged.
- Broadcast filter: Frames are pre-screened by the broadcast classifier before object detection, reducing compute on non-pitch frames (replays, graphics, crowd shots).
- SAM3 for fine-grained detection: SAM 3 is text-prompted per object category, enabling differentiation between in-play/out-of-play balls, substitutes, coaches, and referees that a standard COCO-trained YOLO model cannot distinguish.
- ContinuousTime is canonical: All output records carry ContinuousTime so downstream consumers can align, resample, or merge observations without knowing match structure.
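The centre-distance NMS mentioned for the SAM 3 detector can be sketched as a greedy suppression pass. This is an illustration of the idea with dict-based boxes and an arbitrary threshold, not the actual `UltralyticsSam3Detector` code:

```python
import math

def centre_distance_nms(detections, min_dist=0.05):
    """Greedy NMS on box-centre distance (normalized units).

    Keeps detections in descending confidence order; drops any detection
    whose centre lies within `min_dist` of an already-kept one. Threshold
    and dict representation are illustrative.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d["confidence"], reverse=True):
        cx, cy = det["x"] + det["w"] / 2, det["y"] + det["h"] / 2
        close = any(
            math.hypot(cx - (k["x"] + k["w"] / 2),
                       cy - (k["y"] + k["h"] / 2)) < min_dist
            for k in kept
        )
        if not close:
            kept.append(det)
    return kept

boxes = [
    {"x": 0.10, "y": 0.10, "w": 0.04, "h": 0.04, "confidence": 0.9},
    {"x": 0.11, "y": 0.10, "w": 0.04, "h": 0.04, "confidence": 0.6},  # duplicate
    {"x": 0.50, "y": 0.50, "w": 0.04, "h": 0.04, "confidence": 0.8},
]
kept = centre_distance_nms(boxes)
```

Centre distance, unlike IoU, suppresses duplicates even when mask-derived boxes overlap poorly — a useful property for segmentation-derived detections of small objects like the ball.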