Training

This document covers how to run the training scripts and records their baseline results.


Object Detector Training

This section covers how to run the object detector training script and records its baseline results.

Script: train_object_detector.py

Location: src/footy_track/scripts/train_object_detector.py

Downloads a labelled dataset from Roboflow, fine-tunes a YOLO model on it, and saves the best weights.

Prerequisites

Environment variables — .envrc

A .envrc file (managed by direnv) is provided at the repo root to set local environment variables. It is listed in .gitignore and is not committed.

# .envrc (already present in the repo root)
export DATA_ROOT="$PWD/data"

DATA_ROOT points to the data/ directory at the project root, which is where downloaded datasets and other local data files are stored. Run direnv allow once after cloning to activate it.

Set the Roboflow API key:

export ROBOFLOW_API_KEY=<your-api-key>

The key can be found in your Roboflow workspace settings. Alternatively, the SDK reads it from ~/.config/roboflow/config.json if you have previously run roboflow login.
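The lookup order described above (environment variable first, then the CLI config file) can be sketched as a small helper. This is illustrative, not the script's actual code, and the key name stored inside config.json is an assumption — check your own file:

```python
import json
import os
from pathlib import Path
from typing import Optional

def get_roboflow_api_key(config_path: Optional[Path] = None) -> Optional[str]:
    """Resolve the Roboflow API key: env var first, then the CLI config file."""
    key = os.environ.get("ROBOFLOW_API_KEY")
    if key:
        return key
    path = config_path or Path.home() / ".config" / "roboflow" / "config.json"
    if path.exists():
        # The key name inside config.json is an assumption; inspect your file.
        return json.loads(path.read_text()).get("RF_API_KEY")
    return None
```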

Usage

The script reads DATA_ROOT from the environment (set by .envrc) and downloads the dataset to $DATA_ROOT/detection_dataset/roboflow_dataset_<version>/.

uv run python src/footy_track/scripts/train_object_detector.py \
  --model yolo11n \
  --dataset-version 3 \
  --freeze 9 \
  --epochs 5
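The download location described above can be sketched as simple path construction. The helper name is illustrative (the real script may organize this differently); the layout and the fallback to data/ follow this document:

```python
import os
from pathlib import Path

def detection_dataset_dir(version: int) -> Path:
    # Falls back to ./data when DATA_ROOT is unset, matching the documented default.
    data_root = Path(os.environ.get("DATA_ROOT", "data"))
    return data_root / "detection_dataset" / f"roboflow_dataset_{version}"
```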

To run without direnv active, pass DATA_ROOT inline:

DATA_ROOT="$PWD/data" uv run python src/footy_track/scripts/train_object_detector.py \
  --model yolo11n \
  --dataset-version 3 \
  --freeze 9 \
  --epochs 1

Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| --model | yolo11n | YOLO model variant (e.g. yolo11n, yolo11m). .pt is appended automatically. |
| --dataset-version | 3 | Roboflow dataset version to download. |
| --freeze | 9 | Number of backbone layers to freeze. Freezing early layers speeds up training when fine-tuning from a pretrained checkpoint. |
| --epochs | 50 | Number of training epochs. |
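A minimal argparse sketch matching the table above (the real script may define its CLI differently):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Fine-tune a YOLO object detector")
    p.add_argument("--model", default="yolo11n",
                   help="YOLO model variant; '.pt' is appended automatically")
    p.add_argument("--dataset-version", type=int, default=3,
                   help="Roboflow dataset version to download")
    p.add_argument("--freeze", type=int, default=9,
                   help="number of backbone layers to freeze")
    p.add_argument("--epochs", type=int, default=50,
                   help="number of training epochs")
    return p
```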

Outputs

Weights and plots are saved to:

footy_scan_detection/<run-name>/weights/best.pt
footy_scan_detection/<run-name>/weights/last.pt

<run-name> encodes the timestamp and hyperparameters, for example:

2026-04-26_08-33_model_name=yolo11n_dataset_version=3_epochs=5_freeze_layers=9
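The run-name scheme can be reconstructed as below; this is an illustrative reimplementation of the format shown above, not the script's actual function:

```python
from datetime import datetime

def run_name(model_name: str, dataset_version: int, epochs: int,
             freeze_layers: int, now: datetime) -> str:
    # Timestamp prefix followed by key=value hyperparameter pairs.
    stamp = now.strftime("%Y-%m-%d_%H-%M")
    return (f"{stamp}_model_name={model_name}"
            f"_dataset_version={dataset_version}"
            f"_epochs={epochs}_freeze_layers={freeze_layers}")
```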

Training metrics are also logged to Weights & Biases under the footy_scan_detection project.


Dataset

  • Source: Roboflow workspace egroeg121, project footy-track-detection
  • Version 3 split: 162 train / 49 val images
  • Classes (7): coach, in_play_ball, keeper, person, player, player_sub, referee
  • Downloaded to: $DATA_ROOT/detection_dataset/roboflow_dataset_<version>/ (defaults to data/detection_dataset/... if DATA_ROOT is unset)

Baseline Run Results

Run: 2026-04-26_08-33_model_name=yolo11n_dataset_version=3_epochs=5_freeze_layers=9

Configuration

| Parameter | Value |
|-----------|-------|
| Model | yolo11n (YOLO11 nano) |
| Parameters | 2,591,205 (2.6M) |
| GFLOPs | 6.4 |
| Dataset version | 3 |
| Epochs | 5 |
| Frozen layers | 9 |
| Image size | 640 |
| Device | MPS (Apple M4) |
| Optimizer | AdamW (lr=0.000909, momentum=0.9) |
| Augmentation | RandAugment + mosaic |
| Training time | ~1.3 min (0.022 hours) |

Epoch-by-Epoch Metrics (validation)

| Epoch | box_loss | cls_loss | dfl_loss | mAP50 | mAP50-95 |
|-------|----------|----------|----------|-------|----------|
| 1 | 1.286 | 4.163 | 0.926 | 0.0454 | 0.0135 |
| 2 | 1.223 | 3.500 | 0.882 | 0.0412 | 0.0111 |
| 3 | 1.251 | 3.026 | 0.886 | 0.0667 | 0.0214 |
| 4 | 1.251 | 2.610 | 0.876 | 0.0767 | 0.0379 |
| 5 | 1.240 | 2.276 | 0.897 | 0.0892 | 0.0482 |

Final Validation (best.pt)

| Class | Images | Instances | Precision | Recall | mAP50 | mAP50-95 |
|-------|--------|-----------|-----------|--------|-------|----------|
| all | 49 | 921 | 0.022 | 0.267 | 0.091 | 0.047 |
| player | 49 | 778 | 0.122 | 0.730 | 0.481 | 0.241 |
| coach | 8 | 13 | 0.003 | 0.615 | 0.026 | 0.020 |
| referee | 47 | 77 | 0.004 | 0.130 | 0.021 | 0.009 |
| player_sub | 3 | 8 | 0.003 | 0.125 | 0.019 | 0.010 |
| in_play_ball | 44 | 44 | 0.000 | 0.000 | 0.000 | 0.000 |
| person | 1 | 1 | 0.000 | 0.000 | 0.000 | 0.000 |

Inference speed: 76.3 ms/image (preprocess: 0.1 ms, postprocess: 65.4 ms)

Observations

  • Player detection is the strongest class (mAP50 = 0.481), which is expected given it dominates the training set (778 of 921 validation instances).
  • Ball and person classes have zero mAP after 5 epochs — they are underrepresented or difficult at this scale. More epochs and/or a larger model will be needed for those classes.
  • Classification loss is still decreasing across all 5 epochs, indicating more training would improve results. This run is a baseline only.
  • 5 epochs with 9 frozen layers is a minimal sanity-check run. For a production checkpoint, use more epochs (≥50) and consider unfreezing more layers or using a larger model variant (yolo11m, yolo11l).

W&B Run

Metrics, curves and model artifacts are synced to Weights & Biases:

  • Project: footy_scan_detection
  • Baseline run: ze990tv6

Classifier Training

This section covers the broadcast-frame classifier: a model that determines whether a video frame shows the game in play.

Script: train_classifier.py

Location: src/footy_track/scripts/train_classifier.py

Downloads a labelled dataset from Roboflow, fine-tunes a YOLO classification model on it, and saves the best weights.

Prerequisites

Same .envrc and ROBOFLOW_API_KEY setup as the object detector (see above).

Usage

uv run python src/footy_track/scripts/train_classifier.py \
  --model yolo11n-cls \
  --dataset-version 10 \
  --freeze 9 \
  --epochs 5

To run without direnv active, pass DATA_ROOT inline:

DATA_ROOT="$PWD/data" uv run python src/footy_track/scripts/train_classifier.py \
  --model yolo11n-cls \
  --dataset-version 10 \
  --freeze 9 \
  --epochs 5

Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| --model | yolo11n-cls | YOLO classification model variant. .pt is appended automatically. |
| --dataset-version | 10 | Roboflow dataset version to download. |
| --freeze | 9 | Number of backbone layers to freeze. |
| --epochs | 50 | Number of training epochs. |

Outputs

Weights and plots are saved to:

footy_scan_classifier/<run-name>/weights/best.pt
footy_scan_classifier/<run-name>/weights/last.pt

<run-name> encodes the timestamp and hyperparameters, for example:

2026-04-26_09-16_model_name=yolo11n-cls_dataset_version=10_epochs=5_freeze_layers=9


Dataset

  • Source: Roboflow workspace egroeg121, project footy-track-broadcast-frame
  • Version 10 split:

    | Split | Images | Classes present |
    |-------|--------|-----------------|
    | train | 392 | No (165), Unlabeled (5), Yes (222) |
    | valid | 104 | No (46), Yes (58) |
    | test | 100 | No (47), Yes (53) |

  • Classes: No (frame not in play), Yes (frame in play), Unlabeled (ambiguous — train only)
  • Note: Unlabeled is absent from the val and test splits, so accuracy metrics during training are reported on the two-class (No / Yes) val set.
  • Downloaded to: $DATA_ROOT/classifier_dataset/roboflow_dataset_<version>/

Baseline Run Results

Run: 2026-04-26_09-16_model_name=yolo11n-cls_dataset_version=10_epochs=5_freeze_layers=9

Configuration

| Parameter | Value |
|-----------|-------|
| Model | yolo11n-cls (YOLO11 nano classifier) |
| Parameters | 1,534,947 (1.5M) |
| GFLOPs | 3.3 |
| Dataset version | 10 |
| Epochs | 5 |
| Frozen layers | 9 |
| Image size | 224 |
| Device | MPS (Apple M4) |
| Optimizer | AdamW (lr=0.001429, momentum=0.9) |
| Augmentation | RandAugment |
| Training time | ~36 s (0.010 hours) |

Epoch-by-Epoch Metrics (validation)

| Epoch | train_loss | top1_acc | top5_acc |
|-------|------------|----------|----------|
| 1 | 0.734 | 0.356 | 1.0 |
| 2 | 0.288 | 0.404 | 1.0 |
| 3 | 0.185 | 0.404 | 1.0 |
| 4 | 0.190 | 0.413 | 1.0 |
| 5 | 0.149 | 0.413 | 1.0 |

Final Validation (best.pt)

| Metric | Value |
|--------|-------|
| top1_acc | 0.413 |
| top5_acc | 1.0 |

Inference speed: 0.1 ms preprocess, 2.9 ms inference, 0.0 ms postprocess per image

Observations

  • top1_acc plateaus at 41.3% after epoch 2 — well below a useful threshold for production use. With only the classifier head unfrozen (9 of 10 backbone layers frozen), the model has limited capacity to adapt to this task.
  • top5_acc is trivially 1.0 — there are only 3 classes, so a top-5 prediction always includes the correct label. This metric is not meaningful here.
  • Val loss diverges while train loss falls (train: 0.149 vs W&B final val: ~5.97), indicating overfitting even at 5 epochs. The training set is small (392 images) and the val set is missing the Unlabeled class.
  • Unlabeled class has only 5 training samples — the model cannot learn this class reliably. Consider merging it into No or removing it from the dataset for future runs.
  • For a production checkpoint, unfreeze more layers (e.g. --freeze 0), train for more epochs (≥50), and consider a larger model (yolo11s-cls, yolo11m-cls). Data augmentation and a larger dataset would also help.
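The point about top5_acc being trivially 1.0 can be verified in a toy snippet: with only three classes, "top 5" necessarily includes every class, so the true label is always covered regardless of model quality.

```python
# One softmax prediction over 3 classes (values are arbitrary for illustration).
probs = [0.2, 0.5, 0.3]
# Indices of the top-5 highest-scoring classes; with 3 classes, that is all of them.
top5 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:5]
assert set(top5) == {0, 1, 2}  # any true label index is guaranteed to be in the top 5
```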

W&B Runs

Metrics, curves and model artifacts are synced to Weights & Biases:

  • Project: footy_scan_classifier
  • Baseline run (5 epochs): ds9q9dr6
  • 50-epoch reproduction run: n5fh28pv

Reference Run: W&B aakevy06 (January 2026)

The best previously-known classifier result is W&B run aakevy06, trained in January 2026 from the football-scan project path. A side-by-side W&B-API hyperparameter diff against the April reproduction (n5fh28pv) and root-cause analysis of the 0.981 → 0.433 regression are recorded in training/notable_runs.md.

Configuration (from W&B)

| Parameter | Value |
|-----------|-------|
| Model | yolo11n-cls |
| Epochs | 50 |
| Frozen layers | 9 |
| Dataset version | 10 |
| Image size | 224 |
| Optimizer | auto (AdamW) |
| Device | MPS |
| Runtime | ~316 s (0.088 hours) |

Results

| Metric | Value |
|--------|-------|
| top1_acc | 0.981 |
| top5_acc | 1.0 |
| train/loss | 0.024 |
| val/loss | 0.023 |

50-Epoch Reproduction Run

Attempted to reproduce aakevy06 using identical hyperparameters against the current dataset version 10.

Run: 2026-04-26_09-56_model_name=yolo11n-cls_dataset_version=10_epochs=50_freeze_layers=9

Configuration

| Parameter | Value |
|-----------|-------|
| Model | yolo11n-cls |
| Parameters | 1,534,947 (1.5M) |
| GFLOPs | 3.3 |
| Dataset version | 10 |
| Epochs | 50 |
| Frozen layers | 9 |
| Image size | 224 |
| Device | MPS (Apple M4) |
| Optimizer | AdamW (lr=0.001429, momentum=0.9) |
| Training time | ~0.095 hours (~5.7 min) |

Epoch-by-Epoch Metrics (validation top1_acc, selected epochs)

| Epoch | top1_acc |
|-------|----------|
| 1 | 0.356 |
| 2 | 0.404 |
| 5 | 0.413 |
| 7 | 0.442 (best) |
| 10–49 | 0.433 (flat) |
| 50 | 0.433 |

Final Validation (best.pt)

| Metric | Value |
|--------|-------|
| top1_acc | 0.442 |
| top5_acc | 1.0 |

Inference speed: 0.1 ms preprocess, 1.8 ms inference, 0.0 ms postprocess per image

Results vs Reference

| Metric | Reference aakevy06 | This run |
|--------|--------------------|----------|
| top1_acc | 0.981 | 0.442 |
| train/loss | 0.024 | 0.030 |
| val/loss | 0.023 | 8.018 |

Why the Results Differ

The reproduction run achieved 0.442 top1_acc, far below the reference run's 0.981. The cause is a dataset class mismatch:

  • Training set (current): 3 classes — No (165), Unlabeled (5), Yes (222)
  • Val/test set (current): 2 classes — No and Yes only; Unlabeled is absent

Ultralytics warns about this at startup (found 2 classes, requires 3) but proceeds. The 3-class model is evaluated against a 2-class val set, so any image predicted as Unlabeled is counted wrong. Additionally, 0.442 is below the majority-class baseline of 0.558 (58/104 val images are Yes), suggesting the frozen backbone's learned representations do not transfer well to this task with only 5 Unlabeled training samples distorting the class boundaries.
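The majority-class baseline quoted above follows directly from the val split counts in the Dataset section:

```python
# Always predicting Yes on the current val split (counts from the Dataset section).
yes_count, no_count = 58, 46
baseline_top1 = yes_count / (yes_count + no_count)
print(round(baseline_top1, 3))  # 0.558 — above the reproduction run's 0.442 top1_acc
```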

The January 2026 reference run (aakevy06) likely used a snapshot of dataset version 10 where the val/test splits also contained Unlabeled samples (making evaluation consistent), or an earlier version where the dataset was binary. The dataset has since diverged.

To reproduce reference-level accuracy:

  1. Remove or merge Unlabeled — relabel Unlabeled images as No or drop them. This makes training and validation consistent (binary No/Yes).
  2. Re-download a corrected dataset version from Roboflow once the split is fixed.
  3. Consider using --freeze 0 to allow full fine-tuning for better adaptation to this domain.
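Folding Unlabeled into No (the first fix) amounts to moving image files between class folders in the classification dataset layout. A minimal sketch, assuming the standard one-folder-per-class layout; the helper name is illustrative, and you should run it on a copy of the dataset:

```python
import shutil
from pathlib import Path

def merge_unlabeled_into_no(train_dir: Path) -> int:
    """Move every image from train_dir/Unlabeled into train_dir/No.

    Returns the number of files moved. No-op if Unlabeled does not exist.
    """
    src, dst = train_dir / "Unlabeled", train_dir / "No"
    if not src.exists():
        return 0
    dst.mkdir(exist_ok=True)
    moved = 0
    for img in src.iterdir():
        shutil.move(str(img), dst / img.name)
        moved += 1
    src.rmdir()  # remove the now-empty class folder so training sees 2 classes
    return moved
```

After running this on the train split, train/val/test all contain only No and Yes, so the warning at startup disappears and validation accuracy is measured against the same label set the model was trained on.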