Training¶
This document describes how to run the training scripts and records baseline results.
Object Detector Training¶
This section covers how to run the object detector training script and records baseline training results.
Script: train_object_detector.py¶
Location: src/footy_track/scripts/train_object_detector.py
Downloads a labelled dataset from Roboflow, fine-tunes a YOLO model on it, and saves the best weights.
Prerequisites¶
Environment variables — .envrc
A .envrc file (managed by direnv) is provided at the repo root to set local environment variables. It is listed in .gitignore and is not committed.
# .envrc (already present in the repo root)
export DATA_ROOT="$PWD/data"
DATA_ROOT points to the data/ directory at the project root, which is where downloaded datasets and other local data files are stored. Run direnv allow once after cloning to activate it.
Set the Roboflow API key:
export ROBOFLOW_API_KEY=<your-api-key>
The key can be found in your Roboflow workspace settings. Alternatively, the SDK reads it from `~/.config/roboflow/config.json` if you have previously run `roboflow login`.
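As a quick preflight before launching a run, a short Python check (illustrative only; the training script's own validation may differ) confirms both variables are visible to the process:

```python
import os

def preflight() -> str:
    """Fail fast if the Roboflow key is missing; return the resolved
    DATA_ROOT (falling back to ./data). Illustrative helper only."""
    if not os.environ.get("ROBOFLOW_API_KEY"):
        raise RuntimeError("ROBOFLOW_API_KEY is not set; see Prerequisites")
    return os.environ.get("DATA_ROOT", "data")
```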
Usage¶
The script reads DATA_ROOT from the environment (set by .envrc) and downloads the dataset to $DATA_ROOT/detection_dataset/roboflow_dataset_<version>/.
uv run python src/footy_track/scripts/train_object_detector.py \
--model yolo11n \
--dataset-version 3 \
--freeze 9 \
--epochs 5
To run without direnv active, pass DATA_ROOT inline:
DATA_ROOT="$PWD/data" uv run python src/footy_track/scripts/train_object_detector.py \
--model yolo11n \
--dataset-version 3 \
--freeze 9 \
--epochs 1
Arguments¶
| Argument | Default | Description |
|---|---|---|
| `--model` | `yolo11n` | YOLO model variant (e.g. `yolo11n`, `yolo11m`); `.pt` is appended automatically. |
| `--dataset-version` | `3` | Roboflow dataset version to download. |
| `--freeze` | `9` | Number of backbone layers to freeze. Freezing early layers speeds up training when fine-tuning from a pretrained checkpoint. |
| `--epochs` | `50` | Number of training epochs. |
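For reference, the CLI in the table above can be reconstructed roughly as follows (a hypothetical sketch with defaults taken from the table, not the script's actual parser):

```python
from argparse import ArgumentParser

def build_parser() -> ArgumentParser:
    """Hypothetical reconstruction of train_object_detector.py's CLI,
    using the defaults from the arguments table above."""
    p = ArgumentParser()
    p.add_argument("--model", default="yolo11n",
                   help="YOLO variant; '.pt' is appended automatically")
    p.add_argument("--dataset-version", type=int, default=3)
    p.add_argument("--freeze", type=int, default=9)
    p.add_argument("--epochs", type=int, default=50)
    return p

args = build_parser().parse_args(["--model", "yolo11m", "--epochs", "5"])
weights = f"{args.model}.pt"  # → "yolo11m.pt"
```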
Outputs¶
Weights and plots are saved to:
footy_scan_detection/<run-name>/weights/best.pt
footy_scan_detection/<run-name>/weights/last.pt
<run-name> encodes the timestamp and hyperparameters, for example:
2026-04-26_08-33_model_name=yolo11n_dataset_version=3_epochs=5_freeze_layers=9
Training metrics are also logged to Weights & Biases under the footy_scan_detection project.
Dataset¶
- Source: Roboflow workspace `egroeg121`, project `footy-track-detection`
- Version 3 split: 162 train / 49 val images
- Classes (7): `coach`, `in_play_ball`, `person`, `player`, `player_sub`, `referee`, `keeper`
- Downloaded to: `$DATA_ROOT/detection_dataset/roboflow_dataset_<version>/` (defaults to `data/detection_dataset/...` if `DATA_ROOT` is unset)
Baseline Run Results¶
Run: 2026-04-26_08-33_model_name=yolo11n_dataset_version=3_epochs=5_freeze_layers=9
Configuration¶
| Parameter | Value |
|---|---|
| Model | yolo11n (YOLO11 nano) |
| Parameters | 2,591,205 (2.6M) |
| GFLOPs | 6.4 |
| Dataset version | 3 |
| Epochs | 5 |
| Frozen layers | 9 |
| Image size | 640 |
| Device | MPS (Apple M4) |
| Optimizer | AdamW (lr=0.000909, momentum=0.9) |
| Augmentation | RandAugment + mosaic |
| Training time | ~1.3 min (0.022 hours) |
Epoch-by-Epoch Metrics (validation)¶
| Epoch | box_loss | cls_loss | dfl_loss | mAP50 | mAP50-95 |
|---|---|---|---|---|---|
| 1 | 1.286 | 4.163 | 0.926 | 0.0454 | 0.0135 |
| 2 | 1.223 | 3.500 | 0.882 | 0.0412 | 0.0111 |
| 3 | 1.251 | 3.026 | 0.886 | 0.0667 | 0.0214 |
| 4 | 1.251 | 2.610 | 0.876 | 0.0767 | 0.0379 |
| 5 | 1.240 | 2.276 | 0.897 | 0.0892 | 0.0482 |
Final Validation (best.pt)¶
| Class | Images | Instances | Precision | Recall | mAP50 | mAP50-95 |
|---|---|---|---|---|---|---|
| all | 49 | 921 | 0.022 | 0.267 | 0.091 | 0.047 |
| player | 49 | 778 | 0.122 | 0.730 | 0.481 | 0.241 |
| coach | 8 | 13 | 0.003 | 0.615 | 0.026 | 0.020 |
| referee | 47 | 77 | 0.004 | 0.130 | 0.021 | 0.009 |
| player_sub | 3 | 8 | 0.003 | 0.125 | 0.019 | 0.010 |
| in_play_ball | 44 | 44 | 0.000 | 0.000 | 0.000 | 0.000 |
| person | 1 | 1 | 0.000 | 0.000 | 0.000 | 0.000 |
Inference speed: 0.1 ms preprocess, 76.3 ms inference, 65.4 ms postprocess per image
Observations¶
- Player detection is the strongest class (mAP50 = 0.481), which is expected given it dominates the training set (778 of 921 validation instances).
- Ball and person classes have zero mAP after 5 epochs — they are underrepresented or difficult at this scale. More epochs and/or a larger model will be needed for those classes.
- Classification loss is still decreasing across all 5 epochs, indicating more training would improve results. This run is a baseline only.
- 5 epochs with 9 frozen layers is a minimal sanity-check run. For a production checkpoint, use more epochs (≥50) and consider unfreezing more layers or using a larger model variant (`yolo11m`, `yolo11l`).
W&B Run¶
Metrics, curves and model artifacts are synced to Weights & Biases:
- Project: `footy_scan_detection`
- Baseline run: `ze990tv6`
Classifier Training¶
This section covers the broadcast-frame classifier: a model that determines whether a video frame shows the game in play.
Script: train_classifier.py¶
Location: src/footy_track/scripts/train_classifier.py
Downloads a labelled dataset from Roboflow, fine-tunes a YOLO classification model on it, and saves the best weights.
Prerequisites¶
Same .envrc and ROBOFLOW_API_KEY setup as the object detector (see above).
Usage¶
uv run python src/footy_track/scripts/train_classifier.py \
--model yolo11n-cls \
--dataset-version 10 \
--freeze 9 \
--epochs 5
To run without direnv active, pass DATA_ROOT inline:
DATA_ROOT="$PWD/data" uv run python src/footy_track/scripts/train_classifier.py \
--model yolo11n-cls \
--dataset-version 10 \
--freeze 9 \
--epochs 5
Arguments¶
| Argument | Default | Description |
|---|---|---|
| `--model` | `yolo11n-cls` | YOLO classification model variant; `.pt` is appended automatically. |
| `--dataset-version` | `10` | Roboflow dataset version to download. |
| `--freeze` | `9` | Number of backbone layers to freeze. |
| `--epochs` | `50` | Number of training epochs. |
Outputs¶
Weights and plots are saved to:
footy_scan_classifier/<run-name>/weights/best.pt
footy_scan_classifier/<run-name>/weights/last.pt
<run-name> encodes the timestamp and hyperparameters, for example:
2026-04-26_09-16_model_name=yolo11n-cls_dataset_version=10_epochs=5_freeze_layers=9
Dataset¶
- Source: Roboflow workspace `egroeg121`, project `footy-track-broadcast-frame`
- Version 10 split:
| Split | Images | Classes present |
|---|---|---|
| train | 392 | No (165), Unlabeled (5), Yes (222) |
| valid | 104 | No (46), Yes (58) |
| test | 100 | No (47), Yes (53) |
- Classes: `No` (frame not in play), `Yes` (frame in play), `Unlabeled` (ambiguous; train only)
- Note: `Unlabeled` is absent from the val and test splits. Accuracy metrics during training are reported on the two-class (`No`/`Yes`) val set.
- Downloaded to: `$DATA_ROOT/classifier_dataset/roboflow_dataset_<version>/`
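After download, the split table above can be sanity-checked against the folder layout. A small helper (hypothetical; assumes the standard Ultralytics classification layout of one sub-folder per class) counts images per class:

```python
from pathlib import Path

def class_counts(split_dir: Path) -> dict[str, int]:
    """Count files per class sub-folder, e.g. train/No, train/Yes,
    train/Unlabeled. Hypothetical verification helper."""
    return {d.name: sum(1 for f in d.iterdir() if f.is_file())
            for d in sorted(split_dir.iterdir()) if d.is_dir()}

# e.g. class_counts(Path("data/classifier_dataset/.../train")) should
# match the split table above: {'No': 165, 'Unlabeled': 5, 'Yes': 222}
```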
Baseline Run Results¶
Run: 2026-04-26_09-16_model_name=yolo11n-cls_dataset_version=10_epochs=5_freeze_layers=9
Configuration¶
| Parameter | Value |
|---|---|
| Model | yolo11n-cls (YOLO11 nano classifier) |
| Parameters | 1,534,947 (1.5M) |
| GFLOPs | 3.3 |
| Dataset version | 10 |
| Epochs | 5 |
| Frozen layers | 9 |
| Image size | 224 |
| Device | MPS (Apple M4) |
| Optimizer | AdamW (lr=0.001429, momentum=0.9) |
| Augmentation | RandAugment |
| Training time | ~36 s (0.010 hours) |
Epoch-by-Epoch Metrics (validation)¶
| Epoch | train_loss | top1_acc | top5_acc |
|---|---|---|---|
| 1 | 0.734 | 0.356 | 1.0 |
| 2 | 0.288 | 0.404 | 1.0 |
| 3 | 0.185 | 0.404 | 1.0 |
| 4 | 0.190 | 0.413 | 1.0 |
| 5 | 0.149 | 0.413 | 1.0 |
Final Validation (best.pt)¶
| Metric | Value |
|---|---|
| top1_acc | 0.413 |
| top5_acc | 1.0 |
Inference speed: 0.1 ms preprocess, 2.9 ms inference, 0.0 ms postprocess per image
Observations¶
- top1_acc plateaus at 41.3% after epoch 2 — well below a useful threshold for production use. With only the classifier head unfrozen (9 of 10 backbone layers frozen), the model has limited capacity to adapt to this task.
- top5_acc is trivially 1.0 — there are only 3 classes, so a top-5 prediction always includes the correct label. This metric is not meaningful here.
- Val loss diverges while train loss falls (train: 0.149 vs W&B final val: ~5.97), indicating overfitting even at 5 epochs. The training set is small (392 images) and the val set is missing the `Unlabeled` class.
- The `Unlabeled` class has only 5 training samples, so the model cannot learn it reliably. Consider merging it into `No` or removing it from the dataset for future runs.
- For a production checkpoint, unfreeze more layers (e.g. `--freeze 0`), train for more epochs (≥50), and consider a larger model (`yolo11s-cls`, `yolo11m-cls`). Data augmentation and a larger dataset would also help.
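The top-5 point above can be checked mechanically: with 3 classes, a top-5 candidate list is the entire label set, so it always contains the true label. A minimal illustration (made-up scores):

```python
# Softmax scores over the 3 classes {No, Unlabeled, Yes} (made-up values):
probs = {"No": 0.2, "Unlabeled": 0.3, "Yes": 0.5}

# A top-5 prediction takes up to 5 highest-scoring classes; with only
# 3 classes this is the whole label set, so top-5 accuracy is 1.0
# by construction.
top5 = sorted(probs, key=probs.get, reverse=True)[:5]
assert set(top5) == set(probs)  # every label is always "in the top 5"
```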
W&B Runs¶
Metrics, curves and model artifacts are synced to Weights & Biases:
- Project: `footy_scan_classifier`
- Baseline run (5 epochs): `ds9q9dr6`
- 50-epoch reproduction run: `n5fh28pv`
Reference Run: W&B aakevy06 (January 2026)¶
The best previously-known classifier result is W&B run aakevy06, trained in January 2026 from the football-scan project path. A side-by-side W&B-API hyperparameter diff against the April reproduction (n5fh28pv) and root-cause analysis of the 0.981 → 0.433 regression are recorded in training/notable_runs.md.
Configuration (from W&B)¶
| Parameter | Value |
|---|---|
| Model | yolo11n-cls |
| Epochs | 50 |
| Frozen layers | 9 |
| Dataset version | 10 |
| Image size | 224 |
| Optimizer | auto (AdamW) |
| Device | MPS |
| Runtime | ~316 s (0.088 hours) |
Results¶
| Metric | Value |
|---|---|
| top1_acc | 0.981 |
| top5_acc | 1.0 |
| train/loss | 0.024 |
| val/loss | 0.023 |
50-Epoch Reproduction Run¶
This run attempted to reproduce aakevy06 using identical hyperparameters against the current dataset version 10.
Run: 2026-04-26_09-56_model_name=yolo11n-cls_dataset_version=10_epochs=50_freeze_layers=9
Configuration¶
| Parameter | Value |
|---|---|
| Model | yolo11n-cls |
| Parameters | 1,534,947 (1.5M) |
| GFLOPs | 3.3 |
| Dataset version | 10 |
| Epochs | 50 |
| Frozen layers | 9 |
| Image size | 224 |
| Device | MPS (Apple M4) |
| Optimizer | AdamW (lr=0.001429, momentum=0.9) |
| Training time | ~0.095 hours (~5.7 min) |
Epoch-by-Epoch Metrics (validation top1_acc, selected epochs)¶
| Epoch | top1_acc |
|---|---|
| 1 | 0.356 |
| 2 | 0.404 |
| 5 | 0.413 |
| 7 | 0.442 ← best |
| 10–49 | 0.433 (flat) |
| 50 | 0.433 |
Final Validation (best.pt)¶
| Metric | Value |
|---|---|
| top1_acc | 0.442 |
| top5_acc | 1.0 |
Inference speed: 0.1 ms preprocess, 1.8 ms inference, 0.0 ms postprocess per image
Results vs Reference¶
| Metric | Reference `aakevy06` | This run |
|---|---|---|
| top1_acc | 0.981 | 0.442 |
| train/loss | 0.024 | 0.030 |
| val/loss | 0.023 | 8.018 |
Why the Results Differ¶
The reproduction run achieved 0.442 top1_acc, far below the reference run's 0.981. The cause is a dataset class mismatch:
- Training set (current): 3 classes (`No` 165, `Unlabeled` 5, `Yes` 222)
- Val/test sets (current): 2 classes (`No`, `Yes`); `Unlabeled` is absent
Ultralytics warns about this at startup (found 2 classes, requires 3) but proceeds. The 3-class model is evaluated against a 2-class val set, so any image predicted as Unlabeled is counted wrong. Additionally, 0.442 is below the majority-class baseline of 0.558 (58/104 val images are Yes), suggesting the frozen backbone's learned representations do not transfer well to this task with only 5 Unlabeled training samples distorting the class boundaries.
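The majority-class baseline quoted above is just the share of `Yes` images in the val split:

```python
# 58 of the 104 val images are labelled Yes, so a classifier that
# always predicts Yes scores:
yes_count, val_total = 58, 104
majority_baseline = yes_count / val_total
print(round(majority_baseline, 3))  # → 0.558
```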
The January 2026 reference run (aakevy06) likely used a snapshot of dataset version 10 where the val/test splits also contained Unlabeled samples (making evaluation consistent), or an earlier version where the dataset was binary. The dataset has since diverged.
Recommended Fix¶
To reproduce reference-level accuracy:
1. Remove or merge Unlabeled — relabel Unlabeled images as No or drop them. This makes training and validation consistent (binary No/Yes).
2. Re-download a corrected dataset version from Roboflow once the split is fixed.
3. Consider using --freeze 0 to allow full fine-tuning for better adaptation to this domain.
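Step 1 could be scripted locally before re-uploading to Roboflow. A sketch (hypothetical helper, not part of the repo; assumes the Ultralytics classification layout `<split>/<class>/<image>`):

```python
import shutil
from pathlib import Path

def merge_unlabeled_into_no(split_dir: Path) -> int:
    """Move every image in <split>/Unlabeled into <split>/No and remove
    the emptied folder. Returns the number of files moved.
    Hypothetical cleanup helper."""
    src, dst = split_dir / "Unlabeled", split_dir / "No"
    if not src.is_dir():
        return 0  # nothing to do (e.g. valid/test splits)
    dst.mkdir(exist_ok=True)
    moved = 0
    for f in src.iterdir():
        shutil.move(str(f), str(dst / f.name))
        moved += 1
    src.rmdir()
    return moved
```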