RaceLabs Research · Accuracy, Validation & mAP Scores for Race Photo Sorting

Q: Do you need the bib or number plate to be visible?

No. RaceLabs identifies each rider by appearance (helmet design, riding kit and machine livery) and tracks that identity across frames and through video. Photos with hidden, blurred, dirty or missing numbers still land in the right folder. A readable number simply lets the system name the folder; it is not what identity depends on.

Q: How is this different from RaceTagger and other number-recognition sorters?

Tools built around reading the bib or number plate, such as RaceTagger and similar OCR-based taggers, only work when that number is legible. In motorsport that fails constantly: packs, cornering, motion blur, mud, full-face helmets and free-practice sessions with no numbers fitted. RaceLabs uses appearance-first matching, which keeps working in those conditions. It is a structural difference in method, not a tuning advantage.

Q: What does 99.6% same-rider precision mean for my delivery?

Of every photo RaceLabs places in a rider's folder, 99.6% genuinely belong to that rider, so a client almost never finds someone else's bike in their gallery. RaceLabs deliberately tunes toward precision: when in doubt, a frame goes to the Unclear bin rather than into the wrong folder.

Q: Why is recall lower than precision on new riders?

Recall here is a clustering metric, not a count of lost photos. B-cubed recall measures whether all of a rider's photos gathered into a single folder. When it dips, the cause is almost always over-segmentation: the same rider split across two folders (for example, because a bike seen head-on and from behind looks genuinely different), not photos discarded or misfiled. With precision at 99.6% those folders are clean and usually just need a one-click merge, which a readable race number performs automatically.

Q: Can these numbers be trusted, or are they cherry-picked?

They are measured on held-out identities excluded from training, using standard retrieval metrics: mAP, B-cubed precision and recall, and Adjusted Rand Index. RaceLabs publishes every validation split, including the harder cross-championship and unseen-rider sets where scores are lower, across 176 held-out riders and 6,505 photos.

Abstract

RaceLabs identifies each rider by appearance (helmet livery, suit and machine), tracked from photo to photo and through video, rather than by reading a bib or number plate. On held-out validation across multiple championships, this reaches a mean Average Precision (mAP) above 0.99 and a same-rider precision of 99.6%: when RaceLabs places two photos in one rider's folder, they are the same rider 99.6% of the time. Because identity does not depend on a legible number, accuracy holds when the number is occluded, blurred, dirty, angled away or absent entirely. Those are the exact frames where bib- and number-recognition taggers drop or misfile a photo.

01 · Methodology

Measured on riders
it has never seen.

Scoring a system on riders it has trained on is easy and tells you little. The real test is riders it has never seen. Every number on this page is measured on held-out identities: riders that appear nowhere in training. We report the standard metrics used in peer-reviewed person re-identification research.

mAP, mean Average Precision

The reference metric in academic image retrieval. For every photo of a rider, it asks how cleanly the system ranks that rider's other photos above everyone else's. 1.000 is perfect retrieval. RaceLabs scores 0.991 on the full mixed benchmark and 0.998 on the hard cross-championship split.
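As a rough illustration of what the metric computes, here is a minimal sketch in Python. It assumes each photo has already been reduced to a feature vector and that photos are ranked by cosine similarity; the function names, the ranking choice, averaging over query photos, and the toy data are all assumptions for illustration, not RaceLabs' production code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def average_precision(query_idx, embeddings, rider_ids):
    """AP for one query photo: rank every other photo by similarity, then
    average the precision at each rank where the same rider appears."""
    others = [i for i in range(len(embeddings)) if i != query_idx]
    ranked = sorted(
        others,
        key=lambda i: cosine(embeddings[query_idx], embeddings[i]),
        reverse=True,
    )
    hits, precisions = 0, []
    for rank, i in enumerate(ranked, start=1):
        if rider_ids[i] == rider_ids[query_idx]:
            hits += 1
            precisions.append(hits / rank)
    # A rider with a single photo has nothing to retrieve, so no AP.
    return sum(precisions) / len(precisions) if precisions else None

def mean_average_precision(embeddings, rider_ids):
    """Mean of AP over every query photo that has at least one match."""
    aps = [average_precision(q, embeddings, rider_ids) for q in range(len(embeddings))]
    aps = [ap for ap in aps if ap is not None]
    return sum(aps) / len(aps)

# Toy data (made up): two riders, well separated in feature space.
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
rider_ids = ["A", "A", "B", "B"]
score = mean_average_precision(embeddings, rider_ids)
```

In this toy case every query ranks its own rider's photo first, so the score is 1.0. A single stranger ranked above a true match is enough to pull the average below 1.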

Same-rider precision

Of every photo placed into a rider's folder, the share that genuinely belongs there. This is the number that decides whether a client ever opens their gallery and finds a stranger's bike. RaceLabs holds 99.6%, and we deliberately tune toward precision.

What recall really measures

B³ recall asks whether all of one rider's photos landed in a single folder. When it dips, the cause is almost never lost photos; it is a rider split across two folders (a bike head-on and from behind genuinely look different), which is a one-click merge. Precision stays at 99.6%, so the split folders stay clean, and a readable number merges the split automatically.
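The effect of a split folder on the two B³ scores can be sketched in a few lines of Python. This is the standard per-photo B³ definition applied to made-up data; the function name and the toy folders are assumptions for illustration only.

```python
from collections import Counter

def b_cubed(predicted, truth):
    """Return (precision, recall), each averaged over photos.

    predicted[i] is the folder photo i was placed in;
    truth[i] is the rider who is really in photo i.
    """
    folder_sizes = Counter(predicted)
    rider_sizes = Counter(truth)
    # Number of photos that share both a folder and a true rider.
    overlap = Counter(zip(predicted, truth))
    n = len(truth)
    pairs = list(zip(predicted, truth))
    precision = sum(overlap[(p, t)] / folder_sizes[p] for p, t in pairs) / n
    recall = sum(overlap[(p, t)] / rider_sizes[t] for p, t in pairs) / n
    return precision, recall

# Toy example (made up): rider "A" has 4 photos split across two folders
# (say head-on vs. from behind); rider "B" has 2 photos in one folder.
truth = ["A", "A", "A", "A", "B", "B"]
predicted = ["f1", "f1", "f2", "f2", "f3", "f3"]
precision, recall = b_cubed(predicted, truth)

# Merging f2 into f1 restores recall without touching precision.
merged = ["f1", "f1", "f1", "f1", "f3", "f3"]
merged_precision, merged_recall = b_cubed(merged, truth)
```

Here every folder contains a single rider, so precision is 1.0, while the split pulls recall down to 2/3. After the merge both scores are 1.0, which is the pattern the section describes: low recall with high precision means clean folders that need joining, not missing photos.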

Held-out, across championships

Validation spans clean single events, an unseen set of brand-new riders, and a harder cross-championship split (MotoGP / WSBK-class footage). One model, multiple domains, so the score reflects a real race weekend, not a cherry-picked clip.

02 · Results

Held-out validation,
every split.

No averaging away the hard cases. Below is the deployed production model measured on each validation set, the easy ones and the brutal ones. Higher is better for every metric column.

Validation split	Held-out riders	mAP	Precision	Recall	Cluster agreement (ARI)
Single event, clean conditions (baseline split)	25	1.000	1.000	1.000	1.000
Cross-championship, hard (MotoGP / WSBK-class)	78	0.998	0.993	0.961	0.981
Unseen new riders (strangers, conservative by design)	73	0.982	0.996	0.722	0.819
Full mixed benchmark (production; all riders, all conditions)	176	0.991	0.996	0.863	0.919

mAP: mean Average Precision · Precision / Recall: B³ cluster metrics · ARI: Adjusted Rand Index, agreement with ground-truth grouping.
On the full production benchmark, 123 of 176 rider identities (about 70%) are reconstructed perfectly end-to-end. Lower recall on unseen riders reflects over-segmentation, not loss: a rider's photos occasionally span two clean folders that merge in one click. With precision at 0.996, misfiled photos are rare.
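For readers unfamiliar with the last column, the Adjusted Rand Index can be sketched as follows. This is the usual pair-counting form of ARI written in plain Python; the function name and the toy data are illustrative assumptions, and the source does not say which implementation produced the published figures.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(predicted, truth):
    """Pair-counting ARI between a predicted grouping and the ground truth.

    1.0 means the groupings agree exactly; values near 0 mean the
    agreement is no better than chance.
    """
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Pairs of photos grouped together in both, in predicted, and in truth.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(predicted, truth)).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    pairs_true = sum(comb(c, 2) for c in Counter(truth).values())
    expected = pairs_pred * pairs_true / comb(n, 2)
    max_index = (pairs_pred + pairs_true) / 2
    if max_index == expected:  # degenerate case: both groupings trivial
        return 1.0
    return (pairs_both - expected) / (max_index - expected)

# Toy data (made up): one rider split across two folders.
truth = ["A", "A", "A", "A", "B", "B"]
split = ["f1", "f1", "f2", "f2", "f3", "f3"]
ari_split = adjusted_rand_index(split, truth)
ari_exact = adjusted_rand_index(truth, truth)
```

On this toy data the split grouping scores 4/9 while an exact match scores 1.0. ARI penalises a split rider even when every folder is clean, which is consistent with ARI tracking recall more closely than precision in the table.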

03 · Why the approach wins

Number-reading breaks
where racing lives.

Most sorting tools (number-plate readers, bib-recognition taggers, face-match galleries) work by reading the race number in the frame, a technique called optical character recognition (OCR), or by matching a visible face. If that number or face isn't legible, they have nothing. In motorsport, illegible is not an edge case; it's most of the weekend. RaceLabs identifies by appearance and tracks identity across frames, so it holds where token-reading collapses. This is a structural difference, not a tuning gap.

Real race-weekend frame	Number / face OCR taggers	RaceLabs
Number plate hidden behind another bike in a pack	✕ nothing to read → dropped or misfiled	✓ identity held by helmet, suit & machine
Rider mid-corner, number angled away from the lens	✕ no readable digits	✓ recognised from livery, tracked across the sequence
Motion blur on a straight at full speed	✕ garbled OCR → wrong rider	✓ appearance is robust to blur
Free practice or testing, no race numbers fitted	✕ no token exists at all	✓ still sorts every rider into a folder
Rain & mud over the plate	✕ obscured → fails	✓ unaffected
Full-face helmet, no visible face	✕ face-match has nothing to match	✓ doesn't need a face

A readable number is welcome: RaceLabs uses it to name the folder. It just isn't what the identity depends on.
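One way to picture that naming step, and the automatic merge of split folders that a readable number allows, is the sketch below. It assumes each appearance-based folder carries at most one number read from its photos; the data layout, folder ids, file names and the `rider_` prefix are hypothetical, not RaceLabs' actual format.

```python
from collections import defaultdict

def merge_by_number(folders):
    """Name folders by race number and merge folders that share one.

    folders maps a folder id to (read_number_or_None, list_of_photos).
    Folders without a readable number keep their appearance-based id.
    """
    merged = defaultdict(list)
    for folder_id, (number, photos) in folders.items():
        # The number only supplies the name; grouping was done by appearance.
        name = f"rider_{number}" if number is not None else folder_id
        merged[name].extend(photos)
    return dict(merged)

folders = {
    "cluster_07": ("46", ["img_001.jpg", "img_002.jpg"]),  # head-on shots
    "cluster_12": ("46", ["img_003.jpg"]),                 # same rider from behind
    "cluster_15": (None, ["img_004.jpg"]),                 # no number read
}
result = merge_by_number(folders)
```

The two folders that share number 46 end up as one, and the folder with no readable number is still kept and sorted, just under its appearance-based id.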

04 · The metric that pays

Precision is what
the rider feels.

A rider knows their own session. They lived every corner. So when a number-OCR tool slips someone else's bike into their gallery, the rider spots it instantly, and that gallery becomes the reason they don't buy, and the story they tell other riders. Same-rider precision is the number standing between you and that email. At 99.6%, it almost never gets sent.

Photographers who switch tell us the same thing: with number-only tools, clients sometimes found the wrong bike in their gallery before the photographer did.

Field feedback, motorsport photographers on RaceLabs

05 · Scope & limits

What we don't
pretend.

Hard cases go to a human

When a frame is genuinely ambiguous, even to the photographer, RaceLabs doesn't guess: it sends the shot to the Unclear bin for a person to resolve in seconds. And when a rider changes bikes mid-session, their photos can split into two separate clusters; the pair finder surfaces those so you merge the pair in one step. That is how precision stays at 99.6%: the system does not invent confidence it doesn't have.
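The "don't guess" rule can be illustrated with a simple routing function. This is a sketch under stated assumptions: the similarity scores, both threshold values and the function name are hypothetical, and the source does not describe how RaceLabs actually decides that a frame is ambiguous.

```python
def route_frame(similarities, accept_threshold=0.80, min_margin=0.10):
    """Return the folder for a frame, or "Unclear" when the match is ambiguous.

    similarities maps folder name -> similarity score in [0, 1].
    Both thresholds are illustrative values, not RaceLabs settings.
    """
    if not similarities:
        return "Unclear"
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    best_folder, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Accept only a strong match that clearly beats the second-best folder.
    if best_score >= accept_threshold and best_score - runner_up >= min_margin:
        return best_folder
    return "Unclear"

clear_case = route_frame({"rider_46": 0.95, "rider_93": 0.40})
close_call = route_frame({"rider_46": 0.85, "rider_93": 0.82})
weak_match = route_frame({"rider_46": 0.60})
```

Raising either threshold sends more frames to a human and fewer into the wrong folder, which is the trade that tuning toward precision implies.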

Measured on our own domain

These results are on motorsport footage (bikes and cars on track), which is exactly what we build for. We report the splits as they are, including the harder cross-championship and unseen-rider sets, so the numbers describe a real weekend rather than a friendly one.

06 · Questions

Accuracy, answered.

What does an mAP of 0.99+ actually mean?

mAP (mean Average Precision) is the standard accuracy metric used in academic image retrieval. For each photo of a rider, it measures how cleanly the system ranks that rider's other photos above everyone else's, then averages across all riders. A score of 1.000 is perfect. On held-out riders (identities that appear nowhere in training), RaceLabs measures 0.991 on the full mixed benchmark and 0.998 on the hard cross-championship split.

Do you need the bib or number plate to be visible?

No. RaceLabs identifies each rider by appearance (helmet design, riding kit and machine livery) and tracks that identity across frames and through video. Photos with hidden, blurred, dirty or missing numbers still land in the right folder. A readable number simply lets the system name the folder; it is not what identity depends on.

How is this different from RaceTagger and other number-recognition sorters?

Tools built around reading the bib or number plate (such as RaceTagger and similar OCR-based taggers) only work when that number is legible in the frame. In motorsport that fails constantly: packs, cornering, motion blur, mud, full-face helmets, and free-practice sessions with no numbers fitted. RaceLabs uses appearance-first matching, which keeps working in exactly those conditions. It's a structural difference in method, not a tuning advantage.

What does 99.6% same-rider precision mean for my delivery?

Of every photo RaceLabs places in a rider's folder, 99.6% genuinely belong to that rider. In practice that means a client almost never opens their gallery and finds someone else's bike, the single failure that costs a sale and a reputation. We deliberately tune toward precision: when in doubt, a frame goes to the Unclear bin rather than into the wrong folder.

Why is recall lower than precision on new riders?

Because recall here is a clustering metric, not a count of lost photos. B³ recall measures whether all of a rider's photos gathered into one folder. When it dips, the cause is almost always over-segmentation: the same rider split across two folders (say, because a bike seen head-on and from behind looks genuinely different), not photos discarded or misfiled. With precision at 99.6% those folders are clean; they just sometimes need a one-click merge, which a readable race number performs automatically.

Can these numbers be trusted, or are they cherry-picked?

They are measured on held-out identities, riders excluded from training, using the standard retrieval metrics (mAP, B³ precision/recall, Adjusted Rand Index). We publish every validation split, including the harder cross-championship and unseen-rider sets where scores are lower, rather than only the easy one. The benchmark covers 176 held-out riders across 6,505 photos.

Has it been tested across different championships and machines?

Yes. Validation spans clean single-event footage, an unseen set of brand-new riders, and a harder cross-championship split drawn from MotoGP / WSBK-class racing, with one model evaluated across all of them. RaceLabs runs in production on motorcycles, cars and karts and has sorted over six million photos to date.

Test it on your
hardest event.

Bring the weekend that breaks your current tool: the pack laps, the wet session, the practice with no numbers. That's where the gap shows.

