A vision benchmark for fashion AI.

The first study benchmarks fashion-intelligence behavior across CLIP, SigLIP, and DINOv2, focusing on uncertainty, impostor detection, and collection cohesion.

Using 12,147 Rick Owens runway images spanning 23 years, the benchmark runs 3.66 million image comparisons and finds that SigLIP is the only model that becomes more cautious when the task turns ambiguous.

What the benchmark says, quickly.

The benchmark is less about raw similarity scores and more about behavior: whether a model can recognize design family, stay coherent across a collection, and admit uncertainty when it should.

Scope

Genesis 01 compares three vision backbones across three tasks: impostor detection, collection cohesion, and exact look matching. The goal is not generic image classification. It is fashion-intelligence behavior under ambiguity.

3 Models compared

3 Benchmark tasks

2002-2025 Temporal range

Key result

SigLIP is the only model that gets more cautious when the image really is an impostor.

CLIP and DINOv2 stay dangerously overconfident on the ambiguous cases. SigLIP is more expensive, but it is the only one that behaves like a model you could trust in a production-facing fashion workflow.

63.5% Collection purity

9.6x Higher processing cost

The benchmark is built from three concrete scenes.

Each scene maps to a real product problem: false similarity, weak collection understanding, or failure to retrieve the precise object the user means.

01 / Impostor detection

From pixels alone, the model has to recognize that visually adjacent looks can still come from the wrong designer or the wrong season, and that such cases should trigger uncertainty.

[Figure panels: Query, Impostor A, Impostor B]
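
One way to picture the impostor check is as a similarity cutoff below which the system flags a look as uncertain rather than returning a confident match. The sketch below is illustrative only: the function names, the toy vectors, and the 0.8 threshold are assumptions, not the benchmark's actual scoring method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def flag_impostor(query, references, min_similarity=0.8):
    """Return (best_index, best_similarity, is_uncertain).

    `references` holds embeddings of looks known to belong to the target
    designer or season. The 0.8 cutoff is a hypothetical placeholder and
    would need to be calibrated per model.
    """
    sims = [cosine(query, ref) for ref in references]
    best_index = max(range(len(sims)), key=sims.__getitem__)
    best = sims[best_index]
    return best_index, best, best < min_similarity

# Toy 2-D embeddings standing in for real model outputs.
references = [[1.0, 0.0], [0.0, 1.0]]
genuine = flag_impostor([1.0, 0.0], references)   # matches a reference exactly
impostor = flag_impostor([1.0, 1.0], references)  # sits between both references
```

In this toy setup the second query is equally close to both references and falls under the cutoff, so it is flagged as uncertain instead of being routed to either family.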

02 / Collection cohesion

One runway look should retrieve its family, not a cloud of vaguely similar silhouettes from different years. This is where model taste and collection understanding start to separate.

[Figure panels: Anchor look, True family, Contamination]
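
A cohesion score of this kind can be read as a purity measure: of the looks retrieved for an anchor, what fraction come from the same collection. The following is a minimal sketch under that assumption. The exact definition behind the reported collection purity is not given on this page, and the collection labels used here are hypothetical.

```python
def collection_purity(anchor_label, neighbor_labels):
    """Fraction of retrieved neighbors that share the anchor's collection label."""
    if not neighbor_labels:
        return 0.0
    same = sum(1 for label in neighbor_labels if label == anchor_label)
    return same / len(neighbor_labels)

def mean_purity(retrievals):
    """Average purity over a list of (anchor_label, neighbor_labels) pairs."""
    scores = [collection_purity(anchor, neighbors) for anchor, neighbors in retrievals]
    return sum(scores) / len(scores)

# Hypothetical labels: three of four neighbors belong to the anchor's collection.
single = collection_purity("FW08", ["FW08", "FW08", "SS14", "FW08"])
overall = mean_purity([
    ("FW08", ["FW08", "FW08"]),  # clean retrieval
    ("SS14", ["FW08", "SS14"]),  # one contaminating look
])
```

A neighbor from a different season lowers the score, which corresponds to the contamination case described above.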

03 / Exact match

Retrieval also has to stay precise. When a user points at a specific look, the system should find that exact target instead of drifting toward a general aesthetic neighborhood.

[Figure panels: Target, Returned match, Hard negative]
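
Exact matching is commonly scored as top-1 accuracy: the share of queries whose highest-scoring candidate is the intended target. The sketch below assumes that framing. The look identifiers and similarity scores are made up for illustration and do not come from the benchmark.

```python
def top1_exact_match(score_tables, target_ids):
    """Share of queries whose best-scoring candidate is the true target.

    `score_tables` is a list with one dict per query, mapping a candidate
    id to its similarity score. `target_ids` gives the correct id per query.
    """
    hits = 0
    for scores, target in zip(score_tables, target_ids):
        predicted = max(scores, key=scores.get)
        if predicted == target:
            hits += 1
    return hits / len(target_ids)

# Hypothetical scores: the second query drifts to a hard negative.
score_tables = [
    {"look_12": 0.91, "look_13": 0.88},
    {"look_40": 0.70, "look_41": 0.75},
]
accuracy = top1_exact_match(score_tables, ["look_12", "look_40"])
```

Here the second query returns a visually close but wrong look, so only one of the two queries counts as an exact match.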

Operational read

Fashion AI is not just a retrieval problem. It is a confidence problem.

If the model cannot tell you when it is unsure, it will confidently route shoppers toward the wrong aesthetic neighborhood. That is a product problem, not just a model-score problem.

Why it matters

Recommendation systems are becoming brand-discovery systems. If a model cannot separate designer identity from visual vibes, recommendation quality collapses into overconfident approximation.