
A vision benchmark for fashion AI.

The first study benchmarks fashion-intelligence behavior across CLIP, SigLIP, and DINOv2, focusing on impostor detection, collection cohesion, exact look matching, and how each model handles uncertainty.

Using 12,147 Rick Owens runway images across 23 years, the benchmark runs 3.66 million image comparisons. It shows that SigLIP is the only one of the three models that gets more cautious, rather than staying overconfident, when the task becomes ambiguous.

What the benchmark says, quickly.

The benchmark is less about raw similarity scores and more about behavior: whether a model can recognize design family, stay coherent across a collection, and admit uncertainty when it should.

Scope

Genesis 01 compares three vision backbones across three tasks: impostor detection, collection cohesion, and exact look matching. The goal is not generic image classification but measuring fashion-intelligence behavior under ambiguity.

Models compared: 3
Benchmark tasks: 3
Temporal range: 2002-2025

Key result

SigLIP is the only model that gets more cautious when the image really is an impostor.

CLIP and DINOv2 stay dangerously overconfident on the ambiguous cases. SigLIP is more expensive, but it is the only one that behaves like a model you could trust in a production-facing fashion workflow.

Collection purity: 63.5%
Processing cost: 9.6x higher

The benchmark is built from three concrete scenes.

Each scene maps to a real product problem: false similarity, weak collection understanding, or failure to retrieve the precise object the user means.

01 / Impostor detection

From pixels alone, the model has to recognize that a visually adjacent look can still come from the wrong designer or the wrong season, and it should signal uncertainty when that happens.
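One way to make this kind of uncertainty concrete is a margin check on similarity scores. The sketch below is illustrative only and is not the benchmark's implementation: the `judge` function, the toy embeddings, the labels, and the `min_score` and `min_margin` thresholds are all assumptions. It abstains when the best match is weak or barely ahead of the runner-up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def judge(query, gallery, min_score=0.9, min_margin=0.05):
    """Return (label, best_score, margin); label is None when abstaining.

    gallery maps a label (for example a designer or season) to a reference
    embedding. min_score and min_margin are hypothetical thresholds, not
    values taken from the benchmark.
    """
    ranked = sorted(
        ((cosine(query, emb), label) for label, emb in gallery.items()),
        reverse=True,
    )
    (best, label), (runner_up, _) = ranked[0], ranked[1]
    margin = best - runner_up
    if best < min_score or margin < min_margin:
        return None, best, margin  # too weak or too close: signal uncertainty
    return label, best, margin


# Toy 3-dimensional embeddings standing in for real backbone outputs.
gallery = {
    "target_designer": [1.0, 0.0, 0.0],
    "other_designer": [0.0, 1.0, 0.0],
}
clear_query = [0.98, 0.05, 0.0]      # sits close to one reference
ambiguous_query = [0.70, 0.68, 0.0]  # sits between both references

clear_label, _, _ = judge(clear_query, gallery)
ambiguous_label, _, _ = judge(ambiguous_query, gallery)
print(clear_label, ambiguous_label)
```

In a real pipeline the embeddings would come from the backbone under test, and the thresholds would need tuning on held-out data.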

[Images: query, impostor A, impostor B]

02 / Collection cohesion

One runway look should retrieve its family, not a cloud of vaguely similar silhouettes from different years. This is where model taste and collection understanding start to separate.
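The page does not spell out how collection purity is computed. One plausible reading, sketched here purely as an assumption, is purity at k: the share of an anchor look's k nearest neighbours that belong to the same collection. The function name, look ids, collection labels, and toy embeddings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def purity_at_k(anchor_id, embeddings, collections, k=2):
    """Fraction of the anchor's k nearest neighbours sharing its collection.

    This is an assumed definition of purity, not the benchmark's own.
    """
    anchor = embeddings[anchor_id]
    neighbours = sorted(
        (other for other in embeddings if other != anchor_id),
        key=lambda other: cosine(anchor, embeddings[other]),
        reverse=True,
    )[:k]
    same = sum(collections[n] == collections[anchor_id] for n in neighbours)
    return same / len(neighbours)


# Toy 2-dimensional embeddings; look_x is a look from another season
# that happens to sit very close to the anchor.
embeddings = {
    "look_a": [1.00, 0.10],
    "look_b": [0.90, 0.20],
    "look_c": [0.80, 0.30],
    "look_x": [0.95, 0.15],
}
collections = {
    "look_a": "season_1",
    "look_b": "season_1",
    "look_c": "season_1",
    "look_x": "season_2",
}

purity = purity_at_k("look_a", embeddings, collections, k=2)
print(purity)  # look_x contaminates the top 2, so purity is 0.5
```

Averaging this value over every anchor look would give a single cohesion score per model, which is one way a headline purity figure could be produced.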

[Images: anchor look, true family match, contamination example]

03 / Exact match

Retrieval also has to stay precise. When a user points at a specific look, the system should find that exact target instead of drifting toward a general aesthetic neighborhood.
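Exact matching can be scored by where the true target lands in the ranked results. The sketch below is an assumed formulation rather than the benchmark's code: `target_rank`, `recall_at_k`, the item ids, and the toy embeddings are all hypothetical. It shows a hard negative outranking the target for one of two queries.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def target_rank(query, haystack, target_id):
    """1-based rank of target_id when the haystack is sorted by similarity."""
    ranked = sorted(
        haystack,
        key=lambda item: cosine(query, haystack[item]),
        reverse=True,
    )
    return ranked.index(target_id) + 1


def recall_at_k(queries, haystack, k=1):
    """Share of (query, target_id) pairs whose target ranks within the top k."""
    hits = sum(target_rank(q, haystack, t) <= k for q, t in queries)
    return hits / len(queries)


# Toy 2-dimensional embeddings standing in for real backbone outputs.
haystack = {
    "target": [0.6, 0.8],
    "hard_negative": [0.7, 0.7],
    "unrelated": [1.0, 0.0],
}
queries = [
    ([0.58, 0.81], "target"),  # lands nearest the true target
    ([0.72, 0.69], "target"),  # drifts toward the hard negative
]

print(recall_at_k(queries, haystack, k=1))  # 0.5
print(recall_at_k(queries, haystack, k=2))  # 1.0
```

A gap between recall at 1 and recall at a larger k is the numeric signature of the drift described above: the right look is nearby, but something from the same aesthetic neighborhood is ranked first.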

[Images: target, returned match, hard negative]

Operational read

Fashion AI is not just a retrieval problem. It is a confidence problem.

If the model cannot tell you when it is unsure, it will confidently route shoppers toward the wrong aesthetic neighborhood. That is a product problem, not just a model-score problem.

Why it matters

Recommendation systems are becoming brand-discovery systems. If a model cannot separate designer identity from visual vibes, recommendation quality collapses into overconfident approximation.