Creative intelligence.

Where AI models compete on real creative work. Bespoke battles and deep-dive research, powered by the network for creative intelligence.

The Human Creativity Benchmark
The first eval that scores AI models the way creative experts do.
Latest
Best Performing Models

Bradley-Terry leaderboard pooled across 9 image-generation studies and 240 completed tournaments. Higher Elo means stronger aggregate head-to-head performance across the studies in the pool.

BT Elo · 9 image studies
  1. GPT Image 2 · 1128
  2. GPT Image 1.5 · 1033
  3. Seedream 5.0 Lite · 1014
  4. Gemini 3 Pro Image Preview · 1012
  5. Gemini 3.1 Flash Image Preview · 989
  6. Krea 2 Large · 975
  7. Seedream 4.5 · 955
  8. FLUX.2 · 946
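
For readers unfamiliar with the method named above, here is a minimal sketch of how a Bradley-Terry fit can be turned into Elo-style numbers. It is not the benchmark's actual pipeline: the win counts are made up, the function names are hypothetical, and the 1000 center and 400-point scale are assumptions borrowed from the usual Elo convention, since the page does not say how its ratings are anchored.

```python
import math


def fit_bradley_terry(wins, max_iter=1000, tol=1e-12):
    """Fit Bradley-Terry strengths with the classic MM update.

    wins[i][j] is the number of times model i beat model j.
    Returns strengths normalized so their geometric mean is 1.
    Assumes every model has at least one win and one loss.
    """
    n = len(wins)
    for i in range(n):
        if sum(wins[i][j] for j in range(n) if j != i) == 0:
            raise ValueError(f"model {i} has no wins; the estimate would diverge")
    p = [1.0] * n
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pairing contributes (games played) / (sum of current strengths).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        # Fix the scale: strengths are only identified up to a constant factor.
        log_mean = sum(math.log(x) for x in updated) / n
        updated = [x / math.exp(log_mean) for x in updated]
        converged = max(abs(a - b) for a, b in zip(updated, p)) < tol
        p = updated
        if converged:
            break
    return p


def to_elo(strengths, center=1000.0, scale=400.0):
    """Map strengths to an Elo-like scale (center and scale are assumed)."""
    return [center + scale * math.log10(s) for s in strengths]


def win_probability(elo_a, elo_b, scale=400.0):
    """Probability that A beats B implied by an Elo gap."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / scale))


# Hypothetical pooled head-to-head counts for three models (not real study data).
wins = [
    [0, 7, 8],  # model A beat B 7 times and C 8 times
    [3, 0, 6],  # model B beat A 3 times and C 6 times
    [2, 4, 0],  # model C beat A 2 times and B 4 times
]
strengths = fit_bradley_terry(wins)
ratings = to_elo(strengths)
```

Under these assumptions, a rating gap maps directly to an expected head-to-head win rate via `win_probability`, which is one way to read "higher Elo means stronger aggregate head-to-head performance."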
Latest research · 16 studies

Every battle, model profile, and field note, sorted by newest first, tagged by domain.

  1. 05/27/2026 · Image · Profile
    Can Adobe Firefly edit like Photoshop?
    8 targeted edits across 4 designer sessions. Firefly cleanly resolved 1, drifted on 5, and missed 2.
    1 / 8 · Edits cleanly resolved
    5 / 8 · Edits landed partial
    2 / 8 · Edits unresolved
    Adobe Firefly Edit · 8 attempts across 4 sessions (Hero + Social per session)
  2. 05/22/2026 · Cross-cutting · Profile
    The only prompt that got videos to production-ready in Adobe Firefly.
    4 designers, 3 deliverables each. The prompts that landed described physical direction, not aesthetic mood.
    1 / 4 · Videos production-ready, first pass
    2 / 4 · Social stills production-ready, first pass
    20 · Photoshop mentions across 4 sessions
    Adobe Firefly designer evaluation · 4 designers, 3 deliverables each
  3. 05/21/2026 · Web Design · Field Note
    In Claude Design, your opening prompt decides the ceiling.
    5 designers, 5 openings, 1 luxury brief. The first prompt set what each session could reach.
    Specificity score (0–1)
    P1 Confirmation · 0
    P2 Asset swap · 0.40
    P3 Framework · 0.45
    P4 Full brief · 0.55
    P5 Hybrid · 0.75
    First-prompt specificity by participant, Claude-scored · 5 designers, 1 brief
  4. 05/20/2026 · Web Design · Profile
    Claude Design gets you 40%, Figma gets the rest.
    5 sessions, 5 designers, 1 real-world client brief. Strong as a starting structure, breaks under precision edits.
    60 → 100% · Designers flagging layout & spacing, Edit 1 → Edits 4–5
    5 / 5 · Sessions where layout was the recurring failure mode
    ≤40% · Designer verdict: use it to here, then hand off
    Claude Design designer evaluation · 5 designers, 1 real-world client brief
  5. 05/18/2026 · Image · Field Note
    With ChatGPT Images 2.0, "Text is solved." Typography isn't.
    42 sessions, 7 designers. ChatGPT Images 2.0 nails the typographic system, then breaks on the individual characters.
    +3 / +3 / +1 · Macro themes (hierarchy, brand fit, fonts) net positive
    −1 / −3 · Micro themes (legibility, size & weight) net negative
    0 · Designer mentions of size & weight as a strength
    Typography sentiment · 42 sessions, 7 designers, 6 briefs
  6. 05/14/2026 · Image · Profile
    ChatGPT Images 2.0 won every head-to-head. Here's where it still breaks.
    41 sessions, 7 designers, 6 briefs. GPT Image 2 nails the concept, then breaks at production.
    5 / 41 · Sessions shipped from GPT alone
    33 → 59% · Typography "no issues" climb
    60–65% · Realism plateau, every iteration
    ChatGPT Images 2.0 production readiness
  7. 05/13/2026 · Image · Battle
    The image-model leaderboard flips by brief.
    Four frontier image models, six brand campaigns, ranked blind by working creatives. GPT Image 2 wins the aggregate. Every other model owns a category.
    1st-place rate
    1st · GPT Image 2 · 40.7%
    2nd · Seedream 5.0 Lite · 22.3%
    3rd · FLUX.2 [pro] · 22.2%
    4th · Gemini 3.1 Flash · 14.8%
    Share of 1st-place rankings across 6 brand campaigns · 4 frontier image models
  8. 05/12/2026 · Image · Battle
    Krea 2 Large is the #2 style-transfer model, closing on GPT Image 2.
    Four-model style-transfer evaluation. Krea took #2 on style fidelity, 0.14 points behind GPT Image 2.
    Style Fidelity (avg)
    1st · GPT Image 2 · 3.53
    2nd · Krea 2 Large · 3.39
    3rd · Gemini 3 Pro · 2.74
    4th · Seedream 5.0 Lite · 2.42
    Style Fidelity average rating · Krea 2 Large takes #2, 0.14 points behind GPT Image 2
  9. 05/06/2026 · Image · Battle
    Seedream 5.0 Lite swept the field on product detail shots.
    A blind head-to-head against the leading image models from Google, OpenAI, and Black Forest Labs, evaluated by professional creatives.
    Win rate
    1st · Seedream 5.0 Lite · 63.9%
    2nd · Gemini 3 Pro · 52.8%
    3rd · GPT Image 1.5 · 44.4%
    4th · FLUX.2 [max] · 38.9%
    Pairwise win rate · 4 leading image models · March 2026
  10. 05/05/2026 · Cross-cutting · Field Note
    Creatives keep telling us the same thing about AI: every output looks the same.
    12 models, 5 creative domains. One repeated complaint from working evaluators: the work all looks the same.
    [Quadrant chart. Axes: best-practice (low to high) and steerability (low to high). Quadrant labels: Unreliable, Opinionated engine, Creative partner, Full-spectrum tool.]
    Convergence (best-practice) and divergence (steerability) as orthogonal signals.
  11. 04/28/2026 · Video · Profile
    Grok Imagine is the "Polisher" model. Hand off the early rounds, bring it in for refinement.
    The biggest phase-over-phase climb of any video model in the study. 3rd at ideation, 1st at refinement.
    Win rate
    Ideation · 3rd place · 46%
    Mockup · 44%
    Refinement · 1st place · 56%
    Grok Imagine win rate · +10pp climb from ideation to refinement
  12. 04/23/2026 · Cross-cutting · Field Note
    The creative process has 3 phases. AI performs very differently in each.
    Ideation, mockup, refinement. AI fits differently at each phase, and the best creatives know where to hand off.
    Ideation · Loose grip
    Mockup · Narrowed
    Refinement · Firm grip
    How tightly creatives hold control across phases · Qualitative
  13. 04/22/2026 · Video · Profile
    Veo 3.1 is the "Creative Director" model. Use it early, but hand off before refinement.
    61% win rate at ideation. 39% at refinement. The clearest model profile in our video evaluation.
    Win rate
    Ideation · Peak · 61.1%
    Mockup · 55.6%
    Refinement · Last place · 38.9%
    Veo 3.1 win rate · −22.2pp drop from ideation to refinement
  14. 04/21/2026 · Cross-cutting · Field Note
    Solo creatives are earning more with AI and staying independent.
    Higher earning potential, more projects, no new hires. The survey from working independents.
    66% of independent creatives report higher earning potential since adopting AI
    26% no · 8% other
    Survey · Independent creatives on Contra
  15. 04/14/2026 · Web Design · Battle
    We tested 4 AI models with professional web designers. Claude won, but not the way you'd expect.
    Claude Opus 4.6, Gemini 3.1 Pro, ChatGPT 5.3 Codex, Qwen 3.5. The winner shifted at every phase.
    Leading win rate
    Ideation · Claude leads · 79.8%
    Mockup · Gemini takes over · 68.9%
    Refinement · Claude narrows gap · 60%
    Per-phase leader shifts · Preview of the Human Creativity Benchmark
  16. 04/08/2026 · Cross-cutting · Field Note
    AI isn't replacing creative professionals. It's making the best ones better.
    Survey of high-earning independent creatives. What they actually do with AI on real client work.
    <25% of AI output makes it to final deliverables
    The rest is stripped, reworked, or scrapped
    Dominant survey response · Independent creatives

Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+ creative experts

400+ skills and tools represented

$250M+ verified expert earnings
