Creative intelligence.
Where AI models compete on real creative work. Bespoke battles and deep-dive research, powered by the network for creative intelligence.
Bradley-Terry leaderboard pooled across 9 image-generation studies and 240 completed tournaments. Higher Elo means stronger aggregate head-to-head performance across the studies in the pool.
- 1. GPT Image 2 · Elo 1128
- 2. GPT Image 1.5 · Elo 1033
- 3. Seedream 5.0 Lite · Elo 1014
- 4. Gemini 3 Pro Image Preview · Elo 1012
- 5. Gemini 3.1 Flash Image Preview · Elo 989
- 6. Krea 2 Large · Elo 975
- 7. Seedream 4.5 · Elo 955
- 8. FLUX.2 · Elo 946
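To make the leaderboard's method concrete, here is a minimal sketch of fitting a Bradley-Terry model to pooled head-to-head results and mapping the fitted strengths onto an Elo-like scale. The page does not say how the pooling, fitting, or scaling is actually done, so the function names (`fit_bradley_terry`, `to_elo`), the iterative update, the 400-point logistic scale, and the 1000-point centre are all assumptions made for illustration, and the win counts in the usage note are invented toy data rather than results from the studies.

```python
import math
from collections import defaultdict


def fit_bradley_terry(wins, iters=500):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[(a, b)] is the number of times model a beat model b.
    Uses the classic iterative (Zermelo / MM) update. Hypothetical helper,
    not the leaderboard's actual implementation.
    """
    models = sorted({m for pair in wins for m in pair})
    total_wins = defaultdict(float)
    games = defaultdict(float)  # games[(a, b)]: comparisons between a and b
    for (a, b), n in wins.items():
        total_wins[a] += n
        games[(a, b)] += n
        games[(b, a)] += n

    # A model with zero wins has no finite maximum-likelihood strength.
    for m in models:
        if total_wins[m] == 0:
            raise ValueError(f"{m} has no wins; add a prior or more data")

    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            denom = sum(games[(i, j)] / (p[i] + p[j]) for j in models if j != i)
            new_p[i] = total_wins[i] / denom
        # Strengths are only defined up to scale: fix the geometric mean at 1.
        log_mean = sum(math.log(v) for v in new_p.values()) / len(new_p)
        p = {m: v / math.exp(log_mean) for m, v in new_p.items()}
    return p


def to_elo(strengths, centre=1000.0, scale=400.0):
    """Map strengths to an Elo-like scale (assumed: 400 * log10, centred at 1000)."""
    return {m: centre + scale * math.log10(v) for m, v in strengths.items()}


def win_probability(elo_a, elo_b, scale=400.0):
    """Probability that a beats b implied by an Elo gap on the assumed scale."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / scale))
```

As a usage example, `to_elo(fit_bradley_terry({("A", "B"): 7, ("B", "A"): 3, ("A", "C"): 8, ("C", "A"): 2, ("B", "C"): 6, ("C", "B"): 4}))` ranks the three toy models A, B, C in that order. Under the assumed scale, the gap between two ratings translates directly into an expected head-to-head win probability via `win_probability`, which is one way to read "higher Elo means stronger aggregate head-to-head performance".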
Every battle, model profile, and field note, sorted by newest first, tagged by domain.
05/27/2026 · Image · Profile
  Can Adobe Firefly edit like Photoshop?
  8 targeted edits across 4 designer sessions. Firefly cleanly resolved 1, drifted on 5, and missed 2.
  1 / 8 edits cleanly resolved · 5 / 8 edits landed partial · 2 / 8 edits unresolved
  Adobe Firefly Edit · 8 attempts across 4 sessions (Hero + Social per session)
05/22/2026 · Cross-cutting · Profile
  The only prompt that got videos to production-ready in Adobe Firefly.
  4 designers, 3 deliverables each. The prompts that landed described physical direction, not aesthetic mood.
  1 / 4 videos production-ready, first pass · 2 / 4 social stills production-ready, first pass · 20 Photoshop mentions across 4 sessions
  Adobe Firefly designer evaluation · 4 designers, 3 deliverables each
05/21/2026 · Web Design · Field Note
  In Claude Design, your opening prompt decides the ceiling.
  5 designers, 5 openings, 1 luxury brief. The first prompt set what each session could reach.
  Chart: specificity score (0–1)
  First-prompt specificity by participant, Claude-scored · 5 designers, 1 brief
05/20/2026 · Web Design · Profile
  Claude Design gets you 40%, Figma gets the rest.
  5 sessions, 5 designers, 1 real-world client brief. Strong as a starting structure, breaks under precision edits.
  60 → 100% designers flagging layout & spacing, Edit 1 → Edits 4–5 · 5 / 5 sessions where layout was the recurring failure mode · ≤40% designer verdict: use it to here, then hand off
  Claude Design designer evaluation · 5 designers, 1 real-world client brief
05/18/2026 · Image · Field Note
  With ChatGPT Images 2.0, "Text is solved." Typography isn't.
  42 sessions, 7 designers. ChatGPT Images 2.0 nails the typographic system, then breaks on the individual characters.
  +3 / +3 / +1 macro themes (hierarchy, brand fit, fonts) net positive · −1 / −3 micro themes (legibility, size & weight) net negative · 0 designer mentions of size & weight as a strength
  Typography sentiment · 42 sessions, 7 designers, 6 briefs
05/14/2026 · Image · Profile
  ChatGPT Images 2.0 won every head-to-head. Here's where it still breaks.
  41 sessions, 7 designers, 6 briefs. GPT Image 2 nails the concept, then breaks at production.
  5 / 41 sessions shipped from GPT alone · 33 → 59% typography "no issues" climb · 60–65% realism plateau, every iteration
  ChatGPT Images 2.0 production readiness
05/13/2026 · Image · Battle
  The image-model leaderboard flips by brief.
  Four frontier image models, six brand campaigns, ranked blind by working creatives. GPT Image 2 wins the aggregate. Every other model owns a category.
  1st-place rate: GPT Image 2 (1st) 40.7% · Seedream 5.0 Lite (2nd) 22.3% · FLUX.2 [pro] (3rd) 22.2% · Gemini 3.1 Flash (4th) 14.8%
  Share of 1st-place rankings across 6 brand campaigns · 4 frontier image models
05/12/2026 · Image · Battle
  Krea 2 Large is the #2 style-transfer model, closing on GPT Image 2.
  Four-model style-transfer evaluation. Krea took #2 on style fidelity, 0.14 points behind GPT Image 2.
  Style Fidelity (avg): GPT Image 2 (1st) 3.53 · Krea 2 Large (2nd) 3.39 · Gemini 3 Pro (3rd) 2.74 · Seedream 5.0 Lite (4th) 2.42
  Style Fidelity average rating · Krea 2 Large takes #2, 0.14 points behind GPT Image 2
05/06/2026 · Image · Battle
  Seedream 5.0 Lite swept the field on product detail shots.
  A blind head-to-head against the leading image models from Google, OpenAI, and Black Forest Labs, evaluated by professional creatives.
  Win rate: Seedream 5.0 Lite (1st) 63.9% · Gemini 3 Pro (2nd) 52.8% · GPT Image 1.5 (3rd) 44.4% · FLUX.2 [max] (4th) 38.9%
  Pairwise win rate · 4 leading image models · March 2026
05/05/2026 · Cross-cutting · Field Note
  Creatives keep telling us the same thing about AI: every output looks the same.
  12 models, 5 creative domains. One repeated complaint from working evaluators: the work all looks the same.
  Convergence (best-practice) and divergence (steerability) as orthogonal signals.
04/28/2026 · Video · Profile
  Grok Imagine is the "Polisher" model. Hand off the early rounds, bring it in for refinement.
  The biggest phase-over-phase climb of any video model in the study. 3rd at ideation, 1st at refinement.
  Win rate: Ideation (3rd place) 46% · Mockup 44% · Refinement (1st place) 56%
  Grok Imagine win rate · +10pp climb from ideation to refinement
04/23/2026 · Cross-cutting · Field Note
  The creative process has 3 phases. AI performs very differently in each.
  Ideation, mockup, refinement. AI fits differently at each phase, and the best creatives know where to hand off.
  Ideation: loose grip · Mockup: narrowed · Refinement: firm grip
  How tightly creatives hold control across phases · Qualitative
04/22/2026 · Video · Profile
  Veo 3.1 is the "Creative Director" model. Use it early, but hand off before refinement.
  61% win rate at ideation. 39% at refinement. The clearest model profile in our video evaluation.
  Win rate: Ideation (peak) 61.1% · Mockup 55.6% · Refinement (last place) 38.9%
  Veo 3.1 win rate · −22.2pp drop from ideation to refinement
04/21/2026 · Cross-cutting · Field Note
  Solo creatives are earning more with AI and staying independent.
  Higher earning potential, more projects, no new hires. The survey from working independents.
  […] of independent creatives report higher earning potential since adopting AI · 26% no · 8% other
  Survey · Independent creatives on Contra
04/14/2026 · Web Design · Battle
  We tested 4 AI models with professional web designers. Claude won, but not the way you'd expect.
  Claude Opus 4.6, Gemini 3.1 Pro, ChatGPT 5.3 Codex, Qwen 3.5. The winner shifted at every phase.
  Leading win rate: Ideation (Claude leads) 79.8% · Mockup (Gemini takes over) 68.9% · Refinement (Claude narrows gap) 60%
  Per-phase leader shifts · Preview of the Human Creativity Benchmark
04/08/2026 · Cross-cutting · Field Note
  AI isn't replacing creative professionals. It's making the best ones better.
  Survey of high-earning independent creatives. What they actually do with AI on real client work.
  […] of AI output makes it to final deliverables. The rest is stripped, reworked, or scrapped.
  Dominant survey response · Independent creatives
Connecting with the missing signal: taste
Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.
Designers
Writers
Marketers
Engineers
Social Media Experts
Video Editors & Animators
Music & Audio Engineers
1.5M+ creative experts
400+ skills and tools represented
$250M+ verified expert earnings