CREATIVE ARENA

Methodology

In the Creative Arena, AI models are evaluated on real professional use cases, driven by actual client deliverables from Contra's marketplace and voted on by our global network of 1.5M+ creative professionals.

Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+

creative experts

400+

Skills and tools represented

$250M+

verified expert earnings

Overview

Real work.
Real professionals.

The Creative Arena by Contra Labs compares AI models on tasks drawn from paid projects commissioned on Contra, so each task mirrors a real professional use case. We convert anonymized deliverables into prompts, run controlled tournaments with four models at a time, and update overall and per-category Elo ratings after every battle.

Unlike synthetic benchmarks, every prompt originates from an actual client project. And unlike crowd-sourced preference tests, every vote comes from a verified creative professional: designers, developers, and video editors who do this work for a living.

Top skills

Graphic Designer: 14.84%
Web Developer: 10.55%
Web Designer: 10.1%
UI Designer: 9.04%
Brand Designer: 8.46%
Video Editor: 8.29%

Top tools

Adobe Suite: 48.22%
Figma: 27.03%
Canva: 18.25%
WordPress: 11.75%
React: 9.3%
JavaScript: 8.22%

Categories

Professional use cases

We evaluate models across the categories that matter to working creatives. These are the actual deliverables clients commission on Contra, organized by modality.

Image

Ad Design, Brand Assets, Logo

Code

Landing Page, Desktop App, UI Component

Video

Ad Designs, Product Shots

Evaluation Depth

Three phases of creative work

Within each category, we evaluate models across the phases of the creative process, from first spark to final polish.

Phase 01

Ideation

Generating the initial creative concept from a prompt. Models produce directional output that captures tone, mood, and creative intent.

Goal: Direction, not precision

Phase 02

Mockup

Translating that concept into a structured, composed layout. Models must execute against a clear creative brief with proper hierarchy and composition.

Goal: Execution against a brief

Phase 03

Refinement

Fine-tuning a near-final output with precise edits. Models must demonstrate control, consistency, and attention to production-level detail.

Goal: Polish & production readiness

Data Sourcing & Prompt Generation

From real projects to controlled prompts

01

Collect Deliverables

We sample deliverables from real, completed paid projects commissioned on Contra's marketplace.

02

Anonymize & Sanitize

We remove personally identifiable information, trademarks, and client-specific terms that would reveal identity or confidential details.

03

Category Classification

An LLM-assisted classifier maps each deliverable to one of the Arena categories and one of the creative phases.

04

Prompt Drafting

From the anonymized deliverable, we generate a prompt that captures the intent, constraints, and style of the original request while remaining generic and safe.

05

Generation

Each active model generates an output for the prompt: an image, code, or video, depending on the category.
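The classification target in step 03 can be pictured as a small taxonomy. The following is a minimal Python sketch, not Contra's implementation: the modalities, categories, and phases are taken from this page, while `Phase`, `CATEGORIES`, and `ArenaPrompt` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    # The three creative phases named on this page.
    IDEATION = "ideation"
    MOCKUP = "mockup"
    REFINEMENT = "refinement"


# Modality -> categories, as listed under "Professional use cases".
CATEGORIES = {
    "Image": ("Ad Design", "Brand Assets", "Logo"),
    "Code": ("Landing Page", "Desktop App", "UI Component"),
    "Video": ("Ad Designs", "Product Shots"),
}


@dataclass(frozen=True)
class ArenaPrompt:
    """Hypothetical record for one drafted prompt (illustrative only)."""
    text: str
    modality: str
    category: str
    phase: Phase

    def __post_init__(self):
        # Reject any category that does not belong to the stated modality.
        if self.category not in CATEGORIES.get(self.modality, ()):
            raise ValueError(
                f"{self.category!r} is not a {self.modality!r} category"
            )


prompt = ArenaPrompt(
    text="Design a logo for a small coffee roaster.",
    modality="Image",
    category="Logo",
    phase=Phase.IDEATION,
)
```

Validating the pair at construction time means a misclassified deliverable fails early instead of entering a tournament under the wrong category.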

Tournament Format

4 models. 6 battles.
Full ranking.

Each tournament samples four models from the active pool, runs a fixed six-battle bracket, and yields a complete 1st–4th ordering per prompt.

Tournament flow (6 battles)

Initial: A vs B, C vs D
Middle: Winners, Losers, 1-win cross
Final: 2-win tie
Ranking: 1st, 2nd, 3rd, 4th

Fairness & Bias Controls

Blind, balanced, audited

Left/Right Randomization

Each battle randomizes side assignment so no model benefits from position bias.

Blind Judging

No model names, vendors, prompts, or metadata are shown to judges. Only the outputs.

Prompt Hygiene

Prompts are anonymized, policy-compliant, and category-consistent before entering the system.

Balanced Exposure

The scheduler ensures broad coverage across models and pairings over time.

Audit Sampling

A subset of matches is reviewed by humans for quality control and consistency checks.
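Three of the controls above (balanced exposure, left/right randomization, and blind judging) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Arena's actual scheduler: the least-exposed-first rule is an assumed policy, and `pick_four`, `blind_battle`, and the `exposure` counter are hypothetical names.

```python
import random
from collections import Counter


def pick_four(pool, exposure, rng):
    """Pick four distinct models, preferring those shown least often so far.

    Assumed policy: least-exposed first, with ties broken at random.
    """
    if len(pool) < 4:
        raise ValueError("need at least four active models")
    candidates = list(pool)
    rng.shuffle(candidates)                      # random tie-breaking
    candidates.sort(key=lambda m: exposure[m])   # stable sort keeps shuffled order within ties
    chosen = candidates[:4]
    for model in chosen:
        exposure[model] += 1
    return chosen


def blind_battle(model_a, model_b, outputs, rng):
    """Randomize sides; return what the judge sees and a hidden answer key."""
    if rng.random() < 0.5:
        left, right = model_a, model_b
    else:
        left, right = model_b, model_a
    judge_view = {"left": outputs[left], "right": outputs[right]}  # outputs only
    hidden_key = {"left": left, "right": right}                    # never shown to judges
    return judge_view, hidden_key


rng = random.Random(0)
pool = [f"model_{i}" for i in range(8)]
exposure = Counter()
first = pick_four(pool, exposure, rng)
second = pick_four(pool, exposure, rng)  # draws the four models not yet shown
```

Keeping the answer key separate from the judge's view is one simple way to make sure model names never reach the voting interface.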

Ratings

Elo scoring

We maintain two kinds of Elo rating per model: one overall rating and one rating for each category. All models start at 1500. After every battle, we apply a standard Elo update.

1500

Starting Elo Rating

K = 32

Update Factor Per Battle
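As a rough illustration of the update, here is a minimal Python sketch. The starting rating of 1500 and K = 32 come from this page. The logistic expected-score formula with a 400-point scale is the conventional Elo form and is assumed here, since the page only says "standard". `Ratings` and `record_battle` are hypothetical names.

```python
from collections import defaultdict

START_RATING = 1500.0  # from this page
K = 32.0               # from this page
SCALE = 400.0          # assumption: conventional Elo scale


def expected_score(r_a, r_b):
    """Probability that a player rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / SCALE))


def elo_update(r_winner, r_loser):
    """Return the new (winner, loser) ratings after one decisive battle."""
    delta = K * (1.0 - expected_score(r_winner, r_loser))
    return r_winner + delta, r_loser - delta


class Ratings:
    """Tracks an overall rating and per-category ratings for each model."""

    def __init__(self):
        self.overall = defaultdict(lambda: START_RATING)
        self.by_category = defaultdict(lambda: defaultdict(lambda: START_RATING))

    def record_battle(self, winner, loser, category):
        # Each battle updates the overall table and the category's table.
        self.overall[winner], self.overall[loser] = elo_update(
            self.overall[winner], self.overall[loser]
        )
        table = self.by_category[category]
        table[winner], table[loser] = elo_update(table[winner], table[loser])


ratings = Ratings()
ratings.record_battle("model_a", "model_b", category="Logo")
print(ratings.overall["model_a"], ratings.overall["model_b"])  # 1516.0 1484.0
```

With two evenly rated models the expected score is 0.5, so the winner gains 16 points and the loser gives up the same amount. An upset against a higher-rated model moves more points, up to a maximum of K.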

Frequently asked questions

What makes the Human Creativity Benchmark different?

Every prompt starts from an anonymized deliverable of a real, paid client project on Contra, not synthetic tasks or toy examples. And every vote comes from a verified creative professional from our network of 1.5M+ members across 150+ countries.

How are models selected for tournaments?

Four distinct models are sampled from the active pool. Side assignment (left/right) is randomized every battle to eliminate position bias.

Do ratings change after every battle?

Yes. Elo ratings update after each individual battle. We maintain both overall and per-category ratings so you can see how models perform on specific types of work.

What modalities do you support?

We currently evaluate across three modalities: Image (ad design, brand assets, logo), Code (landing page, desktop app, UI component), and Video (ad designs, product shots). Each is tested across three creative phases: ideation, mockup, and refinement.

Ready to see how models stack up?

Explore the Creative Arena leaderboard or request access to run your own evaluations.
