Benchmark · April 28, 2026 · 5 min read

Grok Imagine is the "Polisher" model. Hand off the early rounds, bring it in for refinement.

The biggest phase-over-phase climb of any video model in the study. 3rd at ideation, 1st at refinement.

  1. 3rd at ideation, 1st at refinement (56% win rate).
  2. Biggest phase-over-phase climb in the study.
  3. Refinement themes flip positive: Motion +16, Usability +23, Realism +20, Prompt Adherence +11.
  4. High variance at refinement: 42% first-place finishes, 25% fourth-place.
  5. Mirror image of Veo 3.1. Hand off early, bring it in for polish.
Contra Labs · Research

Contra Labs ran xAI's Grok Imagine through every phase of ad video production: ideation, mockup, refinement. It produced the most dramatic phase-over-phase improvement of any video model in the study.

Grok Imagine climbs from 3rd at ideation to 1st at refinement, with win rates of 46% (ideation), 44% (mockup), and 56% (refinement).

Surge at refinement

By refinement, every major theme had flipped positive: Motion Quality (+16), Usability (+23), Realism (+20), and Prompt Adherence (+11). Its 56% win rate put it ahead of every other video model we tested. Evaluators kept reaching for the same four words: organic, smooth, natural, controlled.

Every major refinement-phase theme flips positive for Grok Imagine: motion, usability, realism, prompt adherence.
The most usable and quality generation, free of visual oddities and unrealistic elements.
The camera movement showcases the product well. The movements of the clouds in the sky and the breeze are well-matched.
The neon edge illumination is beautifully lit, the camera movement pans in slowly. The close-up shot is golden. A nice ad.

Where it earns the win

Refinement is the phase where creatives stop generating and start polishing: smooth camera moves, natural product framing, controlled lighting transitions. Grok handles all three consistently. Motion quality (30 vs. 18 across competitors), usability (24 vs. 18), and lighting (17 vs. ~3) all sit clearly in the green.

Grok Imagine wins refinement head-to-head against every other video model in the evaluation.

Where the next gains pay off

Two signals from the data point to where additional work compounds.

Ideation is rougher. Net Realism (−15) and Scene Coherence (−8) take hits from technical errors: object duplication, hands appearing out of frame, physics that doesn't quite hold. Grok seems to need material to iterate on before it shines.

There are two hands appearing in the shot instead of one, and they are interacting with themselves in a very unnatural way.
The portafilter is duplicated upon lock-in. Coffee pours in the device's sink.

Refinement variance is high: 42% first-place finishes alongside 25% fourth-place finishes. When Grok hits, it leads the field. When it misses, it misses by a lot. Smoothing that variance is where the gains compound.

The practical implication

Bring Grok Imagine in once you have a draft worth iterating on. It takes you from "almost there" to "client-ready" better than anything else we tested.

Knowing which model to use at each phase is now a real creative skill.


Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+

creative experts

400+

Skills and tools represented

$250M+

verified expert earnings