Benchmark · April 28, 2026 · 5 min read

Grok Imagine is the "Polisher" model. Hand off the early rounds, bring it in for refinement.

The biggest phase-over-phase climb of any video model in the study. 3rd at ideation, 1st at refinement.

  1. 3rd at ideation. 1st at refinement (56% win rate).
  2. Biggest phase-over-phase climb in the study.
  3. Refinement themes flip positive: Motion +16, Usability +23, Realism +20, Prompt Adherence +11.
  4. High variance: 42% firsts, 25% fourths at refinement.
  5. Mirror image of Veo 3.1. Hand off early, bring it in for polish.
Contra Labs
Research

Contra Labs ran xAI's Grok Imagine through every phase of ad video production: ideation, mockup, refinement. It produced the most dramatic phase-over-phase improvement of any video model in the study.

Grok Imagine climbs from 3rd at ideation to 1st at refinement: 46% → 44% → 56% win rate.
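The size of each step in that climb can be checked with a few lines of arithmetic. The sketch below uses only the three win rates quoted above and assumes they map, in order, to ideation, mockup, and refinement; the names `win_rates` and `phase_deltas` are illustrative and not part of the study.

```python
# Win rates (percent) as quoted in the article. The phase-to-number mapping
# is an assumption based on the order in which the phases are listed.
win_rates = {"ideation": 46, "mockup": 44, "refinement": 56}


def phase_deltas(rates):
    """Return (from_phase, to_phase, change in percentage points) per consecutive pair."""
    phases = list(rates)  # dicts preserve insertion order in Python 3.7+
    return [(a, b, rates[b] - rates[a]) for a, b in zip(phases, phases[1:])]


deltas = phase_deltas(win_rates)
for start, end, change in deltas:
    print(f"{start} -> {end}: {change:+d} pts")
# ideation -> mockup: -2 pts
# mockup -> refinement: +12 pts
```

Read this way, almost all of the movement happens in the final step: a small dip into mockup, then a 12-point gain at refinement.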

Surge at refinement

By refinement, every major theme flipped positive. Motion Quality (+16), Usability (+23), Realism (+20), Prompt Adherence (+11). Win rate: 56%, ahead of every other video model we tested. Evaluators kept reaching for the same four words: organic, smooth, natural, controlled.

Every major refinement-phase theme flips positive for Grok Imagine: motion, usability, realism, prompt adherence.
"The most usable and quality generation, free of visual oddities and unrealistic elements."
"The camera movement showcases the product well. The movements of the clouds in the sky and the breeze are well-matched."
"The neon edge illumination is beautifully lit, the camera movement pans in slowly. The close-up shot is golden. A nice ad."

Where it earns the win

Refinement is the phase where creatives stop generating and start polishing. Smooth camera moves, natural product framing, controlled lighting transitions. Grok handles all three with consistency. Motion quality (30 vs 18 across competitors), usability (24 vs 18), and lighting (17 vs ~3) all sit clearly in the green.

Grok Imagine wins refinement head-to-head against every other video model in the evaluation.

Where the next gains pay off

Two signals from the data point to where additional work compounds.

Ideation is rougher. Net Realism (−15) and Scene Coherence (−8) take hits from technical errors: object duplication, hands appearing out of frame, physics that doesn't quite hold. Grok seems to need material to iterate on before it shines.

"There are two hands appearing in the shot instead of one, and they are interacting with themselves in a very unnatural way."
"The portafilter is duplicated upon lock-in. Coffee pours in the device's sink."

Refinement variance is high. 42% first-place finishes alongside 25% fourth-place. When Grok hits, it leads the field. When it misses, it misses by a lot. Smoothing that variance is where the gains compound.
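One way to read those two numbers: if each comparison ranks four models with no ties (an assumption; the article does not spell out the ranking scheme), the remaining 33% of finishes fall in 2nd or 3rd place. The article does not report that split, so the sketch below only bounds the average finishing position rather than estimating it. The helper `mean_rank` is hypothetical.

```python
# Shares of refinement-phase finishes quoted in the article (percent).
first_pct, fourth_pct = 42, 25
# Assumption: four ranked places and no ties, so the rest is 2nd or 3rd.
middle_pct = 100 - first_pct - fourth_pct


def mean_rank(second_pct, third_pct):
    """Average finishing position for a given 2nd/3rd split (percent inputs)."""
    total = first_pct * 1 + second_pct * 2 + third_pct * 3 + fourth_pct * 4
    return total / 100


best_case = mean_rank(middle_pct, 0)   # every remaining finish is 2nd
worst_case = mean_rank(0, middle_pct)  # every remaining finish is 3rd
print(f"mean rank between {best_case:.2f} and {worst_case:.2f}")
# mean rank between 2.08 and 2.41
```

Under these assumptions the average position lands near the middle of the field even though first place is the single most common outcome, which is what a wide spread of results looks like.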

The practical implication

Bring Grok Imagine in once you have a draft worth iterating on. It takes you from "almost there" to "client-ready" better than anything else we tested.

Knowing which model to use at each phase is now a real creative skill.


Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+

creative experts

400+

Skills and tools represented

$250M+

verified expert earnings
