Benchmark · April 22, 2026 · 6 min read

Veo 3.1 is the "Creative Director" model. Use it early, but hand off before refinement.

61% win rate at ideation. 39% at refinement. The clearest model profile in our video evaluation.

  1. Dominates ideation: 61% win rate. Highest scores of any model in any phase.
  2. Collapses in refinement: 39% win rate, last place, with scores below the 3.0 neutral midpoint.
  3. Only model in the study that degraded monotonically across every metric.
  4. It keeps generating instead of iterating. Use it early, hand off before refinement.
Contra Labs
Research

Contra Labs ran Google Veo 3.1 through every phase of ad video production: ideation, mockup, refinement. The data produced the clearest model profile we've recorded.

Peak at ideation

At ideation, Google DeepMind's Veo 3.1 posted scalar scores of 3.81 / 3.81 / 3.78 for quality, creativity, and brand fit, respectively. Those are the highest scores of any model across any phase in our entire experiment. Win rate: 61%. Given an open brief and maximum creative latitude, it generated with range, tonal confidence, and visual variety. At a stage where breadth matters more than precision, it delivered exactly what that stage demands.

One flag even at its peak: Motion & Blur criticism showed up in roughly 27% of evaluations. Even the best phase for Veo carried a persistent motion quality weakness. Worth noting for video-heavy briefs.

Veo 3.1 monotonic performance decline across creative phases.

Cliff at refinement

By refinement, those scalars had dropped below 3.0, the neutral midpoint where outputs tip from useful to net-negative. Win rate: 39%. Last place. Evaluators were calling the outputs counterproductive.

Veo 3.1 theme-level strengths vs weaknesses across phases.

What makes this unusual is the shape of the decline. Veo 3.1 is the only model in the experiment that degraded monotonically across every single metric as phases progressed. Grok Imagine, Kling AI, and ByteDance's Seedance 2.0 each improved in at least some dimensions by refinement. Veo got worse across all of them.
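
For readers who want to run the same check on their own evaluation scores, here is a minimal sketch of a monotonic-degradation test. It assumes "monotonic" means a strict decrease from each phase to the next, which the study does not spell out. The function name, the data layout, and every number in the example are invented for illustration and are not the study's data.

```python
from typing import Dict, List

# Phase order assumed from the article: ideation, then mockup, then refinement.
PHASES: List[str] = ["ideation", "mockup", "refinement"]


def degrades_monotonically(scores: Dict[str, List[float]]) -> bool:
    """Return True if every metric strictly decreases from phase to phase.

    `scores` maps a metric name to its per-phase values, ordered as PHASES.
    An empty mapping returns False, since there is nothing to judge.
    """
    return bool(scores) and all(
        all(earlier > later for earlier, later in zip(values, values[1:]))
        for values in scores.values()
    )


# Invented numbers for illustration only; NOT taken from the study.
model_a = {
    "quality": [3.8, 3.4, 2.9],
    "creativity": [3.8, 3.3, 2.8],
    "brand_fit": [3.7, 3.5, 2.9],
}
model_b = {
    "quality": [3.5, 3.2, 3.3],  # recovers slightly at refinement
    "creativity": [3.6, 3.4, 3.1],
    "brand_fit": [3.4, 3.3, 3.2],
}

print(degrades_monotonically(model_a))
print(degrades_monotonically(model_b))
```

A single metric that recovers in a single phase is enough to fail the check, which is why a model passing it across all three metrics stands out.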

Why: it keeps generating

The qualitative data points to the mechanism. Veo keeps generating. When asked to iterate on an existing cut, it introduces new visual artifacts and unwanted transitions rather than preserving what already worked. It reads an iteration prompt as a generation prompt. The more you ask it to tighten, the more it creates.

The properties that make a model strong at open-ended generation are novelty, creative latitude, and output diversity. Those same properties work against constrained iteration, where the job is to hold a direction stable and sand down details toward production readiness.

The practical implication

Veo 3.1 performs like a creative director. You want it at the start of a project, generating raw concepts and setting visual direction. By mockup and refinement, hand off to a model built for iteration.
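
One way a production pipeline might encode this handoff is a simple phase-to-model lookup. The sketch below is hypothetical: the `Phase` enum, the `pick_model` helper, and the label strings are illustrative names, not identifiers from any real API. `"iteration-model"` is a placeholder, because the study does not name a specific replacement.

```python
from enum import Enum


class Phase(Enum):
    IDEATION = "ideation"
    MOCKUP = "mockup"
    REFINEMENT = "refinement"


# Labels are illustrative strings, not real API model identifiers.
# "iteration-model" is a placeholder for whichever model a team
# finds holds a direction stable in later phases.
ROUTING = {
    Phase.IDEATION: "veo-3.1",
    Phase.MOCKUP: "iteration-model",
    Phase.REFINEMENT: "iteration-model",
}


def pick_model(phase: Phase) -> str:
    """Return the model label assigned to a production phase."""
    return ROUTING[phase]


for phase in Phase:
    print(phase.value, "->", pick_model(phase))
```

Keeping the mapping in one table makes the handoff point explicit and easy to change as new evaluation results come in.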

Knowing which model to use at each phase is now a real creative skill.


Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+

creative experts

400+

Skills and tools represented

$250M+

verified expert earnings
