Benchmark · April 22, 2026 · 6 min read

Veo 3.1 is the "Creative Director" model. Use it early, but hand off before refinement.

61% win rate at ideation. 39% at refinement. The clearest model profile in our video evaluation.

  1. Dominates ideation: 61% win rate. Highest scores of any model in any phase.
  2. Collapses in refinement: 39% win rate, last place. Below the 3.0 neutral midpoint.
  3. Only model in the study that degraded monotonically across every metric.
  4. It keeps generating instead of iterating. Use it early, hand off before refinement.
Contra Labs
Research

Contra Labs ran Google Veo 3.1 through every phase of ad video production: ideation, mockup, refinement. The data produced the clearest model profile we've recorded.

Peak at ideation

At ideation, Google DeepMind's Veo 3.1 posted scalars of 3.81 / 3.81 / 3.78 across quality, creativity, and brand fit. Those are the highest scores of any model across any phase in our entire experiment. Win rate: 61%. Given an open brief and maximum creative latitude, it generated with range, tonal confidence, and visual variety. At a stage where breadth matters more than precision, it delivered exactly what that stage demands.

One flag even at its peak: Motion & Blur criticism showed up in roughly 27% of evaluations. Even the best phase for Veo carried a persistent motion quality weakness. Worth noting for video-heavy briefs.

Veo 3.1 monotonic performance decline across creative phases.

Cliff at refinement

By refinement, those scalars had dropped below 3.0, the neutral midpoint where outputs tip from useful to net-negative. Win rate: 39%. Last place. Evaluators were calling the outputs counterproductive.

Veo 3.1 theme-level strengths vs weaknesses across phases.

What makes this unusual is the shape of the decline. Veo 3.1 is the only model in the experiment that degraded monotonically across every single metric as phases progressed. Grok Imagine, Kling AI, and ByteDance's Seedance 2.0 each improved in at least some dimensions by refinement. Veo got worse across all of them.
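
One way to make "degraded monotonically across every metric" concrete is the check sketched below. It is a minimal illustration, not the study's actual analysis code: the function name, the data layout, and the choice of strict (rather than non-strict) decrease are assumptions, and the scores are placeholder values that are not taken from the evaluation.

```python
from typing import Dict, List

# Phase order used in the study: ideation, then mockup, then refinement.
PHASES = ["ideation", "mockup", "refinement"]


def degrades_monotonically(scores: Dict[str, List[float]]) -> bool:
    """Return True if every metric strictly decreases from each phase to the next.

    `scores` maps a metric name to its per-phase scalars, in PHASES order.
    Strict decrease is an assumption; the article does not define the test.
    """
    return all(
        later < earlier
        for series in scores.values()
        for earlier, later in zip(series, series[1:])
    )


# Placeholder numbers for two hypothetical models (NOT the study's data).
model_a = {
    "quality": [4.0, 3.5, 2.5],
    "creativity": [4.0, 3.0, 2.0],
    "brand_fit": [3.5, 3.0, 2.5],
}
model_b = {
    "quality": [3.0, 2.5, 3.5],     # recovers by refinement
    "creativity": [3.5, 3.0, 2.5],  # declines throughout
    "brand_fit": [3.0, 3.0, 3.5],   # flat, then improves
}

model_a_declines = degrades_monotonically(model_a)
model_b_declines = degrades_monotonically(model_b)
```

Under this definition a model only qualifies if no metric holds steady or recovers at any step, which is why a single improving dimension at refinement is enough to rule a model out.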

Why: it keeps generating

The qualitative data points to the mechanism. Veo keeps generating. When asked to iterate on an existing cut, it introduces new visual artifacts and unwanted transitions rather than preserving what already worked. It reads an iteration prompt as a generation prompt. The more you ask it to tighten, the more it creates.

The properties that make a model strong at open-ended generation (novelty, creative latitude, output diversity) are the same properties that work against constrained iteration, where the job is to hold a direction stable and sand down details toward production readiness.

The practical implication

Veo 3.1 performs like a creative director. You want it at the start of a project, generating raw concepts and setting visual direction. By mockup and refinement, hand off to a model built for iteration.
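
The handoff can be sketched as a per-phase routing table built from win rates. This is a hedged illustration only: the model names `generator` and `iterator` are hypothetical stand-ins, the win rates are placeholder values rather than study results, and picking the highest win rate per phase is one assumed selection rule among many.

```python
from typing import Dict


def best_model_per_phase(win_rates: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each phase, pick the model with the highest win rate.

    `win_rates` maps phase -> {model name -> win rate in [0, 1]}.
    """
    return {
        phase: max(by_model, key=by_model.get)
        for phase, by_model in win_rates.items()
    }


# Placeholder win rates for two hypothetical models (NOT the study's data).
win_rates = {
    "ideation":   {"generator": 0.60, "iterator": 0.45},
    "mockup":     {"generator": 0.50, "iterator": 0.55},
    "refinement": {"generator": 0.40, "iterator": 0.58},
}

routing = best_model_per_phase(win_rates)
```

With these placeholder inputs, the generation-oriented model is routed to ideation and the iteration-oriented model takes over from mockup onward, mirroring the "use it early, hand off" pattern described above.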

Knowing which model to use at each phase is now a real creative skill.


Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+

creative experts

400+

Skills and tools represented

$250M+

verified expert earnings
