Benchmark · May 20, 2026 · 5 min read

Claude Design gets you 40%, Figma gets the rest.

5 sessions, 5 designers, 1 real-world client brief. Strong as a starting structure, breaks under precision edits.

  1. Strong as a fast first draft. Designers praised the system-level coherence, hover states, scroll animations, and logo and wordmark fixes.
  2. Layout and spacing was the only theme that worsened with iteration. 60% of designers flagged it at Edit 1, every active designer by Edits 4–5.
  3. Output regressed under precise edits. The tool ignored instructions or undid correct work, sending designers reaching for Figma for predictability.
  4. Color came back monotone, leaning on the wrong neutrals. Tone read too playful for the client brief.
  5. Verdict from the designer who almost walked: use it to roughly 40% completion, then take over manually.
Contra Labs
Research

"RIP page builders." The hype around Claude Design runs hot: "It's over for designers, Figma is dead." Anthropic itself is more measured. It positions Claude Design as a tool for designer exploration and non-designer visual production, with "polished" as the quality bar. We wanted to test how far past that stated bar the tool could be pushed. So we handed five designers a real-world client brief, a landing page designed from scratch, and judged the output against a deliberately higher standard: client-ready.

Designers called the execution incredibly strong on most tasks: it read visual cues, design systems, and written briefs together to produce a coherent final system, and it was notably good at fixing logos and wordmarks. The small details drew praise too: interactions, hover effects, a scroll variant that makes the nav more visible as you move down the page. One designer said the final landing page generated from the system exceeded every one of their expectations. But that was the ceiling, not the floor: sessions kept hitting the same wall, which was layout and style.

Claude Design session, captured end-to-end.

Three things came up repeatedly. Claude's color choices felt off. The output came back monotone, leaning on the wrong neutrals. Two designers flagged layout and structure problems, naming clutter and broken padding. And the tone read too playful for the client brief.

Layout and spacing example: clutter and broken padding in a Claude Design output.

The moment they almost left

Layout and spacing is the only theme that rises with iteration, flagged by 60% of designers at the first edit and by every active designer by edits four and five. Later rounds reflect fewer designers still iterating.

Frustration with the wait was real, and several designers hit a point where they nearly walked. One said that for the first half hour it felt faster to open Framer than to keep explaining the design to an AI.

This layout-and-spacing failure, showing up in a live session, is structural. The breaking point was the regression in output quality. Designers fed the tool specific edits, only to watch it ignore them or undo work that was already correct. The tool created the structure, and then failed to keep it stable.

Generic output: Claude Design defaulting to a familiar visual treatment regardless of brief.

The designers reaching for Figma and Framer said the opposite of what you'd guess: they reached for them because they're predictable. A deliberate change produces exactly the intended result, where another prompt round was a coin flip that could make things worse.

The real cost was the wait with no guarantee the output moved forward.

The designers who didn't abandon it did one thing differently: they stopped treating it as a Figma substitute. Held to "generate a fast starting structure," it delivered. Pushed toward "make precise edits to a near-final layout," it broke.

What they'd change

When asked what they'd fix, designers described rebuilds. The change requests were structural: redo the layout, rethink the concept, reposition text throughout, fix a type hierarchy where font weight made copy unreadable, replace placeholder lettering with real elements. That's the work of getting a draft to something a client would accept.

What makes that damning rather than expected is the second half: every designer believed the tool should have handled exactly these things. Layout, color, text positioning, structure: not aspirational asks, but the baseline they walked in assuming an AI design tool would cover. The gap is between what the tool did and what designers considered table stakes.

Designer expectations against delivered output, theme by theme.

The tool had capability to spare. Designers praised the hover states, scroll animation on the nav, and small text behaviors: work nobody had asked for and nobody needed at this stage. That's the failure restated. A tool that polishes interaction details while the layout underneath won't hold is a misdirected one. It spent its effort on surface polish while the structural fundamentals it was actually asked for went missing.

The verdict

The honest takeaway came from the designer who almost walked: use it to get to 40% quickly, then take over manually. Don't try to push it to 80% through prompting. That's where it breaks down, and where the time loss starts.

Claude Design is a fast way to a starting structure: a strong first draft of the system. It is not yet a precision instrument. Generate the skeleton in Claude. Set the details in Figma.

Methodology

5 sessions across 5 designers and one real-world client brief: a landing page supplied with a written brief, inspirational imagery, reference websites, and candidate color palettes. Each designer ran one session and iterated through multiple rounds of edits, prompting Claude Design as they would in a normal workflow. After each session, designers reported on tone and style fit, layout and structure, what they would change, and whether the output met their expectations. Open-ended responses were coded, using OpenAI's gpt-4o model, into the five themes shown in the edit-round chart and used throughout this article:

  1. Layout & spacing (structure, padding, alignment, composition)
  2. Visual & brand (color palette, neutrals, logo and wordmark, overall brand feel)
  3. Typography (typeface choice, hierarchy, weight, readability)
  4. Components & interactivity (hover states, scroll and nav behaviors, component-level treatments)
  5. Copy & content (placeholder versus real text, content fit)

Each comment was also coded for sentiment (positive, negative, neutral). We then read the themes against the edit rounds to see which issues cleared with iteration and which held.
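To make the edit-round reading concrete, here is a minimal Python sketch of how a per-round share could be tallied once comments are coded. It is an illustration, not the pipeline used in the study: the `CodedComment` record, the `flag_share_by_round` helper, the designer IDs, and every record in the sample are hypothetical. It also assumes that "flagged" means a designer left at least one negative comment on the theme in that round, and that the denominator is the set of designers still iterating in that round.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five themes named in the methodology.
THEMES = (
    "Layout & spacing",
    "Visual & brand",
    "Typography",
    "Components & interactivity",
    "Copy & content",
)


@dataclass(frozen=True)
class CodedComment:
    """One coded comment (hypothetical schema)."""
    designer: str    # hypothetical anonymized ID
    edit_round: int  # 1-based edit round
    theme: str       # one of THEMES
    sentiment: str   # "positive", "negative", or "neutral"


def flag_share_by_round(comments, active_by_round, theme):
    """Share of active designers with a negative comment on `theme`, per round."""
    flagged = defaultdict(set)
    for c in comments:
        if c.theme == theme and c.sentiment == "negative":
            flagged[c.edit_round].add(c.designer)
    # Divide by designers still iterating in the round, not by the full panel.
    return {
        rnd: len(flagged[rnd] & active) / len(active)
        for rnd, active in sorted(active_by_round.items())
        if active
    }


# Hypothetical records for illustration only; not the study's data.
comments = [
    CodedComment("d1", 1, "Layout & spacing", "negative"),
    CodedComment("d2", 1, "Layout & spacing", "negative"),
    CodedComment("d3", 1, "Layout & spacing", "negative"),
    CodedComment("d4", 1, "Components & interactivity", "positive"),
    CodedComment("d1", 4, "Layout & spacing", "negative"),
    CodedComment("d2", 4, "Layout & spacing", "negative"),
]
active_by_round = {
    1: {"d1", "d2", "d3", "d4", "d5"},
    4: {"d1", "d2"},
}

shares = flag_share_by_round(comments, active_by_round, "Layout & spacing")
print(shares)  # {1: 0.6, 4: 1.0}
```

The choice of denominator matters here: with only two designers still active in a late round, two flags already amount to "every active designer," which is why later-round shares should be read alongside how many designers were still iterating.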

Limitations

Feedback was self-reported from the same designers who produced the work, carrying the usual self-assessment bias. The sample is small, and we treat these findings as directional, not calibrated. Only one brief was tested, so brief-specific effects can't be cleanly separated from the tool's general behavior. The patterns, however, hold across designers and across edit rounds in the same direction: a strong system-level start, weak precision execution under iteration.


Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+

creative experts

400+

Skills and tools represented

$250M+

verified expert earnings
