Research·May 18, 2026·5 min read

With ChatGPT Images 2.0, "Text is solved." Typography isn't.

42 sessions, 7 designers. ChatGPT Images 2.0 nails the typographic system, then breaks on the individual characters.

01ChatGPT Images 2.0 really did get better at rendering text. Typography across 42 sessions and 7 designers is a separate story.
02Every macro theme came in net positive: hierarchy and composition (+3), brand fit and character (+3), font choice and style (+1).
03Every micro theme came in net negative: legibility and readability (−1), size and weight (−3). Size and weight had zero positives across the whole study.
04Editing clears the one problem you can ask for. Lacking contrast falls from 3 mentions at Edit 1 to zero by Edit 3. Broken text formatting holds at 3 through every edit.
05When typography isn't the point of the brief the model is fine. When typography is the deliverable, it breaks.

Contra Labs

Research

"Flawless typography in multiple languages." "Text rendering on another level." "Near perfect." The loudest reaction to ChatGPT Images 2.0 is about one thing. Text is solved. It's the feature everyone screenshotted.

On the narrow claim, the hype mostly holds up. Can the model render legible, correctly spelled letterforms, even on a curved surface, even in Japanese? Mostly, yes. That problem really did get better.

But rendering text and doing typography are two different questions. Rendering just means drawing the letters correctly. Typography is the harder problem: picking the right fonts at the right size and weight, held in a hierarchy that survives a real brief across multiple edits. The 42 sessions with 7 designers surfaced a clear pattern. The model is better than anything before it, but still not good at the typography that ships, and the gap is specific enough to predict.

Text rendering really did get better with ChatGPT Images 2.0. Typography is a separate problem.

The system is right, the characters are wrong

ChatGPT Images 2.0 gets the typographic system right and the characters wrong. Every macro theme landed positive, but every micro theme landed negative.

In the deep dive, the designers consistently iterated on the typography. They fixed some of it in GPT and handed the rest to Figma.

Typography themes by sentiment. Every macro theme came in net positive. Every micro theme came in net negative.

GPT is good at the typographic system and bad at typographic execution. Sort the feedback by theme and the divide is clean. Hierarchy and composition (net +3), brand fit and character (net +3), font choice and style (net +1). All decisions that shape the whole image come in positive. Legibility and readability (net −1), size and weight (net −3), the properties the model has to hold character by character, come in negative.

Every macro theme landed positive. Every micro theme landed negative. On this sample, the split is clean with no single theme crossing the line.

GPT picks on-brand fonts, lays out a hierarchy that reads, and holds composition together across a dense layout. Then it stumbles on the individual characters. A designer on the Sonic Collective brief said:

The text below the SONIC is hard to read, especially "EXPERIMENTAL MUSIC • PERFORMANCE • INSTALLATION • COMMUNITY" at the bottom. Same with the schedule typography — the text looks very generic and not legible.

Iteration fixes one thing and leaves the rest

Iteration fixes contrast, the one thing you can ask for. Every structural problem stays where it started.

Typography issues by edit phase. Lacking contrast clears by Edit 3. Broken text formatting holds at 3 the whole way.

The edit data shows the same split holding over time. Editing reliably clears one problem. Lacking contrast, the issue where text sits too faint against its background to read cleanly, falls from 3 mentions at Edit 1 to zero by Edit 3. On the Ridge campaign you can watch it happen. The headline starts washed out, the "U" half-erased and hard to read. A designer asks for it to be sharper and match the visual style, and the next round delivers it. This is the one problem designers can fix by asking.

Ridge headline across edits. The washed-out "U" sharpens by Edit 3 when the designer asks for it.

The structural problems don't move. Broken text formatting holds at 3 mentions through every edit, three rounds for zero improvement. On the same campaign, a designer flagged that the "D" in RIDGE renders as an "O". That is what broken text formatting means, and three rounds of editing never fixed it.

Ridge: the "D" in RIDGE renders as an "O" through three rounds of editing.

Incorrect copy stays at 1 the whole way. And on Sonic Collective, the problem grew. Illegible font showed up at Edit 2, flagged by two designers independently, on a problem the first generation didn't have.

Sonic Collective: illegible font emerges at Edit 2, on a problem the first generation didn't have.

Legibility itself splits along the same line. Contrast is the macro side of it, with one decision per image, and the model fixes it on request, which is why it clears by Edit 3. Size and weight are the micro side, where individual characters matter, and they don't move. Size and weight has zero positives across the whole study. Three designers flagged it as a weakness, none as a strength. The model can make the text stand out but it can't make it hold up.

The climb from 33% to 59% isn't GPT getting better at typography. It's designers fixing the one thing they can, while the structural issues stay put.

Decoration versus deliverable

When type is decoration the model handles it. When type is the deliverable, it breaks.

The pattern also tracks the briefs themselves. When typography isn't the point, the model is fine. When typography is the point, the model breaks.

Typography performance by brief. ASAGI and Studio Citrine net positive. Ridge and Sonic Collective net zero or negative.

ASAGI and Studio Citrine are net positive, with zero negative mentions between them. Neither brief was built around typography. ASAGI tested subtle stylistic constraints. Studio Citrine tested realism and texture. Two briefs, Plunge Artesian Pops and Kinetix, drew no typography feedback at all. When the brief doesn't lean on type, designers don't raise it.

That flips when typography is the job.

Sonic Collective was designed to stress typography, iconography, and dense layouts. It nets zero with three positives canceled by three negatives. Ridge, built around dense layouts and small text, came out negative with two positives against three.

The verdict

The decoration-versus-deliverable line is the whole handoff. GPT delivers the typographic system with good visual hierarchy, brand fit, and font choice. That's the macro layer, and it lands early. Figma is for the execution: font size, weight, legibility, down to the individual characters. That's the micro layer, where iteration in GPT runs out of room. Use GPT to design the type system. Take it to Figma to set the type.

Methodology

42 sessions across 7 designers and 6 fictional brand briefs, each run through ChatGPT Images 2.0 with three rounds of edits and recorded end-to-end with Rollout. For the typography analysis, we filtered to mentions tagged as typography-related in the original 16-category prompt tagging, plus typography ratings from the per-edit quality reviews. Each typography mention was coded for sentiment (positive, negative, neutral) and theme.

Themes were derived from individual designer comments: hierarchy and composition, brand fit and character, font choice and style, legibility and readability, size and weight, headline and title treatment, hierarchy of info text, effects and decoration, and a residual "other / general typography" bucket. We then grouped themes into two layers. Macro covers themes that describe the typographic system: hierarchy, composition, brand fit, font choice. Micro covers themes that describe execution: legibility, size, weight. The grouping was applied after coding, not before, and the split fell out cleanly. Every macro theme came in net positive, every micro theme came in net negative.

For the edit-by-edit issue breakdown, we used the typography error categories from the deep dive's per-edit reviews: Incorrect or Incoherent Copy, Broken Text Formatting, Incorrect or Illegible Font, Lacking Contrast or Not Readable, and a residual "Other" category for one-off issues that didn't recur across iterations.

Limitations

Typography sentiment was coded from the same designer transcripts and self-ratings used in the deep dive, with the same self-assessment bias. The sample is small, and we treat these as directional, not calibrated. The patterns, however, hold across edits, themes, and briefs in the same direction.

How we ran this study → Methodology

Continue reading3 studies

All research

Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+

creative experts

400+

Skills and tools represented

$250M+

verified expert earnings

Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+

creative experts

400+

Skills and tools represented

$250M+

verified expert earnings

Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+

creative experts

400+

Skills and tools represented

$250M+

verified expert earnings

Creative Human Data

Human Creativity Benchmark

Creative Arena

Jobs

Creative Human Data

Creative Arena

Jobs

Creative Human Data

Human Creativity Benchmark

Creative Arena

Jobs

The world's leading independent

human data & creative evaluation lab

Get In Touch

Creative Human Data

Human Creativity Benchmark

Creative Arena

Jobs

The world's leading independent

human data & creative evaluation lab

Get In Touch

Creative Human Data

Human Creativity Benchmark

Creative Arena

Jobs

The world's leading independent

human data & creative evaluation lab

Get In Touch

Creative Human Data

Creative Arena

Jobs

The world's leading independent

human data & creative evaluation lab

Get In Touch

Creative Human Data

Creative Arena

Jobs