Benchmark·May 12, 2026·4 min read

Krea 2 Large is the #2 style-transfer model, closing on GPT Image 2.

Four-model style-transfer evaluation. Krea took #2 on style fidelity, 0.14 points behind GPT Image 2.

  1. #2 on Style Fidelity at 3.39. GPT Image 2 leads at 3.53.
  2. 23 percent of Krea's outputs rated 5 on Style Fidelity. GPT Image 2: 22 percent.
  3. 26 percent landed in the bottom two ratings. GPT Image 2: 20 percent. The floor is the problem.
  4. 4th on H2H win rate at 39.5 percent. Tournaments reward ceiling.
  5. Only non-GPT model above the 3.0 useful threshold on Style Fidelity.
Contra Labs
Research

Krea 2 Large is the #2 style-transfer model. The gap to GPT Image 2 is smaller than the rest of the data would suggest.

Contra Labs ran Krea 2 Large through a four-model style-transfer evaluation against OpenAI's GPT Image 2, Google DeepMind's Gemini 3 Pro Image Preview, and BytePlus's Seedream 5.0 Lite. Evaluators ranked outputs head-to-head and rated each one on four rubric dimensions: prompt adherence, visual quality, usability, and style fidelity. The data gives Krea a clear position, but only on one dimension.

Style-transfer evaluation announcement. Krea 2 Large vs GPT Image 2, Gemini 3 Pro Image Preview, and Seedream 5.0 Lite.

Style Fidelity

On Style Fidelity, Krea 2 Large averages 3.39 out of 5. GPT Image 2 averages 3.53. The gap is 0.14 points. Gemini 3 Pro sits at 2.74 and Seedream 5.0 Lite at 2.42. On the dimension Krea was built for, the closest comparison is GPT Image 2: the 0.65-point gap down to Gemini, the next model in the field, is more than 4x the 0.14-point gap up to first place.
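The arithmetic behind that comparison can be checked directly. The sketch below is illustrative only: the scores are the averages reported above, while the variable names are our own and not part of the evaluation tooling.

```python
# Mean Style Fidelity ratings (1-5 scale) as reported in this evaluation.
style_fidelity = {
    "GPT Image 2": 3.53,
    "Krea 2 Large": 3.39,
    "Gemini 3 Pro Image Preview": 2.74,
    "Seedream 5.0 Lite": 2.42,
}

krea = style_fidelity["Krea 2 Large"]

# Distance up to the leader and down to the next-ranked model.
gap_to_first = style_fidelity["GPT Image 2"] - krea
gap_to_next = krea - style_fidelity["Gemini 3 Pro Image Preview"]

# How many times larger the gap below Krea is than the gap above it.
ratio = gap_to_next / gap_to_first

print(f"gap to first place: {gap_to_first:.2f}")
print(f"gap to next model:  {gap_to_next:.2f}")
print(f"ratio:              {ratio:.1f}x")
```

Swapping Seedream 5.0 Lite in for Gemini widens the ratio further, since Seedream sits lower on the scale.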

Scalar average rating across all four dimensions. Krea scores 3.39 on Style Fidelity, second to GPT Image 2 at 3.53.

The top end of the distribution is even tighter. 23% of Krea's outputs were rated "5" on Style Fidelity. GPT Image 2 earned a "5" on 22%. When Krea hits, it hits at the same rate as the frontier model. The difference shows up further down the scale. 26% of Krea's outputs landed in the bottom two ratings on Style Fidelity, versus 20% for GPT Image 2 and 46% for Gemini 3 Pro. Krea matches GPT at the peak. It does not match GPT at the floor.
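For readers who want to run the same cut on their own ratings, here is a minimal sketch of how a top-rating share and a bottom-two share can be computed from a list of 1-to-5 scores. The function name `box_shares` and the sample ratings are hypothetical; the sample is not data from this evaluation.

```python
from collections import Counter


def box_shares(ratings):
    """Return (share rated 5, share rated 1 or 2) for a list of 1-5 ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    total = len(ratings)
    top_box = counts[5] / total
    bottom_two = (counts[1] + counts[2]) / total
    return top_box, bottom_two


# Hypothetical ratings, for illustration only (not data from the evaluation).
sample = [5, 5, 4, 4, 3, 3, 3, 2, 2, 1]
top, bottom = box_shares(sample)
print(f"rated 5: {top:.0%}, bottom two ratings: {bottom:.0%}")
```

Reporting both shares next to the mean is what separates a ceiling story from a floor story: two models can tie on the first number and still differ on the second.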

Style Fidelity rating distribution. Krea ties GPT at the top of the scale (23% vs 22%) but trails on the floor (26% in the bottom two ratings vs 20%).

Style Fidelity is also the hardest dimension in the evaluation. Every model other than GPT averages below 3.4. The neutral midpoint, where outputs tip from useful to net-negative, is 3.0. Gemini and Seedream both sit below it on this dimension. Krea is the only non-GPT model in the study that holds above the line on style transfer. For a model built around style, that is the result that matters.
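The threshold check is simple enough to sketch. The scores below are the Style Fidelity averages reported in this study; the constant and variable names are our own, chosen for illustration.

```python
# Neutral midpoint of the 1-5 scale, as described above.
USEFUL_MIDPOINT = 3.0

# Mean Style Fidelity ratings as reported in this evaluation.
style_fidelity = {
    "GPT Image 2": 3.53,
    "Krea 2 Large": 3.39,
    "Gemini 3 Pro Image Preview": 2.74,
    "Seedream 5.0 Lite": 2.42,
}

# Models whose average clears the midpoint.
above_line = [m for m, score in style_fidelity.items() if score > USEFUL_MIDPOINT]

# The same list with the frontier model excluded.
non_gpt_above_line = [m for m in above_line if m != "GPT Image 2"]

print("above the line:", above_line)
print("non-GPT above the line:", non_gpt_above_line)
```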

Same prompt. Same reference. Four models. Krea (winner) holds the painterly atmosphere of the reference; the others drift toward photorealism.

The rest of the evaluation

The rest of the evaluation tells a more measured story. Krea finishes fourth on overall pairwise win rate at 39.5%. Fourth on Prompt Adherence at 3.31. Fourth on Visual Quality at 3.35. Across head-to-head matchups, Krea loses 58% of the time to Seedream and 55% to Gemini, even though both score below Krea on absolute Style Fidelity. Krea places fourth on 42% of prompts and rarely produces the single best image in a four-way comparison. Tournaments reward ceiling over floor.

Head-to-head win rate heatmap. Krea loses 58% of matchups to Seedream and 55% to Gemini despite outscoring both on absolute Style Fidelity.

The qualitative data points to the mechanism. Evaluators consistently describe Krea's outputs as faithful to the reference image: matching color palette, brushwork, and tonal qualities, while staying coherent across the full composition. The same evaluators flag Krea for being less ambitious on prompt-driven generation, where the brief asks for something the reference image does not literally show. Krea treats the reference as a constraint. The other models treat it as a starting point.

Krea 2 Large strengths and weaknesses, drawn from evaluator rationales. Faithful to the reference; less ambitious on prompt-driven generation.

The properties that make a model strong at style fidelity (restraint, faithfulness, conservative interpretation of the reference) are not the same properties that win pairwise tournaments. Compelling and faithful are different decisions, and a four-way comparison rewards compelling. Krea optimizes for faithful.

Practical takeaway

The practical takeaway is narrow on purpose. Use Krea 2 Large when the brief is a style brief. When a reference image carries the visual direction and the job is to apply that style to a new subject without drift, Krea is the closest non-frontier option and competitive with GPT Image 2 at the top of the rating scale. For prompt-led work, where the text carries more of the creative load, the data points elsewhere.


Connecting with the missing signal: taste

Contra connects top creative minds with AI teams training models to understand taste. This is expert input, not crowd labor. It's the creative layer powering the next generation of AI.

Designers

Writers

Marketers

Engineers

Social Media Experts

Video Editors & Animators

Music & Audio Engineers

1.5M+

creative experts

400+

Skills and tools represented

$250M+

verified expert earnings
