Krea 2 Large ranks #2 on style transfer in a four-model evaluation. Its gap to GPT Image 2 is smaller than the rest of the data would suggest.
Contra Labs ran Krea 2 Large through a four-model style-transfer evaluation against OpenAI's GPT Image 2, Google DeepMind's Gemini 3 Pro Image Preview, and BytePlus's Seedream 5.0 Lite. Evaluators ranked outputs head-to-head and rated each one on four rubric dimensions: prompt adherence, visual quality, usability, and style fidelity. The data gives Krea a clear position, but on only one dimension.

Style Fidelity
On Style Fidelity, Krea 2 Large averages 3.39 out of 5. GPT Image 2 averages 3.53, a gap of 0.14 points. Gemini 3 Pro sits at 2.74 and Seedream 5.0 Lite at 2.42. On the dimension Krea was built for, its closest comparison is GPT Image 2. Its 0.65-point lead over the next model, Gemini 3 Pro, is more than 4x its 0.14-point deficit to first place.

The top end of the distribution is even tighter. 23% of Krea's outputs were rated "5" on Style Fidelity. GPT Image 2 earned a "5" on 22%. When Krea hits, it hits at the same rate as the frontier model. The difference shows up further down the scale. 26% of Krea's outputs landed in the bottom two ratings on Style Fidelity, versus 20% for GPT Image 2 and 46% for Gemini 3 Pro. Krea matches GPT at the peak. It does not match GPT at the floor.

Style Fidelity is also the hardest dimension in the evaluation. Every model other than GPT averages below 3.4. The neutral midpoint, where outputs tip from useful to net-negative, is 3.0. Gemini and Seedream both sit below it on this dimension. Krea is the only non-GPT model in the study that holds above the line on style transfer. For a model built around style, that is the result that matters.
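You can recheck the Style Fidelity arithmetic in this section directly from the reported averages. The short sketch below uses only the four means quoted above and the 3.0 neutral midpoint. The variable names are illustrative. The script computes Krea's gap to first place, its lead over the next model, the ratio between the two, and which models clear the midpoint.

```python
# Style Fidelity averages on the 1-5 scale, as reported in the evaluation.
STYLE_FIDELITY = {
    "GPT Image 2": 3.53,
    "Krea 2 Large": 3.39,
    "Gemini 3 Pro Image Preview": 2.74,
    "Seedream 5.0 Lite": 2.42,
}
# Point at which outputs tip from useful to net-negative.
NEUTRAL_MIDPOINT = 3.0

krea = STYLE_FIDELITY["Krea 2 Large"]

# Distance from Krea up to first place (GPT Image 2).
gap_to_first = STYLE_FIDELITY["GPT Image 2"] - krea
# Distance from Krea down to the next-best model (Gemini 3 Pro).
gap_to_next = krea - STYLE_FIDELITY["Gemini 3 Pro Image Preview"]
# How many times larger the lead below is than the deficit above.
ratio = gap_to_next / gap_to_first

# Models whose average sits above the neutral midpoint.
above_midpoint = [name for name, score in STYLE_FIDELITY.items() if score > NEUTRAL_MIDPOINT]

print(f"gap to first: {gap_to_first:.2f}")
print(f"gap to next:  {gap_to_next:.2f}")
print(f"ratio:        {ratio:.1f}x")
print("above midpoint:", above_midpoint)
```

The script prints a 0.14 gap, a 0.65 lead, and a ratio of about 4.6x. Only GPT Image 2 and Krea 2 Large appear in the above-midpoint list.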

The rest of the evaluation
The rest of the evaluation tells a more measured story. Krea finishes fourth on overall pairwise win rate at 39.5%. Fourth on Prompt Adherence at 3.31. Fourth on Visual Quality at 3.35. Across head-to-head matchups, Krea loses 58% of the time to Seedream and 55% to Gemini, even though both score well below Krea on Style Fidelity. Krea places fourth on 42% of prompts and rarely produces the single best image in a four-way comparison. Tournaments reward ceiling over floor.
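The write-up does not say exactly how the pairwise win rate was derived from the four-way rankings. One common approach, which is an assumption here, treats each four-way ranking as six pairwise matches. Each model then plays three matches per prompt. The sketch below applies that approach to hypothetical rankings invented for illustration; they are not the evaluation's data. It shows how often finishing last drags a model's win rate down.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical four-way rankings, best first, one list per prompt.
# Illustrative only: these are NOT results from the evaluation.
rankings = [
    ["GPT", "Seedream", "Gemini", "Krea"],
    ["GPT", "Krea", "Gemini", "Seedream"],
    ["Gemini", "Seedream", "GPT", "Krea"],
    ["Seedream", "GPT", "Krea", "Gemini"],
]


def pairwise_win_rates(rankings):
    """Win rate per model, counting every pair within a ranking as one match."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for ranking in rankings:
        # combinations() keeps input order, so `better` always outranks `worse`.
        for better, worse in combinations(ranking, 2):
            wins[better] += 1
            games[better] += 1
            games[worse] += 1
    return {model: wins[model] / games[model] for model in games}


def last_place_share(rankings, model):
    """Fraction of prompts on which `model` was ranked last."""
    return sum(ranking[-1] == model for ranking in rankings) / len(rankings)


rates = pairwise_win_rates(rankings)
for model, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{model:10s} {rate:.1%}")
print(f"Krea last on {last_place_share(rankings, 'Krea'):.0%} of prompts")
```

On this toy data, Krea's win rate is 25% and it finishes last on half the prompts. Under this counting scheme, every last-place finish costs a model three losses at once. A high last-place share therefore pulls a pairwise win rate down, however close the individual images were.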

The qualitative data points to the mechanism. Evaluators consistently describe Krea's outputs as faithful to the reference image: matching color palette, brushwork, and tonal qualities, while staying coherent across the full composition. The same evaluators flag Krea for being less ambitious on prompt-driven generation, where the brief asks for something the reference image does not literally show. Krea treats the reference as a constraint. The other models treat it as a starting point.

The properties that make a model strong at style fidelity (restraint, faithfulness, conservative interpretation of the reference) are not the same properties that win pairwise tournaments. Compelling and faithful are different goals, and a four-way comparison rewards compelling. Krea optimizes for faithful.

Practical takeaway
The practical takeaway is narrow on purpose. Use Krea 2 Large when the brief is a style brief. When a reference image carries the visual direction and the job is to apply that style to a new subject without drift, Krea is the non-frontier option closest to GPT Image 2 on Style Fidelity, and it matches GPT Image 2 at the top of the rating scale. For prompt-led work, where the text carries more of the creative load, the data points elsewhere.

