
15× vs. ~1.37×: Recalculating GPT-5.3-Codex-Spark on SWE-Bench Pro

> The narrative from AI companies hasn’t really changed, but the reaction has. The same claims get repeated so often that they start to feel like baseline reality, and people begin to assume the models are far more capable than they actually are.

This has always been the case for people who buy into the hype without actually using the products, but I'm pretty sure people who do use them are fairly disillusioned by all the claims. The only somewhat reliable method is to test the tools against your own use case.

That said: I always expected the tradeoff of Spark to be accuracy vs. speed. That it’s still significantly faster at the same accuracy is wild. I never expected that.

solarkraft · a day ago

I believe a lot of the speed-up is due to a new chip they use [1]. Since the speedup comes from faster hardware rather than from reducing the number of operations, that's likely why the accuracy has changed so little.

1. https://www.cerebras.ai/blog/openai-codexspark

ijidak · 4 minutes ago

>The fair comparison is where the models are basically equivalent in intelligence

I don't agree with this premise. I think it is fair to say that Haiku is a faster model than Opus.

charcircuit · 5 minutes ago

Method: I used OpenAI’s published SWE-Bench Pro chart points and matched GPT-5.3-Codex-Spark to the baseline model at comparable accuracy levels by reasoning effort. At similar accuracy, the effective speedup is closer to ~1.37× rather than 15×.
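The matching step can be sketched in a few lines: interpolate the baseline model's time-per-task at Spark's accuracy, then divide by Spark's time. The (accuracy, time) pairs below are placeholders, not OpenAI's published chart values; substitute the real SWE-Bench Pro points to reproduce the ~1.37× figure.

```python
# Hypothetical sketch of the accuracy-matched speedup calculation.
# All numbers below are placeholders, NOT OpenAI's published data.

def interpolate(points, x):
    """Linearly interpolate y at x from sorted (x, y) pairs."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x outside sampled range")

# Baseline model: accuracy (%) vs. wall-clock time per task (s),
# one point per reasoning-effort setting (placeholder values).
baseline = [(40.0, 60.0), (45.0, 110.0), (50.0, 220.0)]

# Spark at one effort setting, on the same axes (placeholder values).
spark_accuracy, spark_time = 45.0, 80.0

# Effective speedup = baseline time at Spark's accuracy / Spark's time.
speedup = interpolate(baseline, spark_accuracy) / spark_time
print(f"{speedup:.2f}x")  # → 1.38x with these placeholder numbers
```

The key point is that the division happens at matched accuracy, not at each model's headline setting, which is what shrinks 15× down to a much smaller effective ratio.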

nvanlandschoot · a day ago

Efficiency per token has tanked, but it's still faster. Given this is the first generation on Cerebras hardware, this is the worst it's ever going to be.

When it reaches the main 5.3 Codex efficiency at this token rate, articles like this will seem silly in retrospect.

pennaMan · 19 minutes ago
