GPT-5.2 and GPT-5.2-Codex are now 40% faster

tmaly · 2026-02-04T18:56:55 1770231415

Over the weekend I was running the same prompt across GPT-5.2, Gemini 3, and Grok. Both Gemini 3 and Grok on thinking mode finished within 2 minutes. GPT-5.2 was just spinning its wheels for like 6 minutes.

prodigycorp · 2026-02-04T01:04:39 1770167079

This is great.

In the past month, OpenAI has released for codex users:

- subagents support

- a better multi agent interface (codex app)

- 40% faster inference

No joke, with the first two my productivity is already up like 3x. I am so stoked to try this out.

jswny · 2026-02-04T04:10:10 1770178210

How do you get sub agents to work?

prodigycorp · 2026-02-04T12:40:39 1770208839

Add this to config.toml:

  [features]
  collab = true

wahnfrieden · 2026-02-04T01:27:50 1770168470

this is for api only

walletdrainer · 2026-02-04T12:09:18 1770206958

Is it even possible to actually use codex any other way? Every time I’ve tried logging in instead of using the API, I’ve hit the usage limits within a couple of hours.

prodigycorp · 2026-02-04T02:08:57 1770170937

Shoot me

wahnfrieden · 2026-02-04T03:48:26 1770176906

looks like i'm wrong

wahnfrieden · 2026-02-04T06:23:42 1770186222

wrong again: https://x.com/embirico/status/2018928763040665702

prodigycorp · 2026-02-04T12:41:07 1770208867

Thanks for updating this!

brianwawok · 2026-02-04T01:09:07 1770167347

Try Claude and you can get x^2 performance. OpenAI is sweating

viraptor · 2026-02-04T01:48:42 1770169722

May be a bit different depending on what kind of work you're doing, but for me 5.2-codex finally reached higher level than opus.

klipklop · 2026-02-04T01:25:05 1770168305

5.2-codex is pretty solid and you get dramatically higher usage rates with cheap plans. I would assume API use is much cheaper as well.

jerkstate · 2026-02-04T03:03:44 1770174224

people are sleeping on openai right now but codex 5.2 xhigh is at least as good as opus and you get a TON more usage out of the OpenAI $20/mo plan than Claude's $20/mo plan. I'm always hitting the 5 hour quota with Opus but never have with Codex. Codex tool itself is not quite as good but close.

indemnity · 2026-02-04T06:47:08 1770187628

Is there a plan like the $100 Claude Max? $200 for ChatGPT Pro is a little bit too much for me.

Whereas Claude Max 5x is enough that I don’t really run out with my usage patterns.

jerkstate · 2026-02-04T15:03:33 1770217413

If $20/mo Claude is not enough for you but 5x Claude at $100/mo is, the $20 chatgpt plus subscription might give you enough codex for your usage

p2hari · 2026-02-04T10:17:55 1770200275

I do not think so. I have been using both for a long time and with Claude I keep hitting the limits quickly and also most of the time arguing. The latest GPT is just getting things done and does it fast. I also agree with most of them that the limits are more generous. (context, do lot of web, backend development and mobile dev)

akmarinov · 2026-02-04T06:28:17 1770186497

If i could use GPT-5.2 with Claude Code - yeah. Otherwise slOpus requires too much steering to get things done. GPT-5.2 just works

ramon156 · 2026-02-04T09:42:46 1770198166

4.1 or 4.5? I did not need to steer Opus 4.5 at many points. A good description was more than enough

thadk · 2026-02-04T02:48:33 1770173313

It was probably from the other day when roon realized that normal people have it slower than staff.

Then from that they realized they could just run API calls more like staff, fast, not at capacity.

Then they leave the billion other people's calls at remaining capacity.

https://thezvi.substack.com/i/185423735/choose-your-fighter

> Ohqay: Do you get faster speeds on your work account?

> roon: yea it’s super fast bc im sure we’re not running internal deployment at full load

OutOfHere · 2026-02-04T01:05:15 1770167115

OpenAI in my estimation has the habit of dropping a model's quality after its introduction. I definitely recall the web ChatGPT 5.2 being a lot better when it was introduced. A week or two later, its quality suddenly dropped. The initial high looked to be to throw off journalists and benchmarks. As such, nothing that OpenAI says in terms of model speed can be trusted. All they have to do is lower the reasoning effort on average, and boom, it becomes 40% faster. I hope I am wrong, because if I am right, it's a con game.

Starting off the ChatGPT Plus web users with the Pro model, then later swapping it for the Standard model -- would meet the claims of model behavior consistency, while still qualifying as shenanigans.

tedsanders · 2026-02-04T01:35:59 1770168959

It's good to be skeptical, but I'm happy to share that we don't pull shenanigans like this. We actually take quite a bit of care to report evals fairly, keep API model behavior constant, and track down reports of degraded performance in case we've accidentally introduced bugs. If we were degrading model behavior, it would be pretty easy to catch us with evals against our API.

In this particular case, I'm happy to report that the speedup is time per token, so it's not a gimmick from outputting fewer tokens at lower reasoning effort. Model weights and quality remain the same.

deaux · 2026-02-04T02:44:15 1770173055

It looks like you do pull shenanigans like these [0]. The person you're replying to even mentioned "ChatGPT 5.2", but you're specifically talking only about the API, while making it sound like it applies across the board. Also appreciate the attempt to further hide this degradation of the product they paid for from users by blocking the prompt used to figure this out.

Happy to retract if you can state [0] is false.

[0] https://x.com/btibor91/status/2018754586123890717

tedsanders · 2026-02-04T15:38:54 1770219534

Yes, independent of the API speedup, we also recently reduced the thinking effort in ChatGPT. Our intent here was purely user experience, not cost savings. People have complained about the slow speeds of the Thinking models for a long time (myself included), so we recently retuned it to be faster, at the expense of less thoroughness.

I won't BS you that costs are never part of our decision making. If costs didn't matter, we'd have unlimited rate limits and 10M token context windows and subscription pricing of $0. But as someone in the room where these decisions are made, I can honestly report that our goal is almost always trying to figure out how to make people happier, not trick them. We're trying to fairly earn subscriptions, not scam anyone. In the cases where we have accidentally misled people (e.g., saying voice mode was weeks away), it was optimistic planning, not nefarious intent.

API model behavior is guaranteed to nearly stay the same (modulo standard non-determinism, bugs, etc.). ChatGPT is harder to promise, not because we pull more shenanigans there, but just because we might tweak system prompts, add/remove tools, run A/B tests, etc. that vary performance a bit. But we definitely don't do things like quantize during busy parts of the day or nerf models after publishing evals - that would feel pretty shady.

empath75 · 2026-02-04T16:47:05 1770223625

Chatgpt 5.2 in the past couple of weeks has gotten noticeably worse for me to the point that I stopped using it and just ask claude code questions instead.

virgildotcodes · 2026-02-04T03:55:31 1770177331

Would love a direct response to this.

zamadatix · 2026-02-04T01:54:51 1770170091

Hey Ted, can you confirm whether this 40% improvement is specific to API customers or if that's just a wording thing because this is the OpenAI Developers account posting?

tedsanders · 2026-02-04T15:41:01 1770219661

It's specific to the API.

8note · 2026-02-04T03:01:25 1770174085

so what actually happens if it isnt shenanigans?

its worth you guys doing on your end, some analysis of why customers are getting worse results a week or two later, and putting out some guidelines about what context is poisonous and the like

OutOfHere · 2026-02-04T03:00:41 1770174041

Starting off the ChatGPT Plus web users with the Pro model, then later swapping it for the Standard model -- would meet the claims of model behavior consistency, while still qualifying as shenanigans.

jiggawatts · 2026-02-04T03:48:38 1770176918

I've seen Sam Altman make similar claims in interviews, and I now interpret every statement from an Open AI employee (and especially Sam) as if an Aes Sedai had said it.

I.e.: "keep API model behavior constant" says nothing about the consumer ChatGPT web app, mobile apps, third-party integrations, etc.

Similarly, it might mean very specifically that a "certain model timestamp" remains constant but the generic "-latest" or whatever model name auto-updates "for your convenience" to the new faster performance achieved through quantisation or reduced thinking time.

You might be telling the full, unvarnished truth, but after many similar claims from OpenAI that turned out to be only technically true, I remain sceptical.

tedsanders · 2026-02-04T15:45:37 1770219937

That's a fair suspicion - I'll freely acknowledge that I am biased towards saying things that are simple and known, and I steer away from topics that feel too proprietary, messy, etc.

ChatGPT model behavior can definitely change over time. We share release notes here (https://help.openai.com/en/articles/6825453-chatgpt-release-...), and we also make changes or run A/B tests that aren't reported there. Plus, ChatGPT has memory, so as you use it, its behavior can technically change even with no changes on our end.

That said, I do my best to be honest and communicate the way that I would want someone to communicate with me.

wahnfrieden · 2026-02-04T01:45:10 1770169510

You're confirming you don't alter "juice" levels..?

tedsanders · 2026-02-04T15:46:49 1770220009

No, we did adjust the thinking levels in ChatGPT recently, but it was motivated by trying to improve the product based on what users told us, not cost savings. I wrote a bit more here: https://news.ycombinator.com/item?id=46887150

scrollop · 2026-02-04T14:20:41 1770214841

OpenAI isn't the only one:

Anthropic:

https://marginlab.ai/trackers/claude-code/

jxmesth · 2026-02-04T05:40:41 1770183641

Someone should create a daily benchmark site for Codex like they did for Claude

OutOfHere · 2026-02-04T14:28:51 1770215331

I see https://marginlab.ai/trackers/codex/

bethekidyouwant · 2026-02-04T01:15:40 1770167740

I mean you can just run the benchmark again

OutOfHere · 2026-02-04T13:49:46 1770212986

How are you going to benchmark the web ChatGPT Plus, which is where a reduction was suspected?

simianwords · 2026-02-04T00:33:23 1770165203

It’s interesting that they kept the price the same while doing inference on Cerebras is much more expensive.

diwank · 2026-02-04T00:56:07 1770166567

I dont think this is Cerebras. Running on cerebras would change model behavior a bit and it could potentially get a ~10x speedup and it'd be more expensive. So most likely this is them writing new more optimized kernels for Blackwell series maybe?

simianwords · 2026-02-04T01:03:00 1770166980

Fair point but it remains to answer - why isn’t this speed up available in ChatGPT and only in the api?

chillee · 2026-02-04T00:54:31 1770166471

this is almost certainly not being done on cerebras

thebigspacefuck · 2026-02-04T05:19:25 1770182365

Speed was always my main complaint, these models always felt really good but too slow. I’ll have to give them a try again.

riku_iki · 2026-02-04T02:45:56 1770173156

tons of posts on reddit that they also significantly dropped quality

samusiam · 2026-02-04T13:00:39 1770210039

There are always people on reddit saying such-and-such model quality significantly dropped. Every single day there's a post like this in one of the Claude sub-reddits. It's virtually never substantiated with reliable evidence.