I think people's opinion of "marginal improvement" is based on their relative ab...

nostrebored · 2026-05-09T16:37:45 1778344665

2024-2025 was filled with huge improvements. 2025-2026 has not been, outside of open source.

The idea that we’re at the point where it’s superseded our ability to tell just makes no sense. I’ll be happy if we can get to a point where I don’t have to tell Claude not to tail every bash command or make a job that writes throughout instead of once at the end. I’ll be happy if “continue this interaction naturally, you are taking over from an independent subagent” works.

But I’m not holding my breath. It’s still really cool that any of this stuff is possible.

miki123211 · 2026-05-09T19:09:42 1778353782

Claude in feb of 2025 was barely able to code. Sure, it could write you a nice function, it could even write you a complex 200-line algorithm, but give it a codebase, and it would quickly get overwhelmed.

Claude in feb of 2026? Still far from perfect, but there's definitely a huge improvement here.

dang · 2026-05-09T18:52:46 1778352766

> I think this is a pretty ridiculous take.

This falls in the category of swipes/name-calling in https://news.ycombinator.com/newsguidelines.html - can you please edit those out?

You're a good contributor - it's just all too easy for unintentional sharpness to downgrade the conversation, and when it's a good conversation like this one, that's especially regrettable.

nostrebored · 2026-05-10T03:04:14 1778382254

Noted, doesn’t seem like I’m able to edit anymore though

dang · 2026-05-10T03:35:42 1778384142

I've re-opened it for editing if you want to. For us the main point is just to fix things going forward!

spwa4 · 2026-05-09T18:47:25 1778352445

The correct way to estimate this is exactly what people do. Measure the distance between ChatGPT's best public model and state of the art, the best humans. And there is very little difference between those versions from that perspective. It is very far away from peak human performance, and not getting noticeably closer for over a year now. There's lots of progress, but if you're OpenAI/Anthropic/Google, exactly the wrong kind of progress: the difference between ChatGPT 5.5 and a 27B/4B model (you need to try Gemma4-26B-A4B, wtf, it runs acceptably on CPU) is now reduced to ELO 1501 vs ELO 1434, generously a 70 ELO point difference, down from over 400, data from Arena.ai.

(in fact I find that Qwen-35B-A3B and Gemma4-26B-A4B very rarely "know" the answer, and so use first principles thinking, or go out and look for the answer where GPT-5.4 does not and simply assumes it knows. Which leads to now, in some cases, the small models far outperforming the big ones. Huge context + training quality seem to be the determining factors now, and neither of those are the strengths of SOTA models. If this continues ...)

While I agree this is a training problem, it is not a solvable one. ML models learn from examples. This is even true for their newest tricks like GRPO. They cannot train against things humans don't yet know.

And that's great, but you're forever locked at the peak of what you can be taught in widely available courses (which they download without paying) (even that is best case scenario: it assumes your ability to distinguish bullshit from reality somehow becomes perfect during training, or even before). The only way to exceed peak human performance is to start experimenting with math, physics, chemistry, even humans, yourself. And that has, even for humans, a massively higher cost than learning from examples, or from a course.

The reason they don't go further is the worst possible reason: the cost. It requires a 100x increase in training expense. Think of it like this: to exceed SOTA in physics or chemistry, training the next version of ChatGPT requires a particle accelerator, and a chemistry laboratory. This cannot be bypassed. Oh and not just any particle accelerator, right? A better one than the best currently existing one. Same for Chemistry labs. Same for ... So 100x is conservative.

But without doing it, ML models (LLM or otherwise) are forever limited at the level an army of first year university students achieve, ON AVERAGE. Maybe they can make that 2nd or even 4th year, at the end of the curve. But that's the limit. Phd level is the level you have to come up with new discoveries, and that ... just isn't possible with current training, even at the end of the improvement curve.

And ... is there budget to increase training cost another 100x? No ... there isn't. Not even with this totally absurd level of investment there isn't. And if small models keep this up, there's no way the investment is even remotely worth it.