
I think people's opinion of "marginal improvement" is based on their relative ability. A 2000 Elo chess player is going to think the jump from 500 to 1000 is marginal. To them, both players are floundering around, not doing anything resembling common sense. A 1000 Elo chess player is going to find the jump from 2000 to 2500 marginal. To them, both players are making far better moves for incomprehensible reasons, and the only reason you know the 2500 player is better is benchmarking. It is only when you are evaluating systems at about your own level that you can feel the improvement.

I, personally, found the past two years to be a much larger improvement than the previous two years.



2024-2025 was filled with huge improvements. 2025-2026 has not been, outside of open source.

The idea that we’re at the point where it has surpassed our ability to tell just makes no sense. I’ll be happy if we can get to a point where I don’t have to tell Claude not to tail every bash command, or not to make a job that writes throughout instead of once at the end. I’ll be happy if “continue this interaction naturally, you are taking over from an independent subagent” works.

But I’m not holding my breath. It’s still really cool that any of this stuff is possible.


Claude in Feb of 2025 was barely able to code. Sure, it could write you a nice function, it could even write you a complex 200-line algorithm, but give it a codebase and it would quickly get overwhelmed.

Claude in Feb of 2026? Still far from perfect, but there's definitely a huge improvement here.


> I think this is a pretty ridiculous take.

This falls in the category of swipes/name-calling in https://news.ycombinator.com/newsguidelines.html - can you please edit those out?

You're a good contributor - it's just all too easy for unintentional sharpness to downgrade the conversation, and when it's a good conversation like this one, that's especially regrettable.


Noted, doesn’t seem like I’m able to edit anymore though


I've re-opened it for editing if you want to. For us the main point is just to fix things going forward!


The correct way to estimate this is exactly what people do: measure the distance between ChatGPT's best public model and the state of the art, meaning the best humans. From that perspective there is very little difference between those versions. It is very far away from peak human performance, and it has not gotten noticeably closer in over a year. There's lots of progress, but if you're OpenAI/Anthropic/Google, it's exactly the wrong kind of progress: the difference between ChatGPT 5.5 and a 27B/4B model (you need to try Gemma4-26B-A4B, wtf, it runs acceptably on CPU) is now down to Elo 1501 vs Elo 1434, generously a 70 Elo point difference, down from over 400. Data from Arena.ai.
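To put that gap in more concrete terms, here is a rough sketch using the classic Elo expected-score formula. It assumes Arena scores behave like standard chess-style Elo ratings on a 400-point scale, which is an assumption on my part and not something the leaderboard numbers above confirm. The function name and the 1834 rating for the "over 400" case are made up for illustration.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo formula.

    Roughly: the fraction of head-to-head comparisons A is expected to win.
    Assumes the standard 400-point logistic scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# The gap quoted above: 1501 vs 1434, i.e. 67 points.
current_gap = elo_expected_score(1501, 1434)

# A hypothetical 400-point gap, standing in for the "over 400" figure.
earlier_gap = elo_expected_score(1834, 1434)

print(f"67-point gap:  higher-rated model preferred ~{current_gap:.0%} of the time")
print(f"400-point gap: higher-rated model preferred ~{earlier_gap:.0%} of the time")
```

Under those assumptions, a 67-point lead means the bigger model wins about 6 head-to-head comparisons in 10, while a 400-point lead means about 9 in 10.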

(In fact I find that Qwen-35B-A3B and Gemma4-26B-A4B very rarely "know" the answer, and so they use first-principles thinking, or go out and look for the answer, where GPT-5.4 does not and simply assumes it knows. That now leads, in some cases, to the small models far outperforming the big ones. Huge context + training quality seem to be the determining factors now, and neither of those is a strength of SOTA models. If this continues ...)

While I agree this is a training problem, it is not a solvable one. ML models learn from examples. This is even true for their newest tricks like GRPO. They cannot train against things humans don't yet know.

And that's great, but you're forever locked at the peak of what you can be taught in widely available courses (which they download without paying). Even that is the best-case scenario: it assumes your ability to distinguish bullshit from reality somehow becomes perfect during training, or even before. The only way to exceed peak human performance is to start experimenting with math, physics, chemistry, even humans, yourself. And that has, even for humans, a massively higher cost than learning from examples, or from a course.

The reason they don't go further is the worst possible reason: the cost. It requires a 100x increase in training expense. Think of it like this: to exceed SOTA in physics or chemistry, training the next version of ChatGPT requires a particle accelerator and a chemistry laboratory. This cannot be bypassed. Oh, and not just any particle accelerator, right? A better one than the best currently existing one. Same for chemistry labs. Same for ... So 100x is conservative.

But without doing it, ML models (LLM or otherwise) are forever limited to the level an army of first-year university students achieves, ON AVERAGE. Maybe they can make that 2nd or even 4th year, at the end of the curve. But that's the limit. PhD level is the level where you have to come up with new discoveries, and that ... just isn't possible with current training, even at the end of the improvement curve.

And ... is there budget to increase training cost another 100x? No ... there isn't. Not even with this totally absurd level of investment there isn't. And if small models keep this up, there's no way the investment is even remotely worth it.



