Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Gemini 2.5 Pro set the SOTA on the aider polyglot coding leaderboard [0] with a score of 73%.

This is well ahead of thinking/reasoning models. A huge jump from prior Gemini models. The first Gemini model to effectively use efficient diff-like editing formats.

[0] https://aider.chat/docs/leaderboards/



Am I correct in assuming that accuracy < using correct edit format? i.e. it made mistakes in 27% of the problems, 11% of which were due to (at least) messing up the diff format?

In which case, google should be working on achieving better output format following, as Claude and R1 are able to hit nearly 100% accuracy on the format.


It does have fairly low adherence to the edit format, compared to the other frontier models. But it is much better than any previous Gemini model in this regard.

Aider automatically asks models to retry malformed edits, so it recovers. And goes on to produce a SOTA score.


Ok, thanks for clearing that up.


The only benchmark I care about. Thanks!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: