Gemini 2.5 Pro set the SOTA on the aider polyglot coding leaderboard [0] with a ...

aoeusnth1 · on March 26, 2025

Am I correct in assuming that accuracy is lower than the "using correct edit format" percentage? I.e., it made mistakes on 27% of the problems, 11% of which were due to (at least) messing up the diff format?

In that case, Google should be working on better adherence to the output format, as Claude and R1 are able to hit nearly 100% accuracy on the format.

anotherpaulg · on March 26, 2025

It does have fairly low adherence to the edit format, compared to the other frontier models. But it is much better than any previous Gemini model in this regard.

Aider automatically asks models to retry malformed edits, so it recovers and goes on to produce a SOTA score.
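To make the retry point concrete, here is a minimal Python sketch of a retry-on-malformed-edit loop. It is not aider's implementation. The simplified SEARCH/REPLACE markers, `parse_edit`, `solve_with_retries`, `RETRY_NOTE`, and `max_retries` are all hypothetical. The sketch also assumes a format metric that only looks at the first reply. Under that assumption, a problem whose first reply is malformed counts against format adherence yet can still yield a usable edit after a retry. So the two percentages need not nest the way the question above supposed.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical, simplified SEARCH/REPLACE markers; not aider's actual parser.
START = "<<<<<<< SEARCH\n"
MID = "\n=======\n"
END = "\n>>>>>>> REPLACE"
# Hypothetical re-prompt appended after a malformed reply.
RETRY_NOTE = "\nYour last edit was malformed. Resend it using the SEARCH/REPLACE format."


@dataclass
class Edit:
    search: str
    replace: str


def parse_edit(reply: str) -> Optional[Edit]:
    """Return an Edit if the reply is exactly one well-formed block, else None."""
    # Too short to hold both the opening and closing markers.
    if len(reply) < len(START) + len(END):
        return None
    if not (reply.startswith(START) and reply.endswith(END)):
        return None
    # Text between the markers must contain the divider.
    body = reply[len(START):len(reply) - len(END)]
    if MID not in body:
        return None
    search, replace = body.split(MID, 1)
    return Edit(search, replace)


def solve_with_retries(
    ask_model: Callable[[str], str], prompt: str, max_retries: int = 2
) -> Tuple[Optional[Edit], bool]:
    """Request an edit and re-prompt after malformed replies.

    Returns the parsed edit (None if every attempt was malformed) and whether
    the first reply was well-formed, which is what an assumed format-adherence
    metric would count.
    """
    edit = parse_edit(ask_model(prompt))
    first_ok = edit is not None
    retries = 0
    # Re-prompt at most max_retries times; stop as soon as a reply parses.
    while edit is None and retries < max_retries:
        retries += 1
        edit = parse_edit(ask_model(prompt + RETRY_NOTE))
    return edit, first_ok


if __name__ == "__main__":
    good = "<<<<<<< SEARCH\nx = 1\n=======\nx = 2\n>>>>>>> REPLACE"
    replies = iter(["Sure, change x to 2.", good])
    edit, first_ok = solve_with_retries(lambda p: next(replies), "fix x")
    print(edit, first_ok)  # Edit(search='x = 1', replace='x = 2') False
```

Capping retries with `max_retries` keeps a model that never produces valid output from looping forever. Whether a recovered edit then passes the benchmark's tests is a separate check not shown here.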

aoeusnth1 · on March 26, 2025

Ok, thanks for clearing that up.

sagarpatil · on March 26, 2025

The only benchmark I care about. Thanks!