aoeusnth1 on March 26, 2025 | on: Gemini 2.5

Am I correct in assuming that accuracy < rate of using the correct edit format? I.e., it made mistakes on 27% of the problems, 11% of which were due (at least in part) to messing up the diff format?

In that case, Google should be working on better adherence to the output format, since Claude and R1 are able to hit nearly 100% accuracy on the format.

anotherpaulg on March 26, 2025

It does have fairly low adherence to the edit format, compared to the other frontier models. But it is much better than any previous Gemini model in this regard.

Aider automatically asks models to retry malformed edits, so it recovers and goes on to produce a SOTA score.
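
To make the retry idea concrete, here is a minimal sketch of a retry-on-malformed-edit loop. It is not Aider's actual code: the simplified SEARCH/REPLACE pattern, the names `parse_edit` and `request_edit`, the `ask_model` callback, and the `max_retries` parameter are all assumptions made for illustration. It shows how a reply that breaks the format can count against format adherence while the task still ends with a usable edit.

```python
import re
from typing import Callable, Optional

# Simplified, illustrative edit format (an assumption, not a full spec):
# one SEARCH/REPLACE block per reply.
EDIT_BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def parse_edit(reply: str) -> Optional[tuple[str, str]]:
    """Return (search_text, replace_text), or None if the reply is malformed."""
    match = EDIT_BLOCK.search(reply)
    return (match.group(1), match.group(2)) if match else None


def request_edit(
    ask_model: Callable[[str], str],  # hypothetical callback: prompt in, reply out
    prompt: str,
    max_retries: int = 2,             # hypothetical retry budget
) -> tuple[Optional[tuple[str, str]], int]:
    """Ask for an edit, re-asking whenever the reply does not parse.

    Returns the parsed edit (or None if every attempt was malformed)
    together with the number of malformed replies seen along the way.
    """
    malformed = 0
    message = prompt
    for _ in range(max_retries + 1):
        edit = parse_edit(ask_model(message))
        if edit is not None:
            return edit, malformed
        malformed += 1
        # Re-send the task with a reminder about the expected format.
        message = (
            prompt
            + "\n\nYour last reply did not follow the edit format. "
            + "Please resend it as a single SEARCH/REPLACE block."
        )
    return None, malformed
```

With a stub model whose first reply is free-form text and whose second reply is a well-formed block, `request_edit` returns the parsed edit along with a malformed count of 1, so the format miss is recorded even though the edit eventually succeeds.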

aoeusnth1 on March 26, 2025

Ok, thanks for clearing that up.
