How well can LLMs reason about avoiding UB? This seems like one of those things that, no matter how much code you look at, you can still easily get wrong (as humans frequently do).




Fair point on UB — LLMs absolutely do not reason about it (or anything else). They just reproduce the lowest-common-denominator patterns that happened to survive in the wild.

I’m not claiming the generated C is “safe” or even close. I am sure that in practice it still has plenty of time-bombs, but empirically, for the narrow WASM tasks I tried, the raw C suggestions were dramatically less wrong than the equivalent JavaScript ones — fewer obvious foot-guns, better idioms, etc.

So my original “noticeably better” was really about “fewer glaring mistakes per 100 lines” rather than “actually correct.” I still end up rewriting or heavily massaging almost everything, but it’s a better starting point than the JS ever was.



