Hacker News

That's an obvious exaggeration. The competition is already using lower-precision weights, some of which are floating-point formats and some of which aren't.

And they still use full-size floats for training.



That means their paper is actually behind SOTA, which is concerned with training natively in fp4, without falling back to full precision [0] for QAT.

[0] "full precision" in ML usually means 16-bit floats like bfloat16


I wouldn't say "worse". It's focusing on inference cost and leaving training at the default for now.
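For readers unfamiliar with QAT: the usual trick is "fake quantization" — the optimizer keeps high-precision master weights, but the forward pass sees weights rounded to the low-precision grid. A minimal sketch, assuming the fp4 E2M1 value set from the OCP MX spec (all names here are illustrative, not from any particular paper):

```python
import numpy as np

# Representable magnitudes in fp4 E2M1 (per the OCP MX spec);
# the full grid is these values with both signs.
FP4_MAGNITUDES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_MAGNITUDES[:0:-1], FP4_MAGNITUDES])

def fake_quant_fp4(w, scale=1.0):
    """Round each weight to the nearest representable fp4 value.

    In QAT the high-precision master weights are what the optimizer
    updates; only this rounded copy is used in the forward pass, and
    gradients flow through as if rounding were identity (the
    straight-through estimator).
    """
    scaled = np.asarray(w, dtype=np.float64) / scale
    # Broadcast against the grid and pick the nearest entry per weight.
    idx = np.abs(scaled[..., None] - FP4_GRID).argmin(axis=-1)
    return FP4_GRID[idx] * scale

print(fake_quant_fp4([0.9, -2.4, 7.0]))  # each value snaps to the fp4 grid
```

Training "natively in fp4" means doing away with those high-precision master copies entirely, which is what makes it harder than this QAT-style setup.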





