
> so that the model isn't required to compress a large proportion of the internet into their weights.

The knowledge compressed into an LLM's weights is a byproduct of training, not a goal. Training on internet-scale data is what teaches the model to use language at all; the knowledge and the ability to speak are intertwined.
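
To make that concrete, here's a minimal sketch (plain PyTorch, with a toy embedding standing in for a real transformer; all names hypothetical) of the next-token objective. The loss only ever rewards predicting the next token of the training text; any facts the weights end up storing are there only because they help with that, never because they were an explicit target.

    import torch
    import torch.nn.functional as F

    vocab_size, d_model = 50_000, 512
    embed = torch.nn.Embedding(vocab_size, d_model)
    lm_head = torch.nn.Linear(d_model, vocab_size)

    tokens = torch.randint(0, vocab_size, (1, 16))  # stand-in for internet text
    hidden = embed(tokens)                          # stand-in for a transformer
    logits = lm_head(hidden)                        # per-position vocab scores

    # Shift by one: predict token t+1 from everything up to position t.
    # This is the entire training signal; "knowledge" is never scored directly.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab_size),
        tokens[:, 1:].reshape(-1),
    )
    print(loss.item())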




