Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I haven’t looked at the structure carefully, but It’s hard to guess there are shared layers between models. Likely the input layers for sure, since there is no need to tokenize separately for each model (unless different models have specialized vocabulary).


This is my understanding as well. Also includes the parameters of the expert-routing gating network.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: