
pilotneko on Dec 11, 2023 | on: Mixtral of experts

I haven’t looked at the structure carefully, but it’s not hard to guess that there are shared layers between the models. The input layers almost certainly are shared, since there is no need to tokenize separately for each model (unless different models have specialized vocabularies).

lordswork on Dec 11, 2023

This is my understanding as well. The shared parameters also include those of the expert-routing gating network.
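
To make the shared-versus-per-expert split concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `moe_param_counts`, the toy dimensions, and the layer layout (shared embeddings, shared attention, a linear router per layer, and a three-matrix feed-forward block per expert). It is not taken from the Mixtral code, and it ignores small terms such as normalization weights.

```python
def moe_param_counts(dim, n_layers, hidden, vocab, n_experts, top_k,
                     n_heads, n_kv_heads, head_dim):
    """Rough parameter count for a hypothetical mixture-of-experts transformer.

    Assumed layout (not confirmed by the thread): embeddings, attention and
    the router are shared; only the feed-forward block is replicated per expert.
    """
    embed = vocab * dim                      # shared input embedding
    lm_head = vocab * dim                    # shared output projection
    attn = (dim * n_heads * head_dim         # query projection
            + 2 * dim * n_kv_heads * head_dim  # key and value projections
            + n_heads * head_dim * dim)      # output projection
    router = dim * n_experts                 # gating network: one score per expert
    expert_ffn = 3 * dim * hidden            # assumed three-matrix gated FFN

    shared = embed + lm_head + n_layers * (attn + router)
    per_expert = n_layers * expert_ffn
    return {
        "shared": shared,
        "per_expert": per_expert,
        "total": shared + n_experts * per_expert,
        "active_per_token": shared + top_k * per_expert,
        # What you would get if every expert were a fully separate model.
        "naive_separate_models": n_experts * (shared + per_expert),
    }


# Toy, made-up dimensions chosen only so the numbers are easy to check by hand.
counts = moe_param_counts(dim=8, n_layers=2, hidden=16, vocab=100,
                          n_experts=4, top_k=2,
                          n_heads=2, n_kv_heads=1, head_dim=4)
for name, value in counts.items():
    print(f"{name}: {value}")
```

Under these assumptions the total comes out well below the naive "number of experts times model size" figure, because the shared part is counted once, and the per-token active count is smaller still because only `top_k` experts run for each token.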
