I haven’t looked at the structure carefully, so it’s hard to say whether there are shared layers between the models. The input layers are the most likely candidates, since there is no need to tokenize separately for each model (unless different models have specialized vocabularies).
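
To make the idea concrete, here is a rough sketch of what sharing the input stage could look like. This is not taken from the actual architecture, which I haven't inspected; `SharedTokenizer` and `ToyModel` are hypothetical names, and the whitespace tokenizer is only a toy stand-in for a real one.

```python
# Hypothetical sketch: two models reusing one tokenizer (the shared input stage).
class SharedTokenizer:
    """Toy whitespace tokenizer with a vocabulary that grows on demand."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text):
        ids = []
        for word in text.lower().split():
            # Assign the next free id the first time a word is seen.
            if word not in self.vocab:
                self.vocab[word] = len(self.vocab)
            ids.append(self.vocab[word])
        return ids


class ToyModel:
    """Stand-in for a model; only the input stage is shared."""

    def __init__(self, name, tokenizer):
        self.name = name
        self.tokenizer = tokenizer  # same object as the other model, not a copy

    def forward(self, text):
        ids = self.tokenizer.encode(text)
        # Model-specific layers would go here; this sketch just returns the ids.
        return self.name, ids


tokenizer = SharedTokenizer()
model_a = ToyModel("a", tokenizer)
model_b = ToyModel("b", tokenizer)
```

Because both models hold the same tokenizer object, the same text maps to the same ids for each of them. A model with a specialized vocabulary would need its own tokenizer instead, which is the exception mentioned above.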