I’m not sure of the origin, but using ViT-L (the encoder shared with SD 1.x) for what you might call the main prompt and ViT-G (the new SDXL encoder, and a successor to the single encoder used in SD 2.x) for a style prompt was a common idea shortly after SDXL launched, so it’s understandable.