I wonder if you can't just put a bunch of results with and without artifacts into two different bins, and do another round of training on them. But I don't know enough about how style transfer and retraining of these nets and all that modern stuff works to tell if that is feasible.
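One way to read the two-bins idea, as a rough sketch only: fit a small classifier on the "clean" and "artifact" bins, then use its score as a filter or an extra penalty in a later training round. Everything here is made up for illustration: the function names, the two-number feature vectors (imagine something like a periodicity score for the hair region), and the toy data. Whether this carries over to a real generator is exactly the open question.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_artifact_classifier(clean, artifact, lr=0.5, epochs=500, seed=0):
    """Fit logistic regression with SGD: label 0 = clean bin, 1 = artifact bin."""
    rng = random.Random(seed)
    data = [(x, 0.0) for x in clean] + [(x, 1.0) for x in artifact]
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def artifact_probability(model, x):
    """Probability that feature vector x belongs to the artifact bin."""
    w, b = model
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical hand-made features for a few generated images.
clean_bin = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
artifact_bin = [[0.9, 0.8], [0.8, 0.95], [0.7, 0.9]]
model = train_artifact_classifier(clean_bin, artifact_bin)
```

Calling `artifact_probability(model, features)` on a new sample then gives a score you could threshold to reject outputs, or feed back as a penalty term.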
I suspect that because there's a lot of entropy in hair, and because of the shape of the optimization function (which might even have a spatial term), a regular pattern in such a noisy, hard-to-learn region falls into a local minimum while the rest of the image converges to the true minimum. There's a little meat left to optimize here, but you need to do it cleverly, because there's no reason for a neural network to learn all the many combinations of hair pixels in this application. That could require as many parameters as all the neurons involved in generating the faces, I'd bet.
Thinking more about it, the shape of the solution space is sufficiently different for hair vs faces that any given combination of {optimization function, hyperparameters, training data} is unlikely to optimize for both. You probably need some other sort of special tuning, like a spatially local adaptive gradient for regions of hair.
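A crude reading of "spatially local" tuning, sketched under heavy assumptions: weight a per-pixel loss more strongly inside a hair mask, so gradients there are scaled up relative to the face. This is a plain weighted MSE on nested lists, not the GAN loss StyleGAN actually uses, and `hair_mask` and `hair_weight` are hypothetical names invented for the example.

```python
def weighted_mse(pred, target, hair_mask, hair_weight=4.0):
    """Mean squared error where masked (hair) pixels count hair_weight times."""
    total = 0.0
    weight_sum = 0.0
    for p_row, t_row, m_row in zip(pred, target, hair_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            w = hair_weight if m else 1.0
            total += w * (p - t) ** 2
            weight_sum += w
    return total / weight_sum


def weighted_mse_grad(pred, target, hair_mask, hair_weight=4.0):
    """Gradient of weighted_mse w.r.t. each predicted pixel."""
    weight_sum = sum(
        hair_weight if m else 1.0 for m_row in hair_mask for m in m_row
    )
    return [
        [
            2.0 * (hair_weight if m else 1.0) * (p - t) / weight_sum
            for p, t, m in zip(p_row, t_row, m_row)
        ]
        for p_row, t_row, m_row in zip(pred, target, hair_mask)
    ]
```

With the same per-pixel error, a hair pixel gets `hair_weight` times the gradient of a non-hair pixel, which is the simplest version of treating the two regions differently.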
Also, some teeth are getting drawn in front of the lips and the pupils are not round. I think the not-round pupils make the eyes point in two different directions, kinda unsettling.
It also seems to have given some faces contact lenses! https://nvlabs-fi-cdn.nvidia.com/stylegan3/images/stylegan3-..., https://nvlabs-fi-cdn.nvidia.com/stylegan3/images/stylegan3-...