And they use full-size floats for training.[0]
[0] "full precision" in ML usually means 16 bit floats like bfloat16