I actually took the 314B figure from Grok's HF page [1], which describes the model as having "314B parameters" when explaining why it needs a multi-GPU machine.
I certainly agree that parameter count isn't everything, though; things like training data quality and fine-tuning clearly count for a lot.
[1] https://huggingface.co/xai-org/grok-1