
Large language models are large, and they have to be loaded into memory to train or to run inference at any reasonable speed. Older models like GPT-3 have around 175 billion parameters; at float32 that comes out to roughly 700GB of memory. Newer models are even larger, and OpenAI wants to run them as consumer web services.
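A rough back-of-envelope sketch of that arithmetic (parameter count times datatype width; real deployments also need room for activations, KV cache, and, for training, gradients and optimizer state):

    # Memory needed just to hold the weights of a 175B-parameter model,
    # at a few common datatype widths. This is weight storage only;
    # training needs several times more.
    PARAMS = 175e9

    BYTES_PER_PARAM = {
        "float32": 4,
        "float16/bfloat16": 2,
        "int8": 1,
    }

    for dtype, nbytes in BYTES_PER_PARAM.items():
        gb = PARAMS * nbytes / 1e9
        print(f"{dtype}: {gb:,.0f} GB")

    # -> float32: 700 GB, float16/bfloat16: 350 GB, int8: 175 GB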




I mean, I know that much. The numbers still don't make sense to me. How is my internal model this wrong?

For one, if this were about inference, wouldn't the bottleneck be the GPU computation part?


Concurrency?

Suppose some parallelized, distributed task requires 700GB of memory per node (I don't know whether it does or not), and that speed is a concern.

A single 700GB pile of memory is insufficient not because it lacks capacity, but because it doesn't scale: that pile is only enough for one node.

If more nodes were added to increase speed but they all shared that same single 700GB pile, then RAM bandwidth (and latency) would get in the way.
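A toy sketch of why the shared pool becomes a bandwidth problem (the 1000 GB/s aggregate bandwidth figure is an assumption for illustration, not a real part number):

    # Hypothetical: N nodes all stream the full 700 GB of weights from one
    # shared memory pool. The pool's aggregate bandwidth gets split N ways,
    # so each added node makes every node slower.
    POOL_BANDWIDTH_GB_S = 1000   # assumed aggregate bandwidth of the shared pool
    WEIGHTS_GB = 700             # data each node must read per pass

    for nodes in (1, 2, 4, 8):
        per_node_bw = POOL_BANDWIDTH_GB_S / nodes
        seconds = WEIGHTS_GB / per_node_bw
        print(f"{nodes} node(s): {per_node_bw:.0f} GB/s each -> {seconds:.1f} s per pass")

    # Giving each node its own local copy (e.g. in HBM on the GPU) keeps
    # per-node bandwidth constant, which is why capacity alone isn't the whole story.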


This "memory shortage" is not about AI companies needing main memory (which you plug into mainboards), but manufacturers are shifting their production capacities to other types of memory that will go onto GPUs. That brings supply for other memory products down, increasing their market price.


