
Although Llama 4 is too big for mere mortals to run without many caveats, the economics of calling a dedicated-hosted Llama 4 are more interesting than expected.

$0.11 per 1M tokens, a 10-million-token context window (not yet implemented in Groq), and faster inference due to fewer activated parameters allow for some specific applications that weren't cost-feasible with GPT-4o/Claude 3.7 Sonnet. That all depends on whether the quality of Llama 4 is as advertised, of course, particularly around that 10M context window.
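To make the economics concrete, here's a back-of-the-envelope sketch using only the $0.11/1M rate quoted above; the token counts are hypothetical examples, not benchmarks:

```python
# Rough cost estimate at the quoted Llama 4 rate.
# RATE is the $0.11 per 1M tokens figure from the comment above;
# the workloads below are made-up illustrations.

RATE_PER_M_TOKENS = 0.11  # USD per 1,000,000 tokens (quoted above)

def cost_usd(tokens: int, rate_per_m: float = RATE_PER_M_TOKENS) -> float:
    """Cost of processing `tokens` tokens at a flat per-million-token rate."""
    return tokens / 1_000_000 * rate_per_m

# Filling the full 10M-token context window once:
print(f"${cost_usd(10_000_000):.2f}")   # → $1.10

# A batch job over 1,000 documents of ~50k tokens each:
print(f"${cost_usd(1_000 * 50_000):.2f}")   # → $5.50
```

So even saturating the 10M context window is on the order of a dollar per call, which is what makes previously cost-prohibitive long-context workloads interesting.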



It's possible that we'll see smaller Llama 4-based models in the future, though, similar to Llama 3.2 1B, which was released later than the other Llama 3.x models.


Yeah, I too am looking forward to their small text-only models at 3B and 1B.


> Llama 4 is too big for mere mortals to run without many caveats

AMD MI300X has day-zero support for running it with vLLM. Easy enough to rent them at decent pricing.



