Hacker News
mike_hearn | 34 days ago | on: Accelerating Gemma 4: faster inference with multi-...
Maybe at a very high level of abstraction, but there's no branching involved.
lossolo | 34 days ago
Well, there are multiple token proposals processed in parallel, from which only one is picked; that seems like branching to me. The only difference is that in the case of a CPU, there is always only one possible branch that is correct.
monster_truck | 34 days ago
Well, not exactly, but that was the dream we were sold (here be dragons).