
Did you read the whole article?

"In other words, the primary reason nearly all LLM inference endpoints are nondeterministic is that the load (and thus batch-size) nondeterministically varies! This nondeterminism is not unique to GPUs — LLM inference endpoints served from CPUs or TPUs will also have this source of nondeterminism."


