
Even with t=0 they are stochastic, e.g., because floating-point operations are non-associative, so the order in which values are combined can change the result.
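To make the non-associativity point concrete, here is a minimal sketch in plain Python (IEEE 754 doubles). The variable names are made up for illustration, and this shows only the arithmetic effect, not any particular inference stack:

```python
# Floating-point addition is not associative: grouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds after a + b, then again after adding c
right = a + (b + c)  # rounds after b + c, then again after adding a

print(left == right)      # the two groupings disagree in the last bit
print(abs(left - right))  # the gap is tiny, but nonzero
```

If a parallel reduction picks the grouping at run time, that last-bit difference is what can leak into the logits and occasionally flip an argmax.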


That is an artifact of implementation. You can absolutely implement it using strict FP. But even if not, any given implementation will still do things in a specific order which can be documented. And then if you're running quantized (including KV cache), there's a lot less floating point involved.
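A rough sketch of both claims, in plain Python. The helper `ordered_sum` and the scale factor 127 are hypothetical, chosen only for illustration; real quantized kernels differ in many details:

```python
import random

def ordered_sum(values):
    """Accumulate strictly left to right, so the evaluation order is fixed."""
    total = 0.0
    for v in values:
        total += v
    return total

rng = random.Random(0)  # fixed seed so the example itself is repeatable
values = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]

# Same inputs, same fixed order: the float result is bit-for-bit identical.
run1 = ordered_sum(values)
run2 = ordered_sum(values)
print(run1 == run2)

# Quantized to integers (assumed int8-like scale), addition is exact,
# so even a shuffled order gives the same sum.
quantized = [round(v * 127) for v in values]
shuffled = quantized[:]
rng.shuffle(shuffled)
int_forward = sum(quantized)
int_shuffled = sum(shuffled)
print(int_forward == int_shuffled)
```

The float case is deterministic only because the order is pinned; the integer case is deterministic regardless of order, which is the sense in which quantization leaves less room for drift.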



