
The only thing I learned in the last year is that you can't really benchmark LLMs at all. Above a certain level it's just edge case against edge case, or script kiddies and multi-billion corps optimizing their fine-tunes against the test.

