Hacker News

SWE-rebench does this. They gather real issues from GitHub repos on a roughly monthly basis and test the models on them. On their leaderboard you can use a slider to select only the issues created after a model was released, and see the stats for that subset. It works for open models; I'm a bit uncertain about closed models. Not perfect, but it's the best we have for this idea.
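
To make the slider idea concrete, here is a minimal sketch of the filtering step, not their actual code. The `IssueResult` record, the `post_release_resolve_rate` function, and the sample data are all made up for illustration: keep only the issues created after the model's release date, then compute the resolve rate on what is left.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IssueResult:
    # Hypothetical record: one benchmark issue and whether the model resolved it.
    issue_id: str
    created: date
    resolved: bool


def post_release_resolve_rate(results: list[IssueResult], release_date: date) -> Optional[float]:
    """Resolve rate over issues created strictly after release_date.

    Returns None when no issue is newer than the release date.
    """
    fresh = [r for r in results if r.created > release_date]
    if not fresh:
        return None
    return sum(r.resolved for r in fresh) / len(fresh)


# Made-up sample data, not real leaderboard numbers.
results = [
    IssueResult("a", date(2025, 1, 10), True),
    IssueResult("b", date(2025, 3, 5), False),
    IssueResult("c", date(2025, 3, 20), True),
    IssueResult("d", date(2025, 4, 2), False),
]

# Assumed release date of 2025-02-01: issue "a" predates it and is dropped.
rate = post_release_resolve_rate(results, date(2025, 2, 1))
```

The point of the cutoff is that issues filed after release cannot have been in the model's training data, so the score on that subset is less likely to be inflated by contamination.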

