Hacker News
new
|
past
|
comments
|
ask
|
show
|
jobs
|
submit
login
goldenarm
7 days ago
|
parent
|
context
|
favorite
| on:
GLM-5: Targeting complex systems engineering and l...
If you're tired of cross-referencing the cherry-picked benchmarks, here's the geometric mean of SWE-bench Verified & HLE-tools :
Claude Opus 4.6: 65.5%
GLM-5: 62.6%
GPT-5.2: 60.3%
Gemini 3 Pro: 59.1%
help
Guidelines
|
FAQ
|
Lists
|
API
|
Security
|
Legal
|
Apply to YC
|
Contact
Search:
Claude Opus 4.6: 65.5%
GLM-5: 62.6%
GPT-5.2: 60.3%
Gemini 3 Pro: 59.1%