goldenarm 7 days ago | on: GLM-5: Targeting complex systems engineering and l...

If you're tired of cross-referencing the cherry-picked benchmarks, here's the geometric mean of SWE-bench Verified & HLE-tools:

Claude Opus 4.6: 65.5%
GLM-5: 62.6%
GPT-5.2: 60.3%
Gemini 3 Pro: 59.1%
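For anyone checking the arithmetic: with two benchmarks the geometric mean is just sqrt(a * b). It never exceeds the arithmetic mean, and the two are equal only when both scores match, so a model that is strong on one benchmark and weak on the other gets pulled down. Here is a minimal Python sketch. The input scores are made-up placeholders, not the actual per-benchmark SWE-bench Verified or HLE-tools results, which aren't listed here.

```python
import math

def geometric_mean(scores):
    """Geometric mean of positive percentage scores."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("scores must be a non-empty list of positive numbers")
    # exp(mean(log x)) equals the n-th root of the product, without overflow on long lists
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Hypothetical inputs only, NOT real benchmark numbers.
balanced = geometric_mean([65.0, 65.0])  # equal scores
lopsided = geometric_mean([80.0, 50.0])  # same arithmetic mean (65.0), uneven split
print(round(balanced, 1))  # 65.0
print(round(lopsided, 1))  # 63.2
```

The lopsided pair has the same arithmetic mean as the balanced one but lands about 1.8 points lower, which is the point of using a geometric mean for a combined ranking.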
