GLM-4.7-Flash was the first local coding model that I felt was intelligent enough to be useful. It feels something like Claude 4.5 Haiku, at a parameter size where other coding models are still getting into loops and making bewilderingly stupid tool calls. It also has very clear reasoning traces that feel like Claude's, which makes it possible to inspect its reasoning and figure out why it made certain decisions.
So far I haven't managed to get comparably good results out of any other local model, including Devstral 2 Small and the more recent Qwen-Coder-Next.
Slightly off topic. I had a hard time getting models to run with ollama, and I thought that my computer (32 GB RAM, GTX 4070 with 12 GB VRAM) just couldn't do it. Then I tried LM Studio, and after fiddling with some settings, I got models running, and quite fast. I didn't try GLM-4.7 Flash, but I did try GLM-4.6v Flash, and it was amazing to see it analyze all kinds of images (since it has vision support). I was simply stunned. I can't believe that a simple gaming machine can do many of the things I used cloud models for. It was absolutely strikingly good at guessing the locations of photos, even vague ones: deducing landmarks, writing, types of traffic signs. I need to try 4.7 Flash. Hopefully it can run fast on my machine.
I'm not sure what it is about GLM 4.7 Flash, but it definitely seems to nail a sweet spot. Even the supposedly frontier models make a mess of large requests, so small, well-scoped requests are the way, IMO; and in that space, 4.7 Flash holds its own better than it has any right to.