
Memory bandwidth puts an upper limit on LLM tokens per second.

At 200GB/s, that upper limit is not very high at all. So it doesn't really matter if the compute is there or not.
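To make that concrete, here's a rough back-of-the-envelope sketch (my own example numbers, assuming batch-1 decoding where every weight has to be streamed from memory once per generated token, and ignoring KV-cache traffic):

    # Rough ceiling on tokens/sec for batch-1 decoding:
    #   tokens/sec <= memory_bandwidth / model_size_in_bytes
    # (assumes perfect bandwidth utilization, so real numbers will be lower)

    def max_tokens_per_sec(bandwidth_gb_s: float, params_b: float, bytes_per_param: float) -> float:
        model_bytes = params_b * 1e9 * bytes_per_param
        return bandwidth_gb_s * 1e9 / model_bytes

    # Hypothetical 70B-parameter model at 4-bit (~0.5 bytes/param):
    print(max_tokens_per_sec(200, 70, 0.5))   # ~5.7 tok/s ceiling at 200GB/s
    print(max_tokens_per_sec(90, 70, 0.5))    # ~2.6 tok/s ceiling at 90GB/s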



The M1 Max's GPU can only make use of about 90GB/s out of the 400GB/s they advertise/support. If the AMD chip can make better use of its 200GB/s then, as you say, it will manage to have better LLM tokens per second. You can't just look at what has the wider/faster memory bus.

https://www.anandtech.com/show/17024/apple-m1-max-performanc...


This mainly shows that you need to watch out when it comes to unified architectures. The sticker bandwidth might not be what you can get for GPU-only workloads. Fair point. Duly noted.
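The practical takeaway is to measure achievable bandwidth rather than trust the spec sheet. As a toy illustration of the idea (not what AnandTech did; a single-threaded CPU copy like this won't saturate a wide unified-memory bus, and GPU-side benchmarks are needed for the GPU figure):

    import time
    import numpy as np

    # Time a large array copy and report effective bandwidth.
    N = 1 << 28                       # ~268M float32 values, ~1 GiB per buffer
    src = np.ones(N, dtype=np.float32)
    dst = np.empty_like(src)

    t0 = time.perf_counter()
    np.copyto(dst, src)               # reads src and writes dst: ~2 GiB of traffic
    elapsed = time.perf_counter() - t0
    print(f"~{2 * src.nbytes / elapsed / 1e9:.1f} GB/s effective copy bandwidth")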

But my overarching point still stands: LLM inference needs memory bandwidth, and 200GB/s is not very much (especially for the higher-RAM variants, where you'd want to run larger models).

If the M1 Max actually only gets 90GB/s, that just means it's a poor choice for LLM inference.



