The M1 Max's GPU can only make use of about 90GB/s out of the 400GB/s Apple advertises. If the AMD chip can make better use of its 200GB/s then, as you say, it will manage better LLM tokens per second. You can't just look at which chip has the wider/faster memory bus.
This mainly shows that you need to watch out when it comes to unified architectures. The sticker bandwidth might not be what you can get for GPU-only workloads. Fair point. Duly noted.
But my overarching point still stands: LLM inference is bound by memory bandwidth, and 200GB/s is not very much (especially for the higher-RAM variants, where you can fit models large enough to make it crawl).
If the M1 Max's GPU really only gets ~90GB/s effective, that just means it's also a poor choice for LLM inference.
At 200GB/s, that upper limit is not very high at all. So it doesn't really matter if the compute is there or not.
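The arithmetic behind that ceiling is simple: during decode, every weight is read roughly once per generated token, so bandwidth divided by model size bounds tokens per second. A quick sketch (the ~40GB figure for a 70B model at 4-bit quantization is an illustrative assumption, not a measurement):

```python
# Back-of-envelope ceiling on decode speed for memory-bound LLM inference.
# Assumption: each generated token streams all model weights from memory once,
# so tokens/s <= effective bandwidth / model size. Real throughput is lower.

def est_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on decode tokens/s if bandwidth is the only limit."""
    return bandwidth_gb_s / model_size_gb

# A ~70B-parameter model quantized to 4 bits is roughly 40GB of weights.
for bw in (90, 200, 400):
    print(f"{bw}GB/s -> at most ~{est_tokens_per_s(bw, 40):.1f} tok/s")
```

So even a full 200GB/s caps a 40GB model at about 5 tokens/s, which is why extra compute doesn't help here.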