If you want to double the memory and double the total memory bandwidth, sure. That'd need twice as many data lines, or the same lines at twice the speed.
But if you just want to double the memory without increasing the total memory bandwidth, isn't that a good deal simpler? What's one more address bit next to a 256-bit data bus?
The GPU already has DMA to system RAM. If you're going to make the VRAM as slow as system RAM, then a UMA makes more sense than throwing more memory chips on the GPU.
Good point. I misunderstood the situation. I figured doubling the VRAM size at the same bus width would halve the bandwidth.
Instead, it appears entirely possible to double VRAM size (starting from current amounts) while keeping the bus width and bandwidth the same (cf. 4060 Ti 8GB vs. 4060 Ti 16GB). And, since that bandwidth is already much higher than system RAM (e.g. 128-bit GDDR6 at 288 GB/s vs DDR5 at 32-64 GB/s), it seems very useful to do so, though I'd imagine games wouldn't benefit as much as compute would.
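That bandwidth gap follows directly from bus width times per-pin data rate. A quick sketch of the arithmetic, with assumed speed grades (18 Gbps/pin is a common GDDR6 bin, and DDR5-4800 runs 4.8 Gbps/pin on a 64-bit channel):

```python
# Peak memory bandwidth = bus width (bits) * per-pin data rate (Gbit/s) / 8.
# Speed grades are assumptions, not taken from any specific card's spec sheet.

def peak_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_bits * gbps_per_pin / 8

gddr6 = peak_bandwidth_gbs(128, 18.0)  # 4060 Ti-class: 128-bit GDDR6
ddr5 = peak_bandwidth_gbs(64, 4.8)     # one DDR5-4800 channel

print(f"128-bit GDDR6 @ 18 Gbps/pin: {gddr6:.0f} GB/s")  # 288 GB/s
print(f"one DDR5-4800 channel:       {ddr5:.1f} GB/s")   # 38.4 GB/s
```

Doubling capacity by using denser (or clamshell) chips leaves both inputs to that formula untouched, which is why the 16GB 4060 Ti has the same 288 GB/s as the 8GB one.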
But having the VRAM is what lets you run the model on the GPU at all, isn't it? A card with 48GB can run a model twice as large as a card with 24GB can, even if it takes twice as long per token. Nobody expects to run twice as much model in the same time just by adding VRAM.
Without the extra VRAM, it's hundreds of times slower (divided by your batch size) because the weights have to be swapped in over PCIe for every batch, or tens of times slower consistently if you run the overflow layers on the CPU instead.
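The swapping penalty can be sketched as a bandwidth ratio: token generation is memory-bandwidth-bound, so reading the weights from VRAM costs weights/VRAM_bw per sample, while swapping costs weights/PCIe_bw once per batch. A rough model, with assumed bandwidth figures (not measurements from any particular card):

```python
# Slowdown from streaming weights over PCIe each batch, relative to
# keeping them resident in VRAM. Token generation is assumed to be
# memory-bandwidth-bound, so VRAM read time stands in for compute time.

def slowdown(vram_gbs: float, pcie_gbs: float, batch: int) -> float:
    """Per-sample slowdown: one PCIe transfer of the weights amortized
    over the batch, on top of the per-sample VRAM read."""
    return 1 + vram_gbs / (pcie_gbs * batch)

# High-bandwidth card (~936 GB/s VRAM) with ~8 GB/s effective PCIe:
print(f"{slowdown(936, 8, 1):.0f}x")   # 118x at batch size 1
# Mid-range card (~288 GB/s VRAM) with ~32 GB/s PCIe 4.0 x16:
print(f"{slowdown(288, 32, 1):.0f}x")  # 10x at batch size 1
```

So the "hundreds of times, divided by batch size" figure depends heavily on the VRAM-to-PCIe bandwidth ratio of the card in question, and batching amortizes the one-time transfer across samples.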