How will an LLM "take greater care with typography" if it can't see the page it is creating? How will it "improve" leading if it takes a human to see that there's too much or too little distance between lines?
Because humans have already annotated diagrams and examples of what ‘too much’ and ‘too little’ look like, and these have been incorporated into the model. It tries to reproduce the kind of content that humans produce when they indicate they are taking greater care, and that content already has the ‘not too much / not too little’ judgement baked into it.