One good reason (at least in the olden days, when lex and yacc were created) is efficiency. I wrote a compiler for a made-up programming language targeting a very slow CPU, with a recursive-descent parser, and it spends almost all of its time re-examining characters that have already been examined: checking whether they match one rule, giving up, backtracking, and then trying the exact same characters again under the next rule.
If it were looking at tokens instead of characters, this would go a lot faster: a separate lexer examines each character exactly once, and the parser's backtracking then happens over a much shorter stream of tokens, so a failed rule costs one token comparison instead of a rescan of all the characters it covered.
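A minimal sketch of the difference, in Python (the keywords and rules here are hypothetical, just to show the shape of the problem): the character-level parser rescans the same prefix for every alternative, while the token-level parser classifies the characters once in the lexer and then compares whole tokens.

```python
import re

# Character-level backtracking: every alternative rescans the input
# from the current position, re-examining the shared "inter..." prefix.
def parse_chars(src):
    for keyword in ("integer", "interlock", "interface"):  # hypothetical rules
        if src.startswith(keyword):  # each attempt walks the characters again
            return keyword
    return None

# Token-level: the lexer touches each character exactly once...
def tokenize(src):
    return re.findall(r"[A-Za-z]+|\d+|\S", src)

def parse_tokens(tokens):
    # ...so each grammar rule now costs one token comparison, not a rescan.
    for keyword in ("integer", "interlock", "interface"):
        if tokens and tokens[0] == keyword:
            return keyword
    return None

src = "interface"
print(parse_chars(src))             # -> interface
print(parse_tokens(tokenize(src)))  # -> interface
```

Both find the same answer, but in the second version the O(characters) work happens once, in the lexer, instead of once per rule attempt.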