> Why does the LLM have to do everything these days?
Because why not? It feels like we've stumbled on the first actually general ML model, and we're testing the limits of this approach - throwing more and more complex tasks at these models. And so far, surprisingly, LLMs seem to manage. The more they do, the more interesting it is to see how far they can be pushed before they finally break.
> Wouldn't it make more sense to delegate such a pattern finding intelligence?
Maybe. But then, if an LLM can do that specific thing comparably well, and it can do a hundred other similarly specific tasks with acceptable results, then there's also a whole space between those tasks, where they can be combined and blended, which none of the specialized models can access. Call it "synergy", "cross-pollination", "being a generalist".
EDIT:
As a random example: speech recognition models were really bad until recently[0], because it turns out that having an actual understanding of the language is extremely helpful for recognizing speech correctly. That's why an LLM (or a multi-modal variant, or some future, better general-purpose model) has to do everything - because seemingly separate skills reinforce each other.
--
[0] - Direct and powerful evidence: compare your experience with voice assistants and dictation keyboards vs. the conversation mode in the ChatGPT app. The latter can easily handle casual speech with weird accents, delivered outdoors on a windy day near a busy street, with near-100% accuracy. It's a really spectacular jump in capabilities.
> Because why not? It feels like we've stumbled on a first actually general ML model, and we're testing the limits of this approach - throwing more and more complex tasks at them. And so far, surprisingly, LLMs seem to manage.
We live in completely different worlds. Every LLM I've tried manages nothing except spouting bullshit. If your job is to create bullshit, an LLM is useful. If your job requires anything approximating correctness, LLMs are useless.