What does Microsoft get out of this? They already have TTS and deep-learning transcription, so what technical capabilities does Nuance have that they don't already have (or couldn't develop for substantially less than $20B)?
Probably a crapton of patents for voice recognition.
Also, if you cannot operate a keyboard and must communicate by speech to operate a computer, it's pretty much Dragon NaturallySpeaking or GTFO. Integrating NaturallySpeaking tech into Windows would be a huge boon and would further cement Windows as the OS to have if you have disabilities.
I have users who have intentionally switched their speech engine from the latest version of Dragon to Talon, for both dictation and commands. Talon is cross platform and directly targets accessibility use cases (far more than just speech input).
I'm specifically talking about the new Conformer model, available in early access as of ten days ago. What you tried was likely the previous (circa 2018) model, which is much less accurate than Conformer.
And what do you suggest is better? I've worked with nearly every tool (open source and closed) under the sun in medical, industrial, and personal settings and Dragon NaturallySpeaking/Professional was by far the best in terms of accuracy regardless of prosody, accent, background noise, technical terms used, etc.
Personally I think they should've been acquired a decade ago.
That answer depends on the language and on your use case. It seems like you're asking about desktop apps, but the parent comment wasn't limited to that context. Indeed, there's not a lot of choice there, because there's no money in it.
I'm even comparing against custom-trained Kaldi models (I was working on a startup creating lessons for public speaking so we could gather enough data to tackle accent remediation and help those with aphasic speech disorders), and to reiterate: the out-of-the-box performance of Nuance's products is just better than anything else.
Obviously Nuance is more than just speech recognition, but I'm still not sure why people are downplaying how good they were at it.
EDIT: or maybe it's just prohibitively expensive for people outside the medical/legal fields to know about? And don't get me wrong, I love that things like Talon Voice are widely available for hands-free coding; I just hope this means NaturallySpeaking will supplant Windows Dictation.
If you have the data and a specific domain you can focus on, then building a custom model [with Kaldi] should always win. That's what I've done in the past (beating Google, Nuance, etc.). You most likely didn't have the data and/or didn't know Kaldi well enough.
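The intuition behind "domain data always wins" is that a recognizer rescores hypotheses with a language model, and a model trained on in-domain text assigns far higher probability to in-domain phrases than a general-purpose one. Here's a toy sketch of that effect using a smoothed bigram LM in plain Python (the corpora and vocabulary size are made-up illustrative values, not anything from Kaldi itself):

```python
from collections import defaultdict
import math

def train_bigram_lm(corpus):
    """Count bigrams over a list of whitespace-tokenized sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts

def log_prob(counts, sentence, alpha=1.0, vocab_size=1000):
    """Add-alpha smoothed log-probability of a sentence under the model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    lp = 0.0
    for a, b in zip(tokens, tokens[1:]):
        num = counts[a][b] + alpha
        den = sum(counts[a].values()) + alpha * vocab_size
        lp += math.log(num / den)
    return lp

# Hypothetical in-domain corpus (radiology-style dictation phrases)
# versus generic everyday text.
domain = ["no acute intracranial hemorrhage",
          "bilateral pleural effusions noted"]
general = ["the cat sat on the mat",
           "i went to the store"]

dom_lm = train_bigram_lm(domain)
gen_lm = train_bigram_lm(general)

test = "no acute intracranial hemorrhage"
# The domain LM scores the in-domain phrase higher than the general LM,
# which is the rescoring advantage a custom model gives an ASR decoder.
print(log_prob(dom_lm, test) > log_prob(gen_lm, test))  # True
```

In a real Kaldi pipeline the same idea shows up as training the n-gram (or RNN) language model, lexicon, and acoustic model on domain data; the toy above only captures the language-model half, but it's the part where domain text pays off most quickly.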
> Obviously Nuance is more than just speech recognition, but still not sure why people are downplaying how good they were at it.
Because Nuance wasn't very good, at least in all the benchmarks I've seen. It's been a while since I compared numbers, so it's possible they've improved a lot. They're also known for kinda being dicks with the contracts they offer in B2B.