I would love for my phone keyboard (SwiftKey) to use a locally running Voxtral for speech-to-text (bonus points if it can use the NPU of the Snapdragon SoC).
The voice recognition capabilities of Google Speech Services, which is what the mic button hooks into, suck. Meanwhile, Voxtral (and Whisper) understand what I'm trying to say far better, automatically "edit out" any stuttering or stammering that I might have, and properly capitalize and punctuate. They also handle bilingual speech exceedingly well, including, for example, English words in the middle of French sentences.
The best solution I could find so far is this F-Droid app that uses Whisper: https://f-droid.org/en/packages/org.woheller69.whisperplus/
But it has some downsides. First, I have to manually switch to that different keyboard; thankfully my Samsung phone offers an easy switch shortcut any time a keyboard is on screen, so it only requires 3 taps... and thankfully it's smart enough to send me back to SwiftKey once it's done. Second, recordings are limited to 30 seconds, and sometimes I ramble on for longer. Third, the way it's designed kind of sucks: you either have to hold a button (even though the point of speech-to-text is that I don't have to hold anything down) or let automatic detection end the recording and start processing, in which case it often cuts me off if I take more than 1 second to think about my next words.
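To make that third complaint concrete: the cut-off is basically an end-of-speech timeout, and what I'd want is for it to be adjustable. Here's a rough Python sketch of the idea. To be clear, this is purely illustrative and not how the app actually works; the function name, the 20 ms frame size, and the simple energy threshold are all my own assumptions.

```python
from typing import Iterable, List


def record_until_silence(
    frames: Iterable[List[float]],
    frame_ms: int = 20,              # assumed duration of each audio frame
    silence_timeout_ms: int = 3000,  # how long a pause is tolerated
    energy_threshold: float = 0.01,  # assumed cutoff between speech and silence
) -> List[List[float]]:
    """Keep frames until `silence_timeout_ms` of continuous silence follows speech."""
    kept: List[List[float]] = []
    silent_ms = 0
    heard_speech = False
    for frame in frames:
        kept.append(frame)
        # Mean squared amplitude as a crude loudness measure.
        energy = sum(s * s for s in frame) / max(len(frame), 1)
        if energy >= energy_threshold:
            heard_speech = True
            silent_ms = 0  # any speech resets the pause counter
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= silence_timeout_ms:
                break  # pause was long enough: stop and transcribe
    return kept
```

With a timeout of around 1 second, a 1.2-second thinking pause ends the recording; with 3 seconds, the same pause is ridden out and the rest of the sentence is kept.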
This is arguably one of the biggest use cases of modern AI technology and the least controversial one; phones have the hardware necessary to do it all locally, too! And yet... I couldn't find a better offering than this.
(Bonus points for anyone working on speech-to-text: give me a quick shortcut to add the string "[(microphone emoji)]" in my messages just to let the other party know that this was transcribed, so that they know to overlook possible mistakes.)