This is not strictly speech-to-speech, but I quite like it when working with Cla...

skrebbel · 2026-01-24T16:41:53 1769272913

Wow Handy works impressively well! Excellent UX too (on Windows at least).

raajg · 2026-01-30T20:59:10 1769806750

I've been dabbling with STT quite a bit and built my own tool using Deepgram. But just tried Handy and it's SO FREAKING FAST! Love it.

d4rkp4ttern · 2026-02-13T15:55:17 1770998117

Hex is my new favorite STT on MacOS. Also uses Parakeet V3. I didn't think it could possibly be faster than Handy, but it is much faster - even long ramblings transcribed within a second. It's MacOS only, leverages the CoreML / Apple Neural Engine.

https://github.com/kitlangton/Hex

Also the transcriptions with hex don't seem to suffer from some of the issues with Handy, such as stutter.

srinifromsalem · 2026-02-12T22:39:46 1770935986

For local speech-to-text, Whisper remains the gold standard - you can run it locally with good accuracy across languages. For speech-to-speech, you'd typically chain Whisper with a local TTS model like Coqui TTS or use something like Tortoise TTS for higher quality but slower processing. The key is balancing accuracy, speed, and resource usage based on your specific use case. If you're doing content creation workflows, consider what post-processing you might need - sometimes the raw transcription needs structure and enhancement beyond just accurate words.

mumbleflow · 2026-02-13T23:41:55 1771026115

+1 on the post-processing point. Raw Whisper output is ~90% there but punctuation, grammar, and formatting are the missing piece.

I built MumbleFlow to address exactly this — whisper.cpp for STT plus llama.cpp for smart text cleanup, all running on-device. Metal/CUDA accelerated, sub-second latency on Apple Silicon. Global hotkey works in any app.

$5 one-time, no cloud, no subscription. https://mumble.helix-co.com

d4rkp4ttern · 2026-01-30T21:12:30 1769807550

Yes especially with Parakeet V3. It’s also nicely hackable, I Clauded a couple PRs to improve the experience, like removing stutters and filler words.

freakynit · 2026-01-29T12:57:11 1769691431

A 25MB TTS model: https://github.com/kittenml/kittentts

d4rkp4ttern · 2026-01-29T13:03:12 1769691792

Nice, I’ll have to try it out. They should really make a uv-installable CLI tool like pocket-TTS did. People underestimate just how much more immediately usable something becomes when you can simply get something by doing “uv tool install …”

freakynit · 2026-01-29T13:45:23 1769694323

True that. People, especially developers, underestimate the importance of packaging. Or, in general, making it easier for others to use your product.

d4rkp4ttern · 2026-01-30T21:11:20 1769807480

So I benchmarked it and there’s really no advantage over pocket TTS. There are some tradeoffs like Kitten doesn’t have streaming audio.

indigodaddy · 2026-01-24T16:20:10 1769271610

Hi, so I'm looking for an stt that can happen on a server/cron, that will use a small local model (I have 4 vCPU threadripper CPU only and 20G ram on the server) and be able to transcribe from remote audio URLs (preferably, but I know that local models probably don't have this feature so will have to do something like curl the audio down to memory or /tmp and then transcribe and then remove the file etc).

Have any thoughts?

d4rkp4ttern · 2026-01-24T22:59:50 1769295590

I’ve no thoughts on that unfortunately.

indigodaddy · 2026-01-24T23:41:20 1769298080

3dsnano · 2026-01-24T14:38:15 1769265495

posts like this are why i visit HN daily!!!

thanks for sharing your knowledge; can’t wait to try out your voice plugin

d4rkp4ttern · 2026-01-24T23:18:48 1769296728

Same!

Feel free to file a gh issue if you have problems with the voice plugin