3 Seconds of Audio → Voice Clone: Free Desktop App, No Subscription

Download: github.com/jamiepine/voicebox

| Platform | GPU Requirement | Speed |
| --- | --- | --- |
| macOS (M1/M2/M3/M4) | None (native Metal acceleration via MLX) | Near real-time, 4-5x faster |
| Windows | NVIDIA GPU (CUDA) | Fast with decent GPU |
| Linux | Not available yet (coming soon, blocked by build infra) | N/A |

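If you want to check which row of the table your machine falls into, here is a rough Python sketch. It is not the app's own detection logic: the function name `expected_backend` and the returned labels are made up for illustration and simply mirror the table above. Intel Macs are not listed in the table, so the sketch reports them as unknown.

```python
import platform


def expected_backend(system: str, machine: str) -> str:
    """Map an OS name and CPU architecture to the table row above.

    Hypothetical helper: the labels mirror the post's table, not any
    real Voicebox API.
    """
    system = system.lower()
    machine = machine.lower()
    if system == "darwin" and machine == "arm64":
        return "MLX (Metal)"        # Apple Silicon row
    if system == "windows":
        return "CUDA (NVIDIA GPU)"  # Windows row
    if system == "linux":
        return "not available yet"  # Linux row
    return "unknown"                # e.g. Intel Macs, not covered by the table


if __name__ == "__main__":
    # platform.machine() may report x86_64 on Apple Silicon if Python
    # itself is running under Rosetta.
    print(expected_backend(platform.system(), platform.machine()))
```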
Step 1 — Download the installer from the GitHub releases page.
Step 2 — Launch → it auto-downloads the Qwen3-TTS model on first run.
Step 3 — Record or upload a voice sample (3+ seconds).
Step 4 — Type your text → hit generate → done.
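Before Step 3, it can help to confirm that your sample really is 3+ seconds long. The sketch below does that for an uncompressed WAV file using only Python's standard `wave` module. The helper names (`wav_duration_seconds`, `is_long_enough`) and the file name in the usage line are made up for illustration, and the app may accept other formats that this check does not cover.

```python
import wave

MIN_SECONDS = 3.0  # Step 3 asks for a sample of 3+ seconds


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # duration = number of audio frames / frames per second
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """True if the sample meets the minimum length from Step 3."""
    return wav_duration_seconds(path) >= min_seconds
```

Usage, assuming a hypothetical file `my_voice.wav` in the current folder: `is_long_enough("my_voice.wav")` returns `True` when the recording is at least 3 seconds long.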
Mac users win here: Apple Silicon gets native Metal acceleration via MLX, so generation is near real-time.