2.3 KiB
Whisper Dictation
Local GPU speech-to-text dictation tool. Hold a hotkey to record, release to transcribe and type the result into the active window. Runs fully offline — no cloud, no API key.
Features
- System tray icon with settings GUI (tkinter)
- Configurable hotkey, model, language, audio device
- Shared config via git (
config.json,vocabulary.json) - Machine-specific settings stored locally (audio device, GPU settings)
- Windows: GPU acceleration via CUDA; Linux: CPU
Requirements
Windows
- Python 3.13
- NVIDIA GPU with CUDA 12 drivers
- PortAudio (bundled with most Python sounddevice wheels)
Linux
- Python 3.10+
- PortAudio:
sudo apt install portaudio19-dev
Installation
Windows
install.bat
This creates a .venv-windows virtual environment, installs all dependencies and the CUDA 12 DLLs required by faster-whisper.
Linux
chmod +x install.sh start.sh
./install.sh
Creates a .venv-linux virtual environment. GPU support on Linux requires a manually installed CUDA environment; by default runs on CPU.
Usage
Windows
start.bat
Linux
./start.sh
The app starts in the system tray. Hold the hotkey (default: Ctrl+Shift+Space) to record, release to transcribe and type into the active window.
Configuration
config.json (shared, stored in the repo):
| Key | Default | Description |
|---|---|---|
hotkey |
ctrl+shift+space |
Recording trigger |
model |
medium |
Whisper model size (tiny, base, small, medium, large-v2, large-v3) |
language |
de |
Transcription language (de, en, fr, es, it, null = auto) |
sample_rate |
16000 |
Audio sample rate in Hz |
Machine-specific settings (GPU device, compute type, audio device) are stored separately and not tracked by git:
- Windows:
%LOCALAPPDATA%\WhisperDictation\config_local.json - Linux:
~/.local/share/WhisperDictation/config_local.json
Vocabulary
Custom vocabulary/replacements can be added to vocabulary.json. These are passed as initial prompts to improve recognition of domain-specific terms.
Model Download
On first start the selected Whisper model is downloaded automatically from HuggingFace (~500 MB for medium). Subsequent starts use the cached model.