# Whisper Dictation

Local GPU speech-to-text dictation tool. Hold a hotkey to record, release to transcribe and type the result into the active window. Runs fully offline: no cloud, no API key.

## Features

- System tray icon with settings GUI (tkinter)
- Configurable hotkey, model, language, audio device
- Cross-platform: Windows and Linux builds from a single codebase
- Shared config via git (`config.json`, `vocabulary.json`)
- Machine-specific settings stored locally (audio device, GPU settings, model)
- Configurable shared paths for vocabulary and model cache (useful for dual-boot setups)

## Requirements

### Windows

- Python 3.13
- NVIDIA GPU with CUDA 12 drivers
- [PortAudio](http://www.portaudio.com/) (bundled with most Python sounddevice wheels)
- `pywin32` (for system tray and keyboard injection)
- `pyinstaller` (for building a standalone executable)

### Linux

**System packages (install via package manager):**

Arch/CachyOS:
```bash
sudo pacman -S tk libayatana-appindicator wl-clipboard xdotool
```

Debian/Ubuntu:
```bash
sudo apt install python3-tk libayatana-appindicator3-1 wl-clipboard xdotool
```

| Package | Purpose |
|---------|---------|
| `tk` | tkinter GUI (settings, log, vocabulary windows) |
| `libayatana-appindicator` | System tray icon (required for KDE/GNOME on Wayland) |
| `wl-clipboard` | Text injection on Wayland (`wl-copy`) |
| `xdotool` | Simulates Ctrl+V paste on Wayland, text typing on X11 |

**Optional (for GPU acceleration):**

Arch/CachyOS:
```bash
sudo pacman -S nvidia cuda
```

Without CUDA, the app runs on CPU. Use `int8` compute type and a smaller model (`small` or `base`) for acceptable speed on CPU.

**Python:**
- Python 3.10+
- PortAudio (bundled with `sounddevice` wheels)

## Installation

### Windows

```bat
install.bat
```

This creates a `.venv-windows` virtual environment and installs all dependencies, including the CUDA 12 DLLs required by faster-whisper.

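After installation, it can be useful to confirm that the CUDA runtime is actually visible from the new virtual environment. The following is a minimal sketch, not part of the project: `count_cuda_devices` and `suggest_settings` are hypothetical helper names, and the check assumes that `ctranslate2` (the inference backend faster-whisper builds on) is importable from the environment. The suggested values simply mirror the defaults and the CPU advice given in this README.

```python
def count_cuda_devices() -> int:
    """Return the number of CUDA devices CTranslate2 reports, or 0 if unavailable."""
    try:
        import ctranslate2  # assumption: installed as a faster-whisper dependency
    except ImportError:
        return 0
    try:
        return ctranslate2.get_cuda_device_count()
    except Exception:
        # Treat any runtime/driver problem as "no usable GPU".
        return 0


def suggest_settings(cuda_devices: int) -> dict:
    """Map the device count to the settings this README recommends (hypothetical helper)."""
    if cuda_devices > 0:
        return {"device": "cuda", "compute_type": "float16", "model": "medium"}
    # CPU fallback: int8 and a smaller model for acceptable speed.
    return {"device": "cpu", "compute_type": "int8", "model": "small"}


if __name__ == "__main__":
    print(suggest_settings(count_cuda_devices()))
```

Run it with the environment's interpreter, for example `.venv-windows\Scripts\python.exe`, after saving the snippet to a file of your choice.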
### Linux

```bash
chmod +x install.sh start.sh build-linux.sh
./install.sh
```

Creates a `.venv-linux` virtual environment with all dependencies and PyInstaller.

## Usage

### Windows

```bat
start.bat
```

### Linux

```bash
./start.sh
```

The app starts in the system tray. Hold the hotkey (default: `Ctrl+Shift+Space`) to record, release to transcribe and type into the active window.

## Build

Builds are platform-specific and output to separate directories:
- Windows: `dist/whisper-dictation-windows/`
- Linux: `dist/whisper-dictation-linux/`

### Windows

```bat
.venv-windows\Scripts\python.exe build.py
```

### Linux

```bash
./build-linux.sh
```

Both use PyInstaller to bundle the app into a standalone folder. The resulting executable can be run without a Python installation.

## Configuration

### Shared config (`config.json`, in app directory)

| Key | Default | Description |
|-----|---------|-------------|
| `hotkey` | `ctrl+shift+space` | Recording trigger |
| `language` | `de` | Transcription language (`de`, `en`, `fr`, `es`, `it`, `null` = auto) |
| `sample_rate` | `16000` | Audio sample rate in Hz |
| `vocab_path` | `""` | Path to vocabulary file (empty = local `vocabulary.json`) |
| `model_dir` | `""` | Path to shared model cache directory (empty = default HuggingFace cache) |

### Local config (`config_local.json`, per machine)

Stored outside the app directory to keep machine-specific settings separate:
- **Windows:** `%LOCALAPPDATA%\WhisperDictation\config_local.json`
- **Linux:** `~/.local/share/WhisperDictation/config_local.json`

| Key | Default | Description |
|-----|---------|-------------|
| `model` | `medium` | Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`) |
| `device` | `cuda` | Inference device (`cuda` or `cpu`) |
| `compute_type` | `float16` | Precision (`float16` for GPU, `int8` for CPU, `float32`) |
| `audio_device` | `null` | Microphone (null = system default) |

### Sharing data between Windows and Linux

On a shared drive
(e.g. Ventoy USB), both builds can use the same vocabulary and model files. Set `vocab_path` and `model_dir` in the Settings UI to point to a common directory:

```
shared_data/
  vocabulary.json    <- shared vocabulary
  models/            <- shared Whisper model cache
```

Audio settings, model selection, and compute type remain per-platform in `config_local.json`.

## Vocabulary

Custom vocabulary/replacements can be edited via the Settings UI or directly in `vocabulary.json`. Words are passed as initial prompts to improve recognition of domain-specific terms. Replacements are applied as find/replace after transcription.

## Model Download

On first start the selected Whisper model is downloaded automatically from HuggingFace (~500 MB for `medium`). Subsequent starts use the cached model. Set `model_dir` to share the cache between builds.
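
To illustrate how `model_dir` and the per-machine settings could fit together, here is a minimal sketch under stated assumptions. The helper names `load_with_defaults` and `model_kwargs` are hypothetical and do not come from this project; the default values are copied from the configuration tables above. The resulting dictionary uses the parameter names of faster-whisper's `WhisperModel` constructor (`model_size_or_path`, `device`, `compute_type`, `download_root`), but how the app actually wires its settings may differ.

```python
import json
from pathlib import Path

# Defaults copied from the configuration tables in this README.
SHARED_DEFAULTS = {
    "hotkey": "ctrl+shift+space",
    "language": "de",
    "sample_rate": 16000,
    "vocab_path": "",
    "model_dir": "",
}
LOCAL_DEFAULTS = {
    "model": "medium",
    "device": "cuda",
    "compute_type": "float16",
    "audio_device": None,
}


def load_with_defaults(path, defaults):
    """Overlay a JSON config file on the defaults; a missing file yields the defaults."""
    merged = dict(defaults)
    config_file = Path(path)
    if config_file.is_file():
        merged.update(json.loads(config_file.read_text(encoding="utf-8")))
    return merged


def model_kwargs(shared, local):
    """Build keyword arguments for faster_whisper.WhisperModel (hypothetical helper)."""
    kwargs = {
        "model_size_or_path": local["model"],
        "device": local["device"],
        "compute_type": local["compute_type"],
    }
    # An empty model_dir means: leave the default HuggingFace cache in place.
    if shared.get("model_dir"):
        kwargs["download_root"] = shared["model_dir"]
    return kwargs
```

With faster-whisper installed, the result could be passed on as `WhisperModel(**model_kwargs(shared, local))`. Pointing `model_dir` at the same directory in both builds would then let Windows and Linux reuse one downloaded model.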