Whisper Dictation
Local GPU speech-to-text dictation tool. Hold a hotkey to record, release to transcribe and type the result into the active window. Runs fully offline — no cloud, no API key.
Features
- System tray icon with settings GUI (tkinter)
- Configurable hotkey, model, language, audio device
- Cross-platform: Windows and Linux builds from a single codebase
- Shared config via git (config.json, vocabulary.json)
- Machine-specific settings stored locally (audio device, GPU settings, model)
- Configurable shared paths for vocabulary and model cache (useful for dual-boot setups)
Requirements
Windows
- Python 3.13
- NVIDIA GPU with CUDA 12 drivers
- PortAudio (bundled with most Python sounddevice wheels)
- pywin32 (for system tray and keyboard injection)
- pyinstaller (for building a standalone executable)
Linux
System packages (install via package manager):
Arch/CachyOS:
sudo pacman -S tk libayatana-appindicator wl-clipboard xdotool
Debian/Ubuntu:
sudo apt install python3-tk libayatana-appindicator3-1 wl-clipboard xdotool
| Package | Purpose |
|---|---|
| tk | tkinter GUI (settings, log, vocabulary windows) |
| libayatana-appindicator | System tray icon (required for KDE/GNOME on Wayland) |
| wl-clipboard | Text injection on Wayland (wl-copy) |
| xdotool | Simulates Ctrl+V paste on Wayland, text typing on X11 |
Optional (for GPU acceleration):
Arch/CachyOS:
sudo pacman -S nvidia cuda
Without CUDA, the app runs on CPU. Use int8 compute type and a smaller model (small or base) for acceptable speed on CPU.
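The CPU fallback described above could be expressed as a small helper. This is a minimal sketch, not the app's actual code: the function name `pick_inference_settings` and the choice of `small` as the CPU fallback model are assumptions for illustration. The returned keys mirror the model, device and compute_type settings of the local config.

```python
def pick_inference_settings(cuda_available: bool, preferred_model: str = "medium") -> dict:
    """Return device, compute type and model for this machine (hypothetical helper)."""
    if cuda_available:
        # GPU path: float16 is the precision the README lists for GPU use.
        return {"device": "cuda", "compute_type": "float16", "model": preferred_model}

    # CPU path: int8 plus a smaller model keeps transcription speed acceptable.
    cpu_friendly = {"tiny", "base", "small"}
    model = preferred_model if preferred_model in cpu_friendly else "small"
    return {"device": "cpu", "compute_type": "int8", "model": model}


if __name__ == "__main__":
    print(pick_inference_settings(cuda_available=False))
```

How CUDA availability is detected is left open here; the caller passes it in as a boolean.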
Python:
- Python 3.10+
- PortAudio (bundled with sounddevice wheels)
Installation
Windows
install.bat
This creates a .venv-windows virtual environment, installs all dependencies and the CUDA 12 DLLs required by faster-whisper.
Linux
chmod +x install.sh start.sh build-linux.sh
./install.sh
Creates a .venv-linux virtual environment with all dependencies and PyInstaller.
Usage
Windows
start.bat
Linux
./start.sh
The app starts in the system tray. Hold the hotkey (default: Ctrl+Shift+Space) to record, release to transcribe and type into the active window.
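The hold-to-record, release-to-transcribe flow can be pictured as a small state machine. The sketch below is illustrative only: the class `PushToTalk` and its callbacks are hypothetical names, and the real app's hotkey handling, audio capture and text injection are not shown.

```python
class PushToTalk:
    """Collects audio chunks while the hotkey is held (hypothetical helper)."""

    def __init__(self, transcribe, type_text):
        self._transcribe = transcribe  # callable: list of chunks -> str
        self._type_text = type_text    # callable: str -> None
        self._chunks = []
        self.recording = False

    def on_press(self):
        # Ignore key auto-repeat events while the hotkey is already held.
        if not self.recording:
            self._chunks = []
            self.recording = True

    def on_audio(self, chunk):
        # Audio arriving outside a recording window is dropped.
        if self.recording:
            self._chunks.append(chunk)

    def on_release(self):
        if not self.recording:
            return
        self.recording = False
        text = self._transcribe(self._chunks)
        if text:
            self._type_text(text)


if __name__ == "__main__":
    # Stand-in callables: join string "chunks" and print instead of typing.
    ptt = PushToTalk(lambda chunks: " ".join(chunks), print)
    ptt.on_press()
    ptt.on_audio("hello")
    ptt.on_audio("world")
    ptt.on_release()
```

Passing the transcriber and the typing function in as callables keeps the state handling testable without a microphone or GPU.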
Build
Builds are platform-specific and output to separate directories:
- Windows: dist/whisper-dictation-windows/
- Linux: dist/whisper-dictation-linux/
Windows
.venv-windows\Scripts\python.exe build.py
Linux
./build-linux.sh
Both use PyInstaller to bundle the app into a standalone folder. The resulting executable can be run without a Python installation.
Configuration
Shared config (config.json, in app directory)
| Key | Default | Description |
|---|---|---|
| hotkey | ctrl+shift+space | Recording trigger |
| language | de | Transcription language (de, en, fr, es, it, null = auto) |
| sample_rate | 16000 | Audio sample rate in Hz |
| vocab_path | "" | Path to vocabulary file (empty = local vocabulary.json) |
| model_dir | "" | Path to shared model cache directory (empty = default HuggingFace cache) |
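To illustrate how these keys and defaults might be consumed, here is a minimal sketch that overlays config.json on the documented defaults and resolves an empty vocab_path to the local vocabulary.json. The function names `load_shared_config` and `resolve_vocab_path` are assumptions; the app's own loader may differ.

```python
import json
from pathlib import Path

# Defaults as listed in the shared config table.
SHARED_DEFAULTS = {
    "hotkey": "ctrl+shift+space",
    "language": "de",
    "sample_rate": 16000,
    "vocab_path": "",
    "model_dir": "",
}


def load_shared_config(app_dir: Path) -> dict:
    """Overlay config.json (if present) on the defaults (hypothetical helper)."""
    config = dict(SHARED_DEFAULTS)
    path = app_dir / "config.json"
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


def resolve_vocab_path(config: dict, app_dir: Path) -> Path:
    """An empty vocab_path means the vocabulary.json next to the app."""
    if config["vocab_path"]:
        return Path(config["vocab_path"])
    return app_dir / "vocabulary.json"
```

A JSON `null` for language loads as Python `None`, which corresponds to the auto-detect setting in the table.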
Local config (config_local.json, per machine)
Stored outside the app directory to keep machine-specific settings separate:
- Windows: %LOCALAPPDATA%\WhisperDictation\config_local.json
- Linux: ~/.local/share/WhisperDictation/config_local.json
| Key | Default | Description |
|---|---|---|
| model | medium | Whisper model size (tiny, base, small, medium, large-v2, large-v3) |
| device | cuda | Inference device (cuda or cpu) |
| compute_type | float16 | Precision (float16 for GPU, int8 for CPU, float32) |
| audio_device | null | Microphone (null = system default) |
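The per-platform location of config_local.json could be resolved along these lines. This is a sketch under assumptions: `local_config_path` is a hypothetical name, and the `platform` and `env` parameters exist only to make the function easy to exercise on any OS.

```python
import os
import sys
from pathlib import Path

# Defaults as listed in the local config table.
LOCAL_DEFAULTS = {
    "model": "medium",
    "device": "cuda",
    "compute_type": "float16",
    "audio_device": None,  # None = system default microphone
}


def local_config_path(platform: str = sys.platform, env=None) -> Path:
    """Return where config_local.json lives on this platform (hypothetical helper)."""
    env = os.environ if env is None else env
    if platform.startswith("win"):
        # Windows: %LOCALAPPDATA%\WhisperDictation\config_local.json
        base = Path(env["LOCALAPPDATA"])
    else:
        # Linux: ~/.local/share/WhisperDictation/config_local.json
        base = Path.home() / ".local" / "share"
    return base / "WhisperDictation" / "config_local.json"


if __name__ == "__main__":
    print(local_config_path())
```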
Sharing data between Windows and Linux
On a shared drive (e.g. Ventoy USB), both builds can use the same vocabulary and model files. Set vocab_path and model_dir in the Settings UI to point to a common directory:
shared_data/
vocabulary.json <- shared vocabulary
models/ <- shared Whisper model cache
Audio settings, model selection, and compute type remain per-platform in config_local.json.
Vocabulary
Custom vocabulary and replacements can be edited via the Settings UI or directly in vocabulary.json. Vocabulary words are passed to the model as an initial prompt to improve recognition of domain-specific terms. Replacements are applied as find/replace after transcription.
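The two vocabulary mechanisms can be sketched as follows. The helper names and the data shapes (a list of words, a dict of find/replace pairs) are assumptions; the actual layout of vocabulary.json is not documented here.

```python
def build_initial_prompt(words):
    """Join vocabulary words into one prompt string (hypothetical helper).

    faster-whisper's transcribe() accepts such a string via its
    initial_prompt argument.
    """
    return ", ".join(words)


def apply_replacements(text, replacements):
    """Apply plain find/replace pairs to the transcribed text, in dict order."""
    for find, replace in replacements.items():
        text = text.replace(find, replace)
    return text


if __name__ == "__main__":
    prompt = build_initial_prompt(["PyInstaller", "CachyOS"])
    fixed = apply_replacements("pie installer builds", {"pie installer": "PyInstaller"})
    print(prompt)
    print(fixed)
```

Because replacements run in order, a later pair can act on the output of an earlier one, so overlapping entries are best avoided.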
Model Download
On first start the selected Whisper model is downloaded automatically from HuggingFace (~500 MB for medium). Subsequent starts use the cached model. Set model_dir to share the cache between builds.
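As a rough illustration of how model_dir could feed into model loading, the sketch below builds keyword arguments for faster-whisper's `WhisperModel`, whose `download_root` parameter controls where models are cached. The helper name `whisper_model_kwargs` is hypothetical, and whether the app wires the setting exactly this way is an assumption.

```python
def whisper_model_kwargs(local_cfg: dict, shared_cfg: dict) -> dict:
    """Map config values to WhisperModel keyword arguments (hypothetical helper)."""
    return {
        "model_size_or_path": local_cfg["model"],
        "device": local_cfg["device"],
        "compute_type": local_cfg["compute_type"],
        # An empty model_dir means: use the default HuggingFace cache.
        "download_root": shared_cfg.get("model_dir") or None,
    }


if __name__ == "__main__":
    kwargs = whisper_model_kwargs(
        {"model": "medium", "device": "cuda", "compute_type": "float16"},
        {"model_dir": ""},
    )
    print(kwargs)
    # With faster-whisper installed, the model would then be created with:
    #   from faster_whisper import WhisperModel
    #   model = WhisperModel(**kwargs)
```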