Christian Kauer a86f44c342 upd 2026-03-26 20:33:18 +01:00
.claude upd 2026-03-26 20:33:18 +01:00
docs/superpowers dev app version for win and linux 2026-03-22 08:31:05 +01:00
shared_data upd 2026-03-26 10:05:14 +01:00
tests dev app version for win and linux 2026-03-22 08:31:05 +01:00
whisper_app upd 2026-03-26 20:33:18 +01:00
.gitignore fix linux version 2026-03-22 11:01:14 +01:00
README.md fix linux 2026-03-22 13:12:23 +01:00
build-linux.sh added linux build 2026-03-22 09:20:34 +01:00
build.py added linux build 2026-03-22 09:20:34 +01:00
config.json upd dict 2026-03-26 09:16:33 +01:00
icon.png added linux build 2026-03-22 09:20:34 +01:00
install.bat dev app version for win and linux 2026-03-22 08:31:05 +01:00
install.sh fix linux version 2026-03-22 12:57:17 +01:00
main.py upd 2026-03-26 20:33:18 +01:00
requirements-cuda.txt feat: add Linux support, replace keyboard lib with pynput 2026-03-19 19:13:26 +01:00
requirements.txt feat: add Linux support, replace keyboard lib with pynput 2026-03-19 19:13:26 +01:00
start.bat feat: convert dictate.py to modular GUI app with PyInstaller build 2026-03-20 13:21:01 +01:00
start.sh added linux build 2026-03-22 09:20:34 +01:00
vocabulary.json upd voc 2026-03-20 10:31:06 +01:00
whisper-dictation.spec added grammar check 2026-03-23 14:54:17 +01:00

README.md

Whisper Dictation

Local GPU speech-to-text dictation tool. Hold a hotkey to record, release to transcribe and type the result into the active window. Runs fully offline — no cloud, no API key.

Features

  • System tray icon with settings GUI (tkinter)
  • Configurable hotkey, model, language, audio device
  • Cross-platform: Windows and Linux builds from a single codebase
  • Shared config via git (config.json, vocabulary.json)
  • Machine-specific settings stored locally (audio device, GPU settings, model)
  • Configurable shared paths for vocabulary and model cache (useful for dual-boot setups)

Requirements

Windows

  • Python 3.13
  • NVIDIA GPU with CUDA 12 drivers
  • PortAudio (bundled with most Python sounddevice wheels)
  • pywin32 (for system tray and keyboard injection)
  • pyinstaller (for building a standalone executable)

Linux

System packages (install via package manager):

Arch/CachyOS:

sudo pacman -S tk libayatana-appindicator wl-clipboard xdotool

Debian/Ubuntu:

sudo apt install python3-tk libayatana-appindicator3-1 wl-clipboard xdotool

Package                   Purpose
tk                        tkinter GUI (settings, log, vocabulary windows)
libayatana-appindicator   System tray icon (required for KDE/GNOME on Wayland)
wl-clipboard              Text injection on Wayland (wl-copy)
xdotool                   Simulates Ctrl+V paste on Wayland, text typing on X11
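
The division of labour between wl-clipboard and xdotool in the table above can be sketched as follows. This is a minimal illustration, not the app's actual code: the function names (detect_session_type, plan_injection, inject) are hypothetical, and session detection via XDG_SESSION_TYPE / WAYLAND_DISPLAY is an assumption.

```python
import os
import subprocess


def detect_session_type(env=None):
    """Guess the session type from common environment variables (assumption)."""
    env = os.environ if env is None else env
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland" or env.get("WAYLAND_DISPLAY"):
        return "wayland"
    return "x11"


def plan_injection(text, session_type):
    """Return a list of (argv, stdin_text) steps that would inject `text`."""
    if session_type == "wayland":
        return [
            (["wl-copy"], text),                   # wl-copy reads the text from stdin
            (["xdotool", "key", "ctrl+v"], None),  # then simulate a paste
        ]
    # X11: type the text directly; "--" protects text that starts with "-"
    return [(["xdotool", "type", "--clearmodifiers", "--", text], None)]


def inject(text, session_type=None):
    """Run the planned steps; raises CalledProcessError if a tool fails."""
    for argv, stdin_text in plan_injection(text, session_type or detect_session_type()):
        subprocess.run(argv, input=stdin_text, text=True, check=True)
```

Separating the plan from its execution keeps the platform decision testable without a running display server.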

Low-latency hotkey (recommended):

For fast hotkey response via evdev (instead of the slower XWayland fallback), add your user to the input group:

sudo usermod -aG input $USER

Log out and back in for the change to take effect.

Optional (for GPU acceleration):

Arch/CachyOS:

sudo pacman -S nvidia cuda

Note: The system CUDA package may install a newer version (e.g. CUDA 13) than what faster-whisper/ctranslate2 requires (CUDA 12). The CUDA 12 runtime libraries (nvidia-cublas-cu12, nvidia-cudnn-cu12) are installed via pip in the virtual environment and bundled into the PyInstaller build, so the system CUDA version does not matter for the app itself. The system nvidia + cuda packages are only needed for the GPU driver and kernel module.

Without an NVIDIA GPU, the app runs on CPU. Use int8 compute type and a smaller model (small or base) for acceptable speed on CPU.
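
The CPU recommendation above could be expressed as a small fallback rule. This is a sketch under assumptions: the helper names are hypothetical, and the app may choose its settings differently. ctranslate2.get_cuda_device_count() is used to probe for a usable GPU when ctranslate2 is installed.

```python
CPU_FRIENDLY_MODELS = ("tiny", "base", "small")


def cuda_available():
    """Return True if ctranslate2 reports at least one CUDA device."""
    try:
        import ctranslate2
        return ctranslate2.get_cuda_device_count() > 0
    except Exception:
        # ctranslate2 missing or the CUDA runtime could not be loaded
        return False


def pick_inference_settings(has_cuda, requested_model="medium"):
    """Return model/device/compute_type values for config_local.json."""
    if has_cuda:
        return {"model": requested_model, "device": "cuda", "compute_type": "float16"}
    # On CPU, fall back to int8 and cap the model size at "small"
    model = requested_model if requested_model in CPU_FRIENDLY_MODELS else "small"
    return {"model": model, "device": "cpu", "compute_type": "int8"}


# Usage with faster-whisper (not executed here):
#   from faster_whisper import WhisperModel
#   s = pick_inference_settings(cuda_available())
#   model = WhisperModel(s["model"], device=s["device"], compute_type=s["compute_type"])
```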

Python:

  • Python 3.10+
  • PortAudio (bundled with sounddevice wheels)

Installation

Windows

install.bat

This creates a .venv-windows virtual environment and installs all dependencies, including the CUDA 12 DLLs required by faster-whisper.

Linux

chmod +x install.sh start.sh build-linux.sh
./install.sh

This creates a .venv-linux virtual environment and installs all dependencies, including PyInstaller.

Usage

Windows

start.bat

Linux

./start.sh

The app starts in the system tray. Hold the hotkey (default: Ctrl+Shift+Space) to record, release to transcribe and type into the active window.
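
The hold-to-record behaviour can be modelled as a small state machine. The sketch below is illustrative only: HoldToRecord is a hypothetical class, keys are plain lowercase strings, and the mapping from a real listener (pynput or evdev) to these names is left out.

```python
class HoldToRecord:
    """Start recording when the whole combo is held, stop when any part is released."""

    def __init__(self, hotkey, on_start, on_stop):
        # "ctrl+shift+space" -> {"ctrl", "shift", "space"}
        self.combo = frozenset(part.strip().lower() for part in hotkey.split("+"))
        self.pressed = set()
        self.recording = False
        self.on_start = on_start
        self.on_stop = on_stop

    def press(self, key):
        self.pressed.add(key.lower())
        # Key repeat sends press events again; only start once
        if not self.recording and self.combo <= self.pressed:
            self.recording = True
            self.on_start()

    def release(self, key):
        key = key.lower()
        self.pressed.discard(key)
        if self.recording and key in self.combo:
            self.recording = False
            self.on_stop()
```

A listener would call press() and release() from its key callbacks, with on_start opening the audio stream and on_stop handing the buffer to the transcriber.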

Build

Builds are platform-specific and output to separate directories:

  • Windows: dist/whisper-dictation-windows/
  • Linux: dist/whisper-dictation-linux/

Windows

.venv-windows\Scripts\python.exe build.py

Linux

./build-linux.sh

Both use PyInstaller to bundle the app into a standalone folder. The resulting executable can be run without a Python installation.

Configuration

Shared config (config.json, in app directory)

Key           Default            Description
hotkey        ctrl+shift+space   Recording trigger
language      de                 Transcription language (de, en, fr, es, it, null = auto)
sample_rate   16000              Audio sample rate in Hz
vocab_path    ""                 Path to vocabulary file (empty = local vocabulary.json)
model_dir     ""                 Path to shared model cache directory (empty = default HuggingFace cache)

Local config (config_local.json, per machine)

Stored outside the app directory to keep machine-specific settings separate:

  • Windows: %LOCALAPPDATA%\WhisperDictation\config_local.json
  • Linux: ~/.local/share/WhisperDictation/config_local.json

Key            Default   Description
model          medium    Whisper model size (tiny, base, small, medium, large-v2, large-v3)
device         cuda      Inference device (cuda or cpu)
compute_type   float16   Precision: float16 (GPU), int8 (CPU), or float32
audio_device   null      Microphone (null = system default)
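
One way to combine the shared and local files is to layer them over the documented defaults. This is a minimal sketch, not the app's loader: the function names are hypothetical, and the precedence (local values over shared values over defaults) is an assumption.

```python
import json
import os
import sys
from pathlib import Path

SHARED_DEFAULTS = {"hotkey": "ctrl+shift+space", "language": "de",
                   "sample_rate": 16000, "vocab_path": "", "model_dir": ""}
LOCAL_DEFAULTS = {"model": "medium", "device": "cuda",
                  "compute_type": "float16", "audio_device": None}


def local_config_path(platform=sys.platform, env=None):
    """Return the per-machine config path documented above."""
    env = os.environ if env is None else env
    if platform.startswith("win"):
        base = Path(env.get("LOCALAPPDATA", str(Path.home() / "AppData" / "Local")))
    else:
        base = Path.home() / ".local" / "share"
    return base / "WhisperDictation" / "config_local.json"


def read_json(path):
    """Read a JSON object; a missing or invalid file yields an empty dict."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def load_config(shared_path, local_path):
    """Merge defaults, config.json, and config_local.json (later wins)."""
    return {**SHARED_DEFAULTS, **LOCAL_DEFAULTS,
            **read_json(shared_path), **read_json(local_path)}
```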

Sharing data between Windows and Linux

On a shared drive (e.g. Ventoy USB), both builds can use the same vocabulary and model files. Set vocab_path and model_dir in the Settings UI to point to a common directory:

shared_data/
  vocabulary.json    <- shared vocabulary
  models/            <- shared Whisper model cache

Audio settings, model selection, and compute type remain per-platform in config_local.json.
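
The "empty means default" rule for vocab_path and model_dir could be resolved as sketched below. The helper names are hypothetical. Passing the result to faster-whisper through the download_root parameter of WhisperModel is an assumption about how the cache directory might be applied.

```python
from pathlib import Path


def resolve_vocab_path(config, app_dir):
    """Empty vocab_path -> vocabulary.json next to the app."""
    configured = config.get("vocab_path") or ""
    if configured:
        return Path(configured).expanduser()
    return Path(app_dir) / "vocabulary.json"


def resolve_model_dir(config):
    """Empty model_dir -> None, i.e. the default HuggingFace cache."""
    configured = config.get("model_dir") or ""
    return str(Path(configured).expanduser()) if configured else None


# Usage with faster-whisper (not executed here):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("medium", download_root=resolve_model_dir(config))
```

On a dual-boot machine the same shared_data directory has a different mount path per OS, so each build would store its own path values.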

Vocabulary

Custom vocabulary and replacements can be edited via the Settings UI or directly in vocabulary.json. Vocabulary words are passed to Whisper as an initial prompt to improve recognition of domain-specific terms. Replacements are applied as find/replace on the transcribed text.
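
The two mechanisms can be sketched as follows. The layout of vocabulary.json is not documented here, so the inputs (a list of words and a dict of replacements) are assumptions, as are the function names and the case-insensitive matching.

```python
import re


def build_initial_prompt(words):
    """Join vocabulary words into one prompt string for the transcriber."""
    return ", ".join(w.strip() for w in words if w.strip())


def apply_replacements(text, replacements):
    """Apply find/replace pairs to the transcript, longest pattern first."""
    for wrong in sorted(replacements, key=len, reverse=True):
        right = replacements[wrong]
        # A function as replacement avoids backslash interpretation in `right`
        text = re.sub(re.escape(wrong), lambda m, r=right: r, text, flags=re.IGNORECASE)
    return text


# Usage with faster-whisper (not executed here):
#   segments, info = model.transcribe(audio, language="de",
#                                     initial_prompt=build_initial_prompt(words))
#   text = apply_replacements("".join(s.text for s in segments), replacements)
```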

Model Download

On first start, the selected Whisper model is downloaded automatically from HuggingFace (roughly 1.5 GB for medium; smaller models are correspondingly smaller). Subsequent starts use the cached model. Set model_dir to share the cache between builds.