Christian Kauer a86f44c342 upd		2026-03-26 20:33:18 +01:00
.claude	upd	2026-03-26 20:33:18 +01:00
docs/superpowers	dev app version for win and linux	2026-03-22 08:31:05 +01:00
shared_data	upd	2026-03-26 10:05:14 +01:00
tests	dev app version for win and linux	2026-03-22 08:31:05 +01:00
whisper_app	upd	2026-03-26 20:33:18 +01:00
.gitignore	fix linux version	2026-03-22 11:01:14 +01:00
README.md	fix linux	2026-03-22 13:12:23 +01:00
build-linux.sh	added linux build	2026-03-22 09:20:34 +01:00
build.py	added linux build	2026-03-22 09:20:34 +01:00
config.json	upd dict	2026-03-26 09:16:33 +01:00
icon.png	added linux build	2026-03-22 09:20:34 +01:00
install.bat	dev app version for win and linux	2026-03-22 08:31:05 +01:00
install.sh	fix linux version	2026-03-22 12:57:17 +01:00
main.py	upd	2026-03-26 20:33:18 +01:00
requirements-cuda.txt	feat: add Linux support, replace keyboard lib with pynput	2026-03-19 19:13:26 +01:00
requirements.txt	feat: add Linux support, replace keyboard lib with pynput	2026-03-19 19:13:26 +01:00
start.bat	feat: convert dictate.py to modular GUI app with PyInstaller build	2026-03-20 13:21:01 +01:00
start.sh	added linux build	2026-03-22 09:20:34 +01:00
vocabulary.json	upd voc	2026-03-20 10:31:06 +01:00
whisper-dictation.spec	added grammar check	2026-03-23 14:54:17 +01:00

README.md

Whisper Dictation

Local GPU speech-to-text dictation tool. Hold a hotkey to record, release to transcribe and type the result into the active window. Runs fully offline — no cloud, no API key.

Features

System tray icon with settings GUI (tkinter)
Configurable hotkey, model, language, audio device
Cross-platform: Windows and Linux builds from a single codebase
Shared config via git (config.json, vocabulary.json)
Machine-specific settings stored locally (audio device, GPU settings, model)
Configurable shared paths for vocabulary and model cache (useful for dual-boot setups)

Requirements

Windows

Python 3.13
NVIDIA GPU with CUDA 12 drivers
PortAudio (bundled with most Python sounddevice wheels)
pywin32 (for system tray and keyboard injection)
pyinstaller (for building a standalone executable)

Linux

System packages (install via package manager):

Arch/CachyOS:

sudo pacman -S tk libayatana-appindicator wl-clipboard xdotool

Debian/Ubuntu:

sudo apt install python3-tk libayatana-appindicator3-1 wl-clipboard xdotool

| Package | Purpose |
| --- | --- |
| `tk` | tkinter GUI (settings, log, vocabulary windows) |
| `libayatana-appindicator` | System tray icon (required for KDE/GNOME on Wayland) |
| `wl-clipboard` | Text injection on Wayland (`wl-copy`) |
| `xdotool` | Simulates Ctrl+V paste on Wayland, text typing on X11 |
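The split between `wl-copy` plus a simulated paste on Wayland and direct typing on X11 can be sketched as follows. This is a minimal illustration, not the app's actual injection code: the helper names `detect_session` and `injection_commands` are hypothetical, and the session detection relies on the common `XDG_SESSION_TYPE` and `WAYLAND_DISPLAY` environment variables.

```python
import os


def detect_session(env=None):
    """Return 'wayland' or 'x11' based on common session environment variables."""
    env = os.environ if env is None else env
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland" or env.get("WAYLAND_DISPLAY"):
        return "wayland"
    return "x11"


def injection_commands(text, session):
    """Build (argv, stdin_text) pairs that would place `text` in the active window.

    Hypothetical helper: the commands are only constructed here, not executed.
    """
    if session == "wayland":
        return [
            (["wl-copy"], text),                    # copy the text to the clipboard via stdin
            (["xdotool", "key", "ctrl+v"], None),   # simulate the paste shortcut
        ]
    # X11: xdotool can type the text directly; "--" ends option parsing
    return [(["xdotool", "type", "--", text], None)]
```

Each pair could then be run with `subprocess.run(argv, input=stdin_text, text=True, check=True)`.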

Low-latency hotkey (recommended):

For fast hotkey response via evdev (instead of the slower XWayland fallback), add your user to the input group:

sudo usermod -aG input $USER

Log out and back in for the change to take effect.

Optional (for GPU acceleration):

Arch/CachyOS:

sudo pacman -S nvidia cuda

Note: The system CUDA package may install a newer version (e.g. CUDA 13) than what faster-whisper/ctranslate2 requires (CUDA 12). The CUDA 12 runtime libraries (nvidia-cublas-cu12, nvidia-cudnn-cu12) are installed via pip in the virtual environment and bundled into the PyInstaller build, so the system CUDA version does not matter for the app itself. The system nvidia + cuda packages are only needed for the GPU driver and kernel module.

Without an NVIDIA GPU, the app runs on CPU. Use int8 compute type and a smaller model (small or base) for acceptable speed on CPU.
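The CPU recommendation above could be expressed as a small fallback rule. This is a sketch under assumptions: the function name `choose_inference_settings` is hypothetical, and falling back to `small` for larger models is one possible policy, not necessarily what the app does.

```python
CPU_FRIENDLY_MODELS = {"tiny", "base", "small"}


def choose_inference_settings(has_cuda_gpu, preferred_model="medium"):
    """Pick device, compute type, and model following the README's guidance."""
    if has_cuda_gpu:
        return {"device": "cuda", "compute_type": "float16", "model": preferred_model}
    # On CPU, use int8 and cap the model size (assumed policy: fall back to "small")
    model = preferred_model if preferred_model in CPU_FRIENDLY_MODELS else "small"
    return {"device": "cpu", "compute_type": "int8", "model": model}
```

The returned keys match the `config_local.json` settings `device`, `compute_type`, and `model`.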

Python:

Python 3.10+
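PortAudio (the Linux sounddevice wheels do not bundle it; install it from your distribution, e.g. portaudio on Arch or libportaudio2 on Debian/Ubuntu)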

Installation

Windows

install.bat

This creates a .venv-windows virtual environment and installs all dependencies, including the CUDA 12 DLLs required by faster-whisper.

Linux

chmod +x install.sh start.sh build-linux.sh
./install.sh

Creates a .venv-linux virtual environment with all dependencies and PyInstaller.

Usage

Windows

start.bat

Linux

./start.sh

The app starts in the system tray. Hold the hotkey (default: Ctrl+Shift+Space) to record, release to transcribe and type into the active window.
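The hold-to-record behaviour can be modelled as a small state machine that is independent of the keyboard backend. The sketch below is illustrative only: the `HoldToRecord` class and `parse_hotkey` helper are hypothetical names, and key names are assumed to arrive as lowercase strings from whichever listener (pynput, evdev) feeds it.

```python
def parse_hotkey(spec):
    """Turn a string like 'ctrl+shift+space' into a set of key names."""
    return frozenset(part.strip().lower() for part in spec.split("+"))


class HoldToRecord:
    """Call on_start when the full combo is held and on_stop when it is released."""

    def __init__(self, combo, on_start, on_stop):
        self.combo = frozenset(combo)
        self.on_start = on_start
        self.on_stop = on_stop
        self.pressed = set()
        self.recording = False

    def key_down(self, key):
        self.pressed.add(key)
        # Start once when every key of the combo is down; ignore auto-repeat
        if not self.recording and self.combo <= self.pressed:
            self.recording = True
            self.on_start()

    def key_up(self, key):
        self.pressed.discard(key)
        # Releasing any key of the combo ends the recording
        if self.recording and key in self.combo:
            self.recording = False
            self.on_stop()
```

In this model `on_start` would begin audio capture and `on_stop` would hand the buffer to the transcriber.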

Build

Builds are platform-specific and output to separate directories:

Windows: dist/whisper-dictation-windows/
Linux: dist/whisper-dictation-linux/

Windows

.venv-windows\Scripts\python.exe build.py

Linux

./build-linux.sh

Both use PyInstaller to bundle the app into a standalone folder. The resulting executable can be run without a Python installation.

Configuration

Shared config (`config.json`, in app directory)

| Key | Default | Description |
| --- | --- | --- |
| `hotkey` | `ctrl+shift+space` | Recording trigger |
| `language` | `de` | Transcription language (`de`, `en`, `fr`, `es`, `it`, `null` = auto) |
| `sample_rate` | `16000` | Audio sample rate in Hz |
| `vocab_path` | `""` | Path to vocabulary file (empty = local `vocabulary.json`) |
| `model_dir` | `""` | Path to shared model cache directory (empty = default HuggingFace cache) |
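One way to read these keys, with the documented defaults and the "empty means fall back" rule for `vocab_path` and `model_dir`, is sketched below. The function names are hypothetical and the app's own loader may be structured differently.

```python
import json
from pathlib import Path

# Defaults taken from the table above
SHARED_DEFAULTS = {
    "hotkey": "ctrl+shift+space",
    "language": "de",
    "sample_rate": 16000,
    "vocab_path": "",
    "model_dir": "",
}


def load_shared_config(app_dir):
    """Merge config.json from the app directory over the defaults."""
    config = dict(SHARED_DEFAULTS)
    config_file = Path(app_dir) / "config.json"
    if config_file.is_file():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    return config


def resolve_vocab_path(config, app_dir):
    """An empty vocab_path means the local vocabulary.json in the app directory."""
    if config["vocab_path"]:
        return Path(config["vocab_path"])
    return Path(app_dir) / "vocabulary.json"


def resolve_model_dir(config):
    """An empty model_dir becomes None, meaning the default HuggingFace cache."""
    return config["model_dir"] or None
```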

Local config (`config_local.json`, per machine)

Stored outside the app directory to keep machine-specific settings separate:

Windows: %LOCALAPPDATA%\WhisperDictation\config_local.json
Linux: ~/.local/share/WhisperDictation/config_local.json

| Key | Default | Description |
| --- | --- | --- |
| `model` | `medium` | Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`) |
| `device` | `cuda` | Inference device (`cuda` or `cpu`) |
| `compute_type` | `float16` | Precision (`float16` for GPU, `int8` for CPU, `float32`) |
| `audio_device` | `null` | Microphone (null = system default) |
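The two per-platform locations of `config_local.json` could be resolved roughly like this. It is a sketch, not the app's code: `local_config_path` is a hypothetical name, and the fallback to `AppData/Local` under the home directory when `LOCALAPPDATA` is unset is an assumption.

```python
import os
import sys
from pathlib import Path


def local_config_path(platform=None, env=None, home=None):
    """Return the per-machine config_local.json path for the given platform."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        # %LOCALAPPDATA%\WhisperDictation (fallback path is an assumption)
        base = Path(env.get("LOCALAPPDATA", home / "AppData" / "Local"))
    else:
        # ~/.local/share/WhisperDictation
        base = home / ".local" / "share"
    return base / "WhisperDictation" / "config_local.json"
```

The optional parameters exist only so the function can be exercised for either platform from one machine; calling it without arguments uses the current system.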

Sharing data between Windows and Linux

On a shared drive (e.g. a Ventoy USB stick), both builds can use the same vocabulary and model files. Set vocab_path and model_dir in the Settings UI to point to a common directory:

shared_data/
  vocabulary.json    <- shared vocabulary
  models/            <- shared Whisper model cache

Audio settings, model selection, and compute type remain per-platform in config_local.json.

Vocabulary

Custom vocabulary and replacements can be edited via the Settings UI or directly in vocabulary.json. Vocabulary words are passed to Whisper as an initial prompt to improve recognition of domain-specific terms. Replacements are applied as find/replace on the text after transcription.
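The two mechanisms can be sketched as follows. The layout of `vocabulary.json` is not documented here, so the `words` list and `replacements` mapping used in the example are assumptions, as are the helper names. The prompt string would be handed to faster-whisper through the `initial_prompt` argument of `WhisperModel.transcribe`.

```python
def build_initial_prompt(words):
    """Join vocabulary words into one prompt string, skipping blank entries."""
    return ", ".join(w.strip() for w in words if w.strip())


def apply_replacements(text, replacements):
    """Apply each find/replace pair to the transcribed text, in order."""
    for find, replace in replacements.items():
        text = text.replace(find, replace)
    return text


# Assumed structure, for illustration only
vocab = {
    "words": ["Kubernetes", "PyInstaller"],
    "replacements": {"cuber netties": "Kubernetes"},
}
prompt = build_initial_prompt(vocab["words"])
fixed = apply_replacements("deploy to cuber netties", vocab["replacements"])
```

Plain `str.replace` is case-sensitive and matches substrings anywhere; a real implementation might prefer word-boundary or case-insensitive matching.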

Model Download

On first start, the selected Whisper model is downloaded automatically from HuggingFace (about 1.5 GB for medium). Subsequent starts use the cached model. Set model_dir to share the cache between builds.
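How the configuration might map onto faster-whisper's model loader is sketched below. The helper names are hypothetical; `WhisperModel` and its `device`, `compute_type`, and `download_root` parameters are part of faster-whisper, but whether the app passes `model_dir` as `download_root` is an assumption.

```python
def whisper_model_args(local_cfg, shared_cfg):
    """Translate the config dictionaries into a model name and loader keyword arguments."""
    kwargs = {
        "device": local_cfg.get("device", "cuda"),
        "compute_type": local_cfg.get("compute_type", "float16"),
    }
    # Only override the cache location when model_dir is set (empty = default cache)
    if shared_cfg.get("model_dir"):
        kwargs["download_root"] = shared_cfg["model_dir"]
    return local_cfg.get("model", "medium"), kwargs


def load_model(local_cfg, shared_cfg):
    """Load the model; it is downloaded on first use and read from the cache afterwards."""
    # Imported lazily so whisper_model_args works without faster-whisper installed
    from faster_whisper import WhisperModel

    name, kwargs = whisper_model_args(local_cfg, shared_cfg)
    return WhisperModel(name, **kwargs)
```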
