Christian Kauer 5aaf8b59ce fix linux version		2026-03-22 11:01:14 +01:00
.claude	fix linux version	2026-03-22 11:01:14 +01:00
docs/superpowers	dev app version for win and linux	2026-03-22 08:31:05 +01:00
shared_data	fix linux version	2026-03-22 11:01:14 +01:00
tests	dev app version for win and linux	2026-03-22 08:31:05 +01:00
whisper_app	fix linux version	2026-03-22 11:01:14 +01:00
.gitignore	fix linux version	2026-03-22 11:01:14 +01:00
README.md	fix linux version	2026-03-22 11:01:14 +01:00
build-linux.sh	added linux build	2026-03-22 09:20:34 +01:00
build.py	added linux build	2026-03-22 09:20:34 +01:00
config.json	fix linux version	2026-03-22 11:01:14 +01:00
icon.png	added linux build	2026-03-22 09:20:34 +01:00
install.bat	dev app version for win and linux	2026-03-22 08:31:05 +01:00
install.sh	added linux build	2026-03-22 09:20:34 +01:00
main.py	fix linux version	2026-03-22 11:01:14 +01:00
requirements-cuda.txt	feat: add Linux support, replace keyboard lib with pynput	2026-03-19 19:13:26 +01:00
requirements.txt	feat: add Linux support, replace keyboard lib with pynput	2026-03-19 19:13:26 +01:00
start.bat	feat: convert dictate.py to modular GUI app with PyInstaller build	2026-03-20 13:21:01 +01:00
start.sh	added linux build	2026-03-22 09:20:34 +01:00
vocabulary.json	upd voc	2026-03-20 10:31:06 +01:00
whisper-dictation.spec	added linux build	2026-03-22 09:20:34 +01:00

README.md

Whisper Dictation

Local GPU speech-to-text dictation tool. Hold a hotkey to record, release to transcribe and type the result into the active window. Runs fully offline once the model has been downloaded: no cloud, no API key.

Features

System tray icon with settings GUI (tkinter)
Configurable hotkey, model, language, audio device
Cross-platform: Windows and Linux builds from a single codebase
Shared config via git (config.json, vocabulary.json)
Machine-specific settings stored locally (audio device, GPU settings, model)
Configurable shared paths for vocabulary and model cache (useful for dual-boot setups)

Requirements

Windows

Python 3.13
NVIDIA GPU with CUDA 12 drivers
PortAudio (bundled with most Python sounddevice wheels)
pywin32 (for system tray and keyboard injection)
pyinstaller (for building a standalone executable)

Linux

System packages (install via package manager):

Arch/CachyOS:

sudo pacman -S tk libayatana-appindicator wl-clipboard xdotool

Debian/Ubuntu:

sudo apt install python3-tk libayatana-appindicator3-1 wl-clipboard xdotool

| Package | Purpose |
| --- | --- |
| `tk` | tkinter GUI (settings, log, vocabulary windows) |
| `libayatana-appindicator` | System tray icon (required for KDE/GNOME on Wayland) |
| `wl-clipboard` | Text injection on Wayland (`wl-copy`) |
| `xdotool` | Simulates Ctrl+V paste on Wayland, text typing on X11 |
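
The table above implies two injection routes on Linux. The following is a minimal sketch of how that choice could look, not the app's actual code: the function names `injection_plan` and `inject` are hypothetical, and reading `XDG_SESSION_TYPE` to detect Wayland is an assumption.

```python
from __future__ import annotations

import os
import subprocess
from typing import Optional


def injection_plan(text: str, session_type: str) -> list[tuple[list[str], Optional[str]]]:
    """Return (argv, stdin_text) pairs that would insert `text` into the active window."""
    if session_type.lower() == "wayland":
        # Wayland: put the text on the clipboard (wl-copy reads stdin),
        # then simulate Ctrl+V with xdotool.
        return [(["wl-copy"], text), (["xdotool", "key", "ctrl+v"], None)]
    # X11: xdotool types the text directly.
    # Text that starts with "-" may need extra care with xdotool's option parsing.
    return [(["xdotool", "type", text], None)]


def inject(text: str) -> None:
    """Run the plan for the current session (assumes the tools are installed)."""
    session = os.environ.get("XDG_SESSION_TYPE", "x11")
    for argv, stdin_text in injection_plan(text, session):
        subprocess.run(argv, input=stdin_text, text=True, check=True)
```

Keeping the plan separate from the `subprocess` call makes the routing testable without a running desktop session.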

Optional (for GPU acceleration):

Arch/CachyOS:

sudo pacman -S nvidia cuda

Without CUDA, the app runs on CPU. Use int8 compute type and a smaller model (small or base) for acceptable speed on CPU.
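
As an illustration of the CPU fallback described above, here is a small sketch that picks settings depending on whether CUDA is available. The helper name `pick_inference_settings` is hypothetical, and the choice to fall back to `small` is only one reading of the recommendation.

```python
from __future__ import annotations

CPU_FRIENDLY_MODELS = ("tiny", "base", "small")


def pick_inference_settings(cuda_available: bool, preferred_model: str = "medium") -> dict:
    """Return model/device/compute_type values in the shape used by config_local.json."""
    if cuda_available:
        return {"model": preferred_model, "device": "cuda", "compute_type": "float16"}
    # CPU fallback: int8 and a smaller model for acceptable speed.
    model = preferred_model if preferred_model in CPU_FRIENDLY_MODELS else "small"
    return {"model": model, "device": "cpu", "compute_type": "int8"}
```

One possible way to obtain `cuda_available` is `ctranslate2.get_cuda_device_count() > 0`, since faster-whisper runs on CTranslate2; whether the app does this is not stated here.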

Python:

Python 3.10+
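PortAudio (the sounddevice wheels for Linux do not bundle it; if it is missing, install it from your package manager, e.g. portaudio on Arch or libportaudio2 on Debian/Ubuntu)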

Installation

Windows

install.bat

This creates a .venv-windows virtual environment and installs all dependencies, including the CUDA 12 DLLs required by faster-whisper.

Linux

chmod +x install.sh start.sh build-linux.sh
./install.sh

Creates a .venv-linux virtual environment with all dependencies and PyInstaller.

Usage

Windows

start.bat

Linux

./start.sh

The app starts in the system tray. Hold the hotkey (default: Ctrl+Shift+Space) to record, release to transcribe and type into the active window.
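
The hold-to-record behaviour can be pictured as a small state machine. The sketch below is illustrative only: `parse_hotkey` and `HoldToRecord` are hypothetical names, and it assumes key events (for example from a pynput listener) have already been normalized to plain names such as `ctrl` or `space`.

```python
from __future__ import annotations


def parse_hotkey(spec: str) -> frozenset[str]:
    """Turn a string like 'ctrl+shift+space' into a set of key names."""
    return frozenset(part.strip().lower() for part in spec.split("+") if part.strip())


class HoldToRecord:
    """Calls on_start when the full combo is held and on_stop when it is released."""

    def __init__(self, hotkey: str, on_start, on_stop):
        self.combo = parse_hotkey(hotkey)
        self.pressed = set()
        self.recording = False
        self.on_start = on_start
        self.on_stop = on_stop

    def key_down(self, key: str) -> None:
        self.pressed.add(key.lower())
        # Start only once, even if the OS sends key-repeat events.
        if not self.recording and self.combo <= self.pressed:
            self.recording = True
            self.on_start()

    def key_up(self, key: str) -> None:
        self.pressed.discard(key.lower())
        # Releasing any key of the combo ends the recording.
        if self.recording and not self.combo <= self.pressed:
            self.recording = False
            self.on_stop()
```

In a real app, `on_start` would begin audio capture and `on_stop` would hand the recording to the transcriber.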

Build

Builds are platform-specific and output to separate directories:

Windows: dist/whisper-dictation-windows/
Linux: dist/whisper-dictation-linux/

Windows

.venv-windows\Scripts\python.exe build.py

Linux

./build-linux.sh

Both use PyInstaller to bundle the app into a standalone folder. The resulting executable can be run without a Python installation.

Configuration

Shared config (`config.json`, in app directory)

| Key | Default | Description |
| --- | --- | --- |
| `hotkey` | `ctrl+shift+space` | Recording trigger |
| `language` | `de` | Transcription language (`de`, `en`, `fr`, `es`, `it`, `null` = auto) |
| `sample_rate` | `16000` | Audio sample rate in Hz |
| `vocab_path` | `""` | Path to vocabulary file (empty = local `vocabulary.json`) |
| `model_dir` | `""` | Path to shared model cache directory (empty = default HuggingFace cache) |
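
A minimal sketch of reading `config.json` with the defaults from the table above. The function name `load_shared_config` is hypothetical, and the real loader may treat missing keys differently.

```python
from __future__ import annotations

import json
from pathlib import Path

# Defaults as listed in the shared config table.
SHARED_DEFAULTS = {
    "hotkey": "ctrl+shift+space",
    "language": "de",
    "sample_rate": 16000,
    "vocab_path": "",
    "model_dir": "",
}


def load_shared_config(path: Path) -> dict:
    """Load config.json and fill in defaults for any missing keys."""
    config = dict(SHARED_DEFAULTS)
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            config.update(json.load(fh))
    return config
```

A JSON `null` for `language` arrives as Python `None`, which matches the "auto" setting in the table.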

Local config (`config_local.json`, per machine)

Stored outside the app directory to keep machine-specific settings separate:

Windows: %LOCALAPPDATA%\WhisperDictation\config_local.json
Linux: ~/.local/share/WhisperDictation/config_local.json

| Key | Default | Description |
| --- | --- | --- |
| `model` | `medium` | Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`) |
| `device` | `cuda` | Inference device (`cuda` or `cpu`) |
| `compute_type` | `float16` | Precision (`float16` for GPU, `int8` for CPU, `float32` for full precision) |
| `audio_device` | `null` | Microphone (`null` = system default) |
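
The per-machine location can be resolved along these lines. This is a sketch under the paths listed above; `local_config_path` is a hypothetical helper, and the parameters exist only to make it testable.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path


def local_config_path(platform=None, env=None, home=None) -> Path:
    """Return the path of config_local.json for the given (or current) platform."""
    platform = platform or sys.platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        # Windows: %LOCALAPPDATA%\WhisperDictation\config_local.json
        base = Path(env["LOCALAPPDATA"])
    else:
        # Linux: ~/.local/share/WhisperDictation/config_local.json
        base = home / ".local" / "share"
    return base / "WhisperDictation" / "config_local.json"
```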

Sharing data between Windows and Linux

On a shared drive (e.g. a Ventoy USB stick), both builds can use the same vocabulary and model files. Set vocab_path and model_dir in the Settings UI to point to a common directory:

shared_data/
  vocabulary.json    <- shared vocabulary
  models/            <- shared Whisper model cache

Audio settings, model selection, and compute type remain per-platform in config_local.json.
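
To make the "empty = default" rule for vocab_path and model_dir concrete, here is a short sketch. `resolve_shared_paths` is a hypothetical name, and returning `None` for the default HuggingFace cache is an assumption about how the app represents it.

```python
from __future__ import annotations

from pathlib import Path


def resolve_shared_paths(config: dict, app_dir: Path) -> dict:
    """Map vocab_path/model_dir settings to concrete locations."""
    vocab = config.get("vocab_path") or ""
    model_dir = config.get("model_dir") or ""
    return {
        # Empty vocab_path falls back to vocabulary.json next to the app.
        "vocab_file": Path(vocab) if vocab else app_dir / "vocabulary.json",
        # Empty model_dir means "use the default HuggingFace cache".
        "model_dir": Path(model_dir) if model_dir else None,
    }
```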

Vocabulary

Custom vocabulary and replacements can be edited via the Settings UI or directly in vocabulary.json. Vocabulary words are passed to the model as an initial prompt to improve recognition of domain-specific terms. Replacements are applied as find/replace on the text after transcription.
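
The two mechanisms can be sketched as follows. The helper names and the comma-joined prompt format are assumptions for illustration; the actual layout of vocabulary.json is not documented here.

```python
from __future__ import annotations


def build_initial_prompt(words: list[str]) -> str:
    """Join vocabulary words into one prompt string for the transcriber."""
    return ", ".join(w.strip() for w in words if w.strip())


def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Apply each find/replace pair to the transcribed text, in order."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text
```

With faster-whisper, the prompt string would typically be passed as `initial_prompt` to `WhisperModel.transcribe`, and the replacements would run on the joined segment text afterwards.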

Model Download

On first start, the selected Whisper model is downloaded automatically from HuggingFace (roughly 1.5 GB for medium, about 500 MB for small). Subsequent starts use the cached model. Set model_dir to share the cache between builds.
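
A sketch of how model_dir could be handed to faster-whisper so that both builds download into, and load from, the same directory. `whisper_model_kwargs` is a hypothetical helper; `download_root` is the `WhisperModel` parameter that sets the cache directory.

```python
from __future__ import annotations


def whisper_model_kwargs(local_cfg: dict, model_dir: str = "") -> dict:
    """Build keyword arguments for faster_whisper.WhisperModel from the config values."""
    kwargs = {
        "model_size_or_path": local_cfg.get("model", "medium"),
        "device": local_cfg.get("device", "cuda"),
        "compute_type": local_cfg.get("compute_type", "float16"),
    }
    if model_dir:
        # Only override the cache location when a shared directory is configured.
        kwargs["download_root"] = model_dir
    return kwargs
```

Usage, assuming faster-whisper is installed: `from faster_whisper import WhisperModel` followed by `model = WhisperModel(**whisper_model_kwargs(local_cfg, model_dir))`.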
