beo3000 4cf03a6c8c upd claude perms		2026-03-20 15:32:42 +01:00
.claude	upd claude perms	2026-03-20 15:32:42 +01:00
docs/superpowers	docs: add GUI app implementation plan	2026-03-20 11:23:12 +01:00
tests	feat: convert dictate.py to modular GUI app with PyInstaller build	2026-03-20 13:21:01 +01:00
whisper_app	fix: audio device handling, CUDA/VAD bundling, and transcription errors	2026-03-20 15:28:50 +01:00
.gitignore	fix: audio device handling, CUDA/VAD bundling, and transcription errors	2026-03-20 15:28:50 +01:00
README.md	feat: convert dictate.py to modular GUI app with PyInstaller build	2026-03-20 13:21:01 +01:00
build.py	fix: audio device handling, CUDA/VAD bundling, and transcription errors	2026-03-20 15:28:50 +01:00
config.json	chore: save local changes before pull	2026-03-19 20:20:54 +01:00
install.bat	chore: save local changes before pull	2026-03-19 20:20:54 +01:00
install.sh	feat: add Linux support, replace keyboard lib with pynput	2026-03-19 19:13:26 +01:00
main.py	feat: convert dictate.py to modular GUI app with PyInstaller build	2026-03-20 13:21:01 +01:00
requirements-cuda.txt	feat: add Linux support, replace keyboard lib with pynput	2026-03-19 19:13:26 +01:00
requirements.txt	feat: add Linux support, replace keyboard lib with pynput	2026-03-19 19:13:26 +01:00
start.bat	feat: convert dictate.py to modular GUI app with PyInstaller build	2026-03-20 13:21:01 +01:00
start.sh	feat: convert dictate.py to modular GUI app with PyInstaller build	2026-03-20 13:21:01 +01:00
vocabulary.json	upd voc	2026-03-20 10:31:06 +01:00
whisper-dictation.spec	fix: audio device handling, CUDA/VAD bundling, and transcription errors	2026-03-20 15:28:50 +01:00

README.md

Whisper Dictation

Local GPU speech-to-text dictation tool. Hold a hotkey to record, release to transcribe and type the result into the active window. Runs fully offline after the initial model download: no cloud, no API key.

Features

System tray icon with settings GUI (tkinter)
Configurable hotkey, model, language, audio device
Shared config via git (config.json, vocabulary.json)
Machine-specific settings stored locally (audio device, GPU settings)
Windows: GPU acceleration via CUDA; Linux: CPU by default

Requirements

Windows

Python 3.13
NVIDIA GPU with CUDA 12 drivers
PortAudio (bundled with most Python sounddevice wheels)
pywin32 (for system tray and keyboard injection)
pyinstaller (for building a standalone executable)

Linux

Python 3.10+
PortAudio: sudo apt install portaudio19-dev

Installation

Windows

install.bat

This creates a .venv-windows virtual environment and installs all dependencies, including the CUDA 12 DLLs required by faster-whisper.

Linux

chmod +x install.sh start.sh
./install.sh

This creates a .venv-linux virtual environment. The app runs on CPU by default; GPU support on Linux requires a manually installed CUDA environment.

Usage

Windows

start.bat

Linux

./start.sh

The app starts in the system tray. Hold the hotkey (default: Ctrl+Shift+Space) to record, release to transcribe and type into the active window.
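
The hold-to-record behaviour can be pictured as a small state machine. The following is a minimal sketch, not the app's actual code: the `PushToTalk` class, `parse_hotkey`, and the string key names are hypothetical, and a real listener (for example pynput) would need its key events mapped to these names.

```python
from typing import Callable


def parse_hotkey(spec: str) -> frozenset:
    """Turn 'ctrl+shift+space' into frozenset({'ctrl', 'shift', 'space'})."""
    return frozenset(part.strip().lower() for part in spec.split("+") if part.strip())


class PushToTalk:
    """Tracks pressed keys and fires callbacks when the full combo goes down or up."""

    def __init__(self, hotkey: str, on_start: Callable[[], None], on_stop: Callable[[], None]):
        self.combo = parse_hotkey(hotkey)
        self.pressed = set()
        self.recording = False
        self.on_start = on_start
        self.on_stop = on_stop

    def key_down(self, key: str) -> None:
        self.pressed.add(key.lower())
        # Start only once, when every key of the combo is held.
        if not self.recording and self.combo <= self.pressed:
            self.recording = True
            self.on_start()

    def key_up(self, key: str) -> None:
        self.pressed.discard(key.lower())
        # Releasing any key of the combo ends the recording.
        if self.recording and not self.combo <= self.pressed:
            self.recording = False
            self.on_stop()
```

In this sketch `on_start` would begin audio capture and `on_stop` would hand the recorded buffer to the transcriber. Key auto-repeat is harmless because `key_down` ignores events while a recording is already running.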

Build

To produce a standalone Windows executable:

.venv-windows\Scripts\python.exe build.py

This uses PyInstaller to bundle the app and all dependencies into a single folder under dist/. The resulting executable can be run without a Python installation.

Configuration

config.json (shared, stored in the repo):

Key	Default	Description
`hotkey`	`ctrl+shift+space`	Recording trigger
`model`	`medium`	Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`)
`language`	`de`	Transcription language (`de`, `en`, `fr`, `es`, `it`, `null` = auto)
`sample_rate`	`16000`	Audio sample rate in Hz
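
As an illustration of how these keys might be consumed, here is a minimal sketch that overlays config.json on the defaults from the table. The `load_config` function and `DEFAULTS` name are assumptions for this example and may not match the app's own loader.

```python
import json
from pathlib import Path

# Defaults taken from the table above.
DEFAULTS = {
    "hotkey": "ctrl+shift+space",
    "model": "medium",
    "language": "de",
    "sample_rate": 16000,
}


def load_config(path="config.json"):
    """Return DEFAULTS overlaid with whatever keys the file provides."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        config.update(json.loads(file.read_text(encoding="utf-8")))
    return config
```

A JSON `null` for `language` is parsed as Python `None`, which this sketch passes through unchanged to stand for automatic language detection.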

Machine-specific settings (GPU device, compute type, audio device) are stored separately and not tracked by git:

Windows: %LOCALAPPDATA%\WhisperDictation\config_local.json
Linux: ~/.local/share/WhisperDictation/config_local.json
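
The two locations above could be resolved along these lines. This is a sketch under assumptions: `local_config_path` is a hypothetical helper, and the fallback used when `LOCALAPPDATA` is unset is a guess rather than documented behaviour.

```python
import os
import sys
from pathlib import Path

APP_DIR = "WhisperDictation"


def local_config_path(platform=sys.platform, env=os.environ, home=None):
    """Return the per-machine config path for the given platform."""
    home = Path(home) if home is not None else Path.home()
    if platform.startswith("win"):
        # Falling back to ~/AppData/Local is an assumption for an unset LOCALAPPDATA.
        base = Path(env.get("LOCALAPPDATA", home / "AppData" / "Local"))
    else:
        base = home / ".local" / "share"
    return base / APP_DIR / "config_local.json"
```

Passing `platform`, `env`, and `home` explicitly keeps the function easy to test on either operating system.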

Vocabulary

Custom vocabulary and replacements can be added to vocabulary.json. The entries are passed to the model as an initial prompt to improve recognition of domain-specific terms.
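
The layout of vocabulary.json is not documented here, so the following sketch assumes a hypothetical structure with a `terms` list and a `replacements` mapping. It shows one way such entries could become a prompt string and a post-processing step; all function names are illustrative.

```python
import json


def load_vocabulary(raw):
    """Parse vocabulary JSON text; the 'terms'/'replacements' keys are assumed."""
    data = json.loads(raw)
    return data.get("terms", []), data.get("replacements", {})


def build_initial_prompt(terms):
    """Join domain terms into one prompt string for the recognizer."""
    return ", ".join(terms)


def apply_replacements(text, replacements):
    """Replace each misrecognized spelling with its preferred form."""
    for wrong, right in replacements.items():
        text = text.replace(wrong, right)
    return text
```

With faster-whisper, the prompt string would typically be supplied through the `initial_prompt` argument of `WhisperModel.transcribe`, while the replacements would be applied to the transcribed text before it is typed.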

Model Download

On first start, the selected Whisper model is downloaded automatically from Hugging Face (about 1.5 GB for medium). Subsequent starts use the cached model.
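
For orientation, a minimal sketch of how the model might be loaded with faster-whisper is shown below. The helper names and the chosen compute types (`float16` on GPU, `int8` on CPU) are assumptions; the real values come from the machine-specific settings.

```python
import sys


def default_device_settings(platform=sys.platform):
    """Pick (device, compute_type); the compute types are assumptions."""
    if platform.startswith("win"):
        return "cuda", "float16"
    return "cpu", "int8"


def load_model(name="medium", platform=sys.platform):
    """Load a faster-whisper model, downloading it on first use."""
    # Imported lazily so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    device, compute_type = default_device_settings(platform)
    return WhisperModel(name, device=device, compute_type=compute_type)
```

Calling `load_model()` for the first time triggers the download; later calls reuse the files in the local cache.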