beo3000 4cf03a6c8c upd claude perms 2026-03-20 15:32:42 +01:00
.claude upd claude perms 2026-03-20 15:32:42 +01:00
docs/superpowers docs: add GUI app implementation plan 2026-03-20 11:23:12 +01:00
tests feat: convert dictate.py to modular GUI app with PyInstaller build 2026-03-20 13:21:01 +01:00
whisper_app fix: audio device handling, CUDA/VAD bundling, and transcription errors 2026-03-20 15:28:50 +01:00
.gitignore fix: audio device handling, CUDA/VAD bundling, and transcription errors 2026-03-20 15:28:50 +01:00
README.md feat: convert dictate.py to modular GUI app with PyInstaller build 2026-03-20 13:21:01 +01:00
build.py fix: audio device handling, CUDA/VAD bundling, and transcription errors 2026-03-20 15:28:50 +01:00
config.json chore: save local changes before pull 2026-03-19 20:20:54 +01:00
install.bat chore: save local changes before pull 2026-03-19 20:20:54 +01:00
install.sh feat: add Linux support, replace keyboard lib with pynput 2026-03-19 19:13:26 +01:00
main.py feat: convert dictate.py to modular GUI app with PyInstaller build 2026-03-20 13:21:01 +01:00
requirements-cuda.txt feat: add Linux support, replace keyboard lib with pynput 2026-03-19 19:13:26 +01:00
requirements.txt feat: add Linux support, replace keyboard lib with pynput 2026-03-19 19:13:26 +01:00
start.bat feat: convert dictate.py to modular GUI app with PyInstaller build 2026-03-20 13:21:01 +01:00
start.sh feat: convert dictate.py to modular GUI app with PyInstaller build 2026-03-20 13:21:01 +01:00
vocabulary.json upd voc 2026-03-20 10:31:06 +01:00
whisper-dictation.spec fix: audio device handling, CUDA/VAD bundling, and transcription errors 2026-03-20 15:28:50 +01:00


Whisper Dictation

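Local speech-to-text dictation tool with GPU acceleration. Hold a hotkey to record, then release it to transcribe the audio and type the result into the active window. Transcription runs fully offline, with no cloud service and no API key; only the first start needs an internet connection to download the Whisper model.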

Features

  • System tray icon with settings GUI (tkinter)
  • Configurable hotkey, model, language, audio device
  • Shared config via git (config.json, vocabulary.json)
  • Machine-specific settings stored locally (audio device, GPU settings)
  • Windows: GPU acceleration via CUDA; Linux: CPU by default

Requirements

Windows

  • Python 3.13
  • NVIDIA GPU with CUDA 12 drivers
  • PortAudio (bundled with the sounddevice wheels for Windows)
  • pywin32 (for system tray and keyboard injection)
  • pyinstaller (for building a standalone executable)

Linux

  • Python 3.10+
  • PortAudio (Debian/Ubuntu): sudo apt install portaudio19-dev

Installation

Windows

install.bat

This creates a .venv-windows virtual environment and installs all dependencies, including the CUDA 12 DLLs required by faster-whisper.

Linux

chmod +x install.sh start.sh
./install.sh

This creates a .venv-linux virtual environment. By default the app runs on the CPU; GPU support on Linux requires a manually installed CUDA environment.
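Whether the GPU or the CPU is used can be decided once at startup. A minimal sketch, assuming faster-whisper's CTranslate2 backend is installed (it ships as a faster-whisper dependency); the helper names and the float16/int8 compute types are illustrative choices, not necessarily what the app does:

```python
from typing import Optional, Tuple


def cuda_device_count() -> int:
    """Number of CUDA devices visible to CTranslate2, or 0 if it is not importable."""
    try:
        import ctranslate2  # inference backend used by faster-whisper
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()


def pick_device(cuda_devices: Optional[int] = None) -> Tuple[str, str]:
    """Return (device, compute_type) for faster_whisper.WhisperModel (hypothetical helper)."""
    if cuda_devices is None:
        cuda_devices = cuda_device_count()
    if cuda_devices > 0:
        return "cuda", "float16"  # GPU path, e.g. Windows with the CUDA 12 DLLs
    return "cpu", "int8"  # default on Linux without a CUDA setup
```

The result plugs straight into the model constructor, e.g. `device, compute_type = pick_device()` followed by `WhisperModel("medium", device=device, compute_type=compute_type)`.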

Usage

Windows

start.bat

Linux

./start.sh

The app starts in the system tray. Hold the hotkey (default: Ctrl+Shift+Space) to record, release to transcribe and type into the active window.
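The hold-to-record behavior boils down to a small state machine: recording starts once every key of the configured combination is down and stops as soon as one of them is released. This sketch is hypothetical (HotkeyState and its callbacks are not the app's actual code) and works on plain key names, so it can be exercised without a keyboard hook:

```python
from typing import Callable, FrozenSet, Set


def parse_hotkey(spec: str) -> FrozenSet[str]:
    """Turn a config string like 'ctrl+shift+space' into a set of key names."""
    return frozenset(p.strip().lower() for p in spec.split("+") if p.strip())


class HotkeyState:
    def __init__(self, spec: str, on_start: Callable[[], None], on_stop: Callable[[], None]):
        self.combo = parse_hotkey(spec)
        self.pressed: Set[str] = set()
        self.recording = False
        self.on_start = on_start  # e.g. open the audio input stream
        self.on_stop = on_stop    # e.g. stop recording, transcribe, type the text

    def press(self, key: str) -> None:
        self.pressed.add(key.lower())
        # Start only once the full combination is held; auto-repeat presses are ignored.
        if not self.recording and self.combo <= self.pressed:
            self.recording = True
            self.on_start()

    def release(self, key: str) -> None:
        self.pressed.discard(key.lower())
        # Releasing any key of the combination ends the recording.
        if self.recording and not self.combo <= self.pressed:
            self.recording = False
            self.on_stop()
```

In the app, press and release would be fed from a keyboard listener such as pynput's `keyboard.Listener(on_press=..., on_release=...)`, after mapping its key objects to names like "ctrl"; that mapping is assumed and left out here.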

Build

To produce a standalone Windows executable:

.venv-windows\Scripts\python.exe build.py

This uses PyInstaller to bundle the app and all dependencies into a single folder under dist/. The resulting executable can be run without a Python installation.

Configuration

config.json (shared, stored in the repo):

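| Key         | Default          | Description                                                         |
|-------------|------------------|---------------------------------------------------------------------|
| hotkey      | ctrl+shift+space | Recording trigger                                                   |
| model       | medium           | Whisper model size (tiny, base, small, medium, large-v2, large-v3) |
| language    | de               | Transcription language (de, en, fr, es, it; null = auto-detect)    |
| sample_rate | 16000            | Audio sample rate in Hz                                             |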
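Reading config.json can be sketched as "defaults first, file values on top", plus a check against the documented model names. load_shared_config is a hypothetical helper and the defaults are copied from the table. JSON null becomes Python None, which faster-whisper's `transcribe(language=None)` treats as automatic language detection:

```python
import json
from pathlib import Path
from typing import Any, Dict

# Defaults as documented for config.json.
DEFAULTS: Dict[str, Any] = {
    "hotkey": "ctrl+shift+space",
    "model": "medium",
    "language": "de",
    "sample_rate": 16000,
}
MODELS = {"tiny", "base", "small", "medium", "large-v2", "large-v3"}


def load_shared_config(path: Path) -> Dict[str, Any]:
    """Merge config.json (if present) over the defaults and validate it."""
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    cfg = {**DEFAULTS, **data}
    if cfg["model"] not in MODELS:
        raise ValueError(f"unknown model: {cfg['model']!r}")
    # null in JSON -> None in Python -> language auto-detection
    if cfg["language"] is not None and not isinstance(cfg["language"], str):
        raise ValueError("language must be a language code or null")
    return cfg
```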

Machine-specific settings (GPU device, compute type, audio device) are stored separately and not tracked by git:

  • Windows: %LOCALAPPDATA%\WhisperDictation\config_local.json
  • Linux: ~/.local/share/WhisperDictation/config_local.json
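The machine-specific file can be located per platform and overlaid on the shared settings. A minimal sketch; the function names are hypothetical, and letting local values override shared ones is an assumption (the two files are meant to hold different keys, so conflicts should be rare):

```python
import json
import os
import sys
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath
from typing import Any, Dict, Mapping, Optional


def local_config_path(platform_name: str = sys.platform,
                      env: Optional[Mapping[str, str]] = None) -> PurePath:
    """Return the location of config_local.json for the given platform."""
    env = os.environ if env is None else env
    if platform_name.startswith("win"):
        # %LOCALAPPDATA%\WhisperDictation\config_local.json
        return PureWindowsPath(env["LOCALAPPDATA"], "WhisperDictation", "config_local.json")
    # ~/.local/share/WhisperDictation/config_local.json
    home = env.get("HOME") or str(Path.home())
    return PurePosixPath(home, ".local", "share", "WhisperDictation", "config_local.json")


def load_merged_config(shared: Dict[str, Any], local_path: PurePath) -> Dict[str, Any]:
    """Overlay machine-specific settings (if the file exists) on the shared config."""
    path = Path(local_path)
    local = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return {**shared, **local}  # assumption: local values win on conflicts
```

Typical use would be `cfg = load_merged_config(shared, local_config_path())`, where `shared` is the dictionary read from config.json.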

Vocabulary

Custom vocabulary and replacements can be added to vocabulary.json. The vocabulary is passed to Whisper as an initial prompt to improve recognition of domain-specific terms.
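One way to wire vocabulary.json into transcription, sketched under an assumed schema of `{"terms": [...], "replacements": {...}}` (the repo's real format may differ): the terms are joined into faster-whisper's `initial_prompt`, and, as a further assumption, replacements are applied to the transcribed text afterwards. Whisper only attends to a limited window of prompt tokens, so the character cap below is a crude, arbitrary budget:

```python
import json
import re
from typing import Dict, List, Tuple


def load_vocabulary(path: str) -> Tuple[List[str], Dict[str, str]]:
    """Read terms and replacements from a vocabulary file (assumed schema)."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return list(data.get("terms", [])), dict(data.get("replacements", {}))


def build_initial_prompt(terms: List[str], max_chars: int = 800) -> str:
    """Join terms with commas, stopping before the prompt exceeds max_chars."""
    prompt = ""
    for term in terms:
        candidate = f"{prompt}, {term}" if prompt else term
        if len(candidate) > max_chars:
            break
        prompt = candidate
    return prompt


def apply_replacements(text: str, replacements: Dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches; longer keys are tried first."""
    for wrong in sorted(replacements, key=len, reverse=True):
        right = replacements[wrong]
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement keeps backslashes in `right` from being interpreted.
        text = re.sub(pattern, lambda _m, r=right: r, text, flags=re.IGNORECASE)
    return text
```

With a loaded model this would be used as `segments, info = model.transcribe(audio, language="de", initial_prompt=build_initial_prompt(terms))`, followed by `apply_replacements("".join(s.text for s in segments), replacements)`.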

Model Download

On first start, the selected Whisper model is downloaded automatically from Hugging Face (about 1.5 GB for medium). Subsequent starts use the cached copy.
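The download size follows roughly from the parameter counts OpenAI published for the Whisper models. This back-of-the-envelope sketch assumes the cached faster-whisper weights are stored as float16 (2 bytes per parameter) and ignores the small tokenizer and config files:

```python
# Published Whisper parameter counts (large-v3 is listed at the same size as large-v2).
PARAMS = {
    "tiny": 39_000_000,
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "large-v2": 1_550_000_000,
    "large-v3": 1_550_000_000,
}
BYTES_PER_PARAM = 2  # assumption: float16 weights


def approx_size_gb(model: str) -> float:
    """Approximate weight size in GB (10**9 bytes)."""
    return PARAMS[model] * BYTES_PER_PARAM / 1e9


for name in PARAMS:
    print(f"{name:<9} ~{approx_size_gb(name):.2f} GB")
```

For medium this gives about 1.54 GB; small comes out just under 0.5 GB.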