Whisper Dictation

Local GPU speech-to-text dictation tool. Hold a hotkey to record, then release it to transcribe the recording and type the result into the active window. Runs fully offline: no cloud service and no API key required.

Features

  • System tray icon with settings GUI (tkinter)
  • Configurable hotkey, model, language, audio device
  • Shared config via git (config.json, vocabulary.json)
  • Machine-specific settings stored locally (audio device, GPU settings)
  • Windows: GPU acceleration via CUDA; Linux: CPU

Requirements

Windows

  • Python 3.13
  • NVIDIA GPU with CUDA 12 drivers
  • PortAudio (bundled with most Python sounddevice wheels)

Linux

  • Python 3.10+
  • PortAudio: sudo apt install portaudio19-dev

Installation

Windows

install.bat

This creates a .venv-windows virtual environment, installs all dependencies and the CUDA 12 DLLs required by faster-whisper.

Linux

chmod +x install.sh start.sh
./install.sh

This creates a .venv-linux virtual environment. On Linux the app runs on the CPU by default; GPU support requires a manually installed CUDA environment.

Usage

Windows

start.bat

Linux

./start.sh

The app starts in the system tray. Hold the hotkey (default: Ctrl+Shift+Space) to record, release to transcribe and type into the active window.
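The hold-to-record, release-to-transcribe flow can be pictured as a small state machine. The sketch below is only an illustration and not the app's actual code: the class `PushToTalkRecorder` and its method names are hypothetical, and it assumes the hotkey listener and the audio input callback call into it.

```python
from dataclasses import dataclass, field


@dataclass
class PushToTalkRecorder:
    """Buffers audio chunks only while the hotkey is held (hypothetical helper)."""

    recording: bool = False
    chunks: list[list[float]] = field(default_factory=list)

    def on_hotkey_press(self) -> None:
        # Start a fresh recording; ignore key auto-repeat while already recording.
        if not self.recording:
            self.chunks = []
            self.recording = True

    def on_audio_chunk(self, samples: list[float]) -> None:
        # Called by the audio input callback; chunks outside a recording are dropped.
        if self.recording:
            self.chunks.append(list(samples))

    def on_hotkey_release(self) -> list[float]:
        # Stop recording and return all buffered samples as one flat list,
        # ready to be handed to the transcriber.
        self.recording = False
        audio = [s for chunk in self.chunks for s in chunk]
        self.chunks = []
        return audio
```

Ignoring repeated press events matters because many systems send key auto-repeat while a key is held down, which would otherwise reset the buffer mid-recording.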

Configuration

config.json (shared, stored in the repo):

| Key | Default | Description |
|---|---|---|
| hotkey | ctrl+shift+space | Recording trigger |
| model | medium | Whisper model size (tiny, base, small, medium, large-v2, large-v3) |
| language | de | Transcription language (de, en, fr, es, it, null = auto) |
| sample_rate | 16000 | Audio sample rate in Hz |
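As a rough sketch of how these keys and defaults fit together, the snippet below reads config.json and falls back to the documented defaults for any missing key. The function name `load_shared_config` and the fallback behavior are assumptions for illustration; the app's real loader may differ.

```python
import json
from pathlib import Path

# Defaults taken from the table above.
DEFAULTS = {
    "hotkey": "ctrl+shift+space",
    "model": "medium",
    "language": "de",
    "sample_rate": 16000,
}


def load_shared_config(path: Path) -> dict:
    """Return DEFAULTS overridden by whatever keys config.json provides (assumed behavior)."""
    config = dict(DEFAULTS)
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config
```

A JSON `null` for `language` arrives in Python as `None`, which corresponds to the automatic language detection mentioned in the table.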

Machine-specific settings (GPU device, compute type, audio device) are stored separately and not tracked by git:

  • Windows: %LOCALAPPDATA%\WhisperDictation\config_local.json
  • Linux: ~/.local/share/WhisperDictation/config_local.json
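The two locations above could be resolved in code roughly as follows. This is a minimal sketch: the helper name `local_config_path` is hypothetical, and it assumes `LOCALAPPDATA` is set on Windows.

```python
import os
import sys
from pathlib import Path


def local_config_path(platform: str = sys.platform, env=os.environ) -> Path:
    """Return the per-machine config file location for the given platform."""
    if platform.startswith("win"):
        # %LOCALAPPDATA%\WhisperDictation\config_local.json
        base = Path(env["LOCALAPPDATA"])
    else:
        # ~/.local/share/WhisperDictation/config_local.json
        base = Path.home() / ".local" / "share"
    return base / "WhisperDictation" / "config_local.json"
```

Because this file lives outside the repository, pulling a new config.json from git never overwrites the audio device or GPU settings of an individual machine.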

Vocabulary

Custom vocabulary and replacements can be added to vocabulary.json. The entries are passed to Whisper as an initial prompt to improve recognition of domain-specific terms.
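To illustrate the idea, the sketch below turns a list of terms into a prompt string and applies simple text replacements to a transcript. The structure of vocabulary.json is not documented here, so the inputs (a list of terms and a dict of replacements) and both function names are assumptions, not the app's actual format.

```python
import re


def build_initial_prompt(terms: list[str]) -> str:
    """Join non-empty terms into one comma-separated prompt string."""
    return ", ".join(t.strip() for t in terms if t.strip())


def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of each key with its value."""
    for wrong, right in replacements.items():
        pattern = rf"\b{re.escape(wrong)}\b"
        # A function as replacement keeps backslashes in `right` literal.
        text = re.sub(pattern, lambda _m, r=right: r, text, flags=re.IGNORECASE)
    return text
```

With faster-whisper, a prompt string like this would typically be handed to `WhisperModel.transcribe` through its `initial_prompt` parameter.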

Model Download

On first start, the selected Whisper model is downloaded automatically from Hugging Face (~500 MB for medium). Subsequent starts use the cached model.