
Whisper Dictation

Local GPU speech-to-text dictation tool. Hold a hotkey to record, release to transcribe and type the result into the active window. Once the Whisper model has been downloaded, it runs fully offline: no cloud service, no API key.

Features

System tray icon with settings GUI (tkinter)
Configurable hotkey, model, language, audio device
Shared config via git (config.json, vocabulary.json)
Machine-specific settings stored locally (audio device, GPU settings)
Windows: GPU acceleration via CUDA; Linux: CPU by default (GPU only with a manually installed CUDA environment)

Requirements

Windows

Python 3.13
NVIDIA GPU with CUDA 12 drivers
PortAudio (bundled with most Python sounddevice wheels)

Linux

Python 3.10+
PortAudio (Debian/Ubuntu): sudo apt install portaudio19-dev

Installation

Windows

install.bat

This creates a .venv-windows virtual environment and installs all dependencies, including the CUDA 12 DLLs required by faster-whisper.

Linux

chmod +x install.sh start.sh
./install.sh

This creates a .venv-linux virtual environment. By default the tool runs on CPU; GPU support on Linux requires a manually installed CUDA environment.
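The platform defaults (CUDA on Windows, CPU on Linux unless CUDA has been set up) could be resolved with a small helper like the one below. This is a sketch, not the project's code: the name `pick_device` and the `float16`/`int8` compute-type choices are assumptions. The returned pair maps onto the `device` and `compute_type` arguments of faster-whisper's `WhisperModel` constructor.

```python
import sys


def pick_device(platform: str = sys.platform, cuda_available: bool = False) -> tuple[str, str]:
    """Return (device, compute_type) for faster-whisper's WhisperModel.

    Hypothetical helper mirroring the README defaults: CUDA on Windows,
    CPU on Linux unless a working CUDA installation is reported.
    """
    if platform.startswith("win") or cuda_available:
        # float16 is a common compute type on NVIDIA GPUs
        return "cuda", "float16"
    # int8 quantization keeps CPU inference lighter on memory and faster
    return "cpu", "int8"
```

Usage would be along the lines of `device, compute_type = pick_device()` followed by `WhisperModel("medium", device=device, compute_type=compute_type)`.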

Usage

Windows

start.bat

Linux

./start.sh

The app starts in the system tray. Hold the hotkey (default: Ctrl+Shift+Space) to record, release to transcribe and type into the active window.
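The hold-to-record cycle can be thought of as a small state machine: key down starts buffering audio, key up stops it and hands the buffer to the transcriber. The sketch below is an assumption about that structure, not the project's implementation; `PushToTalk`, `feed`, and the injected `transcribe`/`type_text` callables are hypothetical names. In a real app the audio would arrive from a sounddevice input-stream callback (most likely as NumPy arrays) rather than from direct calls.

```python
from typing import Callable, Sequence


class PushToTalk:
    """Hypothetical hold-to-record controller (illustrative, not the project's code)."""

    def __init__(self, transcribe: Callable[[list[float]], str],
                 type_text: Callable[[str], None]) -> None:
        self._transcribe = transcribe  # e.g. a wrapper around WhisperModel.transcribe
        self._type_text = type_text    # e.g. something that sends keystrokes to the active window
        self._samples: list[float] = []
        self.recording = False

    def on_press(self) -> None:
        # Holding a key usually fires repeated press events; only the first one starts a recording.
        if not self.recording:
            self._samples.clear()
            self.recording = True

    def feed(self, chunk: Sequence[float]) -> None:
        # Called for every audio block; samples are kept only while the hotkey is held.
        if self.recording:
            self._samples.extend(chunk)

    def on_release(self) -> None:
        if not self.recording:
            return
        self.recording = False
        if not self._samples:
            return  # nothing captured, nothing to transcribe
        text = self._transcribe(list(self._samples)).strip()
        if text:
            self._type_text(text)
```

Because the audio callback and the hotkey listener typically run on different threads, a production version would also need a lock around the sample buffer.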

Configuration

config.json (shared, stored in the repo):

| Key | Default | Description |
|---|---|---|
| `hotkey` | `ctrl+shift+space` | Recording trigger |
| `model` | `medium` | Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`) |
| `language` | `de` | Transcription language (`de`, `en`, `fr`, `es`, `it`, `null` = auto) |
| `sample_rate` | `16000` | Audio sample rate in Hz |
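A loader for config.json might fill in the defaults from the table and reject unknown values. The sketch below is hypothetical (`with_defaults` and `load_shared_config` are illustrative names) and treats the listed models and languages as the only accepted values, which may be stricter than the actual app.

```python
import json
from pathlib import Path

# Defaults and allowed values taken from the configuration table.
DEFAULTS = {"hotkey": "ctrl+shift+space", "model": "medium", "language": "de", "sample_rate": 16000}
MODELS = {"tiny", "base", "small", "medium", "large-v2", "large-v3"}
LANGUAGES = {"de", "en", "fr", "es", "it", None}  # None is JSON null, i.e. auto-detect


def with_defaults(data: dict) -> dict:
    """Fill missing keys with defaults and reject values outside the documented sets."""
    config = {**DEFAULTS, **data}
    if config["model"] not in MODELS:
        raise ValueError(f"unknown model: {config['model']!r}")
    if config["language"] not in LANGUAGES:
        raise ValueError(f"unsupported language: {config['language']!r}")
    if not isinstance(config["sample_rate"], int) or config["sample_rate"] <= 0:
        raise ValueError("sample_rate must be a positive integer (Hz)")
    return config


def load_shared_config(path: Path = Path("config.json")) -> dict:
    """Read the shared config from disk; a missing file yields pure defaults."""
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return with_defaults(data)
```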

Machine-specific settings (GPU device, compute type, audio device) are stored separately and not tracked by git:

Windows: %LOCALAPPDATA%\WhisperDictation\config_local.json
Linux: ~/.local/share/WhisperDictation/config_local.json
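Resolving these paths and combining the local file with the shared config.json could look like the sketch below. The directory names come from the paths above; the helper names, the fallback when `LOCALAPPDATA` is unset, and the rule that local values override shared ones on key clashes are assumptions.

```python
import json
import os
import sys
from pathlib import Path

APP_DIR = "WhisperDictation"


def local_config_path(platform: str = sys.platform) -> Path:
    """Location of the machine-specific config_local.json."""
    if platform.startswith("win"):
        # Fallback to the usual default location if LOCALAPPDATA is missing (assumption)
        base = Path(os.environ.get("LOCALAPPDATA", Path.home() / "AppData" / "Local"))
    else:
        base = Path.home() / ".local" / "share"
    return base / APP_DIR / "config_local.json"


def merge_configs(shared: dict, local: dict) -> dict:
    """Combine shared and machine-specific settings; local values win (assumption)."""
    return {**shared, **local}


def load_config(shared_path: Path) -> dict:
    def read(path: Path) -> dict:
        return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return merge_configs(read(shared_path), read(local_config_path()))
```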

Vocabulary

Custom vocabulary and replacements can be added to vocabulary.json. They are passed to Whisper as an initial prompt to improve recognition of domain-specific terms.
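The README does not document the structure of vocabulary.json. Assuming, purely for illustration, a JSON array of term strings, turning it into a prompt could be as simple as the hypothetical `build_initial_prompt` below, which drops blanks and case-insensitive duplicates.

```python
import json


def build_initial_prompt(vocabulary_json: str) -> str:
    """Join unique vocabulary terms into a comma-separated prompt string.

    Assumes vocabulary.json holds a plain JSON array of strings; the real
    file format may differ.
    """
    terms = json.loads(vocabulary_json)
    seen: set[str] = set()
    unique: list[str] = []
    for term in terms:
        term = term.strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            unique.append(term)
    return ", ".join(unique)
```

The resulting string can be handed to faster-whisper via the `initial_prompt` argument, e.g. `segments, info = model.transcribe(audio, language="de", initial_prompt=prompt)`.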

Model Download

On first start, the selected Whisper model is downloaded automatically from the Hugging Face Hub (about 1.5 GB for medium). Subsequent starts use the cached model.
