
# Whisper Dictation

Local speech-to-text dictation tool with GPU acceleration. Hold a hotkey to record, then release to transcribe and type the result into the active window. After the one-time model download, everything runs offline: no cloud service, no API key.

## Features

- System tray icon with settings GUI (tkinter)
- Configurable hotkey, model, language, audio device
- Shared config via git (`config.json`, `vocabulary.json`)
- Machine-specific settings stored locally (audio device, GPU settings)
- GPU acceleration via CUDA on Windows; CPU by default on Linux

## Requirements

### Windows
- Python 3.13
- NVIDIA GPU with CUDA 12 drivers
- [PortAudio](http://www.portaudio.com/) (bundled with most Python sounddevice wheels)

### Linux
- Python 3.10+
- PortAudio: `sudo apt install portaudio19-dev`

## Installation

### Windows

```bat
install.bat
```

This creates a `.venv-windows` virtual environment and installs all dependencies, including the CUDA 12 DLLs required by faster-whisper.

### Linux

```bash
chmod +x install.sh start.sh
./install.sh
```

This creates a `.venv-linux` virtual environment. On Linux the tool runs on the CPU by default; GPU support requires a manually installed CUDA environment.

## Usage

### Windows
```bat
start.bat
```

### Linux
```bash
./start.sh
```

The app starts in the system tray. Hold the hotkey (default: `Ctrl+Shift+Space`) to record, release to transcribe and type into the active window.
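
The hold-to-record behaviour can be pictured as a small state machine. The following is a minimal sketch, not the app's actual code: the class `HoldToRecord` and its callbacks are hypothetical names, and it assumes some keyboard listener delivers key-down and key-up events as lowercase key names.

```python
class HoldToRecord:
    """Start when the whole combo is held, stop when any combo key is released.

    Hypothetical helper for illustration; not taken from the project source.
    """

    def __init__(self, hotkey, on_start, on_stop):
        # "ctrl+shift+space" -> {"ctrl", "shift", "space"}
        self.combo = frozenset(part.strip().lower() for part in hotkey.split("+"))
        self.pressed = set()
        self.recording = False
        self.on_start = on_start
        self.on_stop = on_stop

    def key_down(self, key):
        self.pressed.add(key.lower())
        # Key auto-repeat sends many key-down events; start only once.
        if not self.recording and self.combo <= self.pressed:
            self.recording = True
            self.on_start()

    def key_up(self, key):
        self.pressed.discard(key.lower())
        # Releasing any key of the combo ends the recording.
        if self.recording and not self.combo <= self.pressed:
            self.recording = False
            self.on_stop()
```

In a real setup, `on_start` would open the audio stream and `on_stop` would hand the recorded audio to the transcriber.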

## Configuration

`config.json` (shared, stored in the repo):

| Key | Default | Description |
|-----|---------|-------------|
| `hotkey` | `ctrl+shift+space` | Recording trigger |
| `model` | `medium` | Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`) |
| `language` | `de` | Transcription language (`de`, `en`, `fr`, `es`, `it`, or `null` for auto-detection) |
| `sample_rate` | `16000` | Audio sample rate in Hz |
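
As an illustration of how these keys and defaults fit together, here is a minimal sketch of a loader that falls back to the documented defaults for missing keys. The function name `load_shared_config` is hypothetical, and the app's real loading code may differ.

```python
import json
from pathlib import Path

# Defaults copied from the table above.
DEFAULTS = {
    "hotkey": "ctrl+shift+space",
    "model": "medium",
    "language": "de",
    "sample_rate": 16000,
}


def load_shared_config(path="config.json"):
    """Return the defaults overlaid with whatever the file defines."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        config.update(json.loads(file.read_text(encoding="utf-8")))
    return config
```

Note that a JSON `null` for `language` arrives in Python as `None`, which is a convenient way to represent auto-detection.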

Machine-specific settings (GPU device, compute type, audio device) are stored separately and not tracked by git:
- **Windows:** `%LOCALAPPDATA%\WhisperDictation\config_local.json`
- **Linux:** `~/.local/share/WhisperDictation/config_local.json`
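
The two locations above can be resolved with a few lines of Python. This is a sketch under assumptions: `local_config_path` and `load_local_config` are hypothetical helper names, and treating a missing file as an empty configuration is a guess at sensible behaviour, not documented behaviour.

```python
import json
import os
import sys
from pathlib import Path


def local_config_path(platform=sys.platform, env=os.environ):
    """Return the per-machine config file path for the given platform."""
    if platform.startswith("win"):
        base = Path(env["LOCALAPPDATA"])
    else:
        home = env.get("HOME") or str(Path.home())
        base = Path(home) / ".local" / "share"
    return base / "WhisperDictation" / "config_local.json"


def load_local_config(path):
    """Read machine-specific settings; assume a missing file means none are set."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```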

## Vocabulary

Add custom vocabulary and replacements to `vocabulary.json`. The entries are passed to the model as an initial prompt, which improves recognition of domain-specific terms.
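
The layout of `vocabulary.json` is not documented here, so the sketch below assumes a hypothetical structure with a `terms` list and a `replacements` mapping. It shows one way such a file could be turned into a prompt string and a post-processing step; the helper names are illustrative only.

```python
def build_initial_prompt(vocab):
    """Join the vocabulary terms into one prompt string, or None if there are none."""
    terms = vocab.get("terms", [])  # assumed key, not confirmed by the project
    return ", ".join(terms) if terms else None


def apply_replacements(text, vocab):
    """Replace known mis-transcriptions in the recognized text (case-sensitive)."""
    for wrong, right in vocab.get("replacements", {}).items():  # assumed key
        text = text.replace(wrong, right)
    return text
```

With faster-whisper, the resulting string could be passed as the `initial_prompt` argument of `WhisperModel.transcribe`.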

## Model Download

On first start, the selected Whisper model is downloaded automatically from Hugging Face (roughly 1.5 GB for `medium`). Subsequent starts use the cached model.
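
For reference, a minimal sketch of how a model might be loaded with faster-whisper so that later starts never touch the network. The `device` and `compute_type` keys of the local config are assumed names, and the helper functions are hypothetical; `local_files_only` is a `WhisperModel` parameter that restricts loading to the local cache.

```python
def build_model_kwargs(local_cfg, offline=False):
    """Translate machine-specific settings into WhisperModel keyword arguments."""
    return {
        "device": local_cfg.get("device", "cpu"),  # assumed key; e.g. "cuda" on Windows
        "compute_type": local_cfg.get("compute_type", "int8"),  # assumed key
        "local_files_only": offline,  # True: use only the cached model
    }


def load_model(model_size, local_cfg, offline=False):
    # Imported lazily so this module can be inspected without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, **build_model_kwargs(local_cfg, offline))
```

Calling `load_model("medium", {}, offline=False)` once would populate the cache; subsequent calls could pass `offline=True`.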