# Whisper Dictation
Local speech-to-text dictation tool with optional GPU acceleration. Hold a hotkey to record, then release to transcribe and type the result into the active window. After the one-time model download it runs fully offline: no cloud, no API key.
## Features
- System tray icon with settings GUI (tkinter)
- Configurable hotkey, model, language, audio device
- Shared config via git (`config.json`, `vocabulary.json`)
- Machine-specific settings stored locally (audio device, GPU settings)
- GPU acceleration via CUDA on Windows; CPU by default on Linux
## Requirements
### Windows
- Python 3.13
- NVIDIA GPU with CUDA 12 drivers
- [PortAudio](http://www.portaudio.com/) (bundled with most Python sounddevice wheels)
### Linux
- Python 3.10+
- PortAudio: `sudo apt install portaudio19-dev`
## Installation
### Windows
```bat
install.bat
```
This creates a `.venv-windows` virtual environment and installs all dependencies, along with the CUDA 12 DLLs required by faster-whisper.
### Linux
```bash
chmod +x install.sh start.sh
./install.sh
```
This creates a `.venv-linux` virtual environment. The app runs on CPU by default; GPU support on Linux requires a manually installed CUDA environment.
## Usage
### Windows
```bat
start.bat
```
### Linux
```bash
./start.sh
```
The app starts in the system tray. Hold the hotkey (default: `Ctrl+Shift+Space`) to record, release to transcribe and type into the active window.
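
The hold-to-record behaviour can be pictured as a small state machine. The sketch below is illustrative only and is not taken from the project's source: `parse_hotkey` and `PushToTalk` are hypothetical names, and a real implementation would feed key events from a keyboard hook and start or stop the audio stream on the returned signals.

```python
from __future__ import annotations

from typing import Optional


def parse_hotkey(spec: str) -> frozenset[str]:
    """Split a spec such as 'ctrl+shift+space' into a set of lowercase key names."""
    keys = frozenset(part.strip().lower() for part in spec.split("+") if part.strip())
    if not keys:
        raise ValueError(f"empty hotkey spec: {spec!r}")
    return keys


class PushToTalk:
    """Hypothetical helper: tracks pressed keys and signals when to record."""

    def __init__(self, hotkey: str) -> None:
        self.combo = parse_hotkey(hotkey)
        self.pressed: set[str] = set()
        self.recording = False

    def on_press(self, key: str) -> Optional[str]:
        """Return 'start' the moment the full combination is held down."""
        self.pressed.add(key.lower())
        if not self.recording and self.combo <= self.pressed:
            self.recording = True
            return "start"
        return None

    def on_release(self, key: str) -> Optional[str]:
        """Return 'stop' as soon as any key of the combination is released."""
        self.pressed.discard(key.lower())
        if self.recording and not self.combo <= self.pressed:
            self.recording = False
            return "stop"
        return None


# Simulated key events for the default hotkey.
ptt = PushToTalk("ctrl+shift+space")
events = [
    ptt.on_press("ctrl"),
    ptt.on_press("shift"),
    ptt.on_press("space"),
    ptt.on_release("space"),
]
```

Keeping the key tracking separate from audio capture makes the trigger logic easy to test without a microphone or a keyboard hook.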
## Configuration
`config.json` (shared, stored in the repo):
| Key | Default | Description |
|-----|---------|-------------|
| `hotkey` | `ctrl+shift+space` | Recording trigger |
| `model` | `medium` | Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`) |
| `language` | `de` | Transcription language (`de`, `en`, `fr`, `es`, `it`, `null` = auto) |
| `sample_rate` | `16000` | Audio sample rate in Hz |
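
As a rough illustration of how these keys might be consumed, the sketch below reads `config.json` and falls back to the defaults from the table for any missing key. The function name `load_shared_config` and the fallback behaviour are assumptions, not the project's actual loader.

```python
import json
from pathlib import Path

# Defaults as listed in the table above.
DEFAULT_CONFIG = {
    "hotkey": "ctrl+shift+space",
    "model": "medium",
    "language": "de",
    "sample_rate": 16000,
}

# Model sizes named in the table above.
KNOWN_MODELS = {"tiny", "base", "small", "medium", "large-v2", "large-v3"}


def load_shared_config(path="config.json"):
    """Hypothetical loader: merge the file's values over the documented defaults."""
    config = dict(DEFAULT_CONFIG)
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    if config["model"] not in KNOWN_MODELS:
        raise ValueError(f"unknown model size: {config['model']!r}")
    return config
```

A JSON `null` for `language` becomes Python `None` here, which matches the "auto" setting described in the table.
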

Machine-specific settings (GPU device, compute type, audio device) are stored separately and not tracked by git:
- **Windows:** `%LOCALAPPDATA%\WhisperDictation\config_local.json`
- **Linux:** `~/.local/share/WhisperDictation/config_local.json`
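
The two locations above could be resolved and layered over the shared settings roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the rule that local values override shared ones is a guess at the intended behaviour rather than something the project documents.

```python
import json
import os
import sys
from pathlib import Path


def local_config_path(platform=None, env=None):
    """Return the per-machine config path for the given platform (hypothetical helper)."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    if platform.startswith("win"):
        base = Path(env["LOCALAPPDATA"])
    else:
        base = Path(env.get("HOME") or Path.home()) / ".local" / "share"
    return base / "WhisperDictation" / "config_local.json"


def merge_local_config(shared, local_path):
    """Overlay machine-specific values on the shared config (assumed precedence)."""
    merged = dict(shared)
    local_file = Path(local_path)
    if local_file.exists():
        merged.update(json.loads(local_file.read_text(encoding="utf-8")))
    return merged
```

Passing `platform` and `env` explicitly keeps the path logic testable on any operating system.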
## Vocabulary
Custom vocabulary/replacements can be added to `vocabulary.json`. These are passed as initial prompts to improve recognition of domain-specific terms.
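
One way this could work is sketched below. The layout of `vocabulary.json` is not documented here, so the example assumes a list of terms and a mapping of replacements; both helper names are hypothetical, and applying replacements after transcription is an assumption about how they are used.

```python
import re


def build_initial_prompt(terms):
    """Join vocabulary terms into one prompt string, dropping blanks and duplicates."""
    seen = set()
    kept = []
    for term in terms:
        term = term.strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            kept.append(term)
    return ", ".join(kept)


def apply_replacements(text, replacements):
    """Replace whole-word matches (case-insensitive) in the transcribed text."""
    for wrong, right in replacements.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement avoids backslash interpretation in `right`.
        text = re.sub(pattern, lambda _m, r=right: r, text, flags=re.IGNORECASE)
    return text


# Assumed example data, not the project's real vocabulary file.
prompt = build_initial_prompt(["PyTorch", " CUDA ", "pytorch", ""])
fixed = apply_replacements("we use pie torch daily", {"pie torch": "PyTorch"})
```

The resulting prompt string could then be passed as the `initial_prompt` argument of faster-whisper's `WhisperModel.transcribe`.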
## Model Download
On first start, the selected Whisper model is downloaded automatically from Hugging Face (roughly 1.5 GB for `medium`). Subsequent starts use the cached model.
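
For orientation, loading a model through faster-whisper might look like the sketch below. The device and compute-type defaults are assumptions that mirror the platform notes in this README (CUDA on Windows, CPU on Linux); the project may choose different values, and both function names are hypothetical.

```python
import sys


def default_model_kwargs(platform=None):
    """Pick assumed device settings per platform: CUDA on Windows, CPU elsewhere."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        return {"device": "cuda", "compute_type": "float16"}
    return {"device": "cpu", "compute_type": "int8"}


def load_model(model_size="medium", **overrides):
    """Create a faster-whisper model; a size name triggers a download if not cached."""
    # Imported lazily so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    kwargs = {**default_model_kwargs(), **overrides}
    return WhisperModel(model_size, **kwargs)
```

Calling `load_model("small", device="cpu", compute_type="int8")` would force CPU inference regardless of platform.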