merge from linux

beo3000 2026-03-19 20:33:37 +01:00
parent 98519dc79d
commit 0d413f1058
3 changed files with 142 additions and 15 deletions

@@ -0,0 +1,8 @@
{
"permissions": {
"allow": [
"Bash(git remote:*)",
"Bash(git pull:*)"
]
}
}

README.md Normal file

@@ -0,0 +1,78 @@
# Whisper Dictation
Local speech-to-text dictation tool with optional GPU acceleration. Hold a hotkey to record, then release it to transcribe and type the result into the active window. After the one-time model download it runs fully offline: no cloud service, no API key.
## Features
- System tray icon with settings GUI (tkinter)
- Configurable hotkey, model, language, audio device
- Shared config via git (`config.json`, `vocabulary.json`)
- Machine-specific settings stored locally (audio device, GPU settings)
- Windows: GPU acceleration via CUDA; Linux: CPU by default (GPU requires a manually installed CUDA environment)
## Requirements
### Windows
- Python 3.13
- NVIDIA GPU with CUDA 12 drivers
- [PortAudio](http://www.portaudio.com/) (bundled with most Python sounddevice wheels)
### Linux
- Python 3.10+
- PortAudio: `sudo apt install portaudio19-dev`
## Installation
### Windows
```bat
install.bat
```
This creates a `.venv-windows` virtual environment and installs all dependencies, plus the CUDA 12 DLLs required by faster-whisper.
### Linux
```bash
chmod +x install.sh start.sh
./install.sh
```
This creates a `.venv-linux` virtual environment. GPU support on Linux requires a manually installed CUDA environment; by default the app runs on CPU.
## Usage
### Windows
```bat
start.bat
```
### Linux
```bash
./start.sh
```
The app starts in the system tray. Hold the hotkey (default: `Ctrl+Shift+Space`) to record, release to transcribe and type into the active window.
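The hold-to-record, release-to-transcribe flow can be pictured as a small buffer that only accepts audio while the hotkey is down. This is a minimal sketch, not the app's actual code: the class `PushToTalkBuffer` and its method names are hypothetical, and the real hotkey and audio callbacks are left out.

```python
class PushToTalkBuffer:
    """Collect audio chunks only while the hotkey is held down (hypothetical helper)."""

    def __init__(self):
        self.recording = False
        self._chunks = []

    def on_press(self):
        # Key-repeat events while the key is held must not restart the recording.
        if not self.recording:
            self.recording = True
            self._chunks = []

    def on_audio(self, chunk):
        # Meant to be called from the audio callback; chunks that arrive
        # outside a recording are dropped.
        if self.recording:
            self._chunks.append(bytes(chunk))

    def on_release(self):
        """Stop recording and return the captured audio as one bytes object."""
        self.recording = False
        audio, self._chunks = b"".join(self._chunks), []
        return audio
```

The bytes returned by `on_release()` would then be handed to the transcription step, and the resulting text typed into the active window.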
## Configuration
`config.json` (shared, stored in the repo):
| Key | Default | Description |
|-----|---------|-------------|
| `hotkey` | `ctrl+shift+space` | Recording trigger |
| `model` | `medium` | Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`) |
| `language` | `de` | Transcription language (`de`, `en`, `fr`, `es`, `it`, `null` = auto) |
| `sample_rate` | `16000` | Audio sample rate in Hz |
Machine-specific settings (GPU device, compute type, audio device) are stored separately and not tracked by git:
- **Windows:** `%LOCALAPPDATA%\WhisperDictation\config_local.json`
- **Linux:** `~/.local/share/WhisperDictation/config_local.json`
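One way to combine the shared and machine-specific files is sketched below. The function names, the merge order (local values override shared ones), and the `audio_device` key used in the usage note are assumptions for illustration; only the default values and the two paths come from the text above.

```python
import json
import os
import sys
from pathlib import Path

# Defaults taken from the table above.
DEFAULTS = {
    "hotkey": "ctrl+shift+space",
    "model": "medium",
    "language": "de",
    "sample_rate": 16000,
}


def local_config_path(platform=sys.platform, env=os.environ):
    """Return the per-machine config path for the given platform."""
    if platform.startswith("win"):
        fallback = str(Path.home() / "AppData" / "Local")
        base = Path(env.get("LOCALAPPDATA", fallback))
    else:
        base = Path.home() / ".local" / "share"
    return base / "WhisperDictation" / "config_local.json"


def read_json(path):
    """Read a JSON object from path; a missing file yields an empty dict."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def load_config(shared_path="config.json", local_path=None):
    """Merge defaults, shared config, and local config (assumed: later wins)."""
    local_path = local_path or local_config_path()
    return {**DEFAULTS, **read_json(shared_path), **read_json(local_path)}
```

With this sketch, a `config_local.json` containing `{"audio_device": 3}` adds that key on one machine without touching the tracked `config.json`.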
## Vocabulary
Custom vocabulary and replacements can be added to `vocabulary.json`, which holds a `words` list and a `replacements` list of `from`/`to` pairs. These are passed as initial prompts to improve recognition of domain-specific terms.
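The sketch below shows one plausible way to use the two lists. It is an assumption, not the app's confirmed behavior: `words` are joined into a prompt string, and `replacements` are applied to the transcribed text as case-sensitive whole-word substitutions. The helper names `build_initial_prompt` and `apply_replacements` are hypothetical.

```python
import json
import re


def build_initial_prompt(words):
    """Join vocabulary words into one prompt string (empty list -> None)."""
    return ", ".join(words) if words else None


def apply_replacements(text, replacements):
    """Replace each whole-word `from` match with its `to` value."""
    for rule in replacements:
        pattern = r"\b" + re.escape(rule["from"]) + r"\b"
        # A function replacement keeps backslashes in `to` literal.
        text = re.sub(pattern, lambda _m, to=rule["to"]: to, text)
    return text


vocabulary = json.loads("""
{
  "words": ["test"],
  "replacements": [
    {"from": "Docuware", "to": "DocuWare"},
    {"from": "Jouer fixe", "to": "Jour-Fixe"}
  ]
}
""")

prompt = build_initial_prompt(vocabulary["words"])
fixed = apply_replacements("Jouer fixe zu Docuware", vocabulary["replacements"])
```

The prompt string could be passed as the `initial_prompt` argument of faster-whisper's `WhisperModel.transcribe`. Whole-word matching keeps a rule such as `KRA` to `KRAH` from rewriting text that already reads `KRAH`.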
## Model Download
On first start, the selected Whisper model is downloaded automatically from Hugging Face (roughly 1.5 GB for `medium`). Subsequent starts use the cached model.
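A minimal sketch of how the model might be loaded with platform-dependent defaults follows. The `device` and `compute_type` key names in the local config and the default values chosen here are assumptions; `WhisperModel` and its `model_size_or_path`, `device`, and `compute_type` parameters are part of faster-whisper.

```python
import sys


def model_settings(config, local_config, platform=sys.platform):
    """Build keyword arguments for a faster-whisper model.

    The "device" and "compute_type" keys in local_config are hypothetical names.
    """
    on_windows = platform.startswith("win")
    return {
        "model_size_or_path": config.get("model", "medium"),
        "device": local_config.get("device", "cuda" if on_windows else "cpu"),
        "compute_type": local_config.get(
            "compute_type", "float16" if on_windows else "int8"
        ),
    }


def load_model(config, local_config):
    """Load the model; the first call downloads it, later calls use the cache."""
    # Imported lazily so model_settings works without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(**model_settings(config, local_config))
```

Keeping the settings in a plain dictionary makes the device choice easy to inspect before the download starts.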

@@ -1,18 +1,59 @@
{
"words": [],
"words": [
"test"
],
"replacements": [
{"from": "KRA", "to": "KRAH"},
{"from": "Atos", "to": "ATHOS"},
{"from": "Resistec", "to": "RESISTEC"},
{"from": "Resistek", "to": "RESISTEC"},
{"from": "HES", "to": "HEES"},
{"from": "Ackerschot", "to": "Ackerschott"},
{"from": "Carrois", "to": "Kauer"},
{"from": "Jouer fixe", "to": "Jour-Fixe"},
{"from": "Docuware", "to": "DocuWare"},
{"from": "Nates", "to": "Nejc"},
{"from": "Bittzeit", "to": "BitSight"},
{"from": "Kalmikow", "to": "Kalmykov"},
{"from": "Leifert", "to": "Leifer"}
{
"from": "KRA",
"to": "KRAH"
},
{
"from": "Atos",
"to": "ATHOS"
},
{
"from": "Resistec",
"to": "RESISTEC"
},
{
"from": "Resistek",
"to": "RESISTEC"
},
{
"from": "HES",
"to": "HEES"
},
{
"from": "Ackerschot",
"to": "Ackerschott"
},
{
"from": "Carrois",
"to": "Kauer"
},
{
"from": "Jouer fixe",
"to": "Jour-Fixe"
},
{
"from": "Docuware",
"to": "DocuWare"
},
{
"from": "Nates",
"to": "Nejc"
},
{
"from": "Bittzeit",
"to": "BitSight"
},
{
"from": "Kalmikow",
"to": "Kalmykov"
},
{
"from": "Leifert",
"to": "Leifer"
}
]
}