docs: add GUI app design spec

beo3000 2026-03-20 11:12:40 +01:00
parent 2464081f4c
commit ebfd751e8d
1 changed files with 184 additions and 0 deletions


@ -0,0 +1,184 @@
# Whisper Dictation — GUI App Design
**Date:** 2026-03-20
**Status:** Approved
## Overview
Convert the existing `dictate.py` script into a proper packaged desktop application that:
- Runs without a terminal window on Windows and Linux
- Shows a compact log/status panel accessible from the tray icon
- Can be integrated into the system (autostart, start menu, desktop shortcut)
- Is distributed as a standalone binary via PyInstaller
## Module Structure
```
whisper-dictation/
├── main.py # Entry point
├── build.py # PyInstaller build script (platform-specific)
├── whisper-dictation.spec # PyInstaller spec (manual, for ctranslate2)
├── whisper_app/
│ ├── __init__.py
│ ├── app.py # AppState, central coordination, log queue
│ ├── audio.py # sounddevice stream, audio_callback
│ ├── transcriber.py # WhisperModel, stop_and_transcribe()
│ ├── hotkey.py # HotkeyListener (unchanged)
│ ├── config.py # load/save config + vocab, path resolution
│ ├── typer.py # type_text(), cross-platform
│ ├── tray.py # pystray icon + menu
│ ├── log_window.py # Compact log panel
│ ├── settings_window.py # Settings dialog
│ ├── vocab_window.py # Vocabulary dialog
│ ├── overlay.py # "Recording..." overlay
│ └── installer.py # System integration (autostart, start menu, shortcut)
├── config.json # Shared config (git-tracked)
├── vocabulary.json # Shared vocabulary (git-tracked)
├── start.sh # Dev start (Linux)
└── start.bat # Dev start (Windows)
```
## Path Handling
All path resolution lives in `config.py`. It must work in two modes: as a frozen PyInstaller binary and in dev mode, running from the git checkout:
```python
import os
import sys


def _app_dir() -> str:
    """Root dir for config.json and vocabulary.json."""
    if getattr(sys, "frozen", False):
        # PyInstaller binary: use the directory containing the executable
        return os.path.dirname(sys.executable)
    # Dev mode: config.py lives in whisper_app/, so the git repo root is
    # two dirname() steps up from this file.
    return os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
```
Machine-local config (`config_local.json`) is unchanged: it stays in `%LOCALAPPDATA%\WhisperDictation` on Windows and `~/.local/share/WhisperDictation` on Linux.
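As a rough illustration of the machine-local location described above, the lookup could be sketched as follows. The function name `local_config_dir`, its parameters, and the `AppData/Local` fallback for a missing `%LOCALAPPDATA%` are assumptions made for this sketch, not part of the existing code.

```python
import os
import sys
from pathlib import Path

APP_NAME = "WhisperDictation"


def local_config_dir(platform=sys.platform, env=None, home=None) -> Path:
    """Directory that holds config_local.json (hypothetical helper).

    `platform`, `env` and `home` are injectable only to keep the
    function easy to test; callers would normally pass nothing.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        base = env.get("LOCALAPPDATA")
        # Assumed fallback if the variable is unset
        root = Path(base) if base else home / "AppData" / "Local"
    else:
        root = home / ".local" / "share"
    return root / APP_NAME
```

Keeping this next to `_app_dir()` in `config.py` would leave every path decision in one module.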
## Logging Architecture
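`app.py` owns a `queue.Queue[str]` plus a plain list that buffers messages logged before the queue exists. Once `set_log_queue()` is called, the buffered messages are replayed into the queue in order.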
```python
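import queue
import threading

_log_buffer: list[str] = []  # messages logged before the queue is ready
_log_queue: queue.Queue | None = None
_log_lock = threading.Lock()  # log() may be called from worker threads


def log(msg: str) -> None:
    with _log_lock:
        if _log_queue is not None:
            _log_queue.put(msg)
        else:
            _log_buffer.append(msg)


def set_log_queue(q: queue.Queue) -> None:
    global _log_queue
    with _log_lock:
        # Replay buffered messages first so their order is preserved
        for msg in _log_buffer:
            q.put(msg)
        _log_buffer.clear()
        _log_queue = q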
```
All `print()` calls in all modules are replaced with `app.log()`.
## Log Panel (Compact)
Opened by left-clicking the tray icon or via the "Anzeigen" ("Show") menu item. Implemented in `log_window.py`.
```
┌─────────────────────────────────────┐
│ ● WHISPER DICTATION medium·de ✕ │
├─────────────────────────────────────┤
│ Model ready. │
│ Hotkey: ctrl+shift+space │
│ ● Recording... │
│ Audio: 2.1s RMS: 0.048 │
│ ✓ "Das ist ein Test" │
├─────────────────────────────────────┤
│ ⚙ Einstellungen 📚 Vokabular 🗑 │
└─────────────────────────────────────┘
```
- Size: ~380×220px, **not resizable**
- `tk.Text` widget in read-only mode, max 200 lines (older lines discarded)
- Auto-scroll to bottom on new messages
- Color tags: green = ready/result, red = recording, yellow = transcribing, grey = info
- Close button → `withdraw()` (does not quit the app)
- 🗑 button → clears the text widget
- Queue polled via `root.after(100, _poll_log_queue)`
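The polling, trimming, and tagging behaviour listed above could look roughly like this. It is a sketch only: the helper names (`drain`, `tag_for`, `append_lines`, `start_polling`), the tag names, and the keyword-based colour mapping are assumptions, and `text` / `root` stand for the `tk.Text` widget and Tk root created elsewhere in `log_window.py`.

```python
import queue

MAX_LINES = 200  # older lines are discarded
POLL_MS = 100    # polling interval for root.after()


def drain(q: queue.Queue) -> list:
    """Return every message currently in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def tag_for(msg: str) -> str:
    """Pick a colour tag for a message (hypothetical keyword mapping)."""
    lower = msg.lower()
    if msg.startswith("✓") or "ready" in lower:
        return "ok"         # green
    if "recording" in lower:
        return "recording"  # red
    if "transcrib" in lower:
        return "busy"       # yellow
    return "info"           # grey


def append_lines(text, msgs, max_lines=MAX_LINES) -> None:
    """Append messages to a read-only tk.Text and trim old lines."""
    if not msgs:
        return
    text.configure(state="normal")
    for msg in msgs:
        text.insert("end", msg + "\n", tag_for(msg))
    # "end-1c" sits on the empty line after the last newline,
    # so its line number is the line count plus one.
    line_count = int(text.index("end-1c").split(".")[0]) - 1
    if line_count > max_lines:
        text.delete("1.0", f"{line_count - max_lines + 1}.0")
    text.configure(state="disabled")
    text.see("end")  # auto-scroll to the newest message


def start_polling(root, text, q: queue.Queue) -> None:
    """Poll the log queue on the Tk main thread every POLL_MS."""
    def _poll():
        append_lines(text, drain(q))
        root.after(POLL_MS, _poll)
    _poll()
```

Draining the whole queue per tick, rather than one message per tick, keeps the panel responsive when several log lines arrive in a burst.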
## Settings Window — Installation Section
New section "INSTALLATION" added to the existing settings window. Implemented in `installer.py`.
Each integration shows its status ("eingerichtet" / "nicht eingerichtet", i.e. set up / not set up) and two buttons: "Einrichten" (set up) and "Entfernen" (remove).
| Feature | Windows | Linux |
|---|---|---|
| Autostart at login | `HKCU\Software\Microsoft\Windows\CurrentVersion\Run` | `~/.config/autostart/whisper-dictation.desktop` |
| Start menu entry | `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Whisper Dictation.lnk` | `~/.local/share/applications/whisper-dictation.desktop` |
| Desktop shortcut | `%USERPROFILE%\Desktop\Whisper Dictation.lnk` | `~/Desktop/whisper-dictation.desktop` |
Windows `.lnk` files are created via `pywin32` (`win32com.client.Dispatch("WScript.Shell")`).
**Only available when running as a frozen binary.** In dev mode, the buttons are disabled and show the tooltip "Nur im gebauten Binary verfügbar" ("Only available in the built binary").
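For the Linux autostart row of the table, a minimal sketch might look like the following. All function names here are hypothetical, the `home` parameter exists only for testing, and the desktop entry contains just the keys a launcher needs; the real `installer.py` may well differ.

```python
from __future__ import annotations

import sys
from pathlib import Path

DESKTOP_FILE = "whisper-dictation.desktop"


def is_frozen() -> bool:
    """True when running from a PyInstaller binary."""
    return bool(getattr(sys, "frozen", False))


def desktop_entry(exec_path: str) -> str:
    """Render a minimal freedesktop.org desktop entry for the binary."""
    return (
        "[Desktop Entry]\n"
        "Type=Application\n"
        "Name=Whisper Dictation\n"
        f'Exec="{exec_path}"\n'
        "Terminal=false\n"
    )


def autostart_path(home: Path | None = None) -> Path:
    return (home or Path.home()) / ".config" / "autostart" / DESKTOP_FILE


def is_autostart_enabled(home: Path | None = None) -> bool:
    """Backs the 'eingerichtet' / 'nicht eingerichtet' status label."""
    return autostart_path(home).is_file()


def enable_autostart(exec_path: str, home: Path | None = None) -> Path:
    """'Einrichten': write the .desktop file into the autostart dir."""
    target = autostart_path(home)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(desktop_entry(exec_path), encoding="utf-8")
    return target


def disable_autostart(home: Path | None = None) -> None:
    """'Entfernen': remove the .desktop file if it exists."""
    autostart_path(home).unlink(missing_ok=True)
```

In the settings window, the buttons would be enabled only when `is_frozen()` returns true, with `sys.executable` passed as `exec_path`.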
## Icon
The existing tray icon (64×64 PIL `Image`) is extended:
- At build time, `build.py` generates `icon.ico` (sizes: 16, 32, 48, 256) via Pillow
- Used as the PyInstaller `--icon` and for `.lnk` shortcuts
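One detail worth keeping in mind: according to Pillow's documentation, the ICO writer ignores requested sizes larger than the source image, so the 64×64 tray image alone would not produce the 256 px entry. A sketch of the build-time step, assuming the icon is re-rendered at 256×256 first (the circle drawn here is placeholder artwork, and `render_icon` / `write_ico` are hypothetical names):

```python
from pathlib import Path

from PIL import Image, ImageDraw

ICO_SIZES = [(16, 16), (32, 32), (48, 48), (256, 256)]


def render_icon(size: int = 256) -> Image.Image:
    """Draw the icon at the given size (placeholder: a filled circle)."""
    img = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    pad = size // 8
    ImageDraw.Draw(img).ellipse(
        (pad, pad, size - pad, size - pad), fill=(220, 50, 50, 255)
    )
    return img


def write_ico(path: Path) -> Path:
    """Save a multi-resolution .ico; Pillow downsamples from 256 px."""
    render_icon(256).save(path, format="ICO", sizes=ICO_SIZES)
    return path
```

If the tray icon is already drawn programmatically, the same drawing routine could take the size as a parameter and serve both the tray (64 px) and the `.ico` (256 px).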
## PyInstaller Build
### Manual `.spec` file (required for ctranslate2/faster-whisper)
```python
# whisper-dictation.spec (key sections)
a = Analysis(
['main.py'],
hiddenimports=[
'ctranslate2',
'faster_whisper',
'sounddevice',
'pynput.keyboard._win32', # Windows
'pynput.keyboard._xorg', # Linux
],
datas=[
('config.json', '.'),
('vocabulary.json', '.'),
],
)
exe = EXE(a.pure, ..., console=False, icon='icon.ico')
```
### `build.py`
1. Generates `icon.ico` from PIL
2. Runs PyInstaller with the `.spec` file
3. Copies `config.json` and `vocabulary.json` into `dist/whisper-dictation/`
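Steps 2 and 3 above can be sketched as follows. The function names are hypothetical, icon generation (step 1) is left out, and the sketch assumes PyInstaller is installed in the interpreter that runs the script.

```python
import shutil
import subprocess
import sys
from pathlib import Path

SPEC_FILE = "whisper-dictation.spec"
SHARED_FILES = ("config.json", "vocabulary.json")


def run_pyinstaller(project_dir: Path) -> None:
    """Step 2: run PyInstaller with the hand-written spec file."""
    subprocess.run(
        [sys.executable, "-m", "PyInstaller", "--noconfirm", SPEC_FILE],
        cwd=project_dir,
        check=True,  # abort the build if PyInstaller fails
    )


def copy_shared_files(project_dir: Path, dist_dir: Path) -> list:
    """Step 3: place the shared JSON files next to the executable."""
    dist_dir.mkdir(parents=True, exist_ok=True)
    return [
        Path(shutil.copy2(project_dir / name, dist_dir / name))
        for name in SHARED_FILES
    ]


def build(project_dir: Path) -> None:
    # Step 1 (icon generation) is omitted from this sketch
    run_pyinstaller(project_dir)
    copy_shared_files(project_dir, project_dir / "dist" / "whisper-dictation")
```

Copying the JSON files next to the executable matters because, in frozen mode, `_app_dir()` resolves to the directory containing the executable.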
### Platform requirement
PyInstaller cannot cross-compile. **Build must be run separately on each platform:**
- Windows: `python build.py` → `dist/whisper-dictation/whisper-dictation.exe`
- Linux: `python build.py` → `dist/whisper-dictation/whisper-dictation`
## New Dependencies
| Package | Purpose |
|---|---|
| `pywin32` | `.lnk` shortcut creation (Windows only) |
Added to `requirements-windows.txt`. Not required on Linux.
## Out of Scope
- Code signing / notarization
- Auto-updater
- Versioning
- Cross-compilation
## Open Questions
_(none; all resolved during the design session)_