diff --git a/docs/superpowers/specs/2026-03-20-gui-app-design.md b/docs/superpowers/specs/2026-03-20-gui-app-design.md
new file mode 100644
index 0000000..ae377a4
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-20-gui-app-design.md
@@ -0,0 +1,184 @@
+# Whisper Dictation — GUI App Design
+
+**Date:** 2026-03-20
+**Status:** Approved
+
+## Overview
+
+Convert the existing `dictate.py` script into a proper packaged desktop application that:
+- Runs without a terminal window on Windows and Linux
+- Shows a compact log/status panel accessible from the tray icon
+- Can be integrated into the system (autostart, start menu, desktop shortcut)
+- Is distributed as a standalone binary via PyInstaller
+
+## Module Structure
+
+```
+whisper-dictation/
+├── main.py                 # Entry point
+├── build.py                # PyInstaller build script (platform-specific)
+├── whisper-dictation.spec  # PyInstaller spec (manual, for ctranslate2)
+├── whisper_app/
+│   ├── __init__.py
+│   ├── app.py              # AppState, central coordination, log queue
+│   ├── audio.py            # sounddevice stream, audio_callback
+│   ├── transcriber.py      # WhisperModel, stop_and_transcribe()
+│   ├── hotkey.py           # HotkeyListener (unchanged)
+│   ├── config.py           # load/save config + vocab, path resolution
+│   ├── typer.py            # type_text(), cross-platform
+│   ├── tray.py             # pystray icon + menu
+│   ├── log_window.py       # Compact log panel
+│   ├── settings_window.py  # Settings dialog
+│   ├── vocab_window.py     # Vocabulary dialog
+│   ├── overlay.py          # "Recording..." overlay
+│   └── installer.py        # System integration (autostart, start menu, shortcut)
+├── config.json             # Shared config (git-tracked)
+├── vocabulary.json         # Shared vocabulary (git-tracked)
+├── start.sh                # Dev start (Linux)
+└── start.bat               # Dev start (Windows)
+```
+
+## Path Handling
+
+All path resolution lives in `config.py`.
+It must work in two modes (frozen PyInstaller binary and dev checkout):
+
+```python
+import sys, os
+
+def _app_dir() -> str:
+    """Root dir for config.json and vocabulary.json."""
+    if getattr(sys, "frozen", False):
+        # PyInstaller binary: use directory containing the executable
+        return os.path.dirname(sys.executable)
+    else:
+        # Dev mode: git repo root, i.e. the parent of the whisper_app/ package dir
+        return os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+```
+
+Machine-local config (`config_local.json`) continues to use `%LOCALAPPDATA%\WhisperDictation` (Windows) or `~/.local/share/WhisperDictation` (Linux); this is unchanged.
+
+## Logging Architecture
+
+`app.py` owns a `queue.Queue[str]` plus a list that buffers messages logged before the queue has been set.
+
+```python
+import queue
+
+_log_buffer: list[str] = []  # holds messages logged before the queue is ready
+_log_queue: queue.Queue | None = None
+
+def log(msg: str) -> None:
+    if _log_queue is not None:
+        _log_queue.put(msg)
+    else:
+        _log_buffer.append(msg)
+
+def set_log_queue(q: queue.Queue) -> None:
+    global _log_queue
+    _log_queue = q
+    # Flush everything that was logged before the queue existed
+    for msg in _log_buffer:
+        q.put(msg)
+    _log_buffer.clear()
+```
+
+All `print()` calls in all modules are replaced with `app.log()`.
+
+## Log Panel (Compact)
+
+Opened via tray icon left-click or the "Anzeigen" ("Show") menu item. Implemented in `log_window.py`.
+
+```
+┌─────────────────────────────────────┐
+│ ● WHISPER DICTATION    medium·de  ✕ │
+├─────────────────────────────────────┤
+│ Model ready.                        │
+│ Hotkey: ctrl+shift+space            │
+│ ● Recording...                      │
+│ Audio: 2.1s  RMS: 0.048             │
+│ ✓ "Das ist ein Test"                │
+├─────────────────────────────────────┤
+│ ⚙ Einstellungen  📚 Vokabular  🗑   │
+└─────────────────────────────────────┘
+```
+
+- Size: ~380×220 px, **not resizable**
+- `tk.Text` widget in read-only mode, max 200 lines (older lines discarded)
+- Auto-scroll to bottom on new messages
+- Color tags: green = ready/result, red = recording, yellow = transcribing, grey = info
+- Close button → `withdraw()` (hides the window, does not quit the app)
+- 🗑 button → clears the text widget
+- Queue polled every 100 ms via `root.after(100, _poll_log_queue)`
+
+## Settings Window — Installation Section
+
+A new "INSTALLATION" section is added to the existing settings window. The integration logic is implemented in `installer.py`.
+
+Each integration shows its status ("eingerichtet" / "nicht eingerichtet", i.e. "set up" / "not set up") and two buttons: "Einrichten" ("Set up") and "Entfernen" ("Remove").
+
+| Feature | Windows | Linux |
+|---|---|---|
+| Autostart beim Login (autostart at login) | `HKCU\Software\Microsoft\Windows\CurrentVersion\Run` | `~/.config/autostart/whisper-dictation.desktop` |
+| Startmenü-Eintrag (start menu entry) | `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Whisper Dictation.lnk` | `~/.local/share/applications/whisper-dictation.desktop` |
+| Desktop-Verknüpfung (desktop shortcut) | `%USERPROFILE%\Desktop\Whisper Dictation.lnk` | `~/Desktop/whisper-dictation.desktop` |
+
+Windows `.lnk` files are created via `pywin32` (`win32com.client.Dispatch("WScript.Shell")`).
+
+**Only available when running as a frozen binary.** In dev mode, the buttons are disabled and show the tooltip "Nur im gebauten Binary verfügbar" ("Only available in the built binary").
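+
+For illustration, the Linux autostart entry from the table above might look roughly like the following. This is a sketch only: the design does not specify which keys `installer.py` writes, and the `Exec` and `Icon` paths are hypothetical placeholders for wherever the frozen binary and its icon end up.
+
+```ini
+# ~/.config/autostart/whisper-dictation.desktop
+# Assumed content; both paths below are placeholders, not part of this design.
+[Desktop Entry]
+Type=Application
+Name=Whisper Dictation
+Exec=/opt/whisper-dictation/whisper-dictation
+Icon=/opt/whisper-dictation/icon.png
+Terminal=false
+```
+
+The same entry, written to the other two Linux paths in the table, would presumably cover the start menu entry and the desktop shortcut.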
+
+## Icon
+
+The existing tray icon (64×64 PIL `Image`) is also used as the application icon:
+- At build time, `build.py` generates `icon.ico` (sizes: 16, 32, 48, 256) via Pillow
+- `icon.ico` is used as the PyInstaller executable icon (`icon=` in the spec) and for `.lnk` shortcuts
+
+## PyInstaller Build
+
+### Manual `.spec` file (required for ctranslate2/faster-whisper)
+
+```python
+# whisper-dictation.spec (key sections)
+a = Analysis(
+    ['main.py'],
+    hiddenimports=[
+        'ctranslate2',
+        'faster_whisper',
+        'sounddevice',
+        'pynput.keyboard._win32',  # Windows
+        'pynput.keyboard._xorg',   # Linux
+    ],
+    datas=[
+        ('config.json', '.'),
+        ('vocabulary.json', '.'),
+    ],
+)
+# EXE expects the PYZ archive built from a.pure, not a.pure itself
+pyz = PYZ(a.pure)
+exe = EXE(pyz, ..., console=False, icon='icon.ico')
+```
+
+### `build.py`
+
+1. Generates `icon.ico` from the PIL tray image
+2. Runs PyInstaller with the `.spec` file
+3. Copies `config.json` and `vocabulary.json` into `dist/whisper-dictation/`
+
+### Platform requirement
+
+PyInstaller cannot cross-compile. **The build must be run separately on each platform:**
+- Windows: `python build.py` → `dist/whisper-dictation/whisper-dictation.exe`
+- Linux: `python build.py` → `dist/whisper-dictation/whisper-dictation`
+
+## New Dependencies
+
+| Package | Purpose |
+|---|---|
+| `pywin32` | `.lnk` shortcut creation (Windows only) |
+
+Added to `requirements-windows.txt`. Not required on Linux.
+
+## Out of Scope
+
+- Code signing / notarization
+- Auto-updater
+- Versioning
+- Cross-compilation
+
+## Open Questions
+
+_(none; all resolved during the design session)_