Whisper Dictation — GUI App Design

Date: 2026-03-20
Status: Approved

Overview

Convert the existing dictate.py script into a proper packaged desktop application that:

  • Runs without a terminal window on Windows and Linux
  • Shows a compact log/status panel accessible from the tray icon
  • Can be integrated into the system (autostart, start menu, desktop shortcut)
  • Is distributed as a standalone binary via PyInstaller

Module Structure

whisper-dictation/
├── main.py                        # Entry point
├── build.py                       # PyInstaller build script (platform-specific)
├── whisper-dictation.spec         # PyInstaller spec (manual, for ctranslate2)
├── whisper_app/
│   ├── __init__.py
│   ├── app.py                     # AppState, central coordination, log queue
│   ├── audio.py                   # sounddevice stream, audio_callback
│   ├── transcriber.py             # WhisperModel, stop_and_transcribe()
│   ├── hotkey.py                  # HotkeyListener (unchanged)
│   ├── config.py                  # load/save config + vocab, path resolution
│   ├── typer.py                   # type_text(), cross-platform
│   ├── tray.py                    # pystray icon + menu
│   ├── log_window.py              # Compact log panel
│   ├── settings_window.py         # Settings dialog
│   ├── vocab_window.py            # Vocabulary dialog
│   ├── overlay.py                 # "Recording..." overlay
│   └── installer.py               # System integration (autostart, start menu, shortcut)
├── config.json                    # Shared config (git-tracked)
├── vocabulary.json                # Shared vocabulary (git-tracked)
├── start.sh                       # Dev start (Linux)
└── start.bat                      # Dev start (Windows)

Path Handling

All path resolution lives in config.py. It must work in two modes: as a frozen PyInstaller binary and in dev mode from the git checkout.

import os
import sys

def _app_dir() -> str:
    """Root dir for config.json and vocabulary.json."""
    if getattr(sys, "frozen", False):
        # PyInstaller binary: use directory containing the executable
        return os.path.dirname(sys.executable)
    # Dev mode: config.py lives in <repo>/whisper_app/, so go up two
    # directory levels from this file to reach the git repo root.
    return os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

Machine-local config (config_local.json) continues to use %LOCALAPPDATA%\WhisperDictation (Windows) or ~/.local/share/WhisperDictation (Linux) — unchanged.
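
The machine-local location can be resolved along the same lines. The following is a minimal sketch, not the project's code: the names local_config_dir and local_config_path, their parameters, and the fallback used when LOCALAPPDATA is unset are assumptions made for illustration.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def local_config_dir(platform: Optional[str] = None,
                     env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the per-machine directory that holds config_local.json.

    Hypothetical helper: the name, the parameters and the Windows
    fallback are assumptions, not part of the spec.
    """
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    if platform == "win32":
        base = env.get("LOCALAPPDATA")
        # Assumed fallback if the variable is missing.
        root = Path(base) if base else Path.home() / "AppData" / "Local"
    else:
        # The spec only names Linux; every non-Windows platform lands here.
        root = Path.home() / ".local" / "share"
    return root / "WhisperDictation"


def local_config_path(platform: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None) -> Path:
    """Full path of the machine-local config file."""
    return local_config_dir(platform, env) / "config_local.json"
```

The platform and env parameters exist only so that both branches can be exercised on one machine; production code would call local_config_path() without arguments.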

Logging Architecture

app.py owns a queue.Queue[str] and a pre-queue buffer list.

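import queue
import threading

_log_lock = threading.Lock()
_log_buffer: list[str] = []   # messages logged before the queue is attached
_log_queue: queue.Queue | None = None

def log(msg: str) -> None:
    # log() may be called from several threads; the lock prevents a message
    # from being appended to the buffer after the buffer has been flushed.
    with _log_lock:
        if _log_queue is not None:
            _log_queue.put(msg)
        else:
            _log_buffer.append(msg)

def set_log_queue(q: queue.Queue) -> None:
    global _log_queue
    with _log_lock:
        _log_queue = q
        for msg in _log_buffer:   # replay early messages in order
            q.put(msg)
        _log_buffer.clear()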

All print() calls in all modules are replaced with app.log().

Log Panel (Compact)

Opened via tray icon left-click or the "Anzeigen" ("Show") menu item. Implemented in log_window.py.

┌─────────────────────────────────────┐
│ ● WHISPER DICTATION    medium·de  ✕ │
├─────────────────────────────────────┤
│ Model ready.                        │
│ Hotkey: ctrl+shift+space            │
│ ● Recording...                      │
│ Audio: 2.1s  RMS: 0.048             │
│ ✓ "Das ist ein Test"                │
├─────────────────────────────────────┤
│ ⚙ Einstellungen  📚 Vokabular  🗑  │
└─────────────────────────────────────┘
  • Size: ~380×220px, not resizable
  • tk.Text widget in read-only mode, max 200 lines (older lines discarded)
  • Auto-scroll to bottom on new messages
  • Color tags: green = ready/result, red = recording, yellow = transcribing, grey = info
  • Close button → withdraw() (does not quit the app)
  • 🗑 button → clears the text widget
  • Queue polled via root.after(100, _poll_log_queue)
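
The polling and 200-line cap described in the list above could look roughly like the sketch below. It is an illustration under assumptions: the function names drain_queue, append_lines and poll_log_queue are hypothetical, color tags are left out, and text_widget is expected to be a tk.Text while root is the Tk root (neither is created here, so the block needs no display to import).

```python
import queue


def drain_queue(q: queue.Queue, limit: int = 100) -> list:
    """Pull up to `limit` pending messages without blocking."""
    messages = []
    for _ in range(limit):
        try:
            messages.append(q.get_nowait())
        except queue.Empty:
            break
    return messages


def append_lines(text_widget, messages, max_lines: int = 200) -> None:
    """Append messages to a read-only tk.Text and discard the oldest lines."""
    if not messages:
        return
    text_widget.configure(state="normal")        # unlock for editing
    for msg in messages:
        text_widget.insert("end", msg + "\n")
    # "end-1c" sits just after the last inserted newline, so its line
    # number is one higher than the number of stored lines.
    line_count = int(text_widget.index("end-1c").split(".")[0]) - 1
    excess = line_count - max_lines
    if excess > 0:
        text_widget.delete("1.0", f"{excess + 1}.0")
    text_widget.see("end")                        # auto-scroll to bottom
    text_widget.configure(state="disabled")       # back to read-only


def poll_log_queue(root, text_widget, q, interval_ms: int = 100) -> None:
    """Drain the queue on the Tk thread, then reschedule via root.after()."""
    append_lines(text_widget, drain_queue(q))
    root.after(interval_ms, poll_log_queue, root, text_widget, q, interval_ms)
```

Capping each poll with limit keeps a burst of log messages from blocking the Tk event loop; anything left over is picked up 100 ms later.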

Settings Window — Installation Section

New section "INSTALLATION" added to the existing settings window. Implemented in installer.py.

Each integration shows its status, "eingerichtet" ("set up") or "nicht eingerichtet" ("not set up"), and two buttons: "Einrichten" ("Set up") and "Entfernen" ("Remove").

| Feature | Windows | Linux |
|---|---|---|
| Autostart at login | HKCU\Software\Microsoft\Windows\CurrentVersion\Run | ~/.config/autostart/whisper-dictation.desktop |
| Start menu entry | %APPDATA%\Microsoft\Windows\Start Menu\Programs\Whisper Dictation.lnk | ~/.local/share/applications/whisper-dictation.desktop |
| Desktop shortcut | %USERPROFILE%\Desktop\Whisper Dictation.lnk | ~/Desktop/whisper-dictation.desktop |

Windows .lnk files are created via pywin32 (win32com.client.Dispatch("WScript.Shell")).

The installation section is only available when running as a frozen binary. In dev mode, the buttons are disabled with the tooltip "Nur im gebauten Binary verfügbar" ("Only available in the built binary").
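
For the Linux column, all three integrations reduce to writing or deleting a whisper-dictation.desktop file in a different directory. The sketch below illustrates that idea only: the function names and the exact set of keys written are assumptions, and the Windows registry and .lnk handling is not covered.

```python
import os
import sys

DESKTOP_FILE = "whisper-dictation.desktop"


def is_frozen() -> bool:
    """True when running from a PyInstaller binary."""
    return bool(getattr(sys, "frozen", False))


def desktop_entry(exec_path: str, name: str = "Whisper Dictation") -> str:
    """Build the text of a minimal .desktop file (key set is an assumption)."""
    return "\n".join([
        "[Desktop Entry]",
        "Type=Application",
        f"Name={name}",
        f'Exec="{exec_path}"',   # quoted so a path containing spaces survives
        "Terminal=false",
        "",
    ])


def is_installed(target_dir: str) -> bool:
    """Status check behind the set up / not set up label."""
    return os.path.isfile(os.path.join(target_dir, DESKTOP_FILE))


def install_entry(target_dir: str, exec_path: str) -> str:
    """Write the .desktop file into target_dir and return its path."""
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, DESKTOP_FILE)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(desktop_entry(exec_path))
    # Some desktop environments only launch entries marked executable.
    os.chmod(path, 0o755)
    return path


def remove_entry(target_dir: str) -> bool:
    """Delete the .desktop file; return False if it was not there."""
    path = os.path.join(target_dir, DESKTOP_FILE)
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False
```

Autostart would then be install_entry(os.path.expanduser("~/.config/autostart"), sys.executable), guarded by is_frozen() so that dev mode never registers the Python interpreter as the application.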

Icon

The existing tray icon (64×64 PIL Image) is extended:

  • At build time, build.py generates icon.ico (sizes: 16, 32, 48, 256) via Pillow
  • Used as the PyInstaller --icon and for .lnk shortcuts
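
A possible shape for the icon step, as a sketch under assumptions: draw_icon is placeholder artwork standing in for the real tray icon drawing code, and write_ico is a hypothetical name. One point to check against Pillow's ICO documentation is that sizes larger than the source image are skipped when saving, so the 256px entry requires the artwork to be rendered at 256×256 rather than taken from the 64×64 tray image.

```python
from PIL import Image, ImageDraw

# Sizes listed in the spec for icon.ico.
ICO_SIZES = [(16, 16), (32, 32), (48, 48), (256, 256)]


def draw_icon(size: int = 256) -> Image.Image:
    """Placeholder artwork: a filled circle on a transparent background."""
    img = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    margin = size // 8
    ImageDraw.Draw(img).ellipse(
        (margin, margin, size - margin, size - margin),
        fill=(46, 160, 67, 255),
    )
    return img


def write_ico(path: str = "icon.ico") -> None:
    """Render at the largest size and let Pillow downscale the rest."""
    draw_icon(256).save(path, format="ICO", sizes=ICO_SIZES)
```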

PyInstaller Build

Manual .spec file (required for ctranslate2/faster-whisper)

# whisper-dictation.spec (key sections)
a = Analysis(
    ['main.py'],
    hiddenimports=[
        'ctranslate2',
        'faster_whisper',
        'sounddevice',
        'pynput.keyboard._win32',   # Windows
        'pynput.keyboard._xorg',    # Linux
    ],
    datas=[
        ('config.json', '.'),
        ('vocabulary.json', '.'),
    ],
)
pyz = PYZ(a.pure)
exe = EXE(pyz, a.scripts, ..., console=False, icon='icon.ico')

build.py

  1. Generates icon.ico from PIL
  2. Runs PyInstaller with the .spec file
  3. Copies config.json and vocabulary.json into dist/whisper-dictation/
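
Steps 2 and 3 can be sketched as follows. This is an outline under assumptions, not the actual build.py: the function names are hypothetical, icon generation (step 1) is left out, and PyInstaller is invoked through the current interpreter with --noconfirm so that an existing dist/ directory is overwritten without a prompt.

```python
import shutil
import subprocess
import sys
from pathlib import Path

SPEC = "whisper-dictation.spec"
DIST_DIR = Path("dist") / "whisper-dictation"
SHARED_FILES = ("config.json", "vocabulary.json")


def pyinstaller_command(spec: str = SPEC) -> list:
    """Command line for step 2; uses the active interpreter's PyInstaller."""
    return [sys.executable, "-m", "PyInstaller", "--noconfirm", spec]


def copy_shared_files(src_dir: Path, dist_dir: Path) -> list:
    """Step 3: place the shared JSON files next to the executable."""
    dist_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(src_dir / name, dist_dir / name))
            for name in SHARED_FILES]


def build(project_dir: Path = Path(".")) -> None:
    """Run steps 2 and 3; raises CalledProcessError if PyInstaller fails."""
    subprocess.run(pyinstaller_command(), cwd=project_dir, check=True)
    copy_shared_files(project_dir, project_dir / DIST_DIR)
```

Copying the files next to the executable matches the frozen branch of _app_dir(), which looks for config.json and vocabulary.json in the directory containing the binary.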

Platform requirement

PyInstaller cannot cross-compile. The build must be run separately on each target platform:

  • Windows: python build.py → dist/whisper-dictation/whisper-dictation.exe
  • Linux: python build.py → dist/whisper-dictation/whisper-dictation

New Dependencies

| Package | Purpose |
|---|---|
| pywin32 | .lnk shortcut creation (Windows only) |

Added to requirements-windows.txt. Not required on Linux.

Out of Scope

  • Code signing / notarization
  • Auto-updater
  • Versioning
  • Cross-compilation

Open Questions

(none — all resolved during design session)