Whisper Dictation — GUI App Design

Date: 2026-03-20
Status: Approved

Overview

Convert the existing dictate.py script into a proper packaged desktop application that:

  • Runs without a terminal window on Windows and Linux
  • Shows a compact log/status panel accessible from the tray icon
  • Can be integrated into the system (autostart, start menu, desktop shortcut)
  • Is distributed as a standalone binary via PyInstaller

Module Structure

whisper-dictation/
├── main.py                        # Entry point
├── build.py                       # PyInstaller build script (platform-specific)
├── whisper-dictation.spec         # PyInstaller spec (manual, for ctranslate2)
├── whisper_app/
│   ├── __init__.py
│   ├── app.py                     # AppState, central coordination, log queue
│   ├── audio.py                   # sounddevice stream, audio_callback
│   ├── transcriber.py             # WhisperModel, stop_and_transcribe()
│   ├── hotkey.py                  # HotkeyListener (unchanged)
│   ├── config.py                  # load/save config + vocab, path resolution
│   ├── typer.py                   # type_text(), cross-platform
│   ├── tray.py                    # pystray icon + menu
│   ├── log_window.py              # Compact log panel
│   ├── settings_window.py         # Settings dialog
│   ├── vocab_window.py            # Vocabulary dialog
│   ├── overlay.py                 # "Recording..." overlay
│   └── installer.py               # System integration (autostart, start menu, shortcut)
├── config.json                    # Shared config (git-tracked)
├── vocabulary.json                # Shared vocabulary (git-tracked)
├── start.sh                       # Dev start (Linux)
└── start.bat                      # Dev start (Windows)

Path Handling

All path resolution lives in config.py. It must work in two modes: as a frozen PyInstaller binary and in dev mode (running from the git checkout):

import sys, os

def _app_dir() -> str:
    """Root dir for config.json and vocabulary.json."""
    if getattr(sys, "frozen", False):
        # PyInstaller binary: use directory containing the executable
        return os.path.dirname(sys.executable)
    else:
        # Dev mode: use script directory (git repo root)
        # config.py lives at whisper_app/config.py → two levels up = repo root
        return os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

Machine-local config (config_local.json) continues to use %LOCALAPPDATA%\WhisperDictation (Windows) or ~/.local/share/WhisperDictation (Linux) — unchanged.
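
As a rough illustration of the machine-local location described above, the lookup could be sketched as follows. The helper names (local_config_dir, local_config_path) and the fallback used when LOCALAPPDATA is unset are assumptions made for this example, not part of the existing code.

```python
import os
import sys
from pathlib import Path

APP_NAME = "WhisperDictation"


def local_config_dir(platform: str = sys.platform, env=None) -> Path:
    """Directory holding config_local.json (hypothetical helper)."""
    env = os.environ if env is None else env
    if platform == "win32":
        # Assumption: fall back to the usual per-user path if the variable is unset.
        base = env.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
        return Path(base) / APP_NAME
    # Linux: fixed location named in the spec.
    return Path.home() / ".local" / "share" / APP_NAME


def local_config_path(platform: str = sys.platform, env=None) -> Path:
    """Full path of the machine-local config file."""
    return local_config_dir(platform, env) / "config_local.json"
```

Passing platform and env as parameters keeps the function testable on either operating system; real callers would use the defaults.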

Logging Architecture

app.py owns a queue.Queue[str] and a pre-queue buffer list.

import queue
import threading

_LOG_BUFFER_CAP = 500         # max entries held before the queue is ready
_log_buffer: list[str] = []   # pre-queue buffer, capped at _LOG_BUFFER_CAP
_log_queue: queue.Queue | None = None
_log_lock = threading.Lock()

def log(msg: str) -> None:
    with _log_lock:
        if _log_queue is not None:
            _log_queue.put(msg)
        else:
            if len(_log_buffer) >= _LOG_BUFFER_CAP:
                _log_buffer.pop(0)    # drop the oldest entry to enforce the cap
            _log_buffer.append(msg)

def set_log_queue(q: queue.Queue) -> None:
    global _log_queue
    with _log_lock:
        _log_queue = q
        buffered = list(_log_buffer)
        _log_buffer.clear()
    for msg in buffered:          # flush outside the lock to avoid deadlock
        q.put_nowait(msg)

All print() calls in all modules are replaced with app.log().

Log Panel (Compact)

Opened via a left-click on the tray icon or the "Anzeigen" (Show) menu item. Implemented in log_window.py.

┌─────────────────────────────────────┐
│ ● WHISPER DICTATION    medium·de  ✕ │
├─────────────────────────────────────┤
│ Model ready.                        │
│ Hotkey: ctrl+shift+space            │
│ ● Recording...                      │
│ Audio: 2.1s  RMS: 0.048             │
│ ✓ "Das ist ein Test"                │
├─────────────────────────────────────┤
│ ⚙ Einstellungen  📚 Vokabular  🗑  │
└─────────────────────────────────────┘
  • Size: ~380×220px, not resizable
  • tk.Text widget in read-only mode, max 200 lines (older lines discarded)
  • Auto-scroll to bottom on new messages
  • Color tags: green = ready/result, red = recording, yellow = transcribing, grey = info
  • Close button → withdraw() (does not quit the app)
  • 🗑 button → clears the text widget
  • Queue polled via root.after(100, _poll_log_queue)
  • The hidden tkinter root stays alive as long as the tray runs, so withdraw() on the log window does not end the mainloop. The app exits only via the tray "Beenden" (Quit) menu item.
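
The polling and trimming behaviour listed above could look roughly like the sketch below. The helper names (tag_for, drain, attach_log_poller), the tag names, the matching rules for the colour tags, and the per-tick limit of 50 messages are assumptions for illustration; only the 200-line cap and the 100 ms interval come from the spec.

```python
import queue

MAX_LINES = 200   # from the spec: older lines are discarded
POLL_MS = 100     # from the spec: polling interval


def tag_for(msg: str) -> str:
    """Pick a colour tag from the message text (matching rules are assumed)."""
    if msg.startswith("✓") or "ready" in msg.lower():
        return "ok"          # green
    if "Recording" in msg:
        return "recording"   # red
    if "Transcrib" in msg:
        return "busy"        # yellow
    return "info"            # grey


def drain(q: queue.Queue, limit: int = 50) -> list:
    """Pull up to `limit` pending messages without blocking the UI thread."""
    out = []
    while len(out) < limit:
        try:
            out.append(q.get_nowait())
        except queue.Empty:
            break
    return out


def attach_log_poller(root, text, q: queue.Queue) -> None:
    """Wire the poller to a tk.Text widget; `root` is the hidden Tk root."""
    def poll():
        for msg in drain(q):
            text.configure(state="normal")      # leave read-only mode briefly
            text.insert("end", msg + "\n", tag_for(msg))
            # Count the lines in the widget and drop the oldest beyond the cap.
            lines = int(text.index("end-1c").split(".")[0]) - 1
            if lines > MAX_LINES:
                text.delete("1.0", f"{lines - MAX_LINES + 1}.0")
            text.configure(state="disabled")
            text.see("end")                     # auto-scroll to the bottom
        root.after(POLL_MS, poll)
    poll()
```

Limiting how many messages are handled per tick keeps a burst of log output from stalling the UI thread; the remainder is picked up on the next poll.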

Settings Window — Installation Section

New section "INSTALLATION" added to the existing settings window. Implemented in installer.py.

Each integration shows its status ("eingerichtet" / "nicht eingerichtet", i.e. set up / not set up) and two buttons: "Einrichten" (Set up) and "Entfernen" (Remove).

| Feature | Windows | Linux |
|---|---|---|
| Autostart beim Login (autostart at login) | HKCU\Software\Microsoft\Windows\CurrentVersion\Run | ~/.config/autostart/whisper-dictation.desktop |
| Startmenü-Eintrag (start menu entry) | %APPDATA%\Microsoft\Windows\Start Menu\Programs\Whisper Dictation.lnk | ~/.local/share/applications/whisper-dictation.desktop |
| Desktop-Verknüpfung (desktop shortcut) | %USERPROFILE%\Desktop\Whisper Dictation.lnk | XDG: xdg-user-dir DESKTOP (fallback: ~/Desktop) |

Windows .lnk files are created via pywin32 (win32com.client.Dispatch("WScript.Shell")). The win32com import must be guarded by an if sys.platform == "win32" check. An unconditional top-level import is forbidden because it would raise an ImportError on Linux.

These integrations are only available when running as a frozen binary. In dev mode, the buttons are disabled with the tooltip "Nur im gebauten Binary verfügbar" (only available in the built binary).
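
A minimal sketch of how installer.py might handle the Linux autostart entry and the guarded Windows shortcut is given below. All function names, the target_dir parameter (added so the example can be exercised against a temporary directory), and the exact .desktop keys are assumptions for this example; the spec fixes only the file locations and the use of WScript.Shell.

```python
import sys
from pathlib import Path

DESKTOP_FILE = "whisper-dictation.desktop"


def is_frozen() -> bool:
    """True when running from a PyInstaller binary."""
    return bool(getattr(sys, "frozen", False))


def desktop_entry(exe_path: str) -> str:
    """Content of a minimal .desktop file (keys chosen for illustration)."""
    return (
        "[Desktop Entry]\n"
        "Type=Application\n"
        "Name=Whisper Dictation\n"
        f'Exec="{exe_path}"\n'
        "Terminal=false\n"
    )


def _autostart_file(target_dir=None) -> Path:
    base = Path.home() / ".config" / "autostart" if target_dir is None else Path(target_dir)
    return base / DESKTOP_FILE


def install_autostart(exe_path: str, target_dir=None) -> Path:
    """Write the autostart entry and return its path."""
    entry = _autostart_file(target_dir)
    entry.parent.mkdir(parents=True, exist_ok=True)
    entry.write_text(desktop_entry(exe_path), encoding="utf-8")
    return entry


def is_installed(target_dir=None) -> bool:
    """Backs the status label in the settings window."""
    return _autostart_file(target_dir).is_file()


def remove_autostart(target_dir=None) -> bool:
    """Delete the entry; returns False if there was nothing to remove."""
    entry = _autostart_file(target_dir)
    if entry.is_file():
        entry.unlink()
        return True
    return False


def create_windows_shortcut(lnk_path: str, exe_path: str, icon_path: str) -> None:
    """Create a .lnk via WScript.Shell; pywin32 is imported lazily."""
    if sys.platform != "win32":
        raise RuntimeError("shortcuts can only be created on Windows")
    import win32com.client  # guarded import, never at module top level
    shortcut = win32com.client.Dispatch("WScript.Shell").CreateShortcut(lnk_path)
    shortcut.TargetPath = exe_path
    shortcut.IconLocation = icon_path
    shortcut.Save()
```

In the frozen binary, exe_path would be sys.executable; the settings window could call is_frozen() to decide whether to enable the buttons.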

Icon

The existing tray icon (64×64 PIL Image) is extended:

  • At build time, build.py generates icon.ico (sizes: 16, 32, 48, 256) via Pillow
  • Used as the PyInstaller --icon and for .lnk shortcuts

PyInstaller Build

Manual .spec file (required for ctranslate2/faster-whisper)

# whisper-dictation.spec (key sections)
a = Analysis(
    ['main.py'],
    hiddenimports=[
        'ctranslate2',
        'faster_whisper',
        'sounddevice',
        'pynput.keyboard._win32',    # Windows
        'pynput.keyboard._xorg',     # Linux X11
        'pynput.keyboard._uinput',   # Linux Wayland
    ],
    datas=[],  # config.json / vocabulary.json NOT bundled — see below
)
pyz = PYZ(a.pure)
exe = EXE(pyz, a.scripts, ..., console=False, icon='icon.ico')
coll = COLLECT(exe, a.binaries, a.datas, name='whisper-dictation')
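
Because a .spec file is ordinary Python, the hidden-import list could optionally be narrowed to the platform being built. The sketch below shows one way to do that; the helper name hidden_imports and the idea of filtering per platform are assumptions for illustration, and the spec above, which lists all backends unconditionally, remains the reference.

```python
import sys

# Modules needed on every platform (taken from the spec above).
COMMON = ["ctranslate2", "faster_whisper", "sounddevice"]

# pynput keyboard backends, keyed by sys.platform prefix.
PYNPUT_BACKENDS = {
    "win32": ["pynput.keyboard._win32"],
    "linux": ["pynput.keyboard._xorg", "pynput.keyboard._uinput"],
}


def hidden_imports(platform: str = sys.platform) -> list:
    """Return the hiddenimports list for the given platform."""
    mods = list(COMMON)
    for prefix, backends in PYNPUT_BACKENDS.items():
        if platform.startswith(prefix):
            mods.extend(backends)
    return mods
```

Inside the .spec file this would be used as hiddenimports=hidden_imports().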

config.json / vocabulary.json are NOT bundled via datas. They are user-editable files that live next to the binary. build.py copies them from the repo root into dist/whisper-dictation/ after the PyInstaller run. This is the single authoritative location in frozen mode (os.path.dirname(sys.executable)).

build.py

  1. Generates icon.ico from PIL (must run before PyInstaller — .spec references it)
  2. Runs PyInstaller with the .spec file
  3. Copies config.json and vocabulary.json into dist/whisper-dictation/ only if they don't already exist there (to avoid overwriting user edits)
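
Steps 2 and 3 could be sketched as below; icon generation (step 1) is left out. The function names are hypothetical, and the sketch assumes PyInstaller is installed in the interpreter that runs build.py.

```python
import shutil
import subprocess
import sys
from pathlib import Path

SPEC = "whisper-dictation.spec"
USER_FILES = ("config.json", "vocabulary.json")


def run_pyinstaller(repo_root: Path) -> None:
    """Step 2: run PyInstaller on the manual spec file."""
    subprocess.run(
        [sys.executable, "-m", "PyInstaller", "--noconfirm", SPEC],
        cwd=repo_root,
        check=True,   # abort the build if PyInstaller fails
    )


def copy_if_missing(repo_root: Path, dist_dir: Path, names=USER_FILES) -> list:
    """Step 3: copy user-editable files, never overwriting existing ones."""
    copied = []
    dist_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        src, dst = repo_root / name, dist_dir / name
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)
            copied.append(name)
    return copied
```

One point to verify during implementation: with --noconfirm, PyInstaller replaces the existing output directory, so user edits made inside dist/whisper-dictation/ may not survive a rebuild unless they are backed up first.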

Platform requirement

PyInstaller cannot cross-compile. Build must be run separately on each platform:

  • Windows: python build.py → dist/whisper-dictation/whisper-dictation.exe
  • Linux: python build.py → dist/whisper-dictation/whisper-dictation

New Dependencies

| Package | Purpose |
|---|---|
| pywin32 | .lnk shortcut creation (Windows only) |

Added to requirements-windows.txt. Not required on Linux.

UI Language

All UI text (labels, buttons, section headers, tooltips) is in German. The only exception is "WHISPER DICTATION" in the log panel header, which is the product name and stays as-is.

Implementation Notes

  • Startup crash visibility: With console=False, crashes before the tray appears produce no visible output. The implementer should wrap main() in a try/except that writes the traceback to a log file (e.g., %LOCALAPPDATA%\WhisperDictation\error.log) as a last resort.
  • pynput hidden imports: Only keyboard backends are needed (_win32, _xorg, _uinput). No mouse backends required — hotkeys are keyboard-only.
  • _log_buffer cap: Enforce max 500 entries; if buffer exceeds cap, drop oldest entry before appending.
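
The startup-crash note above could be implemented along these lines. The name run_guarded, the return codes, and the timestamp header are assumptions for this sketch; the spec only asks for a try/except that writes to a log file.

```python
import traceback
from datetime import datetime
from pathlib import Path


def run_guarded(main, log_path: Path) -> int:
    """Run main(); on an unhandled exception, append the traceback to log_path."""
    try:
        main()
        return 0
    except Exception:
        try:
            log_path.parent.mkdir(parents=True, exist_ok=True)
            with log_path.open("a", encoding="utf-8") as fh:
                stamp = datetime.now().isoformat(timespec="seconds")
                fh.write(f"--- {stamp} ---\n")
                fh.write(traceback.format_exc())
        except OSError:
            pass  # without a console there is nowhere else to report
        return 1
```

main.py could then end with sys.exit(run_guarded(main, error_log_path)), where error_log_path points into the machine-local config directory.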

Out of Scope

  • Code signing / notarization
  • Auto-updater
  • Versioning
  • Cross-compilation

Open Questions

(none — all resolved during design session)