Whisper Dictation — GUI App Design
Date: 2026-03-20 Status: Approved
Overview
Convert the existing dictate.py script into a proper packaged desktop application that:
- Runs without a terminal window on Windows and Linux
- Shows a compact log/status panel accessible from the tray icon
- Can be integrated into the system (autostart, start menu, desktop shortcut)
- Is distributed as a standalone binary via PyInstaller
Module Structure
whisper-dictation/
├── main.py # Entry point
├── build.py # PyInstaller build script (platform-specific)
├── whisper-dictation.spec # PyInstaller spec (manual, for ctranslate2)
├── whisper_app/
│ ├── __init__.py
│ ├── app.py # AppState, central coordination, log queue
│ ├── audio.py # sounddevice stream, audio_callback
│ ├── transcriber.py # WhisperModel, stop_and_transcribe()
│ ├── hotkey.py # HotkeyListener (unchanged)
│ ├── config.py # load/save config + vocab, path resolution
│ ├── typer.py # type_text(), cross-platform
│ ├── tray.py # pystray icon + menu
│ ├── log_window.py # Compact log panel
│ ├── settings_window.py # Settings dialog
│ ├── vocab_window.py # Vocabulary dialog
│ ├── overlay.py # "Recording..." overlay
│ └── installer.py # System integration (autostart, start menu, shortcut)
├── config.json # Shared config (git-tracked)
├── vocabulary.json # Shared vocabulary (git-tracked)
├── start.sh # Dev start (Linux)
└── start.bat # Dev start (Windows)
Path Handling
All path resolution lives in config.py. It must work in two modes, as a frozen PyInstaller binary and as a dev checkout:
import os
import sys

def _app_dir() -> str:
    """Root dir for config.json and vocabulary.json."""
    if getattr(sys, "frozen", False):
        # PyInstaller binary: use the directory containing the executable
        return os.path.dirname(sys.executable)
    # Dev mode: config.py lives in whisper_app/, so the git repo root
    # is the parent of this file's directory
    return os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
Machine-local config (config_local.json) continues to use %LOCALAPPDATA%\WhisperDictation (Windows) or ~/.local/share/WhisperDictation (Linux) — unchanged.
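A minimal sketch of how the machine-local directory could be resolved. The helper names local_config_dir and local_config_path are hypothetical, and the fallback used when %LOCALAPPDATA% is unset is an assumption, not something this design specifies.

```python
import os
import sys
from pathlib import Path

APP_NAME = "WhisperDictation"


def local_config_dir(platform: str = sys.platform, env=None) -> Path:
    """Directory holding config_local.json (hypothetical helper)."""
    env = os.environ if env is None else env
    if platform.startswith("win"):
        # %LOCALAPPDATA%\WhisperDictation; the fallback is an assumption
        base = env.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
        return Path(base) / APP_NAME
    # Linux: ~/.local/share/WhisperDictation
    return Path.home() / ".local" / "share" / APP_NAME


def local_config_path(platform: str = sys.platform, env=None) -> Path:
    """Full path of the machine-local config file."""
    return local_config_dir(platform, env) / "config_local.json"
```

Passing platform and env as parameters keeps the function checkable on either OS without touching the real environment.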
Logging Architecture
app.py owns a queue.Queue[str] and a pre-queue buffer list.
import queue

_log_buffer: list[str] = []  # messages logged before the queue is ready
_log_queue: queue.Queue | None = None

def log(msg: str) -> None:
    if _log_queue is not None:
        _log_queue.put(msg)
    else:
        _log_buffer.append(msg)

def set_log_queue(q: queue.Queue) -> None:
    global _log_queue
    _log_queue = q
    # Replay everything that was logged before the queue existed
    for msg in _log_buffer:
        q.put(msg)
    _log_buffer.clear()
All print() calls in all modules are replaced with app.log().
Log Panel (Compact)
Opened via tray icon left-click or the "Anzeigen" ("Show") menu item. Implemented in log_window.py.
┌─────────────────────────────────────┐
│ ● WHISPER DICTATION medium·de ✕ │
├─────────────────────────────────────┤
│ Model ready. │
│ Hotkey: ctrl+shift+space │
│ ● Recording... │
│ Audio: 2.1s RMS: 0.048 │
│ ✓ "Das ist ein Test" │
├─────────────────────────────────────┤
│ ⚙ Einstellungen 📚 Vokabular 🗑 │
└─────────────────────────────────────┘
- Size: ~380×220px, not resizable
- tk.Text widget in read-only mode, max 200 lines (older lines discarded)
- Auto-scroll to bottom on new messages
- Color tags: green = ready/result, red = recording, yellow = transcribing, grey = info
- Close button → withdraw() (does not quit the app)
- 🗑 button → clears the text widget
- Queue polled via root.after(100, _poll_log_queue)
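The polling and trimming behaviour listed above could look roughly like the following. This is a sketch, not the contents of log_window.py: the function names, the tag names, and the string matching in tag_for are assumptions, and the widget is passed in so that no window is created on import.

```python
import queue

MAX_LINES = 200


def drain(q: queue.Queue) -> list[str]:
    """Take every pending message without blocking the Tk main loop."""
    lines = []
    while True:
        try:
            lines.append(q.get_nowait())
        except queue.Empty:
            return lines


def tag_for(msg: str) -> str:
    """Pick a color tag; these matching rules are guesses."""
    if msg.startswith("●"):
        return "recording"  # red
    if msg.startswith("✓") or "ready" in msg.lower():
        return "ok"  # green
    if "transcrib" in msg.lower():
        return "busy"  # yellow
    return "info"  # grey


def append_lines(text, lines: list[str]) -> None:
    """Append to a tk.Text widget, trim to MAX_LINES, scroll to the end."""
    if not lines:
        return
    text.configure(state="normal")  # read-only widgets reject inserts
    for msg in lines:
        text.insert("end", msg + "\n", tag_for(msg))
    # "end-1c" sits on the empty line after the last newline
    extra = int(text.index("end-1c").split(".")[0]) - 1 - MAX_LINES
    if extra > 0:
        text.delete("1.0", f"{extra + 1}.0")
    text.see("end")
    text.configure(state="disabled")


def poll_log_queue(root, text, q: queue.Queue) -> None:
    """Drain the queue, then re-arm the 100 ms timer."""
    append_lines(text, drain(q))
    root.after(100, poll_log_queue, root, text, q)
```

Draining the whole queue on each tick keeps the panel current even when several messages arrive within one 100 ms interval.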
Settings Window — Installation Section
New section "INSTALLATION" added to the existing settings window. Implemented in installer.py.
Each integration shows its status ("eingerichtet" / "nicht eingerichtet", i.e. set up / not set up) and two buttons: "Einrichten" (set up) and "Entfernen" (remove).
| Feature | Windows | Linux |
|---|---|---|
| Autostart at login | HKCU\Software\Microsoft\Windows\CurrentVersion\Run | ~/.config/autostart/whisper-dictation.desktop |
| Start menu entry | %APPDATA%\Microsoft\Windows\Start Menu\Programs\Whisper Dictation.lnk | ~/.local/share/applications/whisper-dictation.desktop |
| Desktop shortcut | %USERPROFILE%\Desktop\Whisper Dictation.lnk | ~/Desktop/whisper-dictation.desktop |
Windows .lnk files are created via pywin32 (win32com.client.Dispatch("WScript.Shell")).
System integration is only available when running as a frozen binary. In dev mode, the buttons are disabled with the tooltip "Nur im gebauten Binary verfügbar" ("Only available in the built binary").
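For the Linux column, each integration comes down to writing or deleting a .desktop file, and the status check is a file-existence test. A minimal sketch, assuming hypothetical function names and an executable path without spaces (the Desktop Entry format requires quoting in the Exec line otherwise):

```python
from pathlib import Path


def desktop_entry(exec_path: str, name: str = "Whisper Dictation") -> str:
    """Build a minimal Desktop Entry file for the frozen binary."""
    return (
        "[Desktop Entry]\n"
        "Type=Application\n"
        f"Name={name}\n"
        f"Exec={exec_path}\n"
        "Terminal=false\n"
    )


def autostart_file(home: Path) -> Path:
    """Location of the autostart entry from the table above."""
    return home / ".config" / "autostart" / "whisper-dictation.desktop"


def is_installed(target: Path) -> bool:
    """Status shown in the settings window: set up or not set up."""
    return target.is_file()


def install(target: Path, exec_path: str) -> None:
    """Write the entry, creating the parent directory if needed."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(desktop_entry(exec_path), encoding="utf-8")
    # Desktop launchers usually need the executable bit
    target.chmod(0o755)


def remove(target: Path) -> None:
    """Delete the entry; a missing file is not an error."""
    target.unlink(missing_ok=True)
```

The same install/remove pair would serve the start menu and desktop rows by passing a different target path. In the frozen binary, sys.executable would be the natural value for exec_path.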
Icon
The existing tray icon (64×64 PIL Image) is extended:
- At build time, build.py generates icon.ico (sizes: 16, 32, 48, 256) via Pillow
- Used as the PyInstaller --icon and for .lnk shortcuts
PyInstaller Build
Manual .spec file (required for ctranslate2/faster-whisper)
# whisper-dictation.spec (key sections)
a = Analysis(
    ['main.py'],
    hiddenimports=[
        'ctranslate2',
        'faster_whisper',
        'sounddevice',
        'pynput.keyboard._win32',  # Windows
        'pynput.keyboard._xorg',   # Linux
    ],
    datas=[
        ('config.json', '.'),
        ('vocabulary.json', '.'),
    ],
)
exe = EXE(a.pure, ..., console=False, icon='icon.ico')
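Because a .spec file is plain Python and each build runs on its target platform, the pynput backend could also be selected at build time instead of listing both. This is an optional variation, not part of the approved spec; the hidden_imports helper is hypothetical, and listing both backends as shown above may work just as well.

```python
import sys

# Packages needed on every platform, taken from the spec above
COMMON = ["ctranslate2", "faster_whisper", "sounddevice"]


def hidden_imports(platform: str = sys.platform) -> list[str]:
    """Return the hidden imports for the platform running the build."""
    if platform.startswith("win"):
        return COMMON + ["pynput.keyboard._win32"]
    if platform.startswith("linux"):
        return COMMON + ["pynput.keyboard._xorg"]
    return list(COMMON)
```

In the spec this would be used as hiddenimports=hidden_imports().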
build.py
- Generates icon.ico from PIL
- Runs PyInstaller with the .spec file
- Copies config.json and vocabulary.json into dist/whisper-dictation/
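The last two steps could be sketched as follows. The function names are hypothetical, icon generation is left out, and the --noconfirm flag (overwrite an existing dist/ without prompting) is an assumption about how the build should behave.

```python
import shutil
import subprocess
import sys
from pathlib import Path

SPEC = "whisper-dictation.spec"
SHARED_FILES = ("config.json", "vocabulary.json")


def pyinstaller_cmd(spec: str = SPEC) -> list[str]:
    """Command line that runs PyInstaller via the current interpreter."""
    return [sys.executable, "-m", "PyInstaller", "--noconfirm", spec]


def copy_shared_files(src: Path, dist: Path) -> list[Path]:
    """Copy the git-tracked JSON files next to the built executable."""
    dist.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(src / name, dist / name)) for name in SHARED_FILES]


def build(root: Path) -> None:
    """Run PyInstaller in the repo root, then copy the shared files."""
    subprocess.run(pyinstaller_cmd(), cwd=root, check=True)
    copy_shared_files(root, root / "dist" / "whisper-dictation")
```

Invoking PyInstaller through sys.executable keeps the build inside whichever virtual environment ran build.py. Copying the JSON files beside the executable matches _app_dir(), which looks in the executable's directory when frozen.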
Platform requirement
PyInstaller cannot cross-compile. Build must be run separately on each platform:
- Windows: python build.py → dist/whisper-dictation/whisper-dictation.exe
- Linux: python build.py → dist/whisper-dictation/whisper-dictation
New Dependencies
| Package | Purpose |
|---|---|
| pywin32 | .lnk shortcut creation (Windows only) |
Added to requirements-windows.txt. Not required on Linux.
Out of Scope
- Code signing / notarization
- Auto-updater
- Versioning
- Cross-compilation
Open Questions
(none — all resolved during design session)