Whisper Dictation — GUI App Design
Date: 2026-03-20 Status: Approved
Overview
Convert the existing dictate.py script into a proper packaged desktop application that:
- Runs without a terminal window on Windows and Linux
- Shows a compact log/status panel accessible from the tray icon
- Can be integrated into the system (autostart, start menu, desktop shortcut)
- Is distributed as a standalone binary via PyInstaller
Module Structure
whisper-dictation/
├── main.py # Entry point
├── build.py # PyInstaller build script (platform-specific)
├── whisper-dictation.spec # PyInstaller spec (manual, for ctranslate2)
├── whisper_app/
│ ├── __init__.py
│ ├── app.py # AppState, central coordination, log queue
│ ├── audio.py # sounddevice stream, audio_callback
│ ├── transcriber.py # WhisperModel, stop_and_transcribe()
│ ├── hotkey.py # HotkeyListener (unchanged)
│ ├── config.py # load/save config + vocab, path resolution
│ ├── typer.py # type_text(), cross-platform
│ ├── tray.py # pystray icon + menu
│ ├── log_window.py # Compact log panel
│ ├── settings_window.py # Settings dialog
│ ├── vocab_window.py # Vocabulary dialog
│ ├── overlay.py # "Recording..." overlay
│ └── installer.py # System integration (autostart, start menu, shortcut)
├── config.json # Shared config (git-tracked)
├── vocabulary.json # Shared vocabulary (git-tracked)
├── start.sh # Dev start (Linux)
└── start.bat # Dev start (Windows)
Path Handling
All path resolution lives in config.py. It must work in two modes: as a frozen PyInstaller binary and in dev mode, run directly from the git checkout.
import os
import sys


def _app_dir() -> str:
    """Root dir for config.json and vocabulary.json."""
    if getattr(sys, "frozen", False):
        # PyInstaller binary: use directory containing the executable
        return os.path.dirname(sys.executable)
    else:
        # Dev mode: use script directory (git repo root)
        # config.py lives at whisper_app/config.py → two levels up = repo root
        return os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
Machine-local config (config_local.json) continues to use %LOCALAPPDATA%\WhisperDictation (Windows) or ~/.local/share/WhisperDictation (Linux) — unchanged.
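The machine-local location described above could be resolved along the following lines. This is a minimal sketch, not the existing implementation: the names local_config_dir and local_config_path, the injectable platform and env parameters, and the fallback to the home directory when %LOCALAPPDATA% is unset are all assumptions made for illustration.

```python
import os
import sys
from pathlib import Path

APP_NAME = "WhisperDictation"  # directory name taken from the paths above


def local_config_dir(platform: str = sys.platform, env=None) -> Path:
    """Return the per-machine directory that holds config_local.json."""
    env = os.environ if env is None else env
    if platform == "win32":
        # %LOCALAPPDATA% is normally set on Windows.
        # Falling back to the home directory is an assumption of this sketch.
        base = env.get("LOCALAPPDATA") or str(Path.home())
        return Path(base) / APP_NAME
    # Linux: fixed location under the user's home directory.
    return Path.home() / ".local" / "share" / APP_NAME


def local_config_path() -> Path:
    """Full path of the machine-local config file (hypothetical helper)."""
    return local_config_dir() / "config_local.json"
```

Passing platform and env explicitly keeps the function testable on either operating system without patching globals.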
Logging Architecture
app.py owns a queue.Queue[str] and a pre-queue buffer list.
import queue
import threading

_LOG_BUFFER_CAP = 500
_log_buffer: list[str] = []  # holds messages until the queue is ready, capped at 500 entries
_log_queue: queue.Queue | None = None
_log_lock = threading.Lock()


def log(msg: str) -> None:
    with _log_lock:
        if _log_queue is not None:
            _log_queue.put(msg)
        else:
            if len(_log_buffer) >= _LOG_BUFFER_CAP:
                _log_buffer.pop(0)  # drop the oldest entry to enforce the cap
            _log_buffer.append(msg)


def set_log_queue(q: queue.Queue) -> None:
    global _log_queue
    with _log_lock:
        _log_queue = q
        buffered = list(_log_buffer)
        _log_buffer.clear()
    for msg in buffered:  # flush outside the lock to avoid deadlock
        q.put_nowait(msg)
All print() calls in all modules are replaced with app.log().
Log Panel (Compact)
Opened via tray icon left-click or "Anzeigen" menu item. Implemented in log_window.py.
┌─────────────────────────────────────┐
│ ● WHISPER DICTATION medium·de ✕ │
├─────────────────────────────────────┤
│ Model ready. │
│ Hotkey: ctrl+shift+space │
│ ● Recording... │
│ Audio: 2.1s RMS: 0.048 │
│ ✓ "Das ist ein Test" │
├─────────────────────────────────────┤
│ ⚙ Einstellungen 📚 Vokabular 🗑 │
└─────────────────────────────────────┘
- Size: ~380×220px, not resizable
- tk.Text widget in read-only mode, max 200 lines (older lines discarded)
- Auto-scroll to bottom on new messages
- Color tags: green = ready/result, red = recording, yellow = transcribing, grey = info
- Close button → withdraw() (does not quit the app)
- 🗑 button → clears the text widget
- Queue polled via root.after(100, _poll_log_queue)
- The tkinter root (hidden) is always alive as long as the tray runs; withdraw() on the log window does not trigger mainloop exit. The app exits only via the tray "Beenden" menu item.
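The polling and 200-line cap listed above might be sketched as follows. The code deliberately avoids importing tkinter so it can run headless: root is any object with a tkinter-style after(ms, func, *args) method, and render is a hypothetical callback that would redraw the tk.Text widget. The function names and the deque-based line store are assumptions, not part of the design.

```python
import queue
from collections import deque

MAX_LINES = 200         # cap stated for the log panel
POLL_INTERVAL_MS = 100  # interval passed to root.after


def drain(q: queue.Queue) -> list:
    """Remove and return every message currently waiting, without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def poll_log_queue(root, q: queue.Queue, lines: deque, render) -> None:
    """Drain the queue, keep the newest lines, then reschedule itself."""
    new = drain(q)
    if new:
        lines.extend(new)    # a deque created with maxlen discards the oldest entries
        render(list(lines))  # hypothetical callback that redraws the widget
    root.after(POLL_INTERVAL_MS, poll_log_queue, root, q, lines, render)
```

Creating the store as deque(maxlen=MAX_LINES) keeps the trimming out of the UI code; the widget only ever receives the lines that should be visible.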
Settings Window — Installation Section
New section "INSTALLATION" added to the existing settings window. Implemented in installer.py.
Each integration shows status ("eingerichtet" / "nicht eingerichtet") and two buttons: "Einrichten" / "Entfernen".
| Feature | Windows | Linux |
|---|---|---|
| Autostart beim Login | HKCU\Software\Microsoft\Windows\CurrentVersion\Run | ~/.config/autostart/whisper-dictation.desktop |
| Startmenü-Eintrag | %APPDATA%\Microsoft\Windows\Start Menu\Programs\Whisper Dictation.lnk | ~/.local/share/applications/whisper-dictation.desktop |
| Desktop-Verknüpfung | %USERPROFILE%\Desktop\Whisper Dictation.lnk | XDG: xdg-user-dir DESKTOP (fallback: ~/Desktop) |
Windows .lnk files are created via pywin32 (win32com.client.Dispatch("WScript.Shell")).
The win32com import must be guarded with if sys.platform == "win32". An unguarded top-level import is forbidden because it would raise an import error on Linux.
The integrations are only available when running as a frozen binary. In dev mode, the buttons are disabled with a tooltip "Nur im gebauten Binary verfügbar".
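For the Linux autostart row, the set up / remove / status cycle could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the target directory is passed in rather than hard-coded to ~/.config/autostart, and the .desktop file contains only a minimal set of keys. Paths containing double quotes, backslashes, or dollar signs would need extra escaping in the Exec line, which is not handled here.

```python
from pathlib import Path

DESKTOP_FILE = "whisper-dictation.desktop"  # file name from the table above


def desktop_entry(exec_path: str) -> str:
    """Build a minimal desktop entry that launches the frozen binary."""
    return (
        "[Desktop Entry]\n"
        "Type=Application\n"
        "Name=Whisper Dictation\n"
        f'Exec="{exec_path}"\n'  # quoted so that paths with spaces work
        "Terminal=false\n"
    )


def install_autostart(exec_path: str, target_dir: Path) -> Path:
    """Write the .desktop file, creating the directory if needed."""
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / DESKTOP_FILE
    path.write_text(desktop_entry(exec_path), encoding="utf-8")
    return path


def remove_autostart(target_dir: Path) -> None:
    """Delete the .desktop file; a missing file is not an error (Python 3.8+)."""
    (target_dir / DESKTOP_FILE).unlink(missing_ok=True)


def autostart_status(target_dir: Path) -> str:
    """Return the status label shown next to the buttons."""
    exists = (target_dir / DESKTOP_FILE).exists()
    return "eingerichtet" if exists else "nicht eingerichtet"
```

In the frozen binary, exec_path would be sys.executable and target_dir would be Path.home() / ".config" / "autostart".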
Icon
The existing tray icon (64×64 PIL Image) is extended:
- At build time, build.py generates icon.ico (sizes: 16, 32, 48, 256) via Pillow
- Used as the PyInstaller --icon and for .lnk shortcuts
PyInstaller Build
Manual .spec file (required for ctranslate2/faster-whisper)
# whisper-dictation.spec (key sections)
a = Analysis(
    ['main.py'],
    hiddenimports=[
        'ctranslate2',
        'faster_whisper',
        'sounddevice',
        'pynput.keyboard._win32',   # Windows
        'pynput.keyboard._xorg',    # Linux X11
        'pynput.keyboard._uinput',  # Linux Wayland
    ],
    datas=[],  # config.json / vocabulary.json NOT bundled — see below
)
pyz = PYZ(a.pure)
exe = EXE(pyz, a.scripts, ..., console=False, icon='icon.ico')
coll = COLLECT(exe, a.binaries, a.datas, name='whisper-dictation')
config.json / vocabulary.json are NOT bundled via datas.
They are user-editable files that live next to the binary. build.py copies them from the repo root into dist/whisper-dictation/ after the PyInstaller run. This is the single authoritative location in frozen mode (os.path.dirname(sys.executable)).
build.py
- Generates icon.ico from PIL (must run before PyInstaller, because the .spec references it)
- Runs PyInstaller with the .spec file
- Copies config.json and vocabulary.json into dist/whisper-dictation/ only if they don't already exist there (to avoid overwriting user edits)
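The last step, copying only when the target is missing, can be sketched as below. The function name copy_user_files and its parameters are hypothetical; only the two file names and the "do not overwrite" rule come from the steps above.

```python
import shutil
from pathlib import Path

USER_FILES = ("config.json", "vocabulary.json")


def copy_user_files(repo_root: Path, dist_dir: Path) -> list:
    """Copy user-editable files into dist_dir, skipping any that already exist."""
    dist_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in USER_FILES:
        target = dist_dir / name
        if target.exists():
            continue  # keep the copy the user may have edited
        shutil.copy2(repo_root / name, target)
        copied.append(name)
    return copied
```

Returning the list of copied names lets build.py report what it did, for example copy_user_files(Path("."), Path("dist") / "whisper-dictation").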
Platform requirement
PyInstaller cannot cross-compile. Build must be run separately on each platform:
- Windows: python build.py → dist/whisper-dictation/whisper-dictation.exe
- Linux: python build.py → dist/whisper-dictation/whisper-dictation
New Dependencies
| Package | Purpose |
|---|---|
| pywin32 | .lnk shortcut creation (Windows only) |
Added to requirements-windows.txt. Not required on Linux.
UI Language
All UI text (labels, buttons, section headers, tooltips) is in German. The one exception is "WHISPER DICTATION" in the log panel header, which is the product name and stays as-is.
Implementation Notes
- Startup crash visibility: With console=False, crashes before the tray appears produce no visible output. Implementer should wrap main() in a try/except that writes to a logfile (e.g., %LOCALAPPDATA%\WhisperDictation\error.log) as a last resort.
- pynput hidden imports: Only keyboard backends are needed (_win32, _xorg, _uinput). No mouse backends are required, since hotkeys are keyboard-only.
- _log_buffer cap: Enforce max 500 entries; if buffer exceeds cap, drop oldest entry before appending.
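The startup crash note could be implemented along these lines. This is a sketch only: run_with_crash_log is a hypothetical wrapper, the timestamp header is an assumption, and the log path is passed in by the caller rather than derived from %LOCALAPPDATA%.

```python
import traceback
from datetime import datetime
from pathlib import Path


def run_with_crash_log(main, log_path: Path) -> int:
    """Run main(); on an unhandled exception append the traceback to log_path."""
    try:
        main()
        return 0
    except Exception:
        # With console=False there is no stderr to read, so persist the traceback.
        log_path.parent.mkdir(parents=True, exist_ok=True)
        with log_path.open("a", encoding="utf-8") as fh:
            fh.write(f"--- {datetime.now().isoformat(timespec='seconds')} ---\n")
            fh.write(traceback.format_exc())
        return 1
```

In main.py the entry point would then be something like sys.exit(run_with_crash_log(main, error_log_path)), where error_log_path points at the error.log location named above.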
Out of Scope
- Code signing / notarization
- Auto-updater
- Versioning
- Cross-compilation
Open Questions
(none — all resolved during design session)