docs: add GUI app design spec
parent 2464081f4c
commit ebfd751e8d

@@ -0,0 +1,184 @@

# Whisper Dictation — GUI App Design

**Date:** 2026-03-20
**Status:** Approved

## Overview

Convert the existing `dictate.py` script into a proper packaged desktop application that:
- Runs without a terminal window on Windows and Linux
- Shows a compact log/status panel accessible from the tray icon
- Can be integrated into the system (autostart, start menu, desktop shortcut)
- Is distributed as a standalone binary via PyInstaller

## Module Structure

```
whisper-dictation/
├── main.py                  # Entry point
├── build.py                 # PyInstaller build script (platform-specific)
├── whisper-dictation.spec   # PyInstaller spec (manual, for ctranslate2)
├── whisper_app/
│   ├── __init__.py
│   ├── app.py               # AppState, central coordination, log queue
│   ├── audio.py             # sounddevice stream, audio_callback
│   ├── transcriber.py       # WhisperModel, stop_and_transcribe()
│   ├── hotkey.py            # HotkeyListener (unchanged)
│   ├── config.py            # load/save config + vocab, path resolution
│   ├── typer.py             # type_text(), cross-platform
│   ├── tray.py              # pystray icon + menu
│   ├── log_window.py        # Compact log panel
│   ├── settings_window.py   # Settings dialog
│   ├── vocab_window.py      # Vocabulary dialog
│   ├── overlay.py           # "Recording..." overlay
│   └── installer.py         # System integration (autostart, start menu, shortcut)
├── config.json              # Shared config (git-tracked)
├── vocabulary.json          # Shared vocabulary (git-tracked)
├── start.sh                 # Dev start (Linux)
└── start.bat                # Dev start (Windows)
```

## Path Handling

All path resolution lives in `config.py`. It must work in two modes: as a frozen PyInstaller binary and in dev mode from the git checkout.

```python
import os
import sys


def _app_dir() -> str:
    """Root dir for config.json and vocabulary.json."""
    if getattr(sys, "frozen", False):
        # PyInstaller binary: use the directory containing the executable
        return os.path.dirname(sys.executable)
    # Dev mode: config.py lives in whisper_app/, so the git repo root
    # is two directory levels above this file
    return os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
```

Machine-local config (`config_local.json`) continues to use `%LOCALAPPDATA%\WhisperDictation` (Windows) or `~/.local/share/WhisperDictation` (Linux); this is unchanged.

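The machine-local location described above could be resolved along these lines. This is a minimal sketch, not the project's actual code: the helper names `local_config_dir` and `local_config_path`, the `platform`/`env` parameters (added only to make the function easy to exercise), and the fallback to the home directory when `LOCALAPPDATA` is unset are all assumptions.

```python
import os
import sys
from pathlib import Path

APP_NAME = "WhisperDictation"


def local_config_dir(platform: str = sys.platform, env=None) -> Path:
    """Directory holding config_local.json (hypothetical helper name)."""
    env = os.environ if env is None else env
    if platform.startswith("win"):
        # Assumption: fall back to the home directory if LOCALAPPDATA is unset
        base = env.get("LOCALAPPDATA") or str(Path.home())
        return Path(base) / APP_NAME
    # Linux path as stated in the spec
    return Path.home() / ".local" / "share" / APP_NAME


def local_config_path() -> Path:
    """Full path of the machine-local config file."""
    return local_config_dir() / "config_local.json"
```

Keeping this next to `_app_dir()` in `config.py` would leave all path decisions in one module, which matches the intent stated at the top of this section.
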
## Logging Architecture

`app.py` owns a `queue.Queue[str]` and a buffer list that holds messages logged before the queue is ready.

```python
import queue

_log_buffer: list[str] = []                 # messages logged before the queue is ready
_log_queue: queue.Queue[str] | None = None


def log(msg: str) -> None:
    """Queue a message for the log panel, or buffer it until the queue exists."""
    if _log_queue is not None:
        _log_queue.put(msg)
    else:
        _log_buffer.append(msg)


def set_log_queue(q: queue.Queue[str]) -> None:
    """Attach the queue and flush buffered messages in their original order."""
    global _log_queue
    _log_queue = q
    for msg in _log_buffer:
        q.put(msg)
    _log_buffer.clear()
```

All `print()` calls in all modules are replaced with `app.log()`.

## Log Panel (Compact)

Opened via a left-click on the tray icon or the "Anzeigen" ("Show") menu item. Implemented in `log_window.py`.

```
┌─────────────────────────────────────┐
│ ● WHISPER DICTATION    medium·de  ✕ │
├─────────────────────────────────────┤
│ Model ready.                        │
│ Hotkey: ctrl+shift+space            │
│ ● Recording...                      │
│ Audio: 2.1s RMS: 0.048              │
│ ✓ "Das ist ein Test"                │
├─────────────────────────────────────┤
│ ⚙ Einstellungen   📚 Vokabular   🗑  │
└─────────────────────────────────────┘
```

- Size: ~380×220px, **not resizable**
- `tk.Text` widget in read-only mode, max 200 lines (older lines discarded)
- Auto-scroll to bottom on new messages
- Color tags: green = ready/result, red = recording, yellow = transcribing, grey = info
- Close button → `withdraw()` (does not quit the app)
- 🗑 button → clears the text widget
- Queue polled via `root.after(100, _poll_log_queue)`

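The polling behaviour listed above (non-blocking drain, 200-line cap, auto-scroll, 100 ms reschedule) could look roughly like this. It is a sketch under assumptions: the names `drain_log_queue` and `poll_log_queue` are illustrative, `root` is assumed to be the Tk root and `text` the `tk.Text` widget, and color tags are left out.

```python
import queue
from collections import deque

MAX_LINES = 200  # from the spec: older lines are discarded


def drain_log_queue(q, lines) -> int:
    """Move all pending messages from q into lines without blocking."""
    moved = 0
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return moved
        lines.append(msg)
        moved += 1


def poll_log_queue(root, text, q) -> None:
    """Tk-side poller: append pending messages, trim, scroll, reschedule."""
    pending = deque()
    if drain_log_queue(q, pending):
        text.configure(state="normal")       # temporarily leave read-only mode
        for msg in pending:
            text.insert("end", msg + "\n")
        # "end-1c" sits on the empty line after the last message,
        # so its line number minus one is the number of stored lines
        line_count = int(text.index("end-1c").split(".")[0]) - 1
        extra = line_count - MAX_LINES
        if extra > 0:
            text.delete("1.0", f"{extra + 1}.0")  # drop the oldest lines
        text.see("end")                      # auto-scroll to the bottom
        text.configure(state="disabled")
    root.after(100, poll_log_queue, root, text, q)
```

Draining with `get_nowait()` keeps the Tk main loop responsive, since the poller never waits on worker threads that call `app.log()`.
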
## Settings Window — Installation Section

A new "INSTALLATION" section is added to the existing settings window. The integration logic is implemented in `installer.py`.

Each integration shows its status ("eingerichtet" / "nicht eingerichtet", i.e. "set up" / "not set up") and two buttons: "Einrichten" ("Set up") and "Entfernen" ("Remove").

| Feature | Windows | Linux |
|---|---|---|
| Autostart beim Login (autostart at login) | `HKCU\Software\Microsoft\Windows\CurrentVersion\Run` | `~/.config/autostart/whisper-dictation.desktop` |
| Startmenü-Eintrag (start menu entry) | `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Whisper Dictation.lnk` | `~/.local/share/applications/whisper-dictation.desktop` |
| Desktop-Verknüpfung (desktop shortcut) | `%USERPROFILE%\Desktop\Whisper Dictation.lnk` | `~/Desktop/whisper-dictation.desktop` |

Windows `.lnk` files are created via `pywin32` (`win32com.client.Dispatch("WScript.Shell")`).

**These integrations are only available when running as a frozen binary.** In dev mode, the buttons are disabled and show the tooltip "Nur im gebauten Binary verfügbar" ("Only available in the built binary").

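For the Linux column of the table, setting up, removing, and checking an integration comes down to writing or deleting a `.desktop` file. The following is a minimal sketch for the autostart entry only. The function names are hypothetical, the set of keys in the entry is an assumption, and the `Exec` quoting is simplified (paths containing `"`, `` ` ``, `$` or `\` would need escaping).

```python
from pathlib import Path

DESKTOP_FILE = "whisper-dictation.desktop"


def autostart_dir() -> Path:
    """Per-user autostart directory named in the table above."""
    return Path.home() / ".config" / "autostart"


def desktop_entry(exe_path: str) -> str:
    """Minimal Desktop Entry text; Exec is quoted in case the path has spaces."""
    return (
        "[Desktop Entry]\n"
        "Type=Application\n"
        "Name=Whisper Dictation\n"
        f'Exec="{exe_path}"\n'
        "Terminal=false\n"
    )


def install_autostart(exe_path: str, target_dir: Path) -> Path:
    """'Einrichten': write the .desktop file and return its path."""
    target_dir.mkdir(parents=True, exist_ok=True)
    entry = target_dir / DESKTOP_FILE
    entry.write_text(desktop_entry(exe_path), encoding="utf-8")
    return entry


def remove_autostart(target_dir: Path) -> None:
    """'Entfernen': delete the .desktop file if it exists."""
    (target_dir / DESKTOP_FILE).unlink(missing_ok=True)


def autostart_status(target_dir: Path) -> str:
    """Status label shown next to the buttons."""
    present = (target_dir / DESKTOP_FILE).exists()
    return "eingerichtet" if present else "nicht eingerichtet"
```

In the frozen binary, `exe_path` would presumably be `sys.executable`, which is one reason the feature is disabled in dev mode. The start menu and desktop entries could reuse `install_autostart` with a different target directory.
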
## Icon

The existing tray icon (a 64×64 PIL `Image`) is reused as the application icon:
- At build time, `build.py` generates `icon.ico` (sizes: 16, 32, 48, 256) via Pillow
- `icon.ico` is used as the PyInstaller `--icon` and for `.lnk` shortcuts

## PyInstaller Build

### Manual `.spec` file (required for ctranslate2/faster-whisper)

```python
# whisper-dictation.spec (key sections)
a = Analysis(
    ['main.py'],
    hiddenimports=[
        'ctranslate2',
        'faster_whisper',
        'sounddevice',
        'pynput.keyboard._win32',  # Windows
        'pynput.keyboard._xorg',   # Linux
    ],
    datas=[
        ('config.json', '.'),
        ('vocabulary.json', '.'),
    ],
)
exe = EXE(a.pure, ..., console=False, icon='icon.ico')
```

### `build.py`

1. Generates `icon.ico` from the tray icon image using Pillow
2. Runs PyInstaller with the `.spec` file
3. Copies `config.json` and `vocabulary.json` into `dist/whisper-dictation/`

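Steps 2 and 3 could be sketched as follows. This is an illustration rather than the actual `build.py`: the function names are hypothetical, the icon generation from step 1 is left out, and invoking PyInstaller as `python -m PyInstaller --noconfirm` is one assumed way of running it.

```python
import shutil
import subprocess
import sys
from pathlib import Path

SPEC_FILE = "whisper-dictation.spec"
SHARED_FILES = ("config.json", "vocabulary.json")


def pyinstaller_command(spec_file: str = SPEC_FILE) -> list[str]:
    """Command line for step 2, run through the current interpreter."""
    # --noconfirm overwrites an existing dist/ output without prompting
    return [sys.executable, "-m", "PyInstaller", "--noconfirm", spec_file]


def copy_shared_files(repo_root: Path, dist_dir: Path) -> list[Path]:
    """Step 3: place the shared JSON files next to the built executable."""
    dist_dir.mkdir(parents=True, exist_ok=True)
    return [
        Path(shutil.copy2(repo_root / name, dist_dir / name))
        for name in SHARED_FILES
    ]


def build(repo_root: Path) -> None:
    """Run steps 2 and 3; step 1 (icon.ico via Pillow) is omitted here."""
    subprocess.run(pyinstaller_command(), cwd=repo_root, check=True)
    copy_shared_files(repo_root, repo_root / "dist" / "whisper-dictation")
```

Calling `build(Path(__file__).resolve().parent)` from the script's entry point would run the build from the repository root. Copying the JSON files next to the executable matches `_app_dir()`, which looks in the directory containing the frozen binary.
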
### Platform requirement

PyInstaller cannot cross-compile. **Build must be run separately on each platform:**
- Windows: `python build.py` → `dist/whisper-dictation/whisper-dictation.exe`
- Linux: `python build.py` → `dist/whisper-dictation/whisper-dictation`

## New Dependencies

| Package | Purpose |
|---|---|
| `pywin32` | `.lnk` shortcut creation (Windows only) |

`pywin32` is added to `requirements-windows.txt`. It is not required on Linux.

## Out of Scope

- Code signing / notarization
- Auto-updater
- Versioning
- Cross-compilation

## Open Questions

_(none; all resolved during design session)_