# Full-Text Search

## Overview

Ka-Note implements a hybrid full-text search strategy. Small in-memory corpora (contexts, page titles) are filtered client-side. The large corpus (history entry text, page bodies) is indexed server-side with SQLite FTS5 and queried over HTTP.

## Architecture

### Search tiers

| Entity | Where | Method |
|---|---|---|
| Contexts (name) | Client only | Substring match on in-memory Svelte store |
| Pages (title) | Client only | Substring match on in-memory Svelte store |
| HistoryEntries (text) | Server FTS5 | Debounced HTTP `GET /api/search` |
| Pages (body) | Server FTS5 | Debounced HTTP `GET /api/search` |

History entries are the primary scaling concern (years of daily journals → tens of thousands of rows). SQLite FTS5 with BM25 ranking handles this efficiently without additional infrastructure.

### Offline fallback

When the server is unreachable, CommandBar falls back to local results only (contexts, page titles) and shows the notice "Server nicht erreichbar — nur lokale Ergebnisse" ("Server unreachable, local results only").

---

## Server

### FTS5 tables

Migration: `server/drizzle/0013_fts_search.sql`

Two virtual tables use the `unicode61` tokenizer. It treats German umlauts as letters and, with its default `remove_diacritics` setting, folds them to the base letter (so `ä` also matches `a`). It does no stemming.

- `fts_history`: external content table backed by `history_entries` (column: `text`)
- `fts_pages`: external content table backed by `pages` (columns: `title`, `body`)

Both tables are populated on the first migration run via `INSERT INTO fts_*(fts_*) VALUES('rebuild')`.
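For illustration, the DDL behind these two tables could be generated as shown below. This is a sketch under assumptions, not the contents of `0013_fts_search.sql`. It assumes both content tables are keyed by SQLite's implicit integer `rowid` (FTS5's default `content_rowid`), and the helper name `ftsMigrationStatements` is hypothetical. Keeping the tokenizer as a parameter turns the `porter unicode61` option from the Scaling notes into a one-argument change.

```typescript
// Hypothetical helper (not from the Ka-Note codebase): builds the FTS5 DDL
// for the two tables described above as plain SQL strings.
// Assumption: each content table is keyed by SQLite's implicit integer rowid,
// which is FTS5's default content_rowid, so it is not spelled out below.

interface FtsTableSpec {
  name: string;         // FTS5 virtual table name, e.g. "fts_history"
  contentTable: string; // backing table whose rows are indexed
  columns: string[];    // indexed columns; names must match the content table
}

const FTS_TABLES: FtsTableSpec[] = [
  { name: "fts_history", contentTable: "history_entries", columns: ["text"] },
  { name: "fts_pages", contentTable: "pages", columns: ["title", "body"] },
];

function ftsMigrationStatements(
  tables: FtsTableSpec[] = FTS_TABLES,
  tokenizer: string = "unicode61",
): string[] {
  const statements: string[] = [];
  for (const t of tables) {
    // External content table: FTS5 stores only the index and reads column
    // values back from the content table when it needs them.
    statements.push(
      `CREATE VIRTUAL TABLE ${t.name} USING fts5(` +
        `${t.columns.join(", ")}, ` +
        `content='${t.contentTable}', ` +
        `tokenize='${tokenizer}');`,
    );
  }
  for (const t of tables) {
    // The special 'rebuild' command repopulates the index from the content table.
    statements.push(`INSERT INTO ${t.name}(${t.name}) VALUES('rebuild');`);
  }
  return statements;
}
```

With `better-sqlite3`, each returned string can be executed through `Database#exec`, e.g. `ftsMigrationStatements().forEach((sql) => sqlite.exec(sql))`.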
### Index maintenance

The FTS index is updated synchronously after every write. All server-side write paths are covered:

| Write path | File | FTS update |
|---|---|---|
| Sync push (primary client sync) | `sync-service.ts` → `pushChanges()` | After each upsert |
| Trash / soft-delete | `routes/trash.ts` | After batch update |
| AI bundle upload (ZIP) | `ai-export-service.ts` → `applyOps()` | After each op |
| AI legacy JSON upload | `ai-export-service.ts` → `applyOps()` | After each op |
| Startup drift recovery | `index.ts` (via `setImmediate`) | Full rebuild if the row-count mismatch exceeds 10 |

All paths use `better-sqlite3` prepared statements. The shared helper `applyOps()` in `ai-export-service.ts` handles both upload variants. Soft-deleted rows are removed from FTS; active rows are re-indexed via `INSERT OR REPLACE … SELECT`.

**Startup consistency check:** On each server start, the row counts of `history_entries` (non-deleted) and `fts_history` are compared. If the difference exceeds 10, both FTS tables are rebuilt via `INSERT INTO fts_*(fts_*) VALUES('rebuild')`. This guards against index drift after DB restores or backup imports.

### Raw SQLite access

File: `server/src/db/connection.ts`

The `better-sqlite3` instance is exported as `sqlite` alongside the Drizzle `db`. It is needed for the FTS prepared statements because Drizzle has no FTS5 DSL.

### Search endpoint

```
GET /api/search?q=<query>&limit=<n>
Authorization: Bearer <token>
```

Response:

```json
{
  "history": [
    { "id": "...", "topicId": "...", "date": "2025-01-15", "snippet": "...text..." }
  ],
  "pages": [
    { "id": "...", "title": "Page Title", "snippet": "...body text..." }
  ]
}
```

- `q` must be at least 2 characters; shorter queries return empty results.
- `limit` is capped at 20 server-side.
- Each word in `q` is wrapped in double quotes and suffixed with `*` for prefix matching (`"term"*`).
- Results are ranked by BM25 (`ORDER BY rank`).
- FTS5 query errors (e.g. invalid syntax caused by special characters) return empty results instead of HTTP 500.
- Soft-deleted entries are excluded because they are removed from the FTS index when they are soft-deleted.

File: `server/src/routes/search.ts`

---

## Client

### Settings store

File: `client/src/lib/stores/settings.ts`

Generic key-value settings backed by a Dexie `settings` table (Dexie schema version 13). Provides:

- `getSetting(key, default)`: async one-time read
- `setSetting(key, value)`: async write
- `settingStore(key, default)`: reactive Svelte store backed by `liveQuery`

The `searchResultsLimit` store (default: 3) controls how many server results are requested.

### CommandBar integration

File: `client/src/lib/components/CommandBar.svelte`

In navigate mode (query ≥ 2 chars, not starting with `/`):

1. **Immediately (sync):** Filters `$contextsQuery` and `$pagesQuery` by substring match on name/title.
2. **After a 250 ms debounce:** Calls `authFetch('/api/search?q=...&limit=...')` from the existing `apiClient` helper.
3. **On success:** Server results are appended after the local results. Pages already found by title match are deduplicated.
4. **On error:** `isOffline` is set to `true`, a footer notice is shown, and local results remain visible.
5. **Total results** are capped at 10.

History results deep-link to `/context/daily-log?date=YYYY-MM-DD`.

---

## Settings

| Key | Type | Default | Description |
|---|---|---|---|
| `searchResultsLimit` | number | 3 | Max server search results per entity type |

To change it, write to Dexie via `setSetting('searchResultsLimit', 5)` or add a field to the Settings UI.

---

## Scaling notes

- FTS5 + BM25 scales to millions of rows; no action is needed as data grows.
- The `unicode61` tokenizer handles Unicode text. Stemming can be added later by recreating the FTS tables with `tokenize='porter unicode61'` instead of `tokenize='unicode61'` and rebuilding them. This needs a new migration, because the tokenizer of an existing FTS5 table cannot be changed in place. Note that FTS5's built-in `porter` tokenizer implements the English Porter stemmer, so it will not stem German words properly.
- If topic title search needs FTS in the future, add `fts_topics` following the same pattern.
- Offline full-text search for history (e.g. via MiniSearch in a Web Worker) is a possible v2 enhancement.
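The query handling from the Search endpoint and CommandBar integration sections can be sketched in plain TypeScript. This is an illustration under assumptions, not code from `search.ts` or `CommandBar.svelte`. The helper names (`buildFtsQuery`, `clampLimit`, `mergeResults`), the shape of local result items, trimming the input before the length check, and the order in which server pages and history entries are appended are all hypothetical.

```typescript
// Hypothetical sketch of the search query handling; names and shapes are assumptions.

const MIN_QUERY_LENGTH = 2;   // "q must be at least 2 characters"
const MAX_SERVER_LIMIT = 20;  // server-side cap on `limit`
const MAX_TOTAL_RESULTS = 10; // CommandBar cap on total results

// Server side: turn raw input into an FTS5 MATCH expression. Each
// whitespace-separated word becomes a quoted prefix term ("term"*); embedded
// double quotes are escaped by doubling them (FTS5 string-literal rule).
// Space-separated terms are implicitly ANDed by FTS5. Returns null for
// queries that are too short to search.
function buildFtsQuery(raw: string): string | null {
  const trimmed = raw.trim(); // assumption: whitespace does not count toward the minimum
  if (trimmed.length < MIN_QUERY_LENGTH) return null;
  return trimmed
    .split(/\s+/)
    .map((word) => `"${word.replace(/"/g, '""')}"*`)
    .join(" ");
}

// Server side: clamp the requested limit to the range 1..MAX_SERVER_LIMIT.
function clampLimit(requested: number): number {
  if (!Number.isFinite(requested) || requested < 1) return 1;
  return Math.min(Math.floor(requested), MAX_SERVER_LIMIT);
}

interface LocalResult {
  kind: "context" | "page";
  id: string;
  label: string;
}

interface ServerResults {
  history: { id: string; topicId: string; date: string; snippet: string }[];
  pages: { id: string; title: string; snippet: string }[];
}

interface CommandBarItem {
  kind: "context" | "page" | "history";
  id: string;
  label: string;
  href?: string;
}

// Client side: append server results after the local ones, skip pages that the
// local title filter already found, and cap the combined list.
function mergeResults(local: LocalResult[], server: ServerResults): CommandBarItem[] {
  const seenPageIds = new Set(local.filter((r) => r.kind === "page").map((r) => r.id));
  const items: CommandBarItem[] = [...local];
  for (const page of server.pages) {
    if (seenPageIds.has(page.id)) continue;
    seenPageIds.add(page.id);
    items.push({ kind: "page", id: page.id, label: page.title });
  }
  for (const entry of server.history) {
    items.push({
      kind: "history",
      id: entry.id,
      label: entry.snippet,
      href: `/context/daily-log?date=${entry.date}`, // deep link to the daily log
    });
  }
  return items.slice(0, MAX_TOTAL_RESULTS);
}
```

Quoting every word keeps characters such as `-` and uppercase words such as `AND` or `NOT` from being parsed as FTS5 operators. The server's fallback to empty results on FTS5 query errors remains a sensible safety net for unusual inputs.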