Full-Text Search

Overview

Ka-Note implements a hybrid full-text search strategy: small in-memory corpora (contexts, page titles) are filtered client-side; the large corpus (history entry text, page body) is indexed server-side using SQLite FTS5 and queried via HTTP.

Architecture

Search tiers

| Entity | Where | Method |
|---|---|---|
| Contexts (name) | Client only | Substring on in-memory Svelte store |
| Pages (title) | Client only | Substring on in-memory Svelte store |
| HistoryEntries (text) | Server FTS5 | Debounced HTTP GET /api/search |
| Pages (body) | Server FTS5 | Debounced HTTP GET /api/search |

History entries are the primary scaling concern (years of daily journals → tens of thousands of rows). SQLite FTS5 with BM25 ranking handles this efficiently without additional infrastructure.

Offline fallback

When the server is unreachable, CommandBar falls back to local results only (contexts, page titles) and shows the notice "Server nicht erreichbar — nur lokale Ergebnisse" ("Server unreachable, local results only").


Server

FTS5 tables

Migration: server/drizzle/0013_fts_search.sql

Two virtual tables using the unicode61 tokenizer (handles German umlauts correctly, no stemming):

  • fts_history — content table backed by history_entries (columns: text)
  • fts_pages — content table backed by pages (columns: title, body)

Both tables are populated via INSERT INTO fts_*(fts_*) VALUES('rebuild') on the first migration run.
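
As a rough illustration of the shape such a migration takes, the sketch below builds the DDL for the two external-content FTS5 tables and the matching rebuild commands. The helper names createFtsTableSql and rebuildSql are hypothetical, and the actual contents of 0013_fts_search.sql may differ (for example, it may set content_rowid explicitly).

```typescript
// Hypothetical helpers (not taken from the Ka-Note codebase) that build the
// SQL for an external-content FTS5 table using the unicode61 tokenizer.

function createFtsTableSql(
  ftsTable: string,
  contentTable: string,
  columns: string[],
): string {
  // content='...' makes this an external-content table: FTS5 stores only the
  // index and reads column values from the backing table.
  return (
    `CREATE VIRTUAL TABLE IF NOT EXISTS ${ftsTable} USING fts5(` +
    `${columns.join(", ")}, content='${contentTable}', tokenize='unicode61');`
  );
}

function rebuildSql(ftsTable: string): string {
  // FTS5 special command: discard the index and rebuild it from the content table.
  return `INSERT INTO ${ftsTable}(${ftsTable}) VALUES('rebuild');`;
}

// Statement order assumed for illustration: create both tables, then populate.
const migrationSketch: string[] = [
  createFtsTableSql("fts_history", "history_entries", ["text"]),
  createFtsTableSql("fts_pages", "pages", ["title", "body"]),
  rebuildSql("fts_history"),
  rebuildSql("fts_pages"),
];
```

Note that the rebuild command names the table twice: once as the insert target and once as the special column that carries the command.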

Index maintenance

The FTS index is updated synchronously after every write. The table below covers all server-side write paths:

| Write path | File | FTS update |
|---|---|---|
| Sync push (primary client sync) | sync-service.ts, pushChanges() | after each upsert |
| Trash / soft-delete | routes/trash.ts | after batch update |
| AI bundle upload (ZIP) | ai-export-service.ts, applyOps() | after each op |
| AI legacy JSON upload | ai-export-service.ts, applyOps() | after each op |
| Startup drift recovery | index.ts | setImmediate full rebuild if mismatch > 10 |

All paths use better-sqlite3 prepared statements. The shared helper applyOps() in ai-export-service.ts handles both upload variants. Soft-deleted rows are removed from the FTS index; active rows are re-indexed via INSERT OR REPLACE … SELECT.
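
The per-row decision described above can be sketched as follows. This is a minimal illustration rather than the actual code: the FtsIndex interface, the syncFtsAfterWrite function, and the deletedAt field used as the soft-delete marker are all assumptions.

```typescript
// Hypothetical row shape; `deletedAt` as the soft-delete marker is an assumption.
interface IndexedRow {
  id: string;
  deletedAt: string | null;
}

// Minimal abstraction over the two FTS operations described in the text.
interface FtsIndex {
  remove(id: string): void; // drop the row from the index
  reindex(id: string): void; // re-read the row from the content table
}

// Called after each write: soft-deleted rows leave the index, active rows are refreshed.
function syncFtsAfterWrite(row: IndexedRow, index: FtsIndex): void {
  if (row.deletedAt !== null) {
    index.remove(row.id);
  } else {
    index.reindex(row.id);
  }
}

// In-memory stand-in for demonstration; a real implementation would run
// better-sqlite3 prepared statements instead of recording calls.
class RecordingIndex implements FtsIndex {
  readonly log: string[] = [];
  remove(id: string): void {
    this.log.push(`remove:${id}`);
  }
  reindex(id: string): void {
    this.log.push(`reindex:${id}`);
  }
}
```

Routing every write path through one function of this kind keeps the "soft-deleted rows are never searchable" invariant in a single place.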

Startup consistency check: On each server start, row counts of history_entries (non-deleted) and fts_history are compared. If the difference exceeds 10, both FTS tables are rebuilt via INSERT INTO fts_*(fts_*) VALUES('rebuild'). This guards against index drift after DB restores or backup imports.
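
A minimal sketch of this check, assuming a small hypothetical DriftCheckDb interface in place of the real better-sqlite3 queries (the interface, function, and constant names are illustrative only):

```typescript
// Hypothetical abstraction over the queries the startup check needs.
interface DriftCheckDb {
  countActiveHistory(): number; // non-deleted rows in history_entries
  countIndexedHistory(): number; // rows in fts_history
  rebuild(ftsTable: string): void; // runs the FTS5 'rebuild' command
}

const DRIFT_THRESHOLD = 10;

// Returns true if a rebuild was triggered.
function checkFtsDrift(db: DriftCheckDb, threshold = DRIFT_THRESHOLD): boolean {
  const drift = Math.abs(db.countActiveHistory() - db.countIndexedHistory());
  if (drift <= threshold) return false;
  // Only fts_history is measured, but both indexes are rebuilt together.
  db.rebuild("fts_history");
  db.rebuild("fts_pages");
  return true;
}
```

A difference of exactly 10 does not trigger a rebuild; only a larger mismatch does.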

Raw SQLite access

File: server/src/db/connection.ts

The better-sqlite3 instance is exported as sqlite alongside the Drizzle db. This is needed for FTS prepared statements (Drizzle has no FTS5 DSL).

Search endpoint

GET /api/search?q=<query>&limit=<n>
Authorization: Bearer <token>

Response:

{
  "history": [
    { "id": "...", "topicId": "...", "date": "2025-01-15", "snippet": "...text..." }
  ],
  "pages": [
    { "id": "...", "title": "Page Title", "snippet": "...body text..." }
  ]
}

  • q must be ≥ 2 characters; shorter queries return empty results.
  • limit is capped at 20 server-side.
  • A * is appended to each word in q for prefix matching ("term"*).
  • Results are ranked by BM25 (ORDER BY rank).
  • FTS5 query errors (invalid syntax from special characters) return empty results instead of HTTP 500.
  • Soft-deleted entries are excluded via the FTS delete-on-soft-delete strategy.
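
The query handling described in these rules might look roughly like the sketch below. The function names, the doubling of embedded quotes, and the fallback limit of 3 are assumptions; the real handler in search.ts may build its MATCH expression differently.

```typescript
const MIN_QUERY_LENGTH = 2;
const MAX_LIMIT = 20;

// Turns user input into an FTS5 MATCH expression, or null if the query is too short.
// Each word becomes a quoted string followed by * (FTS5 prefix syntax).
function buildFtsQuery(q: string): string | null {
  const trimmed = q.trim();
  if (trimmed.length < MIN_QUERY_LENGTH) return null;
  return trimmed
    .split(/\s+/)
    // Assumption: embedded double quotes are escaped by doubling them.
    .map((word) => `"${word.replace(/"/g, '""')}"*`)
    .join(" ");
}

// Parses the limit parameter and caps it server-side.
// The fallback of 3 for missing or invalid values is an assumption.
function clampLimit(raw: unknown, fallback = 3): number {
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) return fallback;
  return Math.min(n, MAX_LIMIT);
}
```

For example, buildFtsQuery("daily jour") yields "daily"* "jour"*, which FTS5 reads as an implicit AND of two prefix terms.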

File: server/src/routes/search.ts


Client

Settings store

File: client/src/lib/stores/settings.ts

Generic key-value settings backed by a Dexie settings table (version 13). Provides:

  • getSetting<T>(key, default) — async one-time read
  • setSetting<T>(key, value) — async write
  • settingStore<T>(key, default) — reactive Svelte store backed by liveQuery

The searchResultsLimit store (default: 3) controls how many server results are requested.

CommandBar integration

File: client/src/lib/components/CommandBar.svelte

In navigate mode (query ≥ 2 chars, not starting with /):

  1. Immediately (sync): Filters $contextsQuery and $pagesQuery by substring match on name/title.
  2. After 250ms debounce: Calls authFetch('/api/search?q=...&limit=...') using the existing apiClient helper.
  3. On success: Server results are appended after local results. Pages already found by title match are deduplicated.
  4. On error: isOffline = true, a footer notice is shown, local results remain visible.
  5. Total results are capped at 10.

History results deep-link to /context/daily-log?date=YYYY-MM-DD.
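
The steps above (minus the debounce) can be sketched as plain functions, independent of Svelte. All names here are hypothetical; case-insensitive matching and the ordering of server results (pages before history) are assumptions, and a null server response stands in for a failed request.

```typescript
interface LocalItem { kind: "context" | "page"; id: string; label: string }
interface HistoryHit { id: string; topicId: string; date: string; snippet: string }
interface PageHit { id: string; title: string; snippet: string }
interface SearchResponse { history: HistoryHit[]; pages: PageHit[] }

interface ResultItem {
  kind: "context" | "page" | "history";
  id: string;
  label: string;
  href?: string;
}

const MAX_TOTAL_RESULTS = 10;

// Step 1: synchronous substring filter on name/title.
function filterLocal(items: LocalItem[], query: string): LocalItem[] {
  const needle = query.trim().toLowerCase();
  // Navigate mode only: at least 2 characters and not a slash command.
  if (needle.length < 2 || needle.startsWith("/")) return [];
  return items.filter((item) => item.label.toLowerCase().includes(needle));
}

// Deep link for a history hit, keyed by its date (YYYY-MM-DD).
function historyHref(date: string): string {
  return `/context/daily-log?date=${encodeURIComponent(date)}`;
}

// Steps 3-5: append server results, dedupe pages, flag offline, cap the total.
function mergeResults(
  local: LocalItem[],
  server: SearchResponse | null,
): { items: ResultItem[]; isOffline: boolean } {
  const items: ResultItem[] = local.map((l) => ({ ...l }));
  if (server === null) {
    // Request failed: keep local results and signal the offline notice.
    return { items: items.slice(0, MAX_TOTAL_RESULTS), isOffline: true };
  }
  const localPageIds = new Set(
    local.filter((l) => l.kind === "page").map((l) => l.id),
  );
  for (const page of server.pages) {
    // Skip pages already found by the local title match.
    if (!localPageIds.has(page.id)) {
      items.push({ kind: "page", id: page.id, label: page.title });
    }
  }
  for (const hit of server.history) {
    items.push({
      kind: "history",
      id: hit.id,
      label: hit.snippet,
      href: historyHref(hit.date),
    });
  }
  return { items: items.slice(0, MAX_TOTAL_RESULTS), isOffline: false };
}
```

Keeping the merge logic pure makes the deduplication and the cap of 10 straightforward to unit-test without mounting the component.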


Settings

| Key | Type | Default | Description |
|---|---|---|---|
| searchResultsLimit | number | 3 | Max server search results per entity type |

To change the value, write to Dexie via setSetting('searchResultsLimit', 5) or add a Settings UI field.


Scaling notes

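  • FTS5 with BM25 ranking generally scales to millions of rows, far beyond the tens of thousands expected here, so no action should be needed as data grows.
  • The unicode61 tokenizer handles Unicode correctly. Stemming can be added later by changing tokenize='unicode61' to tokenize='porter unicode61'. Because the tokenizer is fixed when an FTS5 table is created, this most likely needs a follow-up migration that recreates and rebuilds both tables. Note that porter stems English only, so it would offer little benefit for German text.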
  • If topic title search needs FTS in future, add fts_topics following the same pattern.
  • Offline full-text search for history (e.g. via MiniSearch in a Web Worker) is a possible v2 enhancement.