2026-03-17 Session Log

LinguaRAG: CJK pronunciation, text annotations on PDF, native PDF selection, server-side vocabulary pagination, translation with Claude Haiku, Google One Tap login

lingua-rag

Note panel UX improvements, pronunciation practice bug fixes, CJK word splitting, Chinese pinyin matching, Japanese furigana via backend morphological analysis

What I Did

  • Segment control for note panel — replaced ambiguous toggle button with [All|p.N] segment control so current filter state is always visible
  • CJK word splitting — extracted segmentWords utility using Intl.Segmenter API for Chinese/Japanese where spaces don’t exist between words
  • AudioContext fix — React StrictMode double-mount closes AudioContext but ref isn’t nulled; added getAudioCtx() helper that checks state === "closed" and recreates
  • Chinese pinyin matching — installed pinyin-pro (~15KB) for tone-stripped pinyin comparison in pronunciation practice. STT may return different characters with same pronunciation (做 vs 这), so comparing pinyin is more forgiving than comparing characters
  • Character-level normalization for Chinese: Intl.Segmenter splits the original text and the STT transcript differently (e.g., ["我的", "天"] vs ["我", "的", "天"]), so Chinese text is split into individual characters before matching
  • Japanese furigana API: POST /api/furigana uses fugashi (MeCab wrapper) for morphological segmentation + pykakasi for hiragana/romaji conversion. Chose server-side over client-side kuroshiro because the dictionary is ~20MB
  • Smart token merging — auxiliary verbs (助動詞) merge with preceding verb stems for natural chip grouping, e.g. しています instead of し/て/い/ます
  • Unified annotation style — replaced <ruby> (above-text) with inline <span> (right-of-word) for both Chinese and Japanese
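
The character-level normalization described in the list above can be sketched as follows. This is a Python illustration of the idea rather than the frontend code; the function names to_chars and matched_flags are hypothetical, and the membership-only matching is a simplifying assumption (the real matcher may be order-aware and pinyin-based).

```python
def to_chars(text: str) -> list[str]:
    """Split text into individual characters, dropping spaces and punctuation."""
    # str.isalnum() is True for CJK ideographs and False for punctuation
    # such as "，" or "！", so only matchable characters survive.
    return [ch for ch in text if ch.isalnum()]


def matched_flags(reference: str, transcript: str) -> list[bool]:
    """For each reference character, report whether the transcript contains it.

    Assumption: simple membership check, ignoring order and repetition.
    """
    heard = set(to_chars(transcript))
    return [ch in heard for ch in to_chars(reference)]


# Word-level segmentations of the same text can disagree...
segmented_reference = ["我的", "天"]
segmented_transcript = ["我", "的", "天"]
# ...but the character-level lists are identical.
print(to_chars("".join(segmented_reference)))   # ['我', '的', '天']
print(to_chars("".join(segmented_transcript)))  # ['我', '的', '天']
print(matched_flags("我的天！", "我的"))          # [True, True, False]
```

Comparing at character granularity sidesteps the segmenter entirely, at the cost of losing word boundaries during matching.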

Key Decisions

  • fugashi over pykakasi alone — pykakasi groups all consecutive hiragana into one segment (e.g., があってとても), fugashi provides proper morphological boundaries
  • Backend processing over client-side — 20MB dictionary too heavy for browser; server handles it with no client cost
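
Given morphological tokens of the kind fugashi provides, merging auxiliary verbs into the preceding verb chip might look like the sketch below. The Token type, the hand-written token list, and the single merge rule are assumptions for illustration; no real analyzer is called here, and the actual rule set is not spelled out in this log.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str  # text as it appears in the sentence
    pos: str      # coarse part-of-speech tag


VERB = "動詞"
AUXILIARY = "助動詞"


def merge_auxiliaries(tokens: list[Token]) -> list[str]:
    """Attach auxiliary verbs to the verb chip immediately before them."""
    chips: list[str] = []
    last_is_verb_chip = False  # True while the newest chip started with a verb
    for tok in tokens:
        if tok.pos == AUXILIARY and last_is_verb_chip:
            chips[-1] += tok.surface  # extend the existing verb chip
            continue
        chips.append(tok.surface)
        last_is_verb_chip = tok.pos == VERB
    return chips


# Hand-written tokens standing in for analyzer output (assumption).
tokens = [
    Token("パン", "名詞"),
    Token("を", "助詞"),
    Token("食べ", VERB),
    Token("まし", AUXILIARY),
    Token("た", AUXILIARY),
]
print(merge_auxiliaries(tokens))  # ['パン', 'を', '食べました']
```

A form such as しています also involves a connective particle and a subsidiary verb, so the real merge rules are presumably broader than this single case.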

Learnings

  • Google STT cannot recognize single-syllable particles (啊, 呢, 吧) in isolation — requires at least 2 syllables. This is an engine-level constraint
  • pykakasi romanization has edge cases: なったnatsuta instead of natta. Contracted forms don’t always match standard Hepburn

Text Annotations on PDF

Built a custom text annotation system — place, edit, drag, resize text boxes on PDF pages with full DB persistence and extensive rendering optimization

What I Did

  • Text annotation feature — click-to-place text boxes on PDF pages using contentEditable divs with absolute positioning (x/y as percentages)
  • Toolbar — floating bar matching existing bottom toolbar style with font family, size, color, bold, italic, text-align, opacity controls
  • Style persistence — added style JSONB column to pdf_annotations table. All text styling stored as a single JSON object
  • Optimistic UI — text box renders immediately with a temporary ID (temp-${Date.now()}), API persists in background, temp→real ID swap on response. If API fails, annotation is removed
  • UX flow — T button activates text mode → click places box → type → click outside saves → double-click re-enters edit mode → drag handle moves, right-edge resizes
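
The optimistic create flow from the list above (temporary ID, background save, ID swap, rollback on failure) can be illustrated with a small state container. This is a Python sketch of the pattern, not the React code; AnnotationStore, save_remote, and the extra counter suffix on the temporary ID are hypothetical, and the save runs synchronously here for brevity.

```python
import itertools
import time


class AnnotationStore:
    """Holds annotations keyed by ID, the way client state would."""

    def __init__(self, save_remote):
        self.items: dict[str, dict] = {}
        self._save_remote = save_remote  # callable: dict -> real ID, may raise
        self._seq = itertools.count()    # assumption: avoids same-millisecond clashes

    def create(self, x_pct: float, y_pct: float, text: str) -> str:
        # 1. Show the box immediately under a temporary ID.
        temp_id = f"temp-{int(time.time() * 1000)}-{next(self._seq)}"
        annotation = {"x": x_pct, "y": y_pct, "text": text}
        self.items[temp_id] = annotation
        try:
            # 2. Persist; in a UI this would happen in the background.
            real_id = self._save_remote(annotation)
        except Exception:
            # 3a. Roll back: the annotation disappears if the save failed.
            del self.items[temp_id]
            raise
        # 3b. Swap the temporary ID for the one the server assigned.
        self.items[real_id] = self.items.pop(temp_id)
        return real_id


store = AnnotationStore(save_remote=lambda annotation: "ann-1")
print(store.create(12.5, 40.0, "note"))  # ann-1
print(list(store.items))                 # ['ann-1']
```

Storing x and y as percentages keeps the box anchored to the same spot on the page regardless of zoom level.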

Performance Deep Dive

This was the most educational part. Adding text annotations exposed rendering bottlenecks in the existing highlight overlay system:

Problem 1: State coupling. textAnnotations was initially stored in the same annotations state. Changing it triggered the highlight overlay effect that DOM-walks all page text layers, even though text annotations have nothing to do with text highlights. → Fix: Separated into textAnnotations state, then further extracted into React Context (TextAnnotationProvider) so changes don’t re-render PdfViewer at all.

Problem 2: Full DOM re-walk. The highlight overlay effect used document.querySelectorAll("mark.note-highlight") to clean up ALL marks across ALL pages, then re-applied everything. → Fix: Track previous state per page via buildPageKey(). Only clean up and reapply on pages whose key actually changed. Pages scrolled out of view get their marks cleaned up lazily.
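
The per-page key comparison can be sketched in a few lines. This is a Python illustration of the diffing idea only; the contents of the real buildPageKey() are not shown in this log, so the key format and the id/start/end fields below are assumptions.

```python
def build_page_key(highlights: list[dict]) -> str:
    """Summarize one page's highlights as a string that changes when they do."""
    return "|".join(sorted(f"{h['id']}:{h['start']}-{h['end']}" for h in highlights))


def pages_to_refresh(prev_keys: dict[int, str], by_page: dict[int, list[dict]]) -> set[int]:
    """Return pages whose key differs from last time, and remember the new keys."""
    changed: set[int] = set()
    # The union is a new set, so prev_keys can be mutated inside the loop.
    for page in prev_keys.keys() | by_page.keys():
        key = build_page_key(by_page.get(page, []))
        if prev_keys.get(page, "") != key:
            changed.add(page)
        if key:
            prev_keys[page] = key
        else:
            prev_keys.pop(page, None)  # page no longer has highlights
    return changed


prev: dict[int, str] = {}
pages = {
    1: [{"id": "a", "start": 0, "end": 5}],
    2: [{"id": "b", "start": 3, "end": 9}],
}
print(pages_to_refresh(prev, pages))  # {1, 2}  (first run: everything is new)
pages[2].append({"id": "c", "start": 10, "end": 12})
print(pages_to_refresh(prev, pages))  # {2}     (only the edited page)
print(pages_to_refresh(prev, pages))  # set()   (nothing changed)
```

Only the pages in the returned set would need their marks removed and re-applied.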

Problem 3: React.memo invalidation. handleTextAnnDelete had selectedTextAnnId in its dependency array, so every selection change created a new function reference, invalidating React.memo for all 15 TextAnnotation components. → Fix: Used setSelectedTextAnnId(prev => prev === id ? null : prev) (updater function) to remove the dependency.

Problem 4: Style comparison by reference. React.memo compared annotation.style by reference (===). But spread operations in state updates always create new objects even when values are identical. → Fix: Custom styleEqual function that compares each field individually.

Key Decisions

  • Custom implementation over npm packages — researched react-pdf-highlighter (highlight-only), @pdfme (PDF generation, not annotation), recogito-react-pdf (unmaintained 2 years), Syncfusion (commercial). No free OSS package supports free-text boxes on react-pdf
  • Context separation: TextAnnotationProvider wraps PdfViewer. Text annotation state lives in context, not in PdfViewer’s local state. This is the single biggest performance win

Translation & Upload Pipeline

Claude Haiku translation, /init API consolidation, Google One Tap login, PDF upload modal, embedding retry improvements

What I Did

  • Claude Haiku translation — backend /api/translate endpoint uses Claude Haiku for logged-in users (~$0.00016/request), MyMemory free API for guests
  • Word vs sentence detection — words get 3 dictionary-style meanings, sentences get single translation. CJK uses character count (≤4 chars = word) since text.split() doesn’t work without spaces
  • 2-layer translation cache — L1 in-memory (both frontend and backend) + L2 persistent DB cache for words only. Sentences are too context-specific to cache
  • Quality gate — validates LLM output before caching: filters error strings, strips parentheses, extracts numbered lines. Bad outputs still returned to user but not persisted
  • GET /pdfs/{id}/init — a single request replaces 4 individual fetches; the backend runs the underlying lookups concurrently with asyncio.gather. Reduced initial load from 4 sequential round-trips to 1
  • Google One Tap login — GSI script with signInWithIdToken + SHA-256 nonce. Supabase GoTrue expects hex-encoded hash (not base64url)
  • PDF upload modal — drag-and-drop + file picker, replacing direct OS file dialog
  • Embedding retry — failed batches retried 3 rounds with cooldown. executemany batch INSERT for chunks (~52s → seconds)
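
The /init consolidation from the list above relies on asyncio.gather to run independent lookups concurrently. A minimal sketch, assuming four hypothetical lookups (meta, annotations, notes, vocabulary) and a fake_query stand-in instead of real database calls:

```python
import asyncio


async def fake_query(name: str, pdf_id: str) -> dict:
    """Stand-in for one database or storage lookup (assumption)."""
    await asyncio.sleep(0.05)
    return {"source": name, "pdf_id": pdf_id}


async def load_init(pdf_id: str) -> dict:
    # All four coroutines start together, so the total time is roughly
    # that of the slowest one. Results come back in argument order.
    meta, annotations, notes, vocabulary = await asyncio.gather(
        fake_query("meta", pdf_id),
        fake_query("annotations", pdf_id),
        fake_query("notes", pdf_id),
        fake_query("vocabulary", pdf_id),
    )
    return {
        "meta": meta,
        "annotations": annotations,
        "notes": notes,
        "vocabulary": vocabulary,
    }


payload = asyncio.run(load_init("pdf-123"))
print(sorted(payload))  # ['annotations', 'meta', 'notes', 'vocabulary']
```

By default, asyncio.gather propagates the first exception raised by any of the awaited coroutines, so a real handler would need to decide whether one failed lookup should fail the whole response.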

Key Decisions

  • Prompt version in cache key — cleaner than TTL expiration. Improving the prompt automatically invalidates all stale cached translations
  • Guest = MyMemory, logged-in = Claude — avoids API cost for unauthenticated users
  • Words cached in DB, sentences not — words have high cross-user reuse value, sentences are too context-specific
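
The caching decisions above can be combined into one sketch: the prompt version is part of the cache key, and only single words reach the persistent layer. Everything named here (PROMPT_VERSION, is_single_word, cache_key, TranslationCache) is hypothetical, the persistent layer is a plain dict standing in for a DB table, and the CJK check covers only kana and the basic ideograph block.

```python
import hashlib
import re

PROMPT_VERSION = "v3"  # hypothetical; bump whenever the prompt text changes

# Hiragana, katakana, and the basic CJK ideograph block (assumption).
_CJK = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def is_single_word(text: str) -> bool:
    """CJK: at most 4 characters counts as a word; otherwise: no inner spaces."""
    text = text.strip()
    if _CJK.search(text):
        return len(text) <= 4
    return len(text.split()) == 1


def cache_key(text: str, source: str, target: str,
              prompt_version: str = PROMPT_VERSION) -> str:
    """Hash the lookup; a new prompt version yields a new key for the same text."""
    raw = "\x1f".join([prompt_version, source, target, text.strip().lower()])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class TranslationCache:
    def __init__(self) -> None:
        self.memory: dict[str, str] = {}      # L1: per-process
        self.persistent: dict[str, str] = {}  # L2: stands in for a DB table

    def get(self, key: str):
        if key in self.memory:
            return self.memory[key]
        value = self.persistent.get(key)
        if value is not None:
            self.memory[key] = value  # warm L1 on an L2 hit
        return value

    def put(self, text: str, key: str, value: str) -> None:
        self.memory[key] = value
        if is_single_word(text):  # sentences never reach the persistent layer
            self.persistent[key] = value


cache = TranslationCache()
word_key = cache_key("天気", "ja", "en")
cache.put("天気", word_key, "weather")
sentence = "今日はいい天気ですね"
cache.put(sentence, cache_key(sentence, "ja", "en"), "Nice weather today, isn't it?")
print(len(cache.memory), len(cache.persistent))  # 2 1
```

Because old-version keys are simply never looked up again, stale entries need no explicit invalidation, though they do remain in storage until cleaned up.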

Native PDF Selection & TTS Cleanup

Replaced ~400 lines of custom selection code with native browser selection

What I Did

  • Removed custom selection system — deleted computeRangeRects, ensureTextContent, textContentCache, getCaretRange, custom overlay divs. pdf.js text layer already handles selection natively; the custom caretRangeFromPoint logic was causing DOM-order vs visual-order mismatch bugs
  • sel.toString() over extractTextFromRange — the TreeWalker-based extraction followed DOM order (not visual order), capturing wrong text on multi-line drag selections
  • Popup repositioning fix — added popupElRef guard in handleMouseUp to skip selection logic when clicking popup buttons (previously repositioned the popup on every click)
  • Removed TTS word-by-word highlight — deleted spokenWordIndex, ttsTimersRef, charWeights timing logic. The timing-based estimation was inaccurate; clean word chips UI is sufficient

Learnings

  • pdf.js text layer renders spans in DOM order which may differ from visual (reading) order on complex layouts. Native browser selection handles this correctly; custom code cannot without reimplementing the rendering engine’s layout logic

Server-Side Vocabulary Pagination

Rewrote vocabulary page with server-side pagination, persistent filters, and multiple code quality fixes

What I Did

  • Server-side pagination — backend list_all_for_user rewritten with offset/limit/search/language/sort_by params. Frontend sends searchParams to backend instead of fetching all and filtering client-side
  • Per-language localStorage persistence — page size, page number, language filter, and sort order are all saved per-language in localStorage. Restored on browser reload, reset to defaults on SPA re-entry
  • Reload vs re-entry detection — module-level hasBeenMounted flag + isPageReload() (Performance Navigation API) distinguishes browser refresh from SPA navigation
  • Fetch race condition fix — version counter (fetchIdRef) ignores stale responses when language switch triggers two rapid fetches
  • Auto-save on page viewer close — replaced window.confirm with pageViewerSaveRef callback pattern
  • Code quality fixes — Pydantic Field(ge=0, le=3) validation, authorization on check update, toLocaleLowerCase() for i18n-safe comparison, silent errors → toast notifications
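
A server-side pagination query with the offset/limit/search/language/sort_by parameters mentioned above could be shaped roughly like this. The sketch uses sqlite3 from the standard library as a stand-in for the real database, and the table name, columns, and list_vocabulary function are hypothetical.

```python
import sqlite3

# Whitelist of sortable columns, so sort_by is never interpolated unchecked.
SORT_COLUMNS = {"created_at": "created_at", "word": "word"}


def list_vocabulary(conn, user_id, *, offset=0, limit=20,
                    search=None, language=None, sort_by="created_at"):
    where, params = ["user_id = ?"], [user_id]
    if language:
        where.append("language = ?")
        params.append(language)
    if search:
        where.append("word LIKE ?")
        params.append(f"%{search}%")
    clause = " AND ".join(where)
    order = SORT_COLUMNS.get(sort_by, "created_at")
    # Total count lets the client render page controls without fetching all rows.
    total = conn.execute(
        f"SELECT COUNT(*) FROM vocabulary WHERE {clause}", params
    ).fetchone()[0]
    rows = conn.execute(
        f"SELECT word FROM vocabulary WHERE {clause} "
        f"ORDER BY {order}, id LIMIT ? OFFSET ?",
        [*params, limit, offset],
    ).fetchall()
    return {"total": total, "items": [row[0] for row in rows]}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vocabulary (id INTEGER PRIMARY KEY, user_id TEXT, "
    "language TEXT, word TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO vocabulary (user_id, language, word, created_at) VALUES (?, ?, ?, ?)",
    [
        ("u1", "de", "Haus", "2026-03-01"),
        ("u1", "de", "Baum", "2026-03-02"),
        ("u1", "ja", "天気", "2026-03-03"),
        ("u1", "de", "Hund", "2026-03-04"),
        ("u2", "de", "Katze", "2026-03-05"),
    ],
)
print(list_vocabulary(conn, "u1", language="de", sort_by="word", limit=2))
# {'total': 3, 'items': ['Baum', 'Haus']}
```

Adding id as a secondary sort key keeps page boundaries stable when several rows share the same primary sort value.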

Key Decisions

  • Module-level hasBeenMounted over Performance API alone: performance.getEntriesByType("navigation") persists for the entire page lifecycle; after one reload, all subsequent SPA navigations still show type="reload"
  • Version counter over AbortController — simpler pattern for ignoring stale fetch responses; fetchIdRef.current++ invalidates in-flight requests without aborting them
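
The version-counter pattern is language-neutral, so it can be shown with asyncio in place of browser fetches. In this sketch the LatestOnly class and its fetch_id attribute are hypothetical stand-ins for the component and fetchIdRef.current, and asyncio.sleep simulates network latency.

```python
import asyncio


class LatestOnly:
    """Keeps only the result of the most recently started request."""

    def __init__(self) -> None:
        self.fetch_id = 0  # plays the role of fetchIdRef.current
        self.result = None

    async def load(self, language: str, delay: float) -> None:
        self.fetch_id += 1
        my_id = self.fetch_id       # remember which request this is
        await asyncio.sleep(delay)  # stands in for the network round-trip
        if my_id != self.fetch_id:
            return                  # a newer request started; drop this response
        self.result = f"vocabulary for {language}"


async def main() -> str:
    state = LatestOnly()
    # The first request is slower and finishes after the second one.
    await asyncio.gather(state.load("de", 0.05), state.load("ja", 0.01))
    return state.result


print(asyncio.run(main()))  # vocabulary for ja
```

The stale request still runs to completion and consumes bandwidth; the counter only ensures its response is ignored, which is the trade-off against aborting it.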