2026-03-08 Session Log

LinguaRAG: WORTLISTE-A1 RAG indexing, dual parallel search, chat text hover/selection UX with German-only highlighting

lingua-rag

WORTLISTE-A1 RAG Indexing

Indexed the A1 vocabulary list as a separate search source alongside the main textbook

What I Did

  • Dedicated indexing scriptscripts/index_wortliste.py for WORTLISTE-A1 (176 chunks covering topics like Kennenlernen, Familie, Essen, Fortbewegung)
  • Vocabulary searchVectorSearchRepository.search_vocabulary() with a stricter max_distance=0.65 (vs 0.7 for textbook) since vocabulary entries are shorter and more specific
  • Dual parallel RAGasyncio.gather() runs textbook search (unit-scoped, top 2) + vocabulary search (wortliste, top 2) simultaneously. Combined 4 chunks give Claude both lesson context and exact word definitions
  • Sentence validation filterisValidGermanSentence blocks hover popup on non-sentence content: exercise labels (b), c)), audio markers (MP3, CD), repeated word sequences, and strings with fewer than 3 words

Key Decisions

  • WORTLISTE with unit_id=None — the vocabulary list is organized by topic (not by lesson), so it’s searched globally without unit scoping
  • Stricter distance for vocabulary — exact word definitions match more precisely, so 0.65 threshold reduces false positives

Chat Text Hover & Selection UX

Added PDF-like hover highlighting and popup actions for German text in chat messages

What I Did

  • LineWithActions hover highlight — mouseover on German text shows yellow highlight with a popup containing speak/copy/inject/practice buttons
  • Smart text splitting — three helper functions work together:
    • splitAtArrow — splits at ”→” separator, highlights only the German (primary) portion
    • splitAtKorean — splits at the first Korean/CJK character boundary, excluding translations from the highlight
    • extractSpeakerPrefix — extracts “A: ”, “B: ”, “Leo:” prefixes from the highlight area (handles both plain strings and <strong>B:</strong> React elements)
  • Hover timinghoverHideTimer with 200ms delay allows the mouse to travel from text to popup without the popup disappearing. mouseLeave-based hover blocking prevents immediate reopen after selection popup closes
  • Yellow selection highlight::selection CSS scoped to .chat-messages only, matching the PDF viewer’s highlight color

Learnings

  • ReactMarkdown produces unexpected node structures like ["\n", "B: ", <strong>text</strong>] — the leading "\n" node caused extractSpeakerPrefix to check the wrong position. Discovered by logging Children.toArray output
  • getNodeText recursive helper was needed because ReactMarkdown <strong> elements can have arrays or nested elements as children, not just strings