#debugging

1 post

PDF Indexing Pipeline: Unit Detection Guards and Copyright Filtering

Hard-won lessons from building a robust PDF chunker for a Korean-German textbook: multiple detection guards, line-level copyright stripping, and RAG behavior verification.