- Markdown-aware content splitting (512-token chunks with 64-token overlap)
- CJK keyword extraction from chunk content, with stop-word filtering
- Full refresh strategy (delete old chunks → re-insert on update)
- Phase 2 placeholder for vector embedding API integration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>