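feat(ai): document parsing pipeline (PDF parsing + chunking + embedding)

- Simplified parser: PDF (pdf-extract), plain text, and a binary fallback
- Fixed-window chunker (500 characters per chunk, 50-character overlap); all 5 unit tests pass
- DocumentService: manual or uploaded document creation → chunking → embedding → storage
- UploadDocumentParams struct to avoid an overly long parameter list
- Remove the unused docx-rs/calamine dependencies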
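
For reference, the fixed-window chunking described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the function name `chunk_fixed` is hypothetical, and it assumes the 500/50 figures count Unicode characters rather than bytes, which matters for CJK text.

```rust
/// Split `text` into windows of at most `size` characters, where each window
/// starts `size - overlap` characters after the previous one.
/// Hypothetical helper for illustration; not the chunker shipped in this commit.
pub fn chunk_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than size");

    // Work on chars, not bytes, so multi-byte characters are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;

    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            // The last window reached the end of the text; stop here so we
            // do not emit a trailing chunk that is pure overlap.
            break;
        }
        start += step;
    }
    chunks
}
```

Under these assumptions, `chunk_fixed(&text, 500, 50)` on a 1000-character input yields three chunks covering characters 0..500, 450..950, and 900..1000.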

Phase 2, Tasks 7-9
iven
2026-05-27 00:13:08 +08:00
parent 23c5bbdb40
commit 0a1f4cb9a9
4 changed files with 523 additions and 0 deletions

@@ -120,6 +120,9 @@ handlebars = "6"
# HTML sanitization
ammonia = "4"
# Document parsing
pdf-extract = "0.7"
# Metrics
metrics = "0.24"
metrics-exporter-prometheus = "0.16"