feat(ai): document parsing pipeline (PDF parsing + chunking + embedding)
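- Simplified parser: PDF (pdf-extract) + plain text + binary fallback
- Fixed-window chunker (500 characters, 50-character overlap); all 5 unit tests pass
- DocumentService: manual/uploaded document creation → chunking → embedding → storage
- UploadDocumentParams struct to avoid passing too many arguments
- Remove unused docx-rs/calamine dependencies

Phase 2 Task 7-9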
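The fixed-window chunker described above can be sketched as follows. This is a minimal illustration, not the commit's actual code: the function name `chunk_text` is hypothetical, and it assumes windows are measured in Unicode scalar values (`char`) rather than bytes, so multi-byte text such as Chinese is never split mid-character. With a window of 500 and an overlap of 50, each new window starts 450 characters after the previous one.

```rust
/// Hypothetical fixed-window chunker: splits `text` into windows of at most
/// `size` chars, where consecutive windows share `overlap` chars.
/// Counting `char`s (not bytes) keeps multi-byte characters intact.
fn chunk_text(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than the window size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap; // distance between window starts
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

For example, a 1000-character text yields three windows covering characters 0..500, 450..950 and 900..1000. Stopping as soon as a window reaches the end avoids emitting a trailing chunk that lies entirely inside the previous window's overlap.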
@@ -120,6 +120,9 @@ handlebars = "6"
 # HTML sanitization
 ammonia = "4"
 
+# Document parsing
+pdf-extract = "0.7"
+
 # Metrics
 metrics = "0.24"
 metrics-exporter-prometheus = "0.16"
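The parser's three paths (PDF, plain text, binary fallback) need some way to pick a path for an uploaded file. The sketch below is one possible approach, assuming detection by content: it checks for the `%PDF-` header that PDF files start with, then tries UTF-8 decoding, and otherwise falls back to binary. The names `DocKind` and `classify` are hypothetical. The commit may instead dispatch on file extension or MIME type, and the actual PDF text extraction, presumably via pdf-extract's `extract_text_from_mem`, is left out so the sketch has no external dependencies.

```rust
/// Hypothetical result of classifying an uploaded file's raw bytes.
#[derive(Debug, PartialEq)]
enum DocKind {
    /// Starts with the PDF header; would be handed to pdf-extract.
    Pdf,
    /// Valid UTF-8; used directly as the document text.
    Text(String),
    /// Anything else; handled by the binary fallback path.
    Binary,
}

/// Classify raw bytes by content (assumption: no extension/MIME hints used).
fn classify(bytes: &[u8]) -> DocKind {
    if bytes.starts_with(b"%PDF-") {
        DocKind::Pdf
    } else if let Ok(s) = std::str::from_utf8(bytes) {
        DocKind::Text(s.to_owned())
    } else {
        DocKind::Binary
    }
}
```

Checking the PDF header before attempting UTF-8 matters because a PDF that happens to contain only ASCII would otherwise be treated as plain text.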