fix(runtime): disable DataMasking middleware (regex over-matches general Chinese text)

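Problem: the DataMasking middleware matched company names with the regex [^\s]{1,20}(?:公司|...),
which misclassified generic text such as "有一家公司" ("there is a company") as a company entity and replaced it with an __ENTITY_1__ placeholder.
In addition, the LLM response path had no unmask logic, so users saw the raw placeholders.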
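Why the pattern over-matches: Chinese prose puts no spaces between words, so `[^\s]{1,20}` absorbs whatever characters precede 公司, up to 20 of them. The sketch below emulates, with the standard library only, the leftmost-first greedy behavior of that pattern for the single alternative 公司 (the rest of the alternation is elided in the original as `...`). The function name `find_company_like` is hypothetical and is not part of data_masking.rs.

```rust
/// Hypothetical helper: emulates the leftmost-first, greedy semantics of
/// `[^\s]{1,20}(?:公司)` for the single suffix 公司 only.
fn find_company_like(text: &str) -> Option<String> {
    let chars: Vec<char> = text.chars().collect();
    let suffix: Vec<char> = "公司".chars().collect();
    // Leftmost: try each start position from left to right.
    for start in 0..chars.len() {
        // Greedy: prefer the longest run of non-whitespace chars (20 down to 1).
        for len in (1..=20).rev() {
            let end = start + len;
            if end + suffix.len() > chars.len() {
                continue;
            }
            // `\s` is Unicode whitespace; char::is_whitespace uses the same property.
            if chars[start..end].iter().any(|c| c.is_whitespace()) {
                continue;
            }
            if chars[end..end + suffix.len()] == suffix[..] {
                return Some(chars[start..end + suffix.len()].iter().collect());
            }
        }
    }
    None
}
```

For example, in 我们有一家公司在北京 the whole clause 我们有一家公司 ("we have a company") is captured as the "company name"; only whitespace, which ordinary Chinese sentences lack, stops the match.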

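Fix:
- Disable the DataMasking middleware (masking is unnecessary in the single-user desktop scenario)
- Add data_masker + unmask infrastructure to AgentLoop (kept in reserve)
- Add an unmask_text() method covering both the streaming and non-streaming response paths
- Keep the data_masking.rs module (with an improved regex and new tests) so it can be re-enabled once an NLP-based approach is available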
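The commit message names `unmask_text()` and says it covers both response paths, but the AgentLoop code is not part of the hunk shown here. As a hedged sketch of what a mask/unmask round trip involves, assuming a simple placeholder table and the `__ENTITY_N__` format seen above, the hypothetical `PlaceholderMap`, `StreamUnmasker` and `safe_cut` below (not the real DataMasker API) show why the streaming path needs extra care: a placeholder can be split across chunks, so any tail that might still be an unfinished placeholder is held back until more text arrives.

```rust
use std::collections::HashMap;

// Hypothetical placeholder format, matching the __ENTITY_1__ example above.
const PREFIX: &str = "__ENTITY_";

/// Hypothetical placeholder table; NOT the real zclaw_runtime DataMasker API.
struct PlaceholderMap {
    forward: HashMap<String, String>, // original entity -> placeholder
    reverse: Vec<(String, String)>,   // (placeholder, original entity)
}

impl PlaceholderMap {
    fn new() -> Self {
        PlaceholderMap { forward: HashMap::new(), reverse: Vec::new() }
    }

    /// Replaces every occurrence of `entity` with a stable placeholder.
    fn mask(&mut self, text: &str, entity: &str) -> String {
        if entity.is_empty() {
            return text.to_string();
        }
        let placeholder = match self.forward.get(entity).cloned() {
            Some(p) => p,
            None => {
                let p = format!("{}{}__", PREFIX, self.reverse.len() + 1);
                self.forward.insert(entity.to_string(), p.clone());
                self.reverse.push((p.clone(), entity.to_string()));
                p
            }
        };
        text.replace(entity, &placeholder)
    }

    /// Non-streaming path: restore every known placeholder in a full response.
    fn unmask_text(&self, text: &str) -> String {
        let mut out = text.to_string();
        for (placeholder, original) in &self.reverse {
            out = out.replace(placeholder.as_str(), original);
        }
        out
    }
}

/// Byte offset up to which `buf` can be emitted; the rest may be a
/// placeholder that is still being streamed.
fn safe_cut(buf: &str) -> usize {
    // An opened placeholder whose closing "__" has not arrived yet.
    if let Some(start) = buf.rfind(PREFIX) {
        if !buf[start + PREFIX.len()..].contains("__") {
            return start;
        }
    }
    // A trailing partial prefix such as "__ENT".
    for k in (1..PREFIX.len()).rev() {
        if buf.ends_with(&PREFIX[..k]) {
            return buf.len() - k;
        }
    }
    buf.len()
}

/// Streaming path: unmask chunk by chunk without splitting a placeholder.
struct StreamUnmasker<'a> {
    map: &'a PlaceholderMap,
    pending: String,
}

impl<'a> StreamUnmasker<'a> {
    fn new(map: &'a PlaceholderMap) -> Self {
        StreamUnmasker { map, pending: String::new() }
    }

    /// Returns the text that is safe to show now; holds back a possible partial placeholder.
    fn push(&mut self, chunk: &str) -> String {
        self.pending.push_str(chunk);
        let unmasked = self.map.unmask_text(&self.pending);
        let cut = safe_cut(&unmasked);
        self.pending = unmasked[cut..].to_string();
        unmasked[..cut].to_string()
    }

    /// Flushes whatever is left once the stream ends.
    fn finish(&mut self) -> String {
        let rest = std::mem::take(&mut self.pending);
        self.map.unmask_text(&rest)
    }
}
```

Unmasking the whole pending buffer before choosing the cut means a placeholder is restored as soon as its closing `__` arrives, and text is only delayed, never dropped.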

Tests: 934 PASS, 0 FAIL
iven
2026-04-22 17:24:46 +08:00
parent 8b3e43710b
commit 73d50fda21
3 changed files with 103 additions and 18 deletions

@@ -365,16 +365,16 @@ impl Kernel {
chain.register(Arc::new(mw));
}
// Data masking middleware — mask sensitive entities before any other processing
// NOTE: Registration order does NOT determine execution order.
// The chain sorts by priority() ascending before execution.
// Execution order: Evolution(78) → ButlerRouter(80) → DataMasking(90) → ...
{
use std::sync::Arc;
let masker = Arc::new(zclaw_runtime::middleware::data_masking::DataMasker::new());
let mw = zclaw_runtime::middleware::data_masking::DataMaskingMiddleware::new(masker);
chain.register(Arc::new(mw));
}
// Data masking middleware — DISABLED for desktop single-user scenario.
// The regex-based approach over-matches common Chinese text (e.g. "有一家公司"
// gets masked as a company entity). Response unmask was also missing.
// Re-enable when NLP-based entity detection is available.
// {
// use std::sync::Arc;
// let masker = Arc::new(zclaw_runtime::middleware::data_masking::DataMasker::new());
// let mw = zclaw_runtime::middleware::data_masking::DataMaskingMiddleware::new(masker);
// chain.register(Arc::new(mw));
// }
// Growth integration — cached to avoid recreating empty scorer per request
let growth = {