fix(tool): normalize Windows UNC paths for consistent PathValidator comparison

- with_workspace() now canonicalizes workspace_root so its format matches the canonical paths produced by resolve_and_validate
- Added normalize_windows_path() to strip the `\\?\` prefix, fixing starts_with comparisons that failed on Windows
- check_blocked/check_allowed now compare normalized paths consistently
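
The prefix stripping described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: the body of `normalize_windows_path` and the helper `is_inside` are assumptions, written only to show why a canonicalized path (which carries the `\\?\` verbatim prefix on Windows) needs normalizing before a `starts_with` comparison.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: strip the Windows verbatim prefix so that
/// canonicalized and plain paths end up in the same textual form.
fn normalize_windows_path(path: &Path) -> PathBuf {
    let s = path.to_string_lossy();
    if let Some(rest) = s.strip_prefix(r"\\?\UNC\") {
        // \\?\UNC\server\share\f -> \\server\share\f
        PathBuf::from(format!(r"\\{}", rest))
    } else if let Some(rest) = s.strip_prefix(r"\\?\") {
        // \\?\C:\dir -> C:\dir
        PathBuf::from(rest)
    } else {
        // No verbatim prefix: leave the path untouched.
        path.to_path_buf()
    }
}

/// Hypothetical helper: compare both sides only after normalizing them.
fn is_inside(workspace: &Path, candidate: &Path) -> bool {
    normalize_windows_path(candidate).starts_with(normalize_windows_path(workspace))
}
```

Normalizing both operands keeps the comparison symmetric, so it does not matter which side went through `canonicalize` first.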
fix(tool): fix file writes failing for relative paths; PathValidator now resolves them against the workspace first
2026-04-24 17:02:24 +08:00 · 2026-04-24 16:02:09 +08:00 · 2026-04-24 12:56:07 +08:00 · 2026-04-24 12:20:14 +08:00 · 2026-04-24 10:59:27 +08:00 · 2026-04-24 08:54:48 +08:00
56 changed files with 2838 additions and 755 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -165,10 +165,25 @@ desktop/src-tauri    (→ kernel, skills, hands, protocols)
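 2. **Automated verification**: `cargo check` / `cargo test` / `tsc --noEmit` / `vitest run` must all pass
 3. **Regression tests**: run the full test suite of every affected crate and confirm there are no regressions
-#### Phase 4: Commit + sync (immediately, no backlog)
+#### Phase 4: Wiki sync + commit (immediately, no backlog)
-1. **Commit and push**: commit per the §11 conventions, then **`git push` immediately**
+**Wiki sync assessment (hard gate, cannot be skipped)**
-2. **Doc sync**: check and update related docs per §8.3, then commit and push
+
 After the code change and before committing, answer each question below. If any answer is "yes", the corresponding wiki page must be updated:
 | Assessment question | Update when "yes" |
 |----------|-------------|
 | Did this change fix or introduce a bug? | "Active issues + pitfalls" section of the module page + `wiki/known-issues.md` |
 | Did this change alter a module's behavior or design rationale? | "Design decisions" section of the module page |
 | Did this change add or remove files, or alter the directory structure? | "Key files" table of the module page |
 | Did this change affect a cross-module interface (who calls whom, parameter shapes, trigger timing)? | "Integration contracts" tables of both modules involved |
 | Does this change involve a constraint that must always hold? | ⚡ invariants in the "Code logic" section of the module page |
 | Did this change alter a feature path (the full frontend→backend route)? | Index table in `wiki/feature-map.md` |
 | Did this change alter key numbers (command count, Store count, test count, etc.)? | Key numbers table in `wiki/index.md` + `docs/TRUTH.md` |
 After answering all questions, whether or not anything was updated, append an entry to `wiki/log.md` and update the module page's "Change log" section (keep 5 entries).
 **Commit and push**: commit per the §11 conventions, then **`git push` immediately**. See §8.3 for the detailed doc sync rules.
 **Iron rule: no "I'll commit later" or "I'll push everything at the end". Push immediately after each independent unit of work is complete.**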
@@ -374,34 +389,44 @@ docs/
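 After every feature implementation, architecture change, or bug fix, **the following wrap-up must be performed immediately**:
-#### Step A: Doc sync (before committing code)
+#### Step A: Wiki sync (top priority, before committing code)
-Check whether the following docs need updating, and revise them immediately if anything changed:
+> **Why the wiki comes first**: the wiki is the startup fuel for new AI sessions. If the wiki is out of sync with the code, every later session works from the wrong context, and the errors accumulate and amplify.
 Building on the assessment table in §3.3 Phase 4, perform the concrete updates:
 | Trigger | Update target | What to update |
 |----------|---------|---------|
 | Bug fix | "Active issues + pitfalls" on the module page | Fixed → remove the entry; newly introduced → add an entry |
 | Architecture/design change | "Design decisions" on the module page | What changed in the WHY + the new trade-offs |
 | Files added/removed/moved | "Key files" table on the module page | Update the file list |
 | Cross-module interface change | "Integration contracts" tables of **both modules involved** | Direction / interface / trigger timing |
 | New invariant discovered | "Code logic" section of the module page | ⚡ marker + one-sentence description |
 | Feature path change | `wiki/feature-map.md` | Update the matching row of the index table |
 | Key number change | `wiki/index.md` + `docs/TRUTH.md` | Update the numbers + verification commands |
 | **Every wrap-up** | `wiki/log.md` + "Change log" on the module page | Append a log entry + keep the change log at 5 entries |
 **Wiki update principles**:
 - Record only what the code cannot tell you (the WHY, cross-module relationships, invariants, lessons learned)
 - Keep module pages to 100-200 lines; archive the overflow to `wiki/archive/`
 - Each piece of information lives on exactly one page (single source of truth); other pages only link to it
 #### Step B: Sync other docs
 1. **CLAUDE.md**: when the project structure, tech stack, workflow, or commands change
-2. **CLAUDE.md §13 architecture snapshot**: when a subsystem changes, update the region between the `<!-- ARCH-SNAPSHOT-START/END -->` markers (the `/sync-arch` skill can analyze this automatically)
+2. **CLAUDE.md §13 architecture snapshot**: when a subsystem changes (the `/sync-arch` skill can analyze this automatically)
 3. **docs/ARCHITECTURE_BRIEF.md**: when architecture decisions or key components change
 4. **docs/features/**: when a feature's status changes
 5. **docs/knowledge-base/**: new troubleshooting experience or configuration notes
 6. **wiki/**: maintain the compiled knowledge base (update the matching pages per the trigger rules; every page uses the same 5 sections: Design decisions / Key files + Integration contracts / Code logic / Active issues + pitfalls / Change log):
   - Bug fix → update the "Active issues" section of the module page + the `wiki/known-issues.md` index
   - Architecture change → update the "Design decisions" section of the module page
   - File structure change → update the "Key files" table of the module page
   - Cross-module interface change → update the "Integration contracts" table of the module page
   - New invariant discovered → update the ⚡ items in the "Code logic" section of the module page
   - Feature path change → update the index table in `wiki/feature-map.md`
   - Number change → update the key numbers table in `wiki/index.md` + `docs/TRUTH.md`
   - Every update → append an entry to `wiki/log.md` + refresh the latest 5 entries in the module page's "Change log" section
 7. **docs/TRUTH.md**: when numbers (command count, Store count, crate count, etc.) change
-#### Step B: Commit (grouped logically)
+#### Step C: Commit (grouped logically)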
 ```
 Code changes → one or more logical commits
 Doc changes → a separate commit (if splitting them from the code is clearer)
 ```
-#### Step C: Push (immediately)
+#### Step D: Push (immediately)
 ```
 git push
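@@ -559,7 +584,7 @@ refactor(store): unify how Stores fetch data
 ***
 <!-- ARCH-SNAPSHOT-START -->
-<!-- This region is updated automatically by auto-sync; do not edit it manually. Updated: 2026-04-15 -->
+<!-- This region is updated automatically by auto-sync; do not edit it manually. Updated: 2026-04-23 -->
 ## 13. Current Architecture Snapshot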
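@@ -567,51 +592,53 @@ refactor(store): unify how Stores fetch data
 | Subsystem | Status | Latest change |
 |--------|------|----------|
-| Butler mode | ✅ Active | 04-12 industry config (4 industries) + cross-session continuity + <butler-context> XML fencing |
+| Butler mode | ✅ Active | 04-23 cross-session identity (soul.md) + dynamic suggestions (4-way parallel, LLM-driven) + Agent tab removed |
-| Hermes pipeline | ✅ Active | 04-12 trigger signal persistence + industry dimension for experiences + injection format optimization |
+| Hermes pipeline | ✅ Active | 04-23 experience_find_relevant Tauri command + ExperienceBrief + OnceLock singleton |
 | Intelligence Heartbeat | ✅ Active | 04-15 unified health snapshot (health_snapshot.rs) + HeartbeatManager refactor + HealthPanel frontend |
-| Chat stream (ChatStream) | ✅ Stable | 04-02 ChatStore split into 4 Stores (stream/conversation/message/chat) |
+| Chat stream (ChatStream) | ✅ Active | 04-23 LLM dynamic suggestions (replacing hard-coded ones) + clarification card UX improvements |
-| Memory pipeline (Memory) | ✅ Stable | 04-17 E2E verified: storage + FTS5 + TF-IDF + injection loop; dedup + cross-session injection fixed |
+| Memory pipeline (Memory) | ✅ Active | 04-23 identity signal extraction (agent_name/user_name) + ProfileSignals enhancements |
 | SaaS auth (Auth) | ✅ Stable | Token pool RPM/TPM rotation + JWT password_version invalidation |
-| Pipeline DSL | ✅ Stable | 04-01 17 YAML templates + DAG executor |
+| Pipeline DSL | ✅ Stable | 04-01 18 YAML templates + DAG executor |
-| Hands system | ✅ Stable | 7 registered (6 HAND.toml + _reminder); Whiteboard/Slideshow/Speech in development |
+| Hands system | ✅ Stable | 7 registered (6 HAND.toml + _reminder); Whiteboard/Slideshow/Speech deleted |
 | Skill system (Skills) | ✅ Stable | 75 SKILL.md files + semantic routing |
-| Middleware chain | ✅ Stable | 13 layers (ButlerRouter@80, Compaction@100, Memory@150, Title@180, SkillIndex@200, DanglingTool@300, ToolError@350, ToolOutputGuard@360, Guardrail@400, LoopGuard@500, SubagentLimit@550, TrajectoryRecorder@650, TokenCalibration@700) |
+| Middleware chain | ✅ Stable | 14 layers + wave-based parallelism (Evolution@78✅, ButlerRouter@80✅, Compaction@100, Memory@150✅, Title@180✅, SkillIndex@200✅, DanglingTool@300, ToolError@350, ToolOutputGuard@360, Guardrail@400, LoopGuard@500, SubagentLimit@550, TrajectoryRecorder@650, TokenCalibration@700); ✅ = parallel_safe |
 ### Key Architecture Patterns
 - **Hermes pipeline**: a closed loop of 4 modules: ExperienceStore (FTS5 experience storage and retrieval) + UserProfiler (structured user profile) + NlScheduleParser (Chinese time expressions → cron) + TrajectoryRecorder+Compressor (trajectory recording and compression). Invoked through the middleware chain + intelligence hooks
-- **Butler mode**: dual-mode UI (simple by default / unlockable professional) + ButlerRouter dynamic industry keywords (4 built-in + custom) + <butler-context> XML fencing injection + cross-session continuity (pain-point follow-up + experience retrieval) + trigger signal persistence (VikingStorage) + 4-stage cold-start hook
+- **Butler mode**: dual-mode UI (simple by default / unlockable professional) + ButlerRouter dynamic industry keywords (4 built-in + custom) + <butler-context> XML fencing injection + cross-session continuity (pain-point follow-up + experience retrieval) + trigger signal persistence (VikingStorage) + 4-stage cold-start hook + cross-session identity (soul.md) + dynamic suggestions (4-way parallel, LLM-driven: 2 follow-up questions + 1 caring message)
-- **Chat stream**: 3 implementations → GatewayClient (WebSocket) / KernelClient (Tauri Event) / SaaSRelay (SSE) + 5-minute timeout guard. See [ARCHITECTURE_BRIEF.md](docs/ARCHITECTURE_BRIEF.md)
+- **Chat stream**: 3 implementations → GatewayClient (WebSocket) / KernelClient (Tauri Event) / SaaSRelay (SSE) + 5-minute timeout guard. Dynamic suggestions: prefetch context + generateLLMSuggestions (1 follow-up question + 1 action + 1 caring message), decoupled from memory extraction. See [ARCHITECTURE_BRIEF.md](docs/ARCHITECTURE_BRIEF.md)
 - **Client routing**: `getClient()` 4-branch decision tree → Admin routing / SaaS Relay (can degrade to local) / Local Kernel / External Gateway
 - **SaaS auth**: JWT → OS keyring storage + HttpOnly cookie + Token pool RPM/TPM rate-limited rotation + automatic degradation when SaaS is unreachable
-- **Memory loop**: conversation → extraction_adapter → FTS5 full-text + TF-IDF weighting → retrieval → injection into the system prompt (E2E verified on 04-17; dedup + cross-session injection fixed)
+- **Memory loop**: conversation → extraction_adapter → FTS5 full-text + TF-IDF weighting → retrieval → injection into the system prompt + identity signal extraction (agent_name/user_name) → VikingStorage → soul.md → cross-session name memory
 - **LLM drivers**: 4 Rust drivers (Anthropic/OpenAI/Gemini/Local) + compatibility with Chinese providers (DeepSeek/Qwen/Moonshot via base_url)
 ### Recent Changes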
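-1. [04-21] Embedding wired up + self-learning automation, tracks A+B: memory retrieval embedding (GrowthIntegration→MemoryRetriever→SemanticScorer) + Skill routing embedding + LLM fallback (replacing new_tf_idf_only) + evolution_bridge (SkillCandidate→SkillManifest) + generate_and_register_skill() end to end + EvolutionMiddleware dual mode (auto/suggest) + QualityGate hardening (length/title/confidence caps). Verified: 934 tests PASS
+1. [04-23] Reply efficiency + parallelized suggestion generation: identity prompt caching + parallel pre-hooks (tokio::join!) + wave-based middleware parallelism (parallel_safe, 5 layers ✅) + suggestion context prefetch + suggestions decoupled from memory + prompt rewrite (1 follow-up question + 1 action + 1 caring message)
-2. [04-21] Phase 0+1 "breakthrough path", 8 foundational chain fixes: experience accumulation overwrite fix (reuse_count now accumulates) + Skill tool-call bridging (complete_with_tools) + Hand field mapping (runId) + Heartbeat pain-point awareness + Browser delegation messages + cross-session retrieval enhancements (IdentityRecall 26→43 patterns + weak-identity fallback) + Twitter credential persistence. Verified: 912 tests PASS
+2. [04-23] Smarter dynamic suggestions: fetchSuggestionContext 4-way parallel (user profile / pain points / experiences / skill matching) + generateLLMSuggestions hybrid prompt (2 follow-up questions + 1 butler caring message) + experience_find_relevant Tauri command + ExperienceBrief
-2. [04-17] Full-system E2E test across 129 chains: 82 PASS / 20 PARTIAL / 1 FAIL / 26 SKIP, effective pass rate 79.1%. 7 bug fixes (Dashboard 404 / memory dedup / memory injection / invoice_id / prompt version / agent isolation / industry field)
+3. [04-23] Cross-session identity: detectAgentNameSuggestion two-step trigger+extract method (10 triggers) + ProfileSignals agent_name/user_name + soul.md write-back + Agent tab removed (~280 lines of dead code cleaned up)
-2. [04-16] 3 P0 fixes + 5 E2E bug fixes + Agent panel refresh + TRUTH.md number calibration
+4. [04-22] Full wiki restructuring: 5-section template + integration contracts + symptom navigation + archive compaction, net reduction of ~1,200 lines
-3. [04-15] Heartbeat unified health system: health_snapshot.rs unified collector (LLM connection / memory / sessions / system resources) + heartbeat.rs HeartbeatManager refactor + HealthPanel.tsx frontend panel + Tauri commands 182→183 + intelligence module 15→16 files + removed 9 obsolete files under intelligence-client/
+5. [04-22] Cross-session memory break fixed + DataMasking middleware removed + search fixes (multi-engine + quality filtering + SSE line buffering)
-4. [04-12] Industry config + butler proactivity, full stack in 5 phases: industry data model + 4 built-in configs + ButlerRouter dynamic keywords + trigger signals + Tauri loading + Admin management page + cross-session continuity + XML fencing injection format
+6. [04-21] Embedding wired up + self-learning automation tracks A+B + Phase 0+1 "breakthrough path" 8 chain fixes. Verified: 934 tests PASS
-5. [04-09] Hermes Intelligence Pipeline, 4 chunks: ExperienceStore+Extractor, UserProfileStore+Profiler, NlScheduleParser, TrajectoryRecorder+Compressor (684 tests, 0 failed)
+7. [04-20] 50-round feature chain audit: 7 broken-chain fixes (42/50 = 84% pass rate)
-6. [04-09] Butler mode, all 6 deliverables complete: ButlerRouter + cold start + simple-mode UI + bridge tests + release docs
+8. [04-17] Full-system E2E test across 129 chains: 82 PASS / 20 PARTIAL / 1 FAIL / 26 SKIP, effective pass rate 79.1%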
 <!-- ARCH-SNAPSHOT-END -->
 <!-- ARCH-SNAPSHOT-END -->
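
The snapshot above lists middleware layers by priority and marks some as parallel_safe, but it does not say how waves are formed. One plausible reading is sketched below, purely as an assumption: consecutive parallel-safe layers share a wave, and every other layer runs alone. The type `MiddlewareInfo` and the function `plan_waves` are hypothetical names for illustration and are not taken from the codebase.

```rust
/// Hypothetical middleware descriptor; names and fields are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct MiddlewareInfo {
    name: &'static str,
    priority: u32,
    parallel_safe: bool,
}

/// Sort layers by priority, then group them into waves: consecutive
/// parallel-safe layers share a wave, every other layer gets its own.
fn plan_waves(mut layers: Vec<MiddlewareInfo>) -> Vec<Vec<MiddlewareInfo>> {
    layers.sort_by_key(|m| m.priority);
    let mut waves: Vec<Vec<MiddlewareInfo>> = Vec::new();
    let mut last_was_parallel = false;
    for layer in layers {
        let parallel = layer.parallel_safe;
        if parallel && last_was_parallel {
            // Extend the wave opened by the previous parallel-safe layer.
            waves.last_mut().expect("a wave exists").push(layer);
        } else {
            // Sequential layer, or first parallel-safe layer after one.
            waves.push(vec![layer]);
        }
        last_was_parallel = parallel;
    }
    waves
}
```

Under this assumption, Evolution@78 and ButlerRouter@80 would share the first wave, Compaction@100 would run alone, and Memory@150, Title@180 and SkillIndex@200 would form the next wave.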
 <!-- ANTI-PATTERN-START -->
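-<!-- This region is updated automatically by auto-sync; do not edit it manually. Updated: 2026-04-09 -->
+<!-- This region is updated automatically by auto-sync; do not edit it manually. Updated: 2026-04-23 -->
 ## 14. Notes for AI Collaboration
 ### Anti-pattern Warnings
-- ❌ **Do not** suggest adding new SaaS API endpoints: there are already 140, and the stabilization constraint forbids new ones
+- ❌ **Do not** suggest adding new SaaS API endpoints: there are already 137, and the stabilization constraint forbids new ones
 - ❌ **Do not** ignore butler mode: it is live and is the default mode; all chats pass through ButlerRouter
 - ❌ **Do not** assume Tauri connects to the LLM directly: traffic is relayed through the SaaS Token pool, degrading to the local Kernel when SaaS is unreachable
-- ❌ **Do not** suggest building existing capabilities from scratch: check the existing Hand (9) / Skill (75) / Pipeline (17 templates) libraries first
+- ❌ **Do not** suggest building existing capabilities from scratch: check the existing Hand (7 registered) / Skill (75) / Pipeline (18 templates) libraries first
 - ❌ **Do not** create project-level config or rule files outside CLAUDE.md: single entry point principle
 ### Scenario-specific Instructions
@@ -620,6 +647,75 @@ refactor(store): unify how Stores fetch data
 - For **auth-related work** → remember that Tauri mode stores the JWT in the OS keyring, while SaaS mode uses an HttpOnly cookie
 - For **new feature suggestions** → check [TRUTH.md](docs/TRUTH.md) first to confirm the list of available capabilities and avoid duplicate work
 - For **memory/context work** → remember the loop is already wired up: FTS5 + TF-IDF + embedding; it is not an empty shell
-- For **butler/Butler work** → butler mode is the default mode; ButlerRouter does keyword classification + system prompt enhancement in the middleware chain
+- For **butler/Butler work** → butler mode is the default mode; ButlerRouter does keyword classification + system prompt enhancement in the middleware chain. Cross-session identity goes through soul.md, and dynamic suggestions use 4-way parallel context + LLM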
 <!-- ANTI-PATTERN-END -->
 ***
 ## 15. Karpathy Coding Principles
 > Derived from Andrej Karpathy's observations on problems with LLM coding. Favors caution over speed; use judgment on simple tasks.
 ### 15.1 Think Before Coding
 **Don't assume. Don't hide confusion. Surface tradeoffs.**
 - State assumptions explicitly. If uncertain, ask.
 - If multiple interpretations exist, present them — don't pick silently.
 - If a simpler approach exists, say so. Push back when warranted.
 - If something is unclear, stop. Name what's confusing. Ask.
 ### 15.2 Simplicity First
 **Minimum code that solves the problem. Nothing speculative.**
 - No features beyond what was asked.
 - No abstractions for single-use code.
 - No "flexibility" or "configurability" that wasn't requested.
 - No error handling for impossible scenarios.
 - If you write 200 lines and it could be 50, rewrite it.
 Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
 ### 15.3 Surgical Changes
 **Touch only what you must. Clean up only your own mess.**
 When editing existing code:
 - Don't "improve" adjacent code, comments, or formatting.
 - Don't refactor things that aren't broken.
 - Match existing style, even if you'd do it differently.
 - If you notice unrelated dead code, mention it — don't delete it.
 When your changes create orphans:
 - Remove imports/variables/functions that YOUR changes made unused.
 - Don't remove pre-existing dead code unless asked.
 The test: Every changed line should trace directly to the user's request.
 ### 15.4 Goal-Driven Execution
 **Define success criteria. Loop until verified.**
 Transform tasks into verifiable goals:
 - "Add validation" → "Write tests for invalid inputs, then make them pass"
 - "Fix the bug" → "Write a test that reproduces it, then make it pass"
 - "Refactor X" → "Ensure tests pass before and after"
 For multi-step tasks, state a brief plan:
 ```
 1. [Step] → verify: [check]
 2. [Step] → verify: [check]
 3. [Step] → verify: [check]
 ```
 Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
 ---
 **These guidelines are working if:** fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.
--- a/crates/zclaw-kernel/src/kernel/messaging.rs
+++ b/crates/zclaw-kernel/src/kernel/messaging.rs
@@ -117,7 +117,9 @@ impl Kernel {
    }
 }
-use zclaw_runtime::{AgentLoop, tool::builtin::PathValidator};
+use std::sync::Arc;
 use zclaw_runtime::{AgentLoop, LlmDriver, tool::builtin::PathValidator};
 use zclaw_runtime::driver::{RetryDriver, RetryConfig};
 use super::Kernel;
 use super::super::MessageResponse;
@@ -161,9 +163,12 @@ impl Kernel {
        let subagent_enabled = chat_mode.as_ref().and_then(|m| m.subagent_enabled).unwrap_or(false);
        let tools = self.create_tool_registry(subagent_enabled);
        self.skill_executor.set_tool_registry(tools.clone());
        let driver: Arc<dyn LlmDriver> = Arc::new(
            RetryDriver::new(self.driver.clone(), RetryConfig::default())
        );
        let mut loop_runner = AgentLoop::new(
            *agent_id,
-            self.driver.clone(),
+            driver,
            tools,
            self.memory.clone(),
        )
@@ -275,9 +280,12 @@ impl Kernel {
        let subagent_enabled = chat_mode.as_ref().and_then(|m| m.subagent_enabled).unwrap_or(false);
        let tools = self.create_tool_registry(subagent_enabled);
        self.skill_executor.set_tool_registry(tools.clone());
        let driver: Arc<dyn LlmDriver> = Arc::new(
            RetryDriver::new(self.driver.clone(), RetryConfig::default())
        );
        let mut loop_runner = AgentLoop::new(
            *agent_id,
-            self.driver.clone(),
+            driver,
            tools,
            self.memory.clone(),
        )
@@ -426,6 +434,7 @@ impl Kernel {
        prompt.push_str("- Provide clear options when possible\n");
        prompt.push_str("- Include brief context about why you're asking\n");
        prompt.push_str("- After receiving clarification, proceed immediately\n");
        prompt.push_str("- CRITICAL: When calling ask_clarification, do NOT repeat the options in your text response. The options will be shown in a dedicated card above your reply. Simply greet the user and briefly explain why you need clarification — avoid phrases like \"以下信息\" or \"the following options\" that imply a list follows in your text\n");
        prompt
    }
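
The messaging.rs change above wraps the kernel's driver in a `RetryDriver` before handing it to `AgentLoop`, which is a decorator over the driver trait. The sketch below illustrates that shape only. It is synchronous and self-contained, whereas the real `LlmDriver` is presumably async; the names `Driver`, `RetryingDriver` and `FlakyDriver` are hypothetical, and nothing here reflects the actual `RetryConfig` fields or backoff policy.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Simplified, synchronous stand-in for an LLM driver trait (hypothetical).
trait Driver {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Decorator that retries the wrapped driver up to `max_attempts` times.
struct RetryingDriver {
    inner: Arc<dyn Driver>,
    max_attempts: u32,
}

impl Driver for RetryingDriver {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        let mut last_err = String::from("no attempt made");
        // Always make at least one attempt, even if max_attempts is 0.
        for _ in 0..self.max_attempts.max(1) {
            match self.inner.complete(prompt) {
                Ok(reply) => return Ok(reply),
                Err(e) => last_err = e, // a real driver would back off here
            }
        }
        Err(last_err)
    }
}

/// Demo stub: fails `failures` times, then succeeds.
struct FlakyDriver {
    failures: u32,
    calls: AtomicU32,
}

impl Driver for FlakyDriver {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        let n = self.calls.fetch_add(1, Ordering::SeqCst);
        if n < self.failures {
            Err(format!("transient failure #{}", n + 1))
        } else {
            Ok(format!("echo: {}", prompt))
        }
    }
}
```

Because the wrapper implements the same trait as the driver it wraps, the call site only changes which `Arc<dyn Driver>` it passes along, which matches the one-line substitution in the diff.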
--- a/crates/zclaw-kernel/tests/hand_chain.rs
+++ b/crates/zclaw-kernel/tests/hand_chain.rs
@@ -31,6 +31,8 @@ async fn seam_hand_tool_routing() {
                input_tokens: 10,
                output_tokens: 20,
                stop_reason: "tool_use".to_string(),
                cache_creation_input_tokens: None,
                cache_read_input_tokens: None,
            },
        ])
        // Second stream: final text after tool executes
@@ -40,6 +42,8 @@ async fn seam_hand_tool_routing() {
                input_tokens: 10,
                output_tokens: 5,
                stop_reason: "end_turn".to_string(),
                cache_creation_input_tokens: None,
                cache_read_input_tokens: None,
            },
        ]);
@@ -105,6 +109,8 @@ async fn seam_hand_execution_callback() {
                input_tokens: 10,
                output_tokens: 5,
                stop_reason: "tool_use".to_string(),
                cache_creation_input_tokens: None,
                cache_read_input_tokens: None,
            },
        ])
        .with_stream_chunks(vec![
@@ -113,6 +119,8 @@ async fn seam_hand_execution_callback() {
                input_tokens: 5,
                output_tokens: 1,
                stop_reason: "end_turn".to_string(),
                cache_creation_input_tokens: None,
                cache_read_input_tokens: None,
            },
        ]);
@@ -173,6 +181,8 @@ async fn seam_generic_tool_routing() {
                input_tokens: 10,
                output_tokens: 5,
                stop_reason: "tool_use".to_string(),
                cache_creation_input_tokens: None,
                cache_read_input_tokens: None,
            },
        ])
        .with_stream_chunks(vec![
@@ -181,6 +191,8 @@ async fn seam_generic_tool_routing() {
                input_tokens: 5,
                output_tokens: 3,
                stop_reason: "end_turn".to_string(),
                cache_creation_input_tokens: None,
                cache_read_input_tokens: None,
            },
        ]);
--- a/crates/zclaw-kernel/tests/smoke_hands.rs
+++ b/crates/zclaw-kernel/tests/smoke_hands.rs
@@ -27,6 +27,8 @@ async fn smoke_hands_full_lifecycle() {
                input_tokens: 15,
                output_tokens: 10,
                stop_reason: "tool_use".to_string(),
                cache_creation_input_tokens: None,
                cache_read_input_tokens: None,
            },
        ])
        // After hand_quiz returns, LLM generates final response
@@ -36,6 +38,8 @@ async fn smoke_hands_full_lifecycle() {
                input_tokens: 20,
                output_tokens: 5,
                stop_reason: "end_turn".to_string(),
                cache_creation_input_tokens: None,
                cache_read_input_tokens: None,
            },
        ]);
--- a/crates/zclaw-runtime/src/compaction.rs
+++ b/crates/zclaw-runtime/src/compaction.rs
@@ -14,6 +14,7 @@
 use std::sync::Arc;
 use std::sync::atomic::{AtomicU64, Ordering};
 use serde_json::Value;
 use zclaw_types::{AgentId, Message, SessionId};
 use crate::driver::{CompletionRequest, ContentBlock, LlmDriver};
@@ -136,7 +137,7 @@ pub fn update_calibration(estimated: usize, actual: u32) {
 }
 /// Estimate total tokens for messages with calibration applied.
-fn estimate_messages_tokens_calibrated(messages: &[Message]) -> usize {
+pub fn estimate_messages_tokens_calibrated(messages: &[Message]) -> usize {
    let raw = estimate_messages_tokens(messages);
    let factor = get_calibration_factor();
    if (factor - 1.0).abs() < f64::EPSILON {
@@ -178,7 +179,7 @@ pub fn compact_messages(messages: Vec<Message>, keep_recent: usize) -> (Vec<Mess
    let old_messages = &messages[..split_index];
    let recent_messages = &messages[split_index..];
-    let summary = generate_summary(old_messages);
+    let summary = generate_summary(old_messages, None);
    let removed_count = old_messages.len();
    let mut compacted = Vec::with_capacity(1 + recent_messages.len());
@@ -188,6 +189,38 @@ pub fn compact_messages(messages: Vec<Message>, keep_recent: usize) -> (Vec<Mess
    (compacted, removed_count)
 }
 /// Prune old tool outputs to reduce token consumption. Runs before compaction.
 /// Only prunes ToolResult messages older than PRUNE_AGE_THRESHOLD messages.
 const PRUNE_AGE_THRESHOLD: usize = 8;
 const PRUNE_MAX_CHARS: usize = 2000;
 const PRUNE_KEEP_HEAD_CHARS: usize = 500;
 pub fn prune_tool_outputs(messages: &mut [Message]) -> usize {
    let total = messages.len();
    let mut pruned_count = 0;
    for i in 0..total.saturating_sub(PRUNE_AGE_THRESHOLD) {
        if let Message::ToolResult { output, is_error, .. } = &mut messages[i] {
            if *is_error { continue; }
            let text = match output {
                Value::String(ref s) => s.clone(),
                ref other => other.to_string(),
            };
            if text.len() <= PRUNE_MAX_CHARS { continue; }
            let end = text.floor_char_boundary(PRUNE_KEEP_HEAD_CHARS.min(text.len()));
            *output = serde_json::json!({
                "_pruned": true,
                "_original_chars": text.len(),
                "head": &text[..end],
            });
            pruned_count += 1;
        }
    }
    pruned_count
 }
 /// Check if compaction should be triggered and perform it if needed.
 ///
 /// Returns the (possibly compacted) message list.
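
Two details of `prune_tool_outputs` above are easy to miss: only messages older than the newest `PRUNE_AGE_THRESHOLD` are candidates, and the retained head must end on a UTF-8 character boundary because tool output may contain multi-byte text. The sketch below restates both in isolation. The helper names `prunable_range`, `floor_boundary` and `keep_head` are hypothetical, and `floor_boundary` is a hand-written stand-in for `str::floor_char_boundary`, assumed here only so the example does not depend on the toolchain version.

```rust
use std::ops::Range;

/// Indices eligible for pruning: everything except the newest `keep_recent` messages.
fn prunable_range(total: usize, keep_recent: usize) -> Range<usize> {
    0..total.saturating_sub(keep_recent)
}

/// Largest byte index <= `max_bytes` that lies on a UTF-8 character boundary.
fn floor_boundary(text: &str, max_bytes: usize) -> usize {
    let mut end = max_bytes.min(text.len());
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    end
}

/// Keep at most `max_bytes` bytes from the start without splitting a character.
fn keep_head(text: &str, max_bytes: usize) -> &str {
    &text[..floor_boundary(text, max_bytes)]
}
```

Note that the limits are measured in bytes, as in the diff, where `text.len()` is compared against `PRUNE_MAX_CHARS`; for CJK output the retained head therefore holds fewer characters than the constant's name suggests.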
@@ -315,6 +348,18 @@ pub async fn maybe_compact_with_config(
        .iter()
        .take_while(|m| matches!(m, Message::System { .. }))
        .count();
    // Extract previous summary from leading system messages for iterative summarization
    let previous_summary = messages.iter()
        .take(leading_system_count)
        .filter_map(|m| match m {
            Message::System { content } if content.starts_with("[以下是之前对话的摘要]") => {
                Some(content.clone())
            }
            _ => None,
        })
        .next();
    let keep_from_end = DEFAULT_KEEP_RECENT
        .min(messages.len().saturating_sub(leading_system_count));
    let split_index = messages.len().saturating_sub(keep_from_end);
@@ -333,14 +378,16 @@ pub async fn maybe_compact_with_config(
    let recent_messages = &messages[split_index..];
    let removed_count = old_messages.len();
-    // Step 3: Generate summary (LLM or rule-based)
+    // Step 3: Generate summary (LLM or rule-based), with iterative context
    let prev_ref = previous_summary.as_deref();
    let summary = if config.use_llm {
        if let Some(driver) = driver {
-            match generate_llm_summary(driver, old_messages, config.summary_max_tokens).await {
+            match generate_llm_summary(driver, old_messages, prev_ref, config.summary_max_tokens).await {
                Ok(llm_summary) => {
                    tracing::info!(
-                        "[Compaction] Generated LLM summary ({} chars)",
+                        "[Compaction] Generated LLM summary ({} chars, iterative={})",
-                        llm_summary.len()
+                        llm_summary.len(),
                        previous_summary.is_some()
                    );
                    llm_summary
                }
@@ -350,7 +397,7 @@ pub async fn maybe_compact_with_config(
                            "[Compaction] LLM summary failed: {}, falling back to rules",
                            e
                        );
-                        generate_summary(old_messages)
+                        generate_summary(old_messages, prev_ref)
                    } else {
                        tracing::warn!(
                            "[Compaction] LLM summary failed: {}, returning original messages",
@@ -369,10 +416,10 @@ pub async fn maybe_compact_with_config(
            tracing::warn!(
                "[Compaction] LLM compaction requested but no driver available, using rules"
            );
-            generate_summary(old_messages)
+            generate_summary(old_messages, prev_ref)
        }
    } else {
-        generate_summary(old_messages)
+        generate_summary(old_messages, prev_ref)
    };
    let used_llm = config.use_llm && driver.is_some();
@@ -398,9 +445,11 @@ pub async fn maybe_compact_with_config(
 }
 /// Generate a summary using an LLM driver.
 /// If `previous_summary` is provided, builds on it iteratively.
 async fn generate_llm_summary(
    driver: &Arc<dyn LlmDriver>,
    messages: &[Message],
    previous_summary: Option<&str>,
    max_tokens: u32,
 ) -> Result<String, String> {
    let mut conversation_text = String::new();
@@ -437,11 +486,21 @@ async fn generate_llm_summary(
        conversation_text.push_str("\n...(对话已截断)");
    }
-    let prompt = format!(
+    let prompt = match previous_summary {
        Some(prev) => format!(
            "你是一个对话摘要助手。\n\n\
             ## 上一轮摘要\n{}\n\n\
             ## 新增对话内容\n{}\n\n\
             请在上一轮摘要的基础上更新，保留所有关键决策、用户偏好和文件操作。\
             输出200字以内的中文摘要。",
            prev, conversation_text
        ),
        None => format!(
            "请用简洁的中文总结以下对话的关键信息。保留重要的讨论主题、决策、结论和待办事项。\
             输出格式为段落式摘要，不超过200字。\n\n{}",
            conversation_text
-    );
+        ),
    };
    let request = CompletionRequest {
        model: String::new(),
@@ -484,13 +543,22 @@ async fn generate_llm_summary(
 }
 /// Generate a rule-based summary of old messages.
-fn generate_summary(messages: &[Message]) -> String {
+/// If `previous_summary` is provided, carries forward key info.
 fn generate_summary(messages: &[Message], previous_summary: Option<&str>) -> String {
    if messages.is_empty() {
        return "[对话开始]".to_string();
    }
    let mut sections: Vec<String> = vec!["[以下是之前对话的摘要]".to_string()];
    // Carry forward previous summary if available
    if let Some(prev) = previous_summary {
        // Strip the header line from previous summary for cleaner nesting
        let prev_body = prev.strip_prefix("[以下是之前对话的摘要]\n")
            .unwrap_or(prev);
        sections.push(format!("[上轮摘要保留]: {}", truncate(prev_body, 200)));
    }
    let mut user_count = 0;
    let mut assistant_count = 0;
    let mut topics: Vec<String> = Vec::new();
@@ -696,8 +764,21 @@ mod tests {
            Message::user("How does ownership work?"),
            Message::assistant("Ownership is Rust's memory management system"),
        ];
-        let summary = generate_summary(&messages);
+        let summary = generate_summary(&messages, None);
        assert!(summary.contains("摘要"));
        assert!(summary.contains("2"));
    }
    #[test]
    fn test_generate_summary_iterative() {
        let messages = vec![
            Message::user("What is async/await?"),
            Message::assistant("Async/await is a concurrency model"),
        ];
        let prev = "[以下是之前对话的摘要]\n讨论主题: Rust; 所有权\n(已压缩 4 条消息)";
        let summary = generate_summary(&messages, Some(prev));
        assert!(summary.contains("摘要"));
        assert!(summary.contains("上轮摘要保留"));
        assert!(summary.contains("所有权"));
    }
 }
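
The iterative path in `generate_summary` strips the header from the previous summary before nesting it, so repeated compaction does not stack headers. A minimal sketch of that carry-forward step follows. The function `merge_summary` is a hypothetical name, and the 200-character cut via `chars().take(200)` is an assumption standing in for the `truncate` helper, whose implementation is not shown in this diff.

```rust
const SUMMARY_HEADER: &str = "[以下是之前对话的摘要]";

/// Build a new summary that carries forward the body of the previous one.
fn merge_summary(previous: Option<&str>, new_sections: &[&str]) -> String {
    let mut sections = vec![SUMMARY_HEADER.to_string()];
    if let Some(prev) = previous {
        // Drop the old header so it is not nested inside the new summary.
        let body = prev
            .strip_prefix(SUMMARY_HEADER)
            .map(|rest| rest.trim_start_matches('\n'))
            .unwrap_or(prev);
        // Assumed truncation: keep at most 200 characters of the old body.
        let carried: String = body.chars().take(200).collect();
        sections.push(format!("[上轮摘要保留]: {}", carried));
    }
    sections.extend(new_sections.iter().map(|s| s.to_string()));
    sections.join("\n")
}
```

Since the same header string is what `maybe_compact_with_config` matches with `starts_with` to find the previous summary, the literal should stay byte-identical in both places.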
--- a/crates/zclaw-runtime/src/driver/anthropic.rs
+++ b/crates/zclaw-runtime/src/driver/anthropic.rs
@@ -121,6 +121,8 @@ impl LlmDriver for AnthropicDriver {
            let mut byte_stream = response.bytes_stream();
            let mut current_tool_id: Option<String> = None;
            let mut tool_input_buffer = String::new();
            let mut cache_creation_input_tokens: Option<u32> = None;
            let mut cache_read_input_tokens: Option<u32> = None;
            while let Some(chunk_result) = byte_stream.next().await {
                let chunk = match chunk_result {
@@ -141,6 +143,15 @@ impl LlmDriver for AnthropicDriver {
                        match serde_json::from_str::<AnthropicStreamEvent>(data) {
                            Ok(event) => {
                                match event.event_type.as_str() {
                                    "message_start" => {
                                        // Capture cache token info from message_start event
                                        if let Some(msg) = event.message {
                                            if let Some(usage) = msg.usage {
                                                cache_creation_input_tokens = usage.cache_creation_input_tokens;
                                                cache_read_input_tokens = usage.cache_read_input_tokens;
                                            }
                                        }
                                    }
                                    "content_block_delta" => {
                                        if let Some(delta) = event.delta {
                                            if let Some(text) = delta.text {
@@ -186,6 +197,8 @@ impl LlmDriver for AnthropicDriver {
                                                    input_tokens: msg.usage.as_ref().map(|u| u.input_tokens).unwrap_or(0),
                                                    output_tokens: msg.usage.as_ref().map(|u| u.output_tokens).unwrap_or(0),
                                                    stop_reason: msg.stop_reason.unwrap_or_else(|| "end_turn".to_string()),
                                                    cache_creation_input_tokens,
                                                    cache_read_input_tokens,
                                                });
                                            }
                                        }
@@ -298,7 +311,15 @@ impl AnthropicDriver {
        AnthropicRequest {
            model: request.model.clone(),
            max_tokens: effective_max,
-            system: request.system.clone(),
+            system: request.system.as_ref().map(|s| {
                vec![SystemContentBlock {
                    r#type: "text".to_string(),
                    text: s.clone(),
                    cache_control: Some(CacheControl {
                        r#type: "ephemeral".to_string(),
                    }),
                }]
            }),
            messages,
            tools: if tools.is_empty() { None } else { Some(tools) },
            temperature: request.temperature,
@@ -337,18 +358,35 @@ impl AnthropicDriver {
            input_tokens: api_response.usage.input_tokens,
            output_tokens: api_response.usage.output_tokens,
            stop_reason,
            cache_creation_input_tokens: api_response.usage.cache_creation_input_tokens,
            cache_read_input_tokens: api_response.usage.cache_read_input_tokens,
        }
    }
 }
 // Anthropic API types
 /// Anthropic cache_control marker
 #[derive(Serialize, Clone)]
 struct CacheControl {
    r#type: String, // "ephemeral"
 }
 /// Anthropic system prompt content block (supports cache_control)
 #[derive(Serialize, Clone)]
 struct SystemContentBlock {
    r#type: String, // "text"
    text: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    cache_control: Option<CacheControl>,
 }
 #[derive(Serialize)]
 struct AnthropicRequest {
    model: String,
    max_tokens: u32,
    #[serde(skip_serializing_if = "Option::is_none")]
-    system: Option<String>,
+    system: Option<Vec<SystemContentBlock>>,
    messages: Vec<AnthropicMessage>,
    #[serde(skip_serializing_if = "Option::is_none")]
    tools: Option<Vec<AnthropicTool>>,
@@ -404,6 +442,10 @@ struct AnthropicContentBlock {
 struct AnthropicUsage {
    input_tokens: u32,
    output_tokens: u32,
    #[serde(default)]
    cache_creation_input_tokens: Option<u32>,
    #[serde(default)]
    cache_read_input_tokens: Option<u32>,
 }
 // Streaming types
@@ -458,4 +500,8 @@ struct AnthropicStreamUsage {
    input_tokens: u32,
    #[serde(default)]
    output_tokens: u32,
    #[serde(default)]
    cache_creation_input_tokens: Option<u32>,
    #[serde(default)]
    cache_read_input_tokens: Option<u32>,
 }
--- a/crates/zclaw-runtime/src/driver/error_classifier.rs
+++ b/crates/zclaw-runtime/src/driver/error_classifier.rs
@@ -0,0 +1,139 @@
 //! LLM error classifier. Maps an HTTP status code + error body to an LlmErrorKind.
 use std::time::Duration;
 use zclaw_types::{LlmErrorKind, ClassifiedLlmError};
 /// Classify an LLM error.
 pub fn classify_llm_error(
    provider: &str,
    status: u16,
    body: &str,
    is_timeout: bool,
 ) -> ClassifiedLlmError {
    let _ = provider; // reserved for per-provider overrides
    if is_timeout {
        return ClassifiedLlmError {
            kind: LlmErrorKind::Timeout,
            retryable: true,
            should_compress: false,
            should_rotate_credential: false,
            retry_after: None,
            message: "请求超时".to_string(),
        };
    }
    match status {
        401 | 403 => ClassifiedLlmError {
            kind: LlmErrorKind::Auth,
            retryable: false,
            should_compress: false,
            should_rotate_credential: true,
            retry_after: None,
            message: "认证失败，请检查 API Key".to_string(),
        },
        402 => {
            let is_quota_transient = body.contains("retry")
                || body.contains("limit")
                || body.contains("usage");
            ClassifiedLlmError {
                kind: if is_quota_transient { LlmErrorKind::RateLimited } else { LlmErrorKind::BillingExhausted },
                retryable: is_quota_transient,
                should_compress: false,
                should_rotate_credential: !is_quota_transient,
                retry_after: if is_quota_transient { Some(Duration::from_secs(30)) } else { None },
                message: if is_quota_transient { "使用限制，稍后重试".to_string() } else { "计费额度已耗尽".to_string() },
            }
        }
        429 => ClassifiedLlmError {
            kind: LlmErrorKind::RateLimited,
            retryable: true,
            should_compress: false,
            should_rotate_credential: true,
            retry_after: parse_retry_after(body),
            message: "速率限制".to_string(),
        },
        529 => ClassifiedLlmError {
            kind: LlmErrorKind::Overloaded,
            retryable: true,
            should_compress: false,
            should_rotate_credential: false,
            retry_after: Some(Duration::from_secs(5)),
            message: "提供商过载".to_string(),
        },
        500 | 502 => ClassifiedLlmError {
            kind: LlmErrorKind::ServerError,
            retryable: true,
            should_compress: false,
            should_rotate_credential: false,
            retry_after: None,
            message: "服务端错误".to_string(),
        },
        503 => ClassifiedLlmError {
            kind: LlmErrorKind::Overloaded,
            retryable: true,
            should_compress: false,
            should_rotate_credential: false,
            retry_after: Some(Duration::from_secs(3)),
            message: "服务暂时不可用".to_string(),
        },
        400 => {
            let is_context_overflow = body.contains("context_length")
                || body.contains("max_tokens")
                || body.contains("too many tokens")
                || body.contains("prompt is too long");
            ClassifiedLlmError {
                kind: if is_context_overflow { LlmErrorKind::ContextOverflow } else { LlmErrorKind::Unknown },
                retryable: false,
                should_compress: is_context_overflow,
                should_rotate_credential: false,
                retry_after: None,
                message: if is_context_overflow {
                    "上下文过长，需要压缩".to_string()
                } else {
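                    // Truncate by characters: a byte slice can panic inside a multi-byte UTF-8 character.
                    format!("请求错误: {}", body.chars().take(200).collect::<String>())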
                },
            }
        }
        404 => ClassifiedLlmError {
            kind: LlmErrorKind::ModelNotFound,
            retryable: false,
            should_compress: false,
            should_rotate_credential: false,
            retry_after: None,
            message: "模型不存在".to_string(),
        },
        _ => ClassifiedLlmError {
            kind: LlmErrorKind::Unknown,
            retryable: true,
            should_compress: false,
            should_rotate_credential: false,
            retry_after: None,
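            // Truncate by characters: a byte slice can panic inside a multi-byte UTF-8 character.
            message: format!("未知错误 ({}) {}", status, body.chars().take(200).collect::<String>()),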
        },
    }
 }
 fn parse_retry_after(body: &str) -> Option<Duration> {
    // Anthropic: "Please retry after X seconds"
    // OpenAI: "Please retry after Xms"
    if let Some(secs) = extract_retry_seconds(body) {
        return Some(Duration::from_secs(secs));
    }
    if let Some(ms) = extract_retry_millis(body) {
        return Some(Duration::from_millis(ms));
    }
    Some(Duration::from_secs(2))
 }
 fn extract_retry_seconds(body: &str) -> Option<u64> {
    let re = regex::Regex::new(r"retry\s+(?:after\s+)?(\d+)\s*(?:s|sec|seconds?)").ok()?;
    let caps = re.captures(body)?;
    caps[1].parse().ok()
 }
 fn extract_retry_millis(body: &str) -> Option<u64> {
    let re = regex::Regex::new(r"retry\s+(?:after\s+)?(\d+)\s*ms").ok()?;
    let caps = re.captures(body)?;
    caps[1].parse().ok()
 }
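
Review note on truncating error bodies: slicing a `&str` by byte offset (`&body[..n]`) panics when `n` falls inside a multi-byte UTF-8 character, which is plausible for Chinese error bodies. The sketch below shows one character-based alternative. `truncate_chars` is a hypothetical helper name and is not part of this change.

```rust
/// Hypothetical helper (not part of this change): keep at most `max_chars`
/// characters of `s` without ever splitting a multi-byte UTF-8 character.
pub fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // `idx` is the byte offset of the first character to drop,
        // so it is always a valid char boundary.
        Some((idx, _)) => &s[..idx],
        // The string has `max_chars` characters or fewer: return it unchanged.
        None => s,
    }
}
```

Because it returns a borrowed slice, this variant avoids the allocation that `chars().take(n).collect::<String>()` would make.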
--- a/crates/zclaw-runtime/src/driver/gemini.rs
+++ b/crates/zclaw-runtime/src/driver/gemini.rs
@@ -238,6 +238,8 @@ impl LlmDriver for GeminiDriver {
                                                input_tokens,
                                                output_tokens,
                                                stop_reason: stop_reason.to_string(),
                                                cache_creation_input_tokens: None,
                                                cache_read_input_tokens: None,
                                            });
                                        }
                                    }
@@ -500,6 +502,8 @@ impl GeminiDriver {
            input_tokens,
            output_tokens,
            stop_reason,
            cache_creation_input_tokens: None,
            cache_read_input_tokens: None,
        }
    }
 }
--- a/crates/zclaw-runtime/src/driver/local.rs
+++ b/crates/zclaw-runtime/src/driver/local.rs
@@ -238,6 +238,8 @@ impl LocalDriver {
            input_tokens,
            output_tokens,
            stop_reason,
            cache_creation_input_tokens: None,
            cache_read_input_tokens: None,
        }
    }
@@ -396,6 +398,8 @@ impl LlmDriver for LocalDriver {
                                input_tokens: 0,
                                output_tokens: 0,
                                stop_reason: "end_turn".to_string(),
                                cache_creation_input_tokens: None,
                                cache_read_input_tokens: None,
                            });
                            continue;
                        }
--- a/crates/zclaw-runtime/src/driver/mod.rs
+++ b/crates/zclaw-runtime/src/driver/mod.rs
@@ -15,11 +15,14 @@ mod anthropic;
 mod openai;
 mod gemini;
 mod local;
 mod error_classifier;
 mod retry_driver;
 pub use anthropic::AnthropicDriver;
 pub use openai::OpenAiDriver;
 pub use gemini::GeminiDriver;
 pub use local::LocalDriver;
 pub use retry_driver::{RetryDriver, RetryConfig};
 /// LLM Driver trait - unified interface for all providers
 #[async_trait]
@@ -106,6 +109,12 @@ pub struct CompletionResponse {
    pub output_tokens: u32,
    /// Stop reason
    pub stop_reason: StopReason,
    /// Cache creation input tokens (Anthropic prompt caching)
    #[serde(default)]
    pub cache_creation_input_tokens: Option<u32>,
    /// Cache read input tokens (Anthropic prompt caching)
    #[serde(default)]
    pub cache_read_input_tokens: Option<u32>,
 }
 /// LLM driver response content block (subset of canonical zclaw_types::ContentBlock).
--- a/crates/zclaw-runtime/src/driver/openai.rs
+++ b/crates/zclaw-runtime/src/driver/openai.rs
@@ -222,10 +222,13 @@ impl LlmDriver for OpenAiDriver {
                                let parsed_args: serde_json::Value = if args.is_empty() {
                                    serde_json::json!({})
                                } else {
-                                    serde_json::from_str(args).unwrap_or_else(|e| {
+                                    match serde_json::from_str(args) {
-                                        tracing::warn!("[OpenAI] Failed to parse tool args '{}': {}, using empty object", args, e);
+                                        Ok(v) => v,
-                                        serde_json::json!({})
+                                        Err(e) => {
-                                    })
+                                            tracing::error!("[OpenAI] Failed to parse tool call '{}' args: {}. Raw: {}", name, e, args.chars().take(200).collect::<String>());
                                            serde_json::json!({ "_parse_error": e.to_string(), "_raw_args": args.chars().take(500).collect::<String>() })
                                        }
                                    }
                                };
                                yield Ok(StreamChunk::ToolUseEnd {
                                    id: id.clone(),
@@ -237,6 +240,8 @@ impl LlmDriver for OpenAiDriver {
                                input_tokens: 0,
                                output_tokens: 0,
                                stop_reason: "end_turn".to_string(),
                                cache_creation_input_tokens: None,
                                cache_read_input_tokens: None,
                            });
                            continue;
                        }
@@ -638,6 +643,8 @@ impl OpenAiDriver {
            input_tokens,
            output_tokens,
            stop_reason,
            cache_creation_input_tokens: None,
            cache_read_input_tokens: None,
        }
    }
@@ -761,6 +768,8 @@ impl OpenAiDriver {
                    StopReason::StopSequence => "stop",
                    StopReason::Error => "error",
                }.to_string(),
                cache_creation_input_tokens: None,
                cache_read_input_tokens: None,
            });
        })
    }
--- a/crates/zclaw-runtime/src/driver/retry_driver.rs
+++ b/crates/zclaw-runtime/src/driver/retry_driver.rs
@@ -0,0 +1,123 @@
 //! RetryDriver: a retry decorator for LlmDriver.
 //! Used only on the local Kernel path; the SaaS Relay has its own retry logic.
 use std::sync::Arc;
 use std::time::Duration;
 use async_trait::async_trait;
 use futures::Stream;
 use rand::Rng;
 use zclaw_types::{Result, ZclawError};
 use super::{LlmDriver, CompletionRequest, CompletionResponse, StreamChunk};
 use super::error_classifier::classify_llm_error;
 /// Retry configuration.
 #[derive(Debug, Clone)]
 pub struct RetryConfig {
    pub max_attempts: u32,
    pub base_delay_secs: f64,
    pub max_delay_secs: f64,
    pub jitter_ratio: f64,
 }
 impl Default for RetryConfig {
    fn default() -> Self {
        Self {
            max_attempts: 3,
            base_delay_secs: 1.0,
            max_delay_secs: 8.0,
            jitter_ratio: 0.5,
        }
    }
 }
 /// Retry decorator.
 pub struct RetryDriver {
    inner: Arc<dyn LlmDriver>,
    config: RetryConfig,
 }
 impl RetryDriver {
    pub fn new(inner: Arc<dyn LlmDriver>, config: RetryConfig) -> Self {
        Self { inner, config }
    }
    fn jittered_backoff(&self, attempt: u32) -> Duration {
        let base = self.config.base_delay_secs * 2_f64.powi(attempt as i32);
        let capped = base.min(self.config.max_delay_secs);
        let mut rng = rand::thread_rng();
        let jitter = capped * self.config.jitter_ratio * rng.gen::<f64>();
        Duration::from_secs_f64(capped + jitter)
    }
 }
 #[async_trait]
 impl LlmDriver for RetryDriver {
    fn provider(&self) -> &str {
        self.inner.provider()
    }
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse> {
        let mut last_error: Option<ZclawError> = None;
        for attempt in 0..self.config.max_attempts {
            match self.inner.complete(request.clone()).await {
                Ok(response) => return Ok(response),
                Err(e) => {
                    let message = e.to_string();
                    let status = extract_status_from_error(&message);
                    let classified = classify_llm_error(
                        self.inner.provider(),
                        status,
                        &message,
                        message.contains("timeout") || message.contains("Timeout"),
                    );
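                    // Context overflow is classified as non-retryable, so it must be
                    // checked first; otherwise the [CONTEXT_OVERFLOW] marker is never emitted.
                    if classified.should_compress {
                        return Err(ZclawError::LlmError(
                            format!("[CONTEXT_OVERFLOW] {}", message)
                        ));
                    }
                    if !classified.retryable {
                        return Err(e);
                    }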
                    last_error = Some(e);
                    if attempt + 1 < self.config.max_attempts {
                        let delay = classified.retry_after
                            .unwrap_or_else(|| self.jittered_backoff(attempt));
                        tracing::warn!(
                            "[RetryDriver] Attempt {}/{} failed ({}), retrying in {:.1}s",
                            attempt + 1, self.config.max_attempts, classified.message,
                            delay.as_secs_f64()
                        );
                        tokio::time::sleep(delay).await;
                    }
                }
            }
        }
        Err(last_error.unwrap_or_else(|| ZclawError::LlmError("重试耗尽".to_string())))
    }
    fn stream(
        &self,
        request: CompletionRequest,
    ) -> std::pin::Pin<Box<dyn Stream<Item = Result<StreamChunk>> + Send + '_>> {
        // The streaming path does not retry: some deltas have already been sent, so a retry would duplicate output in the UI.
        self.inner.stream(request)
    }
    fn is_configured(&self) -> bool {
        self.inner.is_configured()
    }
 }
 fn extract_status_from_error(message: &str) -> u16 {
    let re = regex::Regex::new(r"(?:error|status)[:\s]+(\d{3})").ok();
    re.and_then(|re| re.captures(message))
        .and_then(|caps| caps[1].parse().ok())
        .unwrap_or(0)
 }
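
Review note on `jittered_backoff`: the delay is an exponential capped at `max_delay_secs`, with jitter added after the cap. The sketch below restates that arithmetic as a pure function so the bounds are easy to check. `backoff_delay` and the explicit `jitter_sample` parameter are assumptions made for illustration; the driver itself draws the sample from `rand`.

```rust
use std::time::Duration;

/// Sketch of capped exponential backoff with additive jitter.
/// `jitter_sample` stands in for a random draw in [0, 1); it is passed in
/// explicitly (an assumption of this sketch) so the result is deterministic.
pub fn backoff_delay(
    attempt: u32,
    base_delay_secs: f64,
    max_delay_secs: f64,
    jitter_ratio: f64,
    jitter_sample: f64,
) -> Duration {
    // base * 2^attempt, then clamp to the configured maximum.
    let exponential = base_delay_secs * 2_f64.powi(attempt as i32);
    let capped = exponential.min(max_delay_secs);
    // Jitter is proportional to the capped delay and is added on top of it.
    let jitter = capped * jitter_ratio * jitter_sample;
    Duration::from_secs_f64(capped + jitter)
}
```

With the default configuration (base 1 s, cap 8 s, ratio 0.5), the delay can therefore approach 12 s rather than 8 s, since the jitter is applied after the cap.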
--- a/crates/zclaw-runtime/src/loop_runner.rs
+++ b/crates/zclaw-runtime/src/loop_runner.rs
@@ -4,10 +4,11 @@ use std::sync::Arc;
 use futures::StreamExt;
 use tokio::sync::mpsc;
 use zclaw_types::{AgentId, SessionId, Message, Result};
 use serde_json::Value;
 use crate::driver::{LlmDriver, CompletionRequest, ContentBlock};
 use crate::stream::StreamChunk;
-use crate::tool::{ToolRegistry, ToolContext, SkillExecutor, HandExecutor};
+use crate::tool::{ToolRegistry, ToolContext, SkillExecutor, HandExecutor, ToolConcurrency};
 use crate::tool::builtin::PathValidator;
 use crate::growth::GrowthIntegration;
 use crate::compaction::{self, CompactionConfig};
@@ -303,8 +304,28 @@ impl AgentLoop {
                plan_mode: self.plan_mode,
            };
-            // Call LLM
+            // Call LLM with context-overflow recovery
-            let response = self.driver.complete(request).await?;
+            let response = match self.driver.complete(request).await {
                Ok(r) => r,
                Err(e) => {
                    let err_str = e.to_string();
                    if err_str.contains("[CONTEXT_OVERFLOW]") && self.compaction_threshold > 0 {
                        tracing::warn!("[AgentLoop] Context overflow detected, triggering emergency compaction");
                        let pruned = compaction::prune_tool_outputs(&mut messages);
                        if pruned > 0 {
                            tracing::info!("[AgentLoop] Emergency pruning removed {} tool outputs", pruned);
                        }
                        let keep_recent = messages.len().saturating_sub(messages.len() / 3);
                        let (compacted, removed) = compaction::compact_messages(messages, keep_recent.max(4));
                        if removed > 0 {
                            tracing::info!("[AgentLoop] Emergency compaction removed {} messages", removed);
                            messages = compacted;
                            continue; // retry the iteration with compacted messages
                        }
                    }
                    return Err(e);
                }
            };
            total_input_tokens += response.input_tokens;
            total_output_tokens += response.output_tokens;
@@ -375,21 +396,22 @@ impl AgentLoop {
            let tool_context = self.create_tool_context(session_id.clone());
            let mut abort_result: Option<AgentLoopResult> = None;
            let mut clarification_result: Option<AgentLoopResult> = None;
-            for (id, name, input) in tool_calls {
+
-                // Check if loop was already aborted
+            // Phase 1: Pre-process inputs + middleware checks (serial)
-                if abort_result.is_some() {
+            struct ToolPlan {
-                    break;
+                idx: usize,
                id: String,
                name: String,
                input: Value,
            }
            let mut plans: Vec<ToolPlan> = Vec::new();
            for (idx, (id, name, input)) in tool_calls.into_iter().enumerate() {
                if abort_result.is_some() { break; }
                // GLM and other models sometimes send tool calls with empty arguments `{}`
                // Inject the last user message as a fallback query so the tool can infer intent.
                let input = if input.as_object().map_or(false, |obj| obj.is_empty()) {
                    if let Some(last_user_msg) = messages.iter().rev().find_map(|m| {
-                        if let Message::User { content } = m {
+                        if let Message::User { content } = m { Some(content.clone()) } else { None }
                            Some(content.clone())
                        } else {
                            None
                        }
                    }) {
                        tracing::info!("[AgentLoop] Tool '{}' received empty input, injecting user message as fallback query", name);
                        serde_json::json!({ "_fallback_query": last_user_msg })
@@ -400,9 +422,7 @@ impl AgentLoop {
                    input
                };
-                // Check tool call safety — via middleware chain
+                let mw_ctx = middleware::MiddlewareContext {
                {
                    let mw_ctx_ref = middleware::MiddlewareContext {
                    agent_id: self.agent_id.clone(),
                    session_id: session_id.clone(),
                    user_input: input.to_string(),
@@ -412,29 +432,16 @@ impl AgentLoop {
                    input_tokens: total_input_tokens,
                    output_tokens: total_output_tokens,
                };
-                    match self.middleware_chain.run_before_tool_call(&mw_ctx_ref, &name, &input).await? {
+                match self.middleware_chain.run_before_tool_call(&mw_ctx, &name, &input).await? {
-                        middleware::ToolCallDecision::Allow => {}
+                    middleware::ToolCallDecision::Allow => {
                        plans.push(ToolPlan { idx, id, name, input });
                    }
                    middleware::ToolCallDecision::Block(msg) => {
                        tracing::warn!("[AgentLoop] Tool '{}' blocked by middleware: {}", name, msg);
-                            let error_output = serde_json::json!({ "error": msg });
+                        messages.push(Message::tool_result(&id, zclaw_types::ToolId::new(&name), serde_json::json!({ "error": msg }), true));
                            messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), error_output, true));
                            continue;
                    }
                    middleware::ToolCallDecision::ReplaceInput(new_input) => {
-                            // Execute with replaced input (with timeout)
+                        plans.push(ToolPlan { idx, id, name, input: new_input });
                            let tool_result = match tokio::time::timeout(
                                std::time::Duration::from_secs(30),
                                self.execute_tool(&name, new_input, &tool_context),
                            ).await {
                                Ok(Ok(result)) => result,
                                Ok(Err(e)) => serde_json::json!({ "error": e.to_string() }),
                                Err(_) => {
                                    tracing::warn!("[AgentLoop] Tool '{}' (replaced input) timed out after 30s", name);
                                    serde_json::json!({ "error": format!("工具 '{}' 执行超时（30秒），请重试", name) })
                                }
                            };
                            messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), tool_result, false));
                            continue;
                    }
                    middleware::ToolCallDecision::AbortLoop(reason) => {
                        tracing::warn!("[AgentLoop] Loop aborted by middleware: {}", reason);
@@ -450,21 +457,76 @@ impl AgentLoop {
                }
            }
            // Phase 2: Execute tools (parallel for ReadOnly, serial for others)
            if abort_result.is_none() && !plans.is_empty() {
                let (parallel_plans, sequential_plans): (Vec<_>, Vec<_>) = plans.iter()
                    .partition(|p| {
                        self.tools.get(&p.name)
                            .map(|t| t.concurrency())
                            .unwrap_or(ToolConcurrency::Exclusive) == ToolConcurrency::ReadOnly
                    });
                let mut results: std::collections::HashMap<usize, (String, String, serde_json::Value)> = std::collections::HashMap::new();
                // Execute parallel (ReadOnly) tools with JoinSet (max 3 concurrent)
                if !parallel_plans.is_empty() {
                    let semaphore = Arc::new(tokio::sync::Semaphore::new(3));
                    let mut join_set = tokio::task::JoinSet::new();
                    for plan in &parallel_plans {
                        let tool = self.tools.get(&plan.name).unwrap();
                        let ctx = tool_context.clone();
                        let input = plan.input.clone();
                        let idx = plan.idx;
                        let id = plan.id.clone();
                        let name = plan.name.clone();
                        let permit = semaphore.clone().acquire_owned().await.unwrap();
                        join_set.spawn(async move {
                            let result = tokio::time::timeout(
                                std::time::Duration::from_secs(30),
                                tool.execute(input, &ctx)
                            ).await;
                            drop(permit);
                            (idx, id, name, result)
                        });
                    }
                    while let Some(res) = join_set.join_next().await {
                        match res {
                            Ok((idx, id, name, Ok(Ok(value)))) => {
                                results.insert(idx, (id, name, value));
                            }
                            Ok((idx, id, name, Ok(Err(e)))) => {
                                results.insert(idx, (id, name, serde_json::json!({ "error": e.to_string() })));
                            }
                            Ok((idx, id, name, Err(_))) => {
                                tracing::warn!("[AgentLoop] Tool '{}' timed out after 30s (parallel)", name);
                                results.insert(idx, (id, name.clone(), serde_json::json!({ "error": format!("工具 '{}' 执行超时（30秒），请重试", name) })));
                            }
                            Err(e) => {
                                tracing::warn!("[AgentLoop] JoinError in parallel tool execution: {}", e);
                            }
                        }
                    }
                }
                // Execute sequential (Exclusive/Interactive) tools
                for plan in &sequential_plans {
                    let tool_result = match tokio::time::timeout(
                        std::time::Duration::from_secs(30),
-                    self.execute_tool(&name, input, &tool_context),
+                        self.execute_tool(&plan.name, plan.input.clone(), &tool_context),
                    ).await {
                        Ok(Ok(result)) => result,
                        Ok(Err(e)) => serde_json::json!({ "error": e.to_string() }),
                        Err(_) => {
-                        tracing::warn!("[AgentLoop] Tool '{}' timed out after 30s", name);
+                            tracing::warn!("[AgentLoop] Tool '{}' timed out after 30s", plan.name);
-                        serde_json::json!({ "error": format!("工具 '{}' 执行超时（30秒），请重试", name) })
+                            serde_json::json!({ "error": format!("工具 '{}' 执行超时（30秒），请重试", plan.name) })
                        }
                    };
-                // Check if this is a clarification response — terminate loop immediately
+                    // Check if this is a clarification response
-                // so the LLM waits for user input instead of continuing to generate.
+                    if plan.name == "ask_clarification"
                if name == "ask_clarification"
                        && tool_result.get("status").and_then(|v| v.as_str()) == Some("clarification_needed")
                    {
                        tracing::info!("[AgentLoop] Clarification requested, terminating loop");
@@ -472,12 +534,7 @@ impl AgentLoop {
                            .and_then(|v| v.as_str())
                            .unwrap_or("需要更多信息")
                            .to_string();
-                    messages.push(Message::tool_result(
+                        results.insert(plan.idx, (plan.id.clone(), plan.name.clone(), tool_result));
                        id,
                        zclaw_types::ToolId::new(&name),
                        tool_result,
                        false,
                    ));
                        self.memory.append_message(&session_id, &Message::assistant(&question)).await?;
                        clarification_result = Some(AgentLoopResult {
                            response: question,
@@ -487,14 +544,30 @@ impl AgentLoop {
                        });
                        break;
                    }
                    results.insert(plan.idx, (plan.id.clone(), plan.name.clone(), tool_result));
                }
-                // Add tool result to messages
+                // Push results in original tool_call order
-                messages.push(Message::tool_result(
+                let mut sorted_indices: Vec<usize> = results.keys().copied().collect();
-                    id,
+                sorted_indices.sort();
-                    zclaw_types::ToolId::new(&name),
+                for idx in sorted_indices {
-                    tool_result,
+                    let (id, name, result) = results.remove(&idx).unwrap();
-                    false, // is_error - we include errors in the result itself
+                    // Run after_tool_call middleware (error counting, output guard, etc.)
-                ));
+                    let mut mw_ctx = middleware::MiddlewareContext {
                        agent_id: self.agent_id.clone(),
                        session_id: session_id.clone(),
                        user_input: String::new(),
                        system_prompt: enhanced_prompt.clone(),
                        messages: messages.clone(),
                        response_content: Vec::new(),
                        input_tokens: total_input_tokens,
                        output_tokens: total_output_tokens,
                    };
                    if let Err(e) = self.middleware_chain.run_after_tool_call(&mut mw_ctx, &name, &result).await {
                        tracing::warn!("[AgentLoop] after_tool_call middleware failed for '{}': {}", name, e);
                    }
                    messages.push(Message::tool_result(&id, zclaw_types::ToolId::new(&name), result, false));
                }
            }
            // Continue the loop - LLM will process tool results and generate final response
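
Review note on the two-phase tool execution: read-only tools run concurrently and the rest run serially, so results can arrive out of order and are re-sorted by their original index before being pushed to `messages`. A minimal sketch of that partition-and-restore pattern follows. The `Plan` tuple, both function names, and the tool names used in examples are hypothetical simplifications, not the types in `loop_runner.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a tool plan: (original index, tool name, is read-only).
pub type Plan = (usize, String, bool);

/// Split plans into (read-only, everything else), preserving relative order.
pub fn split_by_concurrency(plans: &[Plan]) -> (Vec<&Plan>, Vec<&Plan>) {
    plans.iter().partition(|p| p.2)
}

/// Given results keyed by original index, return them in tool-call order.
pub fn restore_order(mut results: HashMap<usize, String>) -> Vec<String> {
    let mut indices: Vec<usize> = results.keys().copied().collect();
    indices.sort_unstable();
    indices
        .into_iter()
        .filter_map(|idx| results.remove(&idx))
        .collect()
}
```

An index that has no entry in the map is simply skipped here; in the real loop that would correspond to a tool call with no result message, which may be worth handling explicitly.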
@@ -647,6 +720,7 @@ impl AgentLoop {
                let mut stream = driver.stream(request);
                let mut pending_tool_calls: Vec<(String, String, serde_json::Value)> = Vec::new();
                let mut completed_tool_ids: std::collections::HashSet<String> = std::collections::HashSet::new();
                let mut iteration_text = String::new();
                let mut reasoning_text = String::new(); // Track reasoning separately for API requirement
@@ -703,6 +777,7 @@ impl AgentLoop {
                                    // Update with final parsed input and emit ToolStart event
                                    if let Some(tool) = pending_tool_calls.iter_mut().find(|(tid, _, _)| tid == id) {
                                        tool.2 = input.clone();
                                        completed_tool_ids.insert(id.clone());
                                        if let Err(e) = tx.send(LoopEvent::ToolStart { name: tool.1.clone(), input: input.clone() }).await {
                                            tracing::warn!("[AgentLoop] Failed to send ToolStart event: {}", e);
                                        }
@@ -810,11 +885,27 @@ impl AgentLoop {
                    break 'outer;
                }
-                // Skip tool processing if stream errored or timed out
+                // Handle stream errors — execute complete tool calls, cancel incomplete ones
                if stream_errored {
-                    tracing::debug!("[AgentLoop] Stream errored, skipping tool processing and breaking");
+                    // Cancel incomplete tools (ToolStart sent but ToolUseEnd not received)
                    let incomplete: Vec<_> = pending_tool_calls.iter()
                        .filter(|(id, _, _)| !completed_tool_ids.contains(id))
                        .collect();
                    for (_, name, _) in &incomplete {
                        tracing::warn!("[AgentLoop] Cancelling incomplete tool '{}' due to stream error", name);
                        let error_output = serde_json::json!({ "error": "流式响应中断，工具调用未完成" });
                        if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output }).await {
                            tracing::warn!("[AgentLoop] Failed to send cancellation ToolEnd event: {}", e);
                        }
                    }
                    // Retain only complete tools for execution
                    pending_tool_calls.retain(|(id, _, _)| completed_tool_ids.contains(id));
                    if pending_tool_calls.is_empty() {
                        tracing::debug!("[AgentLoop] Stream errored with no complete tool calls, breaking");
                        break 'outer;
                    }
                    tracing::info!("[AgentLoop] Stream errored but executing {} complete tool calls", pending_tool_calls.len());
                }
                tracing::debug!("[AgentLoop] Processing {} tool calls (reasoning: {} chars)", pending_tool_calls.len(), reasoning_text.len());
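
Review note on stream-error handling: tool calls whose arguments finished streaming are still executed, while the rest are cancelled. The sketch below shows that split in isolation. `PendingCall` and `split_on_stream_error` are hypothetical names, and the tuple drops the input value carried by the real `pending_tool_calls`.

```rust
use std::collections::HashSet;

/// Hypothetical simplified tool call: (id, name).
pub type PendingCall = (String, String);

/// Split pending calls into those that completed before the stream failed
/// and the names of those that must be cancelled.
pub fn split_on_stream_error(
    pending: Vec<PendingCall>,
    completed_ids: &HashSet<String>,
) -> (Vec<PendingCall>, Vec<String>) {
    let (complete, incomplete): (Vec<PendingCall>, Vec<PendingCall>) = pending
        .into_iter()
        .partition(|(id, _)| completed_ids.contains(id));
    // Only the names are needed to report a cancellation for each call.
    let cancelled = incomplete.into_iter().map(|(_, name)| name).collect();
    (complete, cancelled)
}
```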
@@ -830,12 +921,12 @@ impl AgentLoop {
                    messages.push(Message::tool_use(id, zclaw_types::ToolId::new(name), input.clone()));
                }
-                // Execute tools
+                // Execute tools — Phase 1: Pre-process through middleware (serial)
-                for (id, name, input) in pending_tool_calls {
+                struct StreamToolPlan { idx: usize, id: String, name: String, input: Value }
-                    tracing::debug!("[AgentLoop] Executing tool: name={}, input={:?}", name, input);
+                let mut plans: Vec<StreamToolPlan> = Vec::new();
-
+                let mut abort_loop = false;
-                    // Check tool call safety — via middleware chain
+                for (idx, (id, name, input)) in pending_tool_calls.into_iter().enumerate() {
-                    {
+                    if abort_loop { break; }
                    let mw_ctx = middleware::MiddlewareContext {
                        agent_id: agent_id.clone(),
                        session_id: session_id_clone.clone(),
@@ -847,7 +938,9 @@ impl AgentLoop {
                        output_tokens: total_output_tokens,
                    };
                    match middleware_chain.run_before_tool_call(&mw_ctx, &name, &input).await {
-                            Ok(middleware::ToolCallDecision::Allow) => {}
+                        Ok(middleware::ToolCallDecision::Allow) => {
                            plans.push(StreamToolPlan { idx, id, name, input });
                        }
                        Ok(middleware::ToolCallDecision::Block(msg)) => {
                            tracing::warn!("[AgentLoop] Tool '{}' blocked by middleware: {}", name, msg);
                            let error_output = serde_json::json!({ "error": msg });
@@ -855,59 +948,16 @@ impl AgentLoop {
                                tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
                            }
                            messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), error_output, true));
-                                continue;
+                        }
                        Ok(middleware::ToolCallDecision::ReplaceInput(new_input)) => {
                            plans.push(StreamToolPlan { idx, id, name, input: new_input });
                        }
                        Ok(middleware::ToolCallDecision::AbortLoop(reason)) => {
                            tracing::warn!("[AgentLoop] Loop aborted by middleware: {}", reason);
                            if let Err(e) = tx.send(LoopEvent::Error(reason)).await {
                                tracing::warn!("[AgentLoop] Failed to send Error event: {}", e);
                            }
-                                break 'outer;
+                            abort_loop = true;
                            }
-                            Ok(middleware::ToolCallDecision::ReplaceInput(new_input)) => {
-                                // Execute with replaced input (same path_validator logic below)
-                                let pv = path_validator.clone().unwrap_or_else(|| {
-                                    let home = std::env::var("USERPROFILE")
-                                        .or_else(|_| std::env::var("HOME"))
-                                        .unwrap_or_else(|_| ".".to_string());
-                                    PathValidator::new().with_workspace(std::path::PathBuf::from(&home))
-                                });
-                                let working_dir = pv.workspace_root()
-                                    .map(|p| p.to_string_lossy().to_string());
-                                let tool_context = ToolContext {
-                                    agent_id: agent_id.clone(),
-                                    working_directory: working_dir,
-                                    session_id: Some(session_id_clone.to_string()),
-                                    skill_executor: skill_executor.clone(),
-                                    hand_executor: hand_executor.clone(),
-                                    path_validator: Some(pv),
-                                    event_sender: Some(tx.clone()),
-                                };
-                                let (result, is_error) = if let Some(tool) = tools.get(&name) {
-                                    match tool.execute(new_input, &tool_context).await {
-                                        Ok(output) => {
-                                            if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: output.clone() }).await {
-                                                tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                                            }
-                                            (output, false)
-                                        }
-                                        Err(e) => {
-                                            let error_output = serde_json::json!({ "error": e.to_string() });
-                                            if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
-                                                tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                                            }
-                                            (error_output, true)
-                                        }
-                                    }
-                                } else {
-                                    let error_output = serde_json::json!({ "error": format!("Unknown tool: {}", name) });
-                                    if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
-                                        tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                                    }
-                                    (error_output, true)
-                                };
-                                messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), result, is_error));
-                                continue;
-                        }
                        Err(e) => {
                            tracing::error!("[AgentLoop] Middleware error for tool '{}': {}", name, e);
@@ -916,19 +966,23 @@ impl AgentLoop {
                                tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
                            }
                            messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), error_output, true));
                                continue;
                        }
                    }
                }
-                    // Use pre-resolved path_validator (already has default fallback from create_tool_context logic)
+                if abort_loop { break 'outer; }
                if plans.is_empty() {
                    tracing::debug!("[AgentLoop] No tools to execute after middleware filtering");
                    break 'outer;
                }
                // Build shared tool context
                let pv = path_validator.clone().unwrap_or_else(|| {
                    let home = std::env::var("USERPROFILE")
                        .or_else(|_| std::env::var("HOME"))
                        .unwrap_or_else(|_| ".".to_string());
                    PathValidator::new().with_workspace(std::path::PathBuf::from(&home))
                });
-                    let working_dir = pv.workspace_root()
+                let working_dir = pv.workspace_root().map(|p| p.to_string_lossy().to_string());
-                        .map(|p| p.to_string_lossy().to_string());
                let tool_context = ToolContext {
                    agent_id: agent_id.clone(),
                    working_directory: working_dir,
@@ -939,78 +993,120 @@ impl AgentLoop {
                    event_sender: Some(tx.clone()),
                };
-                    let (result, is_error) = if let Some(tool) = tools.get(&name) {
+                // Phase 2: Execute tools (parallel for ReadOnly, serial for others)
-                        tracing::debug!("[AgentLoop] Tool '{}' found, executing...", name);
+                let (parallel_plans, sequential_plans): (Vec<_>, Vec<_>) = plans.iter()
-                        match tool.execute(input.clone(), &tool_context).await {
+                    .partition(|p| {
-                            Ok(output) => {
+                        tools.get(&p.name)
-                                tracing::debug!("[AgentLoop] Tool '{}' executed successfully: {:?}", name, output);
+                            .map(|t| t.concurrency())
-                                if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: output.clone() }).await {
+                            .unwrap_or(ToolConcurrency::Exclusive) == ToolConcurrency::ReadOnly
-                                    tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
+                    });
                let mut results: std::collections::HashMap<usize, (String, String, serde_json::Value, bool)> = std::collections::HashMap::new();
                // Execute parallel (ReadOnly) tools with JoinSet (max 3 concurrent)
                if !parallel_plans.is_empty() {
                    let sem = Arc::new(tokio::sync::Semaphore::new(3));
                    let mut join_set = tokio::task::JoinSet::new();
                    for plan in &parallel_plans {
                        let tool_ctx = tool_context.clone();
                        let input = plan.input.clone();
                        let idx = plan.idx;
                        let id = plan.id.clone();
                        let name = plan.name.clone();
                        let tools_ref = tools.clone();
                        let permit = sem.clone().acquire_owned().await.unwrap();
                        join_set.spawn(async move {
                            let result = if let Some(tool) = tools_ref.get(&name) {
                                tokio::time::timeout(std::time::Duration::from_secs(30), tool.execute(input, &tool_ctx)).await
                            } else {
                                Ok(Err(zclaw_types::ZclawError::Internal(format!("Unknown tool: {}", name))))
                            };
                            drop(permit);
                            (idx, id, name, result)
                        });
                    }
-                                (output, false)
+                    while let Some(res) = join_set.join_next().await {
                        match res {
                            Ok((idx, id, name, Ok(Ok(value)))) => {
                                results.insert(idx, (id, name, value, false));
                            }
                            Ok((idx, id, name, Ok(Err(e)))) => {
                                results.insert(idx, (id, name, serde_json::json!({ "error": e.to_string() }), true));
                            }
                            Ok((idx, id, name, Err(_))) => {
                                tracing::warn!("[AgentLoop] Tool '{}' timed out (parallel, 30s)", name);
                                results.insert(idx, (id, name.clone(), serde_json::json!({ "error": format!("工具 '{}' 执行超时", name) }), true));
                            }
                            Err(e) => {
-                                tracing::error!("[AgentLoop] Tool '{}' execution failed: {}", name, e);
+                                tracing::warn!("[AgentLoop] JoinError in parallel tool execution: {}", e);
-                                let error_output = serde_json::json!({ "error": e.to_string() });
-                                if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
-                                    tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
                            }
-                                (error_output, true)
                        }
                    }
                }
                // Execute sequential (Exclusive/Interactive) tools
                for plan in &sequential_plans {
                    let (result, is_error) = if let Some(tool) = tools.get(&plan.name) {
                        match tool.execute(plan.input.clone(), &tool_context).await {
                            Ok(output) => (output, false),
                            Err(e) => (serde_json::json!({ "error": e.to_string() }), true),
                        }
                    } else {
-                        tracing::error!("[AgentLoop] Tool '{}' not found in registry", name);
+                        (serde_json::json!({ "error": format!("Unknown tool: {}", plan.name) }), true)
-                        let error_output = serde_json::json!({ "error": format!("Unknown tool: {}", name) });
-                        if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
-                            tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                        }
-                        (error_output, true)
                    };
-                    // Check if this is a clarification response — break outer loop
+                    // Check clarification (only from sequential tools — ask_clarification is Interactive)
-                    if name == "ask_clarification"
+                    if plan.name == "ask_clarification"
                        && result.get("status").and_then(|v| v.as_str()) == Some("clarification_needed")
                    {
                        tracing::info!("[AgentLoop] Streaming: Clarification requested, terminating loop");
-                        let question = result.get("question")
+                        let question = result.get("question").and_then(|v| v.as_str()).unwrap_or("需要更多信息").to_string();
-                            .and_then(|v| v.as_str())
+                        messages.push(Message::tool_result(plan.id.clone(), zclaw_types::ToolId::new(&plan.name), result, is_error));
-                            .unwrap_or("需要更多信息")
+                        if let Err(e) = tx.send(LoopEvent::Delta(question.clone())).await { tracing::warn!("{}", e); }
-                            .to_string();
+                        if let Err(e) = tx.send(LoopEvent::Complete(AgentLoopResult { response: question.clone(), input_tokens: total_input_tokens, output_tokens: total_output_tokens, iterations: iteration })).await { tracing::warn!("{}", e); }
-                        messages.push(Message::tool_result(
+                        if let Err(e) = memory.append_message(&session_id_clone, &Message::assistant(&question)).await { tracing::warn!("{}", e); }
-                            id,
-                            zclaw_types::ToolId::new(&name),
-                            result,
-                            is_error,
-                        ));
-                        // Send the question as final delta so the user sees it
-                        if let Err(e) = tx.send(LoopEvent::Delta(question.clone())).await {
-                            tracing::warn!("[AgentLoop] Failed to send Delta event: {}", e);
-                        }
-                        if let Err(e) = tx.send(LoopEvent::Complete(AgentLoopResult {
-                            response: question.clone(),
-                            input_tokens: total_input_tokens,
-                            output_tokens: total_output_tokens,
-                            iterations: iteration,
-                        })).await {
-                            tracing::warn!("[AgentLoop] Failed to send Complete event: {}", e);
-                        }
-                        if let Err(e) = memory.append_message(&session_id_clone, &Message::assistant(&question)).await {
-                            tracing::warn!("[AgentLoop] Failed to save clarification message: {}", e);
-                        }
                        break 'outer;
                    }
                    results.insert(plan.idx, (plan.id.clone(), plan.name.clone(), result, is_error));
                }
-                    // Add tool result to message history
+                // Phase 3: after_tool_call middleware + push results in original order
-                    tracing::debug!("[AgentLoop] Adding tool_result to history: id={}, name={}, is_error={}", id, name, is_error);
+                let mut sorted_indices: Vec<usize> = results.keys().copied().collect();
-                    messages.push(Message::tool_result(
+                sorted_indices.sort();
-                        id,
+                for idx in sorted_indices {
-                        zclaw_types::ToolId::new(&name),
+                    let (id, name, result, is_error) = results.remove(&idx).unwrap();
-                        result,
+
-                        is_error,
+                    // Emit ToolEnd event
-                    ));
+                    if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: result.clone() }).await {
                        tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
                    }
                    // Run after_tool_call middleware
                    {
                        let mut mw_ctx = middleware::MiddlewareContext {
                            agent_id: agent_id.clone(),
                            session_id: session_id_clone.clone(),
                            user_input: String::new(),
                            system_prompt: enhanced_prompt.clone(),
                            messages: messages.clone(),
                            response_content: Vec::new(),
                            input_tokens: total_input_tokens,
                            output_tokens: total_output_tokens,
                        };
                        if let Err(e) = middleware_chain.run_after_tool_call(&mut mw_ctx, &name, &result).await {
                            tracing::warn!("[AgentLoop] after_tool_call middleware failed for '{}': {}", name, e);
                        }
                    }
                    messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), result, is_error));
                }
                tracing::debug!("[AgentLoop] Continuing to next iteration for LLM to process tool results");
                // If stream errored, we executed complete tools but cannot continue the LLM loop
                if stream_errored {
                    tracing::info!("[AgentLoop] Stream was errored — executed salvageable tools, now breaking");
                    break 'outer;
                }
                // Continue loop - next iteration will call LLM with tool results
            }
        });
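Review note: the three phases above (filter through middleware, run ReadOnly tools concurrently and the rest serially, then push results in the original call order) depend on results being keyed by `idx`. The sketch below illustrates only that ordering idea, using the standard library. `Plan`, `Concurrency`, and `run_in_original_order` are hypothetical names invented for this note, the "execution" is a string stand-in, and a `BTreeMap` is swapped in for the `HashMap` plus explicit sort used in the diff.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for ToolConcurrency and StreamToolPlan.
#[derive(Clone, Copy, PartialEq)]
enum Concurrency {
    ReadOnly,
    Exclusive,
}

struct Plan {
    idx: usize,
    name: &'static str,
    concurrency: Concurrency,
}

/// Partition plans, "execute" them in a different order than they were
/// requested, and return the results in the original request order.
fn run_in_original_order(plans: &[Plan]) -> Vec<String> {
    let (parallel, sequential): (Vec<&Plan>, Vec<&Plan>) = plans
        .iter()
        .partition(|p| p.concurrency == Concurrency::ReadOnly);

    // Keyed by original index, so iteration order is request order.
    let mut results: BTreeMap<usize, String> = BTreeMap::new();

    // Concurrent tasks may finish in any order; reversing simulates that.
    for p in parallel.iter().rev() {
        results.insert(p.idx, format!("{}:ok", p.name));
    }
    // Exclusive tools run one at a time, after the parallel batch.
    for p in &sequential {
        results.insert(p.idx, format!("{}:ok", p.name));
    }

    results.into_values().collect()
}
```

A sorted map removes the separate `sort()` step; whether that is preferable to the `HashMap` in the diff is a matter of taste, since the number of tool calls per turn is small.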
--- a/crates/zclaw-runtime/src/middleware.rs
+++ b/crates/zclaw-runtime/src/middleware.rs
@@ -12,6 +12,13 @@
 //! | 200-399 | Capability     | SkillIndex, Guardrail       |
 //! | 400-599 | Safety         | LoopGuard, Guardrail        |
 //! | 600-799 | Telemetry      | TokenCalibration, Tracking  |
 //!
 //! # Wave parallelization
 //!
 //! `before_completion` middlewares that only modify `system_prompt` (not `messages`)
 //! can declare `parallel_safe() == true`. The chain runs consecutive parallel-safe
 //! middlewares concurrently, merging their prompt contributions. This reduces
 //! sequential latency for the context-injection phase.
 use std::sync::Arc;
 use async_trait::async_trait;
@@ -50,6 +57,7 @@ pub enum ToolCallDecision {
 // ---------------------------------------------------------------------------
 /// Carries the mutable state that middleware may inspect or modify.
 #[derive(Clone)]
 pub struct MiddlewareContext {
    /// The agent that owns this loop.
    pub agent_id: AgentId,
@@ -101,6 +109,15 @@ pub trait AgentMiddleware: Send + Sync {
        500
    }
    /// Whether `before_completion` is safe to run concurrently with other
    /// parallel-safe middlewares. Only return `true` if the middleware:
    /// - Only modifies `ctx.system_prompt` (never `ctx.messages`)
    /// - Does not depend on prompt modifications from other middlewares
    /// - Does not return `MiddlewareDecision::Stop`
    fn parallel_safe(&self) -> bool {
        false
    }
    /// Hook executed **before** the LLM completion request is sent.
    ///
    /// Use this to inject context (memory, skill index, etc.) or to
@@ -163,9 +180,66 @@ impl MiddlewareChain {
        self.middlewares.insert(pos, mw);
    }
-    /// Run all `before_completion` hooks in order.
+    /// Run all `before_completion` hooks with wave-based parallelization.
    ///
    /// Consecutive `parallel_safe` middlewares run concurrently — each gets
    /// its own cloned context and appends to `system_prompt` independently.
    /// Their contributions are merged after all complete. Non-parallel-safe
    /// middlewares (and non-consecutive ones) run sequentially as before.
    pub async fn run_before_completion(&self, ctx: &mut MiddlewareContext) -> Result<MiddlewareDecision> {
-        for mw in &self.middlewares {
+        let mut idx = 0;
        while idx < self.middlewares.len() {
            // Find the extent of consecutive parallel-safe middlewares
            let wave_start = idx;
            let mut wave_end = idx;
            while wave_end < self.middlewares.len()
                && self.middlewares[wave_end].parallel_safe()
            {
                wave_end += 1;
            }
            if wave_end - wave_start >= 2 {
                // Run parallel wave (2+ consecutive parallel-safe middlewares)
                let base_prompt_len = ctx.system_prompt.len();
                let wave = &self.middlewares[wave_start..wave_end];
                // Spawn concurrent tasks — each owns its cloned context + Arc ref to middleware
                let mut join_handles = Vec::with_capacity(wave.len());
                for mw in wave.iter() {
                    let mut ctx_clone = ctx.clone();
                    let mw_arc = Arc::clone(mw);
                    join_handles.push(tokio::spawn(async move {
                        let result = mw_arc.before_completion(&mut ctx_clone).await;
                        (result, ctx_clone.system_prompt)
                    }));
                }
                // Await all and merge prompt contributions
                for (i, handle) in join_handles.into_iter().enumerate() {
                    let (result, modified_prompt): (Result<MiddlewareDecision>, String) = handle.await
                        .map_err(|e| zclaw_types::ZclawError::Internal(format!("Parallel middleware panicked: {}", e)))?;
                    match result? {
                        MiddlewareDecision::Continue => {}
                        MiddlewareDecision::Stop(reason) => {
                            tracing::info!(
                                "[MiddlewareChain] '{}' requested stop: {}",
                                self.middlewares[wave_start + i].name(),
                                reason
                            );
                            return Ok(MiddlewareDecision::Stop(reason));
                        }
                    }
                    // Merge system_prompt contribution from this clone
                    if modified_prompt.len() > base_prompt_len {
                        let contribution = &modified_prompt[base_prompt_len..];
                        ctx.system_prompt.push_str(contribution);
                    }
                }
                idx = wave_end;
            } else {
                // Run single middleware sequentially
                let mw = &self.middlewares[idx];
                match mw.before_completion(ctx).await? {
                    MiddlewareDecision::Continue => {}
                    MiddlewareDecision::Stop(reason) => {
@@ -173,6 +247,8 @@ impl MiddlewareChain {
                        return Ok(MiddlewareDecision::Stop(reason));
                    }
                }
                idx += 1;
            }
        }
        Ok(MiddlewareDecision::Continue)
    }
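Review note: the wave logic in `run_before_completion` has two separable parts, grouping consecutive parallel-safe middlewares and merging each clone's prompt suffix back into the shared prompt. The sketch below isolates both with the standard library only. `plan_waves` and `merge_contributions` are hypothetical helpers written for this note, not functions in the crate. The merge uses `strip_prefix` instead of slicing at `base_prompt_len`, which is a deliberate variation: it ignores a clone whose prompt no longer starts with the base text rather than slicing it at a byte offset.

```rust
use std::ops::Range;

/// Group middleware indices into waves. A run of two or more consecutive
/// parallel-safe entries forms one wave; everything else runs alone.
fn plan_waves(parallel_safe: &[bool]) -> Vec<Range<usize>> {
    let mut waves = Vec::new();
    let mut idx = 0;
    while idx < parallel_safe.len() {
        let mut end = idx;
        while end < parallel_safe.len() && parallel_safe[end] {
            end += 1;
        }
        if end - idx >= 2 {
            waves.push(idx..end);
            idx = end;
        } else {
            // Zero or one parallel-safe entry: run it sequentially.
            waves.push(idx..idx + 1);
            idx += 1;
        }
    }
    waves
}

/// Append each clone's contribution (the text it added after the base
/// prompt) to the shared prompt, in middleware order.
fn merge_contributions(base: &str, modified: &[String]) -> String {
    let mut merged = base.to_string();
    for prompt in modified {
        if let Some(suffix) = prompt.strip_prefix(base) {
            merged.push_str(suffix);
        }
    }
    merged
}
```

For example, flags `[true, true, false, true]` yield the waves `0..2`, `2..3`, `3..4`: only the first two middlewares run concurrently, and the trailing parallel-safe one runs on its own.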
--- a/crates/zclaw-runtime/src/middleware/butler_router.rs
+++ b/crates/zclaw-runtime/src/middleware/butler_router.rs
@@ -290,6 +290,8 @@ impl AgentMiddleware for ButlerRouterMiddleware {
        80
    }
    fn parallel_safe(&self) -> bool { true }
    async fn before_completion(&self, ctx: &mut MiddlewareContext) -> Result<MiddlewareDecision> {
        // Only route on the first user message in a turn (not tool results)
        let user_input = &ctx.user_input;
--- a/crates/zclaw-runtime/src/middleware/compaction.rs
+++ b/crates/zclaw-runtime/src/middleware/compaction.rs
@@ -1,21 +1,49 @@
 //! Compaction middleware — wraps the existing compaction module.
 //!
 //! Supports debounce (cooldown + min-round checks), async LLM compression
 //! with cached fallback, and iterative summaries that carry forward key info.
 use async_trait::async_trait;
-use zclaw_types::Result;
+use std::sync::atomic::{AtomicU64, Ordering};
-use crate::middleware::{AgentMiddleware, MiddlewareContext, MiddlewareDecision};
-use crate::compaction::{self, CompactionConfig};
-use crate::growth::GrowthIntegration;
-use crate::driver::LlmDriver;
 use std::sync::Arc;
 use tokio::sync::RwLock;
 use zclaw_types::{Message, Result};
 use crate::compaction::{self, CompactionConfig};
 use crate::driver::LlmDriver;
 use crate::growth::GrowthIntegration;
 use crate::middleware::{AgentMiddleware, MiddlewareContext, MiddlewareDecision};
 /// Minimum seconds between consecutive compactions.
 const COMPACTION_COOLDOWN_SECS: u64 = 30;
 /// Minimum message pairs (user+assistant) since last compaction before triggering again.
 const COMPACTION_MIN_ROUNDS: u64 = 3;
 fn now_millis() -> u64 {
    std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap_or_default()
        .as_millis() as u64
 }
 /// Shared compaction debounce state (lock-free).
 struct CompactionState {
    last_compaction_ms: AtomicU64,
    last_compaction_msg_count: AtomicU64,
 }
 /// Cached result from a previous async LLM compaction.
 struct AsyncCompactionCache {
    last_result: RwLock<Option<Vec<Message>>>,
 }
 /// Middleware that compresses conversation history when it exceeds a token threshold.
 pub struct CompactionMiddleware {
    threshold: usize,
    config: CompactionConfig,
    /// Optional LLM driver for async compaction (LLM summarisation, memory flush).
    driver: Option<Arc<dyn LlmDriver>>,
    /// Optional growth integration for memory flushing during compaction.
    growth: Option<GrowthIntegration>,
    state: Arc<CompactionState>,
    cache: Arc<AsyncCompactionCache>,
 }
 impl CompactionMiddleware {
@@ -25,7 +53,39 @@ impl CompactionMiddleware {
        driver: Option<Arc<dyn LlmDriver>>,
        growth: Option<GrowthIntegration>,
    ) -> Self {
-        Self { threshold, config, driver, growth }
+        Self {
            threshold,
            config,
            driver,
            growth,
            state: Arc::new(CompactionState {
                last_compaction_ms: AtomicU64::new(0),
                last_compaction_msg_count: AtomicU64::new(0),
            }),
            cache: Arc::new(AsyncCompactionCache {
                last_result: RwLock::new(None),
            }),
        }
    }
    fn should_compact(&self, msg_count: u64) -> bool {
        let last_ms = self.state.last_compaction_ms.load(Ordering::Relaxed);
        let last_count = self.state.last_compaction_msg_count.load(Ordering::Relaxed);
        if now_millis().saturating_sub(last_ms) < COMPACTION_COOLDOWN_SECS * 1000 {
            return false;
        }
        if msg_count.saturating_sub(last_count) < COMPACTION_MIN_ROUNDS * 2 {
            return false;
        }
        true
    }
    fn record_compaction(&self, msg_count: u64) {
        self.state.last_compaction_ms.store(now_millis(), Ordering::Relaxed);
        self.state.last_compaction_msg_count.store(msg_count, Ordering::Relaxed);
    }
 }
@@ -39,6 +99,29 @@ impl AgentMiddleware for CompactionMiddleware {
            return Ok(MiddlewareDecision::Continue);
        }
        // Step 1: Prune old tool outputs (cheap, no LLM needed)
        let pruned = compaction::prune_tool_outputs(&mut ctx.messages);
        if pruned > 0 {
            tracing::info!("[CompactionMiddleware] Pruned {} old tool outputs", pruned);
        }
        // Step 2: Re-estimate tokens after pruning
        let tokens = compaction::estimate_messages_tokens_calibrated(&ctx.messages);
        if tokens < self.threshold {
            return Ok(MiddlewareDecision::Continue);
        }
        // Step 3: Debounce check
        if !self.should_compact(ctx.messages.len() as u64) {
            // Still over threshold but within cooldown — use cached result if available
            if let Some(cached) = self.cache.last_result.read().await.clone() {
                tracing::debug!("[CompactionMiddleware] Cooldown active, using cached compaction result");
                ctx.messages = cached;
            }
            return Ok(MiddlewareDecision::Continue);
        }
        // Step 4: Execute compaction
        let needs_async = self.config.use_llm || self.config.memory_flush_enabled;
        if needs_async {
            let outcome = compaction::maybe_compact_with_config(
@@ -56,6 +139,14 @@ impl AgentMiddleware for CompactionMiddleware {
            ctx.messages = compaction::maybe_compact(ctx.messages.clone(), self.threshold);
        }
        self.record_compaction(ctx.messages.len() as u64);
        // Cache result for cooldown fallback
        {
            let mut cache = self.cache.last_result.write().await;
            *cache = Some(ctx.messages.clone());
        }
        Ok(MiddlewareDecision::Continue)
    }
 }
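Review note: the debounce in `should_compact` combines a time cooldown with a minimum number of new messages, both tracked in atomics. The sketch below reproduces that check under stated assumptions: `Debounce` is a hypothetical type, the clock is passed in as `now_ms` so the behaviour can be tested deterministically, and the constants mirror `COMPACTION_COOLDOWN_SECS` (30 s) and `COMPACTION_MIN_ROUNDS * 2` (6 messages) from the diff.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const COOLDOWN_MS: u64 = 30_000; // 30 s between runs
const MIN_NEW_MESSAGES: u64 = 6; // 3 user+assistant rounds

/// Hypothetical lock-free debounce state.
struct Debounce {
    last_ms: AtomicU64,
    last_count: AtomicU64,
}

impl Debounce {
    fn new() -> Self {
        Self {
            last_ms: AtomicU64::new(0),
            last_count: AtomicU64::new(0),
        }
    }

    /// True only when the cooldown has elapsed AND enough new messages
    /// have arrived since the last recorded run.
    fn should_run(&self, now_ms: u64, msg_count: u64) -> bool {
        let last_ms = self.last_ms.load(Ordering::Relaxed);
        let last_count = self.last_count.load(Ordering::Relaxed);
        now_ms.saturating_sub(last_ms) >= COOLDOWN_MS
            && msg_count.saturating_sub(last_count) >= MIN_NEW_MESSAGES
    }

    /// Remember when the last run happened and how many messages remained.
    fn record(&self, now_ms: u64, msg_count: u64) {
        self.last_ms.store(now_ms, Ordering::Relaxed);
        self.last_count.store(msg_count, Ordering::Relaxed);
    }
}
```

Note that the two fields are stored independently, so a concurrent reader could observe a new timestamp with an old count. With `Relaxed` ordering that is acceptable for a best-effort debounce, but it would not be for anything that needs the pair to be consistent.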
--- a/crates/zclaw-runtime/src/middleware/evolution.rs
+++ b/crates/zclaw-runtime/src/middleware/evolution.rs
@@ -88,6 +88,8 @@ impl AgentMiddleware for EvolutionMiddleware {
        78 // before ButlerRouter (80)
    }
    fn parallel_safe(&self) -> bool { true }
    async fn before_completion(
        &self,
        ctx: &mut MiddlewareContext,
--- a/crates/zclaw-runtime/src/middleware/memory.rs
+++ b/crates/zclaw-runtime/src/middleware/memory.rs
@@ -111,6 +111,7 @@ impl MemoryMiddleware {
 impl AgentMiddleware for MemoryMiddleware {
    fn name(&self) -> &str { "memory" }
    fn priority(&self) -> i32 { 150 }
    fn parallel_safe(&self) -> bool { true }
    async fn before_completion(&self, ctx: &mut MiddlewareContext) -> Result<MiddlewareDecision> {
        tracing::debug!(
--- a/crates/zclaw-runtime/src/middleware/skill_index.rs
+++ b/crates/zclaw-runtime/src/middleware/skill_index.rs
@@ -40,6 +40,7 @@ impl SkillIndexMiddleware {
 impl AgentMiddleware for SkillIndexMiddleware {
    fn name(&self) -> &str { "skill_index" }
    fn priority(&self) -> i32 { 200 }
    fn parallel_safe(&self) -> bool { true }
    async fn before_completion(&self, ctx: &mut MiddlewareContext) -> Result<MiddlewareDecision> {
        if self.entries.is_empty() {
--- a/crates/zclaw-runtime/src/middleware/title.rs
+++ b/crates/zclaw-runtime/src/middleware/title.rs
@@ -41,6 +41,7 @@ impl Default for TitleMiddleware {
 impl AgentMiddleware for TitleMiddleware {
    fn name(&self) -> &str { "title" }
    fn priority(&self) -> i32 { 180 }
    fn parallel_safe(&self) -> bool { true }
    // All hooks default to Continue — placeholder until LLM driver is wired in.
    async fn before_completion(&self, _ctx: &mut crate::middleware::MiddlewareContext) -> zclaw_types::Result<MiddlewareDecision> {
--- a/crates/zclaw-runtime/src/middleware/tool_error.rs
+++ b/crates/zclaw-runtime/src/middleware/tool_error.rs
@@ -13,6 +13,7 @@ use serde_json::Value;
 use zclaw_types::Result;
 use crate::driver::ContentBlock;
 use crate::middleware::{AgentMiddleware, MiddlewareContext, ToolCallDecision};
 use std::collections::HashMap;
 use std::sync::Mutex;
 /// Middleware that intercepts tool call errors and formats recovery messages.
@@ -23,8 +24,8 @@ pub struct ToolErrorMiddleware {
    max_error_length: usize,
    /// Maximum consecutive failures before aborting the loop.
    max_consecutive_failures: u32,
-    /// Tracks consecutive tool failures.
+    /// Tracks consecutive tool failures per session.
-    consecutive_failures: Mutex<u32>,
+    session_failures: Mutex<HashMap<String, u32>>,
 }
 impl ToolErrorMiddleware {
@@ -32,7 +33,7 @@ impl ToolErrorMiddleware {
        Self {
            max_error_length: 500,
            max_consecutive_failures: 3,
-            consecutive_failures: Mutex::new(0),
+            session_failures: Mutex::new(HashMap::new()),
        }
    }
@@ -66,7 +67,7 @@ impl AgentMiddleware for ToolErrorMiddleware {
    async fn before_tool_call(
        &self,
-        _ctx: &MiddlewareContext,
+        ctx: &MiddlewareContext,
        tool_name: &str,
        tool_input: &Value,
    ) -> Result<ToolCallDecision> {
@@ -79,15 +80,17 @@ impl AgentMiddleware for ToolErrorMiddleware {
            return Ok(ToolCallDecision::ReplaceInput(serde_json::json!({})));
        }
-        // Check consecutive failure count — abort if too many failures
+        // Check consecutive failure count — abort if too many failures (per session)
-        let failures = self.consecutive_failures.lock().unwrap_or_else(|e| e.into_inner());
+        let failures = self.session_failures.lock()
-        if *failures >= self.max_consecutive_failures {
+            .map(|m| m.get(&ctx.session_id.to_string()).copied().unwrap_or(0))
            .unwrap_or(0);
        if failures >= self.max_consecutive_failures {
            tracing::warn!(
                "[ToolErrorMiddleware] Aborting loop: {} consecutive tool failures",
-                *failures
+                failures
            );
            return Ok(ToolCallDecision::AbortLoop(
-                format!("连续 {} 次工具调用失败，已自动终止以避免无限重试", *failures)
+                format!("连续 {} 次工具调用失败，已自动终止以避免无限重试", failures)
            ));
        }
@@ -100,11 +103,16 @@ impl AgentMiddleware for ToolErrorMiddleware {
        tool_name: &str,
        result: &Value,
    ) -> Result<()> {
-        let mut failures = self.consecutive_failures.lock().unwrap_or_else(|e| e.into_inner());
        // Check if the tool result indicates an error.
        if let Some(error) = result.get("error") {
-            *failures += 1;
+            let session_key = ctx.session_id.to_string();
            let failures = self.session_failures.lock()
                .map(|mut m| {
                    let count = m.entry(session_key.clone()).or_insert(0);
                    *count += 1;
                    *count
                })
                .unwrap_or(1);
            let error_msg = match error {
                Value::String(s) => s.clone(),
                other => other.to_string(),
@@ -118,7 +126,7 @@ impl AgentMiddleware for ToolErrorMiddleware {
            tracing::warn!(
                "[ToolErrorMiddleware] Tool '{}' failed ({}/{} consecutive): {}",
-                tool_name, *failures, self.max_consecutive_failures, truncated
+                tool_name, failures, self.max_consecutive_failures, truncated
            );
            let guided_message = self.format_tool_error(tool_name, &truncated);
@@ -126,8 +134,11 @@ impl AgentMiddleware for ToolErrorMiddleware {
                text: guided_message,
            });
        } else {
-            // Success — reset consecutive failure counter
+            // Success — reset consecutive failure counter for this session
-            *failures = 0;
+            let session_key = ctx.session_id.to_string();
+            if let Ok(mut m) = self.session_failures.lock() {
+                m.insert(session_key, 0);
+            }
        }
        Ok(())
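The per-session counting introduced above can be sketched in isolation. This is a minimal illustration and not the middleware itself: `FailureTracker` and its method names are hypothetical. It also deviates from the diff in two ways: a poisoned lock is recovered with `into_inner()` instead of falling back to a default count, and a success removes the session entry rather than storing `0`, so the map does not grow with finished sessions.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the middleware's per-session failure state.
pub struct FailureTracker {
    max_consecutive_failures: u32,
    session_failures: Mutex<HashMap<String, u32>>,
}

impl FailureTracker {
    pub fn new(max_consecutive_failures: u32) -> Self {
        Self {
            max_consecutive_failures,
            session_failures: Mutex::new(HashMap::new()),
        }
    }

    /// Record a failure for `session` and return its updated consecutive count.
    pub fn record_failure(&self, session: &str) -> u32 {
        // Assumption: recover the map even if another thread panicked while holding the lock.
        let mut map = self.session_failures.lock().unwrap_or_else(|e| e.into_inner());
        let count = map.entry(session.to_string()).or_insert(0);
        *count += 1;
        *count
    }

    /// A success ends the streak; dropping the entry keeps the map small.
    pub fn record_success(&self, session: &str) {
        let mut map = self.session_failures.lock().unwrap_or_else(|e| e.into_inner());
        map.remove(session);
    }

    /// True once `session` has reached the configured failure limit.
    pub fn should_abort(&self, session: &str) -> bool {
        let map = self.session_failures.lock().unwrap_or_else(|e| e.into_inner());
        map.get(session).copied().unwrap_or(0) >= self.max_consecutive_failures
    }
}
```

With a limit of 3, three failures in session `"a"` trip `should_abort("a")` while session `"b"` is unaffected, which is the isolation the shared `Mutex<u32>` counter could not provide.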
--- a/crates/zclaw-runtime/src/middleware/tool_output_guard.rs
+++ b/crates/zclaw-runtime/src/middleware/tool_output_guard.rs
@@ -21,35 +21,27 @@ use crate::middleware::{AgentMiddleware, MiddlewareContext, ToolCallDecision};
 /// Maximum safe output length in characters.
 const MAX_OUTPUT_LENGTH: usize = 50_000;
-/// Patterns that indicate sensitive information in tool output.
-const SENSITIVE_PATTERNS: &[&str] = &[
-    "api_key",
-    "apikey",
-    "api-key",
-    "secret_key",
-    "secretkey",
-    "access_token",
-    "auth_token",
-    "password",
-    "private_key",
-    "-----BEGIN RSA",
-    "-----BEGIN PRIVATE",
-    "sk-",           // OpenAI API keys
-    "sk_live_",      // Stripe keys
-    "AKIA",          // AWS access keys
+/// Regex patterns that match actual secret values (not just keywords).
+/// These detect the *value format* of secrets, avoiding false positives
+/// from legitimate content that merely mentions "password" or "api_key".
+const SECRET_VALUE_PATTERNS: &[&str] = &[
+    r#"sk-[a-zA-Z0-9]{20,}"#,              // OpenAI API keys (sk-xxx, 20+ chars)
+    r#"sk_live_[a-zA-Z0-9]{20,}"#,          // Stripe live keys
+    r#"sk_test_[a-zA-Z0-9]{20,}"#,          // Stripe test keys
+    r#"AKIA[A-Z0-9]{16}"#,                   // AWS access keys (exact 20 chars)
+    r#"-----BEGIN (RSA |EC )?PRIVATE KEY-----"#,  // PEM private keys
+    r#"(?:api_?key|secret_?key|access_?token|auth_?token|password)\s*[:=]\s*["'][^"']{8,}["']"#,  // key=value with actual secret
 ];
-/// Patterns that may indicate prompt injection in tool output.
+/// Keyword patterns that indicate prompt injection in tool output.
 /// These are specific enough to avoid false positives from normal content.
 const INJECTION_PATTERNS: &[&str] = &[
    "ignore previous instructions",
    "ignore all previous",
    "disregard your instructions",
    "you are now",
    "new instructions:",
    "system:",
    "[INST]",
    "</scratchpad>",
    "think step by step about",
 ];
 /// Tool output sanitization middleware.
@@ -105,22 +97,24 @@ impl AgentMiddleware for ToolOutputGuardMiddleware {
            );
        }
-        // Rule 2: Sensitive information detection — block output containing secrets (P2-22)
+        // Rule 2: Sensitive information detection — match actual secret values, not keywords
-        let output_lower = output_str.to_lowercase();
-        for pattern in SENSITIVE_PATTERNS {
-            if output_lower.contains(pattern) {
-                tracing::error!(
-                    "[ToolOutputGuard] BLOCKED tool '{}' output: sensitive pattern '{}'",
-                    tool_name, pattern
-                );
-                return Err(zclaw_types::ZclawError::Internal(format!(
-                    "[ToolOutputGuard] Tool '{}' output blocked: sensitive information detected ('{}')",
-                    tool_name, pattern
-                )));
-            }
-        }
+        for pattern in SECRET_VALUE_PATTERNS {
+            if let Ok(re) = regex::Regex::new(pattern) {
+                if re.is_match(&output_str) {
+                    tracing::error!(
+                        "[ToolOutputGuard] BLOCKED tool '{}' output: secret value matched pattern '{}'",
+                        tool_name, pattern
+                    );
+                    return Err(zclaw_types::ZclawError::Internal(format!(
+                        "[ToolOutputGuard] Tool '{}' output blocked: sensitive information detected",
+                        tool_name
+                    )));
+                }
+            }
+        }
-        // Rule 3: Injection marker detection — BLOCK the output (P2-22 fix)
+        // Rule 3: Injection marker detection — specific phrase matching
+        let output_lower = output_str.to_lowercase();
        for pattern in INJECTION_PATTERNS {
            if output_lower.contains(pattern) {
                tracing::error!(
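The move from keyword matching to value-format matching can be illustrated without the `regex` crate. The sketch below is not the middleware's code: both function names are hypothetical, and the shape check is a hand-written equivalent of the single pattern `AKIA[A-Z0-9]{16}`, written with the standard library only so that it stays self-contained.

```rust
/// Returns true if `text` contains something shaped like an AWS access key ID:
/// the literal "AKIA" followed by exactly 16 uppercase ASCII letters or digits
/// inside a 20-byte window.
pub fn contains_aws_key_shape(text: &str) -> bool {
    text.as_bytes().windows(20).any(|w| {
        w.starts_with(b"AKIA")
            && w[4..]
                .iter()
                .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
    })
}

/// The older keyword approach: flags any mention of the prefix at all.
pub fn contains_keyword(text: &str) -> bool {
    text.contains("AKIA")
}
```

A sentence such as "AKIA keys are issued by AWS" trips the keyword check but not the shape check, which is the false positive the new patterns are meant to avoid. If the real patterns stay as regexes, compiling them once (for example in a lazily initialized static) instead of on every tool call would likely be cheaper, though that is a suggestion and not something this diff does.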
--- a/crates/zclaw-runtime/src/stream.rs
+++ b/crates/zclaw-runtime/src/stream.rs
@@ -24,6 +24,10 @@ pub enum StreamChunk {
        input_tokens: u32,
        output_tokens: u32,
        stop_reason: String,
        #[serde(default)]
        cache_creation_input_tokens: Option<u32>,
        #[serde(default)]
        cache_read_input_tokens: Option<u32>,
    },
    /// Error occurred
    Error { message: String },
--- a/crates/zclaw-runtime/src/test_util.rs
+++ b/crates/zclaw-runtime/src/test_util.rs
@@ -55,6 +55,8 @@ impl MockLlmDriver {
            input_tokens: 10,
            output_tokens: text.len() as u32 / 4,
            stop_reason: StopReason::EndTurn,
            cache_creation_input_tokens: None,
            cache_read_input_tokens: None,
        });
        self
    }
@@ -74,6 +76,8 @@ impl MockLlmDriver {
            input_tokens: 10,
            output_tokens: 20,
            stop_reason: StopReason::ToolUse,
            cache_creation_input_tokens: None,
            cache_read_input_tokens: None,
        });
        self
    }
@@ -86,6 +90,8 @@ impl MockLlmDriver {
            input_tokens: 0,
            output_tokens: 0,
            stop_reason: StopReason::Error,
            cache_creation_input_tokens: None,
            cache_read_input_tokens: None,
        });
        self
    }
@@ -142,6 +148,8 @@ impl MockLlmDriver {
                input_tokens: 0,
                output_tokens: 0,
                stop_reason: StopReason::EndTurn,
                cache_creation_input_tokens: None,
                cache_read_input_tokens: None,
            })
    }
 }
@@ -190,6 +198,8 @@ impl LlmDriver for MockLlmDriver {
                        input_tokens: 10,
                        output_tokens: 2,
                        stop_reason: "end_turn".to_string(),
                        cache_creation_input_tokens: None,
                        cache_read_input_tokens: None,
                    },
                ]
            })
--- a/crates/zclaw-runtime/src/tool.rs
+++ b/crates/zclaw-runtime/src/tool.rs
@@ -11,6 +11,17 @@ use crate::driver::ToolDefinition;
 use crate::loop_runner::LoopEvent;
 use crate::tool::builtin::PathValidator;
 /// Tool concurrency safety level
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub enum ToolConcurrency {
    /// Read-only operations, always safe to parallelize (file_read, web_fetch, etc.)
    ReadOnly,
    /// Exclusive operations, must be serial (file_write, shell_exec, etc.)
    Exclusive,
    /// Interactive operations, never parallelize (ask_clarification, etc.)
    Interactive,
 }
 /// Tool trait for implementing agent tools
 #[async_trait]
 pub trait Tool: Send + Sync {
@@ -25,6 +36,11 @@ pub trait Tool: Send + Sync {
    /// Execute the tool
    async fn execute(&self, input: Value, context: &ToolContext) -> Result<Value>;
    /// Tool concurrency safety level. Default: ReadOnly.
    fn concurrency(&self) -> ToolConcurrency {
        ToolConcurrency::ReadOnly
    }
 }
 /// Skill executor trait for runtime skill execution
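How a scheduler might consume `ToolConcurrency` is not shown in this diff, so the following is only a sketch under assumptions: `plan_batches` is a hypothetical helper that groups consecutive `ReadOnly` calls into one batch (candidates for parallel execution) and gives every `Exclusive` or `Interactive` call a batch of its own, preserving call order.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolConcurrency {
    ReadOnly,
    Exclusive,
    Interactive,
}

/// Hypothetical planner: consecutive ReadOnly calls share a batch;
/// any other call runs alone. Order across batches is preserved.
pub fn plan_batches(calls: &[(&str, ToolConcurrency)]) -> Vec<Vec<String>> {
    let mut batches: Vec<Vec<String>> = Vec::new();
    let mut last_was_read_only = false;
    for (name, level) in calls {
        let read_only = *level == ToolConcurrency::ReadOnly;
        match batches.last_mut() {
            // Extend the current batch only when both neighbours are ReadOnly.
            Some(batch) if read_only && last_was_read_only => batch.push((*name).to_string()),
            _ => batches.push(vec![(*name).to_string()]),
        }
        last_was_read_only = read_only;
    }
    batches
}
```

Because the trait's default is `ReadOnly`, any tool that mutates state has to override `concurrency()`, which is what the `Exclusive` and `Interactive` overrides in the following files do.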
--- a/crates/zclaw-runtime/src/tool/builtin/ask_clarification.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/ask_clarification.rs
@@ -9,7 +9,7 @@ use async_trait::async_trait;
 use serde_json::{json, Value};
 use zclaw_types::{Result, ZclawError};
-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};
 /// Clarification type — categorizes the reason for asking.
 #[derive(Debug, Clone, PartialEq)]
@@ -96,6 +96,10 @@ impl Tool for AskClarificationTool {
        })
    }
    fn concurrency(&self) -> ToolConcurrency {
        ToolConcurrency::Interactive
    }
    async fn execute(&self, input: Value, _context: &ToolContext) -> Result<Value> {
        let question = input["question"].as_str()
            .ok_or_else(|| ZclawError::InvalidInput("Missing 'question' parameter".into()))?;
--- a/crates/zclaw-runtime/src/tool/builtin/execute_skill.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/execute_skill.rs
@@ -4,7 +4,7 @@ use async_trait::async_trait;
 use serde_json::{json, Value};
 use zclaw_types::{Result, ZclawError};
-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};
 pub struct ExecuteSkillTool;
@@ -42,6 +42,10 @@ impl Tool for ExecuteSkillTool {
        })
    }
    fn concurrency(&self) -> ToolConcurrency {
        ToolConcurrency::Exclusive
    }
    async fn execute(&self, input: Value, context: &ToolContext) -> Result<Value> {
        let skill_id = input["skill_id"].as_str()
            .ok_or_else(|| ZclawError::InvalidInput("Missing 'skill_id' parameter".into()))?;
--- a/crates/zclaw-runtime/src/tool/builtin/file_write.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/file_write.rs
@@ -6,7 +6,7 @@ use zclaw_types::{Result, ZclawError};
 use std::fs;
 use std::io::Write;
-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};
 use super::path_validator::PathValidator;
 pub struct FileWriteTool;
@@ -55,6 +55,10 @@ impl Tool for FileWriteTool {
        })
    }
    fn concurrency(&self) -> ToolConcurrency {
        ToolConcurrency::Exclusive
    }
    async fn execute(&self, input: Value, context: &ToolContext) -> Result<Value> {
        let path = input["path"].as_str()
            .ok_or_else(|| ZclawError::InvalidInput("Missing 'path' parameter".into()))?;
--- a/crates/zclaw-runtime/src/tool/builtin/mcp_tool.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/mcp_tool.rs
@@ -8,7 +8,7 @@ use serde_json::Value;
 use std::sync::Arc;
 use zclaw_types::Result;
-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};
 /// Wraps an MCP tool adapter into the `Tool` trait.
 ///
@@ -42,6 +42,10 @@ impl Tool for McpToolWrapper {
        self.adapter.input_schema().clone()
    }
    fn concurrency(&self) -> ToolConcurrency {
        ToolConcurrency::Exclusive
    }
    async fn execute(&self, input: Value, _context: &ToolContext) -> Result<Value> {
        self.adapter.execute(input).await
    }
--- a/crates/zclaw-runtime/src/tool/builtin/path_validator.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/path_validator.rs
@@ -97,6 +97,17 @@ fn default_blocked_paths() -> Vec<PathBuf> {
    ]
 }
 /// Normalize Windows UNC path prefix for consistent comparison.
 /// `\\?\C:\Users\...` → `C:\Users\...`
 fn normalize_windows_path(path: &Path) -> std::borrow::Cow<'_, Path> {
    let s = path.to_string_lossy();
    if s.starts_with(r"\\?\") {
        std::borrow::Cow::Owned(PathBuf::from(&s[4..]))
    } else {
        std::borrow::Cow::Borrowed(path)
    }
 }
 /// Expand tilde in path to home directory
 fn expand_tilde(path: &str) -> PathBuf {
    if path.starts_with('~') {
@@ -154,9 +165,16 @@ impl PathValidator {
        }
    }
-    /// Set the workspace root directory
+    /// Set the workspace root directory.
    /// Canonicalizes the path to ensure consistent comparison on Windows
    /// (where canonicalize() returns `\\?\C:\...` UNC paths).
    pub fn with_workspace(mut self, workspace: PathBuf) -> Self {
-        self.workspace_root = Some(workspace);
+        let canonical = if workspace.exists() {
            workspace.canonicalize().unwrap_or(workspace)
        } else {
            workspace
        };
        self.workspace_root = Some(canonical);
        self
    }
@@ -230,7 +248,14 @@ impl PathValidator {
    fn resolve_and_validate(&self, path: &str) -> Result<PathBuf> {
        // Expand tilde
        let expanded = expand_tilde(path);
-        let path_buf = PathBuf::from(&expanded);
+        let mut path_buf = PathBuf::from(&expanded);
        // If relative path and workspace is configured, resolve against workspace
        if path_buf.is_relative() {
            if let Some(ref workspace) = self.workspace_root {
                path_buf = workspace.join(&path_buf);
            }
        }
        // Check for path traversal
        self.check_path_traversal(&path_buf)?;
@@ -280,10 +305,14 @@ impl PathValidator {
        Ok(())
    }
-    /// Check if path is in blocked list
+    /// Check if path is in blocked list.
    /// Normalizes Windows UNC prefix (`\\?\`) for consistent comparison.
    fn check_blocked(&self, path: &Path) -> Result<()> {
        // Strip Windows UNC prefix for consistent matching
        let normalized = normalize_windows_path(path);
        for blocked in &self.config.blocked_paths {
-            if path.starts_with(blocked) || path == blocked {
+            let blocked_norm = normalize_windows_path(blocked);
            if normalized.starts_with(&*blocked_norm) || normalized == blocked_norm {
                return Err(ZclawError::InvalidInput(format!(
                    "Access to this path is blocked: {}",
                    path.display()
@@ -303,11 +332,15 @@ impl PathValidator {
    /// - This prevents accidental exposure of the entire filesystem
    ///   when the validator is misconfigured or used without setup
    fn check_allowed(&self, path: &Path) -> Result<()> {
        let path_norm = normalize_windows_path(path);
        // If no allowed paths specified, check workspace
        if self.config.allowed_paths.is_empty() {
            if let Some(ref workspace) = self.workspace_root {
                // Workspace is configured - validate path is within it
-                if !path.starts_with(workspace) {
+                // Both sides are canonicalized (workspace via with_workspace, path via resolve_and_validate)
                let ws_norm = normalize_windows_path(workspace);
                if !path_norm.starts_with(&*ws_norm) {
                    return Err(ZclawError::InvalidInput(format!(
                        "Path outside workspace: {} (workspace: {})",
                        path.display(),
@@ -329,7 +362,8 @@ impl PathValidator {
        // Check against allowed paths
        for allowed in &self.config.allowed_paths {
-            if path.starts_with(allowed) {
+            let allowed_norm = normalize_windows_path(allowed);
            if path_norm.starts_with(&*allowed_norm) {
                return Ok(());
            }
        }
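The prefix handling above can be sketched on plain strings. On Windows, `std::fs::canonicalize` returns paths in the extended-length form (`\\?\C:\...`), which is why a canonicalized path and a plain `C:\...` path fail a `starts_with` comparison. The helper below is hypothetical and goes one step beyond the diff: it also maps the `\\?\UNC\server\share` form back to `\\server\share`, a case that a fixed four-character strip would turn into `UNC\server\share`.

```rust
/// Hypothetical string-level helper: strip the Windows extended-length prefix.
///   \\?\C:\Users\me       -> C:\Users\me
///   \\?\UNC\server\share  -> \\server\share
/// Any other input is returned unchanged.
pub fn strip_verbatim_prefix(path: &str) -> String {
    if let Some(rest) = path.strip_prefix(r"\\?\UNC\") {
        // Extended-length UNC form: restore the leading double backslash.
        format!(r"\\{}", rest)
    } else if let Some(rest) = path.strip_prefix(r"\\?\") {
        rest.to_string()
    } else {
        path.to_string()
    }
}
```

Normalizing both sides of every comparison, as `check_blocked` and `check_allowed` now do, is what makes the result independent of which side happened to be canonicalized.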
--- a/crates/zclaw-runtime/src/tool/builtin/shell_exec.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/shell_exec.rs
@@ -8,7 +8,7 @@ use std::process::{Command, Stdio};
 use std::time::{Duration, Instant};
 use zclaw_types::{Result, ZclawError};
-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};
 /// Parse a command string into program and arguments using proper shell quoting
 fn parse_command(command: &str) -> Result<(String, Vec<String>)> {
@@ -175,6 +175,10 @@ impl Tool for ShellExecTool {
        })
    }
    fn concurrency(&self) -> ToolConcurrency {
        ToolConcurrency::Exclusive
    }
    async fn execute(&self, input: Value, _context: &ToolContext) -> Result<Value> {
        let command = input["command"].as_str()
            .ok_or_else(|| ZclawError::InvalidInput("Missing 'command' parameter".into()))?;
--- a/crates/zclaw-runtime/src/tool/builtin/task.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/task.rs
@@ -11,7 +11,7 @@ use zclaw_memory::MemoryStore;
 use crate::driver::LlmDriver;
 use crate::loop_runner::{AgentLoop, LoopEvent};
-use crate::tool::{Tool, ToolContext, ToolRegistry};
+use crate::tool::{Tool, ToolContext, ToolRegistry, ToolConcurrency};
 use crate::tool::builtin::register_builtin_tools;
 use std::sync::Arc;
@@ -91,6 +91,10 @@ impl Tool for TaskTool {
        })
    }
    fn concurrency(&self) -> ToolConcurrency {
        ToolConcurrency::Exclusive
    }
    async fn execute(&self, input: Value, context: &ToolContext) -> Result<Value> {
        let description = input["description"].as_str()
            .ok_or_else(|| ZclawError::InvalidInput("Missing 'description' parameter".into()))?;
--- a/crates/zclaw-runtime/src/tool/hand_tool.rs
+++ b/crates/zclaw-runtime/src/tool/hand_tool.rs
@@ -7,7 +7,7 @@ use async_trait::async_trait;
 use serde_json::{json, Value};
 use zclaw_types::Result;
-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};
 /// Wrapper that exposes a Hand as a Tool in the agent's tool registry.
 ///
@@ -78,6 +78,10 @@ impl Tool for HandTool {
        self.input_schema.clone()
    }
    fn concurrency(&self) -> ToolConcurrency {
        ToolConcurrency::Exclusive
    }
    async fn execute(&self, input: Value, context: &ToolContext) -> Result<Value> {
        // Delegate to the HandExecutor (bridged from HandRegistry via kernel).
        // If no hand_executor is available (e.g., standalone runtime without kernel),
--- a/crates/zclaw-types/src/error.rs
+++ b/crates/zclaw-types/src/error.rs
@@ -223,6 +223,33 @@ impl Serialize for ZclawError {
 /// Result type alias for ZCLAW operations
 pub type Result<T> = std::result::Result<T, ZclawError>;
 /// Fine-grained classification of LLM call errors; guides retry and recovery strategy
 #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
 #[serde(rename_all = "snake_case")]
 pub enum LlmErrorKind {
    Auth,
    AuthPermanent,
    BillingExhausted,
    RateLimited,
    Overloaded,
    ServerError,
    Timeout,
    ContextOverflow,
    ModelNotFound,
    Unknown,
 }
 /// A classified LLM error with attached recovery hints
 #[derive(Debug, Clone)]
 pub struct ClassifiedLlmError {
    pub kind: LlmErrorKind,
    pub retryable: bool,
    pub should_compress: bool,
    pub should_rotate_credential: bool,
    pub retry_after: Option<std::time::Duration>,
    pub message: String,
 }
 #[cfg(test)]
 mod tests {
    use super::*;
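The diff adds the error types but not the code that fills them in. As a rough sketch of how the flags might be derived, the function below classifies by HTTP status. Everything about the mapping is an assumption made for illustration: `classify_http_error` is a hypothetical name, and the status-to-kind table is not taken from this repository or from any particular provider's documentation. The types are redeclared without the serde derives so the block stands alone.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LlmErrorKind {
    Auth, AuthPermanent, BillingExhausted, RateLimited, Overloaded,
    ServerError, Timeout, ContextOverflow, ModelNotFound, Unknown,
}

#[derive(Debug, Clone)]
pub struct ClassifiedLlmError {
    pub kind: LlmErrorKind,
    pub retryable: bool,
    pub should_compress: bool,
    pub should_rotate_credential: bool,
    pub retry_after: Option<Duration>,
    pub message: String,
}

/// Hypothetical classifier: the status-to-kind mapping below is an assumption.
pub fn classify_http_error(status: u16, retry_after_secs: Option<u64>, message: &str) -> ClassifiedLlmError {
    use LlmErrorKind::*;
    let kind = match status {
        401 => Auth,
        402 => BillingExhausted,
        403 => AuthPermanent,
        404 => ModelNotFound,
        408 | 504 => Timeout,
        413 => ContextOverflow,
        429 => RateLimited,
        503 | 529 => Overloaded,
        500..=599 => ServerError,
        _ => Unknown,
    };
    ClassifiedLlmError {
        kind,
        // Transient conditions are worth retrying; the rest need a different recovery path.
        retryable: matches!(kind, RateLimited | Overloaded | ServerError | Timeout),
        should_compress: kind == ContextOverflow,
        should_rotate_credential: matches!(kind, Auth | BillingExhausted),
        retry_after: retry_after_secs.map(Duration::from_secs),
        message: message.to_string(),
    }
}
```

Keeping the recovery hints as separate booleans lets a caller act on them without matching on `kind` again, which appears to be the intent of `ClassifiedLlmError`.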
--- a/desktop/src-tauri/src/intelligence/experience.rs
+++ b/desktop/src-tauri/src/intelligence/experience.rs
@@ -16,6 +16,21 @@ use zclaw_types::Result;
 use super::pain_aggregator::PainPoint;
 use super::solution_generator::Proposal;
 /// Brief summary of a stored experience, for suggestion context enrichment.
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct ExperienceBrief {
    pub pain_pattern: String,
    pub solution_summary: String,
    pub reuse_count: u32,
 }
 static EXPERIENCE_EXTRACTOR: std::sync::OnceLock<std::sync::Arc<ExperienceExtractor>> = std::sync::OnceLock::new();
 /// Get the global ExperienceExtractor singleton (if initialized).
 pub(crate) fn get_experience_extractor() -> Option<std::sync::Arc<ExperienceExtractor>> {
    EXPERIENCE_EXTRACTOR.get().cloned()
 }
 // ---------------------------------------------------------------------------
 // Shared completion status
 // ---------------------------------------------------------------------------
@@ -263,6 +278,36 @@ fn xml_escape(s: &str) -> String {
     .replace('>', "&gt;")
 }
 /// Initialize the global ExperienceExtractor singleton.
 /// Called once during app startup, after viking storage is ready.
 pub async fn init_experience_extractor() -> Result<()> {
    let sqlite_storage = crate::viking_commands::get_storage().await
        .map_err(|e| zclaw_types::ZclawError::StorageError(e))?;
    let viking = std::sync::Arc::new(zclaw_growth::VikingAdapter::new(sqlite_storage));
    let store = std::sync::Arc::new(ExperienceStore::new(viking));
    let extractor = std::sync::Arc::new(ExperienceExtractor::new(store));
    EXPERIENCE_EXTRACTOR.set(extractor)
        .map_err(|_| zclaw_types::ZclawError::StorageError("ExperienceExtractor already initialized".into()))?;
    Ok(())
 }
 /// Find experiences relevant to the current conversation for suggestion enrichment.
 #[tauri::command]
 pub async fn experience_find_relevant(
    agent_id: String,
    query: String,
 ) -> std::result::Result<Vec<ExperienceBrief>, String> {
    let extractor = get_experience_extractor()
        .ok_or("ExperienceExtractor not initialized".to_string())?;
    let experiences = extractor.find_relevant_experiences(&agent_id, &query).await;
    Ok(experiences.into_iter().take(3).map(|e| ExperienceBrief {
        pain_pattern: e.pain_pattern,
        solution_summary: e.solution_steps.join("；")
            .chars().take(100).collect(),
        reuse_count: e.reuse_count,
    }).collect())
 }
 // ---------------------------------------------------------------------------
 // Tests
 // ---------------------------------------------------------------------------
@@ -407,4 +452,17 @@ mod tests {
        assert_eq!(truncate("hello", 10), "hello");
        assert_eq!(truncate("这是一个很长的字符串用于测试截断", 10).chars().count(), 11); // 10 + …
    }
    #[test]
    fn test_experience_brief_serialization() {
        let brief = super::ExperienceBrief {
            pain_pattern: "报表生成慢".to_string(),
            solution_summary: "使用 researcher 技能自动收集".to_string(),
            reuse_count: 3,
        };
        let json = serde_json::to_string(&brief).unwrap();
        let parsed: super::ExperienceBrief = serde_json::from_str(&json).unwrap();
        assert_eq!(parsed.pain_pattern, "报表生成慢");
        assert_eq!(parsed.reuse_count, 3);
    }
 }
--- a/desktop/src-tauri/src/intelligence_hooks.rs
+++ b/desktop/src-tauri/src/intelligence_hooks.rs
@@ -7,8 +7,10 @@
 use tracing::{debug, warn};
 use std::collections::HashMap;
 use std::sync::Arc;
 use tauri::Emitter;
 use tokio::sync::RwLock;
 use zclaw_growth::VikingStorage;
 use crate::intelligence::identity::IdentityManagerState;
@@ -16,6 +18,36 @@ use crate::intelligence::heartbeat::HeartbeatEngineState;
 use crate::intelligence::reflection::{MemoryEntryForAnalysis, ReflectionEngineState};
 use zclaw_runtime::driver::LlmDriver;
 // ---------------------------------------------------------------------------
 // Identity prompt cache — avoids mutex + disk I/O on every request
 // ---------------------------------------------------------------------------
 struct CachedIdentity {
    prompt: String,
    #[allow(dead_code)] // Reserved for future TTL-based cache validation
    soul_hash: u64,
 }
 static IDENTITY_CACHE: std::sync::LazyLock<RwLock<HashMap<String, CachedIdentity>>> =
    std::sync::LazyLock::new(|| RwLock::new(HashMap::new()));
 /// Invalidate cached identity prompt for a given agent (call when soul.md changes).
 pub fn invalidate_identity_cache(agent_id: &str) {
    let cache = &*IDENTITY_CACHE;
    // Non-blocking: try_write() does not wait; if the lock is held, the stale entry stays cached
    if let Ok(mut guard) = cache.try_write() {
        guard.remove(agent_id);
    }
 }
 /// Simple hash for cache invalidation — uses string content hash.
 fn content_hash(s: &str) -> u64 {
    use std::hash::{Hash, Hasher};
    let mut hasher = std::collections::hash_map::DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
 }
 /// Run pre-conversation intelligence hooks
 ///
 /// Builds identity-enhanced system prompt (SOUL.md + instructions) and
@@ -29,10 +61,29 @@ pub async fn pre_conversation_hook(
    _user_message: &str,
    identity_state: &IdentityManagerState,
 ) -> Result<String, String> {
-    // Build identity-enhanced system prompt (SOUL.md + instructions)
-    // Memory context is injected by MemoryMiddleware in the kernel middleware chain,
-    // not here, to avoid duplicate injection.
-    let enhanced_prompt = match build_identity_prompt(agent_id, "", identity_state).await {
+    // Check identity prompt cache first (avoids mutex + disk I/O)
+    let cache = &*IDENTITY_CACHE;
+    {
+        let guard = cache.read().await;
+        if let Some(cached) = guard.get(agent_id) {
            // Cache hit — still need continuity context, but skip identity build
            let continuity_context = build_continuity_context(agent_id, _user_message).await;
            let mut result = cached.prompt.clone();
            if !continuity_context.is_empty() {
                result.push_str(&continuity_context);
            }
            debug!("[intelligence_hooks] Identity cache HIT for agent {}", agent_id);
            return Ok(result);
        }
    }
    // Cache miss — build identity prompt and continuity context in parallel
    let (identity_result, continuity_context) = tokio::join!(
        build_identity_prompt_cached(agent_id, "", identity_state, cache),
        build_continuity_context(agent_id, _user_message)
    );
    let enhanced_prompt = match identity_result {
        Ok(prompt) => prompt,
        Err(e) => {
            warn!(
@@ -43,9 +94,6 @@ pub async fn pre_conversation_hook(
        }
    };
-    // Cross-session continuity: check for unresolved pain points and recent experiences
-    let continuity_context = build_continuity_context(agent_id, _user_message).await;
    let mut result = enhanced_prompt;
    if !continuity_context.is_empty() {
        result.push_str(&continuity_context);
@@ -240,6 +288,8 @@ pub async fn post_conversation_hook(
                        warn!("[intelligence_hooks] Failed to update soul with agent name: {}", e);
                    } else {
                        debug!("[intelligence_hooks] Updated agent name to '{}' in soul", name);
                        // Invalidate cache since soul.md changed
                        invalidate_identity_cache(agent_id);
                    }
                }
                drop(manager);
@@ -340,21 +390,34 @@ async fn build_memory_context(
    Ok(context)
 }
-/// Build identity-enhanced system prompt
+/// Build identity-enhanced system prompt and cache the result.
-async fn build_identity_prompt(
+async fn build_identity_prompt_cached(
    agent_id: &str,
    memory_context: &str,
    identity_state: &IdentityManagerState,
    cache: &RwLock<HashMap<String, CachedIdentity>>,
 ) -> Result<String, String> {
    // IdentityManagerState is Arc<tokio::sync::Mutex<AgentIdentityManager>>
    // tokio::sync::Mutex::lock() returns MutexGuard directly
    let mut manager = identity_state.lock().await;
    // Read current soul content for hashing
    let soul_content = manager.get_file(agent_id, crate::intelligence::identity::IdentityFile::Soul);
    let soul_hash = content_hash(&soul_content);
    let prompt = manager.build_system_prompt(
        agent_id,
        if memory_context.is_empty() { None } else { Some(memory_context) },
    ).await;
    // Cache the result
    drop(manager); // Release lock before acquiring write guard
    {
        let mut guard = cache.write().await;
        guard.insert(agent_id.to_string(), CachedIdentity {
            prompt: prompt.clone(),
            soul_hash,
        });
    }
    Ok(prompt)
 }
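In the code above, `soul_hash` is stored but never compared (hence the `dead_code` allowance), so a stale prompt is only evicted through `invalidate_identity_cache`. A sketch of what hash-based validation could look like follows. It is an illustration under assumptions: `IdentityCache`, `get`, and `put` are hypothetical names, and `std::sync::RwLock` replaces the tokio lock so the block is self-contained.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

struct CachedIdentity {
    prompt: String,
    soul_hash: u64,
}

/// Hypothetical cache that validates entries against the current soul content.
pub struct IdentityCache {
    entries: RwLock<HashMap<String, CachedIdentity>>,
}

fn content_hash(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

impl IdentityCache {
    pub fn new() -> Self {
        Self { entries: RwLock::new(HashMap::new()) }
    }

    /// Return the cached prompt only if it was built from the same soul content.
    pub fn get(&self, agent_id: &str, soul: &str) -> Option<String> {
        let guard = self.entries.read().unwrap_or_else(|e| e.into_inner());
        guard
            .get(agent_id)
            .filter(|cached| cached.soul_hash == content_hash(soul))
            .map(|cached| cached.prompt.clone())
    }

    /// Store a prompt together with the hash of the soul content it was built from.
    pub fn put(&self, agent_id: &str, soul: &str, prompt: String) {
        let mut guard = self.entries.write().unwrap_or_else(|e| e.into_inner());
        guard.insert(agent_id.to_string(), CachedIdentity { prompt, soul_hash: content_hash(soul) });
    }
}
```

The trade-off is that validating on every hit requires reading the current soul content, which brings back part of the I/O the cache was added to avoid. Explicit invalidation, as done in `post_conversation_hook`, sidesteps that cost but depends on every writer remembering to call it.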
--- a/desktop/src-tauri/src/lib.rs
+++ b/desktop/src-tauri/src/lib.rs
@@ -212,6 +212,12 @@ pub fn run() {
                if let Err(e) = rt.block_on(intelligence::pain_aggregator::init_pain_storage(pool)) {
                    tracing::error!("[PainStorage] Init failed: {}, pain points will not persist", e);
                }
                // Initialize experience extractor for suggestion enrichment.
                // Graceful degradation: failure does not block app startup.
                if let Err(e) = rt.block_on(intelligence::experience::init_experience_extractor()) {
                    tracing::warn!("[ExperienceExtractor] Init failed: {}, suggestion context will be empty", e);
                }
            }
            Ok(())
@@ -435,6 +441,8 @@ pub fn run() {
            intelligence::pain_aggregator::butler_update_proposal_status,
            // Industry config loader
            viking_commands::viking_load_industry_keywords,
            // Experience finder for suggestion enrichment
            intelligence::experience::experience_find_relevant,
        ])
        .run(tauri::generate_context!())
        .expect("error while running tauri application");
--- a/desktop/src/components/ChatArea.tsx
+++ b/desktop/src/components/ChatArea.tsx
@@ -665,6 +665,28 @@ function stripToolNarration(content: string): string {
  return result || content;
 }
 /**
 * Strip dangling clarification references from text when ask_clarification tool was called.
 * When the LLM calls ask_clarification, it often ends its text with phrases like
 * "比如：" / "以下信息" / "以下选项" that reference the tool output — but the tool output
 * is rendered in a separate ClarificationCard, so these become confusing dead-end sentences.
 */
 function stripDanglingClarificationRef(text: string, hasClarificationTool: boolean): string {
  if (!hasClarificationTool || !text) return text;
  // Match trailing dangling references in Chinese and English
  const patterns = [
    /[，,]\s*可以(?:提供以下|告诉我更多细节，)?(?:信息|选项|方向|细节|分类|类型)[：:]\s*$/,
    /[，,]\s*比如[：:]\s*$/,
    /[，,]\s*(?:例如|譬如|如以下)[：:]\s*$/,
    /,\s*(?:for example|such as|like|the following)[：:]?\s*$/i,
  ];
  for (const pat of patterns) {
    const stripped = text.replace(pat, '');
    if (stripped !== text) return stripped;
  }
  return text;
 }
 function MessageBubble({ message, onRetry }: { message: Message; setInput?: (text: string) => void; onRetry?: () => void }) {
  if (message.role === 'tool') {
    return null;
@@ -749,7 +771,10 @@ function MessageBubble({ message, onRetry }: { message: Message; setInput?: (tex
                ? (isUser
                    ? message.content
                    : <StreamingText
-                        content={stripToolNarration(message.content)}
+                        content={stripDanglingClarificationRef(
                          stripToolNarration(message.content),
                          toolCallSteps?.some(s => s.toolName === 'ask_clarification') ?? false,
                        )}
                        isStreaming={!!message.streaming}
                        className="text-gray-700 dark:text-gray-200"
                      />
--- a/desktop/src/components/ai/ArtifactPanel.tsx
+++ b/desktop/src/components/ai/ArtifactPanel.tsx
@@ -6,9 +6,10 @@ import {
  Image as ImageIcon,
  Download,
  Copy,
-  ChevronLeft,
+  ChevronDown,
  File,
 } from 'lucide-react';
 import { MarkdownRenderer } from './MarkdownRenderer';
 // ---------------------------------------------------------------------------
 // Types
@@ -76,6 +77,7 @@ export function ArtifactPanel({
  className = '',
 }: ArtifactPanelProps) {
  const [viewMode, setViewMode] = useState<'preview' | 'code'>('preview');
  const [fileMenuOpen, setFileMenuOpen] = useState(false);
  const selected = useMemo(
    () => artifacts.find((a) => a.id === selectedId),
    [artifacts, selectedId]
@@ -135,22 +137,59 @@ export function ArtifactPanel({
  return (
    <div className={`h-full flex flex-col ${className}`}>
-      {/* File header */}
+      {/* File header with inline file selector */}
      <div className="px-4 py-2 border-b border-gray-200 dark:border-gray-700 flex items-center gap-2 flex-shrink-0">
        <div className="relative">
          <button
-          onClick={() => onSelect('')}
+            onClick={() => setFileMenuOpen(!fileMenuOpen)}
-          className="p-1 rounded hover:bg-gray-100 dark:hover:bg-gray-700 text-gray-400 hover:text-gray-600 dark:hover:text-gray-200 transition-colors"
+            className="flex items-center gap-1.5 text-sm font-medium text-gray-700 dark:text-gray-200 truncate hover:text-orange-500 transition-colors"
-          title="返回文件列表"
+            title="切换文件"
          >
-          <ChevronLeft className="w-4 h-4" />
-        </button>
            <Icon className="w-4 h-4 text-orange-500 flex-shrink-0" />
-        <span className="text-sm font-medium text-gray-700 dark:text-gray-200 truncate flex-1">
+            <span className="truncate max-w-[120px]">{selected.name}</span>
-          {selected.name}
+            {artifacts.length > 1 && (
              <ChevronDown className={`w-3.5 h-3.5 text-gray-400 transition-transform ${fileMenuOpen ? 'rotate-180' : ''}`} />
            )}
          </button>
          {/* File selector dropdown */}
          {fileMenuOpen && artifacts.length > 1 && (
            <>
              <div className="fixed inset-0 z-10" onClick={() => setFileMenuOpen(false)} />
              <div className="absolute top-full left-0 mt-1 w-56 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg z-20 py-1 max-h-60 overflow-y-auto">
                {artifacts.map((artifact) => {
                  const ItemIcon = getFileIcon(artifact.type);
                  return (
                    <button
                      key={artifact.id}
                      onClick={() => { onSelect(artifact.id); setFileMenuOpen(false); }}
                      className={`w-full flex items-center gap-2 px-3 py-2 text-left text-sm hover:bg-gray-50 dark:hover:bg-gray-700 transition-colors ${
                        artifact.id === selected.id ? 'bg-orange-50 dark:bg-orange-900/20 text-orange-700 dark:text-orange-300' : 'text-gray-700 dark:text-gray-200'
                      }`}
                    >
                      <ItemIcon className="w-4 h-4 flex-shrink-0" />
                      <span className="truncate flex-1">{artifact.name}</span>
                      <span className={`text-[10px] px-1 py-0.5 rounded ${getTypeColor(artifact.type)}`}>
                        {getTypeLabel(artifact.type)}
                      </span>
                    </button>
                  );
                })}
              </div>
            </>
          )}
        </div>
        <div className="flex-1" />
        <span className={`text-[10px] px-1.5 py-0.5 rounded font-medium ${getTypeColor(selected.type)}`}>
          {getTypeLabel(selected.type)}
        </span>
        {selected.language && (
          <span className="text-[10px] text-gray-400 dark:text-gray-500">
            {selected.language}
          </span>
        )}
      </div>
      {/* View mode toggle */}
@@ -180,19 +219,7 @@ export function ArtifactPanel({
      {/* Content area */}
      <div className="flex-1 overflow-y-auto custom-scrollbar p-4">
        {viewMode === 'preview' ? (
-          <div className="prose prose-sm dark:prose-invert max-w-none">
-            {selected.type === 'markdown' ? (
-              <MarkdownPreview content={selected.content} />
-            ) : selected.type === 'code' ? (
-              <pre className="bg-gray-50 dark:bg-gray-800 rounded-lg p-3 text-xs font-mono overflow-x-auto text-gray-700 dark:text-gray-200">
-                {selected.content}
-              </pre>
-            ) : (
-              <pre className="whitespace-pre-wrap text-sm text-gray-700 dark:text-gray-200">
-                {selected.content}
-              </pre>
-            )}
-          </div>
+          <ArtifactContentPreview artifact={selected} />
        ) : (
          <pre className="bg-gray-50 dark:bg-gray-800 rounded-lg p-3 text-xs font-mono overflow-x-auto text-gray-700 dark:text-gray-200 leading-relaxed">
            {selected.content}
@@ -217,6 +244,37 @@ export function ArtifactPanel({
  );
 }
 // ---------------------------------------------------------------------------
 // ArtifactContentPreview — renders artifact based on type
 // ---------------------------------------------------------------------------
 function ArtifactContentPreview({ artifact }: { artifact: ArtifactFile }) {
  if (artifact.type === 'markdown') {
    return <MarkdownRenderer content={artifact.content} />;
  }
  if (artifact.type === 'code') {
    return (
      <div className="relative">
        {artifact.language && (
          <div className="absolute top-2 right-2 text-[10px] text-gray-400 dark:text-gray-500 bg-gray-100 dark:bg-gray-700 px-1.5 py-0.5 rounded">
            {artifact.language}
          </div>
        )}
        <pre className="bg-gray-50 dark:bg-gray-900 rounded-lg p-4 text-xs font-mono overflow-x-auto text-gray-700 dark:text-gray-200 leading-relaxed border border-gray-200 dark:border-gray-700">
          {artifact.content}
        </pre>
      </div>
    );
  }
  return (
    <pre className="whitespace-pre-wrap text-sm text-gray-700 dark:text-gray-200">
      {artifact.content}
    </pre>
  );
 }
 // ---------------------------------------------------------------------------
 // ActionButton
 // ---------------------------------------------------------------------------
@@ -243,50 +301,6 @@ function ActionButton({ icon, label, onClick }: { icon: React.ReactNode; label:
  );
 }
 // ---------------------------------------------------------------------------
 // Simple Markdown preview (no external deps)
 // ---------------------------------------------------------------------------
 function MarkdownPreview({ content }: { content: string }) {
  // Basic markdown rendering: headings, bold, code blocks, lists
  const lines = content.split('\n');
  return (
    <div className="space-y-2">
      {lines.map((line, i) => {
        // Heading
        if (line.startsWith('### ')) {
          return <h3 key={i} className="text-sm font-bold text-gray-800 dark:text-gray-100 mt-3">{line.slice(4)}</h3>;
        }
        if (line.startsWith('## ')) {
          return <h2 key={i} className="text-base font-bold text-gray-800 dark:text-gray-100 mt-4">{line.slice(3)}</h2>;
        }
        if (line.startsWith('# ')) {
          return <h1 key={i} className="text-lg font-bold text-gray-800 dark:text-gray-100">{line.slice(2)}</h1>;
        }
        // Code block (simplified)
        if (line.startsWith('```')) return null;
        // List item
        if (line.startsWith('- ') || line.startsWith('* ')) {
          return <li key={i} className="text-sm text-gray-700 dark:text-gray-300 ml-4">{renderInline(line.slice(2))}</li>;
        }
        // Empty line
        if (!line.trim()) return <div key={i} className="h-2" />;
        // Regular paragraph
        return <p key={i} className="text-sm text-gray-700 dark:text-gray-300 leading-relaxed">{renderInline(line)}</p>;
      })}
    </div>
  );
 }
 function renderInline(text: string): React.ReactNode {
  // Bold
  const parts = text.split(/\*\*(.*?)\*\*/g);
  return parts.map((part, i) =>
    i % 2 === 1 ? <strong key={i} className="font-semibold">{part}</strong> : part
  );
 }
 // ---------------------------------------------------------------------------
 // Download helper
 // ---------------------------------------------------------------------------
--- a/desktop/src/components/ai/MarkdownRenderer.tsx
+++ b/desktop/src/components/ai/MarkdownRenderer.tsx
@@ -0,0 +1,123 @@
 /**
 * MarkdownRenderer — shared Markdown rendering with styled components.
 *
 * Extracted from StreamingText.tsx so ArtifactPanel and other consumers
 * can reuse the same rich rendering (GFM tables, syntax blocks, etc.)
 * without duplicating the component overrides.
 */
 import ReactMarkdown from 'react-markdown';
 import remarkGfm from 'remark-gfm';
 import type { Components } from 'react-markdown';
 // ---------------------------------------------------------------------------
 // Shared component overrides for react-markdown
 // ---------------------------------------------------------------------------
 export const markdownComponents: Components = {
  pre({ children }) {
    return (
      <pre className="bg-gray-50 dark:bg-gray-900 rounded-lg p-4 overflow-x-auto text-sm leading-relaxed border border-gray-200 dark:border-gray-700 my-3">
        {children}
      </pre>
    );
  },
  code({ className, children, ...props }) {
    const isBlock = className?.startsWith('language-');
    if (isBlock) {
      return (
        <code className={`${className || ''} text-gray-800 dark:text-gray-200`} {...props}>
          {children}
        </code>
      );
    }
    return (
      <code className="bg-gray-100 dark:bg-gray-800 text-gray-700 dark:text-gray-300 px-1.5 py-0.5 rounded text-[0.9em] font-mono" {...props}>
        {children}
      </code>
    );
  },
  table({ children }) {
    return (
      <div className="overflow-x-auto my-3 -mx-1">
        <table className="min-w-full border-collapse border border-gray-200 dark:border-gray-700 rounded-lg text-sm">
          {children}
        </table>
      </div>
    );
  },
  thead({ children }) {
    return <thead className="bg-gray-50 dark:bg-gray-800/50">{children}</thead>;
  },
  th({ children }) {
    return (
      <th className="border border-gray-200 dark:border-gray-700 px-3 py-2 text-left font-semibold text-gray-700 dark:text-gray-300">
        {children}
      </th>
    );
  },
  td({ children }) {
    return (
      <td className="border border-gray-200 dark:border-gray-700 px-3 py-2 text-gray-600 dark:text-gray-400">
        {children}
      </td>
    );
  },
  ul({ children }) {
    return <ul className="list-disc list-outside ml-5 my-2 space-y-1">{children}</ul>;
  },
  ol({ children }) {
    return <ol className="list-decimal list-outside ml-5 my-2 space-y-1">{children}</ol>;
  },
  li({ children }) {
    return <li className="leading-relaxed">{children}</li>;
  },
  h1({ children }) {
    return <h1 className="text-xl font-bold mt-5 mb-3 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h1>;
  },
  h2({ children }) {
    return <h2 className="text-lg font-bold mt-4 mb-2 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h2>;
  },
  h3({ children }) {
    return <h3 className="text-base font-semibold mt-3 mb-2 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h3>;
  },
  blockquote({ children }) {
    return (
      <blockquote className="border-l-4 border-gray-300 dark:border-gray-600 pl-4 py-1 my-3 text-gray-600 dark:text-gray-400 italic bg-gray-50 dark:bg-gray-800/30 rounded-r-lg">
        {children}
      </blockquote>
    );
  },
  p({ children }) {
    return <p className="my-2 leading-relaxed first:mt-0 last:mb-0">{children}</p>;
  },
  a({ href, children }) {
    return (
      <a href={href} target="_blank" rel="noopener noreferrer" className="text-blue-600 dark:text-blue-400 underline hover:text-blue-800 dark:hover:text-blue-300">
        {children}
      </a>
    );
  },
  hr() {
    return <hr className="my-4 border-gray-200 dark:border-gray-700" />;
  },
 };
 // ---------------------------------------------------------------------------
 // Convenience wrapper
 // ---------------------------------------------------------------------------
 interface MarkdownRendererProps {
  content: string;
  className?: string;
 }
 export function MarkdownRenderer({ content, className = '' }: MarkdownRendererProps) {
  return (
    <div className={`prose-sm prose-gray dark:prose-invert max-w-none ${className}`}>
      <ReactMarkdown remarkPlugins={[remarkGfm]} components={markdownComponents}>
        {content}
      </ReactMarkdown>
    </div>
  );
 }
--- a/desktop/src/components/ai/StreamingText.tsx
+++ b/desktop/src/components/ai/StreamingText.tsx
@@ -1,7 +1,5 @@
 import { useMemo, useRef, useEffect, useState } from 'react';
-import ReactMarkdown from 'react-markdown';
-import remarkGfm from 'remark-gfm';
-import type { Components } from 'react-markdown';
+import { MarkdownRenderer } from './MarkdownRenderer';
 /**
 * Streaming text with word-by-word reveal animation.
@@ -18,111 +16,6 @@ interface StreamingTextProps {
  asMarkdown?: boolean;
 }
 // ---------------------------------------------------------------------------
 // Markdown component overrides for rich rendering
 // ---------------------------------------------------------------------------
 const markdownComponents: Components = {
  // Code blocks (```...```)
  pre({ children }) {
    return (
      <pre className="bg-gray-50 dark:bg-gray-900 rounded-lg p-4 overflow-x-auto text-sm leading-relaxed border border-gray-200 dark:border-gray-700 my-3">
        {children}
      </pre>
    );
  },
  // Inline code (`...`)
  code({ className, children, ...props }) {
    // If it has a language class, it's inside a code block — render as block
    const isBlock = className?.startsWith('language-');
    if (isBlock) {
      return (
        <code className={`${className || ''} text-gray-800 dark:text-gray-200`} {...props}>
          {children}
        </code>
      );
    }
    return (
      <code className="bg-gray-100 dark:bg-gray-800 text-gray-700 dark:text-gray-300 px-1.5 py-0.5 rounded text-[0.9em] font-mono" {...props}>
        {children}
      </code>
    );
  },
  // Tables
  table({ children }) {
    return (
      <div className="overflow-x-auto my-3 -mx-1">
        <table className="min-w-full border-collapse border border-gray-200 dark:border-gray-700 rounded-lg text-sm">
          {children}
        </table>
      </div>
    );
  },
  thead({ children }) {
    return <thead className="bg-gray-50 dark:bg-gray-800/50">{children}</thead>;
  },
  th({ children }) {
    return (
      <th className="border border-gray-200 dark:border-gray-700 px-3 py-2 text-left font-semibold text-gray-700 dark:text-gray-300">
        {children}
      </th>
    );
  },
  td({ children }) {
    return (
      <td className="border border-gray-200 dark:border-gray-700 px-3 py-2 text-gray-600 dark:text-gray-400">
        {children}
      </td>
    );
  },
  // Unordered lists
  ul({ children }) {
    return <ul className="list-disc list-outside ml-5 my-2 space-y-1">{children}</ul>;
  },
  // Ordered lists
  ol({ children }) {
    return <ol className="list-decimal list-outside ml-5 my-2 space-y-1">{children}</ol>;
  },
  // List items
  li({ children }) {
    return <li className="leading-relaxed">{children}</li>;
  },
  // Headings
  h1({ children }) {
    return <h1 className="text-xl font-bold mt-5 mb-3 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h1>;
  },
  h2({ children }) {
    return <h2 className="text-lg font-bold mt-4 mb-2 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h2>;
  },
  h3({ children }) {
    return <h3 className="text-base font-semibold mt-3 mb-2 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h3>;
  },
  // Blockquotes
  blockquote({ children }) {
    return (
      <blockquote className="border-l-4 border-gray-300 dark:border-gray-600 pl-4 py-1 my-3 text-gray-600 dark:text-gray-400 italic bg-gray-50 dark:bg-gray-800/30 rounded-r-lg">
        {children}
      </blockquote>
    );
  },
  // Paragraphs
  p({ children }) {
    return <p className="my-2 leading-relaxed first:mt-0 last:mb-0">{children}</p>;
  },
  // Links
  a({ href, children }) {
    return (
      <a href={href} target="_blank" rel="noopener noreferrer" className="text-blue-600 dark:text-blue-400 underline hover:text-blue-800 dark:hover:text-blue-300">
        {children}
      </a>
    );
  },
  // Horizontal rules
  hr() {
    return <hr className="my-4 border-gray-200 dark:border-gray-700" />;
  },
 };
 // ---------------------------------------------------------------------------
 // Token splitter for streaming animation
 // ---------------------------------------------------------------------------
@@ -176,13 +69,7 @@ export function StreamingText({
 }: StreamingTextProps) {
  // For completed messages, use full markdown rendering with styled components
  if (!isStreaming && asMarkdown) {
-    return (
-      <div className={`prose-sm prose-gray dark:prose-invert max-w-none ${className}`}>
-        <ReactMarkdown remarkPlugins={[remarkGfm]} components={markdownComponents}>
-          {content}
-        </ReactMarkdown>
-      </div>
-    );
+    return <MarkdownRenderer content={content} className={className} />;
  }
  // For streaming messages, use token-by-token animation
--- a/desktop/src/components/ai/ToolCallChain.tsx
+++ b/desktop/src/components/ai/ToolCallChain.tsx
@@ -166,7 +166,8 @@ interface ToolStepRowProps {
 }
 function ToolStepRow({ step, isActive, showConnector }: ToolStepRowProps) {
-  const [expanded, setExpanded] = useState(false);
+  // Clarification cards default to expanded so users see options immediately
+  const [expanded, setExpanded] = useState(step.toolName === 'ask_clarification');
  const Icon = getToolIcon(step.toolName);
  const label = getToolLabel(step.toolName);
  const isRunning = step.status === 'running';
--- a/desktop/src/components/ai/index.ts
+++ b/desktop/src/components/ai/index.ts
@@ -8,4 +8,5 @@ export { SuggestionChips } from './SuggestionChips';
 export { ResizableChatLayout } from './ResizableChatLayout';
 export { ToolCallChain, type ToolCallStep } from './ToolCallChain';
 export { ArtifactPanel, type ArtifactFile } from './ArtifactPanel';
 export { MarkdownRenderer, markdownComponents } from './MarkdownRenderer';
 export { TokenMeter } from './TokenMeter';
--- a/desktop/src/lib/gateway-client.ts
+++ b/desktop/src/lib/gateway-client.ts
@@ -696,13 +696,14 @@ export class GatewayClient {
        break;
      case 'tool_call':
-        // Tool call event
+        // Tool call start: onTool(name, input, '') — empty output signals start
        if (callbacks.onTool && data.tool) {
-          callbacks.onTool(data.tool, JSON.stringify(data.input || {}), data.output || '');
+          callbacks.onTool(data.tool, JSON.stringify(data.input || {}), '');
        }
        break;
      case 'tool_result':
        // Tool call end: onTool(name, '', output) — empty input signals end
        if (callbacks.onTool && data.tool) {
          callbacks.onTool(data.tool, '', String(data.result || data.output || ''));
        }
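The two comments in this hunk describe an empty-string convention on `onTool`: an empty `output` marks the start of a tool call, and an empty `input` marks its end. A minimal sketch of how a consumer might decode that convention follows. `classifyToolEvent`, `ToolEvent`, and `ToolPhase` are hypothetical names used for illustration and are not part of this commit.

```typescript
// Hypothetical helper, not part of the commit: decodes the empty-string
// convention used by the onTool(name, input, output) callback.
type ToolPhase = 'start' | 'end';

interface ToolEvent {
  phase: ToolPhase;
  tool: string;
  // JSON-encoded input on start, raw tool output on end.
  payload: string;
}

function classifyToolEvent(tool: string, input: string, output: string): ToolEvent {
  // An empty output is read as "the call has just started".
  if (output === '') {
    return { phase: 'start', tool, payload: input };
  }
  // Any non-empty output is read as "the call has finished".
  return { phase: 'end', tool, payload: output };
}

const started = classifyToolEvent('file_write', '{"path":"notes.md"}', '');
const finished = classifyToolEvent('file_write', '', '{"path":"notes.md","content":"# Notes"}');
console.log(started.phase, finished.phase);
```

One consequence of this encoding is that a tool which legitimately returns an empty result cannot be told apart from a start event, so a consumer relying on it may want an explicit phase flag instead.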
--- a/desktop/src/lib/llm-service.ts
+++ b/desktop/src/lib/llm-service.ts
@@ -646,18 +646,25 @@ const HARDCODED_PROMPTS: Record<string, { system: string; user: (arg: string) =>
  },
  suggestions: {
-    system: `你是对话分析助手。根据最近的对话内容，生成 3 个用户可能想继续探讨的问题。
+    system: `你是 ZCLAW 的管家助手，需要站在用户角度思考他们真正需要什么，生成 3 个个性化建议。
-要求：
+## 生成规则
- 每个问题必须与对话内容直接相关，具体且有针对性
+1. 第 1 条 — 深入追问：基于当前话题，提出一个有洞察力的追问，帮助用户深入探索
- 帮助用户深入理解、实际操作或拓展思路
+2. 第 2 条 — 实用行动：建议一个具体的、可操作的下一步（调用技能、执行工具、查看数据等）
- 每个问题不超过 30 个中文字符
+3. 第 3 条 — 管家关怀：
- 不要重复对话中已讨论过的内容
+   - 如果有未解决痛点 → 回访建议，如"上次提到的X，后来解决了吗？"
- 使用与用户相同的语言
+   - 如果有相关经验 → 引导复用，如"上次用X方法解决了类似问题，要再试试吗？"
   - 如果有匹配技能 → 推荐使用，如"试试 [技能名] 来处理这个"
   - 如果没有提供痛点/经验/技能信息 → 给出一个启发性的思考角度
 4. 每个不超过 30 个中文字符
 5. 不要重复对话中已讨论过的内容
 6. 不要生成空泛的建议（如"继续分析"、"换个角度"）
 7. 默认使用中文，不要混入英文词汇（如"workflow"用"工作流"、"report"用"报表"），除非用户在对话中明确使用英文
 8. 建议会被用户直接点击发送，因此不要包含任何称谓（如"领导"、"老板"、"老师"等），用无主语的问句或陈述句
 只输出 JSON 数组，包含恰好 3 个字符串。不要输出任何其他内容。
-示例：["如何在生产环境中部署？", "这个方案的成本如何？", "有没有更简单的替代方案？"]`,
+示例：["科室绩效分析可以按哪些维度拆解？", "用研究技能查一下相关文献？", "上次提到的排班冲突问题，需要继续想解决方案吗？"]`,
-    user: (context: string) => `以下是对话中最近的消息：\n\n${context}\n\n请生成 3 个后续问题。`,
+    user: (context: string) => `以下是对话中最近的消息：\n\n${context}\n\n请生成 3 个后续建议（1 深入追问 + 1 实用行动 + 1 管家关怀）。`,
  },
 };
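The prompt above asks the model for a JSON array of exactly 3 strings and nothing else. Since model output does not always follow such instructions, a caller could validate the reply before showing it. The sketch below is one possible validator. `parseSuggestions` is a hypothetical name, and the commit's actual parsing logic in `llmSuggest` is not shown in this diff.

```typescript
// Hypothetical validator, not part of the commit: accepts a raw model reply
// and returns exactly three non-empty strings, or null if the shape is wrong.
function parseSuggestions(raw: string): string[] | null {
  // Models sometimes wrap the array in prose or code fences,
  // so take the span from the first '[' to the last ']'.
  const start = raw.indexOf('[');
  const end = raw.lastIndexOf(']');
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null;
  }

  if (!Array.isArray(parsed) || parsed.length !== 3) return null;
  const allStrings = parsed.every(
    item => typeof item === 'string' && item.trim().length > 0,
  );
  if (!allStrings) return null;
  return (parsed as string[]).map(item => item.trim());
}

console.log(parseSuggestions('["a", "b", "c"]'));
```

Returning `null` rather than throwing lets the caller fall back to another suggestion source without a `try`/`catch` at every call site.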
--- a/desktop/src/lib/suggestion-context.ts
+++ b/desktop/src/lib/suggestion-context.ts
@@ -0,0 +1,131 @@
 /**
 * Suggestion context enrichment — fetches intelligence data for personalized suggestions.
 * All fetches are optional; failures silently degrade to empty context.
 */
 import { invoke } from '@tauri-apps/api/core';
 import { createLogger } from './logger';
 const log = createLogger('SuggestionContext');
 const CONTEXT_FETCH_TIMEOUT = 500;
 /** Pain point from butler intelligence layer. */
 interface PainPoint {
  summary: string;
  category: string;
  confidence: number;
  status: string;
  occurrence_count: number;
 }
 /** Brief experience from the experience store. */
 interface ExperienceBrief {
  pain_pattern: string;
  solution_summary: string;
  reuse_count: number;
 }
 /** Pipeline/skill match candidate. */
 interface PipelineCandidateInfo {
  id: string;
  display_name: string;
  description: string;
  category: string | null;
  match_reason: string | null;
 }
 /** Route intent response (only NoMatch variant has suggestions). */
 interface RouteResultResponse {
  type: 'Matched' | 'Ambiguous' | 'NoMatch' | 'NeedMoreInfo';
  suggestions?: PipelineCandidateInfo[];
 }
 /** Aggregated suggestion context from all intelligence sources. */
 export interface SuggestionContext {
  userProfile: string;
  painPoints: string;
  experiences: string;
  skillMatch: string;
 }
 function isTauriAvailable(): boolean {
  return typeof window !== 'undefined' && '__TAURI_INTERNALS__' in window;
 }
 function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T | null> {
  return Promise.race([
    promise,
    new Promise<null>(resolve => setTimeout(() => resolve(null), ms)),
  ]);
 }
 async function fetchUserProfile(agentId: string): Promise<string> {
  const profile = await invoke<string>('identity_get_file', {
    agentId,
    file: 'userprofile',
  });
  if (!profile || profile.trim().length === 0) return '';
  const text = profile.trim();
  return text.length > 200 ? text.slice(0, 200) : text;
 }
 async function fetchPainPoints(agentId: string): Promise<string> {
  const points = await invoke<PainPoint[]>('butler_list_pain_points', { agentId });
  if (!Array.isArray(points) || points.length === 0) return '';
  const active = points
    .filter(p => p.confidence >= 0.5 && p.status !== 'Solved' && p.status !== 'Dismissed')
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, 3);
  if (active.length === 0) return '';
  return active
    .map((p, i) => `${i + 1}. [${p.category}] ${p.summary}（出现${p.occurrence_count}次）`)
    .join('\n');
 }
 async function fetchExperiences(agentId: string, query: string): Promise<string> {
  const experiences = await invoke<ExperienceBrief[]>('experience_find_relevant', {
    agentId,
    query,
  });
  if (!Array.isArray(experiences) || experiences.length === 0) return '';
  return experiences.slice(0, 2)
    .map(e => `上次解决"${e.pain_pattern}"的方法：${e.solution_summary}（已复用${e.reuse_count}次）`)
    .join('\n');
 }
 async function fetchSkillMatch(userInput: string): Promise<string> {
  const result = await invoke<RouteResultResponse>('route_intent', { userInput });
  const suggestions = result?.suggestions;
  if (!Array.isArray(suggestions) || suggestions.length === 0) return '';
  const best = suggestions[0];
  return `你可能需要：${best.display_name} — ${best.description}`;
 }
 const EMPTY_CONTEXT: SuggestionContext = { userProfile: '', painPoints: '', experiences: '', skillMatch: '' };
 /**
 * Fetch all intelligence context in parallel for suggestion enrichment.
 * Returns empty strings for any source that fails — never throws.
 */
 export async function fetchSuggestionContext(
  agentId: string,
  lastUserMessage: string,
 ): Promise<SuggestionContext> {
  if (!isTauriAvailable()) {
    return EMPTY_CONTEXT;
  }
  const [userProfile, painPoints, experiences, skillMatch] = await Promise.all([
    withTimeout(fetchUserProfile(agentId).catch(e => { log.warn('User profile fetch failed:', e); return ''; }), CONTEXT_FETCH_TIMEOUT),
    withTimeout(fetchPainPoints(agentId).catch(e => { log.warn('Pain points fetch failed:', e); return ''; }), CONTEXT_FETCH_TIMEOUT),
    withTimeout(fetchExperiences(agentId, lastUserMessage).catch(e => { log.warn('Experiences fetch failed:', e); return ''; }), CONTEXT_FETCH_TIMEOUT),
    withTimeout(fetchSkillMatch(lastUserMessage).catch(e => { log.warn('Skill match fetch failed:', e); return ''; }), CONTEXT_FETCH_TIMEOUT),
  ]);
  return { userProfile: userProfile ?? '', painPoints: painPoints ?? '', experiences: experiences ?? '', skillMatch: skillMatch ?? '' };
 }
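The file above combines a `Promise.race` timeout with per-source `.catch` handlers so that a slow or failing source degrades to an empty string. The sketch below shows a variant of the same idea that also clears the timer once the source settles. `withTimeoutCleared` and `settleOrEmpty` are hypothetical names, not functions from this commit.

```typescript
// Hypothetical variant, not part of the commit: resolves to null after `ms`
// milliseconds, and clears the timer as soon as the source promise settles.
function withTimeoutCleared<T>(promise: Promise<T>, ms: number): Promise<T | null> {
  return new Promise<T | null>((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), ms);
    promise.then(
      value => {
        clearTimeout(timer);
        resolve(value);
      },
      err => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Collapses both failure modes (timeout and rejection) into an empty string,
// mirroring the "never throws" contract described in the doc comment above.
async function settleOrEmpty(source: Promise<string>, ms: number): Promise<string> {
  try {
    return (await withTimeoutCleared(source, ms)) ?? '';
  } catch {
    return '';
  }
}

settleOrEmpty(Promise.resolve('profile text'), 500).then(text => console.log(text));
```

Clearing the timer is optional with a 500 ms budget, but it avoids leaving a pending timer behind for every call that finishes early.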
--- a/desktop/src/store/chat/artifactStore.ts
+++ b/desktop/src/store/chat/artifactStore.ts
@@ -1,13 +1,13 @@
 /**
- * ArtifactStore — manages the artifact panel state.
+ * ArtifactStore — manages the artifact panel state with IndexedDB persistence.
 *
 * Extracted from chatStore.ts as part of the structured refactor.
- * This store has zero external dependencies — the simplest slice to extract.
+ * Uses zustand/middleware persist + idb-storage for persistence across refreshes.
 *
 * @see docs/superpowers/specs/2026-04-02-chatstore-refactor-design.md §3.5
 */
 import { create } from 'zustand';
 import { persist, createJSONStorage } from 'zustand/middleware';
 import { createIdbStorageAdapter } from '../../lib/idb-storage';
 import type { ArtifactFile } from '../../components/ai/ArtifactPanel';
 // ---------------------------------------------------------------------------
@@ -33,7 +33,9 @@ export interface ArtifactState {
 // Store
 // ---------------------------------------------------------------------------
-export const useArtifactStore = create<ArtifactState>()((set) => ({
+export const useArtifactStore = create<ArtifactState>()(
  persist(
    (set) => ({
      artifacts: [],
      selectedArtifactId: null,
      artifactPanelOpen: false,
@@ -51,4 +53,13 @@ export const useArtifactStore = create<ArtifactState>()((set) => ({
      clearArtifacts: () =>
        set({ artifacts: [], selectedArtifactId: null, artifactPanelOpen: false }),
-}));
+    }),
    {
      name: 'zclaw-artifact-storage',
      storage: createJSONStorage(() => createIdbStorageAdapter()),
      partialize: (state) => ({
        artifacts: state.artifacts,
      }),
    },
  ),
 );
--- a/desktop/src/store/chat/streamStore.ts
+++ b/desktop/src/store/chat/streamStore.ts
@@ -34,11 +34,16 @@ import {
 } from './conversationStore';
 import { useMessageStore } from './messageStore';
 import { useArtifactStore } from './artifactStore';
-import { llmSuggest } from '../../lib/llm-service';
+import { llmSuggest, LLM_PROMPTS } from '../../lib/llm-service';
 import { detectNameSuggestion, detectAgentNameSuggestion } from '../../lib/cold-start-mapper';
 import { fetchSuggestionContext, type SuggestionContext } from '../../lib/suggestion-context';
 const log = createLogger('StreamStore');
 // Module-level prefetch for suggestion context — started during streaming,
 // consumed on stream completion. Saves ~0.5-1s vs fetching after stream ends.
 let _activeSuggestionContextPrefetch: Promise<SuggestionContext> | null = null;
 // ---------------------------------------------------------------------------
 // Error formatting — convert raw LLM/API errors to user-friendly messages
 // ---------------------------------------------------------------------------
@@ -214,6 +219,67 @@ class DeltaBuffer {
  }
 }
 // ---------------------------------------------------------------------------
 // Artifact creation from tool output (shared between sendMessage & agent stream)
 // ---------------------------------------------------------------------------
 const ARTIFACT_TYPE_MAP: Record<string, 'code' | 'markdown' | 'text' | 'table' | 'image'> = {
  ts: 'code', tsx: 'code', js: 'code', jsx: 'code',
  py: 'code', rs: 'code', go: 'code', java: 'code',
  md: 'markdown', txt: 'text', json: 'code',
  html: 'code', css: 'code', sql: 'code', sh: 'code',
  yaml: 'code', yml: 'code', toml: 'code', xml: 'code',
  csv: 'table', svg: 'image',
 };
 const ARTIFACT_LANG_MAP: Record<string, string> = {
  ts: 'typescript', tsx: 'typescript', js: 'javascript', jsx: 'javascript',
  py: 'python', rs: 'rust', go: 'go', java: 'java',
  html: 'html', css: 'css', sql: 'sql', sh: 'bash',
  json: 'json', yaml: 'yaml', yml: 'yaml', toml: 'toml',
  xml: 'xml', csv: 'csv', md: 'markdown', txt: 'text',
 };
 /** Attempt to create an artifact from a completed tool call. */
 function tryCreateArtifactFromToolOutput(toolName: string, toolInput: string, toolOutput: string): void {
  if (!toolOutput) return;
  const toolsWithArtifacts = ['file_write', 'write_file', 'str_replace', 'str_replace_editor'];
  if (!toolsWithArtifacts.includes(toolName)) return;
  try {
    const parsed = JSON.parse(toolOutput);
    const filePath = parsed?.path || parsed?.file_path || '';
    let content = parsed?.content || '';
    // For str_replace tools, content may be in input
    if (!content && toolInput) {
      try {
        const inputParsed = JSON.parse(toolInput);
        content = inputParsed?.new_text || inputParsed?.content || '';
      } catch { /* ignore */ }
    }
    if (!filePath || !content) return;
    // Deduplicate: skip if an artifact with the same file name already exists
    const existing = useArtifactStore.getState().artifacts;
    if (existing.some(a => a.name === filePath.split('/').pop())) return;
    const fileName = filePath.split('/').pop() || filePath;
    const ext = fileName.split('.').pop()?.toLowerCase() || '';
    useArtifactStore.getState().addArtifact({
      id: `artifact_${Date.now()}`,
      name: fileName,
      content: typeof content === 'string' ? content : JSON.stringify(content, null, 2),
      type: ARTIFACT_TYPE_MAP[ext] || 'text',
      language: ARTIFACT_LANG_MAP[ext],
      createdAt: new Date(),
    });
  } catch { /* non-critical: artifact creation from tool output */ }
 }
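`tryCreateArtifactFromToolOutput` derives the artifact name with `filePath.split('/').pop()` and then looks the extension up in `ARTIFACT_TYPE_MAP`. Given that this change set also deals with Windows paths, the sketch below shows how the same derivation could accept both `/` and `\` separators. `baseName`, `classifyArtifact`, and the trimmed `KIND_BY_EXT` table are hypothetical and only illustrate the lookup, they are not the commit's code.

```typescript
// Hypothetical sketch, not part of the commit: derive a file name and
// artifact kind from a path that may use either '/' or '\' as separator.
type ArtifactKind = 'code' | 'markdown' | 'text' | 'table' | 'image';

// Trimmed-down stand-in for ARTIFACT_TYPE_MAP.
const KIND_BY_EXT: Record<string, ArtifactKind> = {
  ts: 'code',
  md: 'markdown',
  txt: 'text',
  csv: 'table',
  svg: 'image',
};

function baseName(filePath: string): string {
  const parts = filePath.split(/[\\/]/);
  // Fall back to the whole input when the path ends with a separator.
  return parts[parts.length - 1] || filePath;
}

function classifyArtifact(filePath: string): { name: string; ext: string; kind: ArtifactKind } {
  const name = baseName(filePath);
  const dot = name.lastIndexOf('.');
  // Names without a dot, or starting with one, are treated as extensionless.
  const ext = dot > 0 ? name.slice(dot + 1).toLowerCase() : '';
  return { name, ext, kind: KIND_BY_EXT[ext] || 'text' };
}

console.log(classifyArtifact('C:\\work\\notes\\README.md'));
console.log(classifyArtifact('src/Makefile'));
```

Unlike `fileName.split('.').pop()`, this version does not treat a dot-less name such as `Makefile` as its own extension, which keeps the lookup key empty for such files.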
 // ---------------------------------------------------------------------------
 // Stream event handlers (extracted from sendMessage)
 // ---------------------------------------------------------------------------
@@ -236,38 +302,8 @@ function createToolHandler(assistantId: string, chat: ChatStoreAccess) {
        })
      );
-      // Auto-create artifact when file_write tool produces output
-      if (tool === 'file_write') {
-        try {
-          const parsed = JSON.parse(output);
-          const filePath = parsed?.path || parsed?.file_path || '';
-          const content = parsed?.content || '';
-          if (filePath && content) {
-            const fileName = filePath.split('/').pop() || filePath;
-            const ext = fileName.split('.').pop()?.toLowerCase() || '';
-            const typeMap: Record<string, 'code' | 'markdown' | 'text'> = {
-              ts: 'code', tsx: 'code', js: 'code', jsx: 'code',
-              py: 'code', rs: 'code', go: 'code', java: 'code',
-              md: 'markdown', txt: 'text', json: 'code',
-              html: 'code', css: 'code', sql: 'code', sh: 'code',
-            };
-            const langMap: Record<string, string> = {
-              ts: 'typescript', tsx: 'typescript', js: 'javascript', jsx: 'javascript',
-              py: 'python', rs: 'rust', go: 'go', java: 'java',
-              html: 'html', css: 'css', sql: 'sql', sh: 'bash', json: 'json',
-            };
-            useArtifactStore.getState().addArtifact({
-              id: `artifact_${Date.now()}`,
-              name: fileName,
-              content: typeof content === 'string' ? content : JSON.stringify(content, null, 2),
-              type: typeMap[ext] || 'text',
-              language: langMap[ext],
-              createdAt: new Date(),
-              sourceStepId: assistantId,
-            });
-          }
-        } catch { /* non-critical: artifact creation from tool output */ }
-      }
+      // Auto-create artifact from tool output
+      tryCreateArtifactFromToolOutput(tool, input, output);
    } else {
      // toolStart: create new running step
      const step: ToolCallStep = {
@@ -399,37 +435,51 @@ function createCompleteHandler(
      }
    }
-    // Async memory extraction (independent — failures don't block name detection)
+    // Decoupled: suggestion generation runs immediately with prefetched context,
    // memory extraction + reflection run independently in background.
    const filtered = msgs
      .filter(m => m.role === 'user' || m.role === 'assistant')
      .map(m => ({ role: m.role, content: m.content }));
    const convId = useConversationStore.getState().currentConversationId;
-    getMemoryExtractor().extractFromConversation(filtered, agentId, convId ?? undefined)
-      .catch(err => log.warn('Memory extraction failed:', err));
-    intelligenceClient.reflection.recordConversation().catch(err => {
-      log.warn('Recording conversation failed:', err);
-    });
-    intelligenceClient.reflection.shouldReflect().then(shouldReflect => {
-      if (shouldReflect) {
-        intelligenceClient.reflection.reflect(agentId, []).catch(err => {
-          log.warn('Reflection failed:', err);
-        });
-      }
-    });
+    // Build conversation messages for suggestions
    // Follow-up suggestions (async LLM call with keyword fallback)
    const latestMsgs = chat.getMessages() || [];
    const conversationMessages = latestMsgs
      .filter(m => m.role === 'user' || m.role === 'assistant')
      .filter(m => !m.streaming)
      .map(m => ({ role: m.role, content: m.content }));
-    generateLLMSuggestions(conversationMessages, set).catch(err => {
+    // Consume prefetched context (started in sendMessage during streaming)
    const prefetchPromise = _activeSuggestionContextPrefetch;
    _activeSuggestionContextPrefetch = null;
    // Fire suggestion generation immediately — don't wait for memory extraction
    const fireSuggestions = (ctx?: SuggestionContext) => {
      generateLLMSuggestions(conversationMessages, set, ctx).catch(err => {
        log.warn('Suggestion generation error:', err);
        set({ suggestionsLoading: false });
      });
    };
    if (prefetchPromise) {
      prefetchPromise.then(fireSuggestions).catch(() => fireSuggestions());
    } else {
      fireSuggestions();
    }
    // Background tasks run independently — never block suggestions
    getMemoryExtractor().extractFromConversation(filtered, agentId, convId ?? undefined)
      .catch(err => log.warn('Memory extraction failed:', err));
    intelligenceClient.reflection.recordConversation()
      .catch(err => log.warn('Recording conversation failed:', err))
      .then(() => intelligenceClient.reflection.shouldReflect())
      .then(shouldReflect => {
        if (shouldReflect) {
          intelligenceClient.reflection.reflect(agentId, []).catch(err => {
            log.warn('Reflection failed:', err);
          });
        }
      }).catch(() => {});
  };
 }
 export interface StreamState {
@@ -559,15 +609,32 @@ function parseSuggestionResponse(raw: string): string[] {
 async function generateLLMSuggestions(
  messages: Array<{ role: string; content: string }>,
  set: (partial: Partial<StreamState>) => void,
  context?: SuggestionContext,
 ): Promise<void> {
  set({ suggestionsLoading: true });
  try {
-    const recentMessages = messages.slice(-6);
+    const recentMessages = messages.slice(-20);
-    const context = recentMessages
+    const conversationContext = recentMessages
-      .map(m => `${m.role === 'user' ? '用户' : '助手'}: ${m.content}`)
+      .map(m => `${m.role === 'user' ? '用户' : '助手'}: ${m.content.slice(0, 200)}`)
      .join('\n\n');
    // Build dynamic user message with intelligence context
    const ctx = context ?? { userProfile: '', painPoints: '', experiences: '', skillMatch: '' };
    const hasContext = ctx.userProfile || ctx.painPoints || ctx.experiences || ctx.skillMatch;
    let userMessage: string;
    if (hasContext) {
      const sections: string[] = ['以下是用户的背景信息，请在生成建议时参考：\n'];
      if (ctx.userProfile) sections.push(`## 用户画像\n${ctx.userProfile}`);
      if (ctx.painPoints) sections.push(`## 活跃痛点\n${ctx.painPoints}`);
      if (ctx.experiences) sections.push(`## 相关经验\n${ctx.experiences}`);
      if (ctx.skillMatch) sections.push(`## 可用技能\n${ctx.skillMatch}`);
      sections.push(`\n最近对话：\n${conversationContext}`);
      userMessage = sections.join('\n\n');
    } else {
      userMessage = `以下是对话中最近的消息：\n\n${conversationContext}\n\n请生成 3 个后续问题。`;
    }
    const connectionMode = typeof localStorage !== 'undefined'
      ? localStorage.getItem('zclaw-connection-mode')
      : null;
@@ -575,9 +642,9 @@ async function generateLLMSuggestions(
    let raw: string;
    if (connectionMode === 'saas') {
-      raw = await llmSuggestViaSaaS(context);
+      raw = await llmSuggestViaSaaS(userMessage);
    } else {
-      raw = await llmSuggest(context);
+      raw = await llmSuggest(userMessage);
    }
    const suggestions = parseSuggestionResponse(raw);
@@ -601,7 +668,7 @@ async function generateLLMSuggestions(
 * with non-streaming requests. Collects the full response from SSE deltas,
 * then parses the suggestion JSON from the accumulated text.
 */
-async function llmSuggestViaSaaS(context: string): Promise<string> {
+async function llmSuggestViaSaaS(userMessage: string): Promise<string> {
  const { saasClient } = await import('../../lib/saas-client');
  const { useConversationStore } = await import('./conversationStore');
  const { useSaaSStore } = await import('../saasStore');
@@ -611,9 +678,6 @@ async function llmSuggestViaSaaS(context: string): Promise<string> {
  const model = currentModel || (availableModels.length > 0 ? availableModels[0]?.id : undefined);
  if (!model) throw new Error('No model available for suggestions');
  // Delay to avoid concurrent relay requests with memory extraction
  await new Promise(r => setTimeout(r, 2000));
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 60000);
@@ -623,7 +687,7 @@ async function llmSuggestViaSaaS(context: string): Promise<string> {
        model,
        messages: [
          { role: 'system', content: LLM_PROMPTS_SYSTEM },
-          { role: 'user', content: `以下是对话中最近的消息：\n\n${context}\n\n请生成 3 个后续问题。` },
+          { role: 'user', content: userMessage },
        ],
        max_tokens: 500,
        temperature: 0.7,
@@ -664,17 +728,7 @@ async function llmSuggestViaSaaS(context: string): Promise<string> {
  }
 }
-const LLM_PROMPTS_SYSTEM = `你是对话分析助手。根据最近的对话内容，生成 3 个用户可能想继续探讨的问题。
+const LLM_PROMPTS_SYSTEM = LLM_PROMPTS.suggestions.system;
 要求：
 - 每个问题必须与对话内容直接相关，具体且有针对性
 - 帮助用户深入理解、实际操作或拓展思路
 - 每个问题不超过 30 个中文字符
 - 不要重复对话中已讨论过的内容
 - 使用与用户相同的语言
 只输出 JSON 数组，包含恰好 3 个字符串。不要输出任何其他内容。
 示例：["如何在生产环境中部署？", "这个方案的成本如何？", "有没有更简单的替代方案？"]`;
 // ---------------------------------------------------------------------------
 // ChatStore injection (avoids circular imports)
@@ -786,6 +840,9 @@ export const useStreamStore = create<StreamState>()(
    });
    set({ isStreaming: true, activeRunId: null });
    // Prefetch suggestion context during streaming — saves ~0.5-1s post-stream
    _activeSuggestionContextPrefetch = fetchSuggestionContext(agentId, content);
    // Delta buffer — batches updates at ~60fps
    const buffer = new DeltaBuffer(assistantId, _chat);
@@ -1001,6 +1058,13 @@ export const useStreamStore = create<StreamState>()(
              return { ...m, toolSteps: steps };
            })
          );
          // Auto-create artifact from tool output (agent stream path)
          tryCreateArtifactFromToolOutput(
            delta.tool || 'unknown',
            delta.toolInput || '',
            delta.toolOutput,
          );
        } else {
          // toolStart: create new running step
          const step: ToolCallStep = {
@@ -1059,10 +1123,20 @@ export const useStreamStore = create<StreamState>()(
              .filter(m => !m.streaming)
              .map(m => ({ role: m.role, content: m.content }));
-            generateLLMSuggestions(conversationMessages, set).catch(err => {
+            // Path B: use prefetched context for agent stream — fixes zero-personalization
            const prefetchPromise = _activeSuggestionContextPrefetch;
            _activeSuggestionContextPrefetch = null;
            const fireSuggestions = (ctx?: SuggestionContext) => {
              generateLLMSuggestions(conversationMessages, set, ctx).catch(err => {
                log.warn('Suggestion generation error:', err);
                set({ suggestionsLoading: false });
              });
            };
            if (prefetchPromise) {
              prefetchPromise.then(fireSuggestions).catch(() => fireSuggestions());
            } else {
              fireSuggestions();
            }
          }
        }
      } else if (delta.stream === 'hand') {
--- a/docs/references/artifact-system-reference.md
+++ b/docs/references/artifact-system-reference.md
@@ -0,0 +1,309 @@
 # Artifact System Reference
 > A survey of how DeerFlow and Hermes Agent implement artifacts and output panels, as a reference for reworking the ZCLAW artifact system.
 > Analysis date: 2026-04-24
 ---
 ## 1. DeerFlow Artifact System
 DeerFlow has a complete full-stack artifact pipeline and is the primary reference.
 ### 1.1 End-to-end data flow
 ```
 Agent tool call (write_file / str_replace / present_files)
    ↓
 Backend: ThreadState.artifacts (LangGraph annotated list, deduplicated by the merge_artifacts reducer)
    ↓ file write: {base_dir}/threads/{thread_id}/user-data/outputs/
    ↓ virtual path: /mnt/user-data/outputs/filename.ext
    ↓
 Backend API: GET /api/threads/{thread_id}/artifacts/{virtual_path}
    ↓ MIME detection / .skill ZIP extraction / download vs inline
    ↓
 Frontend: thread.values.artifacts (string[]) → ArtifactsProvider context
    ↓
 ChatBox (ResizablePanelGroup) → chat(60%) | artifact panel(40%)
    ↓
 ArtifactFileDetail → CodeMirror (code) / Streamdown (Markdown) / iframe (HTML)
 ```
 ### 1.2 Key files
 #### Frontend core
 | File | Responsibility |
 |------|------|
 | `frontend/src/core/artifacts/utils.ts` | URL building, artifact list extraction, path parsing |
 | `frontend/src/core/artifacts/loader.ts` | Fetches artifact text from the backend API; extracts content directly from tool call args |
 | `frontend/src/core/artifacts/hooks.ts` | TanStack React Query hook, 5-minute cache |
 | `frontend/src/components/workspace/artifacts/context.tsx` | ArtifactsProvider + useArtifacts(): manages the list, selection, open state, and auto-select |
 | `frontend/src/components/workspace/artifacts/artifact-file-detail.tsx` | Artifact detail view: header (file picker + code/preview toggle) + CodeEditor/Preview |
 | `frontend/src/components/workspace/artifacts/artifact-file-list.tsx` | Card-style list view; each card has an icon, name, extension, and download/install buttons |
 | `frontend/src/components/workspace/artifacts/artifact-trigger.tsx` | Header trigger button, shown only when artifacts exist |
 #### Frontend rendering
 | File | Responsibility |
 |------|------|
 | `frontend/src/components/workspace/code-editor.tsx` | Read-only CodeMirror editor with syntax highlighting for CSS/HTML/JS/JSON/MD/Python |
 | `frontend/src/components/ai-elements/code-block.tsx` | Shiki syntax-highlighted code block, dual theme (light/dark) |
 | `frontend/src/components/ai-elements/web-preview.tsx` | iframe web preview with an address bar and navigation buttons |
 | `frontend/src/components/workspace/messages/markdown-content.tsx` | Streamdown Markdown rendering (GFM + Math + Raw HTML + KaTeX) |
 | `frontend/src/core/utils/files.tsx` | 140+ extension-to-language mappings, file icon and type detection |
 #### Backend
 | File | Responsibility |
 |------|------|
 | `backend/.../thread_state.py` | ThreadState.artifacts list + merge_artifacts deduplicating reducer |
 | `backend/.../present_file_tool.py` | present_files tool: normalizes paths, returns Command(update) |
 | `backend/.../paths.py` | Path management: threads/{id}/user-data/{workspace,uploads,outputs} |
 | `backend/app/gateway/routers/artifacts.py` | FastAPI route: GET artifact file, MIME detection, security handling |
 ### 1.3 Supported content types
 | Type | Rendering |
 |------|----------|
 | Code files (140+ extensions) | Read-only CodeMirror + syntax highlighting |
 | Markdown (.md) | Streamdown (GFM + Math + KaTeX + Raw HTML) |
 | HTML (.html/.htm) | Sandboxed `<iframe>` (srcDoc) |
 | Images (.png/.jpg/.svg/.webp) | `<img>` tag; other non-code files are shown in an iframe |
 | .skill archives | ZIP extraction, SKILL.md rendered as Markdown |
 | Binary files (PDF, etc.) | Served by the backend with an inline Content-Disposition |
 | Text files (.txt/.csv/.log) | CodeMirror in plain-text mode |
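 ### 1.4 Persistence architecture
 **Disk storage:**
 ```
 {DEER_FLOW_HOME}/threads/{thread_id}/user-data/outputs/
 ```
 **State persistence:** the artifacts list is part of the LangGraph ThreadState and is persisted automatically by the checkpoint system.
 **Frontend cache:** TanStack React Query with a 5-minute stale time.
 ### 1.5 UI/UX design patterns
 #### Split layout (chat-box.tsx)
 - Horizontal split using `react-resizable-panels`
 - Closed: chat=100%, artifacts=0%
 - Open: chat=60%, artifacts=40%
 - 300ms CSS transition
 #### Auto-open + auto-select
 - When a `write_file` / `str_replace` tool call is detected, the panel opens and the file is selected automatically
 - The `autoOpen` / `autoSelect` flags keep the panel from reopening after the user has closed it manually
 #### Code/preview toggle
 - HTML/Markdown default to Preview; everything else defaults to Code
 - Preview uses Streamdown (MD) or an iframe (HTML)
 #### Header action bar
 - File picker dropdown (switch files without going back to the list)
 - Copy / download / open in new window / close
 #### Inline display in chat
 - `present_files` tool call → a card grid rendered inside the chat stream
 - Clicking a card → opens that file in the side panel
 #### Dual-path scheme
 1. **Real file path**: fetched from the backend API, cached by React Query
 2. **`write-file:` virtual path**: content is extracted directly from the tool call args, with no backend request, which supports streaming display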
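
A rough TypeScript sketch of the dual-path lookup described above. The `ToolCall` shape, the exact handling of the `write-file:` prefix, and both function names are assumptions made for illustration; only the `GET /api/threads/{thread_id}/artifacts/{virtual_path}` route is taken from the data flow in 1.1.

```typescript
// Hypothetical shapes: DeerFlow's real types and path format may differ.
interface ToolCall {
  name: string;
  args: { path?: string; content?: string };
}

type ArtifactSource =
  | { kind: "inline"; content: string } // taken from tool call args, no request needed
  | { kind: "remote"; url: string };    // has to be fetched from the backend

const VIRTUAL_PREFIX = "write-file:"; // assumed prefix, per the list above

function buildArtifactUrl(threadId: string, virtualPath: string): string {
  const trimmed = virtualPath.replace(/^\/+/, "");
  return `/api/threads/${encodeURIComponent(threadId)}/artifacts/${trimmed}`;
}

function resolveArtifact(path: string, threadId: string, toolCalls: ToolCall[]): ArtifactSource {
  if (!path.startsWith(VIRTUAL_PREFIX)) {
    return { kind: "remote", url: buildArtifactUrl(threadId, path) };
  }
  const realPath = path.slice(VIRTUAL_PREFIX.length);
  // Walk newest-first so the latest write to the same path wins.
  for (let i = toolCalls.length - 1; i >= 0; i--) {
    const call = toolCalls[i];
    if (call && call.name === "write_file" && call.args.path === realPath
        && typeof call.args.content === "string") {
      return { kind: "inline", content: call.args.content };
    }
  }
  // No matching tool call in memory: fall back to the backend copy.
  return { kind: "remote", url: buildArtifactUrl(threadId, realPath) };
}
```

The inline branch is what would make streaming display possible in such a design: the content is already present in the tool call args while the file is still being written.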
 ### 1.6 Provider hierarchy
 ```
 ArtifactsProvider → provides the useArtifacts() context
  ChatBox → ResizablePanelGroup
    Panel(chat) → MessageList → ToolCall auto-opens the artifact panel
    Panel(artifacts) → ArtifactFileDetail → useArtifactContent() hook
 ```
 ---
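 ## 2. Hermes Agent Artifact Mechanism
 > **Conclusion: Hermes Agent has no artifact panel, no web frontend, and no split layout.** It is a terminal CLI tool that renders all output inline in the terminal. Its handling of large outputs is nonetheless worth borrowing.
 ### 2.1 Project positioning
 Hermes Agent is a **Python CLI/TUI agent** (similar to Claude Code) that runs through a prompt_toolkit TUI and also supports gateways for IM platforms such as Telegram, Discord, Slack, and WhatsApp.
 **No React/Next.js/web UI.** It exposes an OpenAI-compatible API so that third-party UIs such as Open WebUI or LobeChat can connect.
 ### 2.2 Large-output handling (three layers of defense)
 This is the only mechanism that comes close to "artifact management", and it is worth borrowing.
 **File: `tools/tool_result_storage.py`**
 | Layer | Mechanism | Description |
 |------|------|------|
 | Layer 1 | Per-tool truncation | Each tool limits the length of its own output |
 | Layer 2 | `maybe_persist_tool_result` | A single result over the threshold is written to a sandbox temp file and replaced in context by a `<persisted-output>` preview block |
 | Layer 3 | `enforce_turn_budget` | When a whole turn exceeds 200K characters, the largest results are spilled to disk |
 Core logic:
 ```python
 # Over the threshold: write the full content to a file and put a preview in context
 remote_path = f"{storage_dir}/{tool_use_id}.txt"
 _write_to_sandbox(content, remote_path, env)
 return _build_persisted_message(preview, has_more, len(content), remote_path)
 # The agent can later read the full content with read_file + offset/limit
 ```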
 ### 2.3 Budget configuration
 **File: `tools/budget_config.py`**
 | Parameter | Default |
 |------|--------|
 | `DEFAULT_RESULT_SIZE_CHARS` | 100,000 (per-tool threshold) |
 | `DEFAULT_TURN_BUDGET_CHARS` | 200,000 (per-turn cap) |
 | `DEFAULT_PREVIEW_SIZE_CHARS` | 1,500 (inline preview length) |
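
To make the Layer 2 behaviour concrete, here is a minimal TypeScript sketch of a "persist when over threshold" step. The two defaults are copied from the table above; the function name, the injected `writeFile` callback, and the attribute layout of the `<persisted-output>` block are assumptions for illustration and are not Hermes's actual format.

```typescript
// Defaults from the budget table; everything else here is illustrative.
const DEFAULT_RESULT_SIZE_CHARS = 100000;
const DEFAULT_PREVIEW_SIZE_CHARS = 1500;

interface PersistedResult {
  inlineText: string;           // what goes into the model context
  persistedPath: string | null; // where the full content was written, if anywhere
}

// `writeFile` is injected so the sketch stays independent of any storage backend.
function maybePersistToolResult(
  toolUseId: string,
  content: string,
  storageDir: string,
  writeFile: (path: string, data: string) => void,
  threshold: number = DEFAULT_RESULT_SIZE_CHARS,
  previewSize: number = DEFAULT_PREVIEW_SIZE_CHARS,
): PersistedResult {
  if (content.length <= threshold) {
    return { inlineText: content, persistedPath: null };
  }
  const path = `${storageDir}/${toolUseId}.txt`;
  writeFile(path, content);
  const preview = content.slice(0, previewSize);
  // Assumed block layout: a path, the total size, then the preview text.
  const inlineText =
    `<persisted-output path="${path}" total_chars="${content.length}">\n` +
    `${preview}\n` +
    `</persisted-output>`;
  return { inlineText, persistedPath: path };
}
```

A result under the threshold passes through untouched, so the common case costs nothing; only oversized results pay for a file write and lose their tail in context.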
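 ### 2.4 CLI rendering
 **File: `agent/display.py`**
 - **Tool progress**: KawaiiSpinner animation + a one-line summary
 - **File edits**: inline colored unified diff (write_file / patch tools)
 - **Final response**: wrapped in a Rich Panel border, with a switchable theme color (7 skins)
 ### 2.5 Session persistence
 **File: `hermes_state.py`**
 SQLite (`~/.hermes/state.db`) + FTS5 full-text search:
 - sessions table: metadata, model config, token counts, cost, title
 - messages table: role, content, tool_call_id, reasoning, timestamp
 ### 2.6 Ideas worth borrowing
 | Idea | Value |
 |----|----------|
 | Spilling large output to disk + inline preview | Addresses context window overflow |
 | Three progressive layers of defense | Useful reference for the ZCLAW middleware chain |
 | Configurable budgets | Thresholds are tunable, so different scenarios can use different policies |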
 ---
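 ## 3. Comparison: ZCLAW Today vs. the Reference Implementations
 ### 3.1 Current gaps
 | Dimension | DeerFlow | ZCLAW today | Gap |
 |------|----------|------------|------|
 | Data source | 3 tools (present_files/write_file/str_replace) register artifacts explicitly | Only streamStore parsing `filePath` from tool output | Very narrow, almost never triggers |
 | Persistence | Disk files + LangGraph checkpoint | In-memory Zustand only | Lost on refresh |
 | Rendering: code | Read-only CodeMirror + syntax highlighting (140+ languages) | Plain `<pre>` tag | No highlighting |
 | Rendering: Markdown | Streamdown (GFM+Math+KaTeX+RawHTML) | Hand-written 30-line regex renderer | Headings/bold/lists only |
 | Rendering: HTML | Sandboxed iframe | Not supported | Missing |
 | Rendering: images | `<img>` + iframe | Declared in types, not implemented | Missing |
 | Rendering: tables | GFM tables | Plain-text `<pre>` | Missing |
 | Panel layout | react-resizable-panels 60/40 | react-resizable-panels 65/35 | Already present, reusable |
 | Auto-open | Triggered by write_file/str_replace | Opens on addArtifact | Already present |
 | File selection | Dropdown without leaving the detail view | Must go back to the list to select | Poor experience |
 | Inline in chat | present_files → card grid | None | Missing |
 | Cache | React Query 5min | None | Missing |
 | Dual path | Real path + write-file: virtual path | Runtime memory only | Missing |
 | Right-panel overlap | Single panel | ArtifactPanel and the RightPanel "Files" tab overlap in responsibility | Architectural issue |
 ### 3.2 Summary of core gaps
 **In priority order:**
 1. **P0 Broken data source**: artifacts have almost no source, which is the most fundamental problem
 2. **P0 No persistence**: artifacts are lost on refresh
 3. **P1 Incomplete Markdown rendering**: a 30-line regex vs. a full GFM renderer
 4. **P1 No syntax highlighting for code**: plain `<pre>` vs. CodeMirror/Shiki
 5. **P2 Overlapping panel responsibilities**: ArtifactPanel vs. the RightPanel "Files" tab
 6. **P2 No file switching inside the detail view**: switching files requires going back to the list
 7. **P3 No inline artifact cards in chat**
 8. **P3 No HTML/image/table rendering**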
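 ### 3.3 Recommended options
 #### Option A: Minimum viable (fill in the existing architecture)
 Patch the existing ArtifactPanel + artifactStore:
 - **Data source**: extend the tool output parsing in streamStore to cover more tool types
 - **Persistence**: add IndexedDB writes to artifactStore (reusing the messageStore pattern)
 - **Markdown**: replace the hand-written renderer with `react-markdown` + `remark-gfm`
 - **Code highlighting**: introduce `shiki` or `highlight.js`
 - **Merge panels**: fold the RightPanel "Files" tab into ArtifactPanel and delete the RightPanel files tab
 **Effort**: ~2-3 days
 #### Option B: Rework along DeerFlow's lines (recommended)
 Borrow DeerFlow's architecture, adapted to ZCLAW's local Tauri architecture:
 | DeerFlow component | ZCLAW adaptation |
 |---------------|------------|
 | FastAPI artifact routes | Tauri commands `artifact_list` / `artifact_read` / `artifact_serve` |
 | On-disk outputs/ directory | `{workspace}/artifacts/{session_key}/` |
 | LangGraph checkpoint | SQLite (existing zclaw-memory) |
 | React Query cache | TanStack Query, or Zustand + stale cache |
 | Read-only CodeMirror | Introduce @uiw/react-codemirror |
 | Streamdown MD | react-markdown + remark-gfm + rehype-katex |
 | iframe HTML preview | Tauri webview window (security isolation) |
 **Core change list:**
 1. **Rust side** (zclaw-kernel):
   - Add `artifact_create` / `artifact_list` / `artifact_read` Tauri commands
   - Write artifacts to `{workspace}/artifacts/{session_key}/`
   - Have the ToolEnd event in the middleware chain trigger artifact registration
 2. **Frontend store**:
   - Add IndexedDB persistence to artifactStore
   - Decouple artifact creation from streamStore into a standalone hook
 3. **Frontend components**:
   - Replace MarkdownPreview with react-markdown + GFM
   - Introduce CodeMirror/shiki code highlighting
   - Add a file dropdown to the detail view
   - Merge or remove the RightPanel "Files" tab
 **Effort**: ~5-7 days
 #### Option C: Borrow Hermes's defense mechanism (add-on)
 Whether A or B is chosen, Hermes's large-output defense can be layered on top:
 - Add overflow detection to the ToolOutputGuard layer of the middleware chain
 - Automatically persist over-threshold artifacts to disk and replace them in context with a `<persisted-output>` preview
 - The agent can read the full content back through read_file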
 ---
 ## 4. Key dependency reference
 | Library | Purpose | Used by DeerFlow | Recommended |
 |----|------|--------------|------|
 | react-markdown | Markdown rendering | ✅ (Streamdown fills this role) | ✅ |
 | remark-gfm | GFM tables/strikethrough/task lists | ✅ | ✅ |
 | rehype-katex | Math rendering | ✅ | As needed |
 | @uiw/react-codemirror | Code editor/highlighting | ✅ | ✅ |
 | shiki | Static code highlighting | ✅ (code blocks in chat) | ✅ |
 | react-resizable-panels | Split layout | ✅ | Already in use |
 | @tanstack/react-query | Data caching | ✅ | Optional |
 ---
 ## 5. File index
 | Reference project | Key path |
 |----------|----------|
 | DeerFlow frontend | `G:/deerflow/frontend/src/components/workspace/artifacts/` |
 | DeerFlow frontend utilities | `G:/deerflow/frontend/src/core/artifacts/` |
 | DeerFlow layout | `G:/deerflow/frontend/src/components/workspace/chats/chat-box.tsx` |
 | DeerFlow code editor | `G:/deerflow/frontend/src/components/workspace/code-editor.tsx` |
 | DeerFlow backend routes | `G:/deerflow/backend/app/gateway/routers/artifacts.py` |
 | DeerFlow backend tools | `G:/deerflow/backend/packages/harness/deerflow/tools/builtins/present_file_tool.py` |
 | Hermes output management | `G:/hermes-agent-main/tools/tool_result_storage.py` |
 | Hermes budget configuration | `G:/hermes-agent-main/tools/budget_config.py` |
--- a/docs/references/deerflow-toolcall-reference.md
+++ b/docs/references/deerflow-toolcall-reference.md
@@ -0,0 +1,212 @@
 # DeerFlow Tool Call System Reference
 > A survey of DeerFlow's complete tool call flow, as a reference for troubleshooting tool call problems in ZCLAW.
 > Analysis date: 2026-04-24
 ---
 ## 1. End-to-end data flow
 ```
 User message
  → FastAPI Gateway (/api/threads/{id}/runs/stream)
    → services.start_run() → asyncio.create_task(run_agent(...))
      → LangGraph Agent Graph (create_agent)
        → LLM Model (ChatOpenAI / Claude)
          → AIMessage (with a tool_calls list)
            → 14-layer middleware chain
              → ToolNode (built into LangGraph, routes by tool_call.name)
                → ToolMessage (execution result)
                  → LLM called again (continuing with the ToolMessage)
                    → StreamBridge.publish() → asyncio.Queue
                      → SSE → frontend useStream hook
                        → React component rendering
 ```
 ## 2. Tool registration and execution
 ### 2.1 Registration entry point
 **File**: `G:/deerflow/backend/packages/harness/deerflow/tools/tools.py`, `get_available_tools()`
 Tools come from four sources:
 | Source | How it is loaded | Example |
 |------|----------|------|
 | Config tools | YAML config + reflective import (`module:variable`) | `deerflow.sandbox.tools:bash_tool` |
 | Builtin tools | Hard-coded imports | `present_file_tool`, `ask_clarification_tool` |
 | MCP tools | `MultiServerMCPClient` fetches them from the MCP server cache | Third-party MCP tools |
 | ACP tools | Built dynamically by `build_invoke_acp_agent_tool()` | External agent invocation |
 ### 2.2 Sandbox tools
 **File**: `G:/deerflow/backend/packages/harness/deerflow/sandbox/tools.py`
 | Tool name | Function |
 |--------|------|
 | `bash` | Runs a command in the sandbox |
 | `ls` | Lists a directory |
 | `read_file` | Reads a file |
 | `write_file` | Writes a file (auto-opens the artifact panel) |
 | `str_replace` | String replacement (auto-opens the artifact panel) |
 ### 2.3 Builtin tools
 **File**: `G:/deerflow/backend/packages/harness/deerflow/tools/builtins/`
 | Tool | Function |
 |------|------|
 | `ask_clarification` | Asks the user a clarifying question (interrupts execution and waits for a reply) |
 | `present_files` | Presents files to the user (triggers artifact cards) |
 | `setup_agent` | Custom agent creation |
 | `task_tool` | Delegates a task to a sub-agent |
 | `view_image` | Views an image (vision models only) |
 | `tool_search` | Deferred tool search (MCP tools exposed on demand) |
 ## 3. Middleware chain (14 layers)
 **File**: `G:/deerflow/backend/packages/harness/deerflow/agents/lead_agent/agent.py`, `_build_middlewares()`
 The middlewares most relevant to tool calls:
 ### 3.1 DanglingToolCallMiddleware
 **File**: `dangling_tool_call_middleware.py`
 In `wrap_model_call`, it detects AIMessages in the history whose tool calls have no matching ToolMessage and injects a placeholder ToolMessage automatically:
 ```python
 ToolMessage(
    content="[Tool call was interrupted and did not return a result.]",
    tool_call_id=tc_id,
    name=tc.get("name", "unknown"),
    status="error",
 )
 ```
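
The detection step around that placeholder can be sketched in TypeScript as follows. This is an illustrative reconstruction, not DeerFlow's code: the `Msg` union and the function name are hypothetical, and only the placeholder text and `status: "error"` are taken from the snippet above.

```typescript
interface ToolCallRef { id: string; name?: string }

type Msg =
  | { role: "human"; content: string }
  | { role: "ai"; content: string; toolCalls: ToolCallRef[] }
  | { role: "tool"; content: string; toolCallId: string; name: string; status?: "error" };

const PLACEHOLDER = "[Tool call was interrupted and did not return a result.]";

// Returns a new history in which every unanswered tool call gets a placeholder
// tool message placed directly after the AI message that issued it.
function patchDanglingToolCalls(history: Msg[]): Msg[] {
  const answered = new Set<string>();
  for (const m of history) {
    if (m.role === "tool") answered.add(m.toolCallId);
  }
  const patched: Msg[] = [];
  for (const m of history) {
    patched.push(m);
    if (m.role !== "ai") continue;
    for (const tc of m.toolCalls) {
      if (answered.has(tc.id)) continue;
      patched.push({
        role: "tool",
        content: PLACEHOLDER,
        toolCallId: tc.id,
        name: tc.name ?? "unknown",
        status: "error",
      });
    }
  }
  return patched;
}
```

Because already-answered ids are collected first, running the function twice on its own output adds nothing, which guards against the "patched twice" message bloat that a naive repair can cause.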
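 ### 3.2 ToolErrorHandlingMiddleware
 **File**: `tool_error_handling_middleware.py`
 In `wrap_tool_call`, it catches exceptions raised during tool execution and converts them into an error ToolMessage instead of letting the whole run crash.
 ### 3.3 LoopDetectionMiddleware
 **File**: `loop_detection_middleware.py`
 In `after_model`, it detects repeated tool calls:
 - At 3 repetitions → injects a warning HumanMessage
 - At 5 repetitions → clears tool_calls outright, forcing the LLM to produce a text answer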
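
A minimal TypeScript sketch of the two-threshold idea above. The thresholds (3 and 5) come from the text; how DeerFlow actually fingerprints a "repeated" call is not stated, so keying on the serialized name and arguments is an assumption, as are all names in the block.

```typescript
const WARN_THRESHOLD = 3;      // from the text: inject a warning
const HARD_STOP_THRESHOLD = 5; // from the text: drop the tool calls

interface PendingCall { name: string; args: unknown }
type LoopVerdict = "ok" | "warn" | "stop";

class LoopDetector {
  private counts = new Map<string, number>();

  // Called once per model turn with the tool calls the model just requested.
  check(calls: PendingCall[]): LoopVerdict {
    if (calls.length === 0) return "ok";
    // Assumed fingerprint: identical names and arguments in identical order.
    const key = JSON.stringify(calls.map(c => [c.name, c.args]));
    const seen = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, seen);
    if (seen >= HARD_STOP_THRESHOLD) return "stop";
    if (seen >= WARN_THRESHOLD) return "warn";
    return "ok";
  }
}
```

On `"warn"` a caller would append a warning message for the model; on `"stop"` it would clear the requested tool calls so the model has to answer in text.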
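 ### 3.4 DeferredToolFilterMiddleware
 **File**: `deferred_tool_filter_middleware.py`
 In `wrap_model_call`, it filters out the schemas of deferred MCP tools and exposes them only after the LLM discovers them through `tool_search`.
 ### 3.5 ClarificationMiddleware
 Intercepts `ask_clarification` tool calls and interrupts execution to wait for the user's reply.
 ### 3.6 SubagentLimitMiddleware
 Truncates excessive parallel sub-agent calls.
 ## 4. Returning tool results
 ### 4.1 Format
 LangChain's `ToolMessage`, containing:
 - `content`: the execution result text
 - `tool_call_id`: matches the tool_call ID in the AIMessage
 - `name`: the tool name
 - `status`: `"error"`, or omitted
 ### 4.2 Special tools
 `present_file_tool` returns a `Command` rather than a plain string, updating both the `artifacts` and `messages` state channels.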
 ## 5. Frontend display of tool calls
 ### 5.1 Message grouping
 **File**: `G:/deerflow/frontend/src/core/messages/utils.ts`, `groupMessages()`
 | Group type | Trigger | Display |
 |----------|----------|------|
 | `assistant:processing` | AI message contains tool_calls or reasoning | MessageGroup (collapsed) |
 | `assistant` | AI message has text and no tool_calls | MessageListItem (bubble) |
 | `assistant:present-files` | Contains a present_files tool call | ArtifactFileList |
 | `assistant:clarification` | ask_clarification result | MarkdownContent |
 | `assistant:subagent` | Contains a task tool call | SubtaskCard |
 ### 5.2 Tool status inference
 The frontend has **no explicit state machine**. Status is inferred from the message sequence:
 - An AI message has tool_calls with no matching ToolMessage → running
 - A ToolMessage appears → finished
 - The `assistant:processing` group is wrapped in the collapsible `ChainOfThought` component
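
The inference rule above fits in a few lines. The following TypeScript sketch uses a simplified, hypothetical message shape (`ChatMsg`); DeerFlow's real message objects carry more fields, and mapping `status: "error"` to a separate `"error"` state is an assumption layered on the `ToolMessage` format in 4.1.

```typescript
type ToolStatus = "running" | "completed" | "error";

// Simplified stand-in for the frontend message objects.
interface ChatMsg {
  type: "human" | "ai" | "tool";
  toolCalls?: { id: string; name: string }[];
  toolCallId?: string;
  status?: "error";
}

// Derives a status per tool call id purely from message order, with no stored state.
function inferToolStatuses(messages: ChatMsg[]): Map<string, ToolStatus> {
  const statuses = new Map<string, ToolStatus>();
  for (const m of messages) {
    if (m.type === "ai") {
      // A requested call is "running" until a matching tool message shows up.
      for (const tc of m.toolCalls ?? []) statuses.set(tc.id, "running");
    } else if (m.type === "tool" && m.toolCallId !== undefined) {
      statuses.set(m.toolCallId, m.status === "error" ? "error" : "completed");
    }
  }
  return statuses;
}
```

The trade-off against ZCLAW's explicit `ToolCallStep` state machine is that nothing can get stuck in a stale state, but a call whose result never arrives stays "running" until something like the dangling-call repair fills it in.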
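 ### 5.3 Tool call UI
 **File**: `message-group.tsx`, lines 186-423
 Icons and content are rendered per tool name:
 - `bash` → terminal icon + command code block
 - `read_file`/`write_file`/`str_replace` → file icon + path link (click to open the artifact panel)
 - `web_search` → search icon + result links
 - default → wrench icon + tool name
 ## 6. Tool calls during streaming
 ### 6.1 Architecture
 ```
 agent.astream(stream_mode=["values"])
  → StreamBridge (asyncio.Queue per run, maxsize=256)
    → sse_consumer() → SSE frames → frontend
 ```
 ### 6.2 Key characteristics
 - Tool calls **do not interrupt** the stream. LangGraph routes between agent_node and tool_node automatically
 - Every state change emits a complete `values` snapshot; the frontend deduplicates with `seen_ids`
 - A heartbeat every 15 seconds keeps the SSE connection alive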
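
Because each `values` event carries the whole state, a consumer has to drop messages it has already handled. A small TypeScript sketch of that idea, assuming every message has a stable `id`; the class and field names are hypothetical, and only the "seen ids" concept comes from the text.

```typescript
interface SnapshotMsg { id: string; content: string }

class SnapshotMerger {
  private seenIds = new Set<string>();

  // Takes a full snapshot and returns only the messages not delivered before.
  accept(snapshot: SnapshotMsg[]): SnapshotMsg[] {
    const fresh: SnapshotMsg[] = [];
    for (const m of snapshot) {
      if (this.seenIds.has(m.id)) continue;
      this.seenIds.add(m.id);
      fresh.push(m);
    }
    return fresh;
  }
}
```

Note that this sketch treats a message as immutable once seen; a message whose content grows between snapshots would need a content comparison as well.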
 ### 6.3 Event sequence seen by the frontend
 1. `values` event: AIMessage containing `tool_calls`
 2. `values` event: ToolMessage (tool result)
 3. `values` event: the LLM's final answer based on the tool result
 The whole process is continuous and does not interrupt the SSE connection.
 ## 7. Comparison with ZCLAW (tool calls)
 | Dimension | DeerFlow | ZCLAW |
 |------|----------|-------|
 | Framework | LangGraph (graph-based) | In-house loop_runner (loop) |
 | Tool lifecycle | Managed automatically by LangGraph ToolNode | Manual ToolRegistry + loop_runner |
 | after_tool_call middleware | ✅ Complete wrap_tool_call hook | ❌ Not called in streaming or non-streaming mode |
 | Parallel tool execution | Handled automatically by LangGraph | JoinSet in non-streaming mode, fully serial in streaming mode |
 | Dangling repair | DanglingToolCallMiddleware | DanglingToolMiddleware (present) |
 | Error recovery | ToolErrorHandlingMiddleware (exception→ToolMessage) | ToolErrorMiddleware (counter) |
 | Loop detection | LoopDetectionMiddleware (warn at 3 / hard stop at 5) | LoopGuardMiddleware (present) |
 | Frontend state | Inferred from the message sequence | Explicit ToolCallStep state machine |
 | MCP tools | Deferred registration + exposed on demand via tool_search | Registered in full |
 ## 8. Key file index
 | Function | DeerFlow file |
 |------|-------------|
 | Agent factory | `backend/packages/harness/deerflow/agents/lead_agent/agent.py` |
 | Middleware assembly | `backend/packages/harness/deerflow/agents/factory.py` |
 | Tool registration | `backend/packages/harness/deerflow/tools/tools.py` |
 | Sandbox tools | `backend/packages/harness/deerflow/sandbox/tools.py` |
 | Builtin tools | `backend/packages/harness/deerflow/tools/builtins/` |
 | Error handling middleware | `agents/middlewares/tool_error_handling_middleware.py` |
 | Dangling repair middleware | `agents/middlewares/dangling_tool_call_middleware.py` |
 | Loop detection middleware | `agents/middlewares/loop_detection_middleware.py` |
 | Deferred filter middleware | `agents/middlewares/deferred_tool_filter_middleware.py` |
 | Streaming bridge | `runtime/stream_bridge/memory.py` |
 | Frontend message grouping | `frontend/src/core/messages/utils.ts` |
 | Frontend tool call component | `frontend/src/components/workspace/messages/message-group.tsx` |
--- a/docs/references/zclaw-toolcall-issues.md
+++ b/docs/references/zclaw-toolcall-issues.md
@@ -0,0 +1,141 @@
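 # ZCLAW Tool Call Issue Analysis
 > Troubleshooting ZCLAW tool call problems by comparison with DeerFlow's tool call system.
 > Analysis date: 2026-04-24
 > Updated: 2026-04-24 (P0 and P0-stream_errored fixed)
 ---
 ## 1. Issues found
 ### P0: `after_tool_call` middleware was never called (✅ fixed 2026-04-24)
 **File**: `crates/zclaw-runtime/src/loop_runner.rs`
 In both `run()` (non-streaming, lines 400-558) and `run_streaming` (streaming, lines 893-1070), the tool result was pushed straight into the message history as `Message::tool_result` after execution, **without calling `middleware_chain.run_after_tool_call()`**.
 **Impact**:
 - The error counting and recovery-message logic in `ToolErrorMiddleware.after_tool_call` had no effect
 - The sensitive-data detection in `ToolOutputGuardMiddleware.after_tool_call` had no effect
 - Tool errors could only propagate through the tool's own error return; the middleware-layer protection existed in name only
 **DeerFlow comparison**: `ToolErrorHandlingMiddleware` wraps every tool execution completely through the `wrap_tool_call` hook.
 ### P0: `stream_errored` skipped all tool execution (✅ fixed 2026-04-24)
 **File**: `crates/zclaw-runtime/src/loop_runner.rs`, lines 872-876
 In streaming mode, any error in the LLM stream (network timeout, API error, driver error) set `stream_errored = true`, and `break 'outer` then exited the loop directly, **skipping every tool call that had already been parsed**.
 **Impact**:
 - The ToolStart event had already been sent to the frontend (the user saw a "running" button), but the tool never actually ran
 - The ToolEnd event was never sent → the frontend tool status stayed stuck at "running"
 - Tool calls that had been received in full (ToolUseEnd) were discarded as well
 **Fix**: distinguish complete tools (ToolUseEnd received) from incomplete tools (only ToolUseStart/Delta received). Complete tools run as usual; incomplete tools get a cancellation ToolEnd event.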
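
The rule in that fix can be illustrated with a short TypeScript sketch. The real implementation lives in the Rust `loop_runner.rs`; the types, the synchronous `execute` callback, and the cancellation text below are hypothetical simplifications.

```typescript
interface StreamedToolCall {
  id: string;
  name: string;
  rawArgs: string;
  complete: boolean; // true once the end-of-tool-use marker was received
}

interface ToolEndEvent {
  id: string;
  name: string;
  output: string;
  cancelled: boolean;
}

// After a stream error, every tool call that was announced to the UI still
// gets exactly one end event: a real result if complete, a cancellation if not.
function settleToolsAfterStreamError(
  calls: StreamedToolCall[],
  execute: (call: StreamedToolCall) => string,
): ToolEndEvent[] {
  return calls.map(call => {
    if (!call.complete) {
      return {
        id: call.id,
        name: call.name,
        output: "cancelled: stream ended before the call was fully received",
        cancelled: true,
      };
    }
    return { id: call.id, name: call.name, output: execute(call), cancelled: false };
  });
}
```

The invariant worth keeping is one end event per start event, which is what stops the frontend status from hanging at "running".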
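 ### P1: Streaming mode ran all tools serially (✅ fixed 2026-04-24)
 **File**: `loop_runner.rs`, streaming-mode tool execution section
 Non-streaming mode used `JoinSet` + `Semaphore(3)` to run ReadOnly tools in parallel, but streaming mode ran every tool serially in a plain `for` loop.
 **Fix**: streaming mode now executes in three phases: Phase 1 middleware pre-check (serial) → Phase 2 partitioned parallel + serial execution → Phase 3 after_tool_call + pushing results in order.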
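 ### P2: OpenAI driver silently replaced tool arguments (✅ fixed 2026-04-24)
 **File**: `crates/zclaw-runtime/src/driver/openai.rs`, lines 222-228
 ```rust
 let parsed_args = if args.is_empty() {
    serde_json::json!({})
 } else {
    serde_json::from_str(args).unwrap_or_else(|e| {
        tracing::warn!("Failed to parse tool args '{}': {}", args, e);
        serde_json::json!({})
    })
 };
 ```
 When JSON parsing failed, the arguments were silently replaced with `{}`; combined with the empty-argument handling in loop_runner.rs (lines 412-423), a `_fallback_query` was injected in place of the real arguments.
 **Fix**: on a parse failure, return `_parse_error` + `_raw_args` fields so that the tool and the LLM can notice the argument problem and correct it themselves.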
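
A TypeScript sketch of the "surface the parse error" approach, for illustration only. The field names `_parse_error` and `_raw_args` come from the fix description; the function name is hypothetical, and rejecting non-object JSON is an extra assumption not stated in the source.

```typescript
type ToolArgs = Record<string, unknown>;

function parseToolArgs(raw: string): ToolArgs {
  // An empty argument string is a legitimate "no arguments" call.
  if (raw.trim() === "") return {};
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as ToolArgs;
    }
    // Assumption: tool arguments are always a JSON object.
    return { _parse_error: "tool arguments must be a JSON object", _raw_args: raw };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    // Keep the raw text so the model can see what it actually sent.
    return { _parse_error: message, _raw_args: raw };
  }
}
```

Returning the raw text alongside the error gives the model something to correct on its next turn, whereas a silent `{}` looks like a valid call with no arguments.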
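 ### P2: ToolOutputGuard was too aggressive (✅ fixed 2026-04-24)
 **File**: `crates/zclaw-runtime/src/middleware/tool_output_guard.rs`, line 109
 Sensitive patterns were matched with `to_lowercase()`, so legitimate content containing strings such as "password" or "system:" was blocked by mistake.
 **Fix**: switched to `regex` patterns that match the format of actual secret values (such as `sk-[a-zA-Z0-9]{20,}`, `AKIA[A-Z0-9]{16}`, and `key=value` patterns), so legitimate content that merely contains a keyword is no longer blocked. Overly broad injection-detection patterns such as "system:" were removed.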
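
A small TypeScript illustration of value-format matching versus keyword matching. The two patterns are the ones quoted in the fix description; the labels and the function name are made up for the example, and the real guard's `key=value` pattern is omitted because its exact form is not given.

```typescript
// Patterns quoted from the fix description; labels are illustrative.
const SECRET_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "sk_style_key", pattern: /sk-[a-zA-Z0-9]{20,}/ },
  { label: "aws_access_key_id", pattern: /AKIA[A-Z0-9]{16}/ },
];

// Returns the labels of all patterns found in the tool output.
// The regexes carry no `g` flag, so `test` keeps no state between calls.
function findSecrets(output: string): string[] {
  return SECRET_PATTERNS
    .filter(entry => entry.pattern.test(output))
    .map(entry => entry.label);
}
```

Text that only mentions the word "password", such as documentation about resetting one, yields an empty list here, while a string containing `sk-` followed by twenty or more alphanumerics is flagged.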
 ### P2: ToolErrorMiddleware failure counter was global (✅ fixed 2026-04-24)
 **File**: `crates/zclaw-runtime/src/middleware/tool_error.rs`, line 27
 `consecutive_failures: AtomicU32` was a struct field shared by all sessions. Under concurrency, 2 failures in session A plus 1 failure in session B were enough to trigger AbortLoop (threshold 3).
 **Fix**: switched to `Mutex<HashMap<String, u32>>` keyed by session_id, so each session is tracked independently.
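
The same per-session bookkeeping, sketched in TypeScript for illustration (the real fix is the Rust `Mutex<HashMap<String, u32>>`). The class and method names are hypothetical, and resetting the counter on success is an assumption based on the original field name `consecutive_failures`.

```typescript
const ABORT_THRESHOLD = 3; // threshold quoted in the issue description

class SessionFailureTracker {
  private failures = new Map<string, number>();

  // Records a failure and reports whether this session should abort its loop.
  recordFailure(sessionId: string): boolean {
    const count = (this.failures.get(sessionId) ?? 0) + 1;
    this.failures.set(sessionId, count);
    return count >= ABORT_THRESHOLD;
  }

  // Assumed behaviour: a success ends the "consecutive" streak for that session.
  recordSuccess(sessionId: string): void {
    this.failures.delete(sessionId);
  }
}
```

With this shape, two failures in session A and one in session B leave both sessions below the threshold, which is exactly the case the global counter got wrong.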
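 ### P3: Gateway client onTool callback had inconsistent semantics (✅ fixed 2026-04-24)
 **File**: `desktop/src/lib/gateway-client.ts`, lines 698-707
 The `tool_call` and `tool_result` cases share the `onTool` callback but follow different argument conventions, so callers have to tell start from end by checking whether `output` is empty.
 **Fix**: made `output` always `''` for `tool_call` (fixing a case where `data.output` could be passed through) and added clear comments documenting the start/end convention.
 ---
 ## 2. Root cause analysis
 The most common failure modes for tool calls:
 1. **The LLM returns malformed tool_call arguments** → the OpenAI driver silently replaces them with `{}` → the tool runs with empty arguments → the result is not what was expected
 2. **A tool raises during execution** → the after_tool_call middleware is not called → the error is not formatted → the LLM receives a raw error message and cannot recover
 3. **The stream is interrupted and then reconnects** → DanglingToolMiddleware repairs the dangling calls → but if the repair logic itself has a bug (such as patching twice), the message history bloats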
 ## 3. Fix recommendations
 ### Fix 1: call after_tool_call in loop_runner
 **Priority**: P0
 **Affected file**: `loop_runner.rs`
 In the non-streaming tool execution loop (around line 530), call the following after the tool runs:
 ```rust
 let after_result = middleware_chain.run_after_tool_call(
    &name, &input_json, &output_str, &mut ctx
 ).await;
 ```
 Make the same call after tool execution in streaming mode (around line 1020).
 ### Fix 2: make the ToolErrorMiddleware counter per-session
 **Priority**: P2
 **Affected file**: `middleware/tool_error.rs`
 Store the counts in a `HashMap<String, u32>` keyed by session_id.
 ### Fix 3: switch ToolOutputGuard to exact matching
 **Priority**: P2
 **Affected file**: `middleware/tool_output_guard.rs`
 Trigger only when a standalone key value is detected (for example `sk-` followed by 48 characters), rather than matching at the word level.
 ---
 ## 4. Key files
 | File | Role |
 |------|------|
 | `crates/zclaw-runtime/src/loop_runner.rs` | Main loop, tool dispatch |
 | `crates/zclaw-runtime/src/tool.rs` | ToolRegistry + Tool trait |
 | `crates/zclaw-runtime/src/middleware/tool_error.rs` | Tool error handling |
 | `crates/zclaw-runtime/src/middleware/tool_output_guard.rs` | Output safety checks |
 | `crates/zclaw-runtime/src/middleware/dangling_tool.rs` | Dangling tool call repair |
 | `crates/zclaw-runtime/src/driver/openai.rs` | OpenAI-compatible driver |
 | `desktop/src/lib/gateway-client.ts` | Frontend communication client |
 | `desktop/src/store/chat/streamStore.ts` | Frontend stream handling |
--- a/wiki/chat.md
+++ b/wiki/chat.md
@@ -1,6 +1,6 @@
 ---
 title: 聊天系统
-updated: 2026-04-22
+updated: 2026-04-23
 status: active
 tags: [module, chat, stream]
 ---
@@ -17,6 +17,7 @@ tags: [module, chat, stream]
 | 5 Store 拆分 | 原 908 行 ChatStore → stream/conversation/message/chat/artifact，单一职责 |
 | 5 分钟超时守护 | 防止流挂起: kernel-chat.ts:76，超时自动 cancelStream |
 | 统一回调接口 | 3 种实现共享 `{ onDelta, onThinkingDelta, onTool, onHand, onComplete, onError }` |
 | LLM 动态建议 | 替换硬编码关键词匹配，用 LLM 生成个性化建议（1深入追问+1实用行动+1管家关怀），4路并行预取智能上下文 |
 ### ChatStream 实现
@@ -33,11 +34,14 @@ tags: [module, chat, stream]
 | 文件 | 职责 |
 |------|------|
-| `desktop/src/store/chat/streamStore.ts` | 流式消息编排、发送、取消 |
+| `desktop/src/store/chat/streamStore.ts` | 流式消息编排、发送、取消、LLM 动态建议生成 |
 | `desktop/src/store/chat/conversationStore.ts` | 会话管理、当前模型、sessionKey |
 | `desktop/src/store/chat/messageStore.ts` | 消息持久化 (IndexedDB) |
 | `desktop/src/lib/kernel-chat.ts` | KernelClient ChatStream (Tauri) |
 | `desktop/src/lib/suggestion-context.ts` | 4路并行智能上下文拉取 (用户画像/痛点/经验/技能匹配) |
 | `desktop/src/lib/cold-start-mapper.ts` | 冷启动配置映射 (行业检测/命名/个性/技能) |
 | `desktop/src/components/ChatArea.tsx` | 聊天区域 UI |
 | `desktop/src/components/ai/SuggestionChips.tsx` | 动态建议芯片展示 |
 | `crates/zclaw-runtime/src/loop_runner.rs` | Rust 主聊天循环 + 中间件链 |
 ### 发送消息流
@@ -100,6 +104,20 @@ UI 选择模型 → conversationStore.currentModel = newModel
 - cancelStream 设置原子标志位，与 onDelta 回调无竞态
 - 3 种 ChatStream 共享同一套回调接口，上层代码无需感知实现差异
 - 消息持久化走 messageStore → IndexedDB，与流式渲染解耦
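 - Dynamic suggestions prefetch four sources in parallel (userProfile/painPoints/experiences/skillMatch), degrading to empty strings on a 500ms timeout
 - Suggestion generation is decoupled from memory extraction: it starts without waiting for the memory LLM call to finish
 ### LLM dynamic suggestions
 ```
 sendMessage → isStreaming=true + _activeSuggestionContextPrefetch = fetchSuggestionContext(...)
  → prefetch runs in the background while the response streams
 onComplete → createCompleteHandler
  → generateLLMSuggestions(prefetchedContext), started immediately without waiting for memory
    → prompt: 1 deeper follow-up + 1 practical action + 1 butler-style check-in
  → memory/reflection run independently in the background (separate promise chains, not awaited)
  → SuggestionChips render
 ```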
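
The "four-way parallel prefetch with a 500ms fallback to empty strings" invariant can be sketched as below. This is not the code in `suggestion-context.ts`, which this diff does not show: `withTimeout`, `fetchContextSketch`, and the `sources` parameter are hypothetical, and applying the timeout per source rather than to the whole batch is an assumption.

```typescript
interface SuggestionContext {
  userProfile: string;
  painPoints: string;
  experiences: string;
  skillMatch: string;
}

type Source = () => Promise<string>;

// Resolves to "" if the task rejects or does not settle within `ms`.
function withTimeout(task: Promise<string>, ms: number): Promise<string> {
  return new Promise<string>(resolve => {
    const timer = setTimeout(() => resolve(""), ms);
    task.then(
      value => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(""); },
    );
  });
}

// Starts all four lookups at once; a slow or failing source never blocks the rest.
async function fetchContextSketch(
  sources: { userProfile: Source; painPoints: Source; experiences: Source; skillMatch: Source },
  timeoutMs: number = 500,
): Promise<SuggestionContext> {
  const [userProfile, painPoints, experiences, skillMatch] = await Promise.all([
    withTimeout(sources.userProfile(), timeoutMs),
    withTimeout(sources.painPoints(), timeoutMs),
    withTimeout(sources.experiences(), timeoutMs),
    withTimeout(sources.skillMatch(), timeoutMs),
  ]);
  return { userProfile, painPoints, experiences, skillMatch };
}
```

Since every branch resolves and none rejects, the returned promise cannot fail, so a consumer such as `generateLLMSuggestions` can treat an all-empty context as "no personalization available" and fall back to the plain prompt.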
 ### Tauri 命令
@@ -114,6 +132,8 @@ UI 选择模型 → conversationStore.currentModel = newModel
 | 问题 | 状态 | 说明 |
 |------|------|------|
 | after_tool_call 中间件未调用 | ✅ 已修复 (04-24) | 流式+非流式均添加调用，ToolErrorMiddleware/ToolOutputGuard 现在生效 |
 | stream_errored 跳过所有工具 | ✅ 已修复 (04-24) | 完整工具照常执行，不完整工具发送取消事件 |
 | B-CHAT-07 混合域截断 | P2 Open | 跨域消息时可能截断上下文 |
 | SSE Token 统计为 0 | ✅ 已修复 | SseUsageCapture stream_done flag |
 | Tauri invoke 参数名 | ✅ 已修复 (f6c5dd2) | camelCase 格式 |
@@ -122,14 +142,15 @@ UI 选择模型 → conversationStore.currentModel = newModel
 **注意事项:**
 - 辅助 LLM 调用 (记忆摘要/提取、管家路由) 复用 `kernel_init` 的 model+base_url，与聊天同链路
 - 课堂聊天是独立 Tauri 命令 (`classroom_chat`)，不走 `agent_chat_stream`
 - Agent tab 已移除 — 跨会话身份由 soul.md 接管，不再通过 RightPanel 管理
 ## 5. 变更日志
 | 日期 | 变更 |
 |------|------|
 | 04-24 | 工具调用 P0 修复: after_tool_call 中间件接入(流式+非流式) + stream_errored 工具抢救(完整工具执行+不完整工具取消) |
 | 04-24 | 产物系统优化: MarkdownRenderer 提取共享 + ArtifactPanel react-markdown 渲染 + 文件选择器下拉 + 数据源扩展(file_write/str_replace 两路径) + artifactStore IndexedDB 持久化 |
 | 04-23 | 建议 prefetch: sendMessage 时启动 context 预取，流结束后立即消费，不等 memory extraction |
 | 04-23 | 建议 prompt 重写: 1深入追问+1实用行动+1管家关怀，上下文窗口 6→20 条 |
 | 04-23 | 身份信号: detectAgentNameSuggestion 前端即时检测 + RightPanel 监听 Tauri 事件刷新名称 |
-| 04-22 | Wiki 重写: 5 节模板，增加集成契约和不变量 |
+| 04-23 | Agent tab 移除: RightPanel 清理 ~280 行 dead code，身份由 soul.md 接管 |
 | 04-21 | 上一轮更新 |
 | 04-17 | ChatStore 拆分为 5 Store (stream/conversation/message/chat/artifact) |
 | 04-16 | Provider Key 解密修复 (b69dc61) |
 | 04-16 | Tauri invoke 参数名修复 (f6c5dd2) |
--- a/wiki/hands-skills.md
+++ b/wiki/hands-skills.md
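@@ -133,6 +133,18 @@ skills/ -> loaded by SkillRegistry -> SkillIndexMiddleware@200 injects into system prompt
 - MCP qualified names `service_name.tool_name` avoid collisions with built-in tools
 - Empty-shell Hands removed (04-17): Whiteboard/Slideshow/Speech, net reduction of ~5400 lines
 ### ⚡ New tools/skills must declare a concurrency level
 The `concurrency()` method of the `Tool` trait determines the parallel execution strategy (04-24 Hermes Phase 2A):
 | Level | Meaning | Use cases |
 |------|------|---------|
 | `ReadOnly` (default) | Read-only, always parallelizable | file_read, web_search, calculator |
 | `Exclusive` | Has side effects, must run serially | file_write, shell_exec, send_message, execute_skill, task |
 | `Interactive` | Requires user interaction, never parallel | ask_clarification |
 **When adding a tool**: override `concurrency()` in `impl Tool for YourTool`. The default is `ReadOnly`; a tool with writes or other side effects must return `ToolConcurrency::Exclusive`. An incorrect declaration leads to race conditions under parallel execution.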
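A minimal sketch of the declaration rule above. The `Tool` trait here is a simplified stand-in with only `name()` and `concurrency()` (the real trait in the codebase has more methods), and `partition` is a hypothetical helper added for illustration only:

```rust
/// Concurrency levels, named as in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolConcurrency {
    ReadOnly,
    Exclusive,
    Interactive,
}

/// Simplified stand-in for the real `Tool` trait (assumption: reduced to two methods).
pub trait Tool {
    fn name(&self) -> &str;

    /// Defaults to `ReadOnly`, so tools with side effects must override it.
    fn concurrency(&self) -> ToolConcurrency {
        ToolConcurrency::ReadOnly
    }
}

pub struct FileRead;
impl Tool for FileRead {
    fn name(&self) -> &str {
        "file_read"
    }
    // No override: read-only tools keep the default.
}

pub struct FileWrite;
impl Tool for FileWrite {
    fn name(&self) -> &str {
        "file_write"
    }
    // Writes to disk, so it must not run in parallel with other tools.
    fn concurrency(&self) -> ToolConcurrency {
        ToolConcurrency::Exclusive
    }
}

/// Hypothetical helper: split tool names into a parallel batch and a serial queue.
pub fn partition<'a>(tools: &[&'a dyn Tool]) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut parallel = Vec::new();
    let mut serial = Vec::new();
    for &tool in tools {
        match tool.concurrency() {
            ToolConcurrency::ReadOnly => parallel.push(tool.name()),
            ToolConcurrency::Exclusive | ToolConcurrency::Interactive => serial.push(tool.name()),
        }
    }
    (parallel, serial)
}
```

Forgetting the override on `FileWrite` would silently place it in the parallel batch, which is the race-condition risk the rule warns about.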
 ## 4. Active issues + pitfalls
 ### Active
@@ -155,6 +167,7 @@ skills/ -> loaded by SkillRegistry -> SkillIndexMiddleware@200 injects into system prompt
 | Date | Change | Related |
 |------|------|------|
 | 2026-04-24 | Hermes Phase 2A: ToolConcurrency enum + parallel execution + concurrency() declaration requirement | commit 9060935 |
 | 2026-04-22 | Wiki 5-section restructure: 281->~195 lines; semantic routing details now reference [[butler]] | wiki/ |
 | 2026-04-22 | Researcher search fixes: schema flattening + empty-parameter fallback + formatting fixes | commit 5816f56+81005c3 |
 | 2026-04-17 | Empty-shell Hand cleanup: Whiteboard/Slideshow/Speech deleted, net reduction of ~5400 lines | Phase 5 cleanup |
--- a/wiki/index.md
+++ b/wiki/index.md
@@ -1,6 +1,6 @@
 ---
 title: ZCLAW Project Knowledge Base
-updated: 2026-04-22
+updated: 2026-04-24
 status: active
 ---
@@ -8,29 +8,29 @@ status: active
 > An AI Agent desktop client for Chinese-speaking users. Butler mode + multi-model + 7 autonomous capabilities + 75 skills.
 > **How to use**: find the module you need to work on, read its page, and start working.
-> **Data source**: verified by a full code scan on 2026-04-22, not inferred from documentation.
+> **Data source**: verified by a full code scan on 2026-04-23, not inferred from documentation.
 ## Project Profile
 | Dimension | Value |
 |------|-----|
 | Positioning | AI Agent desktop client (Tauri 2.x) |
-| Tech stack | Rust 10 crates + src-tauri (~102K lines, 357 .rs) + React 19 + TypeScript + PostgreSQL |
+| Tech stack | Rust 10 crates + src-tauri (~148K lines, 384 .rs) + React 19 + TypeScript + PostgreSQL |
 | Stage | Pre-release stabilization, feature freeze in effect |
-## Key Numbers (code-verified 2026-04-22)
+## Key Numbers (code-verified 2026-04-23)
 | Metric | Value |
 |------|-----|
 | Rust Crates | 10 + src-tauri |
-| Rust code | 101,967 lines (357 .rs files) |
+| Rust code | 148,185 lines (384 .rs files) |
-| Rust tests | 987 defined / 797 passing |
+| Rust tests | 997 defined (619 #[test] + 378 #[tokio::test]) |
-| Tauri commands | 190 defined / 97 @reserved / 104 invoke |
+| Tauri commands | 193 defined / 104 invoke |
 | SaaS API | 137 .route() / 16 modules / 38 SQL migrations / 42 tables |
 | Middleware | 14 runtime layers + 10 SaaS HTTP layers |
 | SKILL / HAND | 75 skill directories / 7 registered Hands (6 TOML + _reminder) |
 | Pipeline | 18 YAML templates (8 directories) |
-| Frontend | 25 stores / 102 components / 75 lib / 17 Admin pages |
+| Frontend | 25 stores / 103 components / 78 lib / 17 Admin pages |
 | Intelligence | 16 .rs files |
 | Quality metrics | 0 cargo warnings / 2 TODO/FIXME / 0 dead_code |
@@ -38,13 +38,13 @@ status: active
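 | Category | Feature | Entry point | Wiki |
 |------|------|------|------|
-| Chat | Send messages, streaming responses, multi-model switching | Chat panel | [[chat]] |
+| Chat | Send messages, streaming responses, multi-model switching, dynamic LLM suggestions | Chat panel | [[chat]] |
-| Agents | Create/switch/configure agents | Sidebar agent list | [[chat]] |
+| Agents | Create/switch/configure agents, cross-session identity memory (soul.md) | Sidebar agent list | [[chat]] |
 | Autonomy | Trigger Browser/Collector/Twitter etc. | Automation panel | [[hands-skills]] |
-| Memory | Search history, automatic context injection | Settings > Semantic Memory | [[memory]] |
+| Memory | Search history, automatic context injection, identity signal extraction | Settings > Semantic Memory | [[memory]] |
 | Configuration | Models/API/workspace/secure storage | Settings panel (19 pages) | [[development]] |
 | SaaS | Login and registration, subscription billing, admin management | SaaS platform / Admin console | [[saas]] |
-| Butler | Pain-point accumulation, industry configuration, simple/professional mode | Chat panel (default mode) | [[butler]] |
+| Butler | Pain-point accumulation, industry configuration, simple/professional mode, cross-session identity, dynamic suggestions | Chat panel (default mode) | [[butler]] |
 | Pipeline | YAML template selection, configuration, DAG execution | Workflow panel | [[pipeline]] |
 | Security | JWT authentication, TOTP 2FA, audit logging | Settings > Secure Storage | [[security]] |
 | Data | PostgreSQL (42 tables) + SQLite/FTS5 (local memory) | - | [[data-model]] |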
@@ -97,5 +97,7 @@ ZCLAW
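 | Agent creation fails | [[chat]] | [[saas]] | Permission or persistence issue |
 | Pipeline execution hangs | [[pipeline]] | [[middleware]] | DAG cycle / missing dependency |
 | Admin page returns 403 | [[saas]] | [[security]] | JWT expired / blocked by admin_guard |
 | Agent name not remembered | [[butler]] | [[memory]] | soul.md write failed / identity signal not extracted |
 | Suggestions not personalized | [[chat]] | [[butler]] | 4-way context fetch timed out / ExperienceExtractor not initialized |
 > Source of truth for numbers: `docs/TRUTH.md`. In case of conflict, the actual code wins.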
--- a/wiki/log.md
+++ b/wiki/log.md
@@ -1,6 +1,6 @@
 ---
 title: Changelog
-updated: 2026-04-22
+updated: 2026-04-24
 status: active
 tags: [log, history]
 ---
@@ -9,10 +9,55 @@ tags: [log, history]
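 > Append-only operation log. Format: `## [date] type | description`
 ## [2026-04-24] fix(runtime+middleware) | Comprehensive tool-call P1/P2/P3 fixes
 - **P1 parallel tools in streaming mode**: three-phase execution (middleware pre-check → parallel + serial partitioning → result ordering); ReadOnly tools run under JoinSet+Semaphore(3)
 - **P2 OpenAI driver**: argument parse failures are no longer silently replaced with `{}`; `_parse_error`+`_raw_args` are returned instead so the LLM can correct itself
 - **P2 ToolOutputGuard**: switched from keyword matching to regex matching on actual secret values (sk-xxx/AKIA/PEM, etc.), eliminating false positives
 - **P2 ToolErrorMiddleware**: failure counter changed from a global AtomicU32 to a per-session HashMap, eliminating cross-session false triggers
 - **P3 Gateway client**: clarified the onTool callback convention for tool_call/tool_result (output='' means start, input='' means end)
 - **Tests**: 91 tests PASS, tsc --noEmit PASS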
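The per-session counter in the ToolErrorMiddleware item can be sketched roughly as below. This is an illustration under assumptions, not the project's code: `FailureTracker`, `record_failure`, `record_success`, and the limit parameter are hypothetical names; only the `Mutex<HashMap<session_id, u32>>` shape comes from the commit message.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical tracker: consecutive tool failures, counted per session.
pub struct FailureTracker {
    max_consecutive: u32,
    counts: Mutex<HashMap<String, u32>>,
}

impl FailureTracker {
    pub fn new(max_consecutive: u32) -> Self {
        Self {
            max_consecutive,
            counts: Mutex::new(HashMap::new()),
        }
    }

    /// Records one failure; returns true once this session reaches the limit.
    pub fn record_failure(&self, session_id: &str) -> bool {
        let mut counts = self.counts.lock().unwrap();
        let n = counts.entry(session_id.to_string()).or_insert(0);
        *n += 1;
        *n >= self.max_consecutive
    }

    /// A successful tool call resets the streak for that session only.
    pub fn record_success(&self, session_id: &str) {
        self.counts.lock().unwrap().remove(session_id);
    }
}
```

With a single global counter, failures in session "b" would push session "a" toward its limit; keying by session id keeps the streaks independent.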
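 ## [2026-04-24] fix(runtime) | Two P0 tool-call fixes
 - **P0: after_tool_call middleware was never invoked**: added `middleware_chain.run_after_tool_call()` calls in both streaming and non-streaming modes; the after-logic of ToolErrorMiddleware and ToolOutputGuardMiddleware now takes effect
 - **P0: stream_errored skipped all tools**: in streaming mode, `stream_errored` no longer does `break 'outer`; it now distinguishes complete tools (ToolUseEnd received) from incomplete ones. Complete tools execute as usual; incomplete tools emit a cancelling ToolEnd event
 - **Affected files**: `loop_runner.rs`
 - **Tests**: 91 tests PASS, 0 cargo warnings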
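The complete/incomplete split in the stream_errored fix can be illustrated as follows. `ToolAction` and `plan_after_stream_error` are hypothetical names for this sketch; `completed_tool_ids` is the tracking set named in the commit message, assumed here to be a `HashSet<String>`.

```rust
use std::collections::HashSet;

/// Hypothetical outcome for each pending tool call after a stream error.
#[derive(Debug, PartialEq)]
pub enum ToolAction {
    /// ToolUseEnd was received, so the call is complete and can run.
    Execute(String),
    /// The call was cut off mid-stream; emit a cancelling ToolEnd event instead.
    Cancel(String),
}

/// Decide what to do with each pending tool call, preserving the original order.
pub fn plan_after_stream_error(
    pending_ids: &[&str],
    completed_tool_ids: &HashSet<String>,
) -> Vec<ToolAction> {
    pending_ids
        .iter()
        .map(|&id| {
            if completed_tool_ids.contains(id) {
                ToolAction::Execute(id.to_string())
            } else {
                ToolAction::Cancel(id.to_string())
            }
        })
        .collect()
}
```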
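 ## [2026-04-24] feat(artifact) | Artifact system improvements
 - **MarkdownRenderer**: shared Markdown rendering component (react-markdown + remark-gfm) extracted from StreamingText and reused by ArtifactPanel
 - **ArtifactPanel**: replaced the hand-written 30-line MarkdownPreview with full GFM rendering (tables/code blocks/lists/blockquotes); added a file picker dropdown
 - **Data source expansion**: artifact creation went from the single file_write tool to file_write/str_replace/write_file/str_replace_editor, and from the single sendMessage path to two paths (sendMessage + initStreamListener)
 - **Persistence**: artifactStore now uses zustand persist + IndexedDB (reusing idb-storage); artifacts survive a refresh
 - **Verification**: tsc --noEmit PASS, 343 vitest PASS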
 ## [2026-04-24] perf | Hermes high-value design implementation, Phases 1-4
 - **Phase 1**: Anthropic prompt caching: cache_control ephemeral + cache token tracking (CompletionResponse + StreamChunk)
 - **Phase 2A**: Parallel tool execution: ToolConcurrency enum (ReadOnly/Exclusive/Interactive) + JoinSet + Semaphore(3) + AtomicU32
 - **Phase 2B**: Tool output pruning: prune_tool_outputs() (2000→500 chars), integrated into CompactionMiddleware
 - **Phase 3**: Error classification + smart retry: LlmErrorKind + ClassifiedLlmError + RetryDriver (jittered backoff) + CONTEXT_OVERFLOW recovery
 - **Phase 4**: Async compaction + iterative summary: 30s debounce + cached fallback + iterative accumulation via previous_summary
 - **New files**: error_classifier.rs, retry_driver.rs
 - **Verification**: 997 workspace tests PASS
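As a rough illustration of the Phase 2B pruning rule (outputs over 2000 chars trimmed to 500), here is a single-string sketch. The constants and the function `prune_tool_output` are assumptions for this example; the real `prune_tool_outputs()` operates on message history and may add markers or differ in detail.

```rust
/// Outputs longer than this many characters get pruned (threshold from the log entry).
pub const PRUNE_THRESHOLD_CHARS: usize = 2000;
/// Number of leading characters kept after pruning.
pub const PRUNE_KEEP_CHARS: usize = 500;

/// Hypothetical single-output version of the pruning step.
/// Counts Unicode scalar values, not bytes, so multi-byte text is never split mid-character.
pub fn prune_tool_output(output: &str) -> String {
    if output.chars().count() <= PRUNE_THRESHOLD_CHARS {
        return output.to_string();
    }
    output.chars().take(PRUNE_KEEP_CHARS).collect()
}
```

Counting by `chars()` rather than slicing by byte index matters for this project, since tool outputs are often Chinese text where a byte-based cut could panic on a character boundary.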
 ## [2026-04-23] perf | Reply efficiency + parallelized suggestion generation (three parts)
 - **perf(src-tauri)**: identity prompt cache (`LazyLock<RwLock<HashMap>>`) + parallelized `pre_conversation_hook` (`tokio::join!`)
 - **perf(runtime)**: wave-parallel middleware `before_completion`: `parallel_safe()` trait method + wave detection + `tokio::spawn`; 5 safe middleware layers can run in parallel
 - **perf(desktop)**: suggestion context prefetch (started at sendMessage) + generateLLMSuggestions decoupled from memory extraction
 - **feat(desktop)**: suggestion prompt rewrite (1 deeper follow-up + 1 practical action + 1 butler care item) + context window 6→20 messages
 - **Files**: intelligence_hooks.rs, middleware.rs, 5 middleware submodules, streamStore.ts, llm-service.ts
 - **Verification**: cargo test --workspace --exclude zclaw-saas 0 fail, tsc --noEmit 0 error
 ## [2026-04-23] fix | Agent naming detection refactor + cross-session memory fix + Agent tab removal
 - **fix(desktop)**: `detectAgentNameSuggestion` changed from 6 fixed regexes to a two-step trigger+extract approach (10 triggers)
 - **fix(desktop)**: name detection decoupled from memory extraction; a 502 no longer blocks panel refresh
 - **fix(src-tauri)**: `agent_update` now also writes soul.md, repairing the broken link from config.name to the system prompt
 ## [2026-04-23] feat | Smarter dynamic suggestions
 - **feat(src-tauri)**: added the `experience_find_relevant` Tauri command + `ExperienceBrief` struct + OnceLock singleton
 - **feat(desktop)**: added `suggestion-context.ts`, which fetches smart context over 4 parallel paths (user profile/pain points/experience/skill matching)
 - **feat(desktop)**: `streamStore.ts` createCompleteHandler parallelized + generateLLMSuggestions enhanced
 - **feat(desktop)**: suggestion prompt changed to a hybrid form (2 follow-ups + 1 butler care item)
 - **Files**: experience.rs, lib.rs, suggestion-context.ts, streamStore.ts, llm-service.ts
 - **refactor(desktop)**: removed the Agent tab (simple mode/professional mode), cleaned up dead code (~280 lines)
 - **Verification**: cargo check 0 error, tsc --noEmit 0 error
--- a/wiki/middleware.md
+++ b/wiki/middleware.md
@@ -1,6 +1,6 @@
 ---
 title: Middleware Chain
-updated: 2026-04-22
+updated: 2026-04-23
 status: active
 tags: [module, middleware, runtime]
 ---
@@ -17,6 +17,7 @@ tags: [module, middleware, runtime]
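 - **WHY registration order != execution order**: the code order of the 14 `chain.register()` calls in `kernel/mod.rs` has no bearing on runtime order; the chain sorts by `priority()` ascending before executing.
 - **WHY 6 categories, 14 layers**: evolution(70-79) -> routing(80-99) -> context(100-199) -> capability(200-399) -> safety(400-599) -> telemetry(600-799); the priority range is the execution stage.
 - **WHY Stop/Block/AbortLoop**: fine-grained flow control. Stop interrupts the LLM loop, Block prevents a single tool call, AbortLoop terminates the entire agent loop. Once one is hit, all subsequent middleware is skipped.
 - **WHY wave parallelism (parallel_safe)**: in the `before_completion` stage, middleware that only modifies `system_prompt` can declare `parallel_safe() == true`. Consecutive parallel-safe middleware run in parallel via `tokio::spawn`, each holding a clone of `MiddlewareContext`, and their prompt contributions are merged on completion. This cuts serial latency by ~1-3s.
 ## 2. Key files + data flow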
@@ -34,7 +35,9 @@ tags: [module, middleware, runtime]
 ```
 User message -> AgentLoop
  -> chain.run_before_completion(ctx)
-    -> [priority ascending] each layer's middleware.before_completion()
+    -> [wave parallelism] detect consecutive parallel_safe middleware
      -> Parallel wave (2+ safe): tokio::spawn, each with ctx.clone() → merge prompts
      -> Serial (unsafe / single safe): run one at a time
    -> Continue: next layer | Stop(reason): interrupt the loop
  -> LLM call
  -> (on tool call) chain.run_before_tool_call()
@@ -57,22 +60,22 @@ tags: [module, middleware, runtime]
 ### 14-layer runtime middleware
-| Priority | Middleware | File | Responsibility | Registration condition |
+| Priority | Middleware | File | Responsibility | parallel_safe | Registration condition |
-|--------|--------|------|------|----------|
+|--------|--------|------|------|---------------|----------|
-| @78 | EvolutionMiddleware | `evolution.rs` | Pushes evolution candidates into the system prompt | Always |
+| @78 | EvolutionMiddleware | `evolution.rs` | Pushes evolution candidates into the system prompt | ✅ | Always |
-| @80 | ButlerRouter | `butler_router.rs` | Semantic skill routing + system prompt enhancement + XML fencing | Always |
+| @80 | ButlerRouter | `butler_router.rs` | Semantic skill routing + system prompt enhancement + XML fencing | ✅ | Always |
-| @100 | Compaction | `compaction.rs` | Compacts conversation history when over threshold | `compaction_threshold > 0` |
+| @100 | Compaction | `compaction.rs` | Compacts conversation history when over threshold | ❌ | `compaction_threshold > 0` |
-| @150 | Memory | `memory.rs` | Auto-extracts memories after a conversation + injects retrieval results | Always |
+| @150 | Memory | `memory.rs` | Auto-extracts memories after a conversation + injects retrieval results | ✅ | Always |
-| @180 | Title | `title.rs` | Auto-generates the session title | Always |
+| @180 | Title | `title.rs` | Auto-generates the session title | ✅ | Always |
-| @200 | SkillIndex | `skill_index.rs` | Injects the skill index into the system prompt | `!skill_index.is_empty()` |
+| @200 | SkillIndex | `skill_index.rs` | Injects the skill index into the system prompt | ✅ | `!skill_index.is_empty()` |
-| @300 | DanglingTool | `dangling_tool.rs` | Repairs missing tool-call results | Always |
+| @300 | DanglingTool | `dangling_tool.rs` | Repairs missing tool-call results | ❌ | Always |
-| @350 | ToolError | `tool_error.rs` | Formats tool errors so the LLM can recover | Always |
+| @350 | ToolError | `tool_error.rs` | Formats tool errors so the LLM can recover | ❌ | Always |
-| @360 | ToolOutputGuard | `tool_output_guard.rs` | Safety checks on tool output | Always |
+| @360 | ToolOutputGuard | `tool_output_guard.rs` | Safety checks on tool output | ❌ | Always |
-| @400 | Guardrail | `guardrail.rs` | Safety rules for shell_exec/file_write/web_fetch | Always |
+| @400 | Guardrail | `guardrail.rs` | Safety rules for shell_exec/file_write/web_fetch | ❌ | Always |
-| @500 | LoopGuard | `loop_guard.rs` | Prevents infinite tool-call loops | Always |
+| @500 | LoopGuard | `loop_guard.rs` | Prevents infinite tool-call loops | ❌ | Always |
-| @550 | SubagentLimit | `subagent_limit.rs` | Limits concurrent sub-agents | Always |
+| @550 | SubagentLimit | `subagent_limit.rs` | Limits concurrent sub-agents | ❌ | Always |
-| @650 | TrajectoryRecorder | `trajectory_recorder.rs` | Trajectory recording + compression | Always |
+| @650 | TrajectoryRecorder | `trajectory_recorder.rs` | Trajectory recording + compression | ❌ | Always |
-| @700 | TokenCalibration | `token_calibration.rs` | Token usage calibration | Always |
+| @700 | TokenCalibration | `token_calibration.rs` | Token usage calibration | ❌ | Always |
 > Registration order (code) differs from execution order (priority). The chain is sorted by priority ascending before execution.
@@ -96,6 +99,8 @@ tags: [module, middleware, runtime]
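 - Priority ascending: 0-999; lower values run first
 - Registration order != execution order; the chain is sorted by priority at runtime
 - Stop/Block/AbortLoop interrupts immediately; subsequent middleware does not run
 - parallel_safe middleware only modifies system_prompt; it does not modify messages and does not return Stop
 - Wave merging: each middleware in a parallel wave clones the context; on completion, the increment past base_prompt_len is extracted and merged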
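The ordering and wave invariants above can be sketched with plain data, leaving out the async execution. Everything here is illustrative: `Mw`, `Step`, `plan`, and `merge_prompts` are hypothetical names, and the merge assumes each wave member only appends to its cloned prompt.

```rust
/// Hypothetical description of one registered middleware.
#[derive(Debug, Clone, PartialEq)]
pub struct Mw {
    pub name: &'static str,
    pub priority: i32,
    pub parallel_safe: bool,
}

/// One scheduling step: a parallel wave (2+ members) or a single serial run.
#[derive(Debug, PartialEq)]
pub enum Step {
    Wave(Vec<&'static str>),
    Serial(&'static str),
}

/// Close the current run of parallel-safe middleware, if any.
fn flush(run: &mut Vec<&'static str>, steps: &mut Vec<Step>) {
    match run.len() {
        0 => {}
        1 => steps.push(Step::Serial(run[0])), // a lone safe middleware runs serially
        _ => steps.push(Step::Wave(run.clone())),
    }
    run.clear();
}

/// Sort by priority (registration order is irrelevant), then group consecutive
/// parallel-safe middleware into waves.
pub fn plan(mut chain: Vec<Mw>) -> Vec<Step> {
    chain.sort_by_key(|m| m.priority);
    let mut steps = Vec::new();
    let mut run = Vec::new();
    for m in &chain {
        if m.parallel_safe {
            run.push(m.name);
        } else {
            flush(&mut run, &mut steps);
            steps.push(Step::Serial(m.name));
        }
    }
    flush(&mut run, &mut steps);
    steps
}

/// Merge wave results: keep only what each member added past the base prompt.
pub fn merge_prompts(base: &str, wave_results: &[String]) -> String {
    let base_prompt_len = base.len();
    let mut merged = base.to_string();
    for result in wave_results {
        merged.push_str(result.get(base_prompt_len..).unwrap_or(""));
    }
    merged
}
```

Fed the first seven middleware from the table in any registration order, `plan` yields the waves described in commit ee5611a2f8: evolution + butler_router in parallel, compaction serial, then memory + title + skill_index in parallel.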
 ### Core interfaces
@@ -103,6 +108,7 @@ tags: [module, middleware, runtime]
 trait AgentMiddleware: Send + Sync {
    fn name(&self) -> &str;
    fn priority(&self) -> i32 { 500 }
    fn parallel_safe(&self) -> bool { false }
    async fn before_completion(&self, ctx: &mut MiddlewareContext) -> Result<MiddlewareDecision>;
    async fn before_tool_call(&self, ctx: &MiddlewareContext, tool_name: &str, tool_input: &Value) -> Result<ToolCallDecision>;
    async fn after_tool_call(&self, ctx: &mut MiddlewareContext, tool_name: &str, result: &Value) -> Result<()>;
@@ -129,8 +135,8 @@ trait AgentMiddleware: Send + Sync {
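 | Date | Change | Impact |
 |------|------|------|
 | 04-23 | Wave-parallel execution: parallel_safe() + wave detection + tokio::spawn | 5 safe middleware layers can run in parallel in the before_completion stage; latency reduced by ~1-3s |
 | 04-22 | DataMasking middleware removed | 14->14 layers (replaced with nothing), one layer of zero-benefit processing removed |
 | 04-22 | Cross-session memory fix | Memory middleware deduplication + cross-session injection fix |
 | 04-22 | Wiki consistency calibration | Numbers aligned with code verification |
 | 04-21 | Embedding wired in | SkillIndex routing TF-IDF->Embedding+LLM fallback |
 | 04-15 | Heartbeat unified health system | TrajectoryRecorder pain-point awareness enhanced |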
Author	SHA1	Message	Date
iven	7b0d452845	fix(tool): Windows UNC path normalization, making PathValidator path comparison consistent - with_workspace() canonicalizes workspace_root so that its format matches the canonical paths produced by resolve_and_validate - Added normalize_windows_path() to strip the \\?\ prefix, fixing failed starts_with comparisons on Windows - check_blocked/check_allowed now compare normalized paths consistently	2026-04-24 17:02:24 +08:00
iven	855c89e8fb	fix(tool): relative-path file writes failed; PathValidator now resolves against the workspace first. When file_write receives a relative path such as test_tool.txt, PathValidator's resolve_and_validate tried to canonicalize an empty parent directory and failed. Fix: relative paths are first resolved to absolute paths against workspace_root, then go through security validation.	2026-04-24 16:02:09 +08:00
iven	3eb098f020	fix(runtime): comprehensive tool-call P1/P2/P3 fixes P1: parallel tool execution in streaming mode - Three-phase execution: Phase 1 middleware pre-check (serial) → Phase 2 parallel + serial partitioning → Phase 3 result ordering - ReadOnly tools run in parallel via JoinSet + Semaphore(3); Exclusive/Interactive run serially - Same execution strategy as non-streaming mode P2: OpenAI driver tool argument parsing - Parse failures are no longer silently replaced with {}; _parse_error + _raw_args are returned instead - Lets the LLM and tools detect argument problems and self-correct P2: ToolOutputGuard exact matching - Switched from to_lowercase() keyword matching to regex matching on actual secret values - Detects sk-xxx(20+), AKIA(16), PEM private keys, and key=value patterns - Removed overly broad injection checks such as "system:" and "you are now" - Eliminates false positives when legitimate content contains words like "password" P2: ToolErrorMiddleware per-session counting - Changed from a global AtomicU32 to Mutex<HashMap<session_id, u32>> - Each session tracks its consecutive failures independently, eliminating cross-session false AbortLoop triggers P3: Gateway client onTool callback semantics - Clarified that the output of tool_call is always an empty string (start signal) - Added comments documenting the start/end convention	2026-04-24 12:56:07 +08:00
iven	c12b64150b	fix(runtime): tool-call P0 fixes, wiring in after_tool_call + stream_errored tool rescue P0-1: after_tool_call middleware was never invoked - Added middleware_chain.run_after_tool_call() in both streaming mode (run_streaming) and non-streaming mode (run) - ToolErrorMiddleware's error-count recovery logic now takes effect - ToolOutputGuardMiddleware's sensitive-data detection now takes effect P0-2: stream_errored skipped all tool execution - Added completed_tool_ids to track which tools have received a complete ToolUseEnd - On a stream error, complete and incomplete tools are distinguished - Complete tools execute as usual (artifact creation etc. is unaffected) - Incomplete tools emit a cancelling ToolEnd event (the frontend no longer hangs on "running") - If stream_errored is set after tool execution, break outer prevents a pointless LLM loop Reference docs: - docs/references/zclaw-toolcall-issues.md (analysis of 10 issues) - docs/references/deerflow-toolcall-reference.md (complete DeerFlow tool-call reference)	2026-04-24 12:20:14 +08:00
iven	4c31471cd6	feat(artifact): artifact system improvements, covering shared rendering + data source expansion + persistence - MarkdownRenderer: shared react-markdown + remark-gfm component extracted from StreamingText - ArtifactPanel: hand-written MarkdownPreview replaced with full GFM rendering; file picker dropdown added - Data sources: two tools (file_write/str_replace) + two paths (sendMessage/initStreamListener) - Persistence: artifactStore adds zustand persist + IndexedDB (reusing idb-storage)	2026-04-24 10:59:27 +08:00
iven	b60b96225d	docs(wiki): Hermes Phase 1-4 wiki sync - hands-skills: added the concurrency() declaration requirement as an invariant - log: appended the Hermes Phase 1-4 change record - index: updated the date	2026-04-24 08:54:48 +08:00
iven	06e93a21af	perf(compaction): Hermes Phase 4 — debounce + async cache + iterative summary Step 4.1: Compaction debounce - 30s cooldown between consecutive compactions - Minimum 3 rounds (6 messages) since last compaction before re-triggering - AtomicU64 lock-free state tracking Step 4.2: Async compaction with cached fallback - During cooldown, use cached result from previous compaction - RwLock<Option<Vec<Message>>> for thread-safe cache access - Cache updated after each successful compaction Step 4.3: Iterative summary - generate_summary/generate_llm_summary accept previous_summary parameter - LLM prompt includes previous summary for cumulative context preservation - Rule-based summary carries forward [上轮摘要保留] section - previous_summary extracted from leading System messages in message history	2026-04-24 08:53:37 +08:00
iven	9060935401	perf(runtime): Hermes Phase 1-3 — prompt caching + parallel tools + smart retry Phase 1: Anthropic prompt caching - Add cache_control ephemeral on system prompt blocks - Track cache_creation/cache_read tokens in CompletionResponse + StreamChunk Phase 2A: Parallel tool execution - Add ToolConcurrency enum (ReadOnly/Exclusive/Interactive) - JoinSet + Semaphore(3) for bounded parallel tool calls - 7 tools annotated with correct concurrency level - AtomicU32 for lock-free failure tracking in ToolErrorMiddleware Phase 2B: Tool output pruning - prune_tool_outputs() trims old ToolResult > 2000 chars to 500 chars - Integrated into CompactionMiddleware before token estimation Phase 3: Error classification + smart retry - LlmErrorKind + ClassifiedLlmError for structured error mapping - RetryDriver decorator with jittered exponential backoff - Kernel wraps all LLM calls with RetryDriver - CONTEXT_OVERFLOW recovery triggers emergency compaction in loop_runner	2026-04-24 08:39:56 +08:00
iven	6d6673bf5b	fix(suggest): suggestions default to Chinese with no English words mixed in. Rule 7 changed from "use the same language as the user" to an explicit Chinese-first requirement; English terms must be translated (e.g. workflow→工作流). Examples updated to pure Chinese accordingly.	2026-04-24 00:01:22 +08:00
iven	15f84bf8c1	fix(suggest): suggestion chips drop forms of address to avoid role confusion when the user sends them. The suggestion prompt gains a rule: suggestions are clicked and sent directly by the user, so they must not contain forms of address such as "领导/老板/老师" (leader/boss/teacher) and use subjectless phrasing instead. Examples and care templates updated accordingly.	2026-04-23 23:53:07 +08:00
iven	9a313e3c92	docs(wiki): wiki sync for the reply efficiency + parallel suggestion optimization - middleware.md: wave-parallel execution design decision + parallel_safe annotations + invariants + execution flow - chat.md: suggestion prefetch + decoupling from memory + prompt rewrite - log.md: appended change record - CLAUDE.md: §13 architecture snapshot + recent changes	2026-04-23 23:45:28 +08:00
iven	ee5611a2f8	perf(middleware): wave-parallel execution for before_completion - MiddlewareContext derives Clone to support cloning the context for parallel runs - AgentMiddleware trait gains a parallel_safe() default method (false) - MiddlewareChain::run_before_completion now runs in waves: 2+ consecutive parallel_safe middleware run concurrently via tokio::spawn, each modifies system_prompt independently, and contributions are merged on completion - 5 middleware that only modify system_prompt are marked parallel_safe: evolution(P78), butler_router(P80), memory(P150), title(P180), skill_index(P200) - Non-parallel_safe middleware (compaction, dangling_tool, etc.) stay serial. Wave effect: Wave 1: evolution + butler_router → parallel (saves ~0.5-1s) Wave 2: compaction → serial (may modify messages) Wave 3: memory + title + skill_index → parallel (saves ~0.5-2s) Wave 4+: tool/security middleware → serial	2026-04-23 23:37:57 +08:00
iven	5cf7adff69	perf(chat): reply efficiency + parallelized suggestion generation - identity prompt cache: LazyLock<RwLock<HashMap>> caches built identity prompts, invalidated automatically when soul.md is updated, saving the per-request mutex + disk I/O (~0.5-1s) - pre-conversation hook parallelized: tokio::join! runs identity build and the continuity context query in parallel instead of waiting serially (~1-2s) - suggestion context prefetch: fetchSuggestionContext starts early during the streaming reply, so the context is ready when the reply ends (~0.5-1s) - suggestion generation decoupled from memory extraction: generateLLMSuggestions no longer waits for the memory extraction LLM call and starts independently (~3-8s) - Path B (agent stream) context completed: the lifecycle:end path uses the prefetched context, fixing the zero-personalization problem - context window expanded: slice(-6) → slice(-20), each message truncated to 200 characters - suggestion prompt rewrite: 1 deeper follow-up + 1 practical action + 1 butler care item, with an explicit role definition and a ban on vague suggestions	2026-04-23 23:13:20 +08:00
iven	10497362bb	fix(chat): clarification card UX improvements, removing dangling references + expanding by default - The prompt gains an ask_clarification reference rule so the LLM avoids dangling lead-in phrases such as "以下信息" ("the following information") or "比如：" ("for example:") in its text - Added stripDanglingClarificationRef as a frontend safety net that removes trailing dangling references when a message contains an ask_clarification tool call - Clarification cards are expanded by default so users see the options without an extra click	2026-04-23 19:21:10 +08:00
iven	d7dbdf8600	docs(wiki): changelog for smarter dynamic suggestions	2026-04-23 18:01:44 +08:00
iven	8c25b20fe2	feat(suggest): suggestion prompt updated to a hybrid form (2 follow-ups + 1 butler care item) - llm-service.ts: HARDCODED_PROMPTS.suggestions.system changed to the hybrid form - 2 conversation follow-ups + 1 butler care item (pain-point follow-up/experience reuse/skill recommendation) - streamStore.ts: LLM_PROMPTS_SYSTEM now references the llm-service export - Single source of truth; takes effect automatically on OTA updates	2026-04-23 17:58:58 +08:00
iven	87110ffdff	feat(suggest): createCompleteHandler parallelized + generateLLMSuggestions enhanced - createCompleteHandler: memory extraction + context fetch run in parallel via Promise.all - generateLLMSuggestions: new SuggestionContext parameter, builds an enriched user message - llmSuggestViaSaaS: removed the artificial 2s delay (no longer needed after parallelization) - Variable renamed context→conversationContext to avoid clashing with SuggestionContext	2026-04-23 17:57:17 +08:00
iven	980a8135fa	feat(suggest): added the fetchSuggestionContext aggregation function + type definitions - Fetches smart context over 4 parallel paths: user profile, pain points, experience, skill matching - 500ms timeout protection + silent degradation (failures do not block suggestion generation) - Returns an empty context directly when Tauri is unavailable	2026-04-23 17:54:57 +08:00
iven	e9e7ffd609	feat(intelligence): added the experience_find_relevant Tauri command + ExperienceBrief - New ExperienceBrief struct (pain-point pattern + solution summary + reuse count) - OnceLock singleton + init_experience_extractor() startup initialization - experience_find_relevant command: retrieves relevant experience by agent_id + query - Registered in invoke_handler + gracefully degrading initialization in the setup stage - Added serialization tests (10 tests PASS)	2026-04-23 17:52:33 +08:00
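
The prefix normalization described in commit 7b0d452845 might look roughly like the string-level sketch below. This is an assumption-laden illustration, not the committed code: the signature is hypothetical, and the handling of the `\\?\UNC\` form is added here for completeness rather than taken from the commit message.

```rust
/// Hypothetical string-level sketch of stripping the Windows verbatim prefix.
/// `std::fs::canonicalize` on Windows returns paths such as `\\?\C:\ws\file.txt`,
/// which do not `starts_with` a plain `C:\ws` root unless both sides are normalized.
pub fn normalize_windows_path(path: &str) -> String {
    if let Some(rest) = path.strip_prefix(r"\\?\UNC\") {
        // Verbatim UNC form: \\?\UNC\server\share -> \\server\share (assumed behavior)
        format!(r"\\{rest}")
    } else if let Some(rest) = path.strip_prefix(r"\\?\") {
        // Verbatim drive form: \\?\C:\ws -> C:\ws
        rest.to_string()
    } else {
        // Already a plain path; leave unchanged.
        path.to_string()
    }
}
```

Normalizing both the workspace root and the candidate path before comparing them is what makes the `starts_with` check in `check_blocked`/`check_allowed` behave consistently.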