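fix(tool): Windows UNC path normalization: consistent path comparison in PathValidator

- with_workspace() now canonicalizes workspace_root so its format matches the canonical paths produced by resolve_and_validate
- Add normalize_windows_path() to strip the \\?\ prefix, fixing starts_with comparison failures on Windows
- check_blocked/check_allowed now compare normalized paths consistently

fix(tool): relative-path file writes failed: PathValidator now resolves against the workspace first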
2026-04-24 17:02:24 +08:00 · 2026-04-24 16:02:09 +08:00 · 2026-04-24 12:56:07 +08:00 · 2026-04-24 12:20:14 +08:00 · 2026-04-24 10:59:27 +08:00 · 2026-04-24 08:54:48 +08:00
56 changed files with 2838 additions and 755 deletions
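
The prefix stripping described in the commit message can be sketched as follows. This is a minimal, std-only illustration and not the code from this commit: only the name `normalize_windows_path` comes from the message; its string-based signature, the `\\?\UNC\` handling, and the helper `is_inside_workspace` are assumptions made for the example.

```rust
/// Hypothetical sketch: strip the Windows verbatim prefix so that canonical
/// and non-canonical spellings of the same path compare equal.
fn normalize_windows_path(path: &str) -> String {
    if let Some(rest) = path.strip_prefix(r"\\?\UNC\") {
        // Verbatim UNC form: \\?\UNC\server\share -> \\server\share
        format!(r"\\{}", rest)
    } else if let Some(rest) = path.strip_prefix(r"\\?\") {
        // Verbatim drive form: \\?\C:\dir -> C:\dir
        rest.to_string()
    } else {
        path.to_string()
    }
}

/// Hypothetical helper: prefix check on normalized paths, with a separator
/// boundary so that C:\ws2 is not treated as being inside C:\ws.
fn is_inside_workspace(candidate: &str, workspace_root: &str) -> bool {
    let cand = normalize_windows_path(candidate);
    let root = normalize_windows_path(workspace_root);
    let root = root.trim_end_matches('\\');
    match cand.strip_prefix(root) {
        Some(rest) => rest.is_empty() || rest.starts_with('\\'),
        None => false,
    }
}
```

Normalizing both sides before comparing is what makes the check independent of whether a path went through canonicalization.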
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -165,10 +165,25 @@ desktop/src-tauri    (→ kernel, skills, hands, protocols)
 2. **Automated verification**: `cargo check` / `cargo test` / `tsc --noEmit` / `vitest run` must all pass
 3. **Regression tests**: run the full test suite of every affected crate and confirm there are no regressions

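-#### Phase 4: Commit + sync (immediately, no backlog)
+#### Phase 4: Wiki sync + commit (immediately, no backlog)

-1. **Commit and push**: commit per the §11 conventions and **`git push` immediately**
-2. **Doc sync**: check and update related docs per §8.3, then commit and push
+**Wiki sync assessment (hard gate, cannot be skipped)**
+
+After the code change and before committing, answer each question below. If any answer is "yes", the corresponding wiki page must be updated:
+
+| Assessment question | Update when "yes" |
+|----------|-------------|
+| Does this change fix or introduce a bug? | "Active issues + pitfalls" section of the module page + `wiki/known-issues.md` |
+| Does this change alter a module's behavior or design rationale? | "Design decisions" section of the module page |
+| Does this change add or remove files, or alter the directory structure? | "Key files" table of the module page |
+| Does this change affect a cross-module interface (who calls whom, parameter shape, trigger timing)? | "Integration contract" tables of both sides involved |
+| Does this change involve a constraint that must always hold? | ⚡ invariants in the "Code logic" section of the module page |
+| Does this change alter a feature path (the full frontend→backend route)? | `wiki/feature-map.md` index table |
+| Does this change alter key numbers (command count, Store count, test count, etc.)? | `wiki/index.md` key numbers table + `docs/TRUTH.md` |
+
+After answering every question, whether or not anything was updated, append an entry to `wiki/log.md` and update the module page's "Change log" section (keep 5 entries).
+
+**Commit and push**: commit per the §11 conventions and **`git push` immediately**. See §8.3 for the detailed doc sync rules.

 **Iron rule: no "I'll commit later" and no "push everything at the end". Push immediately after each independent unit of work is done.**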

@@ -374,34 +389,44 @@ docs/

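 After every feature implementation, architecture change, or bug fix, **the following wrap-up must be performed immediately**:

-#### Step A: Doc sync (before committing code)
+#### Step A: Wiki sync (top priority, before committing code)

-Check whether the following docs need updating, and edit them immediately if so:
+> **Why the wiki comes first**: the wiki is the startup fuel for new AI sessions. If the wiki diverges from the code, every later session works from a wrong context, and the errors accumulate and amplify.
+
+Building on the assessment table in §3.3 Phase 4, perform the concrete updates:
+
+| Trigger | Update target | What to update |
+|----------|---------|---------|
+| Bug fixed | "Active issues + pitfalls" on the module page | Fixed → remove the entry; newly introduced → add an entry |
+| Architecture/design change | "Design decisions" on the module page | What changed in the WHY + the new trade-offs |
+| File added/removed/moved | "Key files" table on the module page | Update the file list |
+| Cross-module interface change | "Integration contract" tables of **both sides** | Direction / interface / trigger timing |
+| New invariant discovered | "Code logic" section of the module page | ⚡ marker + one-sentence description |
+| Feature path change | `wiki/feature-map.md` | Update the matching row of the index table |
+| Key number change | `wiki/index.md` + `docs/TRUTH.md` | Update the number + the verification command |
+| **Every wrap-up** | `wiki/log.md` + "Change log" on the module page | Append a log entry + keep the change log at 5 entries |
+
+**Wiki update principles**:
+- Record only what the code cannot tell you (the WHY, cross-module relationships, invariants, lessons learned)
+- Keep module pages at 100-200 lines; archive the overflow to `wiki/archive/`
+- Each piece of information lives on exactly one page (single source of truth); other pages only link to it
+
+#### Step B: Sync other docs

 1. **CLAUDE.md**: when the project structure, tech stack, workflow, or commands change
-2. **CLAUDE.md §13 architecture snapshot**: when a subsystem changes, update the region between the `<!-- ARCH-SNAPSHOT-START/END -->` markers (the `/sync-arch` skill can analyze this automatically)
+2. **CLAUDE.md §13 architecture snapshot**: when a subsystem changes (the `/sync-arch` skill can analyze this automatically)
 3. **docs/ARCHITECTURE_BRIEF.md**: when architecture decisions or key components change
 4. **docs/features/**: when a feature's status changes
 5. **docs/knowledge-base/**: new troubleshooting experience or configuration notes
-6. **wiki/**: maintenance of the compiled knowledge base (update the matching page per the trigger rules; every page has the same 5 sections: Design decisions / Key files + integration contract / Code logic / Active issues + pitfalls / Change log):
-   - Bug fixed → update the "Active issues" section of the module page + the `wiki/known-issues.md` index
-   - Architecture change → update the "Design decisions" section of the module page
-   - File structure change → update the "Key files" table of the module page
-   - Cross-module interface change → update the "Integration contract" table of the module page
-   - New invariant discovered → update the ⚡ items in the "Code logic" section of the module page
-   - Feature path change → update the `wiki/feature-map.md` index table
-   - Number change → update the `wiki/index.md` key numbers table + `docs/TRUTH.md`
-   - Every update → append an entry to `wiki/log.md` + refresh the latest 5 entries in the module page's "Change log" section
-6. **docs/TRUTH.md**: when numbers (command count, Store count, crate count, etc.) change

-#### Step B: Commit (grouped logically)
+#### Step C: Commit (grouped logically)

 ```
 Code changes → one or more logical commits
 Doc changes → a separate commit (if keeping them apart from code is clearer)
 ```

-#### Step C: Push (immediately)
+#### Step D: Push (immediately)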

 ```
 git push
@@ -559,7 +584,7 @@ refactor(store): unify how Stores fetch data
 ***

 <!-- ARCH-SNAPSHOT-START -->
-<!-- This region is updated automatically by auto-sync; do not edit by hand. Updated: 2026-04-15 -->
+<!-- This region is updated automatically by auto-sync; do not edit by hand. Updated: 2026-04-23 -->

 ## 13. Current architecture snapshot

@@ -567,51 +592,53 @@ refactor(store): unify how Stores fetch data

 | Subsystem | Status | Latest change |
 |--------|------|----------|
-| Butler mode (Butler) | ✅ Active | 04-12 industry config for 4 industries + cross-session continuity + <butler-context> XML fencing |
-| Hermes pipeline | ✅ Active | 04-12 trigger-signal persistence + industry dimension for experiences + injection format tuning |
+| Butler mode (Butler) | ✅ Active | 04-23 cross-session identity (soul.md) + dynamic suggestions (4-way parallel, LLM-driven) + Agent tab removed |
+| Hermes pipeline | ✅ Active | 04-23 experience_find_relevant Tauri command + ExperienceBrief + OnceLock singleton |
 | Intelligence Heartbeat | ✅ Active | 04-15 unified health snapshot (health_snapshot.rs) + HeartbeatManager refactor + HealthPanel frontend |
-| Chat stream (ChatStream) | ✅ Stable | 04-02 ChatStore split into 4 Stores (stream/conversation/message/chat) |
-| Memory pipeline (Memory) | ✅ Stable | 04-17 E2E verified: storage + FTS5 + TF-IDF + injection loop closed; dedup and cross-session injection fixed |
+| Chat stream (ChatStream) | ✅ Active | 04-23 LLM dynamic suggestions (replacing hard-coded ones) + clarification card UX improvements |
+| Memory pipeline (Memory) | ✅ Active | 04-23 identity signal extraction (agent_name/user_name) + ProfileSignals enhancements |
 | SaaS auth (Auth) | ✅ Stable | Token pool RPM/TPM rotation + JWT password_version invalidation |
-| Pipeline DSL | ✅ Stable | 04-01 17 YAML templates + DAG executor |
-| Hands system | ✅ Stable | 7 registered (6 HAND.toml + _reminder); Whiteboard/Slideshow/Speech in development |
+| Pipeline DSL | ✅ Stable | 04-01 18 YAML templates + DAG executor |
+| Hands system | ✅ Stable | 7 registered (6 HAND.toml + _reminder); Whiteboard/Slideshow/Speech removed |
 | Skill system (Skills) | ✅ Stable | 75 SKILL.md files + semantic routing |
-| Middleware chain | ✅ Stable | 13 layers (ButlerRouter@80, Compaction@100, Memory@150, Title@180, SkillIndex@200, DanglingTool@300, ToolError@350, ToolOutputGuard@360, Guardrail@400, LoopGuard@500, SubagentLimit@550, TrajectoryRecorder@650, TokenCalibration@700) |
+| Middleware chain | ✅ Stable | 14 layers + wave-based parallelism (Evolution@78✅, ButlerRouter@80✅, Compaction@100, Memory@150✅, Title@180✅, SkillIndex@200✅, DanglingTool@300, ToolError@350, ToolOutputGuard@360, Guardrail@400, LoopGuard@500, SubagentLimit@550, TrajectoryRecorder@650, TokenCalibration@700); ✅ = parallel_safe |

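 ### Key architecture patterns

 - **Hermes pipeline**: 4-module closed loop: ExperienceStore (FTS5 experience storage and retrieval) + UserProfiler (structured user profile) + NlScheduleParser (Chinese time expressions → cron) + TrajectoryRecorder+Compressor (trajectory recording and compression). Invoked through the middleware chain + intelligence hooks
-- **Butler mode**: dual-mode UI (simple by default / unlockable professional) + ButlerRouter dynamic industry keywords (4 built-in + custom) + <butler-context> XML fencing injection + cross-session continuity (pain-point follow-up + experience retrieval) + trigger-signal persistence (VikingStorage) + 4-stage cold-start hook
-- **Chat stream**: 3 implementations → GatewayClient(WebSocket) / KernelClient(Tauri Event) / SaaSRelay(SSE) + 5min timeout guard. See [ARCHITECTURE_BRIEF.md](docs/ARCHITECTURE_BRIEF.md)
+- **Butler mode**: dual-mode UI (simple by default / unlockable professional) + ButlerRouter dynamic industry keywords (4 built-in + custom) + <butler-context> XML fencing injection + cross-session continuity (pain-point follow-up + experience retrieval) + trigger-signal persistence (VikingStorage) + 4-stage cold-start hook + cross-session identity (soul.md) + dynamic suggestions (4-way parallel, LLM-driven: 2 follow-up questions + 1 caring prompt)
+- **Chat stream**: 3 implementations → GatewayClient(WebSocket) / KernelClient(Tauri Event) / SaaSRelay(SSE) + 5min timeout guard. Dynamic suggestions: prefetch context + generateLLMSuggestions (1 follow-up question + 1 action + 1 caring prompt), decoupled from memory extraction. See [ARCHITECTURE_BRIEF.md](docs/ARCHITECTURE_BRIEF.md)
 - **Client routing**: `getClient()` 4-branch decision tree → Admin route / SaaS Relay (can fall back to local) / Local Kernel / External Gateway
 - **SaaS auth**: JWT stored in the OS keyring + HttpOnly cookie + Token pool RPM/TPM rate-limit rotation + automatic fallback when SaaS is unreachable
-- **Memory loop**: conversation→extraction_adapter→FTS5 full-text + TF-IDF weighting→retrieval→system prompt injection (E2E verified 04-17; dedup and cross-session injection fixed)
+- **Memory loop**: conversation→extraction_adapter→FTS5 full-text + TF-IDF weighting→retrieval→system prompt injection + identity signal extraction (agent_name/user_name)→VikingStorage→soul.md→cross-session name memory
 - **LLM drivers**: 4 Rust drivers (Anthropic/OpenAI/Gemini/Local) + compatibility with Chinese providers (DeepSeek/Qwen/Moonshot via base_url)

 ### Recent changes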

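-1. [04-21] Embedding wired up + self-learning automation, tracks A+B: memory retrieval embedding (GrowthIntegration→MemoryRetriever→SemanticScorer) + skill routing embedding + LLM fallback (replacing new_tf_idf_only) + evolution_bridge (SkillCandidate→SkillManifest) + generate_and_register_skill() end to end + EvolutionMiddleware dual mode (auto/suggest) + QualityGate hardening (length/title/confidence caps). Verified: 934 tests PASS
-2. [04-21] Phase 0+1 "breakthrough road", 8 foundational path fixes: experience accumulation overwrite fix (reuse_count now accumulates) + skill tool-call bridging (complete_with_tools) + Hand field mapping (runId) + Heartbeat pain-point awareness + Browser delegation messages + cross-session retrieval improvements (IdentityRecall 26→43 patterns + weak-identity fallback) + Twitter credential persistence. Verified: 912 tests PASS
-2. [04-17] Full-system E2E test over 129 paths: 82 PASS / 20 PARTIAL / 1 FAIL / 26 SKIP, effective pass rate 79.1%. 7 bug fixes (Dashboard 404 / memory dedup / memory injection / invoice_id / prompt version / agent isolation / industry field)
-2. [04-16] 3 P0 fixes + 5 E2E bug fixes + Agent panel refresh + TRUTH.md number calibration
-3. [04-15] Heartbeat unified health system: health_snapshot.rs unified collector (LLM connection / memory / sessions / system resources) + heartbeat.rs HeartbeatManager refactor + HealthPanel.tsx frontend panel + Tauri commands 182→183 + intelligence module 15→16 files + removed 9 obsolete files under intelligence-client/
-4. [04-12] Industry config + butler proactivity, full stack in 5 phases: industry data model + 4 built-in configs + ButlerRouter dynamic keywords + trigger signals + Tauri loading + Admin management page + cross-session continuity + XML fencing injection format
-5. [04-09] Hermes Intelligence Pipeline, 4 chunks: ExperienceStore+Extractor, UserProfileStore+Profiler, NlScheduleParser, TrajectoryRecorder+Compressor (684 tests, 0 failed)
-6. [04-09] Butler mode, all 6 deliverables complete: ButlerRouter + cold start + simple-mode UI + bridge tests + release docs
+1. [04-23] Reply efficiency + parallelized suggestion generation: identity prompt caching + parallel pre-hooks (tokio::join!) + wave-based middleware parallelism (parallel_safe, 5 layers ✅) + suggestion context prefetch + suggestions decoupled from memory + prompt rewrite (1 follow-up question + 1 action + 1 caring prompt)
+2. [04-23] Smarter dynamic suggestions: fetchSuggestionContext 4-way parallel (user profile / pain points / experiences / skill matching) + generateLLMSuggestions hybrid prompt (2 follow-up questions + 1 butler caring prompt) + experience_find_relevant Tauri command + ExperienceBrief
+3. [04-23] Cross-session identity: detectAgentNameSuggestion two-step trigger+extract (10 triggers) + ProfileSignals agent_name/user_name + write-back to soul.md + Agent tab removed (~280 lines of dead code cleaned up)
+4. [04-22] Full wiki restructuring: 5-section template + integration contracts + symptom navigation + archive compaction, net reduction of ~1,200 lines
+5. [04-22] Cross-session memory break fix + DataMasking middleware removed + search fixes (multi-engine + quality filtering + SSE line buffering)
+6. [04-21] Embedding wired up + self-learning automation, tracks A+B + Phase 0+1 "breakthrough road" 8 path fixes. Verified: 934 tests PASS
+7. [04-20] 50-round feature path audit: 7 broken-path fixes (42/50 = 84% pass rate)
+8. [04-17] Full-system E2E test over 129 paths: 82 PASS / 20 PARTIAL / 1 FAIL / 26 SKIP, effective pass rate 79.1%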
+
+<!-- ARCH-SNAPSHOT-END -->

 <!-- ARCH-SNAPSHOT-END -->

 <!-- ANTI-PATTERN-START -->
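-<!-- This region is updated automatically by auto-sync; do not edit by hand. Updated: 2026-04-09 -->
+<!-- This region is updated automatically by auto-sync; do not edit by hand. Updated: 2026-04-23 -->

 ## 14. AI collaboration notes

 ### Anti-pattern warnings

-- ❌ **Do not** suggest adding new SaaS API endpoints: there are already 140, and the stabilization constraint forbids new ones
+- ❌ **Do not** suggest adding new SaaS API endpoints: there are already 137, and the stabilization constraint forbids new ones
 - ❌ **Do not** ignore butler mode: it is live and the default mode, and every chat goes through ButlerRouter
 - ❌ **Do not** assume Tauri connects to the LLM directly: requests actually go through the SaaS Token pool, falling back to the local Kernel when SaaS is unreachable
-- ❌ **Do not** suggest building existing capabilities from scratch: first check the existing Hand (9) / Skill (75) / Pipeline (17 templates) libraries
+- ❌ **Do not** suggest building existing capabilities from scratch: first check the existing Hand (7 registered) / Skill (75) / Pipeline (18 templates) libraries
 - ❌ **Do not** create project-level config or rule files outside CLAUDE.md: single entry point principle

 ### Scenario-specific instructions
@@ -620,6 +647,75 @@ refactor(store): unify how Stores fetch data
 - When dealing with **authentication** → remember that Tauri mode stores the JWT in the OS keyring, while SaaS mode uses an HttpOnly cookie
 - When dealing with **new feature proposals** → check [TRUTH.md](docs/TRUTH.md) first to confirm the list of available capabilities and avoid duplicate work
 - When dealing with **memory/context** → remember the loop is already wired up: FTS5+TF-IDF+embedding, not an empty shell
-- When dealing with **butler/Butler** → butler mode is the default mode; ButlerRouter does keyword classification + system prompt enhancement in the middleware chain
+- When dealing with **butler/Butler** → butler mode is the default mode; ButlerRouter does keyword classification + system prompt enhancement in the middleware chain. Cross-session identity goes through soul.md; dynamic suggestions use 4-way parallel context + LLM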

 <!-- ANTI-PATTERN-END -->
+
+***
+
+## 15. Karpathy coding principles
+
+> Derived from Andrej Karpathy's observations on LLM coding problems. Favors caution over speed; use judgment on simple tasks.
+
+### 15.1 Think Before Coding
+
+**Don't assume. Don't hide confusion. Surface tradeoffs.**
+
+- State assumptions explicitly. If uncertain, ask.
+- If multiple interpretations exist, present them — don't pick silently.
+- If a simpler approach exists, say so. Push back when warranted.
+- If something is unclear, stop. Name what's confusing. Ask.
+
+### 15.2 Simplicity First
+
+**Minimum code that solves the problem. Nothing speculative.**
+
+- No features beyond what was asked.
+- No abstractions for single-use code.
+- No "flexibility" or "configurability" that wasn't requested.
+- No error handling for impossible scenarios.
+- If you write 200 lines and it could be 50, rewrite it.
+
+Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
+
+### 15.3 Surgical Changes
+
+**Touch only what you must. Clean up only your own mess.**
+
+When editing existing code:
+
+- Don't "improve" adjacent code, comments, or formatting.
+- Don't refactor things that aren't broken.
+- Match existing style, even if you'd do it differently.
+- If you notice unrelated dead code, mention it — don't delete it.
+
+When your changes create orphans:
+
+- Remove imports/variables/functions that YOUR changes made unused.
+- Don't remove pre-existing dead code unless asked.
+
+The test: Every changed line should trace directly to the user's request.
+
+### 15.4 Goal-Driven Execution
+
+**Define success criteria. Loop until verified.**
+
+Transform tasks into verifiable goals:
+
+- "Add validation" → "Write tests for invalid inputs, then make them pass"
+- "Fix the bug" → "Write a test that reproduces it, then make it pass"
+- "Refactor X" → "Ensure tests pass before and after"
+
+For multi-step tasks, state a brief plan:
+
+```
+1. [Step] → verify: [check]
+2. [Step] → verify: [check]
+3. [Step] → verify: [check]
+```
+
+Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
+
+---
+
+**These guidelines are working if:** fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.
--- a/crates/zclaw-kernel/src/kernel/messaging.rs
+++ b/crates/zclaw-kernel/src/kernel/messaging.rs
@@ -117,7 +117,9 @@ impl Kernel {
    }
 }

-use zclaw_runtime::{AgentLoop, tool::builtin::PathValidator};
+use std::sync::Arc;
+use zclaw_runtime::{AgentLoop, LlmDriver, tool::builtin::PathValidator};
+use zclaw_runtime::driver::{RetryDriver, RetryConfig};

 use super::Kernel;
 use super::super::MessageResponse;
@@ -161,9 +163,12 @@ impl Kernel {
        let subagent_enabled = chat_mode.as_ref().and_then(|m| m.subagent_enabled).unwrap_or(false);
        let tools = self.create_tool_registry(subagent_enabled);
        self.skill_executor.set_tool_registry(tools.clone());
+        let driver: Arc<dyn LlmDriver> = Arc::new(
+            RetryDriver::new(self.driver.clone(), RetryConfig::default())
+        );
        let mut loop_runner = AgentLoop::new(
            *agent_id,
-            self.driver.clone(),
+            driver,
            tools,
            self.memory.clone(),
        )
@@ -275,9 +280,12 @@ impl Kernel {
        let subagent_enabled = chat_mode.as_ref().and_then(|m| m.subagent_enabled).unwrap_or(false);
        let tools = self.create_tool_registry(subagent_enabled);
        self.skill_executor.set_tool_registry(tools.clone());
+        let driver: Arc<dyn LlmDriver> = Arc::new(
+            RetryDriver::new(self.driver.clone(), RetryConfig::default())
+        );
        let mut loop_runner = AgentLoop::new(
            *agent_id,
-            self.driver.clone(),
+            driver,
            tools,
            self.memory.clone(),
        )
@@ -426,6 +434,7 @@ impl Kernel {
        prompt.push_str("- Provide clear options when possible\n");
        prompt.push_str("- Include brief context about why you're asking\n");
        prompt.push_str("- After receiving clarification, proceed immediately\n");
+        prompt.push_str("- CRITICAL: When calling ask_clarification, do NOT repeat the options in your text response. The options will be shown in a dedicated card above your reply. Simply greet the user and briefly explain why you need clarification — avoid phrases like \"以下信息\" or \"the following options\" that imply a list follows in your text\n");

        prompt
    }
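
The change above wraps the kernel's driver in a `RetryDriver` before handing it to `AgentLoop`. The wrapping pattern can be sketched as below, as a synchronous, std-only illustration. The `Driver` trait, the `max_attempts` field, and `FlakyDriver` are hypothetical stand-ins; the real `LlmDriver`, `RetryDriver`, and `RetryConfig` in zclaw-runtime are not shown in this diff and may differ (for example by being async or by adding backoff).

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the runtime's driver trait.
trait Driver: Send + Sync {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Hypothetical retry settings; the real RetryConfig may carry more fields.
struct RetryConfig {
    max_attempts: u32,
}

/// Decorator: implements the same trait as the driver it wraps, so callers
/// that hold an Arc<dyn Driver> do not need to change.
struct RetryDriver {
    inner: Arc<dyn Driver>,
    config: RetryConfig,
}

impl Driver for RetryDriver {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        let mut last_err = String::from("no attempts made");
        for _ in 0..self.config.max_attempts {
            match self.inner.complete(prompt) {
                Ok(out) => return Ok(out),
                Err(e) => last_err = e, // remember the error and try again
            }
        }
        Err(last_err)
    }
}

/// Test double that fails a fixed number of times before succeeding.
struct FlakyDriver {
    calls: AtomicU32,
    fail_first: u32,
}

impl Driver for FlakyDriver {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        let n = self.calls.fetch_add(1, Ordering::SeqCst);
        if n < self.fail_first {
            Err(format!("transient failure #{}", n + 1))
        } else {
            Ok(format!("echo: {prompt}"))
        }
    }
}
```

Because the wrapper is itself a driver, the call site only swaps the value it passes in, which matches the shape of the two-line change in each hunk above.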
--- a/crates/zclaw-kernel/tests/hand_chain.rs
+++ b/crates/zclaw-kernel/tests/hand_chain.rs
@@ -31,6 +31,8 @@ async fn seam_hand_tool_routing() {
                input_tokens: 10,
                output_tokens: 20,
                stop_reason: "tool_use".to_string(),
+                cache_creation_input_tokens: None,
+                cache_read_input_tokens: None,
            },
        ])
        // Second stream: final text after tool executes
@@ -40,6 +42,8 @@ async fn seam_hand_tool_routing() {
                input_tokens: 10,
                output_tokens: 5,
                stop_reason: "end_turn".to_string(),
+                cache_creation_input_tokens: None,
+                cache_read_input_tokens: None,
            },
        ]);

@@ -105,6 +109,8 @@ async fn seam_hand_execution_callback() {
                input_tokens: 10,
                output_tokens: 5,
                stop_reason: "tool_use".to_string(),
+                cache_creation_input_tokens: None,
+                cache_read_input_tokens: None,
            },
        ])
        .with_stream_chunks(vec![
@@ -113,6 +119,8 @@ async fn seam_hand_execution_callback() {
                input_tokens: 5,
                output_tokens: 1,
                stop_reason: "end_turn".to_string(),
+                cache_creation_input_tokens: None,
+                cache_read_input_tokens: None,
            },
        ]);

@@ -173,6 +181,8 @@ async fn seam_generic_tool_routing() {
                input_tokens: 10,
                output_tokens: 5,
                stop_reason: "tool_use".to_string(),
+                cache_creation_input_tokens: None,
+                cache_read_input_tokens: None,
            },
        ])
        .with_stream_chunks(vec![
@@ -181,6 +191,8 @@ async fn seam_generic_tool_routing() {
                input_tokens: 5,
                output_tokens: 3,
                stop_reason: "end_turn".to_string(),
+                cache_creation_input_tokens: None,
+                cache_read_input_tokens: None,
            },
        ]);

--- a/crates/zclaw-kernel/tests/smoke_hands.rs
+++ b/crates/zclaw-kernel/tests/smoke_hands.rs
@@ -27,6 +27,8 @@ async fn smoke_hands_full_lifecycle() {
                input_tokens: 15,
                output_tokens: 10,
                stop_reason: "tool_use".to_string(),
+                cache_creation_input_tokens: None,
+                cache_read_input_tokens: None,
            },
        ])
        // After hand_quiz returns, LLM generates final response
@@ -36,6 +38,8 @@ async fn smoke_hands_full_lifecycle() {
                input_tokens: 20,
                output_tokens: 5,
                stop_reason: "end_turn".to_string(),
+                cache_creation_input_tokens: None,
+                cache_read_input_tokens: None,
            },
        ]);

--- a/crates/zclaw-runtime/src/compaction.rs
+++ b/crates/zclaw-runtime/src/compaction.rs
@@ -14,6 +14,7 @@

 use std::sync::Arc;
 use std::sync::atomic::{AtomicU64, Ordering};
+use serde_json::Value;
 use zclaw_types::{AgentId, Message, SessionId};

 use crate::driver::{CompletionRequest, ContentBlock, LlmDriver};
@@ -136,7 +137,7 @@ pub fn update_calibration(estimated: usize, actual: u32) {
 }

 /// Estimate total tokens for messages with calibration applied.
-fn estimate_messages_tokens_calibrated(messages: &[Message]) -> usize {
+pub fn estimate_messages_tokens_calibrated(messages: &[Message]) -> usize {
    let raw = estimate_messages_tokens(messages);
    let factor = get_calibration_factor();
    if (factor - 1.0).abs() < f64::EPSILON {
@@ -178,7 +179,7 @@ pub fn compact_messages(messages: Vec<Message>, keep_recent: usize) -> (Vec<Mess
    let old_messages = &messages[..split_index];
    let recent_messages = &messages[split_index..];

-    let summary = generate_summary(old_messages);
+    let summary = generate_summary(old_messages, None);
    let removed_count = old_messages.len();

    let mut compacted = Vec::with_capacity(1 + recent_messages.len());
@@ -188,6 +189,38 @@ pub fn compact_messages(messages: Vec<Message>, keep_recent: usize) -> (Vec<Mess
    (compacted, removed_count)
 }

+const PRUNE_AGE_THRESHOLD: usize = 8;
+const PRUNE_MAX_CHARS: usize = 2000;
+const PRUNE_KEEP_HEAD_CHARS: usize = 500;
+
+/// Prune old tool outputs to reduce token consumption. Runs before compaction.
+/// Only prunes ToolResult messages older than PRUNE_AGE_THRESHOLD messages.
+pub fn prune_tool_outputs(messages: &mut [Message]) -> usize {
+    let total = messages.len();
+    let mut pruned_count = 0;
+
+    for i in 0..total.saturating_sub(PRUNE_AGE_THRESHOLD) {
+        if let Message::ToolResult { output, is_error, .. } = &mut messages[i] {
+            if *is_error { continue; }
+
+            let text = match output {
+                Value::String(ref s) => s.clone(),
+                ref other => other.to_string(),
+            };
+            if text.len() <= PRUNE_MAX_CHARS { continue; }
+
+            let end = text.floor_char_boundary(PRUNE_KEEP_HEAD_CHARS.min(text.len()));
+            *output = serde_json::json!({
+                "_pruned": true,
+                "_original_chars": text.len(),
+                "head": &text[..end],
+            });
+            pruned_count += 1;
+        }
+    }
+    pruned_count
+}
+
 /// Check if compaction should be triggered and perform it if needed.
 ///
 /// Returns the (possibly compacted) message list.
@@ -315,6 +348,18 @@ pub async fn maybe_compact_with_config(
        .iter()
        .take_while(|m| matches!(m, Message::System { .. }))
        .count();
+
+    // Extract previous summary from leading system messages for iterative summarization
+    let previous_summary = messages.iter()
+        .take(leading_system_count)
+        .filter_map(|m| match m {
+            Message::System { content } if content.starts_with("[以下是之前对话的摘要]") => {
+                Some(content.clone())
+            }
+            _ => None,
+        })
+        .next();
+
    let keep_from_end = DEFAULT_KEEP_RECENT
        .min(messages.len().saturating_sub(leading_system_count));
    let split_index = messages.len().saturating_sub(keep_from_end);
@@ -333,14 +378,16 @@ pub async fn maybe_compact_with_config(
    let recent_messages = &messages[split_index..];
    let removed_count = old_messages.len();

-    // Step 3: Generate summary (LLM or rule-based)
+    // Step 3: Generate summary (LLM or rule-based), with iterative context
+    let prev_ref = previous_summary.as_deref();
    let summary = if config.use_llm {
        if let Some(driver) = driver {
-            match generate_llm_summary(driver, old_messages, config.summary_max_tokens).await {
+            match generate_llm_summary(driver, old_messages, prev_ref, config.summary_max_tokens).await {
                Ok(llm_summary) => {
                    tracing::info!(
-                        "[Compaction] Generated LLM summary ({} chars)",
-                        llm_summary.len()
+                        "[Compaction] Generated LLM summary ({} chars, iterative={})",
+                        llm_summary.len(),
+                        previous_summary.is_some()
                    );
                    llm_summary
                }
@@ -350,7 +397,7 @@ pub async fn maybe_compact_with_config(
                            "[Compaction] LLM summary failed: {}, falling back to rules",
                            e
                        );
-                        generate_summary(old_messages)
+                        generate_summary(old_messages, prev_ref)
                    } else {
                        tracing::warn!(
                            "[Compaction] LLM summary failed: {}, returning original messages",
@@ -369,10 +416,10 @@ pub async fn maybe_compact_with_config(
            tracing::warn!(
                "[Compaction] LLM compaction requested but no driver available, using rules"
            );
-            generate_summary(old_messages)
+            generate_summary(old_messages, prev_ref)
        }
    } else {
-        generate_summary(old_messages)
+        generate_summary(old_messages, prev_ref)
    };

    let used_llm = config.use_llm && driver.is_some();
@@ -398,9 +445,11 @@ pub async fn maybe_compact_with_config(
 }

 /// Generate a summary using an LLM driver.
+/// If `previous_summary` is provided, builds on it iteratively.
 async fn generate_llm_summary(
    driver: &Arc<dyn LlmDriver>,
    messages: &[Message],
+    previous_summary: Option<&str>,
    max_tokens: u32,
 ) -> Result<String, String> {
    let mut conversation_text = String::new();
@@ -437,11 +486,21 @@ async fn generate_llm_summary(
        conversation_text.push_str("\n...(对话已截断)");
    }

-    let prompt = format!(
-        "请用简洁的中文总结以下对话的关键信息。保留重要的讨论主题、决策、结论和待办事项。\
-         输出格式为段落式摘要，不超过200字。\n\n{}",
-        conversation_text
-    );
+    let prompt = match previous_summary {
+        Some(prev) => format!(
+            "你是一个对话摘要助手。\n\n\
+             ## 上一轮摘要\n{}\n\n\
+             ## 新增对话内容\n{}\n\n\
+             请在上一轮摘要的基础上更新，保留所有关键决策、用户偏好和文件操作。\
+             输出200字以内的中文摘要。",
+            prev, conversation_text
+        ),
+        None => format!(
+            "请用简洁的中文总结以下对话的关键信息。保留重要的讨论主题、决策、结论和待办事项。\
+             输出格式为段落式摘要，不超过200字。\n\n{}",
+            conversation_text
+        ),
+    };

    let request = CompletionRequest {
        model: String::new(),
@@ -484,13 +543,22 @@ async fn generate_llm_summary(
 }

 /// Generate a rule-based summary of old messages.
-fn generate_summary(messages: &[Message]) -> String {
+/// If `previous_summary` is provided, carries forward key info.
+fn generate_summary(messages: &[Message], previous_summary: Option<&str>) -> String {
    if messages.is_empty() {
        return "[对话开始]".to_string();
    }

    let mut sections: Vec<String> = vec!["[以下是之前对话的摘要]".to_string()];

+    // Carry forward previous summary if available
+    if let Some(prev) = previous_summary {
+        // Strip the header line from previous summary for cleaner nesting
+        let prev_body = prev.strip_prefix("[以下是之前对话的摘要]\n")
+            .unwrap_or(prev);
+        sections.push(format!("[上轮摘要保留]: {}", truncate(prev_body, 200)));
+    }
+
    let mut user_count = 0;
    let mut assistant_count = 0;
    let mut topics: Vec<String> = Vec::new();
@@ -696,8 +764,21 @@ mod tests {
            Message::user("How does ownership work?"),
            Message::assistant("Ownership is Rust's memory management system"),
        ];
-        let summary = generate_summary(&messages);
+        let summary = generate_summary(&messages, None);
        assert!(summary.contains("摘要"));
        assert!(summary.contains("2"));
    }
+
+    #[test]
+    fn test_generate_summary_iterative() {
+        let messages = vec![
+            Message::user("What is async/await?"),
+            Message::assistant("Async/await is a concurrency model"),
+        ];
+        let prev = "[以下是之前对话的摘要]\n讨论主题: Rust; 所有权\n(已压缩 4 条消息)";
+        let summary = generate_summary(&messages, Some(prev));
+        assert!(summary.contains("摘要"));
+        assert!(summary.contains("上轮摘要保留"));
+        assert!(summary.contains("所有权"));
+    }
 }
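
The pruning step added in `prune_tool_outputs` relies on cutting a string at a valid UTF-8 boundary. Below is a minimal std-only sketch of the same idea over plain strings, assuming a toolchain where `str::floor_char_boundary` may not be available. `floor_boundary` and `prune_old_outputs` are hypothetical names, and the placeholder format is illustrative, not the JSON shape used in the diff. Lengths here are in bytes, which is also what `text.len()` measures.

```rust
/// Largest index <= max_bytes that falls on a UTF-8 character boundary.
fn floor_boundary(text: &str, max_bytes: usize) -> usize {
    let mut end = max_bytes.min(text.len());
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    end
}

/// Truncate every output except the `keep_recent` newest ones when it is
/// longer than `max_bytes`, keeping only a head of at most `head_bytes`.
/// Returns how many outputs were pruned.
fn prune_old_outputs(
    outputs: &mut [String],
    keep_recent: usize,
    max_bytes: usize,
    head_bytes: usize,
) -> usize {
    let cutoff = outputs.len().saturating_sub(keep_recent);
    let mut pruned = 0;
    for out in outputs[..cutoff].iter_mut() {
        if out.len() <= max_bytes {
            continue;
        }
        let end = floor_boundary(out.as_str(), head_bytes);
        let original_len = out.len();
        let head = out[..end].to_string();
        *out = format!("[pruned, {original_len} bytes originally] {head}");
        pruned += 1;
    }
    pruned
}
```

Slicing at an arbitrary byte offset would panic on multi-byte text such as Chinese tool output, which is why the boundary search comes before the slice.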
--- a/crates/zclaw-runtime/src/driver/anthropic.rs
+++ b/crates/zclaw-runtime/src/driver/anthropic.rs
@@ -121,6 +121,8 @@ impl LlmDriver for AnthropicDriver {
            let mut byte_stream = response.bytes_stream();
            let mut current_tool_id: Option<String> = None;
            let mut tool_input_buffer = String::new();
+            let mut cache_creation_input_tokens: Option<u32> = None;
+            let mut cache_read_input_tokens: Option<u32> = None;

            while let Some(chunk_result) = byte_stream.next().await {
                let chunk = match chunk_result {
@@ -141,6 +143,15 @@ impl LlmDriver for AnthropicDriver {
                        match serde_json::from_str::<AnthropicStreamEvent>(data) {
                            Ok(event) => {
                                match event.event_type.as_str() {
+                                    "message_start" => {
+                                        // Capture cache token info from message_start event
+                                        if let Some(msg) = event.message {
+                                            if let Some(usage) = msg.usage {
+                                                cache_creation_input_tokens = usage.cache_creation_input_tokens;
+                                                cache_read_input_tokens = usage.cache_read_input_tokens;
+                                            }
+                                        }
+                                    }
                                    "content_block_delta" => {
                                        if let Some(delta) = event.delta {
                                            if let Some(text) = delta.text {
@@ -186,6 +197,8 @@ impl LlmDriver for AnthropicDriver {
                                                    input_tokens: msg.usage.as_ref().map(|u| u.input_tokens).unwrap_or(0),
                                                    output_tokens: msg.usage.as_ref().map(|u| u.output_tokens).unwrap_or(0),
                                                    stop_reason: msg.stop_reason.unwrap_or_else(|| "end_turn".to_string()),
+                                                    cache_creation_input_tokens,
+                                                    cache_read_input_tokens,
                                                });
                                            }
                                        }
@@ -298,7 +311,15 @@ impl AnthropicDriver {
        AnthropicRequest {
            model: request.model.clone(),
            max_tokens: effective_max,
-            system: request.system.clone(),
+            system: request.system.as_ref().map(|s| {
+                vec![SystemContentBlock {
+                    r#type: "text".to_string(),
+                    text: s.clone(),
+                    cache_control: Some(CacheControl {
+                        r#type: "ephemeral".to_string(),
+                    }),
+                }]
+            }),
            messages,
            tools: if tools.is_empty() { None } else { Some(tools) },
            temperature: request.temperature,
@@ -337,18 +358,35 @@ impl AnthropicDriver {
            input_tokens: api_response.usage.input_tokens,
            output_tokens: api_response.usage.output_tokens,
            stop_reason,
+            cache_creation_input_tokens: api_response.usage.cache_creation_input_tokens,
+            cache_read_input_tokens: api_response.usage.cache_read_input_tokens,
        }
    }
 }

 // Anthropic API types

+/// Anthropic cache_control marker
+#[derive(Serialize, Clone)]
+struct CacheControl {
+    r#type: String, // "ephemeral"
+}
+
+/// Anthropic system prompt content block (supports cache_control)
+#[derive(Serialize, Clone)]
+struct SystemContentBlock {
+    r#type: String, // "text"
+    text: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    cache_control: Option<CacheControl>,
+}
+
 #[derive(Serialize)]
 struct AnthropicRequest {
    model: String,
    max_tokens: u32,
    #[serde(skip_serializing_if = "Option::is_none")]
-    system: Option<String>,
+    system: Option<Vec<SystemContentBlock>>,
    messages: Vec<AnthropicMessage>,
    #[serde(skip_serializing_if = "Option::is_none")]
    tools: Option<Vec<AnthropicTool>>,
@@ -404,6 +442,10 @@ struct AnthropicContentBlock {
 struct AnthropicUsage {
    input_tokens: u32,
    output_tokens: u32,
+    #[serde(default)]
+    cache_creation_input_tokens: Option<u32>,
+    #[serde(default)]
+    cache_read_input_tokens: Option<u32>,
 }

 // Streaming types
@@ -458,4 +500,8 @@ struct AnthropicStreamUsage {
    input_tokens: u32,
    #[serde(default)]
    output_tokens: u32,
+    #[serde(default)]
+    cache_creation_input_tokens: Option<u32>,
+    #[serde(default)]
+    cache_read_input_tokens: Option<u32>,
 }
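
With `cache_creation_input_tokens` and `cache_read_input_tokens` now carried through, a caller could estimate how much of a prompt was served from the cache. The sketch below is an illustration only: the `Usage` struct and `cache_hit_ratio` are hypothetical and not part of this diff, and the formula assumes that `input_tokens` counts only the uncached portion of the prompt, which should be checked against the provider's documentation.

```rust
/// Hypothetical mirror of the usage fields surfaced by the driver.
#[derive(Debug, Clone, Copy, Default)]
struct Usage {
    input_tokens: u32,
    cache_creation_input_tokens: Option<u32>,
    cache_read_input_tokens: Option<u32>,
}

/// Fraction of prompt tokens read from the cache, or None when the
/// request reported no prompt tokens at all.
fn cache_hit_ratio(usage: &Usage) -> Option<f64> {
    // Missing fields are treated as zero, matching providers that omit them.
    let created = u64::from(usage.cache_creation_input_tokens.unwrap_or(0));
    let read = u64::from(usage.cache_read_input_tokens.unwrap_or(0));
    let total = u64::from(usage.input_tokens) + created + read;
    if total == 0 {
        None
    } else {
        Some(read as f64 / total as f64)
    }
}
```

Keeping the fields as `Option<u32>` lets drivers for providers without prompt caching report `None` instead of a misleading zero.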
--- a/crates/zclaw-runtime/src/driver/error_classifier.rs
+++ b/crates/zclaw-runtime/src/driver/error_classifier.rs
@@ -0,0 +1,139 @@
+//! LLM error classifier. Maps an HTTP status code + error body to an LlmErrorKind.
+
+use std::time::Duration;
+use zclaw_types::{LlmErrorKind, ClassifiedLlmError};
+
+/// Classify an LLM error
+pub fn classify_llm_error(
+    provider: &str,
+    status: u16,
+    body: &str,
+    is_timeout: bool,
+) -> ClassifiedLlmError {
+    let _ = provider; // reserved for per-provider overrides
+
+    if is_timeout {
+        return ClassifiedLlmError {
+            kind: LlmErrorKind::Timeout,
+            retryable: true,
+            should_compress: false,
+            should_rotate_credential: false,
+            retry_after: None,
+            message: "请求超时".to_string(),
+        };
+    }
+
+    match status {
+        401 | 403 => ClassifiedLlmError {
+            kind: LlmErrorKind::Auth,
+            retryable: false,
+            should_compress: false,
+            should_rotate_credential: true,
+            retry_after: None,
+            message: "认证失败，请检查 API Key".to_string(),
+        },
+        402 => {
+            let is_quota_transient = body.contains("retry")
+                || body.contains("limit")
+                || body.contains("usage");
+            ClassifiedLlmError {
+                kind: if is_quota_transient { LlmErrorKind::RateLimited } else { LlmErrorKind::BillingExhausted },
+                retryable: is_quota_transient,
+                should_compress: false,
+                should_rotate_credential: !is_quota_transient,
+                retry_after: if is_quota_transient { Some(Duration::from_secs(30)) } else { None },
+                message: if is_quota_transient { "使用限制，稍后重试".to_string() } else { "计费额度已耗尽".to_string() },
+            }
+        }
+        429 => ClassifiedLlmError {
+            kind: LlmErrorKind::RateLimited,
+            retryable: true,
+            should_compress: false,
+            should_rotate_credential: true,
+            retry_after: parse_retry_after(body),
+            message: "速率限制".to_string(),
+        },
+        529 => ClassifiedLlmError {
+            kind: LlmErrorKind::Overloaded,
+            retryable: true,
+            should_compress: false,
+            should_rotate_credential: false,
+            retry_after: Some(Duration::from_secs(5)),
+            message: "提供商过载".to_string(),
+        },
+        500 | 502 => ClassifiedLlmError {
+            kind: LlmErrorKind::ServerError,
+            retryable: true,
+            should_compress: false,
+            should_rotate_credential: false,
+            retry_after: None,
+            message: "服务端错误".to_string(),
+        },
+        503 => ClassifiedLlmError {
+            kind: LlmErrorKind::Overloaded,
+            retryable: true,
+            should_compress: false,
+            should_rotate_credential: false,
+            retry_after: Some(Duration::from_secs(3)),
+            message: "服务暂时不可用".to_string(),
+        },
+        400 => {
+            let is_context_overflow = body.contains("context_length")
+                || body.contains("max_tokens")
+                || body.contains("too many tokens")
+                || body.contains("prompt is too long");
+            ClassifiedLlmError {
+                kind: if is_context_overflow { LlmErrorKind::ContextOverflow } else { LlmErrorKind::Unknown },
+                retryable: false,
+                should_compress: is_context_overflow,
+                should_rotate_credential: false,
+                retry_after: None,
+                message: if is_context_overflow {
+                    "上下文过长，需要压缩".to_string()
+                } else {
+                    format!("请求错误: {}", body.chars().take(200).collect::<String>())
+                },
+            }
+        }
+        404 => ClassifiedLlmError {
+            kind: LlmErrorKind::ModelNotFound,
+            retryable: false,
+            should_compress: false,
+            should_rotate_credential: false,
+            retry_after: None,
+            message: "模型不存在".to_string(),
+        },
+        _ => ClassifiedLlmError {
+            kind: LlmErrorKind::Unknown,
+            retryable: true,
+            should_compress: false,
+            should_rotate_credential: false,
+            retry_after: None,
+            message: format!("未知错误 ({}) {}", status, body.chars().take(200).collect::<String>()),
+        },
+    }
+}
+
+fn parse_retry_after(body: &str) -> Option<Duration> {
+    // Anthropic: "Please retry after X seconds"
+    // OpenAI: "Please retry after Xms"
+    if let Some(secs) = extract_retry_seconds(body) {
+        return Some(Duration::from_secs(secs));
+    }
+    if let Some(ms) = extract_retry_millis(body) {
+        return Some(Duration::from_millis(ms));
+    }
+    Some(Duration::from_secs(2))
+}
+
+fn extract_retry_seconds(body: &str) -> Option<u64> {
+    let re = regex::Regex::new(r"retry\s+(?:after\s+)?(\d+)\s*(?:s|sec|seconds?)").ok()?;
+    let caps = re.captures(body)?;
+    caps[1].parse().ok()
+}
+
+fn extract_retry_millis(body: &str) -> Option<u64> {
+    let re = regex::Regex::new(r"retry\s+(?:after\s+)?(\d+)\s*ms").ok()?;
+    let caps = re.captures(body)?;
+    caps[1].parse().ok()
+}
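Note on truncating error bodies: slicing a `&str` at a fixed byte offset can panic when the offset lands inside a multi-byte UTF-8 character, which is a realistic case for non-ASCII provider messages. A minimal std-only sketch of boundary-safe truncation by byte budget follows. The helper name `truncate_on_char_boundary` is hypothetical and is not part of this diff.

```rust
/// Hypothetical helper (not part of the diff): returns the longest prefix of `s`
/// that is at most `max_bytes` long and ends on a UTF-8 character boundary.
fn truncate_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a character boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Compared with collecting `chars().take(n)`, this variant borrows instead of allocating and bounds the result in bytes rather than characters. Either is fine for log and error messages.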
--- a/crates/zclaw-runtime/src/driver/gemini.rs
+++ b/crates/zclaw-runtime/src/driver/gemini.rs
@@ -238,6 +238,8 @@ impl LlmDriver for GeminiDriver {
                                                input_tokens,
                                                output_tokens,
                                                stop_reason: stop_reason.to_string(),
+                                                cache_creation_input_tokens: None,
+                                                cache_read_input_tokens: None,
                                            });
                                        }
                                    }
@@ -500,6 +502,8 @@ impl GeminiDriver {
            input_tokens,
            output_tokens,
            stop_reason,
+            cache_creation_input_tokens: None,
+            cache_read_input_tokens: None,
        }
    }
 }
--- a/crates/zclaw-runtime/src/driver/local.rs
+++ b/crates/zclaw-runtime/src/driver/local.rs
@@ -238,6 +238,8 @@ impl LocalDriver {
            input_tokens,
            output_tokens,
            stop_reason,
+            cache_creation_input_tokens: None,
+            cache_read_input_tokens: None,
        }
    }

@@ -396,6 +398,8 @@ impl LlmDriver for LocalDriver {
                                input_tokens: 0,
                                output_tokens: 0,
                                stop_reason: "end_turn".to_string(),
+                                cache_creation_input_tokens: None,
+                                cache_read_input_tokens: None,
                            });
                            continue;
                        }
--- a/crates/zclaw-runtime/src/driver/mod.rs
+++ b/crates/zclaw-runtime/src/driver/mod.rs
@@ -15,11 +15,14 @@ mod anthropic;
 mod openai;
 mod gemini;
 mod local;
+mod error_classifier;
+mod retry_driver;

 pub use anthropic::AnthropicDriver;
 pub use openai::OpenAiDriver;
 pub use gemini::GeminiDriver;
 pub use local::LocalDriver;
+pub use retry_driver::{RetryDriver, RetryConfig};

 /// LLM Driver trait - unified interface for all providers
 #[async_trait]
@@ -106,6 +109,12 @@ pub struct CompletionResponse {
    pub output_tokens: u32,
    /// Stop reason
    pub stop_reason: StopReason,
+    /// Cache creation input tokens (Anthropic prompt caching)
+    #[serde(default)]
+    pub cache_creation_input_tokens: Option<u32>,
+    /// Cache read input tokens (Anthropic prompt caching)
+    #[serde(default)]
+    pub cache_read_input_tokens: Option<u32>,
 }

 /// LLM driver response content block (subset of canonical zclaw_types::ContentBlock).
--- a/crates/zclaw-runtime/src/driver/openai.rs
+++ b/crates/zclaw-runtime/src/driver/openai.rs
@@ -222,10 +222,13 @@ impl LlmDriver for OpenAiDriver {
                                let parsed_args: serde_json::Value = if args.is_empty() {
                                    serde_json::json!({})
                                } else {
-                                    serde_json::from_str(args).unwrap_or_else(|e| {
-                                        tracing::warn!("[OpenAI] Failed to parse tool args '{}': {}, using empty object", args, e);
-                                        serde_json::json!({})
-                                    })
+                                    match serde_json::from_str(args) {
+                                        Ok(v) => v,
+                                        Err(e) => {
+                                            tracing::error!("[OpenAI] Failed to parse tool call '{}' args: {}. Raw: {}", name, e, args.chars().take(200).collect::<String>());
+                                            serde_json::json!({ "_parse_error": e.to_string(), "_raw_args": args.chars().take(500).collect::<String>() })
+                                        }
+                                    }
                                };
                                yield Ok(StreamChunk::ToolUseEnd {
                                    id: id.clone(),
@@ -237,6 +240,8 @@ impl LlmDriver for OpenAiDriver {
                                input_tokens: 0,
                                output_tokens: 0,
                                stop_reason: "end_turn".to_string(),
+                                cache_creation_input_tokens: None,
+                                cache_read_input_tokens: None,
                            });
                            continue;
                        }
@@ -638,6 +643,8 @@ impl OpenAiDriver {
            input_tokens,
            output_tokens,
            stop_reason,
+            cache_creation_input_tokens: None,
+            cache_read_input_tokens: None,
        }
    }

@@ -761,6 +768,8 @@ impl OpenAiDriver {
                    StopReason::StopSequence => "stop",
                    StopReason::Error => "error",
                }.to_string(),
+                cache_creation_input_tokens: None,
+                cache_read_input_tokens: None,
            });
        })
    }
--- a/crates/zclaw-runtime/src/driver/retry_driver.rs
+++ b/crates/zclaw-runtime/src/driver/retry_driver.rs
@@ -0,0 +1,123 @@
+//! RetryDriver: a retry decorator for LlmDriver.
+//! Used only on the local Kernel path; the SaaS Relay has its own retry logic.
+
+use std::sync::Arc;
+use std::time::Duration;
+use async_trait::async_trait;
+use futures::Stream;
+use rand::Rng;
+use zclaw_types::{Result, ZclawError};
+
+use super::{LlmDriver, CompletionRequest, CompletionResponse, StreamChunk};
+use super::error_classifier::classify_llm_error;
+
+/// Retry configuration.
+#[derive(Debug, Clone)]
+pub struct RetryConfig {
+    pub max_attempts: u32,
+    pub base_delay_secs: f64,
+    pub max_delay_secs: f64,
+    pub jitter_ratio: f64,
+}
+
+impl Default for RetryConfig {
+    fn default() -> Self {
+        Self {
+            max_attempts: 3,
+            base_delay_secs: 1.0,
+            max_delay_secs: 8.0,
+            jitter_ratio: 0.5,
+        }
+    }
+}
+
+/// Retry decorator.
+pub struct RetryDriver {
+    inner: Arc<dyn LlmDriver>,
+    config: RetryConfig,
+}
+
+impl RetryDriver {
+    pub fn new(inner: Arc<dyn LlmDriver>, config: RetryConfig) -> Self {
+        Self { inner, config }
+    }
+
+    fn jittered_backoff(&self, attempt: u32) -> Duration {
+        let base = self.config.base_delay_secs * 2_f64.powi(attempt as i32);
+        let capped = base.min(self.config.max_delay_secs);
+        let mut rng = rand::thread_rng();
+        let jitter = capped * self.config.jitter_ratio * rng.gen::<f64>();
+        Duration::from_secs_f64(capped + jitter)
+    }
+}
+
+#[async_trait]
+impl LlmDriver for RetryDriver {
+    fn provider(&self) -> &str {
+        self.inner.provider()
+    }
+
+    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse> {
+        let mut last_error: Option<ZclawError> = None;
+
+        for attempt in 0..self.config.max_attempts {
+            match self.inner.complete(request.clone()).await {
+                Ok(response) => return Ok(response),
+                Err(e) => {
+                    let message = e.to_string();
+                    let status = extract_status_from_error(&message);
+                    let classified = classify_llm_error(
+                        self.inner.provider(),
+                        status,
+                        &message,
+                        message.contains("timeout") || message.contains("Timeout"),
+                    );
+
+                    if classified.should_compress { // must precede the retryable check: overflow is classified as non-retryable
+                        return Err(ZclawError::LlmError(
+                            format!("[CONTEXT_OVERFLOW] {}", message)
+                        ));
+                    }
+
+                    if !classified.retryable {
+                        return Err(e);
+                    }
+
+                    last_error = Some(e);
+
+                    if attempt + 1 < self.config.max_attempts {
+                        let delay = classified.retry_after
+                            .unwrap_or_else(|| self.jittered_backoff(attempt));
+                        tracing::warn!(
+                            "[RetryDriver] Attempt {}/{} failed ({}), retrying in {:.1}s",
+                            attempt + 1, self.config.max_attempts, classified.message,
+                            delay.as_secs_f64()
+                        );
+                        tokio::time::sleep(delay).await;
+                    }
+                }
+            }
+        }
+
+        Err(last_error.unwrap_or_else(|| ZclawError::LlmError("重试耗尽".to_string())))
+    }
+
+    fn stream(
+        &self,
+        request: CompletionRequest,
+    ) -> std::pin::Pin<Box<dyn Stream<Item = Result<StreamChunk>> + Send + '_>> {
+        // The streaming path is not retried: some deltas have already been sent, and a retry would duplicate them in the UI.
+        self.inner.stream(request)
+    }
+
+    fn is_configured(&self) -> bool {
+        self.inner.is_configured()
+    }
+}
+
+fn extract_status_from_error(message: &str) -> u16 {
+    let re = regex::Regex::new(r"(?:error|status)[:\s]+(\d{3})").ok();
+    re.and_then(|re| re.captures(message))
+        .and_then(|caps| caps[1].parse().ok())
+        .unwrap_or(0)
+}
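Note on the backoff schedule: the delay is computed as exponential growth capped at `max_delay_secs`, with jitter added on top of the capped value, so the actual sleep can exceed the cap by up to `jitter_ratio` times the cap. A std-only sketch of the same arithmetic follows. As an assumption for illustration, the random factor is passed in as `unit_random` (expected in `[0, 1)`) so the schedule can be checked deterministically; the free function and its parameter list are not part of this diff.

```rust
use std::time::Duration;

/// Illustrative re-statement of the jittered exponential backoff formula.
/// `unit_random` stands in for a uniform random sample in [0, 1).
fn jittered_backoff(
    base_secs: f64,
    max_secs: f64,
    jitter_ratio: f64,
    attempt: u32,
    unit_random: f64,
) -> Duration {
    // Exponential growth: base * 2^attempt.
    let exponential = base_secs * 2_f64.powi(attempt as i32);
    // Cap before applying jitter, mirroring the order used in the diff.
    let capped = exponential.min(max_secs);
    // Jitter is additive, so the result may exceed `max_secs`.
    let jitter = capped * jitter_ratio * unit_random.clamp(0.0, 1.0);
    Duration::from_secs_f64(capped + jitter)
}
```

With the default configuration (base 1.0 s, cap 8.0 s, jitter ratio 0.5), attempt 0 sleeps between 1.0 and 1.5 s, and any attempt that reaches the cap sleeps between 8.0 and 12.0 s. If the cap is meant to be a hard upper bound, one option is to apply `min(max_secs)` after adding jitter.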
--- a/crates/zclaw-runtime/src/loop_runner.rs
+++ b/crates/zclaw-runtime/src/loop_runner.rs
@@ -4,10 +4,11 @@ use std::sync::Arc;
 use futures::StreamExt;
 use tokio::sync::mpsc;
 use zclaw_types::{AgentId, SessionId, Message, Result};
+use serde_json::Value;

 use crate::driver::{LlmDriver, CompletionRequest, ContentBlock};
 use crate::stream::StreamChunk;
-use crate::tool::{ToolRegistry, ToolContext, SkillExecutor, HandExecutor};
+use crate::tool::{ToolRegistry, ToolContext, SkillExecutor, HandExecutor, ToolConcurrency};
 use crate::tool::builtin::PathValidator;
 use crate::growth::GrowthIntegration;
 use crate::compaction::{self, CompactionConfig};
@@ -303,8 +304,28 @@ impl AgentLoop {
                plan_mode: self.plan_mode,
            };

-            // Call LLM
-            let response = self.driver.complete(request).await?;
+            // Call LLM with context-overflow recovery
+            let response = match self.driver.complete(request).await {
+                Ok(r) => r,
+                Err(e) => {
+                    let err_str = e.to_string();
+                    if err_str.contains("[CONTEXT_OVERFLOW]") && self.compaction_threshold > 0 {
+                        tracing::warn!("[AgentLoop] Context overflow detected, triggering emergency compaction");
+                        let pruned = compaction::prune_tool_outputs(&mut messages);
+                        if pruned > 0 {
+                            tracing::info!("[AgentLoop] Emergency pruning removed {} tool outputs", pruned);
+                        }
+                        let keep_recent = messages.len().saturating_sub(messages.len() / 3);
+                        let (compacted, removed) = compaction::compact_messages(messages, keep_recent.max(4));
+                        if removed > 0 {
+                            tracing::info!("[AgentLoop] Emergency compaction removed {} messages", removed);
+                            messages = compacted;
+                            continue; // retry the iteration with compacted messages
+                        }
+                    }
+                    return Err(e);
+                }
+            };
            total_input_tokens += response.input_tokens;
            total_output_tokens += response.output_tokens;

@@ -375,21 +396,22 @@ impl AgentLoop {
            let tool_context = self.create_tool_context(session_id.clone());
            let mut abort_result: Option<AgentLoopResult> = None;
            let mut clarification_result: Option<AgentLoopResult> = None;
-            for (id, name, input) in tool_calls {
-                // Check if loop was already aborted
-                if abort_result.is_some() {
-                    break;
-                }
+
+            // Phase 1: Pre-process inputs + middleware checks (serial)
+            struct ToolPlan {
+                idx: usize,
+                id: String,
+                name: String,
+                input: Value,
+            }
+            let mut plans: Vec<ToolPlan> = Vec::new();
+            for (idx, (id, name, input)) in tool_calls.into_iter().enumerate() {
+                if abort_result.is_some() { break; }

                // GLM and other models sometimes send tool calls with empty arguments `{}`
-                // Inject the last user message as a fallback query so the tool can infer intent.
                let input = if input.as_object().map_or(false, |obj| obj.is_empty()) {
                    if let Some(last_user_msg) = messages.iter().rev().find_map(|m| {
-                        if let Message::User { content } = m {
-                            Some(content.clone())
-                        } else {
-                            None
-                        }
+                        if let Message::User { content } = m { Some(content.clone()) } else { None }
                    }) {
                        tracing::info!("[AgentLoop] Tool '{}' received empty input, injecting user message as fallback query", name);
                        serde_json::json!({ "_fallback_query": last_user_msg })
@@ -400,101 +422,152 @@ impl AgentLoop {
                    input
                };

-                // Check tool call safety — via middleware chain
-                {
-                    let mw_ctx_ref = middleware::MiddlewareContext {
+                let mw_ctx = middleware::MiddlewareContext {
+                    agent_id: self.agent_id.clone(),
+                    session_id: session_id.clone(),
+                    user_input: input.to_string(),
+                    system_prompt: enhanced_prompt.clone(),
+                    messages: messages.clone(),
+                    response_content: Vec::new(),
+                    input_tokens: total_input_tokens,
+                    output_tokens: total_output_tokens,
+                };
+                match self.middleware_chain.run_before_tool_call(&mw_ctx, &name, &input).await? {
+                    middleware::ToolCallDecision::Allow => {
+                        plans.push(ToolPlan { idx, id, name, input });
+                    }
+                    middleware::ToolCallDecision::Block(msg) => {
+                        tracing::warn!("[AgentLoop] Tool '{}' blocked by middleware: {}", name, msg);
+                        messages.push(Message::tool_result(&id, zclaw_types::ToolId::new(&name), serde_json::json!({ "error": msg }), true));
+                    }
+                    middleware::ToolCallDecision::ReplaceInput(new_input) => {
+                        plans.push(ToolPlan { idx, id, name, input: new_input });
+                    }
+                    middleware::ToolCallDecision::AbortLoop(reason) => {
+                        tracing::warn!("[AgentLoop] Loop aborted by middleware: {}", reason);
+                        let msg = format!("{}\n已自动终止", reason);
+                        self.memory.append_message(&session_id, &Message::assistant(&msg)).await?;
+                        abort_result = Some(AgentLoopResult {
+                            response: msg,
+                            input_tokens: total_input_tokens,
+                            output_tokens: total_output_tokens,
+                            iterations,
+                        });
+                    }
+                }
+            }
+
+            // Phase 2: Execute tools (parallel for ReadOnly, serial for others)
+            if abort_result.is_none() && !plans.is_empty() {
+                let (parallel_plans, sequential_plans): (Vec<_>, Vec<_>) = plans.iter()
+                    .partition(|p| {
+                        self.tools.get(&p.name)
+                            .map(|t| t.concurrency())
+                            .unwrap_or(ToolConcurrency::Exclusive) == ToolConcurrency::ReadOnly
+                    });
+
+                let mut results: std::collections::HashMap<usize, (String, String, serde_json::Value)> = std::collections::HashMap::new();
+
+                // Execute parallel (ReadOnly) tools with JoinSet (max 3 concurrent)
+                if !parallel_plans.is_empty() {
+                    let semaphore = Arc::new(tokio::sync::Semaphore::new(3));
+                    let mut join_set = tokio::task::JoinSet::new();
+
+                    for plan in &parallel_plans {
+                        let tool = self.tools.get(&plan.name).unwrap();
+                        let ctx = tool_context.clone();
+                        let input = plan.input.clone();
+                        let idx = plan.idx;
+                        let id = plan.id.clone();
+                        let name = plan.name.clone();
+                        let permit = semaphore.clone().acquire_owned().await.unwrap();
+
+                        join_set.spawn(async move {
+                            let result = tokio::time::timeout(
+                                std::time::Duration::from_secs(30),
+                                tool.execute(input, &ctx)
+                            ).await;
+                            drop(permit);
+                            (idx, id, name, result)
+                        });
+                    }
+
+                    while let Some(res) = join_set.join_next().await {
+                        match res {
+                            Ok((idx, id, name, Ok(Ok(value)))) => {
+                                results.insert(idx, (id, name, value));
+                            }
+                            Ok((idx, id, name, Ok(Err(e)))) => {
+                                results.insert(idx, (id, name, serde_json::json!({ "error": e.to_string() })));
+                            }
+                            Ok((idx, id, name, Err(_))) => {
+                                tracing::warn!("[AgentLoop] Tool '{}' timed out after 30s (parallel)", name);
+                                results.insert(idx, (id, name.clone(), serde_json::json!({ "error": format!("工具 '{}' 执行超时（30秒），请重试", name) })));
+                            }
+                            Err(e) => {
+                                tracing::warn!("[AgentLoop] JoinError in parallel tool execution: {}", e);
+                            }
+                        }
+                    }
+                }
+
+                // Execute sequential (Exclusive/Interactive) tools
+                for plan in &sequential_plans {
+                    let tool_result = match tokio::time::timeout(
+                        std::time::Duration::from_secs(30),
+                        self.execute_tool(&plan.name, plan.input.clone(), &tool_context),
+                    ).await {
+                        Ok(Ok(result)) => result,
+                        Ok(Err(e)) => serde_json::json!({ "error": e.to_string() }),
+                        Err(_) => {
+                            tracing::warn!("[AgentLoop] Tool '{}' timed out after 30s", plan.name);
+                            serde_json::json!({ "error": format!("工具 '{}' 执行超时（30秒），请重试", plan.name) })
+                        }
+                    };
+
+                    // Check if this is a clarification response
+                    if plan.name == "ask_clarification"
+                        && tool_result.get("status").and_then(|v| v.as_str()) == Some("clarification_needed")
+                    {
+                        tracing::info!("[AgentLoop] Clarification requested, terminating loop");
+                        let question = tool_result.get("question")
+                            .and_then(|v| v.as_str())
+                            .unwrap_or("需要更多信息")
+                            .to_string();
+                        results.insert(plan.idx, (plan.id.clone(), plan.name.clone(), tool_result));
+                        self.memory.append_message(&session_id, &Message::assistant(&question)).await?;
+                        clarification_result = Some(AgentLoopResult {
+                            response: question,
+                            input_tokens: total_input_tokens,
+                            output_tokens: total_output_tokens,
+                            iterations,
+                        });
+                        break;
+                    }
+                    results.insert(plan.idx, (plan.id.clone(), plan.name.clone(), tool_result));
+                }
+
+                // Push results in original tool_call order
+                let mut sorted_indices: Vec<usize> = results.keys().copied().collect();
+                sorted_indices.sort();
+                for idx in sorted_indices {
+                    let (id, name, result) = results.remove(&idx).unwrap();
+                    // Run after_tool_call middleware (error counting, output guard, etc.)
+                    let mut mw_ctx = middleware::MiddlewareContext {
                        agent_id: self.agent_id.clone(),
                        session_id: session_id.clone(),
-                        user_input: input.to_string(),
+                        user_input: String::new(),
                        system_prompt: enhanced_prompt.clone(),
                        messages: messages.clone(),
                        response_content: Vec::new(),
                        input_tokens: total_input_tokens,
                        output_tokens: total_output_tokens,
                    };
-                    match self.middleware_chain.run_before_tool_call(&mw_ctx_ref, &name, &input).await? {
-                        middleware::ToolCallDecision::Allow => {}
-                        middleware::ToolCallDecision::Block(msg) => {
-                            tracing::warn!("[AgentLoop] Tool '{}' blocked by middleware: {}", name, msg);
-                            let error_output = serde_json::json!({ "error": msg });
-                            messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), error_output, true));
-                            continue;
-                        }
-                        middleware::ToolCallDecision::ReplaceInput(new_input) => {
-                            // Execute with replaced input (with timeout)
-                            let tool_result = match tokio::time::timeout(
-                                std::time::Duration::from_secs(30),
-                                self.execute_tool(&name, new_input, &tool_context),
-                            ).await {
-                                Ok(Ok(result)) => result,
-                                Ok(Err(e)) => serde_json::json!({ "error": e.to_string() }),
-                                Err(_) => {
-                                    tracing::warn!("[AgentLoop] Tool '{}' (replaced input) timed out after 30s", name);
-                                    serde_json::json!({ "error": format!("工具 '{}' 执行超时（30秒），请重试", name) })
-                                }
-                            };
-                            messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), tool_result, false));
-                            continue;
-                        }
-                        middleware::ToolCallDecision::AbortLoop(reason) => {
-                            tracing::warn!("[AgentLoop] Loop aborted by middleware: {}", reason);
-                            let msg = format!("{}\n已自动终止", reason);
-                            self.memory.append_message(&session_id, &Message::assistant(&msg)).await?;
-                            abort_result = Some(AgentLoopResult {
-                                response: msg,
-                                input_tokens: total_input_tokens,
-                                output_tokens: total_output_tokens,
-                                iterations,
-                            });
-                        }
+                    if let Err(e) = self.middleware_chain.run_after_tool_call(&mut mw_ctx, &name, &result).await {
+                        tracing::warn!("[AgentLoop] after_tool_call middleware failed for '{}': {}", name, e);
                    }
+                    messages.push(Message::tool_result(&id, zclaw_types::ToolId::new(&name), result, false));
                }
-
-                let tool_result = match tokio::time::timeout(
-                    std::time::Duration::from_secs(30),
-                    self.execute_tool(&name, input, &tool_context),
-                ).await {
-                    Ok(Ok(result)) => result,
-                    Ok(Err(e)) => serde_json::json!({ "error": e.to_string() }),
-                    Err(_) => {
-                        tracing::warn!("[AgentLoop] Tool '{}' timed out after 30s", name);
-                        serde_json::json!({ "error": format!("工具 '{}' 执行超时（30秒），请重试", name) })
-                    }
-                };
-
-                // Check if this is a clarification response — terminate loop immediately
-                // so the LLM waits for user input instead of continuing to generate.
-                if name == "ask_clarification"
-                    && tool_result.get("status").and_then(|v| v.as_str()) == Some("clarification_needed")
-                {
-                    tracing::info!("[AgentLoop] Clarification requested, terminating loop");
-                    let question = tool_result.get("question")
-                        .and_then(|v| v.as_str())
-                        .unwrap_or("需要更多信息")
-                        .to_string();
-                    messages.push(Message::tool_result(
-                        id,
-                        zclaw_types::ToolId::new(&name),
-                        tool_result,
-                        false,
-                    ));
-                    self.memory.append_message(&session_id, &Message::assistant(&question)).await?;
-                    clarification_result = Some(AgentLoopResult {
-                        response: question,
-                        input_tokens: total_input_tokens,
-                        output_tokens: total_output_tokens,
-                        iterations,
-                    });
-                    break;
-                }
-
-                // Add tool result to messages
-                messages.push(Message::tool_result(
-                    id,
-                    zclaw_types::ToolId::new(&name),
-                    tool_result,
-                    false, // is_error - we include errors in the result itself
-                ));
            }

            // Continue the loop - LLM will process tool results and generate final response
@@ -647,6 +720,7 @@ impl AgentLoop {

                let mut stream = driver.stream(request);
                let mut pending_tool_calls: Vec<(String, String, serde_json::Value)> = Vec::new();
+                let mut completed_tool_ids: std::collections::HashSet<String> = std::collections::HashSet::new();
                let mut iteration_text = String::new();
                let mut reasoning_text = String::new(); // Track reasoning separately for API requirement

@@ -703,6 +777,7 @@ impl AgentLoop {
                                    // Update with final parsed input and emit ToolStart event
                                    if let Some(tool) = pending_tool_calls.iter_mut().find(|(tid, _, _)| tid == id) {
                                        tool.2 = input.clone();
+                                        completed_tool_ids.insert(id.clone());
                                        if let Err(e) = tx.send(LoopEvent::ToolStart { name: tool.1.clone(), input: input.clone() }).await {
                                            tracing::warn!("[AgentLoop] Failed to send ToolStart event: {}", e);
                                        }
@@ -810,10 +885,26 @@ impl AgentLoop {
                    break 'outer;
                }

-                // Skip tool processing if stream errored or timed out
+                // Handle stream errors — execute complete tool calls, cancel incomplete ones
                if stream_errored {
-                    tracing::debug!("[AgentLoop] Stream errored, skipping tool processing and breaking");
-                    break 'outer;
+                    // Cancel incomplete tools (ToolStart sent but ToolUseEnd not received)
+                    let incomplete: Vec<_> = pending_tool_calls.iter()
+                        .filter(|(id, _, _)| !completed_tool_ids.contains(id))
+                        .collect();
+                    for (_, name, _) in &incomplete {
+                        tracing::warn!("[AgentLoop] Cancelling incomplete tool '{}' due to stream error", name);
+                        let error_output = serde_json::json!({ "error": "流式响应中断，工具调用未完成" });
+                        if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output }).await {
+                            tracing::warn!("[AgentLoop] Failed to send cancellation ToolEnd event: {}", e);
+                        }
+                    }
+                    // Retain only complete tools for execution
+                    pending_tool_calls.retain(|(id, _, _)| completed_tool_ids.contains(id));
+                    if pending_tool_calls.is_empty() {
+                        tracing::debug!("[AgentLoop] Stream errored with no complete tool calls, breaking");
+                        break 'outer;
+                    }
+                    tracing::info!("[AgentLoop] Stream errored but executing {} complete tool calls", pending_tool_calls.len());
                }

                tracing::debug!("[AgentLoop] Processing {} tool calls (reasoning: {} chars)", pending_tool_calls.len(), reasoning_text.len());
@@ -830,187 +921,192 @@ impl AgentLoop {
                    messages.push(Message::tool_use(id, zclaw_types::ToolId::new(name), input.clone()));
                }

-                // Execute tools
-                for (id, name, input) in pending_tool_calls {
-                    tracing::debug!("[AgentLoop] Executing tool: name={}, input={:?}", name, input);
+                // Execute tools — Phase 1: Pre-process through middleware (serial)
+                struct StreamToolPlan { idx: usize, id: String, name: String, input: Value }
+                let mut plans: Vec<StreamToolPlan> = Vec::new();
+                let mut abort_loop = false;
+                for (idx, (id, name, input)) in pending_tool_calls.into_iter().enumerate() {
+                    if abort_loop { break; }
+                    let mw_ctx = middleware::MiddlewareContext {
+                        agent_id: agent_id.clone(),
+                        session_id: session_id_clone.clone(),
+                        user_input: input.to_string(),
+                        system_prompt: enhanced_prompt.clone(),
+                        messages: messages.clone(),
+                        response_content: Vec::new(),
+                        input_tokens: total_input_tokens,
+                        output_tokens: total_output_tokens,
+                    };
+                    match middleware_chain.run_before_tool_call(&mw_ctx, &name, &input).await {
+                        Ok(middleware::ToolCallDecision::Allow) => {
+                            plans.push(StreamToolPlan { idx, id, name, input });
+                        }
+                        Ok(middleware::ToolCallDecision::Block(msg)) => {
+                            tracing::warn!("[AgentLoop] Tool '{}' blocked by middleware: {}", name, msg);
+                            let error_output = serde_json::json!({ "error": msg });
+                            if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
+                                tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
+                            }
+                            messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), error_output, true));
+                        }
+                        Ok(middleware::ToolCallDecision::ReplaceInput(new_input)) => {
+                            plans.push(StreamToolPlan { idx, id, name, input: new_input });
+                        }
+                        Ok(middleware::ToolCallDecision::AbortLoop(reason)) => {
+                            tracing::warn!("[AgentLoop] Loop aborted by middleware: {}", reason);
+                            if let Err(e) = tx.send(LoopEvent::Error(reason)).await {
+                                tracing::warn!("[AgentLoop] Failed to send Error event: {}", e);
+                            }
+                            abort_loop = true;
+                        }
+                        Err(e) => {
+                            tracing::error!("[AgentLoop] Middleware error for tool '{}': {}", name, e);
+                            let error_output = serde_json::json!({ "error": e.to_string() });
+                            if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
+                                tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
+                            }
+                            messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), error_output, true));
+                        }
+                    }
+                }
+                if abort_loop { break 'outer; }
+                if plans.is_empty() {
+                    tracing::debug!("[AgentLoop] No tools to execute after middleware filtering");
+                    break 'outer;
+                }

-                    // Check tool call safety — via middleware chain
+                // Build shared tool context
+                let pv = path_validator.clone().unwrap_or_else(|| {
+                    let home = std::env::var("USERPROFILE")
+                        .or_else(|_| std::env::var("HOME"))
+                        .unwrap_or_else(|_| ".".to_string());
+                    PathValidator::new().with_workspace(std::path::PathBuf::from(&home))
+                });
+                let working_dir = pv.workspace_root().map(|p| p.to_string_lossy().to_string());
+                let tool_context = ToolContext {
+                    agent_id: agent_id.clone(),
+                    working_directory: working_dir,
+                    session_id: Some(session_id_clone.to_string()),
+                    skill_executor: skill_executor.clone(),
+                    hand_executor: hand_executor.clone(),
+                    path_validator: Some(pv),
+                    event_sender: Some(tx.clone()),
+                };
+
+                // Phase 2: Execute tools (parallel for ReadOnly, serial for others)
+                let (parallel_plans, sequential_plans): (Vec<_>, Vec<_>) = plans.iter()
+                    .partition(|p| {
+                        tools.get(&p.name)
+                            .map(|t| t.concurrency())
+                            .unwrap_or(ToolConcurrency::Exclusive) == ToolConcurrency::ReadOnly
+                    });
+
+                let mut results: std::collections::HashMap<usize, (String, String, serde_json::Value, bool)> = std::collections::HashMap::new();
+
+                // Execute parallel (ReadOnly) tools with JoinSet (max 3 concurrent)
+                if !parallel_plans.is_empty() {
+                    let sem = Arc::new(tokio::sync::Semaphore::new(3));
+                    let mut join_set = tokio::task::JoinSet::new();
+                    for plan in &parallel_plans {
+                        let tool_ctx = tool_context.clone();
+                        let input = plan.input.clone();
+                        let idx = plan.idx;
+                        let id = plan.id.clone();
+                        let name = plan.name.clone();
+                        let tools_ref = tools.clone();
+                        let permit = sem.clone().acquire_owned().await.unwrap();
+                        join_set.spawn(async move {
+                            let result = if let Some(tool) = tools_ref.get(&name) {
+                                tokio::time::timeout(std::time::Duration::from_secs(30), tool.execute(input, &tool_ctx)).await
+                            } else {
+                                Ok(Err(zclaw_types::ZclawError::Internal(format!("Unknown tool: {}", name))))
+                            };
+                            drop(permit);
+                            (idx, id, name, result)
+                        });
+                    }
+                    while let Some(res) = join_set.join_next().await {
+                        match res {
+                            Ok((idx, id, name, Ok(Ok(value)))) => {
+                                results.insert(idx, (id, name, value, false));
+                            }
+                            Ok((idx, id, name, Ok(Err(e)))) => {
+                                results.insert(idx, (id, name, serde_json::json!({ "error": e.to_string() }), true));
+                            }
+                            Ok((idx, id, name, Err(_))) => {
+                                tracing::warn!("[AgentLoop] Tool '{}' timed out (parallel, 30s)", name);
+                                results.insert(idx, (id, name.clone(), serde_json::json!({ "error": format!("工具 '{}' 执行超时", name) }), true));
+                            }
+                            Err(e) => {
+                                tracing::warn!("[AgentLoop] JoinError in parallel tool execution: {}", e);
+                            }
+                        }
+                    }
+                }
+
+                // Execute sequential (Exclusive/Interactive) tools
+                for plan in &sequential_plans {
+                    let (result, is_error) = if let Some(tool) = tools.get(&plan.name) {
+                        match tool.execute(plan.input.clone(), &tool_context).await {
+                            Ok(output) => (output, false),
+                            Err(e) => (serde_json::json!({ "error": e.to_string() }), true),
+                        }
+                    } else {
+                        (serde_json::json!({ "error": format!("Unknown tool: {}", plan.name) }), true)
+                    };
+
+                    // Check clarification (only from sequential tools — ask_clarification is Interactive)
+                    if plan.name == "ask_clarification"
+                        && result.get("status").and_then(|v| v.as_str()) == Some("clarification_needed")
                    {
-                        let mw_ctx = middleware::MiddlewareContext {
+                        tracing::info!("[AgentLoop] Streaming: Clarification requested, terminating loop");
+                        let question = result.get("question").and_then(|v| v.as_str()).unwrap_or("需要更多信息").to_string();
+                        messages.push(Message::tool_result(plan.id.clone(), zclaw_types::ToolId::new(&plan.name), result, is_error));
+                        if let Err(e) = tx.send(LoopEvent::Delta(question.clone())).await { tracing::warn!("[AgentLoop] Failed to send Delta event: {}", e); }
+                        if let Err(e) = tx.send(LoopEvent::Complete(AgentLoopResult { response: question.clone(), input_tokens: total_input_tokens, output_tokens: total_output_tokens, iterations: iteration })).await { tracing::warn!("[AgentLoop] Failed to send Complete event: {}", e); }
+                        if let Err(e) = memory.append_message(&session_id_clone, &Message::assistant(&question)).await { tracing::warn!("[AgentLoop] Failed to save clarification message: {}", e); }
+                        break 'outer;
+                    }
+                    results.insert(plan.idx, (plan.id.clone(), plan.name.clone(), result, is_error));
+                }
+
+                // Phase 3: after_tool_call middleware + push results in original order
+                let mut sorted_indices: Vec<usize> = results.keys().copied().collect();
+                sorted_indices.sort();
+                for idx in sorted_indices {
+                    let (id, name, result, is_error) = results.remove(&idx).unwrap();
+
+                    // Emit ToolEnd event
+                    if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: result.clone() }).await {
+                        tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
+                    }
+
+                    // Run after_tool_call middleware
+                    {
+                        let mut mw_ctx = middleware::MiddlewareContext {
                            agent_id: agent_id.clone(),
                            session_id: session_id_clone.clone(),
-                            user_input: input.to_string(),
+                            user_input: String::new(),
                            system_prompt: enhanced_prompt.clone(),
                            messages: messages.clone(),
                            response_content: Vec::new(),
                            input_tokens: total_input_tokens,
                            output_tokens: total_output_tokens,
                        };
-                        match middleware_chain.run_before_tool_call(&mw_ctx, &name, &input).await {
-                            Ok(middleware::ToolCallDecision::Allow) => {}
-                            Ok(middleware::ToolCallDecision::Block(msg)) => {
-                                tracing::warn!("[AgentLoop] Tool '{}' blocked by middleware: {}", name, msg);
-                                let error_output = serde_json::json!({ "error": msg });
-                                if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
-                                    tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                                }
-                                messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), error_output, true));
-                                continue;
-                            }
-                            Ok(middleware::ToolCallDecision::AbortLoop(reason)) => {
-                                tracing::warn!("[AgentLoop] Loop aborted by middleware: {}", reason);
-                                if let Err(e) = tx.send(LoopEvent::Error(reason)).await {
-                                    tracing::warn!("[AgentLoop] Failed to send Error event: {}", e);
-                                }
-                                break 'outer;
-                            }
-                            Ok(middleware::ToolCallDecision::ReplaceInput(new_input)) => {
-                                // Execute with replaced input (same path_validator logic below)
-                                let pv = path_validator.clone().unwrap_or_else(|| {
-                                    let home = std::env::var("USERPROFILE")
-                                        .or_else(|_| std::env::var("HOME"))
-                                        .unwrap_or_else(|_| ".".to_string());
-                                    PathValidator::new().with_workspace(std::path::PathBuf::from(&home))
-                                });
-                                let working_dir = pv.workspace_root()
-                                    .map(|p| p.to_string_lossy().to_string());
-                                let tool_context = ToolContext {
-                                    agent_id: agent_id.clone(),
-                                    working_directory: working_dir,
-                                    session_id: Some(session_id_clone.to_string()),
-                                    skill_executor: skill_executor.clone(),
-                                    hand_executor: hand_executor.clone(),
-                                    path_validator: Some(pv),
-                                    event_sender: Some(tx.clone()),
-                                };
-                                let (result, is_error) = if let Some(tool) = tools.get(&name) {
-                                    match tool.execute(new_input, &tool_context).await {
-                                        Ok(output) => {
-                                            if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: output.clone() }).await {
-                                                tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                                            }
-                                            (output, false)
-                                        }
-                                        Err(e) => {
-                                            let error_output = serde_json::json!({ "error": e.to_string() });
-                                            if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
-                                                tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                                            }
-                                            (error_output, true)
-                                        }
-                                    }
-                                } else {
-                                    let error_output = serde_json::json!({ "error": format!("Unknown tool: {}", name) });
-                                    if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
-                                        tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                                    }
-                                    (error_output, true)
-                                };
-                                messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), result, is_error));
-                                continue;
-                            }
-                            Err(e) => {
-                                tracing::error!("[AgentLoop] Middleware error for tool '{}': {}", name, e);
-                                let error_output = serde_json::json!({ "error": e.to_string() });
-                                if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
-                                    tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                                }
-                                messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), error_output, true));
-                                continue;
-                            }
+                        if let Err(e) = middleware_chain.run_after_tool_call(&mut mw_ctx, &name, &result).await {
+                            tracing::warn!("[AgentLoop] after_tool_call middleware failed for '{}': {}", name, e);
                        }
                    }
-                    // Use pre-resolved path_validator (already has default fallback from create_tool_context logic)
-                    let pv = path_validator.clone().unwrap_or_else(|| {
-                        let home = std::env::var("USERPROFILE")
-                            .or_else(|_| std::env::var("HOME"))
-                            .unwrap_or_else(|_| ".".to_string());
-                        PathValidator::new().with_workspace(std::path::PathBuf::from(&home))
-                    });
-                    let working_dir = pv.workspace_root()
-                        .map(|p| p.to_string_lossy().to_string());
-                    let tool_context = ToolContext {
-                        agent_id: agent_id.clone(),
-                        working_directory: working_dir,
-                        session_id: Some(session_id_clone.to_string()),
-                        skill_executor: skill_executor.clone(),
-                        hand_executor: hand_executor.clone(),
-                        path_validator: Some(pv),
-                        event_sender: Some(tx.clone()),
-                    };

-                    let (result, is_error) = if let Some(tool) = tools.get(&name) {
-                        tracing::debug!("[AgentLoop] Tool '{}' found, executing...", name);
-                        match tool.execute(input.clone(), &tool_context).await {
-                            Ok(output) => {
-                                tracing::debug!("[AgentLoop] Tool '{}' executed successfully: {:?}", name, output);
-                                if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: output.clone() }).await {
-                                    tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                                }
-                                (output, false)
-                            }
-                            Err(e) => {
-                                tracing::error!("[AgentLoop] Tool '{}' execution failed: {}", name, e);
-                                let error_output = serde_json::json!({ "error": e.to_string() });
-                                if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
-                                    tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                                }
-                                (error_output, true)
-                            }
-                        }
-                    } else {
-                        tracing::error!("[AgentLoop] Tool '{}' not found in registry", name);
-                        let error_output = serde_json::json!({ "error": format!("Unknown tool: {}", name) });
-                        if let Err(e) = tx.send(LoopEvent::ToolEnd { name: name.clone(), output: error_output.clone() }).await {
-                            tracing::warn!("[AgentLoop] Failed to send ToolEnd event: {}", e);
-                        }
-                        (error_output, true)
-                    };
-
-                    // Check if this is a clarification response — break outer loop
-                    if name == "ask_clarification"
-                        && result.get("status").and_then(|v| v.as_str()) == Some("clarification_needed")
-                    {
-                        tracing::info!("[AgentLoop] Streaming: Clarification requested, terminating loop");
-                        let question = result.get("question")
-                            .and_then(|v| v.as_str())
-                            .unwrap_or("需要更多信息")
-                            .to_string();
-                        messages.push(Message::tool_result(
-                            id,
-                            zclaw_types::ToolId::new(&name),
-                            result,
-                            is_error,
-                        ));
-                        // Send the question as final delta so the user sees it
-                        if let Err(e) = tx.send(LoopEvent::Delta(question.clone())).await {
-                            tracing::warn!("[AgentLoop] Failed to send Delta event: {}", e);
-                        }
-                        if let Err(e) = tx.send(LoopEvent::Complete(AgentLoopResult {
-                            response: question.clone(),
-                            input_tokens: total_input_tokens,
-                            output_tokens: total_output_tokens,
-                            iterations: iteration,
-                        })).await {
-                            tracing::warn!("[AgentLoop] Failed to send Complete event: {}", e);
-                        }
-                        if let Err(e) = memory.append_message(&session_id_clone, &Message::assistant(&question)).await {
-                            tracing::warn!("[AgentLoop] Failed to save clarification message: {}", e);
-                        }
-                        break 'outer;
-                    }
-
-                    // Add tool result to message history
-                    tracing::debug!("[AgentLoop] Adding tool_result to history: id={}, name={}, is_error={}", id, name, is_error);
-                    messages.push(Message::tool_result(
-                        id,
-                        zclaw_types::ToolId::new(&name),
-                        result,
-                        is_error,
-                    ));
+                    messages.push(Message::tool_result(id, zclaw_types::ToolId::new(&name), result, is_error));
                }

                tracing::debug!("[AgentLoop] Continuing to next iteration for LLM to process tool results");
+                // If stream errored, we executed complete tools but cannot continue the LLM loop
+                if stream_errored {
+                    tracing::info!("[AgentLoop] Stream errored; executed salvageable tool calls, now breaking");
+                    break 'outer;
+                }
                // Continue loop - next iteration will call LLM with tool results
            }
        });
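
The three-phase flow in the hunk above (partition by concurrency class, run the read-only group concurrently, then push results in original call order) can be sketched in isolation. This is a minimal illustration, not the code from the commit: `Concurrency`, `Plan`, `run_tool`, and `execute_in_order` are hypothetical names, scoped `std` threads stand in for the tokio `JoinSet` and `Semaphore`, and the 3-task cap and 30-second timeout are omitted.

```rust
use std::collections::BTreeMap;
use std::thread;

/// Hypothetical stand-in for the diff's `ToolConcurrency`.
#[derive(Clone, Copy, PartialEq)]
enum Concurrency {
    ReadOnly,
    Exclusive,
}

/// Hypothetical stand-in for `StreamToolPlan`; `idx` is the call's original position.
struct Plan {
    idx: usize,
    name: &'static str,
    concurrency: Concurrency,
}

/// Placeholder for real tool execution.
fn run_tool(name: &str) -> String {
    format!("{name}: ok")
}

/// Runs read-only plans concurrently and the rest serially, then returns
/// outputs in original call order regardless of completion order.
fn execute_in_order(plans: &[Plan]) -> Vec<String> {
    let (parallel, sequential): (Vec<&Plan>, Vec<&Plan>) = plans
        .iter()
        .partition(|p| p.concurrency == Concurrency::ReadOnly);

    // Keyed by original index, so iterating the map restores call order.
    let mut results: BTreeMap<usize, String> = BTreeMap::new();

    // Scoped threads replace the JoinSet here; no concurrency cap or timeout.
    thread::scope(|scope| {
        let handles: Vec<_> = parallel
            .iter()
            .map(|&plan| scope.spawn(move || (plan.idx, run_tool(plan.name))))
            .collect();
        for handle in handles {
            // A panicked worker yields no entry, like the JoinError arm in the diff.
            if let Ok((idx, output)) = handle.join() {
                results.insert(idx, output);
            }
        }
    });

    for plan in sequential {
        results.insert(plan.idx, run_tool(plan.name));
    }

    results.into_values().collect()
}
```

Keying results by the original index is what lets the read-only group finish in any order while the message history still sees tool results in the order the model issued the calls.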
--- a/crates/zclaw-runtime/src/middleware.rs
+++ b/crates/zclaw-runtime/src/middleware.rs
@@ -12,6 +12,13 @@
 //! | 200-399 | Capability     | SkillIndex, Guardrail       |
 //! | 400-599 | Safety         | LoopGuard, Guardrail        |
 //! | 600-799 | Telemetry      | TokenCalibration, Tracking  |
+//!
+//! # Wave parallelization
+//!
+//! `before_completion` middlewares that only append to `system_prompt` (and never touch `messages`)
+//! can declare `parallel_safe() == true`. The chain runs consecutive parallel-safe
+//! middlewares concurrently, merging their prompt contributions. This reduces
+//! sequential latency for the context-injection phase.

 use std::sync::Arc;
 use async_trait::async_trait;
@@ -50,6 +57,7 @@ pub enum ToolCallDecision {
 // ---------------------------------------------------------------------------

 /// Carries the mutable state that middleware may inspect or modify.
+#[derive(Clone)]
 pub struct MiddlewareContext {
    /// The agent that owns this loop.
    pub agent_id: AgentId,
@@ -101,6 +109,15 @@ pub trait AgentMiddleware: Send + Sync {
        500
    }

+    /// Whether `before_completion` is safe to run concurrently with other
+    /// parallel-safe middlewares. Only return `true` if the middleware:
+    /// - Only appends to `ctx.system_prompt` (never rewrites it, never touches `ctx.messages`)
+    /// - Does not depend on prompt modifications from other middlewares
+    /// - Does not return `MiddlewareDecision::Stop`
+    fn parallel_safe(&self) -> bool {
+        false
+    }
+
    /// Hook executed **before** the LLM completion request is sent.
    ///
    /// Use this to inject context (memory, skill index, etc.) or to
@@ -163,15 +180,74 @@ impl MiddlewareChain {
        self.middlewares.insert(pos, mw);
    }

-    /// Run all `before_completion` hooks in order.
+    /// Run all `before_completion` hooks with wave-based parallelization.
+    ///
+    /// Consecutive `parallel_safe` middlewares run concurrently — each gets
+    /// its own cloned context and appends to `system_prompt` independently.
+    /// Their contributions are merged after all complete. Non-parallel-safe
+    /// middlewares (and any parallel-safe one without a parallel-safe neighbor) run sequentially as before.
    pub async fn run_before_completion(&self, ctx: &mut MiddlewareContext) -> Result<MiddlewareDecision> {
-        for mw in &self.middlewares {
-            match mw.before_completion(ctx).await? {
-                MiddlewareDecision::Continue => {}
-                MiddlewareDecision::Stop(reason) => {
-                    tracing::info!("[MiddlewareChain] '{}' requested stop: {}", mw.name(), reason);
-                    return Ok(MiddlewareDecision::Stop(reason));
+        let mut idx = 0;
+        while idx < self.middlewares.len() {
+            // Find the extent of consecutive parallel-safe middlewares
+            let wave_start = idx;
+            let mut wave_end = idx;
+            while wave_end < self.middlewares.len()
+                && self.middlewares[wave_end].parallel_safe()
+            {
+                wave_end += 1;
+            }
+
+            if wave_end - wave_start >= 2 {
+                // Run parallel wave (2+ consecutive parallel-safe middlewares)
+                let base_prompt_len = ctx.system_prompt.len();
+                let wave = &self.middlewares[wave_start..wave_end];
+
+                // Spawn concurrent tasks — each owns its cloned context + Arc ref to middleware
+                let mut join_handles = Vec::with_capacity(wave.len());
+                for mw in wave.iter() {
+                    let mut ctx_clone = ctx.clone();
+                    let mw_arc = Arc::clone(mw);
+                    join_handles.push(tokio::spawn(async move {
+                        let result = mw_arc.before_completion(&mut ctx_clone).await;
+                        (result, ctx_clone.system_prompt)
+                    }));
                }
+
+                // Await all and merge prompt contributions
+                for (i, handle) in join_handles.into_iter().enumerate() {
+                    let (result, modified_prompt): (Result<MiddlewareDecision>, String) = handle.await
+                        .map_err(|e| zclaw_types::ZclawError::Internal(format!("Parallel middleware panicked: {}", e)))?;
+                    match result? {
+                        MiddlewareDecision::Continue => {}
+                        MiddlewareDecision::Stop(reason) => {
+                            tracing::info!(
+                                "[MiddlewareChain] '{}' requested stop: {}",
+                                self.middlewares[wave_start + i].name(),
+                                reason
+                            );
+                            return Ok(MiddlewareDecision::Stop(reason));
+                        }
+                    }
+                    // Merge system_prompt contribution from this clone
+                    if modified_prompt.len() > base_prompt_len {
+                        let contribution = &modified_prompt[base_prompt_len..];
+                        ctx.system_prompt.push_str(contribution);
+                    }
+                }
+
+                idx = wave_end;
+            } else {
+                // Run single middleware sequentially
+                let mw = &self.middlewares[idx];
+                match mw.before_completion(ctx).await? {
+                    MiddlewareDecision::Continue => {}
+                    MiddlewareDecision::Stop(reason) => {
+                        tracing::info!("[MiddlewareChain] '{}' requested stop: {}", mw.name(), reason);
+                        return Ok(MiddlewareDecision::Stop(reason));
+                    }
+                }
+                idx += 1;
            }
        }
        Ok(MiddlewareDecision::Continue)
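
The wave grouping and prompt merge in `run_before_completion` can be checked without any async machinery. The sketch below is an illustration under assumptions: `Step`, `plan_steps`, and `merge_contributions` are hypothetical helpers, not part of the crate, and the merge uses `strip_prefix` rather than the length-based slice in the diff, so a clone that rewrote the base prompt is skipped instead of being sliced at a possibly invalid offset.

```rust
/// One scheduling step: a parallel wave (2+ members) or a single sequential run.
#[derive(Debug, PartialEq)]
enum Step {
    Wave(std::ops::Range<usize>),
    Single(usize),
}

/// Groups middleware indices the way the loop in the diff walks them:
/// a run of 2+ consecutive parallel-safe entries becomes one wave,
/// everything else runs one at a time.
fn plan_steps(parallel_safe: &[bool]) -> Vec<Step> {
    let mut steps = Vec::new();
    let mut idx = 0;
    while idx < parallel_safe.len() {
        let mut end = idx;
        while end < parallel_safe.len() && parallel_safe[end] {
            end += 1;
        }
        if end - idx >= 2 {
            steps.push(Step::Wave(idx..end));
            idx = end;
        } else {
            // Covers both a non-parallel-safe entry and a lone parallel-safe one.
            steps.push(Step::Single(idx));
            idx += 1;
        }
    }
    steps
}

/// Appends whatever each cloned context added past the shared base prompt,
/// in middleware order. Clones that did not keep the base as a prefix are ignored.
fn merge_contributions(base: &str, modified: &[String]) -> String {
    let mut merged = base.to_string();
    for prompt in modified {
        if let Some(extra) = prompt.strip_prefix(base) {
            merged.push_str(extra);
        }
    }
    merged
}
```

Because every clone starts from the same base prompt, the merge only works when each middleware appends; this is why the `parallel_safe` contract has to exclude middlewares that rewrite or truncate `system_prompt`.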
--- a/crates/zclaw-runtime/src/middleware/butler_router.rs
+++ b/crates/zclaw-runtime/src/middleware/butler_router.rs
@@ -290,6 +290,8 @@ impl AgentMiddleware for ButlerRouterMiddleware {
        80
    }

+    fn parallel_safe(&self) -> bool { true }
+
    async fn before_completion(&self, ctx: &mut MiddlewareContext) -> Result<MiddlewareDecision> {
        // Only route on the first user message in a turn (not tool results)
        let user_input = &ctx.user_input;
--- a/crates/zclaw-runtime/src/middleware/compaction.rs
+++ b/crates/zclaw-runtime/src/middleware/compaction.rs
@@ -1,21 +1,49 @@
 //! Compaction middleware — wraps the existing compaction module.
+//!
+//! Supports debounce (cooldown + min-round checks), async LLM compression
+//! with cached fallback, and iterative summaries that carry forward key info.

 use async_trait::async_trait;
-use zclaw_types::Result;
-use crate::middleware::{AgentMiddleware, MiddlewareContext, MiddlewareDecision};
-use crate::compaction::{self, CompactionConfig};
-use crate::growth::GrowthIntegration;
-use crate::driver::LlmDriver;
+use std::sync::atomic::{AtomicU64, Ordering};
 use std::sync::Arc;
+use tokio::sync::RwLock;
+use zclaw_types::{Message, Result};
+use crate::compaction::{self, CompactionConfig};
+use crate::driver::LlmDriver;
+use crate::growth::GrowthIntegration;
+use crate::middleware::{AgentMiddleware, MiddlewareContext, MiddlewareDecision};
+
+/// Minimum seconds between consecutive compactions.
+const COMPACTION_COOLDOWN_SECS: u64 = 30;
+/// Minimum message pairs (user+assistant) since last compaction before triggering again.
+const COMPACTION_MIN_ROUNDS: u64 = 3;
+
+fn now_millis() -> u64 {
+    std::time::SystemTime::now()
+        .duration_since(std::time::UNIX_EPOCH)
+        .unwrap_or_default()
+        .as_millis() as u64
+}
+
+/// Shared compaction debounce state (lock-free).
+struct CompactionState {
+    last_compaction_ms: AtomicU64,
+    last_compaction_msg_count: AtomicU64,
+}
+
+/// Cached result from a previous async LLM compaction.
+struct AsyncCompactionCache {
+    last_result: RwLock<Option<Vec<Message>>>,
+}

 /// Middleware that compresses conversation history when it exceeds a token threshold.
 pub struct CompactionMiddleware {
    threshold: usize,
    config: CompactionConfig,
-    /// Optional LLM driver for async compaction (LLM summarisation, memory flush).
    driver: Option<Arc<dyn LlmDriver>>,
-    /// Optional growth integration for memory flushing during compaction.
    growth: Option<GrowthIntegration>,
+    state: Arc<CompactionState>,
+    cache: Arc<AsyncCompactionCache>,
 }

 impl CompactionMiddleware {
@@ -25,7 +53,39 @@ impl CompactionMiddleware {
        driver: Option<Arc<dyn LlmDriver>>,
        growth: Option<GrowthIntegration>,
    ) -> Self {
-        Self { threshold, config, driver, growth }
+        Self {
+            threshold,
+            config,
+            driver,
+            growth,
+            state: Arc::new(CompactionState {
+                last_compaction_ms: AtomicU64::new(0),
+                last_compaction_msg_count: AtomicU64::new(0),
+            }),
+            cache: Arc::new(AsyncCompactionCache {
+                last_result: RwLock::new(None),
+            }),
+        }
+    }
+
+    fn should_compact(&self, msg_count: u64) -> bool {
+        let last_ms = self.state.last_compaction_ms.load(Ordering::Relaxed);
+        let last_count = self.state.last_compaction_msg_count.load(Ordering::Relaxed);
+
+        if now_millis().saturating_sub(last_ms) < COMPACTION_COOLDOWN_SECS * 1000 {
+            return false;
+        }
+
+        if msg_count.saturating_sub(last_count) < COMPACTION_MIN_ROUNDS * 2 {
+            return false;
+        }
+
+        true
+    }
+
+    fn record_compaction(&self, msg_count: u64) {
+        self.state.last_compaction_ms.store(now_millis(), Ordering::Relaxed);
+        self.state.last_compaction_msg_count.store(msg_count, Ordering::Relaxed);
    }
 }

@@ -39,6 +99,29 @@ impl AgentMiddleware for CompactionMiddleware {
            return Ok(MiddlewareDecision::Continue);
        }

+        // Step 1: Prune old tool outputs (cheap, no LLM needed)
+        let pruned = compaction::prune_tool_outputs(&mut ctx.messages);
+        if pruned > 0 {
+            tracing::info!("[CompactionMiddleware] Pruned {} old tool outputs", pruned);
+        }
+
+        // Step 2: Re-estimate tokens after pruning
+        let tokens = compaction::estimate_messages_tokens_calibrated(&ctx.messages);
+        if tokens < self.threshold {
+            return Ok(MiddlewareDecision::Continue);
+        }
+
+        // Step 3: Debounce check
+        if !self.should_compact(ctx.messages.len() as u64) {
+            // Still over threshold but within cooldown: reuse the snapshot cached by the last compaction, if any (it predates messages appended since then)
+            if let Some(cached) = self.cache.last_result.read().await.clone() {
+                tracing::debug!("[CompactionMiddleware] Cooldown active, using cached compaction result");
+                ctx.messages = cached;
+            }
+            return Ok(MiddlewareDecision::Continue);
+        }
+
+        // Step 4: Execute compaction
        let needs_async = self.config.use_llm || self.config.memory_flush_enabled;
        if needs_async {
            let outcome = compaction::maybe_compact_with_config(
@@ -56,6 +139,14 @@ impl AgentMiddleware for CompactionMiddleware {
            ctx.messages = compaction::maybe_compact(ctx.messages.clone(), self.threshold);
        }

+        self.record_compaction(ctx.messages.len() as u64);
+
+        // Cache result for cooldown fallback
+        {
+            let mut cache = self.cache.last_result.write().await;
+            *cache = Some(ctx.messages.clone());
+        }
+
        Ok(MiddlewareDecision::Continue)
    }
 }
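The debounce rule in `should_compact` can be sketched in isolation. This is a minimal sketch rather than the middleware's actual code: `Debounce` is a hypothetical stand-in for `CompactionState`, and the clock is passed in as `now_ms` (an assumption made here so the check can be exercised without `SystemTime`).

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the middleware's debounce state.
pub struct Debounce {
    cooldown_ms: u64,
    min_new_messages: u64,
    last_ms: AtomicU64,
    last_msg_count: AtomicU64,
}

impl Debounce {
    pub fn new(cooldown_secs: u64, min_rounds: u64) -> Self {
        Self {
            cooldown_ms: cooldown_secs * 1000,
            // One round is a user message plus an assistant message.
            min_new_messages: min_rounds * 2,
            last_ms: AtomicU64::new(0),
            last_msg_count: AtomicU64::new(0),
        }
    }

    /// True only when the cooldown has elapsed AND enough new messages
    /// have arrived since the last recorded compaction.
    pub fn should_compact(&self, now_ms: u64, msg_count: u64) -> bool {
        let last_ms = self.last_ms.load(Ordering::Relaxed);
        let last_count = self.last_msg_count.load(Ordering::Relaxed);
        now_ms.saturating_sub(last_ms) >= self.cooldown_ms
            && msg_count.saturating_sub(last_count) >= self.min_new_messages
    }

    /// Remember when the last compaction ran and how many messages remained.
    pub fn record(&self, now_ms: u64, msg_count: u64) {
        self.last_ms.store(now_ms, Ordering::Relaxed);
        self.last_msg_count.store(msg_count, Ordering::Relaxed);
    }
}
```

With `Debounce::new(30, 3)`, a compaction recorded at message count 4 blocks further compaction until at least 30 seconds have passed and the history has grown to 10 messages or more.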
--- a/crates/zclaw-runtime/src/middleware/evolution.rs
+++ b/crates/zclaw-runtime/src/middleware/evolution.rs
@@ -88,6 +88,8 @@ impl AgentMiddleware for EvolutionMiddleware {
        78 // before ButlerRouter(80)
    }

+    fn parallel_safe(&self) -> bool { true }
+
    async fn before_completion(
        &self,
        ctx: &mut MiddlewareContext,
--- a/crates/zclaw-runtime/src/middleware/memory.rs
+++ b/crates/zclaw-runtime/src/middleware/memory.rs
@@ -111,6 +111,7 @@ impl MemoryMiddleware {
 impl AgentMiddleware for MemoryMiddleware {
    fn name(&self) -> &str { "memory" }
    fn priority(&self) -> i32 { 150 }
+    fn parallel_safe(&self) -> bool { true }

    async fn before_completion(&self, ctx: &mut MiddlewareContext) -> Result<MiddlewareDecision> {
        tracing::debug!(
--- a/crates/zclaw-runtime/src/middleware/skill_index.rs
+++ b/crates/zclaw-runtime/src/middleware/skill_index.rs
@@ -40,6 +40,7 @@ impl SkillIndexMiddleware {
 impl AgentMiddleware for SkillIndexMiddleware {
    fn name(&self) -> &str { "skill_index" }
    fn priority(&self) -> i32 { 200 }
+    fn parallel_safe(&self) -> bool { true }

    async fn before_completion(&self, ctx: &mut MiddlewareContext) -> Result<MiddlewareDecision> {
        if self.entries.is_empty() {
--- a/crates/zclaw-runtime/src/middleware/title.rs
+++ b/crates/zclaw-runtime/src/middleware/title.rs
@@ -41,6 +41,7 @@ impl Default for TitleMiddleware {
 impl AgentMiddleware for TitleMiddleware {
    fn name(&self) -> &str { "title" }
    fn priority(&self) -> i32 { 180 }
+    fn parallel_safe(&self) -> bool { true }

    // All hooks default to Continue — placeholder until LLM driver is wired in.
    async fn before_completion(&self, _ctx: &mut crate::middleware::MiddlewareContext) -> zclaw_types::Result<MiddlewareDecision> {
--- a/crates/zclaw-runtime/src/middleware/tool_error.rs
+++ b/crates/zclaw-runtime/src/middleware/tool_error.rs
@@ -13,6 +13,7 @@ use serde_json::Value;
 use zclaw_types::Result;
 use crate::driver::ContentBlock;
 use crate::middleware::{AgentMiddleware, MiddlewareContext, ToolCallDecision};
+use std::collections::HashMap;
 use std::sync::Mutex;

 /// Middleware that intercepts tool call errors and formats recovery messages.
@@ -23,8 +24,8 @@ pub struct ToolErrorMiddleware {
    max_error_length: usize,
    /// Maximum consecutive failures before aborting the loop.
    max_consecutive_failures: u32,
-    /// Tracks consecutive tool failures.
-    consecutive_failures: Mutex<u32>,
+    /// Tracks consecutive tool failures per session.
+    session_failures: Mutex<HashMap<String, u32>>,
 }

 impl ToolErrorMiddleware {
@@ -32,7 +33,7 @@ impl ToolErrorMiddleware {
        Self {
            max_error_length: 500,
            max_consecutive_failures: 3,
-            consecutive_failures: Mutex::new(0),
+            session_failures: Mutex::new(HashMap::new()),
        }
    }

@@ -66,7 +67,7 @@ impl AgentMiddleware for ToolErrorMiddleware {

    async fn before_tool_call(
        &self,
-        _ctx: &MiddlewareContext,
+        ctx: &MiddlewareContext,
        tool_name: &str,
        tool_input: &Value,
    ) -> Result<ToolCallDecision> {
@@ -79,15 +80,17 @@ impl AgentMiddleware for ToolErrorMiddleware {
            return Ok(ToolCallDecision::ReplaceInput(serde_json::json!({})));
        }

-        // Check consecutive failure count — abort if too many failures
-        let failures = self.consecutive_failures.lock().unwrap_or_else(|e| e.into_inner());
-        if *failures >= self.max_consecutive_failures {
+        // Check consecutive failure count — abort if too many failures (per session)
+        let failures = self.session_failures.lock()
+            .map(|m| m.get(&ctx.session_id.to_string()).copied().unwrap_or(0))
+            .unwrap_or(0);
+        if failures >= self.max_consecutive_failures {
            tracing::warn!(
                "[ToolErrorMiddleware] Aborting loop: {} consecutive tool failures",
-                *failures
+                failures
            );
            return Ok(ToolCallDecision::AbortLoop(
-                format!("连续 {} 次工具调用失败，已自动终止以避免无限重试", *failures)
+                format!("连续 {} 次工具调用失败，已自动终止以避免无限重试", failures)
            ));
        }

@@ -100,11 +103,16 @@ impl AgentMiddleware for ToolErrorMiddleware {
        tool_name: &str,
        result: &Value,
    ) -> Result<()> {
-        let mut failures = self.consecutive_failures.lock().unwrap_or_else(|e| e.into_inner());
-
        // Check if the tool result indicates an error.
        if let Some(error) = result.get("error") {
-            *failures += 1;
+            let session_key = ctx.session_id.to_string();
+            let failures = self.session_failures.lock()
+                .map(|mut m| {
+                    let count = m.entry(session_key.clone()).or_insert(0);
+                    *count += 1;
+                    *count
+                })
+                .unwrap_or(1);
            let error_msg = match error {
                Value::String(s) => s.clone(),
                other => other.to_string(),
@@ -118,7 +126,7 @@ impl AgentMiddleware for ToolErrorMiddleware {

            tracing::warn!(
                "[ToolErrorMiddleware] Tool '{}' failed ({}/{} consecutive): {}",
-                tool_name, *failures, self.max_consecutive_failures, truncated
+                tool_name, failures, self.max_consecutive_failures, truncated
            );

            let guided_message = self.format_tool_error(tool_name, &truncated);
@@ -126,8 +134,11 @@ impl AgentMiddleware for ToolErrorMiddleware {
                text: guided_message,
            });
        } else {
-            // Success — reset consecutive failure counter
-            *failures = 0;
+            // Success — reset consecutive failure counter for this session
+            let session_key = ctx.session_id.to_string();
+            if let Ok(mut m) = self.session_failures.lock() {
+                m.insert(session_key, 0);
+            }
        }

        Ok(())
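The per-session counting introduced above can be sketched on its own. `FailureTracker` and its method names are hypothetical, not part of the crate. The sketch also makes two choices that differ from the diff and should be read as assumptions: it recovers a poisoned mutex with `into_inner()` instead of falling back to a default count, and it removes the session entry on success so the map does not grow with finished sessions.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical per-session consecutive-failure tracker.
pub struct FailureTracker {
    max_consecutive: u32,
    counts: Mutex<HashMap<String, u32>>,
}

impl FailureTracker {
    pub fn new(max_consecutive: u32) -> Self {
        Self { max_consecutive, counts: Mutex::new(HashMap::new()) }
    }

    /// Increment the session's counter and return the new value.
    pub fn record_failure(&self, session: &str) -> u32 {
        // A poisoned lock still holds valid counters, so keep using them.
        let mut counts = self.counts.lock().unwrap_or_else(|e| e.into_inner());
        let count = counts.entry(session.to_string()).or_insert(0);
        *count += 1;
        *count
    }

    /// A success ends the streak; dropping the entry is equivalent to zero.
    pub fn record_success(&self, session: &str) {
        let mut counts = self.counts.lock().unwrap_or_else(|e| e.into_inner());
        counts.remove(session);
    }

    /// True once the session has reached the failure limit.
    pub fn should_abort(&self, session: &str) -> bool {
        let counts = self.counts.lock().unwrap_or_else(|e| e.into_inner());
        counts.get(session).copied().unwrap_or(0) >= self.max_consecutive
    }
}
```

Failures in one session no longer push another session toward the abort threshold, which is the behavior the `HashMap` keyed by session ID is meant to provide.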
--- a/crates/zclaw-runtime/src/middleware/tool_output_guard.rs
+++ b/crates/zclaw-runtime/src/middleware/tool_output_guard.rs
@@ -21,35 +21,27 @@ use crate::middleware::{AgentMiddleware, MiddlewareContext, ToolCallDecision};
 /// Maximum safe output length in characters.
 const MAX_OUTPUT_LENGTH: usize = 50_000;

-/// Patterns that indicate sensitive information in tool output.
-const SENSITIVE_PATTERNS: &[&str] = &[
-    "api_key",
-    "apikey",
-    "api-key",
-    "secret_key",
-    "secretkey",
-    "access_token",
-    "auth_token",
-    "password",
-    "private_key",
-    "-----BEGIN RSA",
-    "-----BEGIN PRIVATE",
-    "sk-",           // OpenAI API keys
-    "sk_live_",      // Stripe keys
-    "AKIA",          // AWS access keys
+/// Regex patterns that match actual secret values (not just keywords).
+/// These detect the *value format* of secrets, avoiding false positives
+/// from legitimate content that merely mentions "password" or "api_key".
+const SECRET_VALUE_PATTERNS: &[&str] = &[
+    r#"sk-[a-zA-Z0-9]{20,}"#,              // OpenAI API keys (sk-xxx, 20+ chars)
+    r#"sk_live_[a-zA-Z0-9]{20,}"#,          // Stripe live keys
+    r#"sk_test_[a-zA-Z0-9]{20,}"#,          // Stripe test keys
+    r#"AKIA[A-Z0-9]{16}"#,                   // AWS access key IDs (AKIA + 16 chars, 20 total)
+    r#"-----BEGIN (RSA |EC )?PRIVATE KEY-----"#,  // PEM private keys
+    r#"(?:api_?key|secret_?key|access_?token|auth_?token|password)\s*[:=]\s*["'][^"']{8,}["']"#,  // key=value with actual secret
 ];

-/// Patterns that may indicate prompt injection in tool output.
+/// Keyword patterns that indicate prompt injection in tool output.
+/// These are specific enough to avoid false positives from normal content.
 const INJECTION_PATTERNS: &[&str] = &[
    "ignore previous instructions",
    "ignore all previous",
    "disregard your instructions",
-    "you are now",
    "new instructions:",
-    "system:",
    "[INST]",
    "</scratchpad>",
-    "think step by step about",
 ];

 /// Tool output sanitization middleware.
@@ -105,22 +97,24 @@ impl AgentMiddleware for ToolOutputGuardMiddleware {
            );
        }

-        // Rule 2: Sensitive information detection — block output containing secrets (P2-22)
-        let output_lower = output_str.to_lowercase();
-        for pattern in SENSITIVE_PATTERNS {
-            if output_lower.contains(pattern) {
-                tracing::error!(
-                    "[ToolOutputGuard] BLOCKED tool '{}' output: sensitive pattern '{}'",
-                    tool_name, pattern
-                );
-                return Err(zclaw_types::ZclawError::Internal(format!(
-                    "[ToolOutputGuard] Tool '{}' output blocked: sensitive information detected ('{}')",
-                    tool_name, pattern
-                )));
+        // Rule 2: Sensitive information detection — match actual secret values, not keywords
+        for pattern in SECRET_VALUE_PATTERNS {
+            if let Ok(re) = regex::Regex::new(pattern) {
+                if re.is_match(&output_str) {
+                    tracing::error!(
+                        "[ToolOutputGuard] BLOCKED tool '{}' output: secret value matched pattern '{}'",
+                        tool_name, pattern
+                    );
+                    return Err(zclaw_types::ZclawError::Internal(format!(
+                        "[ToolOutputGuard] Tool '{}' output blocked: sensitive information detected",
+                        tool_name
+                    )));
+                }
            }
        }

-        // Rule 3: Injection marker detection — BLOCK the output (P2-22 fix)
+        // Rule 3: Injection marker detection — specific phrase matching
+        let output_lower = output_str.to_lowercase();
        for pattern in INJECTION_PATTERNS {
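-            if output_lower.contains(pattern) {
+            if output_lower.contains(&pattern.to_lowercase()) { // patterns such as "[INST]" contain uppercase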
                tracing::error!(
--- a/crates/zclaw-runtime/src/stream.rs
+++ b/crates/zclaw-runtime/src/stream.rs
@@ -24,6 +24,10 @@ pub enum StreamChunk {
        input_tokens: u32,
        output_tokens: u32,
        stop_reason: String,
+        #[serde(default)]
+        cache_creation_input_tokens: Option<u32>,
+        #[serde(default)]
+        cache_read_input_tokens: Option<u32>,
    },
    /// Error occurred
    Error { message: String },
--- a/crates/zclaw-runtime/src/test_util.rs
+++ b/crates/zclaw-runtime/src/test_util.rs
@@ -55,6 +55,8 @@ impl MockLlmDriver {
            input_tokens: 10,
            output_tokens: text.len() as u32 / 4,
            stop_reason: StopReason::EndTurn,
+            cache_creation_input_tokens: None,
+            cache_read_input_tokens: None,
        });
        self
    }
@@ -74,6 +76,8 @@ impl MockLlmDriver {
            input_tokens: 10,
            output_tokens: 20,
            stop_reason: StopReason::ToolUse,
+            cache_creation_input_tokens: None,
+            cache_read_input_tokens: None,
        });
        self
    }
@@ -86,6 +90,8 @@ impl MockLlmDriver {
            input_tokens: 0,
            output_tokens: 0,
            stop_reason: StopReason::Error,
+            cache_creation_input_tokens: None,
+            cache_read_input_tokens: None,
        });
        self
    }
@@ -142,6 +148,8 @@ impl MockLlmDriver {
                input_tokens: 0,
                output_tokens: 0,
                stop_reason: StopReason::EndTurn,
+                cache_creation_input_tokens: None,
+                cache_read_input_tokens: None,
            })
    }
 }
@@ -190,6 +198,8 @@ impl LlmDriver for MockLlmDriver {
                        input_tokens: 10,
                        output_tokens: 2,
                        stop_reason: "end_turn".to_string(),
+                        cache_creation_input_tokens: None,
+                        cache_read_input_tokens: None,
                    },
                ]
            })
--- a/crates/zclaw-runtime/src/tool.rs
+++ b/crates/zclaw-runtime/src/tool.rs
@@ -11,6 +11,17 @@ use crate::driver::ToolDefinition;
 use crate::loop_runner::LoopEvent;
 use crate::tool::builtin::PathValidator;

+/// Tool concurrency safety level
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum ToolConcurrency {
+    /// Read-only operations, always safe to parallelize (file_read, web_fetch, etc.)
+    ReadOnly,
+    /// Exclusive operations, must be serial (file_write, shell_exec, etc.)
+    Exclusive,
+    /// Interactive operations, never parallelize (ask_clarification, etc.)
+    Interactive,
+}
+
 /// Tool trait for implementing agent tools
 #[async_trait]
 pub trait Tool: Send + Sync {
@@ -25,6 +36,11 @@ pub trait Tool: Send + Sync {

    /// Execute the tool
    async fn execute(&self, input: Value, context: &ToolContext) -> Result<Value>;
+
+    /// Tool concurrency safety level. Default: ReadOnly.
+    fn concurrency(&self) -> ToolConcurrency {
+        ToolConcurrency::ReadOnly
+    }
 }

 /// Skill executor trait for runtime skill execution
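One way a caller might use `concurrency()` is to split an ordered list of tool calls into batches, where adjacent `ReadOnly` calls share a batch and every other call runs alone. The sketch below is an assumption about such a scheduler, not code from this change: `plan_batches` is a hypothetical helper and the enum is redeclared only to keep the example self-contained.

```rust
/// Redeclared here so the sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolConcurrency {
    ReadOnly,
    Exclusive,
    Interactive,
}

/// Group call indices into batches while preserving the original order.
/// Consecutive `ReadOnly` calls land in one batch (candidates for parallel
/// execution); `Exclusive` and `Interactive` calls each get their own batch.
pub fn plan_batches(levels: &[ToolConcurrency]) -> Vec<Vec<usize>> {
    let mut batches: Vec<Vec<usize>> = Vec::new();
    let mut last_batch_is_read_only = false;
    for (i, level) in levels.iter().enumerate() {
        let read_only = *level == ToolConcurrency::ReadOnly;
        if read_only && last_batch_is_read_only {
            // Extend the current read-only batch.
            if let Some(last) = batches.last_mut() {
                last.push(i);
            }
        } else {
            batches.push(vec![i]);
        }
        last_batch_is_read_only = read_only;
    }
    batches
}
```

Because the trait default is `ReadOnly`, any tool with side effects has to override `concurrency()` explicitly, as the built-in tools in this change do.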
--- a/crates/zclaw-runtime/src/tool/builtin/ask_clarification.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/ask_clarification.rs
@@ -9,7 +9,7 @@ use async_trait::async_trait;
 use serde_json::{json, Value};
 use zclaw_types::{Result, ZclawError};

-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};

 /// Clarification type — categorizes the reason for asking.
 #[derive(Debug, Clone, PartialEq)]
@@ -96,6 +96,10 @@ impl Tool for AskClarificationTool {
        })
    }

+    fn concurrency(&self) -> ToolConcurrency {
+        ToolConcurrency::Interactive
+    }
+
    async fn execute(&self, input: Value, _context: &ToolContext) -> Result<Value> {
        let question = input["question"].as_str()
            .ok_or_else(|| ZclawError::InvalidInput("Missing 'question' parameter".into()))?;
--- a/crates/zclaw-runtime/src/tool/builtin/execute_skill.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/execute_skill.rs
@@ -4,7 +4,7 @@ use async_trait::async_trait;
 use serde_json::{json, Value};
 use zclaw_types::{Result, ZclawError};

-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};

 pub struct ExecuteSkillTool;

@@ -42,6 +42,10 @@ impl Tool for ExecuteSkillTool {
        })
    }

+    fn concurrency(&self) -> ToolConcurrency {
+        ToolConcurrency::Exclusive
+    }
+
    async fn execute(&self, input: Value, context: &ToolContext) -> Result<Value> {
        let skill_id = input["skill_id"].as_str()
            .ok_or_else(|| ZclawError::InvalidInput("Missing 'skill_id' parameter".into()))?;
--- a/crates/zclaw-runtime/src/tool/builtin/file_write.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/file_write.rs
@@ -6,7 +6,7 @@ use zclaw_types::{Result, ZclawError};
 use std::fs;
 use std::io::Write;

-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};
 use super::path_validator::PathValidator;

 pub struct FileWriteTool;
@@ -55,6 +55,10 @@ impl Tool for FileWriteTool {
        })
    }

+    fn concurrency(&self) -> ToolConcurrency {
+        ToolConcurrency::Exclusive
+    }
+
    async fn execute(&self, input: Value, context: &ToolContext) -> Result<Value> {
        let path = input["path"].as_str()
            .ok_or_else(|| ZclawError::InvalidInput("Missing 'path' parameter".into()))?;
--- a/crates/zclaw-runtime/src/tool/builtin/mcp_tool.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/mcp_tool.rs
@@ -8,7 +8,7 @@ use serde_json::Value;
 use std::sync::Arc;
 use zclaw_types::Result;

-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};

 /// Wraps an MCP tool adapter into the `Tool` trait.
 ///
@@ -42,6 +42,10 @@ impl Tool for McpToolWrapper {
        self.adapter.input_schema().clone()
    }

+    fn concurrency(&self) -> ToolConcurrency {
+        ToolConcurrency::Exclusive
+    }
+
    async fn execute(&self, input: Value, _context: &ToolContext) -> Result<Value> {
        self.adapter.execute(input).await
    }
--- a/crates/zclaw-runtime/src/tool/builtin/path_validator.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/path_validator.rs
@@ -97,6 +97,17 @@ fn default_blocked_paths() -> Vec<PathBuf> {
    ]
 }

+/// Strip the Windows verbatim (extended-length) prefix `\\?\` for consistent comparison.
+/// `\\?\C:\Users\...` → `C:\Users\...`
+fn normalize_windows_path(path: &Path) -> std::borrow::Cow<'_, Path> {
+    let s = path.to_string_lossy();
+    if s.starts_with(r"\\?\") {
+        std::borrow::Cow::Owned(PathBuf::from(&s[4..]))
+    } else {
+        std::borrow::Cow::Borrowed(path)
+    }
+}
+
 /// Expand tilde in path to home directory
 fn expand_tilde(path: &str) -> PathBuf {
    if path.starts_with('~') {
@@ -154,9 +165,16 @@ impl PathValidator {
        }
    }

-    /// Set the workspace root directory
+    /// Set the workspace root directory.
+    /// Canonicalizes the path to ensure consistent comparison on Windows
+    /// (where canonicalize() returns verbatim `\\?\C:\...` paths).
    pub fn with_workspace(mut self, workspace: PathBuf) -> Self {
-        self.workspace_root = Some(workspace);
+        let canonical = if workspace.exists() {
+            workspace.canonicalize().unwrap_or(workspace)
+        } else {
+            workspace
+        };
+        self.workspace_root = Some(canonical);
        self
    }

@@ -230,7 +248,14 @@ impl PathValidator {
    fn resolve_and_validate(&self, path: &str) -> Result<PathBuf> {
        // Expand tilde
        let expanded = expand_tilde(path);
-        let path_buf = PathBuf::from(&expanded);
+        let mut path_buf = PathBuf::from(&expanded);
+
+        // If relative path and workspace is configured, resolve against workspace
+        if path_buf.is_relative() {
+            if let Some(ref workspace) = self.workspace_root {
+                path_buf = workspace.join(&path_buf);
+            }
+        }

        // Check for path traversal
        self.check_path_traversal(&path_buf)?;
@@ -280,10 +305,14 @@ impl PathValidator {
        Ok(())
    }

-    /// Check if path is in blocked list
+    /// Check if path is in blocked list.
+    /// Strips the Windows verbatim prefix (`\\?\`) for consistent comparison.
    fn check_blocked(&self, path: &Path) -> Result<()> {
+        // Strip the Windows verbatim prefix for consistent matching
+        let normalized = normalize_windows_path(path);
        for blocked in &self.config.blocked_paths {
-            if path.starts_with(blocked) || path == blocked {
+            let blocked_norm = normalize_windows_path(blocked);
+            if normalized.starts_with(&*blocked_norm) || normalized == blocked_norm {
                return Err(ZclawError::InvalidInput(format!(
                    "Access to this path is blocked: {}",
                    path.display()
@@ -303,11 +332,15 @@ impl PathValidator {
    /// - This prevents accidental exposure of the entire filesystem
    ///   when the validator is misconfigured or used without setup
    fn check_allowed(&self, path: &Path) -> Result<()> {
+        let path_norm = normalize_windows_path(path);
+
        // If no allowed paths specified, check workspace
        if self.config.allowed_paths.is_empty() {
            if let Some(ref workspace) = self.workspace_root {
                // Workspace is configured - validate path is within it
-                if !path.starts_with(workspace) {
+                // Both sides are canonicalized (workspace via with_workspace, path via resolve_and_validate)
+                let ws_norm = normalize_windows_path(workspace);
+                if !path_norm.starts_with(&*ws_norm) {
                    return Err(ZclawError::InvalidInput(format!(
                        "Path outside workspace: {} (workspace: {})",
                        path.display(),
@@ -329,7 +362,8 @@ impl PathValidator {

        // Check against allowed paths
        for allowed in &self.config.allowed_paths {
-            if path.starts_with(allowed) {
+            let allowed_norm = normalize_windows_path(allowed);
+            if path_norm.starts_with(&*allowed_norm) {
                return Ok(());
            }
        }
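The prefix stripping can be sketched as a standalone function. `strip_verbatim_prefix` is a hypothetical name. Its handling of `\\?\UNC\server\share` is an extension that the change above does not include, shown here as an assumption about how network paths could be kept comparable.

```rust
use std::borrow::Cow;
use std::path::{Path, PathBuf};

/// Remove the Windows verbatim prefix so that canonicalized and
/// non-canonicalized paths can be compared with `starts_with`.
pub fn strip_verbatim_prefix(path: &Path) -> Cow<'_, Path> {
    let s = path.to_string_lossy();
    if let Some(rest) = s.strip_prefix(r"\\?\UNC\") {
        // Assumed extension: \\?\UNC\server\share becomes \\server\share.
        Cow::Owned(PathBuf::from(format!(r"\\{}", rest)))
    } else if let Some(rest) = s.strip_prefix(r"\\?\") {
        // \\?\C:\Users\me becomes C:\Users\me.
        Cow::Owned(PathBuf::from(rest))
    } else {
        // No prefix: return the input without allocating.
        Cow::Borrowed(path)
    }
}
```

The `UNC` branch is checked first because `\\?\UNC\` also begins with `\\?\`. Both sides of a comparison need the same treatment, which is why the diff normalizes the candidate path as well as the blocked, allowed, and workspace paths.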
--- a/crates/zclaw-runtime/src/tool/builtin/shell_exec.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/shell_exec.rs
@@ -8,7 +8,7 @@ use std::process::{Command, Stdio};
 use std::time::{Duration, Instant};
 use zclaw_types::{Result, ZclawError};

-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};

 /// Parse a command string into program and arguments using proper shell quoting
 fn parse_command(command: &str) -> Result<(String, Vec<String>)> {
@@ -175,6 +175,10 @@ impl Tool for ShellExecTool {
        })
    }

+    fn concurrency(&self) -> ToolConcurrency {
+        ToolConcurrency::Exclusive
+    }
+
    async fn execute(&self, input: Value, _context: &ToolContext) -> Result<Value> {
        let command = input["command"].as_str()
            .ok_or_else(|| ZclawError::InvalidInput("Missing 'command' parameter".into()))?;
--- a/crates/zclaw-runtime/src/tool/builtin/task.rs
+++ b/crates/zclaw-runtime/src/tool/builtin/task.rs
@@ -11,7 +11,7 @@ use zclaw_memory::MemoryStore;

 use crate::driver::LlmDriver;
 use crate::loop_runner::{AgentLoop, LoopEvent};
-use crate::tool::{Tool, ToolContext, ToolRegistry};
+use crate::tool::{Tool, ToolContext, ToolRegistry, ToolConcurrency};
 use crate::tool::builtin::register_builtin_tools;
 use std::sync::Arc;

@@ -91,6 +91,10 @@ impl Tool for TaskTool {
        })
    }

+    fn concurrency(&self) -> ToolConcurrency {
+        ToolConcurrency::Exclusive
+    }
+
    async fn execute(&self, input: Value, context: &ToolContext) -> Result<Value> {
        let description = input["description"].as_str()
            .ok_or_else(|| ZclawError::InvalidInput("Missing 'description' parameter".into()))?;
--- a/crates/zclaw-runtime/src/tool/hand_tool.rs
+++ b/crates/zclaw-runtime/src/tool/hand_tool.rs
@@ -7,7 +7,7 @@ use async_trait::async_trait;
 use serde_json::{json, Value};
 use zclaw_types::Result;

-use crate::tool::{Tool, ToolContext};
+use crate::tool::{Tool, ToolContext, ToolConcurrency};

 /// Wrapper that exposes a Hand as a Tool in the agent's tool registry.
 ///
@@ -78,6 +78,10 @@ impl Tool for HandTool {
        self.input_schema.clone()
    }

+    fn concurrency(&self) -> ToolConcurrency {
+        ToolConcurrency::Exclusive
+    }
+
    async fn execute(&self, input: Value, context: &ToolContext) -> Result<Value> {
        // Delegate to the HandExecutor (bridged from HandRegistry via kernel).
        // If no hand_executor is available (e.g., standalone runtime without kernel),
--- a/crates/zclaw-types/src/error.rs
+++ b/crates/zclaw-types/src/error.rs
@@ -223,6 +223,33 @@ impl Serialize for ZclawError {
 /// Result type alias for ZCLAW operations
 pub type Result<T> = std::result::Result<T, ZclawError>;

+/// Fine-grained classification of LLM call errors, used to guide retry and recovery strategies
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum LlmErrorKind {
+    Auth,
+    AuthPermanent,
+    BillingExhausted,
+    RateLimited,
+    Overloaded,
+    ServerError,
+    Timeout,
+    ContextOverflow,
+    ModelNotFound,
+    Unknown,
+}
+
+/// A classified LLM error, with recovery hints attached
+#[derive(Debug, Clone)]
+pub struct ClassifiedLlmError {
+    pub kind: LlmErrorKind,
+    pub retryable: bool,
+    pub should_compress: bool,
+    pub should_rotate_credential: bool,
+    pub retry_after: Option<std::time::Duration>,
+    pub message: String,
+}
+
 #[cfg(test)]
 mod tests {
    use super::*;
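The recovery flags on `ClassifiedLlmError` could be derived from the error kind along the following lines. The change above only defines the types, so the policy in `classify` is an assumption made for illustration: which kinds count as retryable, which trigger credential rotation, and the `classify` function itself are all hypothetical. The types are redeclared without serde derives to keep the sketch self-contained.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LlmErrorKind {
    Auth, AuthPermanent, BillingExhausted, RateLimited, Overloaded,
    ServerError, Timeout, ContextOverflow, ModelNotFound, Unknown,
}

#[derive(Debug, Clone)]
pub struct ClassifiedLlmError {
    pub kind: LlmErrorKind,
    pub retryable: bool,
    pub should_compress: bool,
    pub should_rotate_credential: bool,
    pub retry_after: Option<Duration>,
    pub message: String,
}

/// Hypothetical policy mapping an error kind to recovery hints.
pub fn classify(
    kind: LlmErrorKind,
    message: &str,
    retry_after: Option<Duration>,
) -> ClassifiedLlmError {
    use LlmErrorKind::*;
    ClassifiedLlmError {
        kind,
        // Assumed: transient failures, and overflow after compression, are retryable.
        retryable: matches!(
            kind,
            Auth | RateLimited | Overloaded | ServerError | Timeout | ContextOverflow
        ),
        // Assumed: only a context overflow calls for compressing history.
        should_compress: kind == ContextOverflow,
        // Assumed: credential problems suggest switching to another key.
        should_rotate_credential: matches!(kind, Auth | AuthPermanent | BillingExhausted),
        // Assumed: a server-provided delay is honored only for rate limits.
        retry_after: if kind == RateLimited { retry_after } else { None },
        message: message.to_string(),
    }
}
```

Keeping the hints as plain booleans lets the retry loop act on them without matching on `kind` again.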
--- a/desktop/src-tauri/src/intelligence/experience.rs
+++ b/desktop/src-tauri/src/intelligence/experience.rs
@@ -16,6 +16,21 @@ use zclaw_types::Result;
 use super::pain_aggregator::PainPoint;
 use super::solution_generator::Proposal;

+/// Brief summary of a stored experience, for suggestion context enrichment.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ExperienceBrief {
+    pub pain_pattern: String,
+    pub solution_summary: String,
+    pub reuse_count: u32,
+}
+
+static EXPERIENCE_EXTRACTOR: std::sync::OnceLock<std::sync::Arc<ExperienceExtractor>> = std::sync::OnceLock::new();
+
+/// Get the global ExperienceExtractor singleton (if initialized).
+pub(crate) fn get_experience_extractor() -> Option<std::sync::Arc<ExperienceExtractor>> {
+    EXPERIENCE_EXTRACTOR.get().cloned()
+}
+
 // ---------------------------------------------------------------------------
 // Shared completion status
 // ---------------------------------------------------------------------------
@@ -263,6 +278,36 @@ fn xml_escape(s: &str) -> String {
     .replace('>', "&gt;")
 }

+/// Initialize the global ExperienceExtractor singleton.
+/// Called once during app startup, after viking storage is ready.
+pub async fn init_experience_extractor() -> Result<()> {
+    let sqlite_storage = crate::viking_commands::get_storage().await
+        .map_err(|e| zclaw_types::ZclawError::StorageError(e))?;
+    let viking = std::sync::Arc::new(zclaw_growth::VikingAdapter::new(sqlite_storage));
+    let store = std::sync::Arc::new(ExperienceStore::new(viking));
+    let extractor = std::sync::Arc::new(ExperienceExtractor::new(store));
+    EXPERIENCE_EXTRACTOR.set(extractor)
+        .map_err(|_| zclaw_types::ZclawError::StorageError("ExperienceExtractor already initialized".into()))?;
+    Ok(())
+}
+
+/// Find experiences relevant to the current conversation for suggestion enrichment.
+#[tauri::command]
+pub async fn experience_find_relevant(
+    agent_id: String,
+    query: String,
+) -> std::result::Result<Vec<ExperienceBrief>, String> {
+    let extractor = get_experience_extractor()
+        .ok_or("ExperienceExtractor not initialized".to_string())?;
+    let experiences = extractor.find_relevant_experiences(&agent_id, &query).await;
+    Ok(experiences.into_iter().take(3).map(|e| ExperienceBrief {
+        pain_pattern: e.pain_pattern,
+        solution_summary: e.solution_steps.join("；")
+            .chars().take(100).collect(),
+        reuse_count: e.reuse_count,
+    }).collect())
+}
+
 // ---------------------------------------------------------------------------
 // Tests
 // ---------------------------------------------------------------------------
@@ -407,4 +452,17 @@ mod tests {
        assert_eq!(truncate("hello", 10), "hello");
        assert_eq!(truncate("这是一个很长的字符串用于测试截断", 10).chars().count(), 11); // 10 + …
    }
+
+    #[test]
+    fn test_experience_brief_serialization() {
+        let brief = super::ExperienceBrief {
+            pain_pattern: "报表生成慢".to_string(),
+            solution_summary: "使用 researcher 技能自动收集".to_string(),
+            reuse_count: 3,
+        };
+        let json = serde_json::to_string(&brief).unwrap();
+        let parsed: super::ExperienceBrief = serde_json::from_str(&json).unwrap();
+        assert_eq!(parsed.pain_pattern, "报表生成慢");
+        assert_eq!(parsed.reuse_count, 3);
+    }
 }
--- a/desktop/src-tauri/src/intelligence_hooks.rs
+++ b/desktop/src-tauri/src/intelligence_hooks.rs
@@ -7,8 +7,10 @@

 use tracing::{debug, warn};

+use std::collections::HashMap;
 use std::sync::Arc;
 use tauri::Emitter;
+use tokio::sync::RwLock;
 use zclaw_growth::VikingStorage;

 use crate::intelligence::identity::IdentityManagerState;
@@ -16,6 +18,36 @@ use crate::intelligence::heartbeat::HeartbeatEngineState;
 use crate::intelligence::reflection::{MemoryEntryForAnalysis, ReflectionEngineState};
 use zclaw_runtime::driver::LlmDriver;

+// ---------------------------------------------------------------------------
+// Identity prompt cache — avoids mutex + disk I/O on every request
+// ---------------------------------------------------------------------------
+
+struct CachedIdentity {
+    prompt: String,
+    #[allow(dead_code)] // Reserved for future hash-based cache validation
+    soul_hash: u64,
+}
+
+static IDENTITY_CACHE: std::sync::LazyLock<RwLock<HashMap<String, CachedIdentity>>> =
+    std::sync::LazyLock::new(|| RwLock::new(HashMap::new()));
+
+/// Invalidate cached identity prompt for a given agent (call when soul.md changes).
+pub fn invalidate_identity_cache(agent_id: &str) {
+    let cache = &*IDENTITY_CACHE;
+    // Non-blocking, best-effort: if the lock is currently held, the entry is left in place
+    if let Ok(mut guard) = cache.try_write() {
+        guard.remove(agent_id);
+    }
+}
+
+/// Simple hash for cache invalidation — uses string content hash.
+fn content_hash(s: &str) -> u64 {
+    use std::hash::{Hash, Hasher};
+    let mut hasher = std::collections::hash_map::DefaultHasher::new();
+    s.hash(&mut hasher);
+    hasher.finish()
+}
+
 /// Run pre-conversation intelligence hooks
 ///
 /// Builds identity-enhanced system prompt (SOUL.md + instructions) and
@@ -29,10 +61,29 @@ pub async fn pre_conversation_hook(
    _user_message: &str,
    identity_state: &IdentityManagerState,
 ) -> Result<String, String> {
-    // Build identity-enhanced system prompt (SOUL.md + instructions)
-    // Memory context is injected by MemoryMiddleware in the kernel middleware chain,
-    // not here, to avoid duplicate injection.
-    let enhanced_prompt = match build_identity_prompt(agent_id, "", identity_state).await {
+    // Check identity prompt cache first (avoids mutex + disk I/O)
+    let cache = &*IDENTITY_CACHE;
+    {
+        // Clone the prompt in one statement so the read guard is released before the await below
+        let cached_prompt = cache.read().await.get(agent_id).map(|c| c.prompt.clone());
+        if let Some(mut result) = cached_prompt {
+            // Cache hit: still need continuity context, but skip identity build
+            let continuity_context = build_continuity_context(agent_id, _user_message).await;
+            if !continuity_context.is_empty() {
+                result.push_str(&continuity_context);
+            }
+            debug!("[intelligence_hooks] Identity cache HIT for agent {}", agent_id);
+            return Ok(result);
+        }
+    }
+
+    // Cache miss — build identity prompt and continuity context in parallel
+    let (identity_result, continuity_context) = tokio::join!(
+        build_identity_prompt_cached(agent_id, "", identity_state, cache),
+        build_continuity_context(agent_id, _user_message)
+    );
+
+    let enhanced_prompt = match identity_result {
        Ok(prompt) => prompt,
        Err(e) => {
            warn!(
@@ -43,9 +94,6 @@ pub async fn pre_conversation_hook(
        }
    };

-    // Cross-session continuity: check for unresolved pain points and recent experiences
-    let continuity_context = build_continuity_context(agent_id, _user_message).await;
-
    let mut result = enhanced_prompt;
    if !continuity_context.is_empty() {
        result.push_str(&continuity_context);
@@ -240,6 +288,8 @@ pub async fn post_conversation_hook(
                        warn!("[intelligence_hooks] Failed to update soul with agent name: {}", e);
                    } else {
                        debug!("[intelligence_hooks] Updated agent name to '{}' in soul", name);
+                        // Invalidate cache since soul.md changed
+                        invalidate_identity_cache(agent_id);
                    }
                }
                drop(manager);
@@ -340,21 +390,34 @@ async fn build_memory_context(
    Ok(context)
 }

-/// Build identity-enhanced system prompt
-async fn build_identity_prompt(
+/// Build identity-enhanced system prompt and cache the result.
+async fn build_identity_prompt_cached(
    agent_id: &str,
    memory_context: &str,
    identity_state: &IdentityManagerState,
+    cache: &RwLock<HashMap<String, CachedIdentity>>,
 ) -> Result<String, String> {
-    // IdentityManagerState is Arc<tokio::sync::Mutex<AgentIdentityManager>>
-    // tokio::sync::Mutex::lock() returns MutexGuard directly
    let mut manager = identity_state.lock().await;

+    // Read current soul content for hashing
+    let soul_content = manager.get_file(agent_id, crate::intelligence::identity::IdentityFile::Soul);
+    let soul_hash = content_hash(&soul_content);
+
    let prompt = manager.build_system_prompt(
        agent_id,
        if memory_context.is_empty() { None } else { Some(memory_context) },
    ).await;

+    // Cache the result
+    drop(manager); // Release lock before acquiring write guard
+    {
+        let mut guard = cache.write().await;
+        guard.insert(agent_id.to_string(), CachedIdentity {
+            prompt: prompt.clone(),
+            soul_hash,
+        });
+    }
+
    Ok(prompt)
 }

--- a/desktop/src-tauri/src/lib.rs
+++ b/desktop/src-tauri/src/lib.rs
@@ -212,6 +212,12 @@ pub fn run() {
                if let Err(e) = rt.block_on(intelligence::pain_aggregator::init_pain_storage(pool)) {
                    tracing::error!("[PainStorage] Init failed: {}, pain points will not persist", e);
                }
+
+                // Initialize experience extractor for suggestion enrichment.
+                // Graceful degradation: failure does not block app startup.
+                if let Err(e) = rt.block_on(intelligence::experience::init_experience_extractor()) {
+                    tracing::warn!("[ExperienceExtractor] Init failed: {}, suggestion context will be empty", e);
+                }
            }

            Ok(())
@@ -435,6 +441,8 @@ pub fn run() {
            intelligence::pain_aggregator::butler_update_proposal_status,
            // Industry config loader
            viking_commands::viking_load_industry_keywords,
+            // Experience finder for suggestion enrichment
+            intelligence::experience::experience_find_relevant,
        ])
        .run(tauri::generate_context!())
        .expect("error while running tauri application");
--- a/desktop/src/components/ChatArea.tsx
+++ b/desktop/src/components/ChatArea.tsx
@@ -665,6 +665,28 @@ function stripToolNarration(content: string): string {
  return result || content;
 }

+/**
+ * Strip dangling clarification references from text when ask_clarification tool was called.
+ * When the LLM calls ask_clarification, it often ends its text with phrases like
+ * "比如：" / "以下信息" / "以下选项" that reference the tool output — but the tool output
+ * is rendered in a separate ClarificationCard, so these become confusing dead-end sentences.
+ */
+function stripDanglingClarificationRef(text: string, hasClarificationTool: boolean): string {
+  if (!hasClarificationTool || !text) return text;
+  // Match trailing dangling references in Chinese and English
+  const patterns = [
+    /[，,]\s*可以(?:提供以下|告诉我更多细节，)?(?:信息|选项|方向|细节|分类|类型)[：:]\s*$/,
+    /[，,]\s*比如[：:]\s*$/,
+    /[，,]\s*(?:例如|譬如|如以下)[：:]\s*$/,
+    /,\s*(?:for example|such as|like|the following)[：:]?\s*$/i,
+  ];
+  for (const pat of patterns) {
+    const stripped = text.replace(pat, '');
+    if (stripped !== text) return stripped;
+  }
+  return text;
+}
+
 function MessageBubble({ message, onRetry }: { message: Message; setInput?: (text: string) => void; onRetry?: () => void }) {
  if (message.role === 'tool') {
    return null;
@@ -749,7 +771,10 @@ function MessageBubble({ message, onRetry }: { message: Message; setInput?: (tex
                ? (isUser
                    ? message.content
                    : <StreamingText
-                        content={stripToolNarration(message.content)}
+                        content={stripDanglingClarificationRef(
+                          stripToolNarration(message.content),
+                          toolCallSteps?.some(s => s.toolName === 'ask_clarification') ?? false,
+                        )}
                        isStreaming={!!message.streaming}
                        className="text-gray-700 dark:text-gray-200"
                      />
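The trailing-reference stripping added to `ChatArea.tsx` can be exercised in isolation. The sketch below copies the function from the hunk above so that it runs standalone; the sample sentences are made up for illustration and are not taken from real model output.

```typescript
// Standalone copy of stripDanglingClarificationRef from the diff, for illustration only.
function stripDanglingClarificationRef(text: string, hasClarificationTool: boolean): string {
  if (!hasClarificationTool || !text) return text;
  const patterns = [
    /[，,]\s*可以(?:提供以下|告诉我更多细节，)?(?:信息|选项|方向|细节|分类|类型)[：:]\s*$/,
    /[，,]\s*比如[：:]\s*$/,
    /[，,]\s*(?:例如|譬如|如以下)[：:]\s*$/,
    /,\s*(?:for example|such as|like|the following)[：:]?\s*$/i,
  ];
  // Only the first matching pattern is applied; the rest of the text is untouched.
  for (const pat of patterns) {
    const stripped = text.replace(pat, '');
    if (stripped !== text) return stripped;
  }
  return text;
}

// Hypothetical inputs: a Chinese and an English sentence ending in a dangling reference.
const zh = stripDanglingClarificationRef('我需要更多信息，比如：', true);
const en = stripDanglingClarificationRef('I need more details, such as:', true);
// Without an ask_clarification tool call the text is returned unchanged.
const untouched = stripDanglingClarificationRef('我需要更多信息，比如：', false);
console.log(zh, '|', en, '|', untouched);
```

Because every pattern is anchored with `$`, a phrase such as "比如：" in the middle of a message is left alone. Only a sentence that ends on the dangling reference is trimmed.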
--- a/desktop/src/components/ai/ArtifactPanel.tsx
+++ b/desktop/src/components/ai/ArtifactPanel.tsx
@@ -6,9 +6,10 @@ import {
  Image as ImageIcon,
  Download,
  Copy,
-  ChevronLeft,
+  ChevronDown,
  File,
 } from 'lucide-react';
+import { MarkdownRenderer } from './MarkdownRenderer';

 // ---------------------------------------------------------------------------
 // Types
@@ -76,6 +77,7 @@ export function ArtifactPanel({
  className = '',
 }: ArtifactPanelProps) {
  const [viewMode, setViewMode] = useState<'preview' | 'code'>('preview');
+  const [fileMenuOpen, setFileMenuOpen] = useState(false);
  const selected = useMemo(
    () => artifacts.find((a) => a.id === selectedId),
    [artifacts, selectedId]
@@ -135,22 +137,59 @@ export function ArtifactPanel({

  return (
    <div className={`h-full flex flex-col ${className}`}>
-      {/* File header */}
+      {/* File header with inline file selector */}
      <div className="px-4 py-2 border-b border-gray-200 dark:border-gray-700 flex items-center gap-2 flex-shrink-0">
-        <button
-          onClick={() => onSelect('')}
-          className="p-1 rounded hover:bg-gray-100 dark:hover:bg-gray-700 text-gray-400 hover:text-gray-600 dark:hover:text-gray-200 transition-colors"
-          title="返回文件列表"
-        >
-          <ChevronLeft className="w-4 h-4" />
-        </button>
-        <Icon className="w-4 h-4 text-orange-500 flex-shrink-0" />
-        <span className="text-sm font-medium text-gray-700 dark:text-gray-200 truncate flex-1">
-          {selected.name}
-        </span>
+        <div className="relative">
+          <button
+            onClick={() => setFileMenuOpen(!fileMenuOpen)}
+            className="flex items-center gap-1.5 text-sm font-medium text-gray-700 dark:text-gray-200 truncate hover:text-orange-500 transition-colors"
+            title="切换文件"
+          >
+            <Icon className="w-4 h-4 text-orange-500 flex-shrink-0" />
+            <span className="truncate max-w-[120px]">{selected.name}</span>
+            {artifacts.length > 1 && (
+              <ChevronDown className={`w-3.5 h-3.5 text-gray-400 transition-transform ${fileMenuOpen ? 'rotate-180' : ''}`} />
+            )}
+          </button>
+
+          {/* File selector dropdown */}
+          {fileMenuOpen && artifacts.length > 1 && (
+            <>
+              <div className="fixed inset-0 z-10" onClick={() => setFileMenuOpen(false)} />
+              <div className="absolute top-full left-0 mt-1 w-56 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg z-20 py-1 max-h-60 overflow-y-auto">
+                {artifacts.map((artifact) => {
+                  const ItemIcon = getFileIcon(artifact.type);
+                  return (
+                    <button
+                      key={artifact.id}
+                      onClick={() => { onSelect(artifact.id); setFileMenuOpen(false); }}
+                      className={`w-full flex items-center gap-2 px-3 py-2 text-left text-sm hover:bg-gray-50 dark:hover:bg-gray-700 transition-colors ${
+                        artifact.id === selected.id ? 'bg-orange-50 dark:bg-orange-900/20 text-orange-700 dark:text-orange-300' : 'text-gray-700 dark:text-gray-200'
+                      }`}
+                    >
+                      <ItemIcon className="w-4 h-4 flex-shrink-0" />
+                      <span className="truncate flex-1">{artifact.name}</span>
+                      <span className={`text-[10px] px-1 py-0.5 rounded ${getTypeColor(artifact.type)}`}>
+                        {getTypeLabel(artifact.type)}
+                      </span>
+                    </button>
+                  );
+                })}
+              </div>
+            </>
+          )}
+        </div>
+
+        <div className="flex-1" />
+
        <span className={`text-[10px] px-1.5 py-0.5 rounded font-medium ${getTypeColor(selected.type)}`}>
          {getTypeLabel(selected.type)}
        </span>
+        {selected.language && (
+          <span className="text-[10px] text-gray-400 dark:text-gray-500">
+            {selected.language}
+          </span>
+        )}
      </div>

      {/* View mode toggle */}
@@ -180,19 +219,7 @@ export function ArtifactPanel({
      {/* Content area */}
      <div className="flex-1 overflow-y-auto custom-scrollbar p-4">
        {viewMode === 'preview' ? (
-          <div className="prose prose-sm dark:prose-invert max-w-none">
-            {selected.type === 'markdown' ? (
-              <MarkdownPreview content={selected.content} />
-            ) : selected.type === 'code' ? (
-              <pre className="bg-gray-50 dark:bg-gray-800 rounded-lg p-3 text-xs font-mono overflow-x-auto text-gray-700 dark:text-gray-200">
-                {selected.content}
-              </pre>
-            ) : (
-              <pre className="whitespace-pre-wrap text-sm text-gray-700 dark:text-gray-200">
-                {selected.content}
-              </pre>
-            )}
-          </div>
+          <ArtifactContentPreview artifact={selected} />
        ) : (
          <pre className="bg-gray-50 dark:bg-gray-800 rounded-lg p-3 text-xs font-mono overflow-x-auto text-gray-700 dark:text-gray-200 leading-relaxed">
            {selected.content}
@@ -217,6 +244,37 @@ export function ArtifactPanel({
  );
 }

+// ---------------------------------------------------------------------------
+// ArtifactContentPreview — renders artifact based on type
+// ---------------------------------------------------------------------------
+
+function ArtifactContentPreview({ artifact }: { artifact: ArtifactFile }) {
+  if (artifact.type === 'markdown') {
+    return <MarkdownRenderer content={artifact.content} />;
+  }
+
+  if (artifact.type === 'code') {
+    return (
+      <div className="relative">
+        {artifact.language && (
+          <div className="absolute top-2 right-2 text-[10px] text-gray-400 dark:text-gray-500 bg-gray-100 dark:bg-gray-700 px-1.5 py-0.5 rounded">
+            {artifact.language}
+          </div>
+        )}
+        <pre className="bg-gray-50 dark:bg-gray-900 rounded-lg p-4 text-xs font-mono overflow-x-auto text-gray-700 dark:text-gray-200 leading-relaxed border border-gray-200 dark:border-gray-700">
+          {artifact.content}
+        </pre>
+      </div>
+    );
+  }
+
+  return (
+    <pre className="whitespace-pre-wrap text-sm text-gray-700 dark:text-gray-200">
+      {artifact.content}
+    </pre>
+  );
+}
+
 // ---------------------------------------------------------------------------
 // ActionButton
 // ---------------------------------------------------------------------------
@@ -243,50 +301,6 @@ function ActionButton({ icon, label, onClick }: { icon: React.ReactNode; label:
  );
 }

-// ---------------------------------------------------------------------------
-// Simple Markdown preview (no external deps)
-// ---------------------------------------------------------------------------
-
-function MarkdownPreview({ content }: { content: string }) {
-  // Basic markdown rendering: headings, bold, code blocks, lists
-  const lines = content.split('\n');
-
-  return (
-    <div className="space-y-2">
-      {lines.map((line, i) => {
-        // Heading
-        if (line.startsWith('### ')) {
-          return <h3 key={i} className="text-sm font-bold text-gray-800 dark:text-gray-100 mt-3">{line.slice(4)}</h3>;
-        }
-        if (line.startsWith('## ')) {
-          return <h2 key={i} className="text-base font-bold text-gray-800 dark:text-gray-100 mt-4">{line.slice(3)}</h2>;
-        }
-        if (line.startsWith('# ')) {
-          return <h1 key={i} className="text-lg font-bold text-gray-800 dark:text-gray-100">{line.slice(2)}</h1>;
-        }
-        // Code block (simplified)
-        if (line.startsWith('```')) return null;
-        // List item
-        if (line.startsWith('- ') || line.startsWith('* ')) {
-          return <li key={i} className="text-sm text-gray-700 dark:text-gray-300 ml-4">{renderInline(line.slice(2))}</li>;
-        }
-        // Empty line
-        if (!line.trim()) return <div key={i} className="h-2" />;
-        // Regular paragraph
-        return <p key={i} className="text-sm text-gray-700 dark:text-gray-300 leading-relaxed">{renderInline(line)}</p>;
-      })}
-    </div>
-  );
-}
-
-function renderInline(text: string): React.ReactNode {
-  // Bold
-  const parts = text.split(/\*\*(.*?)\*\*/g);
-  return parts.map((part, i) =>
-    i % 2 === 1 ? <strong key={i} className="font-semibold">{part}</strong> : part
-  );
-}
-
 // ---------------------------------------------------------------------------
 // Download helper
 // ---------------------------------------------------------------------------
--- a/desktop/src/components/ai/MarkdownRenderer.tsx
+++ b/desktop/src/components/ai/MarkdownRenderer.tsx
@@ -0,0 +1,123 @@
+/**
+ * MarkdownRenderer — shared Markdown rendering with styled components.
+ *
+ * Extracted from StreamingText.tsx so ArtifactPanel and other consumers
+ * can reuse the same rich rendering (GFM tables, syntax blocks, etc.)
+ * without duplicating the component overrides.
+ */
+
+import ReactMarkdown from 'react-markdown';
+import remarkGfm from 'remark-gfm';
+import type { Components } from 'react-markdown';
+
+// ---------------------------------------------------------------------------
+// Shared component overrides for react-markdown
+// ---------------------------------------------------------------------------
+
+export const markdownComponents: Components = {
+  pre({ children }) {
+    return (
+      <pre className="bg-gray-50 dark:bg-gray-900 rounded-lg p-4 overflow-x-auto text-sm leading-relaxed border border-gray-200 dark:border-gray-700 my-3">
+        {children}
+      </pre>
+    );
+  },
+  code({ className, children, ...props }) {
+    const isBlock = className?.startsWith('language-');
+    if (isBlock) {
+      return (
+        <code className={`${className || ''} text-gray-800 dark:text-gray-200`} {...props}>
+          {children}
+        </code>
+      );
+    }
+    return (
+      <code className="bg-gray-100 dark:bg-gray-800 text-gray-700 dark:text-gray-300 px-1.5 py-0.5 rounded text-[0.9em] font-mono" {...props}>
+        {children}
+      </code>
+    );
+  },
+  table({ children }) {
+    return (
+      <div className="overflow-x-auto my-3 -mx-1">
+        <table className="min-w-full border-collapse border border-gray-200 dark:border-gray-700 rounded-lg text-sm">
+          {children}
+        </table>
+      </div>
+    );
+  },
+  thead({ children }) {
+    return <thead className="bg-gray-50 dark:bg-gray-800/50">{children}</thead>;
+  },
+  th({ children }) {
+    return (
+      <th className="border border-gray-200 dark:border-gray-700 px-3 py-2 text-left font-semibold text-gray-700 dark:text-gray-300">
+        {children}
+      </th>
+    );
+  },
+  td({ children }) {
+    return (
+      <td className="border border-gray-200 dark:border-gray-700 px-3 py-2 text-gray-600 dark:text-gray-400">
+        {children}
+      </td>
+    );
+  },
+  ul({ children }) {
+    return <ul className="list-disc list-outside ml-5 my-2 space-y-1">{children}</ul>;
+  },
+  ol({ children }) {
+    return <ol className="list-decimal list-outside ml-5 my-2 space-y-1">{children}</ol>;
+  },
+  li({ children }) {
+    return <li className="leading-relaxed">{children}</li>;
+  },
+  h1({ children }) {
+    return <h1 className="text-xl font-bold mt-5 mb-3 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h1>;
+  },
+  h2({ children }) {
+    return <h2 className="text-lg font-bold mt-4 mb-2 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h2>;
+  },
+  h3({ children }) {
+    return <h3 className="text-base font-semibold mt-3 mb-2 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h3>;
+  },
+  blockquote({ children }) {
+    return (
+      <blockquote className="border-l-4 border-gray-300 dark:border-gray-600 pl-4 py-1 my-3 text-gray-600 dark:text-gray-400 italic bg-gray-50 dark:bg-gray-800/30 rounded-r-lg">
+        {children}
+      </blockquote>
+    );
+  },
+  p({ children }) {
+    return <p className="my-2 leading-relaxed first:mt-0 last:mb-0">{children}</p>;
+  },
+  a({ href, children }) {
+    return (
+      <a href={href} target="_blank" rel="noopener noreferrer" className="text-blue-600 dark:text-blue-400 underline hover:text-blue-800 dark:hover:text-blue-300">
+        {children}
+      </a>
+    );
+  },
+  hr() {
+    return <hr className="my-4 border-gray-200 dark:border-gray-700" />;
+  },
+};
+
+// ---------------------------------------------------------------------------
+// Convenience wrapper
+// ---------------------------------------------------------------------------
+
+interface MarkdownRendererProps {
+  content: string;
+  className?: string;
+}
+
+export function MarkdownRenderer({ content, className = '' }: MarkdownRendererProps) {
+  return (
+    <div className={`prose-sm prose-gray dark:prose-invert max-w-none ${className}`}>
+      <ReactMarkdown remarkPlugins={[remarkGfm]} components={markdownComponents}>
+        {content}
+      </ReactMarkdown>
+    </div>
+  );
+}
--- a/desktop/src/components/ai/StreamingText.tsx
+++ b/desktop/src/components/ai/StreamingText.tsx
@@ -1,7 +1,5 @@
 import { useMemo, useRef, useEffect, useState } from 'react';
-import ReactMarkdown from 'react-markdown';
-import remarkGfm from 'remark-gfm';
-import type { Components } from 'react-markdown';
+import { MarkdownRenderer } from './MarkdownRenderer';

 /**
 * Streaming text with word-by-word reveal animation.
@@ -18,111 +16,6 @@ interface StreamingTextProps {
  asMarkdown?: boolean;
 }

-// ---------------------------------------------------------------------------
-// Markdown component overrides for rich rendering
-// ---------------------------------------------------------------------------
-
-const markdownComponents: Components = {
-  // Code blocks (```...```)
-  pre({ children }) {
-    return (
-      <pre className="bg-gray-50 dark:bg-gray-900 rounded-lg p-4 overflow-x-auto text-sm leading-relaxed border border-gray-200 dark:border-gray-700 my-3">
-        {children}
-      </pre>
-    );
-  },
-  // Inline code (`...`)
-  code({ className, children, ...props }) {
-    // If it has a language class, it's inside a code block — render as block
-    const isBlock = className?.startsWith('language-');
-    if (isBlock) {
-      return (
-        <code className={`${className || ''} text-gray-800 dark:text-gray-200`} {...props}>
-          {children}
-        </code>
-      );
-    }
-    return (
-      <code className="bg-gray-100 dark:bg-gray-800 text-gray-700 dark:text-gray-300 px-1.5 py-0.5 rounded text-[0.9em] font-mono" {...props}>
-        {children}
-      </code>
-    );
-  },
-  // Tables
-  table({ children }) {
-    return (
-      <div className="overflow-x-auto my-3 -mx-1">
-        <table className="min-w-full border-collapse border border-gray-200 dark:border-gray-700 rounded-lg text-sm">
-          {children}
-        </table>
-      </div>
-    );
-  },
-  thead({ children }) {
-    return <thead className="bg-gray-50 dark:bg-gray-800/50">{children}</thead>;
-  },
-  th({ children }) {
-    return (
-      <th className="border border-gray-200 dark:border-gray-700 px-3 py-2 text-left font-semibold text-gray-700 dark:text-gray-300">
-        {children}
-      </th>
-    );
-  },
-  td({ children }) {
-    return (
-      <td className="border border-gray-200 dark:border-gray-700 px-3 py-2 text-gray-600 dark:text-gray-400">
-        {children}
-      </td>
-    );
-  },
-  // Unordered lists
-  ul({ children }) {
-    return <ul className="list-disc list-outside ml-5 my-2 space-y-1">{children}</ul>;
-  },
-  // Ordered lists
-  ol({ children }) {
-    return <ol className="list-decimal list-outside ml-5 my-2 space-y-1">{children}</ol>;
-  },
-  // List items
-  li({ children }) {
-    return <li className="leading-relaxed">{children}</li>;
-  },
-  // Headings
-  h1({ children }) {
-    return <h1 className="text-xl font-bold mt-5 mb-3 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h1>;
-  },
-  h2({ children }) {
-    return <h2 className="text-lg font-bold mt-4 mb-2 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h2>;
-  },
-  h3({ children }) {
-    return <h3 className="text-base font-semibold mt-3 mb-2 text-gray-900 dark:text-gray-100 first:mt-0">{children}</h3>;
-  },
-  // Blockquotes
-  blockquote({ children }) {
-    return (
-      <blockquote className="border-l-4 border-gray-300 dark:border-gray-600 pl-4 py-1 my-3 text-gray-600 dark:text-gray-400 italic bg-gray-50 dark:bg-gray-800/30 rounded-r-lg">
-        {children}
-      </blockquote>
-    );
-  },
-  // Paragraphs
-  p({ children }) {
-    return <p className="my-2 leading-relaxed first:mt-0 last:mb-0">{children}</p>;
-  },
-  // Links
-  a({ href, children }) {
-    return (
-      <a href={href} target="_blank" rel="noopener noreferrer" className="text-blue-600 dark:text-blue-400 underline hover:text-blue-800 dark:hover:text-blue-300">
-        {children}
-      </a>
-    );
-  },
-  // Horizontal rules
-  hr() {
-    return <hr className="my-4 border-gray-200 dark:border-gray-700" />;
-  },
-};
-
 // ---------------------------------------------------------------------------
 // Token splitter for streaming animation
 // ---------------------------------------------------------------------------
@@ -176,13 +69,7 @@ export function StreamingText({
 }: StreamingTextProps) {
  // For completed messages, use full markdown rendering with styled components
  if (!isStreaming && asMarkdown) {
-    return (
-      <div className={`prose-sm prose-gray dark:prose-invert max-w-none ${className}`}>
-        <ReactMarkdown remarkPlugins={[remarkGfm]} components={markdownComponents}>
-          {content}
-        </ReactMarkdown>
-      </div>
-    );
+    return <MarkdownRenderer content={content} className={className} />;
  }

  // For streaming messages, use token-by-token animation
--- a/desktop/src/components/ai/ToolCallChain.tsx
+++ b/desktop/src/components/ai/ToolCallChain.tsx
@@ -166,7 +166,8 @@ interface ToolStepRowProps {
 }

 function ToolStepRow({ step, isActive, showConnector }: ToolStepRowProps) {
-  const [expanded, setExpanded] = useState(false);
+  // Clarification cards default to expanded so users see options immediately
+  const [expanded, setExpanded] = useState(step.toolName === 'ask_clarification');
  const Icon = getToolIcon(step.toolName);
  const label = getToolLabel(step.toolName);
  const isRunning = step.status === 'running';
--- a/desktop/src/components/ai/index.ts
+++ b/desktop/src/components/ai/index.ts
@@ -8,4 +8,5 @@ export { SuggestionChips } from './SuggestionChips';
 export { ResizableChatLayout } from './ResizableChatLayout';
 export { ToolCallChain, type ToolCallStep } from './ToolCallChain';
 export { ArtifactPanel, type ArtifactFile } from './ArtifactPanel';
+export { MarkdownRenderer, markdownComponents } from './MarkdownRenderer';
 export { TokenMeter } from './TokenMeter';
--- a/desktop/src/lib/gateway-client.ts
+++ b/desktop/src/lib/gateway-client.ts
@@ -696,13 +696,14 @@ export class GatewayClient {
        break;

      case 'tool_call':
-        // Tool call event
+        // Tool call start: onTool(name, input, '') — empty output signals start
        if (callbacks.onTool && data.tool) {
-          callbacks.onTool(data.tool, JSON.stringify(data.input || {}), data.output || '');
+          callbacks.onTool(data.tool, JSON.stringify(data.input || {}), '');
        }
        break;

      case 'tool_result':
+        // Tool call end: onTool(name, '', output) — empty input signals end
        if (callbacks.onTool && data.tool) {
          callbacks.onTool(data.tool, '', String(data.result || data.output || ''));
        }
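The comments in this hunk describe a calling convention for `onTool`: an empty `output` marks the start of a tool call and an empty `input` marks its end. A consumer might decode that convention roughly as follows. This is a minimal sketch: `ToolEvent` and `classifyToolCallback` are hypothetical names that do not appear in the diff.

```typescript
// Hypothetical event shape used only in this sketch.
interface ToolEvent {
  phase: 'start' | 'end';
  tool: string;
  payload: string;
}

// Decode the (name, input, output) triple passed to onTool.
// Assumption: a start event always carries a non-empty input, because the
// client sends JSON.stringify(data.input || {}), which is at least "{}".
function classifyToolCallback(tool: string, input: string, output: string): ToolEvent {
  if (output === '' && input !== '') {
    return { phase: 'start', tool, payload: input };
  }
  // Everything else is treated as a result, including an empty result string.
  return { phase: 'end', tool, payload: output };
}

const started = classifyToolCallback('search', '{"q":"x"}', '');
const finished = classifyToolCallback('search', '', 'ok');
console.log(started.phase, finished.phase);
```

The sketch also shows why the change matters. Before this commit a `tool_call` event could forward a non-empty `data.output`, which a consumer following this convention would have read as an end event.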
--- a/desktop/src/lib/llm-service.ts
+++ b/desktop/src/lib/llm-service.ts
@@ -646,18 +646,25 @@ const HARDCODED_PROMPTS: Record<string, { system: string; user: (arg: string) =>
  },

  suggestions: {
-    system: `你是对话分析助手。根据最近的对话内容，生成 3 个用户可能想继续探讨的问题。
+    system: `你是 ZCLAW 的管家助手，需要站在用户角度思考他们真正需要什么，生成 3 个个性化建议。

-要求：
-- 每个问题必须与对话内容直接相关，具体且有针对性
-- 帮助用户深入理解、实际操作或拓展思路
-- 每个问题不超过 30 个中文字符
-- 不要重复对话中已讨论过的内容
-- 使用与用户相同的语言
+## 生成规则
+1. 第 1 条 — 深入追问：基于当前话题，提出一个有洞察力的追问，帮助用户深入探索
+2. 第 2 条 — 实用行动：建议一个具体的、可操作的下一步（调用技能、执行工具、查看数据等）
+3. 第 3 条 — 管家关怀：
+   - 如果有未解决痛点 → 回访建议，如"上次提到的X，后来解决了吗？"
+   - 如果有相关经验 → 引导复用，如"上次用X方法解决了类似问题，要再试试吗？"
+   - 如果有匹配技能 → 推荐使用，如"试试 [技能名] 来处理这个"
+   - 如果没有提供痛点/经验/技能信息 → 给出一个启发性的思考角度
+4. 每个不超过 30 个中文字符
+5. 不要重复对话中已讨论过的内容
+6. 不要生成空泛的建议（如"继续分析"、"换个角度"）
+7. 默认使用中文，不要混入英文词汇（如"workflow"用"工作流"、"report"用"报表"），除非用户在对话中明确使用英文
+8. 建议会被用户直接点击发送，因此不要包含任何称谓（如"领导"、"老板"、"老师"等），用无主语的问句或陈述句

 只输出 JSON 数组，包含恰好 3 个字符串。不要输出任何其他内容。
-示例：["如何在生产环境中部署？", "这个方案的成本如何？", "有没有更简单的替代方案？"]`,
-    user: (context: string) => `以下是对话中最近的消息：\n\n${context}\n\n请生成 3 个后续问题。`,
+示例：["科室绩效分析可以按哪些维度拆解？", "用研究技能查一下相关文献？", "上次提到的排班冲突问题，需要继续想解决方案吗？"]`,
+    user: (context: string) => `以下是对话中最近的消息：\n\n${context}\n\n请生成 3 个后续建议（1 深入追问 + 1 实用行动 + 1 管家关怀）。`,
  },
 };

--- a/desktop/src/lib/suggestion-context.ts
+++ b/desktop/src/lib/suggestion-context.ts
@@ -0,0 +1,131 @@
+/**
+ * Suggestion context enrichment — fetches intelligence data for personalized suggestions.
+ * All fetches are optional; failures silently degrade to empty context.
+ */
+
+import { invoke } from '@tauri-apps/api/core';
+import { createLogger } from './logger';
+
+const log = createLogger('SuggestionContext');
+
+const CONTEXT_FETCH_TIMEOUT = 500;
+
+/** Pain point from butler intelligence layer. */
+interface PainPoint {
+  summary: string;
+  category: string;
+  confidence: number;
+  status: string;
+  occurrence_count: number;
+}
+
+/** Brief experience from the experience store. */
+interface ExperienceBrief {
+  pain_pattern: string;
+  solution_summary: string;
+  reuse_count: number;
+}
+
+/** Pipeline/skill match candidate. */
+interface PipelineCandidateInfo {
+  id: string;
+  display_name: string;
+  description: string;
+  category: string | null;
+  match_reason: string | null;
+}
+
+/** Route intent response (only NoMatch variant has suggestions). */
+interface RouteResultResponse {
+  type: 'Matched' | 'Ambiguous' | 'NoMatch' | 'NeedMoreInfo';
+  suggestions?: PipelineCandidateInfo[];
+}
+
+/** Aggregated suggestion context from all intelligence sources. */
+export interface SuggestionContext {
+  userProfile: string;
+  painPoints: string;
+  experiences: string;
+  skillMatch: string;
+}
+
+function isTauriAvailable(): boolean {
+  return typeof window !== 'undefined' && '__TAURI_INTERNALS__' in window;
+}
+
+function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T | null> {
+  return Promise.race([
+    promise,
+    new Promise<null>(resolve => setTimeout(() => resolve(null), ms)),
+  ]);
+}
+
+async function fetchUserProfile(agentId: string): Promise<string> {
+  const profile = await invoke<string>('identity_get_file', {
+    agentId,
+    file: 'userprofile',
+  });
+  if (!profile || profile.trim().length === 0) return '';
+  const text = profile.trim();
+  return text.length > 200 ? text.slice(0, 200) : text;
+}
+
+async function fetchPainPoints(agentId: string): Promise<string> {
+  const points = await invoke<PainPoint[]>('butler_list_pain_points', { agentId });
+  if (!Array.isArray(points) || points.length === 0) return '';
+
+  const active = points
+    .filter(p => p.confidence >= 0.5 && p.status !== 'Solved' && p.status !== 'Dismissed')
+    .sort((a, b) => b.confidence - a.confidence)
+    .slice(0, 3);
+
+  if (active.length === 0) return '';
+  return active
+    .map((p, i) => `${i + 1}. [${p.category}] ${p.summary}（出现${p.occurrence_count}次）`)
+    .join('\n');
+}
+
+async function fetchExperiences(agentId: string, query: string): Promise<string> {
+  const experiences = await invoke<ExperienceBrief[]>('experience_find_relevant', {
+    agentId,
+    query,
+  });
+  if (!Array.isArray(experiences) || experiences.length === 0) return '';
+
+  return experiences.slice(0, 2)
+    .map(e => `上次解决"${e.pain_pattern}"的方法：${e.solution_summary}（已复用${e.reuse_count}次）`)
+    .join('\n');
+}
+
+async function fetchSkillMatch(userInput: string): Promise<string> {
+  const result = await invoke<RouteResultResponse>('route_intent', { userInput });
+  const suggestions = result?.suggestions;
+  if (!Array.isArray(suggestions) || suggestions.length === 0) return '';
+
+  const best = suggestions[0];
+  return `你可能需要：${best.display_name} — ${best.description}`;
+}
+
+const EMPTY_CONTEXT: SuggestionContext = { userProfile: '', painPoints: '', experiences: '', skillMatch: '' };
+
+/**
+ * Fetch all intelligence context in parallel for suggestion enrichment.
+ * Returns empty strings for any source that fails — never throws.
+ */
+export async function fetchSuggestionContext(
+  agentId: string,
+  lastUserMessage: string,
+): Promise<SuggestionContext> {
+  if (!isTauriAvailable()) {
+    return EMPTY_CONTEXT;
+  }
+
+  const [userProfile, painPoints, experiences, skillMatch] = await Promise.all([
+    withTimeout(fetchUserProfile(agentId).catch(e => { log.warn('User profile fetch failed:', e); return ''; }), CONTEXT_FETCH_TIMEOUT),
+    withTimeout(fetchPainPoints(agentId).catch(e => { log.warn('Pain points fetch failed:', e); return ''; }), CONTEXT_FETCH_TIMEOUT),
+    withTimeout(fetchExperiences(agentId, lastUserMessage).catch(e => { log.warn('Experiences fetch failed:', e); return ''; }), CONTEXT_FETCH_TIMEOUT),
+    withTimeout(fetchSkillMatch(lastUserMessage).catch(e => { log.warn('Skill match fetch failed:', e); return ''; }), CONTEXT_FETCH_TIMEOUT),
+  ]);
+
+  return { userProfile: userProfile ?? '', painPoints: painPoints ?? '', experiences: experiences ?? '', skillMatch: skillMatch ?? '' };
+}
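
Review note on `RouteResultResponse`: the comment above says only the `NoMatch` variant carries suggestions, but the interface leaves `suggestions` optional on every variant. If that comment matches what `route_intent` actually returns (an assumption; the Rust side is not shown in this diff), a discriminated union would let the compiler enforce it. The names `RouteResult`, `Candidate`, and `skillMatchLine` below are hypothetical, and the output string is in English purely for illustration.

```typescript
// Hypothetical sketch: model route_intent results as a discriminated union,
// assuming only the NoMatch variant carries suggestions.
interface Candidate {
  display_name: string;
  description: string;
}

type RouteResult =
  | { type: 'Matched' }
  | { type: 'Ambiguous' }
  | { type: 'NeedMoreInfo' }
  | { type: 'NoMatch'; suggestions: Candidate[] };

// Returns a one-line hint for the best suggestion, or '' when there is none.
function skillMatchLine(result: RouteResult | null | undefined): string {
  if (!result || result.type !== 'NoMatch') return '';
  const best = result.suggestions[0];
  return best ? `You may need: ${best.display_name} (${best.description})` : '';
}
```

With this shape, reading `suggestions` on a `Matched` result is a compile-time error rather than a silent `undefined`.
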
--- a/desktop/src/store/chat/artifactStore.ts
+++ b/desktop/src/store/chat/artifactStore.ts
@@ -1,13 +1,13 @@
 /**
- * ArtifactStore — manages the artifact panel state.
+ * ArtifactStore — manages the artifact panel state with IndexedDB persistence.
 *
 * Extracted from chatStore.ts as part of the structured refactor.
- * This store has zero external dependencies — the simplest slice to extract.
- *
- * @see docs/superpowers/specs/2026-04-02-chatstore-refactor-design.md §3.5
+ * Uses zustand/middleware persist + idb-storage for persistence across refreshes.
 */

 import { create } from 'zustand';
+import { persist, createJSONStorage } from 'zustand/middleware';
+import { createIdbStorageAdapter } from '../../lib/idb-storage';
 import type { ArtifactFile } from '../../components/ai/ArtifactPanel';

 // ---------------------------------------------------------------------------
@@ -33,22 +33,33 @@ export interface ArtifactState {
 // Store
 // ---------------------------------------------------------------------------

-export const useArtifactStore = create<ArtifactState>()((set) => ({
-  artifacts: [],
-  selectedArtifactId: null,
-  artifactPanelOpen: false,
+export const useArtifactStore = create<ArtifactState>()(
+  persist(
+    (set) => ({
+      artifacts: [],
+      selectedArtifactId: null,
+      artifactPanelOpen: false,

-  addArtifact: (artifact: ArtifactFile) =>
-    set((state) => ({
-      artifacts: [...state.artifacts, artifact],
-      selectedArtifactId: artifact.id,
-      artifactPanelOpen: true,
-    })),
+      addArtifact: (artifact: ArtifactFile) =>
+        set((state) => ({
+          artifacts: [...state.artifacts, artifact],
+          selectedArtifactId: artifact.id,
+          artifactPanelOpen: true,
+        })),

-  selectArtifact: (id: string | null) => set({ selectedArtifactId: id }),
+      selectArtifact: (id: string | null) => set({ selectedArtifactId: id }),

-  setArtifactPanelOpen: (open: boolean) => set({ artifactPanelOpen: open }),
+      setArtifactPanelOpen: (open: boolean) => set({ artifactPanelOpen: open }),

-  clearArtifacts: () =>
-    set({ artifacts: [], selectedArtifactId: null, artifactPanelOpen: false }),
-}));
+      clearArtifacts: () =>
+        set({ artifacts: [], selectedArtifactId: null, artifactPanelOpen: false }),
+    }),
+    {
+      name: 'zclaw-artifact-storage',
+      storage: createJSONStorage(() => createIdbStorageAdapter()),
+      partialize: (state) => ({
+        artifacts: state.artifacts,
+      }),
+    },
+  ),
+);
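
Review note on the persisted store: `createJSONStorage` serializes state as JSON, so any `Date` inside `artifacts` (the stream store sets `createdAt: new Date()`) comes back as an ISO string after a reload unless it is revived. Below is a minimal sketch of a reviver, assuming `createdAt` is the only `Date` field. `StoredArtifact` and `reviveArtifactDates` are hypothetical names, and where to call it (for example from the persist middleware's `merge` or `onRehydrateStorage` option) is left open.

```typescript
// Hypothetical shape: only the fields this sketch needs.
interface StoredArtifact {
  id: string;
  name: string;
  createdAt: Date | string;
}

// Convert createdAt back into a Date after a JSON round trip.
// Entries whose timestamp cannot be parsed fall back to the epoch.
function reviveArtifactDates(
  artifacts: StoredArtifact[],
): Array<StoredArtifact & { createdAt: Date }> {
  return artifacts.map((artifact) => {
    const parsed = artifact.createdAt instanceof Date
      ? artifact.createdAt
      : new Date(artifact.createdAt);
    const createdAt = Number.isNaN(parsed.getTime()) ? new Date(0) : parsed;
    return { ...artifact, createdAt };
  });
}
```

Components that call `createdAt.toLocaleString()` or similar would otherwise fail only after a refresh, which makes this class of bug easy to miss in development.
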
--- a/desktop/src/store/chat/streamStore.ts
+++ b/desktop/src/store/chat/streamStore.ts
@@ -34,11 +34,16 @@ import {
 } from './conversationStore';
 import { useMessageStore } from './messageStore';
 import { useArtifactStore } from './artifactStore';
-import { llmSuggest } from '../../lib/llm-service';
+import { llmSuggest, LLM_PROMPTS } from '../../lib/llm-service';
 import { detectNameSuggestion, detectAgentNameSuggestion } from '../../lib/cold-start-mapper';
+import { fetchSuggestionContext, type SuggestionContext } from '../../lib/suggestion-context';

 const log = createLogger('StreamStore');

+// Module-level prefetch for suggestion context — started during streaming,
+// consumed on stream completion. Saves ~0.5-1s vs fetching after stream ends.
+let _activeSuggestionContextPrefetch: Promise<SuggestionContext> | null = null;
+
 // ---------------------------------------------------------------------------
 // Error formatting — convert raw LLM/API errors to user-friendly messages
 // ---------------------------------------------------------------------------
@@ -214,6 +219,67 @@ class DeltaBuffer {
  }
 }

+// ---------------------------------------------------------------------------
+// Artifact creation from tool output (shared between sendMessage & agent stream)
+// ---------------------------------------------------------------------------
+
+const ARTIFACT_TYPE_MAP: Record<string, 'code' | 'markdown' | 'text' | 'table' | 'image'> = {
+  ts: 'code', tsx: 'code', js: 'code', jsx: 'code',
+  py: 'code', rs: 'code', go: 'code', java: 'code',
+  md: 'markdown', txt: 'text', json: 'code',
+  html: 'code', css: 'code', sql: 'code', sh: 'code',
+  yaml: 'code', yml: 'code', toml: 'code', xml: 'code',
+  csv: 'table', svg: 'image',
+};
+
+const ARTIFACT_LANG_MAP: Record<string, string> = {
+  ts: 'typescript', tsx: 'typescript', js: 'javascript', jsx: 'javascript',
+  py: 'python', rs: 'rust', go: 'go', java: 'java',
+  html: 'html', css: 'css', sql: 'sql', sh: 'bash',
+  json: 'json', yaml: 'yaml', yml: 'yaml', toml: 'toml',
+  xml: 'xml', csv: 'csv', md: 'markdown', txt: 'text',
+};
+
+/** Attempt to create an artifact from a completed tool call. */
+function tryCreateArtifactFromToolOutput(toolName: string, toolInput: string, toolOutput: string): void {
+  if (!toolOutput) return;
+
+  const toolsWithArtifacts = ['file_write', 'write_file', 'str_replace', 'str_replace_editor'];
+  if (!toolsWithArtifacts.includes(toolName)) return;
+
+  try {
+    const parsed = JSON.parse(toolOutput);
+    const filePath = parsed?.path || parsed?.file_path || '';
+    let content = parsed?.content || '';
+
+    // For str_replace tools, content may be in input
+    if (!content && toolInput) {
+      try {
+        const inputParsed = JSON.parse(toolInput);
+        content = inputParsed?.new_text || inputParsed?.content || '';
+      } catch { /* ignore */ }
+    }
+
+    if (!filePath || !content) return;
+
+    // Deduplicate by file name (not full path): skip if an artifact with this name already exists.
+    // Split on both separators so Windows-style paths yield the base name too.
+    const fileName = filePath.split(/[\\/]/).pop() || filePath;
+    const existing = useArtifactStore.getState().artifacts;
+    if (existing.some(a => a.name === fileName)) return;
+    const ext = fileName.split('.').pop()?.toLowerCase() || '';
+
+    useArtifactStore.getState().addArtifact({
+      id: `artifact_${Date.now()}`,
+      name: fileName,
+      content: typeof content === 'string' ? content : JSON.stringify(content, null, 2),
+      type: ARTIFACT_TYPE_MAP[ext] || 'text',
+      language: ARTIFACT_LANG_MAP[ext],
+      createdAt: new Date(),
+    });
+  } catch { /* non-critical: artifact creation from tool output */ }
+}
+
 // ---------------------------------------------------------------------------
 // Stream event handlers (extracted from sendMessage)
 // ---------------------------------------------------------------------------
@@ -236,38 +302,8 @@ function createToolHandler(assistantId: string, chat: ChatStoreAccess) {
        })
      );

-      // Auto-create artifact when file_write tool produces output
-      if (tool === 'file_write') {
-        try {
-          const parsed = JSON.parse(output);
-          const filePath = parsed?.path || parsed?.file_path || '';
-          const content = parsed?.content || '';
-          if (filePath && content) {
-            const fileName = filePath.split('/').pop() || filePath;
-            const ext = fileName.split('.').pop()?.toLowerCase() || '';
-            const typeMap: Record<string, 'code' | 'markdown' | 'text'> = {
-              ts: 'code', tsx: 'code', js: 'code', jsx: 'code',
-              py: 'code', rs: 'code', go: 'code', java: 'code',
-              md: 'markdown', txt: 'text', json: 'code',
-              html: 'code', css: 'code', sql: 'code', sh: 'code',
-            };
-            const langMap: Record<string, string> = {
-              ts: 'typescript', tsx: 'typescript', js: 'javascript', jsx: 'javascript',
-              py: 'python', rs: 'rust', go: 'go', java: 'java',
-              html: 'html', css: 'css', sql: 'sql', sh: 'bash', json: 'json',
-            };
-            useArtifactStore.getState().addArtifact({
-              id: `artifact_${Date.now()}`,
-              name: fileName,
-              content: typeof content === 'string' ? content : JSON.stringify(content, null, 2),
-              type: typeMap[ext] || 'text',
-              language: langMap[ext],
-              createdAt: new Date(),
-              sourceStepId: assistantId,
-            });
-          }
-        } catch { /* non-critical: artifact creation from tool output */ }
-      }
+      // Auto-create artifact from tool output
+      tryCreateArtifactFromToolOutput(tool, input, output);
    } else {
      // toolStart: create new running step
      const step: ToolCallStep = {
@@ -399,36 +435,50 @@ function createCompleteHandler(
      }
    }

-    // Async memory extraction (independent — failures don't block name detection)
+    // Decoupled: suggestion generation runs immediately with prefetched context,
+    // memory extraction + reflection run independently in background.
    const filtered = msgs
      .filter(m => m.role === 'user' || m.role === 'assistant')
      .map(m => ({ role: m.role, content: m.content }));
    const convId = useConversationStore.getState().currentConversationId;
-    getMemoryExtractor().extractFromConversation(filtered, agentId, convId ?? undefined)
-      .catch(err => log.warn('Memory extraction failed:', err));

-    intelligenceClient.reflection.recordConversation().catch(err => {
-      log.warn('Recording conversation failed:', err);
-    });
-    intelligenceClient.reflection.shouldReflect().then(shouldReflect => {
-      if (shouldReflect) {
-        intelligenceClient.reflection.reflect(agentId, []).catch(err => {
-          log.warn('Reflection failed:', err);
-        });
-      }
-    });
-
-    // Follow-up suggestions (async LLM call with keyword fallback)
+    // Build conversation messages for suggestions
    const latestMsgs = chat.getMessages() || [];
    const conversationMessages = latestMsgs
      .filter(m => m.role === 'user' || m.role === 'assistant')
      .filter(m => !m.streaming)
      .map(m => ({ role: m.role, content: m.content }));

-    generateLLMSuggestions(conversationMessages, set).catch(err => {
-      log.warn('Suggestion generation error:', err);
-      set({ suggestionsLoading: false });
-    });
+    // Consume prefetched context (started in sendMessage during streaming)
+    const prefetchPromise = _activeSuggestionContextPrefetch;
+    _activeSuggestionContextPrefetch = null;
+
+    // Fire suggestion generation immediately — don't wait for memory extraction
+    const fireSuggestions = (ctx?: SuggestionContext) => {
+      generateLLMSuggestions(conversationMessages, set, ctx).catch(err => {
+        log.warn('Suggestion generation error:', err);
+        set({ suggestionsLoading: false });
+      });
+    };
+    if (prefetchPromise) {
+      prefetchPromise.then(fireSuggestions).catch(() => fireSuggestions());
+    } else {
+      fireSuggestions();
+    }
+
+    // Background tasks run independently — never block suggestions
+    getMemoryExtractor().extractFromConversation(filtered, agentId, convId ?? undefined)
+      .catch(err => log.warn('Memory extraction failed:', err));
+    intelligenceClient.reflection.recordConversation()
+      .catch(err => log.warn('Recording conversation failed:', err))
+      .then(() => intelligenceClient.reflection.shouldReflect())
+      .then(shouldReflect => {
+        if (shouldReflect) {
+          intelligenceClient.reflection.reflect(agentId, []).catch(err => {
+            log.warn('Reflection failed:', err);
+          });
+        }
+      }).catch(() => {});
  };
 }

@@ -559,15 +609,32 @@ function parseSuggestionResponse(raw: string): string[] {
 async function generateLLMSuggestions(
  messages: Array<{ role: string; content: string }>,
  set: (partial: Partial<StreamState>) => void,
+  context?: SuggestionContext,
 ): Promise<void> {
  set({ suggestionsLoading: true });

  try {
-    const recentMessages = messages.slice(-6);
-    const context = recentMessages
-      .map(m => `${m.role === 'user' ? '用户' : '助手'}: ${m.content}`)
+    const recentMessages = messages.slice(-20);
+    const conversationContext = recentMessages
+      .map(m => `${m.role === 'user' ? '用户' : '助手'}: ${m.content.slice(0, 200)}`)
      .join('\n\n');

+    // Build dynamic user message with intelligence context
+    const ctx = context ?? { userProfile: '', painPoints: '', experiences: '', skillMatch: '' };
+    const hasContext = ctx.userProfile || ctx.painPoints || ctx.experiences || ctx.skillMatch;
+    let userMessage: string;
+    if (hasContext) {
+      const sections: string[] = ['以下是用户的背景信息，请在生成建议时参考：\n'];
+      if (ctx.userProfile) sections.push(`## 用户画像\n${ctx.userProfile}`);
+      if (ctx.painPoints) sections.push(`## 活跃痛点\n${ctx.painPoints}`);
+      if (ctx.experiences) sections.push(`## 相关经验\n${ctx.experiences}`);
+      if (ctx.skillMatch) sections.push(`## 可用技能\n${ctx.skillMatch}`);
+      sections.push(`\n最近对话：\n${conversationContext}`);
+      userMessage = sections.join('\n\n');
+    } else {
+      userMessage = `以下是对话中最近的消息：\n\n${conversationContext}\n\n请生成 3 个后续问题。`;
+    }
+
    const connectionMode = typeof localStorage !== 'undefined'
      ? localStorage.getItem('zclaw-connection-mode')
      : null;
@@ -575,9 +642,9 @@ async function generateLLMSuggestions(
    let raw: string;

    if (connectionMode === 'saas') {
-      raw = await llmSuggestViaSaaS(context);
+      raw = await llmSuggestViaSaaS(userMessage);
    } else {
-      raw = await llmSuggest(context);
+      raw = await llmSuggest(userMessage);
    }

    const suggestions = parseSuggestionResponse(raw);
@@ -601,7 +668,7 @@ async function generateLLMSuggestions(
 * with non-streaming requests. Collects the full response from SSE deltas,
 * then parses the suggestion JSON from the accumulated text.
 */
-async function llmSuggestViaSaaS(context: string): Promise<string> {
+async function llmSuggestViaSaaS(userMessage: string): Promise<string> {
  const { saasClient } = await import('../../lib/saas-client');
  const { useConversationStore } = await import('./conversationStore');
  const { useSaaSStore } = await import('../saasStore');
@@ -611,9 +678,6 @@ async function llmSuggestViaSaaS(context: string): Promise<string> {
  const model = currentModel || (availableModels.length > 0 ? availableModels[0]?.id : undefined);
  if (!model) throw new Error('No model available for suggestions');

-  // Delay to avoid concurrent relay requests with memory extraction
-  await new Promise(r => setTimeout(r, 2000));
-
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 60000);

@@ -623,7 +687,7 @@ async function llmSuggestViaSaaS(context: string): Promise<string> {
        model,
        messages: [
          { role: 'system', content: LLM_PROMPTS_SYSTEM },
-          { role: 'user', content: `以下是对话中最近的消息：\n\n${context}\n\n请生成 3 个后续问题。` },
+          { role: 'user', content: userMessage },
        ],
        max_tokens: 500,
        temperature: 0.7,
@@ -664,17 +728,7 @@ async function llmSuggestViaSaaS(context: string): Promise<string> {
  }
 }

-const LLM_PROMPTS_SYSTEM = `你是对话分析助手。根据最近的对话内容，生成 3 个用户可能想继续探讨的问题。
-
-要求：
- 每个问题必须与对话内容直接相关，具体且有针对性
- 帮助用户深入理解、实际操作或拓展思路
- 每个问题不超过 30 个中文字符
- 不要重复对话中已讨论过的内容
- 使用与用户相同的语言
-
-只输出 JSON 数组，包含恰好 3 个字符串。不要输出任何其他内容。
-示例：["如何在生产环境中部署？", "这个方案的成本如何？", "有没有更简单的替代方案？"]`;
+const LLM_PROMPTS_SYSTEM = LLM_PROMPTS.suggestions.system;

 // ---------------------------------------------------------------------------
 // ChatStore injection (avoids circular imports)
@@ -786,6 +840,9 @@ export const useStreamStore = create<StreamState>()(
    });
    set({ isStreaming: true, activeRunId: null });

+    // Prefetch suggestion context during streaming — saves ~0.5-1s post-stream
+    _activeSuggestionContextPrefetch = fetchSuggestionContext(agentId, content);
+
    // Delta buffer — batches updates at ~60fps
    const buffer = new DeltaBuffer(assistantId, _chat);

@@ -1001,6 +1058,13 @@ export const useStreamStore = create<StreamState>()(
              return { ...m, toolSteps: steps };
            })
          );
+
+          // Auto-create artifact from tool output (agent stream path)
+          tryCreateArtifactFromToolOutput(
+            delta.tool || 'unknown',
+            delta.toolInput || '',
+            delta.toolOutput,
+          );
        } else {
          // toolStart: create new running step
          const step: ToolCallStep = {
@@ -1059,10 +1123,20 @@ export const useStreamStore = create<StreamState>()(
              .filter(m => !m.streaming)
              .map(m => ({ role: m.role, content: m.content }));

-            generateLLMSuggestions(conversationMessages, set).catch(err => {
-              log.warn('Suggestion generation error:', err);
-              set({ suggestionsLoading: false });
-            });
+            // Path B: use prefetched context for agent stream — fixes zero-personalization
+            const prefetchPromise = _activeSuggestionContextPrefetch;
+            _activeSuggestionContextPrefetch = null;
+            const fireSuggestions = (ctx?: SuggestionContext) => {
+              generateLLMSuggestions(conversationMessages, set, ctx).catch(err => {
+                log.warn('Suggestion generation error:', err);
+                set({ suggestionsLoading: false });
+              });
+            };
+            if (prefetchPromise) {
+              prefetchPromise.then(fireSuggestions).catch(() => fireSuggestions());
+            } else {
+              fireSuggestions();
+            }
          }
        }
      } else if (delta.stream === 'hand') {
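
Review note on `tryCreateArtifactFromToolOutput`: it deduplicates by base file name, so `src/a/index.ts` and `src/b/index.ts` collapse into one artifact, and a later write to the same file never refreshes the stored content. One alternative is to key artifacts by a normalized full path and upsert. The sketch below is illustrative only: `normalizeArtifactPath`, `baseName`, `upsertByPath`, and the `path` field are hypothetical (the `ArtifactFile` objects built in this diff carry no `path` field), and the normalization is deliberately simple (separators and the Windows `\\?\` prefix only, no `..` resolution).

```typescript
// Hypothetical minimal artifact record keyed by path.
interface PathKeyedArtifact {
  path: string;
  name: string;
  content: string;
}

// Normalize separators and strip the Windows extended-length prefix
// so that the same file always produces the same key.
function normalizeArtifactPath(rawPath: string): string {
  return rawPath.replace(/^\\\\\?\\/, '').replace(/\\/g, '/');
}

// Base name of an already-normalized path.
function baseName(normalizedPath: string): string {
  return normalizedPath.split('/').pop() || normalizedPath;
}

// Insert a new artifact, or replace the content of the one with the same path.
function upsertByPath(
  artifacts: PathKeyedArtifact[],
  rawPath: string,
  content: string,
): PathKeyedArtifact[] {
  const path = normalizeArtifactPath(rawPath);
  const next: PathKeyedArtifact = { path, name: baseName(path), content };
  const index = artifacts.findIndex((a) => a.path === path);
  if (index === -1) return [...artifacts, next];
  return artifacts.map((a, i) => (i === index ? next : a));
}
```

Returning a new array instead of mutating keeps the helper compatible with a Zustand `set` call, at the cost of one copy per write.
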
--- a/docs/references/artifact-system-reference.md
+++ b/docs/references/artifact-system-reference.md
@@ -0,0 +1,309 @@
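+# Artifact System Reference
+
+> A survey of the artifact/output panel implementations in DeerFlow and Hermes Agent, as reference material for refactoring the ZCLAW artifact system.
+> Analysis date: 2026-04-24
+
+---
+
+## 1. DeerFlow Artifact System
+
+DeerFlow has a complete full-stack artifact pipeline and is the primary reference.
+
+### 1.1 End-to-End Data Flow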
+
+```
+Agent tool call (write_file / str_replace / present_files)
+    ↓
+Backend: ThreadState.artifacts (LangGraph annotated list, deduplicated by the merge_artifacts reducer)
+    ↓ File write: {base_dir}/threads/{thread_id}/user-data/outputs/
+    ↓ Virtual path: /mnt/user-data/outputs/filename.ext
+    ↓
+Backend API: GET /api/threads/{thread_id}/artifacts/{virtual_path}
+    ↓ MIME detection / .skill ZIP extraction / download vs inline
+    ↓
+Frontend: thread.values.artifacts (string[]) → ArtifactsProvider context
+    ↓
+ChatBox (ResizablePanelGroup) → chat(60%) | artifact panel(40%)
+    ↓
+ArtifactFileDetail → CodeMirror (code) / Streamdown (Markdown) / iframe (HTML)
+```
+
+### 1.2 Key Files
+
+#### Frontend Core
+
+| File | Responsibility |
+|------|------|
+| `frontend/src/core/artifacts/utils.ts` | URL construction, artifact list extraction, path parsing |
+| `frontend/src/core/artifacts/loader.ts` | Fetches artifact text from the backend API; extracts content directly from tool call args |
+| `frontend/src/core/artifacts/hooks.ts` | TanStack React Query hook with a 5-minute cache |
+| `frontend/src/components/workspace/artifacts/context.tsx` | ArtifactsProvider + useArtifacts(): manages the list, selection, open/closed state, and auto-select |
+| `frontend/src/components/workspace/artifacts/artifact-file-detail.tsx` | Artifact detail view: header (file selector + code/preview toggle) + CodeEditor/Preview |
+| `frontend/src/components/workspace/artifacts/artifact-file-list.tsx` | Card-style list view; each card has an icon, name, extension, and download/install buttons |
+| `frontend/src/components/workspace/artifacts/artifact-trigger.tsx` | Header trigger button, shown only when artifacts exist |
+
+#### Frontend Rendering
+
+| File | Responsibility |
+|------|------|
+| `frontend/src/components/workspace/code-editor.tsx` | Read-only CodeMirror editor with syntax highlighting for CSS/HTML/JS/JSON/MD/Python |
+| `frontend/src/components/ai-elements/code-block.tsx` | Shiki syntax-highlighted code block, dual theme (light/dark) |
+| `frontend/src/components/ai-elements/web-preview.tsx` | iframe web preview with an address bar and navigation buttons |
+| `frontend/src/components/workspace/messages/markdown-content.tsx` | Renders Markdown with Streamdown (GFM + Math + Raw HTML + KaTeX) |
+| `frontend/src/core/utils/files.tsx` | 140+ extension-to-language mappings; file icon and type detection |
+
+#### Backend
+
+| File | Responsibility |
+|------|------|
+| `backend/.../thread_state.py` | ThreadState.artifacts list + merge_artifacts deduplicating reducer |
+| `backend/.../present_file_tool.py` | present_files tool: normalizes paths and returns Command(update) |
+| `backend/.../paths.py` | Path management: threads/{id}/user-data/{workspace,uploads,outputs} |
+| `backend/app/gateway/routers/artifacts.py` | FastAPI route: GET artifact files, MIME detection, security handling |
+
+### 1.3 Supported Content Types
+
+| Type | Rendering |
+|------|----------|
+| Code files (140+ extensions) | Read-only CodeMirror + syntax highlighting |
+| Markdown (.md) | Streamdown (GFM + Math + KaTeX + Raw HTML) |
+| HTML (.html/.htm) | Sandboxed `<iframe>` (srcDoc) |
+| Images (.png/.jpg/.svg/.webp) | `<img>` tag; non-code files use an iframe |
+| .skill archives | ZIP extraction; SKILL.md rendered as Markdown |
+| Binary files (PDF, etc.) | Backend inline Content-Disposition |
+| Text files (.txt/.csv/.log) | CodeMirror plain-text mode |
+
+### 1.4 Persistence Architecture
+
+**Disk storage:**
+```
+{DEER_FLOW_HOME}/threads/{thread_id}/user-data/outputs/
+```
+
+**State persistence:** the artifacts list is part of the LangGraph ThreadState and is persisted automatically by the checkpoint system.
+
+**Frontend cache:** TanStack React Query with a 5-minute stale time.
+
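+### 1.5 UI/UX Design Patterns
+
+#### Split Layout (chat-box.tsx)
+- Horizontal split using `react-resizable-panels`
+- Closed: chat=100%, artifacts=0%
+- Open: chat=60%, artifacts=40%
+- 300ms CSS transition
+
+#### Auto-Open + Auto-Select
+- Opens the panel and selects the file automatically when a `write_file` / `str_replace` tool call is detected
+- The `autoOpen` / `autoSelect` flags prevent the panel from reopening after the user closes it manually
+
+#### Code/Preview Toggle
+- HTML/Markdown default to Preview; everything else defaults to Code
+- Preview uses Streamdown (MD) or an iframe (HTML)
+
+#### Header Action Bar
+- File selector dropdown (switch files without returning to the list)
+- Copy / Download / Open in new window / Close
+
+#### Inline Display in Chat
+- `present_files` tool call → renders a card grid inside the chat stream
+- Clicking a card → opens that file in the side panel
+
+#### Dual-Path Approach
+1. **Real file path**: fetched from the backend API and cached by React Query
+2. **`write-file:` virtual path**: content is extracted directly from the tool call args, with no backend request, and supports streaming display
+
+### 1.6 Provider Hierarchy
+
+```
+ArtifactsProvider → provides the useArtifacts() context
+  ChatBox → ResizablePanelGroup
+    Panel(chat) → MessageList → ToolCall auto-opens the artifact panel
+    Panel(artifacts) → ArtifactFileDetail → useArtifactContent() hook
+```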
+
+---
+
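+## 2. Hermes Agent Artifact Mechanisms
+
+> **Conclusion: Hermes Agent has no artifact panel, no web frontend, and no split layout.** It is a terminal CLI tool that renders all output inline in the terminal. Its large-output handling is nevertheless worth borrowing.
+
+### 2.1 Project Positioning
+
+Hermes Agent is a **Python CLI/TUI agent** (similar to Claude Code) that runs in a prompt_toolkit TUI and also supports gateways to IM platforms such as Telegram, Discord, Slack, and WhatsApp.
+
+**No React/Next.js/web UI.** It exposes an OpenAI-compatible API so that third-party UIs such as Open WebUI and LobeChat can connect.
+
+### 2.2 Large-Output Handling (3 Layers of Defense)
+
+This is the only mechanism that comes close to "artifact management", and it is worth borrowing.
+
+**File: `tools/tool_result_storage.py`**
+
+| Layer | Mechanism | Description |
+|------|------|------|
+| Layer 1 | Tool-level truncation | Each tool caps its own output length |
+| Layer 2 | `maybe_persist_tool_result` | A single result over the threshold is written to a sandbox temp file and replaced in the context by a `<persisted-output>` preview block |
+| Layer 3 | `enforce_turn_budget` | When the whole turn exceeds 200K characters, the largest results are spilled to disk |
+
+Core logic:
+```python
+# Over the threshold: write the full content to a file and replace it in the context with a preview
+remote_path = f"{storage_dir}/{tool_use_id}.txt"
+_write_to_sandbox(content, remote_path, env)
+return _build_persisted_message(preview, has_more, len(content), remote_path)
+# The agent can later read the full content with read_file + offset/limit
+```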
+
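+### 2.3 Budget Configuration
+
+**File: `tools/budget_config.py`**
+
+| Parameter | Default |
+|------|--------|
+| `DEFAULT_RESULT_SIZE_CHARS` | 100,000 (per-tool threshold) |
+| `DEFAULT_TURN_BUDGET_CHARS` | 200,000 (per-turn cap) |
+| `DEFAULT_PREVIEW_SIZE_CHARS` | 1,500 (inline preview length) |
+
+### 2.4 CLI Rendering
+
+**File: `agent/display.py`**
+
+- **Tool progress**: KawaiiSpinner animation + one-line summary
+- **File edits**: inline colored unified diff (write_file / patch tools)
+- **Final response**: wrapped in a Rich Panel border, with switchable theme colors (7 skins)
+
+### 2.5 Session Persistence
+
+**File: `hermes_state.py`**
+
+SQLite (`~/.hermes/state.db`) + FTS5 full-text search:
+- sessions table: metadata, model config, token counts, cost, title
+- messages table: role, content, tool_call_id, reasoning, timestamps
+
+### 2.6 Points Worth Borrowing
+
+| Point | Value |
+|----|----------|
+| Spill large outputs to disk + inline preview | Addresses context window overflow |
+| 3 progressive layers of defense | Useful reference for the ZCLAW middleware chain |
+| Configurable budgets | Thresholds are tunable, so different scenarios can use different policies |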
+
+---
+
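+## 3. Comparison: ZCLAW Today vs Reference Implementations
+
+### 3.1 Current Gaps
+
+| Dimension | DeerFlow | ZCLAW today | Gap |
+|------|----------|------------|------|
+| Data source | 3 tools (present_files/write_file/str_replace) register artifacts actively | Only streamStore parsing the filePath from tool output | Very narrow; almost never triggers |
+| Persistence | Disk files + LangGraph checkpoint | In-memory Zustand only | Lost on refresh |
+| Rendering: code | Read-only CodeMirror + syntax highlighting (140+ languages) | Plain `<pre>` tag, no highlighting | No highlighting |
+| Rendering: Markdown | Streamdown (GFM+Math+KaTeX+RawHTML) | Hand-written 30-line regex renderer | Headings/bold/lists only |
+| Rendering: HTML | Sandboxed iframe | Not supported | None |
+| Rendering: images | `<img>` + iframe | Type declared but not implemented | None |
+| Rendering: tables | GFM tables | Plain-text `<pre>` | None |
+| Panel layout | react-resizable-panels 60/40 | react-resizable-panels 65/35 | Already present, reusable |
+| Auto-open | Triggered by write_file/str_replace | Opens on addArtifact | Already present |
+| File selection | Dropdown without leaving the detail view | Must return to the list to choose | Poor UX |
+| Inline in chat | present_files → card grid | None | Missing |
+| Cache | React Query 5min | None | Missing |
+| Dual path | Real path + write-file: virtual path | Runtime memory only | Missing |
+| Right-panel overlap | Single panel | ArtifactPanel and the RightPanel "Files" tab have overlapping responsibilities | Architectural issue |
+
+### 3.2 Summary of Core Gaps
+
+**In priority order:**
+
+1. **P0 Broken data source**: artifacts have almost no source; this is the most fundamental problem
+2. **P0 No persistence**: artifacts are lost on refresh
+3. **P1 Incomplete Markdown rendering**: a 30-line regex vs a full GFM renderer
+4. **P1 No syntax highlighting for code**: plain `<pre>` vs CodeMirror/Shiki
+5. **P2 Overlapping panel responsibilities**: ArtifactPanel vs the RightPanel "Files" tab
+6. **P2 No file switching inside the detail view**: users must return to the list to switch files
+7. **P3 No inline artifact cards in chat**
+8. **P3 No HTML/image/table rendering**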
+
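+### 3.3 Recommended Options
+
+#### Option A: Minimum Viable (fill gaps in the existing architecture)
+
+Patch the existing ArtifactPanel + artifactStore:
+
+- **Data source**: extend tool output parsing in streamStore to cover more tool types
+- **Persistence**: add IndexedDB writes to artifactStore (reusing the messageStore pattern)
+- **Markdown**: adopt `react-markdown` + `remark-gfm` to replace the hand-written renderer
+- **Code highlighting**: adopt `shiki` or `highlight.js`
+- **Merge panels**: fold the RightPanel "Files" tab into ArtifactPanel and remove the files tab from RightPanel
+
+**Effort**: ~2-3 days
+
+#### Option B: Refactor Along DeerFlow's Lines (recommended)
+
+Borrow DeerFlow's architecture, adapted to ZCLAW's local Tauri architecture:
+
+| DeerFlow component | ZCLAW adaptation |
+|---------------|------------|
+| FastAPI artifact routes | Tauri commands `artifact_list` / `artifact_read` / `artifact_serve` |
+| On-disk outputs/ directory | `{workspace}/artifacts/{session_key}/` |
+| LangGraph checkpoint | SQLite (existing zclaw-memory) |
+| React Query cache | TanStack Query or Zustand + stale cache |
+| Read-only CodeMirror | Adopt @uiw/react-codemirror |
+| Streamdown MD | react-markdown + remark-gfm + rehype-katex |
+| iframe HTML preview | Tauri webview window (security isolation) |
+
+**Core change list:**
+
+1. **Rust side** (zclaw-kernel):
+   - Add the `artifact_create` / `artifact_list` / `artifact_read` Tauri commands
+   - Write artifacts to `{workspace}/artifacts/{session_key}/`
+   - The ToolEnd event in the middleware chain triggers artifact registration
+
+2. **Frontend store**:
+   - Add IndexedDB persistence to artifactStore
+   - Decouple artifact creation logic from streamStore into a standalone hook
+
+3. **Frontend components**:
+   - Replace MarkdownPreview with react-markdown + GFM
+   - Adopt CodeMirror/shiki for code highlighting
+   - Add a file dropdown switcher to the detail view
+   - Merge or remove the RightPanel "Files" tab
+
+**Effort**: ~5-7 days
+
+#### Option C: Borrow Hermes's Defense Mechanisms (add-on)
+
+Whether A or B is chosen, Hermes's large-output defenses can be layered on top:
+
+- Add overflow detection to the ToolOutputGuard layer of the middleware chain
+- Artifacts over the threshold are persisted to disk automatically and replaced in the context by a `<persisted-output>` preview
+- The agent can read the full content back via read_file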
+
+---
+
+## 4. Key Dependency Reference
+
+| Library | Purpose | Used by DeerFlow | Recommended |
+|----|------|--------------|------|
+| react-markdown | Markdown rendering | ✅ (Streamdown) | ✅ |
+| remark-gfm | GFM tables/strikethrough/task lists | ✅ | ✅ |
+| rehype-katex | Math formula rendering | ✅ | As needed |
+| @uiw/react-codemirror | Code editor/highlighting | ✅ | ✅ |
+| shiki | Static code highlighting | ✅ (code blocks in chat) | ✅ |
+| react-resizable-panels | Split layout | ✅ | Already present |
+| @tanstack/react-query | Data caching | ✅ | Optional |
+
+---
+
+## 5. File Index
+
+| Reference project | Key path |
+|----------|----------|
+| DeerFlow frontend | `G:/deerflow/frontend/src/components/workspace/artifacts/` |
+| DeerFlow frontend utilities | `G:/deerflow/frontend/src/core/artifacts/` |
+| DeerFlow layout | `G:/deerflow/frontend/src/components/workspace/chats/chat-box.tsx` |
+| DeerFlow code editor | `G:/deerflow/frontend/src/components/workspace/code-editor.tsx` |
+| DeerFlow backend routes | `G:/deerflow/backend/app/gateway/routers/artifacts.py` |
+| DeerFlow backend tools | `G:/deerflow/backend/packages/harness/deerflow/tools/builtins/present_file_tool.py` |
+| Hermes output management | `G:/hermes-agent-main/tools/tool_result_storage.py` |
+| Hermes budget config | `G:/hermes-agent-main/tools/budget_config.py` |
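
To make the Hermes layers from section 2.2 concrete for a TypeScript codebase, the sketch below ports the idea rather than the code: a single oversized result is persisted and replaced by a preview (Layer 2), and when a whole turn exceeds its budget the largest remaining results are spilled first (Layer 3). Everything here is an assumption for illustration: the function names, the in-memory `Map` standing in for disk storage, and the exact `<persisted-output>` wording are not taken from Hermes. Only the three default sizes come from the table in section 2.3.

```typescript
// Defaults from the budget table; all names below are hypothetical.
const RESULT_SIZE_CHARS = 100_000;
const TURN_BUDGET_CHARS = 200_000;
const PREVIEW_SIZE_CHARS = 1_500;

interface ToolResult {
  toolUseId: string;
  content: string;
}

// Stand-in for disk storage so the sketch stays self-contained.
const persisted = new Map<string, string>();

// Store the full content and return a short preview block in its place.
function persistAndPreview(result: ToolResult): ToolResult {
  persisted.set(result.toolUseId, result.content);
  const preview = result.content.slice(0, PREVIEW_SIZE_CHARS);
  const content =
    `<persisted-output id="${result.toolUseId}" total_chars="${result.content.length}">\n` +
    `${preview}\n</persisted-output>`;
  return { toolUseId: result.toolUseId, content };
}

// Layer 2: persist a single result that exceeds the per-result threshold.
function maybePersist(result: ToolResult, limit = RESULT_SIZE_CHARS): ToolResult {
  return result.content.length > limit ? persistAndPreview(result) : result;
}

// Layer 3: while the turn is over budget, spill the largest unpersisted result.
function enforceTurnBudget(results: ToolResult[], budget = TURN_BUDGET_CHARS): ToolResult[] {
  const out = results.map((r) => maybePersist(r));
  const total = () => out.reduce((sum, r) => sum + r.content.length, 0);
  while (total() > budget) {
    const candidates = out.filter((r) => !persisted.has(r.toolUseId));
    if (candidates.length === 0) break; // everything is already persisted
    const largest = candidates.reduce((a, b) => (b.content.length > a.content.length ? b : a));
    out[out.indexOf(largest)] = persistAndPreview(largest);
  }
  return out;
}
```

Spilling the largest result first frees the most budget per step, so smaller results tend to stay inline where the model can read them directly.
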
--- a/docs/references/deerflow-toolcall-reference.md
+++ b/docs/references/deerflow-toolcall-reference.md
@@ -0,0 +1,212 @@
+# DeerFlow Tool-Call System Reference
+
+> A survey of DeerFlow's complete tool-call flow, as reference material for troubleshooting ZCLAW tool-call issues.
+> Analysis date: 2026-04-24
+
+---
+
+## 1. End-to-End Data Flow
+
+```
+User message
+  → FastAPI Gateway (/api/threads/{id}/runs/stream)
+    → services.start_run() → asyncio.create_task(run_agent(...))
+      → LangGraph Agent Graph (create_agent)
+        → LLM Model (ChatOpenAI / Claude)
+          → AIMessage (with a tool_calls list)
+            → 14-layer middleware chain
+              → ToolNode (built into LangGraph, routes by tool_call.name)
+                → ToolMessage (execution result)
+                  → LLM called again (continuing with the ToolMessage)
+                    → StreamBridge.publish() → asyncio.Queue
+                      → SSE → frontend useStream hook
+                        → React components render
+```
+
+## 2. Tool Registration and Execution
+
+### 2.1 Registration Entry Point
+
+**File**: `G:/deerflow/backend/packages/harness/deerflow/tools/tools.py`, function `get_available_tools()`
+
+Tools come from four sources:
+
+| Source | Loading method | Example |
+|------|----------|------|
+| Config tools | YAML config + reflective import (`module:variable`) | `deerflow.sandbox.tools:bash_tool` |
+| Builtin tools | Hard-coded imports | `present_file_tool`, `ask_clarification_tool` |
+| MCP tools | Fetched by `MultiServerMCPClient` from the MCP server cache | Third-party MCP tools |
+| ACP tools | Built dynamically by `build_invoke_acp_agent_tool()` | External agent invocation |
+
+### 2.2 Sandbox Tool List
+
+**File**: `G:/deerflow/backend/packages/harness/deerflow/sandbox/tools.py`
+
+| Tool name | Function |
+|--------|------|
+| `bash` | Runs a command in the sandbox |
+| `ls` | Lists a directory |
+| `read_file` | Reads a file |
+| `write_file` | Writes a file (auto-opens the artifact panel) |
+| `str_replace` | String replacement (auto-opens the artifact panel) |
+
+### 2.3 Builtin Tools
+
+**File**: `G:/deerflow/backend/packages/harness/deerflow/tools/builtins/`
+
+| Tool | Function |
+|------|------|
+| `ask_clarification` | Asks the user a clarifying question (interrupts execution and waits for a reply) |
+| `present_file` | Presents files to the user (triggers artifact cards) |
+| `setup_agent` | Creates a custom agent |
+| `task_tool` | Delegates tasks to sub-agents |
+| `view_image` | Views images (vision models only) |
+| `tool_search` | Deferred tool search (MCP tools exposed on demand) |
+
+## 3. Middleware Chain (14 Layers)
+
+**File**: `G:/deerflow/backend/packages/harness/deerflow/agents/lead_agent/agent.py`, function `_build_middlewares()`
+
+Key middlewares related to tool calls:
+
+### 3.1 DanglingToolCallMiddleware
+
+**File**: `dangling_tool_call_middleware.py`
+
+In `wrap_model_call`, detects AIMessages in the history whose tool calls have no matching ToolMessage and injects a placeholder ToolMessage automatically:
+```python
+ToolMessage(
+    content="[Tool call was interrupted and did not return a result.]",
+    tool_call_id=tc_id,
+    name=tc.get("name", "unknown"),
+    status="error",
+)
+```
+
+### 3.2 ToolErrorHandlingMiddleware
+
+**File**: `tool_error_handling_middleware.py`
+
+In `wrap_tool_call`, catches tool execution exceptions and converts them into an error ToolMessage instead of letting the whole run crash.
+
+### 3.3 LoopDetectionMiddleware
+
+**File**: `loop_detection_middleware.py`
+
+In `after_model`, detects repeated tool calls:
+- Threshold of 3 → injects a warning HumanMessage
+- Threshold of 5 → clears tool_calls outright, forcing the LLM to produce a text answer
+
+### 3.4 DeferredToolFilterMiddleware
+
+**File**: `deferred_tool_filter_middleware.py`
+
+In `wrap_model_call`, filters out the schemas of deferred MCP tools and exposes them only after the LLM discovers them via `tool_search`.
+
+### 3.5 ClarificationMiddleware
+
+Intercepts `ask_clarification` tool calls and interrupts execution to wait for the user's reply.
+
+### 3.6 SubagentLimitMiddleware
+
+Truncates excessive parallel sub-agent calls.
+
+## 四、工具结果回传
+
+### 4.1 格式
+
+LangChain 的 `ToolMessage`，包含：
+- `content`: 执行结果文本
+- `tool_call_id`: 匹配 AIMessage 中的 tool_call ID
+- `name`: 工具名称
+- `status`: `"error"` 或省略
+
+### 4.2 特殊工具
+
+`present_file_tool` 返回 `Command` 而非纯字符串，同时更新 `artifacts` 和 `messages` 两个 state channel。
+
+## 五、前端工具调用展示
+
+### 5.1 消息分组
+
+**文件**: `G:/deerflow/frontend/src/core/messages/utils.ts` — `groupMessages()`
+
+| 分组类型 | 触发条件 | 展示 |
+|----------|----------|------|
+| `assistant:processing` | AI 消息含 tool_calls 或 reasoning | MessageGroup (折叠) |
+| `assistant` | AI 消息有文本无 tool_calls | MessageListItem (气泡) |
+| `assistant:present-files` | 含 present_files tool call | ArtifactFileList |
+| `assistant:clarification` | ask_clarification 结果 | MarkdownContent |
+| `assistant:subagent` | 含 task tool call | SubtaskCard |
+
+### 5.2 工具状态推断
+
+前端**没有显式状态机**。通过消息序列推断：
+- AI 消息含 tool_calls 但无对应 ToolMessage → 正在执行
+- ToolMessage 出现 → 执行完成
+- `assistant:processing` 组由 `ChainOfThought` 折叠组件包裹
+
+### 5.3 工具调用 UI
+
+**文件**: `message-group.tsx` 第 186-423 行
+
+按工具名渲染不同图标和内容：
+- `bash` → 终端图标 + 命令代码块
+- `read_file`/`write_file`/`str_replace` → 文件图标 + 路径链接（点击打开产物面板）
+- `web_search` → 搜索图标 + 结果链接
+- 默认 → 扳手图标 + 工具名
+
+## 六、流式处理中的工具调用
+
+### 6.1 架构
+
+```
+agent.astream(stream_mode=["values"])
+  → StreamBridge (asyncio.Queue per run, maxsize=256)
+    → sse_consumer() → SSE frames → 前端
+```
+
+### 6.2 关键特征
+
+- 工具调用**不中断**流。LangGraph 自动在 agent_node 和 tool_node 之间路由
+- 每次状态变更产出完整的 `values` 快照，前端通过 `seen_ids` 去重
+- 15 秒心跳包保持 SSE 连接
+
+### 6.3 前端看到的事件序列
+
+1. `values` 事件: 含 `tool_calls` 的 AIMessage
+2. `values` 事件: ToolMessage（工具结果）
+3. `values` 事件: LLM 基于工具结果的最终回答
+
+整个过程连续，不中断 SSE 连接。
+
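+## 7. Comparison with ZCLAW (Tool Calls)
+
+| Dimension | DeerFlow | ZCLAW |
+|------|----------|-------|
+| Framework | LangGraph (graph-based) | In-house loop_runner (loop) |
+| Tool lifecycle | Managed automatically by LangGraph ToolNode | Manual ToolRegistry + loop_runner |
+| after_tool_call middleware | ✅ Full `wrap_tool_call` hook | ❌ Not invoked in streaming or non-streaming mode at analysis time (marked fixed 2026-04-24 in `zclaw-toolcall-issues.md`) |
+| Parallel tool execution | Handled automatically by LangGraph | Non-streaming uses JoinSet; streaming was fully serial at analysis time (marked fixed 2026-04-24) |
+| Dangling-call repair | DanglingToolCallMiddleware | DanglingToolMiddleware (present) |
+| Error recovery | ToolErrorHandlingMiddleware (exception → ToolMessage) | ToolErrorMiddleware (counter) |
+| Loop detection | LoopDetectionMiddleware (warn at 3, force stop at 5) | LoopGuardMiddleware (present) |
+| Frontend state | Inferred from the message sequence | Explicit ToolCallStep state machine |
+| MCP tools | Deferred registration + on-demand exposure via tool_search | Registered in full |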
+
+## 八、关键文件索引
+
+| 功能 | DeerFlow 文件 |
+|------|-------------|
+| Agent 工厂 | `backend/packages/harness/deerflow/agents/lead_agent/agent.py` |
+| 中间件组装 | `backend/packages/harness/deerflow/agents/factory.py` |
+| 工具注册 | `backend/packages/harness/deerflow/tools/tools.py` |
+| Sandbox 工具 | `backend/packages/harness/deerflow/sandbox/tools.py` |
+| Builtin 工具 | `backend/packages/harness/deerflow/tools/builtins/` |
+| 错误处理中间件 | `agents/middlewares/tool_error_handling_middleware.py` |
+| 悬挂修复中间件 | `agents/middlewares/dangling_tool_call_middleware.py` |
+| 循环检测中间件 | `agents/middlewares/loop_detection_middleware.py` |
+| 延迟过滤中间件 | `agents/middlewares/deferred_tool_filter_middleware.py` |
+| 流式 Bridge | `runtime/stream_bridge/memory.py` |
+| 前端消息分组 | `frontend/src/core/messages/utils.ts` |
+| 前端工具调用组件 | `frontend/src/components/workspace/messages/message-group.tsx` |
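
As a rough illustration of the dangling tool-call repair described in section 3.1, the sketch below scans a message history and injects an error placeholder for every tool call that has no matching result. It is written in Rust to match the ZCLAW side of this comparison rather than DeerFlow's Python. The `Msg` enum, the `patch_dangling` function, and the choice to place the placeholder directly after the AI message are assumptions made for illustration; they are not DeerFlow's or ZCLAW's actual code.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified message model (not the real DeerFlow/ZCLAW types).
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    Human(String),
    Ai { tool_call_ids: Vec<String> },
    Tool { tool_call_id: String, content: String, is_error: bool },
}

const PLACEHOLDER: &str = "[Tool call was interrupted and did not return a result.]";

/// Returns a patched copy of `history` in which every tool call without a
/// matching tool result gets an error placeholder after its AI message.
fn patch_dangling(history: &[Msg]) -> Vec<Msg> {
    // Ids that already have a result somewhere in the history.
    let answered: HashSet<&str> = history
        .iter()
        .filter_map(|m| match m {
            Msg::Tool { tool_call_id, .. } => Some(tool_call_id.as_str()),
            _ => None,
        })
        .collect();

    let mut out = Vec::with_capacity(history.len());
    for m in history {
        out.push(m.clone());
        if let Msg::Ai { tool_call_ids } = m {
            for id in tool_call_ids.iter().filter(|id| !answered.contains(id.as_str())) {
                out.push(Msg::Tool {
                    tool_call_id: id.clone(),
                    content: PLACEHOLDER.to_string(),
                    is_error: true,
                });
            }
        }
    }
    out
}
```

Because the function only adds placeholders for ids that have no result at all, running it twice on the same history adds nothing the second time, which is one way to avoid the "patched twice" bloat this kind of repair can otherwise cause.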
--- a/docs/references/zclaw-toolcall-issues.md
+++ b/docs/references/zclaw-toolcall-issues.md
@@ -0,0 +1,141 @@
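+# ZCLAW Tool-Call Issue Analysis
+
+> Audits ZCLAW's tool-call handling against the DeerFlow tool-call system.
+> Analysis date: 2026-04-24
+> Updated: 2026-04-24 (both P0 items fixed: after_tool_call and stream_errored; the P1 to P3 items below are also marked fixed)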
+
+---
+
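+## 1. Issues Found
+
+### P0: `after_tool_call` middleware was never invoked (✅ fixed 2026-04-24)
+
+**File**: `crates/zclaw-runtime/src/loop_runner.rs`
+
+In both `run()` (non-streaming, lines 400-558) and `run_streaming` (streaming, lines 893-1070), the loop pushed `Message::tool_result` straight into the message history after executing a tool and **never called `middleware_chain.run_after_tool_call()`**.
+
+**Impact**:
+- The error counting and recovery-message logic in `ToolErrorMiddleware.after_tool_call` never ran
+- The sensitive-data detection in `ToolOutputGuardMiddleware.after_tool_call` never ran
+- Tool errors reached the LLM only through the tool's own error return, so the middleware layer offered no real protection
+
+**DeerFlow comparison**: `ToolErrorHandlingMiddleware` wraps every tool execution through the `wrap_tool_call` hook.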
+
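+### P0: `stream_errored` skipped all tool execution (✅ fixed 2026-04-24)
+
+**File**: `crates/zclaw-runtime/src/loop_runner.rs`, lines 872-876
+
+In streaming mode, any error in the LLM stream (network timeout, API error, driver error) set `stream_errored = true` and then exited the loop with `break 'outer`, **skipping every tool call that had already been parsed**.
+
+**Impact**:
+- A ToolStart event had already been sent to the frontend (the user saw an "executing" button), but the tool never actually ran
+- No ToolEnd event was ever sent, so the frontend tool state stayed stuck on "executing"
+- Tool calls that had been received in full (ToolUseEnd) were discarded as well
+
+**Fix**: Distinguish complete tool calls (ToolUseEnd received) from incomplete ones (only ToolUseStart/Delta received). Complete tool calls run as usual; incomplete ones get a cancellation ToolEnd event.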
+
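+### P1: Streaming mode ran all tools serially (✅ fixed 2026-04-24)
+
+**File**: `loop_runner.rs`, tool-execution section of streaming mode
+
+Non-streaming mode ran ReadOnly tools in parallel with `JoinSet` + `Semaphore(3)`, whereas streaming mode ran every tool serially in a plain `for` loop.
+
+**Fix**: Streaming mode now executes in three phases: Phase 1 runs the middleware pre-checks serially, Phase 2 partitions the calls into parallel and serial groups and executes them, and Phase 3 runs `after_tool_call`, then sorts and pushes the results.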
+
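+### P2: OpenAI driver silently replaced tool arguments (✅ fixed 2026-04-24)
+
+**File**: `crates/zclaw-runtime/src/driver/openai.rs`, lines 222-228
+
+```rust
+// Behavior before the fix: a parse failure falls back to an empty object.
+let parsed_args = if args.is_empty() {
+    serde_json::json!({})
+} else {
+    serde_json::from_str(args).unwrap_or_else(|e| {
+        tracing::warn!("Failed to parse tool args '{}': {}", args, e);
+        serde_json::json!({})
+    })
+};
+```
+
+When JSON parsing failed, the arguments were silently replaced with `{}`. Combined with the empty-argument handling in `loop_runner.rs` (lines 412-423), this injected `_fallback_query` in place of the real arguments.
+
+**Fix**: On a parse failure the driver now returns `_parse_error` and `_raw_args` fields, so the tool and the LLM can detect the argument problem and self-correct.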
+
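+### P2: ToolOutputGuard was too aggressive (✅ fixed 2026-04-24)
+
+**File**: `crates/zclaw-runtime/src/middleware/tool_output_guard.rs`, line 109
+
+Sensitive patterns were matched after `to_lowercase()`, so legitimate content containing strings such as "password" or "system:" was blocked by mistake.
+
+**Fix**: Use `regex` to match the format of actual secret values (for example `sk-[a-zA-Z0-9]{20,}`, `AKIA[A-Z0-9]{16}`, and `key=value` patterns), so legitimate content that merely contains a keyword is no longer blocked. Overly broad injection-detection patterns such as "system:" were removed.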
+
+### P2: ToolErrorMiddleware failure counter was global (✅ fixed 2026-04-24)
+
+**File**: `crates/zclaw-runtime/src/middleware/tool_error.rs`, line 27
+
+`consecutive_failures: AtomicU32` was a struct field shared by all sessions. Under high concurrency, 2 failures in session A plus 1 failure in session B were enough to trigger AbortLoop (threshold 3).
+
+**Fix**: Store the counts in a `Mutex<HashMap<String, u32>>` keyed by session_id, so each session is tracked independently.
+
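+### P3: Gateway client `onTool` callback semantics were inconsistent (✅ fixed 2026-04-24)
+
+**File**: `desktop/src/lib/gateway-client.ts`, lines 698-707
+
+The `tool_call` and `tool_result` cases shared the `onTool` callback but followed different argument conventions, so callers had to check whether `output` was empty to tell start from end.
+
+**Fix**: `tool_call` now always passes `''` as the output (previously it could pass `data.output`), and comments document the start/end convention.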
+
+---
+
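+## 2. Root-Cause Analysis
+
+The most common failure modes for tool calls:
+
+1. **The LLM returns malformed tool_call arguments** → the OpenAI driver silently replaces them with `{}` → the tool runs with empty arguments → the result is not what was expected
+2. **A tool execution throws** → the after_tool_call middleware is not invoked → the error is not formatted → the LLM receives the raw error and cannot recover
+3. **The stream is interrupted and then reconnects** → DanglingToolMiddleware repairs the dangling call → but if the repair logic itself has a bug (such as patching twice), the message history bloats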
+
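+## 3. Fix Recommendations
+
+### Fix 1: Call after_tool_call in loop_runner
+
+**Priority**: P0
+**Affected file**: `loop_runner.rs`
+
+In the tool-execution loop of non-streaming mode (around line 530), make this call after each tool runs:
+```rust
+let after_result = middleware_chain.run_after_tool_call(
+    &name, &input_json, &output_str, &mut ctx
+).await;
+```
+
+Make the same call after tool execution in streaming mode (around line 1020).
+
+### Fix 2: Make the ToolErrorMiddleware counter per-session
+
+**Priority**: P2
+**Affected file**: `middleware/tool_error.rs`
+
+Store the counts in a `HashMap<String, u32>` keyed by session_id (wrapped in a `Mutex` in the implemented fix described in section 1).
+
+### Fix 3: Switch ToolOutputGuard to exact matching
+
+**Priority**: P2
+**Affected file**: `middleware/tool_output_guard.rs`
+
+Trigger only when a standalone secret value is detected (for example `sk-` followed by 48 characters), not on word-level matches. The pattern listed for the implemented fix in section 1 is `sk-[a-zA-Z0-9]{20,}`.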
+
+---
+
+## 4. Key Files
+
+| File | Role |
+|------|------|
+| `crates/zclaw-runtime/src/loop_runner.rs` | Main loop, tool scheduling |
+| `crates/zclaw-runtime/src/tool.rs` | ToolRegistry + Tool trait |
+| `crates/zclaw-runtime/src/middleware/tool_error.rs` | Tool error handling |
+| `crates/zclaw-runtime/src/middleware/tool_output_guard.rs` | Output safety checks |
+| `crates/zclaw-runtime/src/middleware/dangling_tool.rs` | Dangling tool-call repair |
+| `crates/zclaw-runtime/src/driver/openai.rs` | OpenAI-compatible driver |
+| `desktop/src/lib/gateway-client.ts` | Frontend communication client |
+| `desktop/src/store/chat/streamStore.ts` | Frontend stream handling |
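
The per-session failure counter from the ToolErrorMiddleware item above can be sketched with the standard library alone. This is a minimal illustration, assuming a threshold of 3 and that a successful call resets a session's streak. `FailureTracker`, `record`, and `ABORT_THRESHOLD` are hypothetical names, not the actual contents of `tool_error.rs`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Assumed threshold, taken from the "threshold 3" mentioned in the analysis.
const ABORT_THRESHOLD: u32 = 3;

/// Hypothetical tracker: one consecutive-failure count per session id.
#[derive(Default)]
struct FailureTracker {
    per_session: Mutex<HashMap<String, u32>>,
}

impl FailureTracker {
    /// Records one tool result and returns true when this session has
    /// reached the threshold and the loop should abort.
    fn record(&self, session_id: &str, failed: bool) -> bool {
        // A poisoned lock is treated as fatal in this sketch.
        let mut map = self.per_session.lock().unwrap();
        if !failed {
            // A success ends the streak for this session only.
            map.remove(session_id);
            return false;
        }
        let count = map.entry(session_id.to_string()).or_insert(0);
        *count += 1;
        *count >= ABORT_THRESHOLD
    }
}
```

With this shape, two failures in session A and one in session B no longer add up to three: each key carries its own streak, and only the session that actually reaches the threshold aborts.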
--- a/wiki/chat.md
+++ b/wiki/chat.md
@@ -1,6 +1,6 @@
 ---
 title: 聊天系统
-updated: 2026-04-22
+updated: 2026-04-23
 status: active
 tags: [module, chat, stream]
 ---
@@ -17,6 +17,7 @@ tags: [module, chat, stream]
 | 5 Store 拆分 | 原 908 行 ChatStore → stream/conversation/message/chat/artifact，单一职责 |
 | 5 分钟超时守护 | 防止流挂起: kernel-chat.ts:76，超时自动 cancelStream |
 | 统一回调接口 | 3 种实现共享 `{ onDelta, onThinkingDelta, onTool, onHand, onComplete, onError }` |
+| LLM 动态建议 | 替换硬编码关键词匹配，用 LLM 生成个性化建议（1深入追问+1实用行动+1管家关怀），4路并行预取智能上下文 |

 ### ChatStream 实现

@@ -33,11 +34,14 @@ tags: [module, chat, stream]

 | 文件 | 职责 |
 |------|------|
-| `desktop/src/store/chat/streamStore.ts` | 流式消息编排、发送、取消 |
+| `desktop/src/store/chat/streamStore.ts` | 流式消息编排、发送、取消、LLM 动态建议生成 |
 | `desktop/src/store/chat/conversationStore.ts` | 会话管理、当前模型、sessionKey |
 | `desktop/src/store/chat/messageStore.ts` | 消息持久化 (IndexedDB) |
 | `desktop/src/lib/kernel-chat.ts` | KernelClient ChatStream (Tauri) |
+| `desktop/src/lib/suggestion-context.ts` | 4路并行智能上下文拉取 (用户画像/痛点/经验/技能匹配) |
+| `desktop/src/lib/cold-start-mapper.ts` | 冷启动配置映射 (行业检测/命名/个性/技能) |
 | `desktop/src/components/ChatArea.tsx` | 聊天区域 UI |
+| `desktop/src/components/ai/SuggestionChips.tsx` | 动态建议芯片展示 |
 | `crates/zclaw-runtime/src/loop_runner.rs` | Rust 主聊天循环 + 中间件链 |

 ### 发送消息流
@@ -100,6 +104,20 @@ UI 选择模型 → conversationStore.currentModel = newModel
 - cancelStream 设置原子标志位，与 onDelta 回调无竞态
 - 3 种 ChatStream 共享同一套回调接口，上层代码无需感知实现差异
 - 消息持久化走 messageStore → IndexedDB，与流式渲染解耦
+- 动态建议 4 路并行预取 (userProfile/painPoints/experiences/skillMatch)，500ms 超时降级为空串
+- 建议生成与 memory extraction 解耦 — 不等 memory LLM 调用完成即启动建议
+
+### LLM 动态建议
+
+```
+sendMessage → isStreaming=true + _activeSuggestionContextPrefetch = fetchSuggestionContext(...)
+  → 流式响应中 prefetch 在后台执行
+onComplete → createCompleteHandler
+  → generateLLMSuggestions(prefetchedContext) — 立即启动不等 memory
+    → prompt: 1 深入追问 + 1 实用行动 + 1 管家关怀
+  → memory/reflection 后台独立运行 (Promise.all)
+  → SuggestionChips 渲染
+```

 ### Tauri 命令

@@ -114,6 +132,8 @@ UI 选择模型 → conversationStore.currentModel = newModel

 | 问题 | 状态 | 说明 |
 |------|------|------|
+| after_tool_call 中间件未调用 | ✅ 已修复 (04-24) | 流式+非流式均添加调用，ToolErrorMiddleware/ToolOutputGuard 现在生效 |
+| stream_errored 跳过所有工具 | ✅ 已修复 (04-24) | 完整工具照常执行，不完整工具发送取消事件 |
 | B-CHAT-07 混合域截断 | P2 Open | 跨域消息时可能截断上下文 |
 | SSE Token 统计为 0 | ✅ 已修复 | SseUsageCapture stream_done flag |
 | Tauri invoke 参数名 | ✅ 已修复 (f6c5dd2) | camelCase 格式 |
@@ -122,14 +142,15 @@ UI 选择模型 → conversationStore.currentModel = newModel
 **注意事项:**
 - 辅助 LLM 调用 (记忆摘要/提取、管家路由) 复用 `kernel_init` 的 model+base_url，与聊天同链路
 - 课堂聊天是独立 Tauri 命令 (`classroom_chat`)，不走 `agent_chat_stream`
+- Agent tab 已移除 — 跨会话身份由 soul.md 接管，不再通过 RightPanel 管理

 ## 5. 变更日志

 | 日期 | 变更 |
 |------|------|
+| 04-24 | 工具调用 P0 修复: after_tool_call 中间件接入(流式+非流式) + stream_errored 工具抢救(完整工具执行+不完整工具取消) |
+| 04-24 | 产物系统优化: MarkdownRenderer 提取共享 + ArtifactPanel react-markdown 渲染 + 文件选择器下拉 + 数据源扩展(file_write/str_replace 两路径) + artifactStore IndexedDB 持久化 |
+| 04-23 | 建议 prefetch: sendMessage 时启动 context 预取，流结束后立即消费，不等 memory extraction |
+| 04-23 | 建议 prompt 重写: 1深入追问+1实用行动+1管家关怀，上下文窗口 6→20 条 |
 | 04-23 | 身份信号: detectAgentNameSuggestion 前端即时检测 + RightPanel 监听 Tauri 事件刷新名称 |
-| 04-22 | Wiki 重写: 5 节模板，增加集成契约和不变量 |
-| 04-21 | 上一轮更新 |
-| 04-17 | ChatStore 拆分为 5 Store (stream/conversation/message/chat/artifact) |
-| 04-16 | Provider Key 解密修复 (b69dc61) |
-| 04-16 | Tauri invoke 参数名修复 (f6c5dd2) |
+| 04-23 | Agent tab 移除: RightPanel 清理 ~280 行 dead code，身份由 soul.md 接管 |
--- a/wiki/hands-skills.md
+++ b/wiki/hands-skills.md
@@ -133,6 +133,18 @@ skills/ -> SkillRegistry 加载 -> SkillIndexMiddleware@200 注入系统提示
 - MCP 限定名 `service_name.tool_name` 避免与内置工具冲突
 - 已删除空壳 Hands (04-17): Whiteboard/Slideshow/Speech，净减 ~5400 行

+### ⚡ 新增工具/技能必须声明 concurrency 级别
+
+`Tool` trait 的 `concurrency()` 方法决定并行执行策略 (04-24 Hermes Phase 2A):
+
+| 级别 | 含义 | 适用场景 |
+|------|------|---------|
+| `ReadOnly` (默认) | 只读，始终可并行 | file_read, web_search, calculator |
+| `Exclusive` | 有副作用，必须串行 | file_write, shell_exec, send_message, execute_skill, task |
+| `Interactive` | 需要用户交互，永不并行 | ask_clarification |
+
+**新增工具时**：在 `impl Tool for YourTool` 中覆盖 `concurrency()` 方法。默认 `ReadOnly`，如果有写操作/副作用必须返回 `ToolConcurrency::Exclusive`。未正确声明会导致并行执行时产生竞态条件。
+
 ## 4. 活跃问题 + 陷阱

 ### 活跃
@@ -155,6 +167,7 @@ skills/ -> SkillRegistry 加载 -> SkillIndexMiddleware@200 注入系统提示

 | 日期 | 变更 | 关联 |
 |------|------|------|
+| 2026-04-24 | Hermes Phase 2A: ToolConcurrency 枚举 + 并行执行 + concurrency() 声明要求 | commit 9060935 |
 | 2026-04-22 | Wiki 5-section 重构: 281->~195 行，语义路由细节引用 [[butler]] | wiki/ |
 | 2026-04-22 | Researcher 搜索修复: schema 扁平化 + 空参数回退 + 排版修复 | commit 5816f56+81005c3 |
 | 2026-04-17 | 空壳 Hand 清理: Whiteboard/Slideshow/Speech 删除，净减 ~5400 行 | Phase 5 清理 |
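
To make the concurrency levels in the table above concrete, here is a minimal sketch of partitioning a batch of tool calls and returning the results in call order. It uses `std::thread::scope` in place of the `JoinSet` + `Semaphore(3)` combination named in the changelog, so it does not bound parallelism, and it runs the serial group after the parallel one, which is an assumption. `ToolCall`, `partition`, and `run_all` are hypothetical names.

```rust
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ToolConcurrency {
    #[default]
    ReadOnly, // safe to run in parallel
    Exclusive,   // has side effects, must run serially
    Interactive, // needs the user, never parallel
}

/// Hypothetical call record for illustration only.
struct ToolCall {
    name: &'static str,
    concurrency: ToolConcurrency,
}

/// Splits call indices into (parallel, serial), preserving call order.
fn partition(calls: &[ToolCall]) -> (Vec<usize>, Vec<usize>) {
    let (mut parallel, mut serial) = (Vec::new(), Vec::new());
    for (i, call) in calls.iter().enumerate() {
        match call.concurrency {
            ToolConcurrency::ReadOnly => parallel.push(i),
            ToolConcurrency::Exclusive | ToolConcurrency::Interactive => serial.push(i),
        }
    }
    (parallel, serial)
}

/// Runs ReadOnly calls on scoped threads, the rest one by one, then
/// sorts the outputs back into the original call order.
fn run_all<F>(calls: &[ToolCall], exec: F) -> Vec<String>
where
    F: Fn(&ToolCall) -> String + Sync,
{
    let (parallel, serial) = partition(calls);
    let mut results: Vec<(usize, String)> = thread::scope(|s| {
        let exec = &exec;
        let handles: Vec<_> = parallel
            .iter()
            .map(|&i| s.spawn(move || (i, exec(&calls[i]))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    for i in serial {
        results.push((i, exec(&calls[i])));
    }
    results.sort_by_key(|(i, _)| *i);
    results.into_iter().map(|(_, out)| out).collect()
}
```

Deriving `Default` with `ReadOnly` as the default variant mirrors the documented default of `concurrency()`, which is also why a tool with side effects that forgets to override it would end up in the parallel group.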
--- a/wiki/index.md
+++ b/wiki/index.md
@@ -1,6 +1,6 @@
 ---
 title: ZCLAW 项目知识库
-updated: 2026-04-22
+updated: 2026-04-24
 status: active
 ---

@@ -8,29 +8,29 @@ status: active

 > 面向中文用户的 AI Agent 桌面客户端。管家模式 + 多模型 + 7 自主能力 + 75 技能。
 > **使用方式**: 找到你要处理的模块，读对应页面，直接开始工作。
-> **数据来源**: 2026-04-22 代码全量扫描验证，非文档推测。
+> **数据来源**: 2026-04-23 代码全量扫描验证，非文档推测。

 ## 项目画像

 | 维度 | 值 |
 |------|-----|
 | 定位 | AI Agent 桌面客户端 (Tauri 2.x) |
-| 技术栈 | Rust 10 crates + src-tauri (~102K行, 357 .rs) + React 19 + TypeScript + PostgreSQL |
+| 技术栈 | Rust 10 crates + src-tauri (~148K行, 384 .rs) + React 19 + TypeScript + PostgreSQL |
 | 阶段 | 发布前稳定化，功能冻结中 |

-## 关键数字（2026-04-22 代码验证）
+## 关键数字（2026-04-23 代码验证）

 | 指标 | 值 |
 |------|-----|
 | Rust Crates | 10 + src-tauri |
-| Rust 代码 | 101,967 行 (357 .rs文件) |
-| Rust 测试 | 987 定义 / 797 通过 |
-| Tauri 命令 | 190 定义 / 97 @reserved / 104 invoke |
+| Rust 代码 | 148,185 行 (384 .rs文件) |
+| Rust 测试 | 997 定义 (619 #[test] + 378 #[tokio::test]) |
+| Tauri 命令 | 193 定义 / 104 invoke |
 | SaaS API | 137 .route() / 16 模块 / 38 SQL 迁移 / 42 表 |
 | 中间件 | 14 层 runtime + 10 层 SaaS HTTP |
 | SKILL / HAND | 75 技能目录 / 7 注册 Hand (6 TOML + _reminder) |
 | Pipeline | 18 YAML 模板 (8 目录) |
-| 前端 | 25 Store / 102 组件 / 75 lib / 17 Admin 页面 |
+| 前端 | 25 Store / 103 组件 / 78 lib / 17 Admin 页面 |
 | Intelligence | 16 .rs 文件 |
 | 质量指标 | 0 cargo warnings / 2 TODO/FIXME / 0 dead_code |

@@ -38,13 +38,13 @@ status: active

 | 类别 | 功能 | 入口 | Wiki |
 |------|------|------|------|
-| 对话 | 发消息、流式响应、多模型切换 | 聊天面板 | [[chat]] |
-| 分身 | 创建/切换/配置 Agent | 侧边栏 Agent 列表 | [[chat]] |
+| 对话 | 发消息、流式响应、多模型切换、LLM 动态建议 | 聊天面板 | [[chat]] |
+| 分身 | 创建/切换/配置 Agent、跨会话身份记忆 (soul.md) | 侧边栏 Agent 列表 | [[chat]] |
 | 自主 | 触发 Browser/Collector/Twitter 等 | 自动化面板 | [[hands-skills]] |
-| 记忆 | 搜索历史、自动注入上下文 | 设置 > 语义记忆 | [[memory]] |
+| 记忆 | 搜索历史、自动注入上下文、身份信号提取 | 设置 > 语义记忆 | [[memory]] |
 | 配置 | 模型/API/工作区/安全存储 | 设置面板 (19 页) | [[development]] |
 | SaaS | 登录注册、订阅计费、Admin 管理 | SaaS 平台 / Admin 后台 | [[saas]] |
-| 管家 | 痛点积累、行业配置、简洁/专业模式 | 聊天面板 (默认模式) | [[butler]] |
+| 管家 | 痛点积累、行业配置、简洁/专业模式、跨会话身份、动态建议 | 聊天面板 (默认模式) | [[butler]] |
 | Pipeline | YAML 模板选择、配置、DAG 执行 | 工作流面板 | [[pipeline]] |
 | 安全 | JWT 认证、TOTP 2FA、操作审计 | 设置 > 安全存储 | [[security]] |
 | 数据 | PostgreSQL (42表) + SQLite/FTS5 (本地记忆) | — | [[data-model]] |
@@ -97,5 +97,7 @@ ZCLAW
 | Agent 创建失败 | [[chat]] | [[saas]] | 权限或持久化问题 |
 | Pipeline 执行卡住 | [[pipeline]] | [[middleware]] | DAG 循环 / 依赖缺失 |
 | Admin 页面 403 | [[saas]] | [[security]] | JWT 过期 / admin_guard 拦截 |
+| Agent 名字不记住 | [[butler]] | [[memory]] | soul.md 写入失败 / identity signal 未提取 |
+| 建议不个性化 | [[chat]] | [[butler]] | 4路上下文超时 / ExperienceExtractor 未初始化 |

 > 数字真相源: `docs/TRUTH.md` — 如有冲突以代码实际为准
--- a/wiki/log.md
+++ b/wiki/log.md
@@ -1,6 +1,6 @@
 ---
 title: 变更日志
-updated: 2026-04-22
+updated: 2026-04-24
 status: active
 tags: [log, history]
 ---
@@ -9,10 +9,55 @@ tags: [log, history]

 > Append-only 操作记录。格式: `## [日期] 类型 | 描述`

+## [2026-04-24] fix(runtime+middleware) | 工具调用 P1/P2/P3 全面修复
+- **P1 流式工具并行**: 三阶段执行 (中间件预检→并行+串行分区→结果排序)，ReadOnly 工具 JoinSet+Semaphore(3)
+- **P2 OpenAI 驱动**: 参数解析失败不再静默替换为 `{}`，改为返回 `_parse_error`+`_raw_args` 让 LLM 自我修正
+- **P2 ToolOutputGuard**: 从关键词匹配改为 regex 精确匹配实际密钥值 (sk-xxx/AKIA/PEM 等)，消除误拦
+- **P2 ToolErrorMiddleware**: 失败计数器从全局 AtomicU32 改为 per-session HashMap，消除跨会话误触发
+- **P3 Gateway client**: 明确 tool_call/tool_result 的 onTool 回调语义约定 (output='' 为 start, input='' 为 end)
+- **测试**: 91 tests PASS, tsc --noEmit PASS
+
+## [2026-04-24] fix(runtime) | 工具调用两个 P0 修复
+- **P0: after_tool_call 中间件从未调用**: 流式+非流式模式均添加 `middleware_chain.run_after_tool_call()` 调用，ToolErrorMiddleware 和 ToolOutputGuardMiddleware 的 after 逻辑现在生效
+- **P0: stream_errored 跳过所有工具**: 流式模式中 `stream_errored` 不再 `break 'outer`，改为区分完整工具（ToolUseEnd 已接收）和不完整工具；完整工具照常执行，不完整工具发送取消 ToolEnd 事件
+- **影响文件**: `loop_runner.rs`
+- **测试**: 91 tests PASS, 0 cargo warnings
+
+## [2026-04-24] feat(artifact) | 产物系统优化完善
+- **MarkdownRenderer**: 从 StreamingText 提取共享 Markdown 渲染组件（react-markdown + remark-gfm），ArtifactPanel 复用
+- **ArtifactPanel**: 替换手写 30 行 MarkdownPreview → 完整 GFM 渲染（表格/代码块/列表/引用）；添加文件选择器下拉菜单
+- **数据源扩展**: 产物创建从 file_write 单工具 → file_write/str_replace/write_file/str_replace_editor；从 sendMessage 单路径 → sendMessage + initStreamListener 双路径
+- **持久化**: artifactStore 添加 zustand persist + IndexedDB (复用 idb-storage)，刷新后产物保留
+- **验证**: tsc --noEmit PASS, 343 vitest PASS
+
+## [2026-04-24] perf | Hermes 高价值设计实施 Phase 1-4
+- **Phase 1**: Anthropic prompt caching — cache_control ephemeral + cache token tracking (CompletionResponse + StreamChunk)
+- **Phase 2A**: 并行工具执行 — ToolConcurrency 枚举 (ReadOnly/Exclusive/Interactive) + JoinSet + Semaphore(3) + AtomicU32
+- **Phase 2B**: 工具输出修剪 — prune_tool_outputs() (2000→500 chars) + 集成到 CompactionMiddleware
+- **Phase 3**: 错误分类+智能重试 — LlmErrorKind + ClassifiedLlmError + RetryDriver (jittered backoff) + CONTEXT_OVERFLOW recovery
+- **Phase 4**: 异步压缩+迭代摘要 — 30s 防抖 + cached fallback + previous_summary 迭代累积
+- **新增文件**: error_classifier.rs, retry_driver.rs
+- **验证**: 997 workspace tests PASS
+
+## [2026-04-23] perf | 回复效率+建议生成并行化优化 (三部分)
+- **perf(src-tauri)**: identity prompt 缓存 (`LazyLock<RwLock<HashMap>>`) + `pre_conversation_hook` 并行化 (`tokio::join!`)
+- **perf(runtime)**: middleware `before_completion` 分波并行 — `parallel_safe()` trait + wave detection + `tokio::spawn`，5 层 safe 中间件可并行
+- **perf(desktop)**: suggestion context 预取 (sendMessage 时启动) + generateLLMSuggestions 与 memory extraction 解耦
+- **feat(desktop)**: suggestion prompt 重写 (1深入追问+1实用行动+1管家关怀) + 上下文窗口 6→20 条
+- **文件**: intelligence_hooks.rs, middleware.rs, 5 个 middleware 子模块, streamStore.ts, llm-service.ts
+- **验证**: cargo test --workspace --exclude zclaw-saas 0 fail, tsc --noEmit 0 error
+
 ## [2026-04-23] fix | Agent 命名检测重构+跨会话记忆修复+Agent tab 移除
 - **fix(desktop)**: `detectAgentNameSuggestion` 从 6 个固定正则改为 trigger+extract 两步法 (10 个 trigger)
 - **fix(desktop)**: 名字检测从 memory extraction 解耦 — 502 不再阻断面板刷新
 - **fix(src-tauri)**: `agent_update` 同步写入 soul.md — config.name → system prompt 断链修复
+
+## [2026-04-23] feat | 动态建议智能化
+- **feat(src-tauri)**: 新增 `experience_find_relevant` Tauri 命令 + `ExperienceBrief` 结构 + OnceLock 单例
+- **feat(desktop)**: 新增 `suggestion-context.ts` — 4 路并行拉取智能上下文（用户画像/痛点/经验/技能匹配）
+- **feat(desktop)**: `streamStore.ts` createCompleteHandler 并行化 + generateLLMSuggestions 增强
+- **feat(desktop)**: suggestion prompt 改为混合型（2 续问 + 1 管家关怀）
+- **文件**: experience.rs, lib.rs, suggestion-context.ts, streamStore.ts, llm-service.ts
 - **refactor(desktop)**: 移除 Agent tab (简洁模式/专业模式)，清理 dead code (~280 行)
 - **验证**: cargo check 0 error, tsc --noEmit 0 error
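
The Hermes Phase 2B entry above describes `prune_tool_outputs()` trimming tool outputs longer than 2000 characters down to 500. A minimal sketch of that idea follows. The truncation marker, the `keep_recent` parameter, and the function names are assumptions for illustration, since the log does not say which results count as old or how the cut is marked.

```rust
const MAX_CHARS: usize = 2000;
const KEEP_CHARS: usize = 500;
/// Hypothetical marker appended to a trimmed output.
const MARKER: &str = " [truncated]";

/// Trims one output to its first KEEP_CHARS characters if it is longer
/// than MAX_CHARS. Counts characters, not bytes, so multi-byte text
/// such as Chinese is never cut in the middle of a code point.
fn prune_output(output: &str) -> String {
    if output.chars().count() <= MAX_CHARS {
        return output.to_string();
    }
    let mut kept: String = output.chars().take(KEEP_CHARS).collect();
    kept.push_str(MARKER);
    kept
}

/// Prunes every output except the `keep_recent` most recent ones.
fn prune_old_outputs(outputs: &mut [String], keep_recent: usize) {
    let cutoff = outputs.len().saturating_sub(keep_recent);
    for output in &mut outputs[..cutoff] {
        let pruned = prune_output(output.as_str());
        *output = pruned;
    }
}
```

Running this before token estimation, as the log entry describes for CompactionMiddleware, means the estimate reflects the trimmed history rather than the original one.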

--- a/wiki/middleware.md
+++ b/wiki/middleware.md
@@ -1,6 +1,6 @@
 ---
 title: 中间件链
-updated: 2026-04-22
+updated: 2026-04-23
 status: active
 tags: [module, middleware, runtime]
 ---
@@ -17,6 +17,7 @@ tags: [module, middleware, runtime]
 - **WHY 注册顺序 != 执行顺序**: `kernel/mod.rs` 中 14 次 `chain.register()` 的代码顺序与运行时顺序无关，chain 按 `priority()` 升序排列后执行。
 - **WHY 6 类 14 层**: 进化(70-79) -> 路由(80-99) -> 上下文(100-199) -> 能力(200-399) -> 安全(400-599) -> 遥测(600-799)，优先级范围即执行阶段。
 - **WHY Stop/Block/AbortLoop**: 细粒度流控 -- Stop 中断 LLM 循环，Block 阻止单次工具调用，AbortLoop 终止整个 Agent 循环。命中后跳过所有后续中间件。
+- **WHY 分波并行 (parallel_safe)**: `before_completion` 阶段，只修改 `system_prompt` 的中间件可声明 `parallel_safe() == true`，连续的 parallel-safe 中间件通过 `tokio::spawn` 并行执行，各自持有 `MiddlewareContext` clone，完成后合并 prompt 贡献。降低串行延迟 ~1-3s。

 ## 2. 关键文件 + 数据流

@@ -34,8 +35,10 @@ tags: [module, middleware, runtime]
 ```
 用户消息 -> AgentLoop
  -> chain.run_before_completion(ctx)
-    -> [按 priority 升序] 每层 middleware.before_completion()
-      -> Continue: 下一层 | Stop(reason): 中断循环
+    -> [分波并行] 检测连续 parallel_safe 中间件
+      -> Wave 并行 (2+ safe): tokio::spawn 各自 ctx.clone() → 合并 prompt
+      -> 串行 (unsafe / 单个 safe): 逐个执行
+    -> Continue: 下一层 | Stop(reason): 中断循环
  -> LLM 调用
  -> (工具调用时) chain.run_before_tool_call()
    -> Allow | Block(msg) | ReplaceInput | AbortLoop
@@ -57,22 +60,22 @@ tags: [module, middleware, runtime]

 ### 14 层 Runtime 中间件

-| 优先级 | 中间件 | 文件 | 职责 | 注册条件 |
-|--------|--------|------|------|----------|
-| @78 | EvolutionMiddleware | `evolution.rs` | 推送进化候选项到 system prompt | 始终 |
-| @80 | ButlerRouter | `butler_router.rs` | 语义技能路由 + system prompt 增强 + XML fencing | 始终 |
-| @100 | Compaction | `compaction.rs` | 超阈值时压缩对话历史 | `compaction_threshold > 0` |
-| @150 | Memory | `memory.rs` | 对话后自动提取记忆 + 注入检索结果 | 始终 |
-| @180 | Title | `title.rs` | 自动生成会话标题 | 始终 |
-| @200 | SkillIndex | `skill_index.rs` | 注入技能索引到 system prompt | `!skill_index.is_empty()` |
-| @300 | DanglingTool | `dangling_tool.rs` | 修复缺失的工具调用结果 | 始终 |
-| @350 | ToolError | `tool_error.rs` | 格式化工具错误供 LLM 恢复 | 始终 |
-| @360 | ToolOutputGuard | `tool_output_guard.rs` | 工具输出安全检查 | 始终 |
-| @400 | Guardrail | `guardrail.rs` | shell_exec/file_write/web_fetch 安全规则 | 始终 |
-| @500 | LoopGuard | `loop_guard.rs` | 防止工具调用无限循环 | 始终 |
-| @550 | SubagentLimit | `subagent_limit.rs` | 限制并发子 agent | 始终 |
-| @650 | TrajectoryRecorder | `trajectory_recorder.rs` | 轨迹记录 + 压缩 | 始终 |
-| @700 | TokenCalibration | `token_calibration.rs` | Token 用量校准 | 始终 |
+| 优先级 | 中间件 | 文件 | 职责 | parallel_safe | 注册条件 |
+|--------|--------|------|------|---------------|----------|
+| @78 | EvolutionMiddleware | `evolution.rs` | 推送进化候选项到 system prompt | ✅ | 始终 |
+| @80 | ButlerRouter | `butler_router.rs` | 语义技能路由 + system prompt 增强 + XML fencing | ✅ | 始终 |
+| @100 | Compaction | `compaction.rs` | 超阈值时压缩对话历史 | ❌ | `compaction_threshold > 0` |
+| @150 | Memory | `memory.rs` | 对话后自动提取记忆 + 注入检索结果 | ✅ | 始终 |
+| @180 | Title | `title.rs` | 自动生成会话标题 | ✅ | 始终 |
+| @200 | SkillIndex | `skill_index.rs` | 注入技能索引到 system prompt | ✅ | `!skill_index.is_empty()` |
+| @300 | DanglingTool | `dangling_tool.rs` | 修复缺失的工具调用结果 | ❌ | 始终 |
+| @350 | ToolError | `tool_error.rs` | 格式化工具错误供 LLM 恢复 | ❌ | 始终 |
+| @360 | ToolOutputGuard | `tool_output_guard.rs` | 工具输出安全检查 | ❌ | 始终 |
+| @400 | Guardrail | `guardrail.rs` | shell_exec/file_write/web_fetch 安全规则 | ❌ | 始终 |
+| @500 | LoopGuard | `loop_guard.rs` | 防止工具调用无限循环 | ❌ | 始终 |
+| @550 | SubagentLimit | `subagent_limit.rs` | 限制并发子 agent | ❌ | 始终 |
+| @650 | TrajectoryRecorder | `trajectory_recorder.rs` | 轨迹记录 + 压缩 | ❌ | 始终 |
+| @700 | TokenCalibration | `token_calibration.rs` | Token 用量校准 | ❌ | 始终 |

 > 注册顺序 (代码) 与执行顺序 (priority) 不同。Chain 按 priority 升序排列后执行。

@@ -96,6 +99,8 @@ tags: [module, middleware, runtime]
 - Priority 升序: 0-999, 数值越小越先执行
 - 注册顺序 != 执行顺序; chain 按 priority 运行时排序
 - Stop/Block/AbortLoop 立即中断, 不执行后续中间件
+- parallel_safe 中间件只修改 system_prompt，不修改 messages，不返回 Stop
+- 分波合并: 并行 wave 中每个中间件 clone context，完成后按 base_prompt_len 截取增量合并

 ### 核心接口

@@ -103,6 +108,7 @@ tags: [module, middleware, runtime]
 trait AgentMiddleware: Send + Sync {
    fn name(&self) -> &str;
    fn priority(&self) -> i32 { 500 }
+    fn parallel_safe(&self) -> bool { false }
    async fn before_completion(&self, ctx: &mut MiddlewareContext) -> Result<MiddlewareDecision>;
    async fn before_tool_call(&self, ctx: &MiddlewareContext, tool_name: &str, tool_input: &Value) -> Result<ToolCallDecision>;
    async fn after_tool_call(&self, ctx: &mut MiddlewareContext, tool_name: &str, result: &Value) -> Result<()>;
@@ -129,8 +135,8 @@ trait AgentMiddleware: Send + Sync {

 | 日期 | 变更 | 影响 |
 |------|------|------|
+| 04-23 | 分波并行执行: parallel_safe() + wave detection + tokio::spawn | before_completion 阶段 5 层 safe 中间件可并行，延迟降低 ~1-3s |
 | 04-22 | DataMasking 中间件移除 | 14->14 层 (替换为无), 减少 1 层无收益处理 |
 | 04-22 | 跨会话记忆修复 | Memory 中间件去重+跨会话注入修复 |
 | 04-22 | Wiki 一致性校准 | 数字与代码验证对齐 |
 | 04-21 | Embedding 接通 | SkillIndex 路由 TF-IDF->Embedding+LLM fallback |
-| 04-15 | Heartbeat 统一健康系统 | TrajectoryRecorder 痛点感知增强 |
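
The wave detection described in this page can be sketched as a pure planning step: sort by priority, then group runs of two or more consecutive `parallel_safe` middlewares into a parallel wave and leave everything else serial. The `Wave` enum and `plan_waves` function are hypothetical and stand in for whatever `MiddlewareChain::run_before_completion` does internally. The actual execution (`tokio::spawn`, context cloning, prompt merging) is not modelled here.

```rust
#[derive(Debug, PartialEq)]
enum Wave {
    Parallel(Vec<&'static str>),
    Serial(&'static str),
}

/// Emits the pending run of parallel-safe middlewares as one wave.
/// A run of exactly one is executed serially, matching the
/// "2+ safe" rule in the data-flow diagram.
fn flush(run: &mut Vec<&'static str>, waves: &mut Vec<Wave>) {
    match run.len() {
        0 => {}
        1 => waves.push(Wave::Serial(run[0])),
        _ => waves.push(Wave::Parallel(run.clone())),
    }
    run.clear();
}

/// Input tuples are (priority, name, parallel_safe) in registration order.
fn plan_waves(mut middlewares: Vec<(i32, &'static str, bool)>) -> Vec<Wave> {
    // Execution order is ascending priority, not registration order.
    middlewares.sort_by_key(|m| m.0);
    let mut waves = Vec::new();
    let mut run = Vec::new();
    for (_, name, parallel_safe) in middlewares {
        if parallel_safe {
            run.push(name);
        } else {
            flush(&mut run, &mut waves);
            waves.push(Wave::Serial(name));
        }
    }
    flush(&mut run, &mut waves);
    waves
}
```

Fed the priorities and flags from the 14-layer table, this yields Evolution + ButlerRouter as one wave, Compaction alone, then Memory + Title + SkillIndex as a second wave, which agrees with the wave breakdown in the `perf(middleware)` commit message.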
Author	SHA1	Message	Date
iven	7b0d452845	fix(tool): Windows UNC path normalization for consistent PathValidator path comparison - with_workspace() canonicalizes workspace_root so it matches the canonical path format produced by resolve_and_validate - New normalize_windows_path() strips the \?\ prefix, fixing failed starts_with comparisons on Windows - check_blocked/check_allowed now compare normalized paths throughout	2026-04-24 17:02:24 +08:00
iven	855c89e8fb	fix(tool): relative-path file writes failed; PathValidator now resolves against the workspace first. When file_write received a relative path such as test_tool.txt, PathValidator's resolve_and_validate tried to canonicalize an empty parent directory and failed. Fix: resolve relative paths against workspace_root into absolute paths first, then run the security checks.	2026-04-24 16:02:09 +08:00
iven	3eb098f020	fix(runtime): comprehensive fixes for tool-call P1/P2/P3 issues. P1: parallel tool execution in streaming mode - Three-phase execution: Phase 1 middleware pre-checks (serial) → Phase 2 parallel + serial partitions → Phase 3 result ordering - ReadOnly tools run in parallel via JoinSet + Semaphore(3); Exclusive/Interactive tools run serially - Same execution strategy as non-streaming mode. P2: OpenAI driver tool-argument parsing - A parse failure no longer silently becomes {}; the driver returns _parse_error + _raw_args instead - Lets the LLM and the tool detect the argument problem and self-correct. P2: ToolOutputGuard exact matching - Switched from to_lowercase() keyword matching to regex matching of actual secret values - Detects sk-xxx (20+), AKIA (16), PEM private keys, and key=value patterns - Removed overly broad injection checks such as "system:" and "you are now" - Eliminates false positives on legitimate content containing words like "password". P2: ToolErrorMiddleware per-session counting - Switched from a global AtomicU32 to Mutex<HashMap<session_id, u32>> - Each session tracks its consecutive failures independently, so cross-session AbortLoop triggers no longer occur. P3: Gateway client onTool callback semantics - tool_call output is now always an empty string (the start signal) - Added comments documenting the start/end convention	2026-04-24 12:56:07 +08:00
iven	c12b64150b	fix(runtime): tool-call P0 fixes: after_tool_call wired in + tool rescue on stream_errored. P0-1: after_tool_call middleware was never invoked - Both streaming (run_streaming) and non-streaming (run) modes now call middleware_chain.run_after_tool_call() - ToolErrorMiddleware error counting and recovery logic now takes effect - ToolOutputGuardMiddleware sensitive-data detection now takes effect. P0-2: stream_errored skipped all tool execution - New completed_tool_ids tracks which tools have received a complete ToolUseEnd - On a stream error, complete and incomplete tool calls are handled separately - Complete tool calls run as usual (artifact creation and similar are unaffected) - Incomplete tool calls get a cancellation ToolEnd event (the frontend no longer sticks on "executing") - If stream_errored is set after tool execution, break outer prevents a pointless LLM loop. Reference docs: - docs/references/zclaw-toolcall-issues.md (10-item issue analysis) - docs/references/deerflow-toolcall-reference.md (complete DeerFlow tool-call reference)	2026-04-24 12:20:14 +08:00
iven	4c31471cd6	feat(artifact): artifact system improvements: shared rendering + more data sources + persistence - MarkdownRenderer: shared react-markdown + remark-gfm component extracted from StreamingText - ArtifactPanel: hand-written MarkdownPreview replaced with full GFM rendering, plus a file-picker dropdown - Data sources: two tools (file_write/str_replace) + two paths (sendMessage/initStreamListener) - Persistence: artifactStore gains zustand persist + IndexedDB (reusing idb-storage)	2026-04-24 10:59:27 +08:00
iven	b60b96225d	docs(wiki): wiki sync for Hermes Phase 1-4 - hands-skills: new invariant requiring a concurrency() declaration - log: appended the Hermes Phase 1-4 change record - index: updated the date	2026-04-24 08:54:48 +08:00
iven	06e93a21af	perf(compaction): Hermes Phase 4 — debounce + async cache + iterative summary Step 4.1: Compaction debounce - 30s cooldown between consecutive compactions - Minimum 3 rounds (6 messages) since last compaction before re-triggering - AtomicU64 lock-free state tracking Step 4.2: Async compaction with cached fallback - During cooldown, use cached result from previous compaction - RwLock<Option<Vec<Message>>> for thread-safe cache access - Cache updated after each successful compaction Step 4.3: Iterative summary - generate_summary/generate_llm_summary accept previous_summary parameter - LLM prompt includes previous summary for cumulative context preservation - Rule-based summary carries forward [上轮摘要保留] section - previous_summary extracted from leading System messages in message history	2026-04-24 08:53:37 +08:00
iven	9060935401	perf(runtime): Hermes Phase 1-3 — prompt caching + parallel tools + smart retry Phase 1: Anthropic prompt caching - Add cache_control ephemeral on system prompt blocks - Track cache_creation/cache_read tokens in CompletionResponse + StreamChunk Phase 2A: Parallel tool execution - Add ToolConcurrency enum (ReadOnly/Exclusive/Interactive) - JoinSet + Semaphore(3) for bounded parallel tool calls - 7 tools annotated with correct concurrency level - AtomicU32 for lock-free failure tracking in ToolErrorMiddleware Phase 2B: Tool output pruning - prune_tool_outputs() trims old ToolResult > 2000 chars to 500 chars - Integrated into CompactionMiddleware before token estimation Phase 3: Error classification + smart retry - LlmErrorKind + ClassifiedLlmError for structured error mapping - RetryDriver decorator with jittered exponential backoff - Kernel wraps all LLM calls with RetryDriver - CONTEXT_OVERFLOW recovery triggers emergency compaction in loop_runner	2026-04-24 08:39:56 +08:00
iven	6d6673bf5b	fix(suggest): suggestions default to Chinese without mixed-in English words. Rule 7 changed from "use the same language as the user" to an explicit Chinese-first requirement, with English terms translated (for example workflow→工作流). The examples were updated to pure Chinese wording.	2026-04-24 00:01:22 +08:00
iven	15f84bf8c1	fix(suggest): suggestion chips drop forms of address to avoid role confusion when the user sends them. New suggestion-prompt rule: because users click a suggestion to send it directly, suggestions must not contain forms of address such as "领导/老板/老师" (leader/boss/teacher) and use subjectless phrasing instead. The examples and the care-template wording were updated to match.	2026-04-23 23:53:07 +08:00
iven	9a313e3c92	docs(wiki): wiki sync for the reply-efficiency and suggestion-parallelization work - middleware.md: wave-parallel execution design decision + parallel_safe annotations + invariants + execution flow - chat.md: suggestion prefetch + decoupling from memory + prompt rewrite - log.md: appended change records - CLAUDE.md: §13 architecture snapshot + recent changes	2026-04-23 23:45:28 +08:00
iven	ee5611a2f8	perf(middleware): wave-based parallel execution for before_completion - Add Clone derive to MiddlewareContext so the context can be cloned for parallel execution - Add a parallel_safe() default method (false) to the AgentMiddleware trait - MiddlewareChain::run_before_completion now executes in waves: runs of 2+ consecutive parallel_safe middlewares execute concurrently via tokio::spawn, each modifying system_prompt independently, with their contributions merged after completion - 5 middlewares that only modify system_prompt are marked parallel_safe: evolution(P78), butler_router(P80), memory(P150), title(P180), skill_index(P200) - Non-parallel_safe middlewares (compaction, dangling_tool, etc.) stay serial. Wave breakdown: Wave 1: evolution + butler_router → parallel (saves ~0.5-1s) Wave 2: compaction → serial (may modify messages) Wave 3: memory + title + skill_index → parallel (saves ~0.5-2s) Wave 4+: tool/security middlewares → serial	2026-04-23 23:37:57 +08:00
iven	5cf7adff69	perf(chat): reply efficiency + parallelized suggestion generation - identity prompt cache: LazyLock<RwLock<HashMap>> caches built identity prompts and is invalidated automatically when soul.md is updated, saving the per-request mutex + disk I/O (~0.5-1s) - pre-conversation hook parallelization: tokio::join! runs the identity build and the continuity context query in parallel instead of waiting serially (~1-2s) - suggestion context prefetch: fetchSuggestionContext starts early during the streaming reply, so the context is ready when the reply ends (~0.5-1s) - suggestion generation decoupled from memory extraction: generateLLMSuggestions no longer waits for the memory extraction LLM call to finish and starts independently (~3-8s) - Path B (agent stream) context completed: the lifecycle:end path uses the prefetched context, fixing the zero-personalization issue - context window expanded: slice(-6) → slice(-20), each message truncated to 200 characters - suggestion prompt rewritten: 1 deeper follow-up question + 1 practical action + 1 butler care item, with an explicit role definition and a ban on vague suggestions	2026-04-23 23:13:20 +08:00
iven	10497362bb	fix(chat): clarification question card UX improvements: remove dangling references + expand by default - The prompt adds an ask_clarification reference rule to keep the LLM from generating dangling reference phrases such as "以下信息" ("the following information") or "比如：" ("for example:") in its text - Add stripDanglingClarificationRef as a frontend safety net that automatically removes trailing dangling references when a message contains an ask_clarification tool call - The clarification card is expanded by default so users see the options without an extra click	2026-04-23 19:21:10 +08:00
iven	d7dbdf8600	docs(wiki): changelog entry for smarter dynamic suggestions	2026-04-23 18:01:44 +08:00
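iven	8c25b20fe2	feat(suggest): update the suggestion prompt to a hybrid type (2 follow-up questions + 1 butler care item) - llm-service.ts: HARDCODED_PROMPTS.suggestions.system changed to the hybrid type - 2 conversation follow-ups + 1 butler care item (pain-point follow-up / experience reuse / skill recommendation) - streamStore.ts: LLM_PROMPTS_SYSTEM now references the llm-service export - single source of truth, takes effect automatically on OTA updates	2026-04-23 17:58:58 +08:00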
iven	87110ffdff	feat(suggest): parallelize createCompleteHandler + enhance generateLLMSuggestions - createCompleteHandler: memory extraction and context fetch run in parallel via Promise.all - generateLLMSuggestions: new SuggestionContext parameter, builds an enhanced user message - llmSuggestViaSaaS: remove the artificial 2s delay (no longer needed after parallelization) - rename variable context→conversationContext to avoid a clash with SuggestionContext	2026-04-23 17:57:17 +08:00
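iven	980a8135fa	feat(suggest): add the fetchSuggestionContext aggregation function + type definitions - 4-way parallel fetch of smart context: user profile, pain points, experiences, skill matches - 500ms timeout protection + silent degradation (failures do not block suggestion generation) - return an empty context directly when Tauri is unavailable	2026-04-23 17:54:57 +08:00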
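iven	e9e7ffd609	feat(intelligence): add the experience_find_relevant Tauri command + ExperienceBrief - add the ExperienceBrief struct (pain-point pattern + solution summary + reuse count) - OnceLock singleton + init_experience_extractor() startup initialization - experience_find_relevant command: retrieve relevant experiences by agent_id + query - registered in invoke_handler + gracefully degrading initialization in the setup phase - add serialization tests (10 tests PASS)	2026-04-23 17:52:33 +08:00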