Phase 1: Anthropic prompt caching - Add cache_control ephemeral on system prompt blocks - Track cache_creation/cache_read tokens in CompletionResponse + StreamChunk Phase 2A: Parallel tool execution - Add ToolConcurrency enum (ReadOnly/Exclusive/Interactive) - JoinSet + Semaphore(3) for bounded parallel tool calls - 7 tools annotated with correct concurrency level - AtomicU32 for lock-free failure tracking in ToolErrorMiddleware Phase 2B: Tool output pruning - prune_tool_outputs() trims old ToolResult > 2000 chars to 500 chars - Integrated into CompactionMiddleware before token estimation Phase 3: Error classification + smart retry - LlmErrorKind + ClassifiedLlmError for structured error mapping - RetryDriver decorator with jittered exponential backoff - Kernel wraps all LLM calls with RetryDriver - CONTEXT_OVERFLOW recovery triggers emergency compaction in loop_runner
对话链路: 4 缝测试 (Tauri→Kernel / Kernel→LLM / LLM→UI / 流式生命周期) Hands链路: 3 缝测试 (工具路由 / 执行回调 / 通用工具) 记忆链路: 3 缝测试 (FTS5存储 / 模式检索 / 去重) 冒烟测试: 3 Rust + 8 TypeScript 全量 PASS - Kernel::boot_with_driver() 测试辅助方法 - 全量 cargo test 0 回归 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>