fix(kernel): use the Kernel-configured model instead of the Agent's stale persisted value

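Problem: after switching models on the "Models & API" page, conversations still used the old model.

Root cause: the Agent config is restored from the database, and its model field took precedence over the Kernel config.

Fix:
- kernel.rs: send_message/send_message_stream always use the Kernel's current model
- openai.rs: add a User-Agent header to fix the Coding Plan API 405 error
- kernel_commands.rs: add detailed debug logging to trace how the config is passed through
- troubleshooting.md: document the investigation and the fix for this issue

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>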
2026-03-23 22:56:06 +08:00
parent 86e79b4ad1
commit ae4bf815e3
5 changed files with 415 additions and 40 deletions
--- a/docs/knowledge-base/troubleshooting.md
+++ b/docs/knowledge-base/troubleshooting.md
@@ -803,7 +803,204 @@ curl http://localhost:1420/api/agents

 ---

-## 9. Related Documents
+## 9. Kernel LLM Response Issues
+
+### 9.1 Chat shows "Thinking..." but never responds
+
+**Symptom**: after a message is sent, the UI shows the "Thinking..." state but never receives an AI response
+
+**Root cause**: the code in `loop_runner.rs` had two serious bugs:
+
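+1. **Hardcoded model ID**: it used the fixed `"claude-sonnet-4-20250514"` instead of the user-configured model
+2. **Response discarded**: it returned the hardcoded `"Response placeholder"` instead of the actual LLM response content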
+
+**Problematic code** (`crates/zclaw-runtime/src/loop_runner.rs`):
+```rust
+// ❌ Wrong: hardcoded model and response
+let request = CompletionRequest {
+    model: "claude-sonnet-4-20250514".to_string(), // hardcoded!
+    // ...
+};
+
+// ...
+
+Ok(AgentLoopResult {
+    response: "Response placeholder".to_string(), // real response discarded!
+    // ...
+})
+```
+
+**Fix**:
+
+1. **Add configuration fields to AgentLoop**:
+```rust
+pub struct AgentLoop {
+    // ... existing fields
+    model: String,
+    system_prompt: Option<String>,
+    max_tokens: u32,
+    temperature: f32,
+}
+
+impl AgentLoop {
+    pub fn with_model(mut self, model: impl Into<String>) -> Self {
+        self.model = model.into();
+        self
+    }
+    // ... other builder methods
+}
+```
+
+2. **Use the configured model**:
+```rust
+let request = CompletionRequest {
+    model: self.model.clone(), // use the configured model
+    // ...
+};
+```
+
+3. **Extract the actual response content**:
+```rust
+// Extract the text from CompletionResponse.content
+let response_text = response.content
+    .iter()
+    .filter_map(|block| match block {
+        ContentBlock::Text { text } => Some(text.clone()),
+        ContentBlock::Thinking { thinking } => Some(format!("[思考] {}", thinking)), // "[思考]" = "[Thinking]"
+        ContentBlock::ToolUse { name, input, .. } => {
+            // "[工具调用]" = "[Tool call]"
+            Some(format!("[工具调用] {}({})", name, serde_json::to_string(input).unwrap_or_default()))
+        }
+    })
+    .collect::<Vec<_>>()
+    .join("\n");
+
+Ok(AgentLoopResult {
+    response: response_text, // return the real response
+    // ...
+})
+```
+
+4. **Pass the model configuration in kernel.rs**:
+```rust
+pub async fn send_message(&self, agent_id: &AgentId, message: String) -> Result<MessageResponse> {
+    let agent_config = self.registry.get(agent_id)?;
+
+    // Decide which model to use: agent config first, then kernel config
+    // (this precedence was later reversed; see 9.3)
+    let model = if !agent_config.model.model.is_empty() {
+        &agent_config.model.model
+    } else {
+        &self.config.default_model
+    };
+
+    let loop_runner = AgentLoop::new(/* ... */)
+        .with_model(model)
+        .with_max_tokens(agent_config.max_tokens.unwrap_or(self.config.max_tokens))
+        .with_temperature(agent_config.temperature.unwrap_or(self.config.temperature));
+
+    // ...
+}
+```
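The four steps above can be condensed into one self-contained sketch. `AgentLoop`, `CompletionRequest`, and `ContentBlock` here are simplified, hypothetical stand-ins rather than the actual `zclaw-runtime` types: the LLM call itself is omitted, `ToolUse.input` is assumed to be a pre-serialized JSON `String` (instead of a `serde_json::Value`) so no external crate is needed, and the default `max_tokens`/`temperature` values are assumptions.

```rust
/// Hypothetical stand-in for the runtime's content block type.
#[derive(Debug, Clone)]
pub enum ContentBlock {
    Text { text: String },
    Thinking { thinking: String },
    // Assumption: `input` is already serialized JSON, not a serde_json::Value.
    ToolUse { name: String, input: String },
}

/// Hypothetical request type holding only the fields this fix touches.
#[derive(Debug)]
pub struct CompletionRequest {
    pub model: String,
    pub max_tokens: u32,
    pub temperature: f32,
}

pub struct AgentLoop {
    model: String,
    max_tokens: u32,
    temperature: f32,
}

impl AgentLoop {
    /// Assumed defaults; the real crate may use different values.
    pub fn new() -> Self {
        Self { model: String::new(), max_tokens: 4096, temperature: 0.7 }
    }

    /// Step 1: builder methods store the caller's configuration.
    pub fn with_model(mut self, model: impl Into<String>) -> Self {
        self.model = model.into();
        self
    }

    pub fn with_max_tokens(mut self, max_tokens: u32) -> Self {
        self.max_tokens = max_tokens;
        self
    }

    pub fn with_temperature(mut self, temperature: f32) -> Self {
        self.temperature = temperature;
        self
    }

    /// Step 2: build the request from the configured fields, never from a literal.
    pub fn build_request(&self) -> CompletionRequest {
        CompletionRequest {
            model: self.model.clone(),
            max_tokens: self.max_tokens,
            temperature: self.temperature,
        }
    }
}

/// Step 3: flatten the response blocks into the text returned to the UI.
/// The prefixes mirror the labels used in the original code.
pub fn extract_response_text(content: &[ContentBlock]) -> String {
    content
        .iter()
        .map(|block| match block {
            ContentBlock::Text { text } => text.clone(),
            ContentBlock::Thinking { thinking } => format!("[思考] {}", thinking),
            ContentBlock::ToolUse { name, input } => format!("[工具调用] {}({})", name, input),
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because `AgentLoop` only stores whatever model the caller passes to `with_model`, the decision of *which* model to use stays in `kernel.rs`, which is the decision revisited in 9.3.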
+
+**Affected files**:
+- `crates/zclaw-runtime/src/loop_runner.rs` - core fix
+- `crates/zclaw-kernel/src/kernel.rs` - passing the model configuration through
+
+**Verify the fix**:
+1. Configure a Coding Plan API (e.g. `https://coding.dashscope.aliyuncs.com/v1`)
+2. Send a message
+3. You should receive the actual LLM response instead of the placeholder
+
+**Note**: this issue affected every LLM provider, not only the Coding Plan API; any custom model configuration was ignored.
+
+### 9.2 Coding Plan API configuration flow
+
+**Supported Coding Plan endpoints**:
+
+| Provider | Provider ID | Base URL |
+|--------|-------------|----------|
+| Kimi Coding Plan | `kimi-coding` | `https://api.kimi.com/coding/v1` |
+| Bailian Coding Plan | `qwen-coding` | `https://coding.dashscope.aliyuncs.com/v1` |
+| Zhipu GLM Coding Plan | `zhipu-coding` | `https://open.bigmodel.cn/api/coding/paas/v4` |
+
+**Configuration flow**:
+
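+1. **Frontend** (`ModelsAPI.tsx`): the user selects a Provider and enters the API Key and Model ID
+2. **Storage** (`localStorage`): the entry is saved as a `CustomModel` object
+3. **On connect** (`connectionStore.ts`): the config is read back from localStorage
+4. **Handoff to the kernel** (`kernel-client.ts`): the config is passed via the `kernel_init` command
+5. **Kernel handling** (`kernel_commands.rs`): a driver is created based on the Provider and Base URL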
+
+**Key code path**:
+```
+UI config → localStorage → connectionStore.getDefaultModelConfig()
+    → kernelClient.setConfig() → invoke('kernel_init', { configRequest })
+    → KernelConfig → create_driver() → OpenAiDriver::with_base_url()
+```
+
+**Notes**:
+- Coding Plan endpoints use the OpenAI-compatible protocol (`api_protocol: "openai"`)
+- The Base URL must include the full path (e.g. `/v1`)
+- Unknown Providers take the fallback path, which uses `local_base_url` as a custom endpoint
+
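The provider table and the fallback rule in 9.2 can be sketched as a simple lookup. This is a hypothetical function, not the actual `create_driver()` in `kernel_commands.rs`: it only shows that the three documented Coding Plan provider IDs map to their base URLs and that any other provider falls back to `local_base_url`. Whether the real code lets a user-entered Base URL override the built-in one for known providers is not stated here, so the sketch ignores that case.

```rust
/// Known Coding Plan endpoints from the 9.2 table (all OpenAI-compatible).
pub fn coding_plan_base_url(provider_id: &str) -> Option<&'static str> {
    match provider_id {
        "kimi-coding" => Some("https://api.kimi.com/coding/v1"),
        "qwen-coding" => Some("https://coding.dashscope.aliyuncs.com/v1"),
        "zhipu-coding" => Some("https://open.bigmodel.cn/api/coding/paas/v4"),
        _ => None,
    }
}

/// Hypothetical resolution order:
/// known provider -> built-in URL; unknown provider -> `local_base_url` (if set).
pub fn resolve_base_url(provider_id: &str, local_base_url: Option<&str>) -> Option<String> {
    coding_plan_base_url(provider_id)
        .map(str::to_string)
        .or_else(|| local_base_url.map(str::to_string))
}
```

Returning `None` when neither source yields a URL leaves the caller free to report a configuration error instead of silently sending requests to a wrong endpoint.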
+### 9.3 Old model still used after changing the model configuration
+
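+**Symptom**: after switching models on the "Models & API" page, conversations still use the old model; the `model` field in the API request carries the old value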
+
+**Example log**:
+```
+[kernel_init] Final config: model=qwen3.5-plus, base_url=https://coding.dashscope.aliyuncs.com/v1
+[OpenAiDriver] Request body: {"model":"kimi-for-coding",...}  # old model!
+```
+
+**Root cause**: the Agent config is persisted in the database, and its `model` field takes precedence over the Kernel config
+
+**Problematic code** (`crates/zclaw-kernel/src/kernel.rs`):
+```rust
+// ❌ Wrong: the Agent's model takes precedence over the Kernel's model
+let model = if !agent_config.model.model.is_empty() {
+    agent_config.model.model.clone()  // stale persisted value
+} else {
+    self.config.model().to_string()
+};
+```
+
+**Analysis**:
+
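+1. The Agent config is saved to the SQLite database when the Agent is created
+2. On startup, the Kernel restores Agent configs from the database
+3. In `send_message`, the Agent's model setting takes precedence over the Kernel's current config
+4. The "Models & API" page changes the Kernel config, which does not touch the already-persisted Agent config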
+
+**Fix**:
+
+Give the Kernel's current config precedence so that the user's "Models & API" settings take effect:
+
+```rust
+// ✅ Correct: always use the Kernel's current model config
+let model = self.config.model().to_string();
+
+eprintln!("[Kernel] send_message: using model={} from kernel config", model);
+```
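To make the precedence change concrete, the sketch below contrasts the old and new resolution rules. `AgentConfig` and `KernelConfig` here are minimal, hypothetical stand-ins that hold only a model string; the real types carry many more fields and are accessed differently (e.g. `agent_config.model.model`, `self.config.model()`).

```rust
/// Hypothetical stand-in for the Agent config restored from SQLite.
pub struct AgentConfig {
    pub model: String, // may be a stale value saved when the Agent was created
}

/// Hypothetical stand-in for the Kernel config set via `kernel_init`.
pub struct KernelConfig {
    pub model: String, // reflects the current "Models & API" setting
}

/// Old rule (the bug): a non-empty persisted Agent model always wins.
pub fn resolve_model_old(agent: &AgentConfig, kernel: &KernelConfig) -> String {
    if !agent.model.is_empty() {
        agent.model.clone()
    } else {
        kernel.model.clone()
    }
}

/// New rule (the fix): the Kernel's current model is always used.
pub fn resolve_model_new(_agent: &AgentConfig, kernel: &KernelConfig) -> String {
    kernel.model.clone()
}
```

With the values from the example log, the old rule returns `kimi-for-coding` while the new rule returns `qwen3.5-plus`, matching both the symptom and the fix.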
+
+**Affected files**:
+- `crates/zclaw-kernel/src/kernel.rs` - the `send_message` and `send_message_stream` methods
+
+**Design decision**:
+
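+ZCLAW is designed so that users set one global model on the "Models & API" page rather than a separate model per Agent. Therefore:
+- The Kernel config should take precedence over the Agent config
+- The Agent config mainly stores personality, system_prompt, and similar settings
+- The model should be controlled by the global setting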
+
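+**Verify the fix**:
+1. Configure a new model on the "Models & API" page
+2. Send a message
+3. Check the terminal log; it should show `using model=<new model> from kernel config`
+4. Check the API request body; the `model` field should be the new model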
+
+---
+
+## 10. Related Documents

 - [OpenFang Configuration Guide](./openfang-configuration.md) - config file locations, format, and best practices
 - [Agent and LLM Provider Configuration](./agent-provider-config.md) - Agent management and Provider configuration
@@ -815,6 +1012,8 @@ curl http://localhost:1420/api/agents

 | Date | Change |
 |------|------|
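+| 2026-03-23 | Added section 9.3: old model still used after changing the model configuration, caused by the Agent config taking precedence over the Kernel config |
+| 2026-03-22 | Added kernel LLM response issues: hardcoded model and response in loop_runner.rs broke the Coding Plan API |
 | 2026-03-20 | Added port configuration issue: runtime-manifest.json declares 4200 but the service actually runs on 50051 |
 | 2026-03-18 | Added memory extraction and graph UI issues |
 | 2026-03-18 | Added conversation-lost-after-refresh issue and ChatArea layout issue |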