fix(openai): resolve DashScope/Bailian tool calling 400 errors
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- Detect providers that don't support streaming with tools (DashScope, aliyuncs, bigmodel.cn)
- Add stream_from_complete() to use non-streaming mode when tools are present
- Fix convert_response() to prioritize tool_calls over empty content
- Fix ToolUse message JSON serialization (Null -> "{}")
- Skip invalid tool calls with empty names in streaming
Root cause: DashScope Coding Plan API doesn't support stream=true with tools,
causing tool parameters to be lost or malformed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -1058,6 +1058,153 @@ cargo fix --lib -p zclaw-protocols --allow-dirty
**Note**: `dead_code` warnings (unused fields and methods) do not break compilation and can be kept for future use.

### 9.5 Alibaba Cloud Bailian Coding Plan: 400 errors on tool calls

**Symptoms**:
- Ordinary chat works, but any request that needs to call a skill/tool returns a 400 error
- The API returns `function.arguments must be in JSON format`
- Or the response is empty even though `output_tokens` is reported

**Root cause**: several problems stacked on top of each other

1. **Streaming mode does not support tool calls**: a limitation of the Alibaba Cloud Bailian (DashScope) Coding Plan API:
   > "tools cannot currently be used together with stream=True"
   - When `stream: true` and `tools` are both set, the API misbehaves
   - Tool call arguments are not transmitted correctly

2. **Wrong priority when parsing the response**: `convert_response` handled the `content` field first, even when it was an empty string
   - When the API returns `content: Some("")` together with `tool_calls: [...]`
   - the code wrongly picked the empty content, so the response came back empty

3. **ToolUse message JSON serialization error**: when `input` is `Null`
   - `serde_json::to_string(input)` produces the string `"null"`
   - The API requires `"{}"` (an empty object)

**Problem analysis**:

The full tool-calling flow:
```
user message → LLM decides to call a tool → returns tool_calls → tool runs → result is returned → LLM generates the final response
```

With the Bailian API, because streaming and tools are incompatible:
```
stream=true + tools → API misbehaves → tool_calls arguments lost → empty tool names / repeated calls
```

**Fix**:

1. **Detect incompatible providers and use non-streaming mode** (`openai.rs:stream`):

```rust
fn stream(&self, request: CompletionRequest) -> Pin<Box<dyn Stream<Item = Result<StreamChunk>> + Send + '_>> {
    let has_tools = !request.tools.is_empty();
    let needs_non_streaming = self.base_url.contains("dashscope") ||
        self.base_url.contains("aliyuncs") ||
        self.base_url.contains("bigmodel.cn");

    if has_tools && needs_non_streaming {
        eprintln!("[OpenAiDriver:stream] Provider detected that may not support streaming with tools, using non-streaming mode");
        return self.stream_from_complete(request); // use non-streaming mode
    }
    // ... normal streaming logic
}
```

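
The detection rule in step 1 can be pulled out into a small pure function so it can be unit-tested without a driver instance. The following is a minimal sketch under stated assumptions: the names `NON_STREAMING_TOOL_HOSTS` and `should_fall_back_to_non_streaming` are hypothetical, and the case-insensitive match is an addition that the original `contains` checks do not perform.

```rust
/// Hypothetical list of base-URL markers for providers that are assumed to
/// mishandle `stream=true` combined with tools (taken from the fix above).
const NON_STREAMING_TOOL_HOSTS: &[&str] = &["dashscope", "aliyuncs", "bigmodel.cn"];

/// Returns true only when the request carries tools AND the base URL
/// contains one of the known markers.
fn should_fall_back_to_non_streaming(base_url: &str, has_tools: bool) -> bool {
    // Assumption: lowercase first so that mixed-case hosts also match.
    let url = base_url.to_ascii_lowercase();
    has_tools
        && NON_STREAMING_TOOL_HOSTS
            .iter()
            .any(|marker| url.contains(*marker))
}
```

Keeping the markers in one list means that adding another provider is a one-line change, and requests without tools still take the normal streaming path.
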
2. **Implement `stream_from_complete`**: call the non-streaming API, then emulate streaming output

```rust
fn stream_from_complete(&self, request: CompletionRequest) -> Pin<Box<dyn Stream<Item = Result<StreamChunk>> + Send + '_>> {
    let mut complete_request = self.build_api_request(&request);
    complete_request.stream = false; // force non-streaming

    Box::pin(stream! {
        // 1. Send the non-streaming request
        let response = client.execute(request).await?;

        // 2. Parse the response
        let api_response: OpenAiResponse = response.json().await?;

        // 3. Convert to streaming events
        for tool_call in tool_calls {
            yield Ok(StreamChunk::ToolUseStart { id, name });
            yield Ok(StreamChunk::ToolUseDelta { id, delta });
            yield Ok(StreamChunk::ToolUseEnd { id, input });
        }

        // 4. Text content
        yield Ok(StreamChunk::TextDelta { delta: content });

        // 5. Done
        yield Ok(StreamChunk::Complete { ... });
    })
}
```

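
The "convert to streaming events" step can be illustrated without any async machinery. The sketch below uses simplified stand-in types (`CompleteResponse`, `ToolCall`, and a reduced `StreamChunk` whose `Complete` variant has no fields); they are assumptions for illustration, not the driver's real definitions. Skipping tool calls with empty names mirrors the commit note about streaming and is assumed to apply here as well.

```rust
/// Simplified stand-in for the driver's stream events.
#[derive(Debug, PartialEq)]
enum StreamChunk {
    ToolUseStart { id: String, name: String },
    ToolUseDelta { id: String, delta: String },
    ToolUseEnd { id: String, input: String },
    TextDelta { delta: String },
    Complete,
}

/// Hypothetical shape of one tool call in a non-streaming response.
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

/// Hypothetical shape of a fully received (non-streaming) response.
struct CompleteResponse {
    content: Option<String>,
    tool_calls: Vec<ToolCall>,
}

/// Replays a complete response as the event sequence a stream consumer expects:
/// tool calls first, then non-empty text, then a final `Complete`.
fn replay_as_chunks(response: &CompleteResponse) -> Vec<StreamChunk> {
    let mut chunks = Vec::new();
    for call in &response.tool_calls {
        // Assumption: a tool call without a name is invalid and is skipped.
        if call.name.is_empty() {
            continue;
        }
        // Empty arguments are sent as an empty JSON object.
        let args = if call.arguments.trim().is_empty() {
            "{}".to_string()
        } else {
            call.arguments.clone()
        };
        chunks.push(StreamChunk::ToolUseStart { id: call.id.clone(), name: call.name.clone() });
        chunks.push(StreamChunk::ToolUseDelta { id: call.id.clone(), delta: args.clone() });
        chunks.push(StreamChunk::ToolUseEnd { id: call.id.clone(), input: args });
    }
    if let Some(text) = response.content.as_deref().filter(|t| !t.is_empty()) {
        chunks.push(StreamChunk::TextDelta { delta: text.to_string() });
    }
    chunks.push(StreamChunk::Complete);
    chunks
}
```

Because the whole response is already in memory, each tool call is emitted as exactly one start/delta/end triple, so downstream code that accumulates deltas sees the complete argument string in one piece.
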
3. **Fix the response parsing priority** (`convert_response`):

```rust
let (content, stop_reason) = match choice {
    Some(c) => {
        let has_tool_calls = c.message.tool_calls.as_ref().map(|tc| !tc.is_empty()).unwrap_or(false);
        let has_content = c.message.content.as_ref().map(|t| !t.is_empty()).unwrap_or(false);

        let blocks = if has_tool_calls {
            // ✅ Tool calls take priority over empty content
            tool_calls.iter().map(|tc| ContentBlock::ToolUse {
                id: tc.id.clone(),
                name: tc.function.name.clone(),
                input: serde_json::from_str(&tc.function.arguments).unwrap_or(Value::Null),
            }).collect()
        } else if has_content {
            // Non-empty text content
            vec![ContentBlock::Text { text: c.message.content.as_ref().unwrap().clone() }]
        } else {
            vec![ContentBlock::Text { text: String::new() }]
        };
        // ...
    }
};
```

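
The priority rule can be checked in isolation with a std-only model of the message. This is a sketch under stated assumptions: `Message`, `ToolCall`, the reduced `ContentBlock`, and `to_blocks` are hypothetical stand-ins, and arguments are kept as raw strings instead of parsed JSON.

```rust
/// Reduced stand-in for the driver's content blocks.
#[derive(Debug, PartialEq)]
enum ContentBlock {
    Text { text: String },
    ToolUse { id: String, name: String, arguments: String },
}

/// Hypothetical tool call as returned by the API.
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

/// Hypothetical assistant message: both fields may be present at once.
struct Message {
    content: Option<String>,
    tool_calls: Option<Vec<ToolCall>>,
}

/// Tool calls win whenever at least one is present; otherwise fall back
/// to the text content (which may be empty).
fn to_blocks(message: &Message) -> Vec<ContentBlock> {
    let tool_calls = message.tool_calls.as_deref().unwrap_or(&[]);
    if !tool_calls.is_empty() {
        return tool_calls
            .iter()
            .map(|tc| ContentBlock::ToolUse {
                id: tc.id.clone(),
                name: tc.name.clone(),
                arguments: tc.arguments.clone(),
            })
            .collect();
    }
    let text = message.content.clone().unwrap_or_default();
    vec![ContentBlock::Text { text }]
}
```

The case that triggered the bug is `content: Some("")` together with a non-empty `tool_calls`: with this ordering the empty string never shadows the tool call.
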
4. **Fix JSON serialization of ToolUse messages**:

```rust
zclaw_types::Message::ToolUse { id, tool, input } => {
    let args = if input.is_null() {
        "{}".to_string() // ✅ convert Null to an empty object
    } else {
        serde_json::to_string(input).unwrap_or_else(|_| "{}".to_string())
    };
    // ...
}
```

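
The same guard can also be applied to the already-serialized string, which may be useful as a last check before the request is sent. This is a minimal sketch, not part of the commit: `normalize_tool_arguments` is a hypothetical helper, and it only inspects the text, so it does not validate that the remaining input is well-formed JSON.

```rust
/// Maps the two problematic serialized forms (`null` and the empty string)
/// to an empty JSON object; any other input is passed through trimmed.
fn normalize_tool_arguments(serialized: &str) -> String {
    match serialized.trim() {
        // `serde_json::to_string(&Value::Null)` yields the text `null`,
        // which the API rejects as tool arguments.
        "" | "null" => "{}".to_string(),
        other => other.to_string(),
    }
}
```

For example, `normalize_tool_arguments("null")` returns `{}`, while a real argument object such as `{"skill_id":"finance-tracker"}` is returned unchanged.
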
**Affected files**:
- `crates/zclaw-runtime/src/driver/openai.rs` - OpenAI-compatible driver

**Providers with known compatibility problems**:

| Provider | Base URL marker | Problem |
|----------|-----------------|---------|
| Alibaba Cloud Bailian | `dashscope.aliyuncs.com` | Streaming + tools incompatible |
| Alibaba Cloud Bailian Coding Plan | `coding.dashscope.aliyuncs.com` | Streaming + tools incompatible |
| Zhipu GLM | `bigmodel.cn` | May have the same problem |

**Verifying the fix**:
1. Configure the Bailian Coding Plan API (`https://coding.dashscope.aliyuncs.com/v1`)
2. Send a message that requires a skill call (e.g. "查询腾讯财报", i.e. "look up Tencent's financial report")
3. The log should show: `[OpenAiDriver:stream] Provider detected that may not support streaming with tools`
4. The tool should run correctly with complete arguments

**Sample debug log**:
```
[OpenAiDriver:stream] base_url=https://coding.dashscope.aliyuncs.com/v1, has_tools=true, needs_non_streaming=true
[OpenAiDriver:stream] Provider detected that may not support streaming with tools, using non-streaming mode
[OpenAiDriver] Non-streaming response received, tool_calls=1
[AgentLoop] ToolUseEnd: id=call_xxx, input={"skill_id":"finance-tracker","input":{...}}
```

---

## 10. Skill system problems
@@ -1163,6 +1310,90 @@ triggers:
2. Send "查询腾讯财报" ("look up Tencent's financial report")
3. The Agent should call the `execute_skill` tool with `skill_id: "finance-tracker"`

### 10.2 `skills_dir: None` disables the skill system entirely

**Symptoms**:
- The Agent cannot call any skill and always replies with plain text
- `skills.list()` returns an empty list
- The system prompt contains no skill information

**Root cause**: `skills_dir` is hard-coded to `None` in `KernelConfig::from_provider()`

**Problem code** (`crates/zclaw-kernel/src/config.rs:337`):
```rust
// ❌ Wrong - hard-coded to None in from_provider()
pub fn from_provider(
    provider: &str,
    api_key: &str,
    model: &str,
    base_url: Option<&str>,
    api_protocol: &str,
) -> Self {
    let llm = match provider {
        // ... provider matching logic
    };

    Self {
        database_url: default_database_url(),
        llm,
        skills_dir: None, // ← hard-coded, so skills are never loaded
    }
}
```

**Impact analysis**:

When Tauri initializes the Kernel, it builds the config with `from_provider()`:
```
kernel_init → KernelConfig::from_provider() → skills_dir: None
    → Kernel::boot() → skills_dir missing, scan skipped
    → skills.list() returns an empty list
    → no skill information in the system prompt
    → the LLM does not know the execute_skill tool is available
```

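
The chain above comes down to one property: a scan that receives `None` silently yields nothing. The std-only sketch below illustrates that behavior. `list_skill_names` and the one-subdirectory-per-skill layout are assumptions for illustration and do not reproduce the actual `Kernel::boot()` logic.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical scan: returns the names of the subdirectories of `skills_dir`.
/// A missing or unreadable directory produces an empty list, not an error,
/// which is why the misconfiguration went unnoticed.
fn list_skill_names(skills_dir: Option<&Path>) -> Vec<String> {
    let dir = match skills_dir {
        Some(dir) => dir,
        // `skills_dir: None` ends the scan before it starts.
        None => return Vec::new(),
    };
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => return Vec::new(),
    };
    let mut names: Vec<String> = entries
        .filter_map(|entry| entry.ok())
        .filter(|entry| entry.path().is_dir())
        .filter_map(|entry| entry.file_name().into_string().ok())
        .collect();
    // Sort so that the result is stable across platforms.
    names.sort();
    names
}
```

Logging a warning in the `None` branch, rather than returning quietly, would make this class of bug visible at startup.
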
**Fix**:
```rust
// ✅ Correct - use the default skills directory
Self {
    database_url: default_database_url(),
    llm,
    skills_dir: default_skills_dir(), // use the ./skills directory
}
```

**Fix code** (`config.rs:161-165`):
```rust
fn default_skills_dir() -> Option<std::path::PathBuf> {
    std::env::current_dir()
        .ok()
        .map(|cwd| cwd.join("skills"))
}
```

**Related files**:
- `crates/zclaw-kernel/src/config.rs:337` - location of the fix
- `crates/zclaw-kernel/src/kernel.rs:79-83` - skills directory scan logic

**Verifying the fix**:
1. Start the app and watch the terminal log
2. You should see `[Kernel] Scanning skills directory: ./skills`
3. Send "查询腾讯财报" ("look up Tencent's financial report")
4. The Agent should call `execute_skill("finance-tracker", {...})`

**Known limitation**:
`default_skills_dir()` depends on `current_dir()`, so it can fail when the app is started from a different working directory. A more reliable approach is to use the executable's directory:

```rust
// Suggested improvement
use std::path::PathBuf;

fn default_skills_dir() -> Option<PathBuf> {
    std::env::current_exe()
        .ok()
        .and_then(|exe| exe.parent().map(|p| p.join("skills")))
        .or_else(|| std::env::current_dir().ok().map(|cwd| cwd.join("skills")))
}
```

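
One way to make the fallback testable is to separate the path selection from the environment lookups. The sketch below is an assumption-laden variant, not the project's code: `pick_skills_dir` is a hypothetical helper, and the preference for a candidate that already exists on disk is an extra rule that the suggestion above does not include.

```rust
use std::path::{Path, PathBuf};

/// Chooses a skills directory from two optional anchors.
/// Order: `<exe dir>/skills` if it exists, then `<cwd>/skills` if it exists,
/// otherwise the first candidate that could be formed at all.
fn pick_skills_dir(exe: Option<&Path>, cwd: Option<&Path>) -> Option<PathBuf> {
    let beside_exe = exe.and_then(Path::parent).map(|dir| dir.join("skills"));
    let in_cwd = cwd.map(|dir| dir.join("skills"));
    match (beside_exe, in_cwd) {
        // Prefer the executable's directory when the folder is really there.
        (Some(path), _) if path.is_dir() => Some(path),
        // Otherwise accept the working directory if it has the folder.
        (_, Some(path)) if path.is_dir() => Some(path),
        // Nothing exists yet: keep the original preference order.
        (first, second) => first.or(second),
    }
}

/// Thin wrapper that feeds the real process environment into the pure helper.
fn default_skills_dir() -> Option<PathBuf> {
    let exe = std::env::current_exe().ok();
    let cwd = std::env::current_dir().ok();
    pick_skills_dir(exe.as_deref(), cwd.as_deref())
}
```

With the selection logic isolated, a test can pass temporary directories for both anchors and assert which one wins, without changing the process working directory.
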
---

## 11. Related documents
@@ -1177,6 +1408,8 @@ triggers:
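
| Date | Change |
|------|--------|
| 2026-03-24 | Added section 9.5: Alibaba Cloud Bailian Coding Plan 400 errors on tool calls - streaming + tools incompatibility, response parsing priority, JSON serialization |
| 2026-03-24 | Added section 10.2: `skills_dir: None` disables the skill system entirely - hard-coded value in from_provider() |
| 2026-03-24 | Added section 10.1: Agent cannot call a suitable skill - skill list injected into the system prompt + triggers field |
| 2026-03-24 | Added section 9.4: self-evolution system startup errors - DateTime type mismatch and unused-import warnings |
| 2026-03-23 | Added section 9.3: old model still used after changing the model config - caused by Agent config taking precedence over Kernel config |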