feat: add skill orchestration engine and workflow builder components
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
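refactor: consolidate Hands system constants into a single source file
refactor: update Hands Chinese names and descriptions
fix: skill market reloading when the connection state changes
fix: error handling logic for identity change proposals
docs: update verification status and implementation locations in several feature docs
docs: update Hands system documentation
test: add a test file that verifies the workspace path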
417
docs/features/04-skills-ecosystem/01-intelligent-routing.md
Normal file
@@ -0,0 +1,417 @@
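# Intelligent Skill Routing System

> **Design goal**: Let ZCLAW understand user intent and automatically select and invoke the right skill, instead of relying on hard-coded trigger words.

---

## 1. Problem Analysis

### 1.1 Problems with the Current Approach

```
User: "Look up Tencent's earnings report"
    ↓
Hard-coded trigger match: "earnings report" ∈ triggers?
    ↓
❌ If triggers does not contain "earnings report", the skill is never invoked
```

**Problems**:
1. **Cannot cover every phrasing** - users may say "financial data", "profitability", "revenue report"...
2. **High maintenance cost** - every skill needs its own trigger-word list
3. **No semantic understanding** - cannot recognize that "help me analyze how well this company makes money" is also financial analysis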
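
The limitation described above can be reproduced with a few lines. This is a minimal sketch, assuming the current matcher is a plain case-insensitive substring check; `matches_trigger` is a hypothetical name, not a function from the ZCLAW codebase.

```rust
/// Hypothetical stand-in for the current matcher: returns true when any
/// trigger word occurs in the query (case-insensitive substring match).
fn matches_trigger(query: &str, triggers: &[&str]) -> bool {
    let query = query.to_lowercase();
    triggers
        .iter()
        .any(|trigger| query.contains(trigger.to_lowercase().as_str()))
}
```

With triggers `["earnings report", "financial analysis"]`, the query "Look up Tencent's earnings report" matches, while "Help me analyze how well this company makes money" does not, even though both ask for financial analysis.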

### 1.2 Design Goal

```
User: "Help me analyze how much money Tencent made recently"
    ↓
Semantic understanding: intent = financial analysis, entity = Tencent, metric = profit
    ↓
Intelligent routing: best-matching skill = finance-tracker
    ↓
✅ Automatically call execute_skill("finance-tracker", {company: "Tencent", metrics: ["profit"]})
```

---

## 2. Intelligent Routing Architecture

### 2.1 Three-Layer Architecture

```
┌──────────────────────────────────────────┐
│             LLM Orchestrator             │
│  - Understand user intent                │
│  - Decide whether a skill is needed      │
│  - Select the best skill                 │
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│          Semantic Skill Router           │
│  - Vectorize skill descriptions          │
│  - Match queries to skills semantically  │
│  - Retrieve Top-K candidates             │
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│              Skill Registry              │
│  - Metadata for 77 skills                │
│  - Descriptions, capabilities, examples  │
│  - Vector index                          │
└──────────────────────────────────────────┘
```

### 2.2 Routing Flow

```
User message
             │
             ▼
┌───────────────────────────┐
│  1. Intent classification │ ──→ Is a skill needed?
│     (LLM judgment)        │      ├─ No  → chat directly
└───────────────────────────┘      └─ Yes ↓
             │
             ▼
┌───────────────────────────┐
│  2. Semantic retrieval    │ ──→ Top-3 candidate skills
│     (embedding)           │      (by description similarity)
└───────────────────────────┘
             │
             ▼
┌───────────────────────────┐
│  3. Fine-grained selection│ ──→ Best skill + parameters
│     (LLM decision)        │      (considers context, dependencies)
└───────────────────────────┘
             │
             ▼
┌───────────────────────────┐
│  4. Skill execution       │ ──→ Execution result
│     (execute_skill)       │
└───────────────────────────┘
             │
             ▼
Final response
```
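
The four steps in the flow above can be sketched as one function whose stages are passed in as closures. This is an illustration only: `Reply`, `handle_message`, and the closure signatures are assumptions made for the sketch, and in the real system each stage would be an async call into the LLM, the embedder, or `execute_skill`.

```rust
/// What the pipeline produced for one user message (hypothetical type).
#[derive(Debug, PartialEq)]
enum Reply {
    /// No skill was needed or none fit: answer with plain conversation.
    DirectChat,
    /// Output of the executed skill.
    SkillResult(String),
}

/// Runs the four routing steps in order, with each stage injected as a closure.
fn handle_message(
    message: &str,
    needs_skill: impl Fn(&str) -> bool,                 // step 1: intent classification
    retrieve: impl Fn(&str, usize) -> Vec<String>,      // step 2: semantic retrieval
    select: impl Fn(&str, &[String]) -> Option<String>, // step 3: fine-grained selection
    execute: impl Fn(&str, &str) -> String,             // step 4: skill execution
) -> Reply {
    if !needs_skill(message) {
        return Reply::DirectChat;
    }
    // Top-3 candidates, as in the diagram.
    let candidates = retrieve(message, 3);
    match select(message, &candidates) {
        Some(skill_id) => Reply::SkillResult(execute(&skill_id, message)),
        // The selector may still reject every candidate.
        None => Reply::DirectChat,
    }
}
```

Keeping the stages injectable makes it possible to test the control flow with stubs before any model is wired in.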

---

## 3. Core Component Design

### 3.1 Rich Skill Descriptions

**Problem**: Current skill descriptions are too thin

```yaml
# Current (not rich enough)
name: finance-tracker
description: "Finance tracking expert"
triggers: ["earnings report", "financial analysis"]
```

**Improvement**: Add semantically rich descriptions

```yaml
# Improved
name: finance-tracker
description: |
  Finance tracking expert - focused on corporate financial data analysis, earnings report interpretation, and profitability assessment.

  Core capabilities:
  - Financial statement analysis (balance sheet, income statement, cash flow statement)
  - Profitability metrics (gross margin, net margin, ROE, ROA)
  - Revenue growth analysis (year-over-year, quarter-over-quarter, compound growth rate)
  - Financial health assessment (liquidity, solvency, operating efficiency)

  Suitable scenarios:
  - The user asks about a company's earnings, revenue, or profit
  - Financial data or earnings report data needs to be analyzed
  - Investment analysis, valuation
  - Financial risk assessment

  Unsuitable scenarios:
  - Real-time stock quotes → use market-data
  - Industry analysis → use industry-analyst
  - News → use news-collector

examples:
  - "How much did Tencent earn last year"
  - "Analyze Apple's financial position"
  - "Help me look at this earnings report"
  - "How profitable is this company"
  - "Compare the revenue of Alibaba and JD.com"

capabilities:
  - financial_analysis
  - report_generation
  - data_visualization
```
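
One way the enriched manifest could feed the router is to flatten it into a single text that is then embedded. The sketch below assumes that approach; `SkillDoc` and `embedding_text` are hypothetical names that mirror the YAML fields above, not types from the codebase.

```rust
/// Hypothetical in-memory form of the enriched manifest shown above.
struct SkillDoc {
    name: String,
    description: String,
    examples: Vec<String>,
    capabilities: Vec<String>,
}

/// Flattens a manifest into one text to embed, so that example queries and
/// capability tags contribute to the skill's vector alongside the description.
fn embedding_text(doc: &SkillDoc) -> String {
    let mut parts = vec![doc.name.clone(), doc.description.trim().to_string()];
    if !doc.examples.is_empty() {
        parts.push(format!("Examples: {}", doc.examples.join(" | ")));
    }
    if !doc.capabilities.is_empty() {
        parts.push(format!("Capabilities: {}", doc.capabilities.join(", ")));
    }
    parts.join("\n")
}
```

Including the example queries matters because they are phrased the way users actually ask, which is what the query embedding will be compared against.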

### 3.2 Semantic Router Implementation

```rust
// crates/zclaw-kernel/src/skill_router.rs

use std::sync::Arc;
use serde::{Deserialize, Serialize};

/// Similarity above which the top candidate is accepted without asking the LLM.
const DIRECT_MATCH_THRESHOLD: f32 = 0.85;

/// Skill routing result
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RoutingResult {
    pub skill_id: String,
    pub confidence: f32,
    pub parameters: serde_json::Value,
    pub reasoning: String,
}

/// Semantic skill router
pub struct SemanticSkillRouter {
    skills: Arc<SkillRegistry>,
    embedder: Box<dyn Embedder>,
    // Assumption: the LLM client used by `llm_select_skill`. The original
    // struct omitted this field; `LlmClient` is a placeholder trait name.
    llm: Arc<dyn LlmClient>,
    skill_embeddings: Vec<(String, Vec<f32>)>,
}

impl SemanticSkillRouter {
    /// Retrieve the Top-K candidate skills
    pub async fn retrieve_candidates(&self, query: &str, top_k: usize) -> Vec<(SkillManifest, f32)> {
        // 1. Embed the query (assumes `embed` returns a Vec<f32>)
        let query_embedding = self.embedder.embed(query).await;

        // 2. Compute similarity against every skill
        let mut scores: Vec<_> = self.skill_embeddings
            .iter()
            .map(|(skill_id, embedding)| {
                let similarity = cosine_similarity(&query_embedding, embedding);
                (skill_id.clone(), similarity)
            })
            .collect();

        // 3. Sort descending and keep the Top-K.
        //    `total_cmp` cannot panic on NaN, unlike `partial_cmp().unwrap()`.
        scores.sort_by(|a, b| b.1.total_cmp(&a.1));
        scores.truncate(top_k);

        // 4. Return the skill metadata
        scores.into_iter()
            .filter_map(|(id, score)| {
                self.skills.get(&id).map(|s| (s, score))
            })
            .collect()
    }

    /// Intelligent routing - combines semantic retrieval with an LLM decision
    pub async fn route(&self, query: &str, context: &ConversationContext) -> Option<RoutingResult> {
        // Step 1: semantic retrieval of the Top-3 candidates
        let candidates = self.retrieve_candidates(query, 3).await;

        if candidates.is_empty() {
            return None;
        }

        // Step 2: if the best score exceeds the threshold, return it directly
        let (top_skill, top_score) = &candidates[0];
        if *top_score > DIRECT_MATCH_THRESHOLD {
            return Some(RoutingResult {
                skill_id: top_skill.id.to_string(),
                confidence: *top_score,
                parameters: extract_parameters(query, &top_skill.id),
                reasoning: format!("High semantic match ({}%)", (*top_score * 100.0) as i32),
            });
        }

        // Step 3: otherwise let the LLM make the fine-grained selection
        self.llm_select_skill(query, candidates, context).await
    }

    /// Fine-grained selection by the LLM
    async fn llm_select_skill(
        &self,
        query: &str,
        candidates: Vec<(SkillManifest, f32)>,
        context: &ConversationContext,
    ) -> Option<RoutingResult> {
        let prompt = self.build_selection_prompt(query, &candidates, context);

        // Call the LLM to choose. Assumes `complete` returns a Result;
        // a failed call is treated as "no route".
        let response = self.llm.complete(&prompt).await.ok()?;

        // Parse the LLM response
        parse_llm_routing_response(&response, candidates)
    }

    fn build_selection_prompt(
        &self,
        query: &str,
        candidates: &[(SkillManifest, f32)],
        context: &ConversationContext,
    ) -> String {
        format!(
            r#"You are a skill router. Analyze the user query and select the best skill to handle it.

## User Query
{}

## Conversation Context
{}

## Candidate Skills
{}

## Instructions
1. Analyze the user's intent and required capabilities
2. Select the MOST appropriate skill from the candidates
3. Extract any parameters mentioned in the query
4. If no skill is appropriate, set "selected_skill" to null

## Response Format (JSON)
{{
    "selected_skill": "skill_id or null",
    "confidence": 0.0-1.0,
    "parameters": {{}},
    "reasoning": "Brief explanation"
}}
"#,
            query,
            context.summary(),
            candidates.iter()
                .map(|(s, score)| format!("- {} ({}%): {}", s.id, (score * 100.0) as i32, s.description))
                .collect::<Vec<_>>()
                .join("\n")
        )
    }
}

/// Cosine similarity of two vectors; both are expected to have the same length.
/// The small epsilon avoids division by zero for all-zero vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b + 1e-10)
}
```
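
To see how the 0.85 threshold splits routing into a direct path and an LLM path, the retrieval and decision steps can be exercised on toy vectors without an embedder. The following is a synchronous sketch under that assumption; `top_k`, `decide`, and `Decision` are hypothetical names, and the two-dimensional vectors are made up for illustration.

```rust
/// Cosine similarity; the epsilon avoids division by zero.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb + 1e-10)
}

/// Scores every indexed skill against the query and keeps the best `k`.
fn top_k<'a>(query: &[f32], index: &[(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = index
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending, NaN-safe
    scored.truncate(k);
    scored
}

/// Outcome of the threshold check (hypothetical type).
#[derive(Debug, PartialEq)]
enum Decision<'a> {
    /// Top score cleared the threshold: route without consulting the LLM.
    Direct(&'a str),
    /// Scores are ambiguous: hand these candidates to the LLM.
    AskLlm(Vec<&'a str>),
    /// Nothing was retrieved.
    NoMatch,
}

fn decide<'a>(ranked: &[(&'a str, f32)], threshold: f32) -> Decision<'a> {
    match ranked.first() {
        None => Decision::NoMatch,
        Some((id, score)) if *score > threshold => Decision::Direct(*id),
        Some(_) => Decision::AskLlm(ranked.iter().map(|(id, _)| *id).collect()),
    }
}
```

A query vector close to one skill's vector takes the direct path; a query that sits between skills falls below the threshold and is passed to the LLM together with its candidates.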

### 3.3 System Prompt Enhancement

```rust
// In kernel.rs

/// Build the skill-aware system prompt
fn build_skill_aware_system_prompt(&self, base_prompt: Option<&String>) -> String {
    let mut prompt = base_prompt
        .cloned()
        .unwrap_or_else(|| "You are ZCLAW, an intelligent AI assistant.".to_string());

    prompt.push_str("\n\n## Your Capabilities\n\n");
    prompt.push_str("You have access to specialized skills. Use the `execute_skill` tool when:\n");
    prompt.push_str("- The user's request matches a skill's domain\n");
    prompt.push_str("- You need specialized expertise for a task\n");
    prompt.push_str("- The task would benefit from a structured workflow\n\n");

    prompt.push_str("**Important**: You should autonomously decide when to use skills based on your understanding of the user's intent. ");
    prompt.push_str("Do not wait for explicit skill names - recognize the need and act.\n\n");

    prompt.push_str("## Available Skills\n\n");

    // Inject a skill summary (not the full list, to save tokens).
    // Note: `block_on` blocks the calling thread, so this function should not
    // be called from inside an async task.
    let skills = futures::executor::block_on(self.skills.list());
    // Only the first 20 skills in registry order; they are not yet ranked by relevance.
    for skill in skills.iter().take(20) {
        // Cut the description at a char boundary after at most 100 characters.
        // `nth(100)` yields None for shorter descriptions, which are kept whole.
        let end = skill.description
            .char_indices()
            .nth(100)
            .map(|(i, _)| i)
            .unwrap_or(skill.description.len());
        prompt.push_str(&format!(
            "- **{}**: {}\n",
            skill.id.as_str(),
            &skill.description[..end]
        ));
    }

    if skills.len() > 20 {
        prompt.push_str(&format!("\n... and {} more skills available.\n", skills.len() - 20));
    }

    prompt
}
```
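
Because skill descriptions may contain multi-byte characters, the description cut in the prompt builder has to land on a character boundary; slicing a `&str` at an arbitrary byte offset panics. A small helper makes that step testable in isolation. This is a sketch, and `truncate_chars` is a hypothetical name.

```rust
/// Returns the longest prefix of `s` that contains at most `max_chars`
/// characters, always cutting on a UTF-8 character boundary.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // Byte offset where character number `max_chars` (zero-based) starts.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than or exactly `max_chars` characters: keep the whole string.
        None => s,
    }
}
```

For example, `truncate_chars("财务追踪专家", 2)` keeps the first two characters (six bytes), and a string shorter than the limit is returned unchanged.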

---

## 4. Implementation Plan

### Phase 1: Basic Architecture (current)

- [x] Inject the skill list into the system prompt
- [x] Add a `triggers` field to SkillManifest
- [x] Update the SKILL.md parser

### Phase 2: Semantic Routing

1. **Integrate an embedding model**
   - Use a local model (e.g. `all-MiniLM-L6-v2`)
   - Or call an LLM API to obtain embeddings

2. **Build the skill vector index**
   - Precompute embeddings for all skill descriptions at startup
   - Support incremental updates

3. **Implement the Hybrid Router**
   - Semantic retrieval of Top-K candidates
   - Fine-grained selection by the LLM

### Phase 3: Intelligent Orchestration

1. **Multi-skill coordination**
   - Identify tasks that need multiple skills
   - Automatically orchestrate the execution order

2. **Context awareness**
   - Adjust skill selection based on conversation history
   - Remember user preferences

3. **Autonomous learning**
   - Record user feedback
   - Optimize the routing strategy
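
For the multi-skill coordination item in Phase 3, one plausible way to derive an execution order is a topological sort over declared skill dependencies. The sketch below uses Kahn's algorithm and assumes each skill lists the skills whose output it needs; the plan does not specify this mechanism, and `execution_order` is a hypothetical name.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Orders skills so that every skill runs after the skills it depends on.
/// `deps` maps a skill to the skills it requires. Returns None on a cycle.
fn execution_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // Every skill mentioned either as a key or as a dependency.
    let mut nodes: BTreeSet<&str> = deps.keys().copied().collect();
    for required in deps.values() {
        nodes.extend(required.iter().copied());
    }

    // unmet[s] = number of dependencies of s that have not run yet.
    let mut unmet: BTreeMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    // dependents[d] = skills that are waiting on d.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (skill, required) in deps {
        for d in required {
            *unmet.get_mut(skill).unwrap() += 1;
            dependents.entry(*d).or_default().push(*skill);
        }
    }

    // Start with the skills that depend on nothing.
    let mut ready: VecDeque<&str> = unmet
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(s, _)| *s)
        .collect();

    let mut order = Vec::new();
    while let Some(skill) = ready.pop_front() {
        order.push(skill.to_string());
        for waiting in dependents.get(skill).into_iter().flatten() {
            let n = unmet.get_mut(waiting).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(*waiting);
            }
        }
    }

    // If a cycle exists, some skills never become ready.
    (order.len() == nodes.len()).then_some(order)
}
```

As a made-up example, if a hypothetical `report-writer` skill needs `finance-tracker` and `news-collector`, and `finance-tracker` needs `market-data`, the function places `market-data` and `news-collector` first and `report-writer` last.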

---

## 5. Technology Choices

### 5.1 Embedding Model

| Option | Pros | Cons |
|--------|------|------|
| **Local `all-MiniLM-L6-v2`** | Fast, offline, free | Requires extra dependencies |
| **LLM API Embedding** | High quality | Requires network access, has a cost |
| **OpenAI text-embedding-3-small** | High quality, multilingual | Paid |

**Recommendation**: Use the LLM provider's embedding API if it offers one; otherwise use a local model.
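
The recommendation above amounts to a fallback chain. A minimal sketch of that idea follows, using a synchronous trait for brevity; `EmbeddingBackend`, `WithFallback`, and the two stub backends are hypothetical and stand in for the real (presumably async) embedder.

```rust
/// Simplified, synchronous stand-in for an embedding backend.
trait EmbeddingBackend {
    /// Returns None when this backend is unavailable.
    fn embed(&self, text: &str) -> Option<Vec<f32>>;
}

/// Tries the provider's embedding API first, then the local model.
struct WithFallback<P, L> {
    provider: P,
    local: L,
}

impl<P: EmbeddingBackend, L: EmbeddingBackend> EmbeddingBackend for WithFallback<P, L> {
    fn embed(&self, text: &str) -> Option<Vec<f32>> {
        self.provider.embed(text).or_else(|| self.local.embed(text))
    }
}

/// Stub: a provider that offers no embedding API.
struct Unavailable;

impl EmbeddingBackend for Unavailable {
    fn embed(&self, _text: &str) -> Option<Vec<f32>> {
        None
    }
}

/// Stub: a backend that always returns the same vector.
struct Constant(Vec<f32>);

impl EmbeddingBackend for Constant {
    fn embed(&self, _text: &str) -> Option<Vec<f32>> {
        Some(self.0.clone())
    }
}
```

One caveat with any fallback scheme: vectors from different models are not comparable, so the backend should be chosen once and the skill index built with the same backend that embeds queries, rather than switching per call.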

### 5.2 Vector Storage

| Option | When to use |
|--------|-------------|
| **In-memory HashMap** | Fewer than 100 skills |
| **SQLite + vec** | Persistence, simplicity |
| **Qdrant/Chroma** | Large scale, filtering required |

**Recommendation**: For 77 skills, an in-memory HashMap is sufficient.
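
An in-memory store of this size can be a thin wrapper around a `HashMap` with brute-force search. The sketch below is one possible shape, not the project's implementation: `SkillIndex` is a hypothetical type, and it normalizes vectors on insert so that search reduces to a dot product. `upsert` also covers the incremental-update case, since re-inserting a skill replaces its vector.

```rust
use std::collections::HashMap;

/// In-memory vector index keyed by skill id (hypothetical type).
#[derive(Default)]
struct SkillIndex {
    vectors: HashMap<String, Vec<f32>>,
}

/// Scales `v` to unit length so a dot product equals cosine similarity.
fn normalize(mut v: Vec<f32>) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

impl SkillIndex {
    /// Inserts a skill's embedding or replaces the existing one.
    fn upsert(&mut self, skill_id: &str, embedding: Vec<f32>) {
        self.vectors.insert(skill_id.to_string(), normalize(embedding));
    }

    /// Removes a skill; returns whether it was present.
    fn remove(&mut self, skill_id: &str) -> bool {
        self.vectors.remove(skill_id).is_some()
    }

    fn len(&self) -> usize {
        self.vectors.len()
    }

    /// Brute-force search: scores every stored vector and keeps the best `k`.
    fn search(&self, query: &[f32], k: usize) -> Vec<(&str, f32)> {
        let q = normalize(query.to_vec());
        let mut hits: Vec<(&str, f32)> = self
            .vectors
            .iter()
            .map(|(id, v)| (id.as_str(), q.iter().zip(v).map(|(a, b)| a * b).sum::<f32>()))
            .collect();
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits.truncate(k);
        hits
    }
}
```

A full scan over 77 vectors is a few thousand multiplications per query, which is why no approximate index is needed at this scale.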

---

## 6. References

- [LLM Skills vs Tools: The Missing Layer in Agent Design](https://www.abstractalgorithms.dev/llm-skills-vs-tools-in-agent-design)
- [Tool Selection for LLM Agents: Routing Strategies](https://mbrenndoerfer.com/writing/tool-selection-llm-agents-routing-strategies)
- [Semantic Tool Selection](https://vllm-semantic-router.com/zh-Hans/blog/semantic-tool-selection)

---

## 7. Summary

**Core principles**:
1. **Let the LLM decide autonomously** - do not hard-code trigger words
2. **Semantic understanding beats keyword matching** - understand the user's intent
3. **Hybrid is the best practice** - embedding filtering + LLM decision
4. **Rich descriptions are key** - skill descriptions need examples, boundaries, and capabilities

**Next steps**:
1. Implement a semantic router prototype
2. Enrich the skill descriptions
3. Test and optimize