# Intelligent Skill Routing System

> **Design goal**: Let ZCLAW understand user intent and automatically select and invoke the appropriate skill, instead of relying on hard-coded trigger words.

---

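## 1. Problem Analysis

### 1.1 Problems with the Current Approach

```
User: "Look up Tencent's financial report"
    ↓
Hard-coded trigger match: "financial report" ∈ triggers?
    ↓
❌ If triggers does not contain "financial report", the skill is not invoked
```

**Problems**:
1. **Cannot cover every phrasing** - users may say "financial data", "profitability", "revenue report", and so on
2. **High maintenance cost** - every skill needs its own trigger list
3. **No semantic understanding** - the matcher cannot recognize that "help me analyze how well this company makes money" is also a financial-analysis request
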
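
The failure mode described above can be reproduced with a few lines of Rust. This is a minimal sketch, assuming triggers are matched as plain substrings; the `Skill` struct, `match_by_trigger`, and `demo_skills` are hypothetical names for illustration, not ZCLAW's actual types.

```rust
/// Hypothetical, simplified skill record: an id plus hard-coded trigger words.
struct Skill {
    id: &'static str,
    triggers: &'static [&'static str],
}

/// Returns the id of the first skill with a trigger that occurs in the query.
fn match_by_trigger<'a>(skills: &'a [Skill], query: &str) -> Option<&'a str> {
    skills
        .iter()
        .find(|s| s.triggers.iter().any(|t| query.contains(*t)))
        .map(|s| s.id)
}

/// One skill with the two triggers used in the example above (assumed data).
fn demo_skills() -> Vec<Skill> {
    vec![Skill {
        id: "finance-tracker",
        triggers: &["financial report", "financial analysis"],
    }]
}
```

With this matcher, "Look up Tencent's financial report" resolves to `finance-tracker`, while "How much money did Tencent make recently?" resolves to nothing, even though both ask for the same kind of analysis.
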
### 1.2 Design Goal

```
User: "Help me analyze how much money Tencent made recently"
    ↓
Semantic understanding: intent = financial analysis, entity = Tencent, metric = profit
    ↓
Smart routing: best-matching skill = finance-tracker
    ↓
✅ Automatically call execute_skill("finance-tracker", {company: "Tencent", metrics: ["profit"]})
```

---

## 2. Smart Routing Architecture

### 2.1 Three-Layer Architecture

```
┌────────────────────────────────────────────────┐
│                LLM Orchestrator                │
│  - Understands user intent                     │
│  - Decides whether a skill is needed           │
│  - Selects the best skill                      │
└────────────────────────────────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────┐
│             Semantic Skill Router              │
│  - Embeds skill descriptions                   │
│  - Query-to-skill semantic matching            │
│  - Top-K candidate retrieval                   │
└────────────────────────────────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────┐
│                 Skill Registry                 │
│  - Metadata for 77 skills                      │
│  - Descriptions, capabilities, examples        │
│  - Vector index                                │
└────────────────────────────────────────────────┘
```

### 2.2 Routing Flow

```
User message
      │
      ▼
┌────────────────────────────┐
│ 1. Intent classification   │ ──→ Is a skill needed?
│    (LLM decides)           │      ├─ No  → chat directly
└────────────────────────────┘      └─ Yes ↓
      │
      ▼
┌────────────────────────────┐
│ 2. Semantic retrieval      │ ──→ Top-3 candidate skills
│    (Embedding)             │      (by description similarity)
└────────────────────────────┘
      │
      ▼
┌────────────────────────────┐
│ 3. Fine-grained selection  │ ──→ Best skill + parameters
│    (LLM decides)           │      (considers context, dependencies)
└────────────────────────────┘
      │
      ▼
┌────────────────────────────┐
│ 4. Skill execution         │ ──→ Execution result
│    (execute_skill)         │
└────────────────────────────┘
      │
      ▼
Final response
```

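
The control flow of the four stages can be sketched independently of any model. The following is an illustrative outline only: the `Stages` trait, `Outcome` enum, and `StubStages` are hypothetical names, and the stub replaces the LLM, the embedder, and `execute_skill` with fixed answers so the branching can be exercised.

```rust
/// Outcome of one pass through the four-stage flow.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Stage 1 decided that no skill is needed.
    DirectChat,
    /// Stages 2-3 produced no suitable skill.
    NoSkillFound,
    /// Stage 4 ran the chosen skill.
    SkillResult { skill_id: String, output: String },
}

/// Hypothetical stage interface; real stages would call the LLM and the embedder.
trait Stages {
    fn needs_skill(&self, msg: &str) -> bool; // 1. intent classification
    fn retrieve(&self, msg: &str, k: usize) -> Vec<(String, f32)>; // 2. semantic retrieval
    fn select(&self, msg: &str, candidates: &[(String, f32)]) -> Option<String>; // 3. selection
    fn execute(&self, skill_id: &str, msg: &str) -> String; // 4. execution
}

fn handle_message(stages: &dyn Stages, msg: &str) -> Outcome {
    if !stages.needs_skill(msg) {
        return Outcome::DirectChat;
    }
    let candidates = stages.retrieve(msg, 3); // Top-3, as in the diagram
    match stages.select(msg, &candidates) {
        Some(skill_id) => {
            let output = stages.execute(&skill_id, msg);
            Outcome::SkillResult { skill_id, output }
        }
        None => Outcome::NoSkillFound,
    }
}

/// Toy stages with fixed answers, used only to exercise the control flow.
struct StubStages;

impl Stages for StubStages {
    fn needs_skill(&self, msg: &str) -> bool {
        msg.contains("profit")
    }
    fn retrieve(&self, _msg: &str, k: usize) -> Vec<(String, f32)> {
        let mut all = vec![
            ("finance-tracker".to_string(), 0.9),
            ("market-data".to_string(), 0.4),
        ];
        all.truncate(k);
        all
    }
    fn select(&self, _msg: &str, candidates: &[(String, f32)]) -> Option<String> {
        // Stand-in for the LLM decision: pick the highest-scoring candidate.
        candidates
            .iter()
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(id, _)| id.clone())
    }
    fn execute(&self, skill_id: &str, _msg: &str) -> String {
        format!("ran {skill_id}")
    }
}
```

Keeping the stages behind a trait is one possible design choice: it lets the branching logic be tested without network access, and each stage can later be swapped for a real implementation.
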
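---

## 3. Core Component Design

### 3.1 Rich Skill Descriptions

**Problem**: Current skill descriptions are too terse.

```yaml
# Current (not rich enough)
name: finance-tracker
description: "Financial tracking expert"
triggers: ["financial report", "financial analysis"]
```

**Improvement**: Add semantically rich descriptions.

```yaml
# Improved
name: finance-tracker
description: |
  Financial tracking expert - focused on corporate financial data analysis, financial report interpretation, and profitability assessment.

  Core capabilities:
  - Financial statement analysis (balance sheet, income statement, cash flow statement)
  - Profitability metrics (gross margin, net margin, ROE, ROA)
  - Revenue growth analysis (year-over-year, period-over-period, compound growth rate)
  - Financial health assessment (liquidity, solvency, operating efficiency)

  Use when:
  - The user asks about a company's earnings, revenue, or profit
  - Financial data or financial reports need to be analyzed
  - Investment analysis or valuation is required
  - Financial risk needs to be assessed

  Do not use for:
  - Real-time stock quotes → use market-data
  - Industry analysis → use industry-analyst
  - News → use news-collector

examples:
  - "How much did Tencent earn last year"
  - "Analyze Apple's financial position"
  - "Take a look at this financial report for me"
  - "How profitable is this company"
  - "Compare Alibaba's and JD's revenue"

capabilities:
  - financial_analysis
  - report_generation
  - data_visualization
```
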
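
One open question is which manifest fields feed the embedding. As a sketch under the assumption that the name, description, examples, and capabilities are all flattened into one text before embedding, the helper below shows one way to do it. `SkillDoc` and `embedding_text` are hypothetical names and do not reflect the real `SkillManifest` layout.

```rust
/// Hypothetical in-memory form of the enriched manifest shown above.
struct SkillDoc {
    name: String,
    description: String,
    examples: Vec<String>,
    capabilities: Vec<String>,
}

/// Flattens the manifest fields into one newline-separated string
/// that can be passed to an embedding model.
fn embedding_text(doc: &SkillDoc) -> String {
    let mut parts = vec![doc.name.clone(), doc.description.trim().to_string()];
    for ex in &doc.examples {
        parts.push(format!("Example: {ex}"));
    }
    if !doc.capabilities.is_empty() {
        parts.push(format!("Capabilities: {}", doc.capabilities.join(", ")));
    }
    parts.join("\n")
}
```

Including the example queries in the embedded text is the main point of this choice: user messages tend to resemble the examples more closely than they resemble a formal capability description.
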
### 3.2 Semantic Router Implementation

```rust
// crates/zclaw-kernel/src/skill_router.rs

use std::sync::Arc;
use serde::{Deserialize, Serialize};

/// Skill routing result
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RoutingResult {
    pub skill_id: String,
    pub confidence: f32,
    pub parameters: serde_json::Value,
    pub reasoning: String,
}

/// Semantic skill router
pub struct SemanticSkillRouter {
    skills: Arc<SkillRegistry>,
    embedder: Box<dyn Embedder>,
    // Assumed field: `llm_select_skill` below calls `self.llm`, which the original
    // listing never declared. `LlmClient` is a placeholder trait name.
    llm: Arc<dyn LlmClient>,
    skill_embeddings: Vec<(String, Vec<f32>)>,
}

impl SemanticSkillRouter {
    /// Retrieve the Top-K candidate skills
    pub async fn retrieve_candidates(&self, query: &str, top_k: usize) -> Vec<(SkillManifest, f32)> {
        // 1. Embed the query
        let query_embedding = self.embedder.embed(query).await;

        // 2. Compute the similarity against every skill
        let mut scores: Vec<_> = self.skill_embeddings
            .iter()
            .map(|(skill_id, embedding)| {
                let similarity = cosine_similarity(&query_embedding, embedding);
                (skill_id.clone(), similarity)
            })
            .collect();

        // 3. Sort descending and keep the Top-K
        //    (`total_cmp` avoids the panic that `partial_cmp(..).unwrap()` hits on NaN)
        scores.sort_by(|a, b| b.1.total_cmp(&a.1));
        scores.truncate(top_k);

        // 4. Return the skill metadata
        scores.into_iter()
            .filter_map(|(id, score)| {
                self.skills.get(&id).map(|s| (s, score))
            })
            .collect()
    }

    /// Smart routing - combines semantic retrieval with an LLM decision
    pub async fn route(&self, query: &str, context: &ConversationContext) -> Option<RoutingResult> {
        // Step 1: semantic retrieval of the Top-3 candidates
        let candidates = self.retrieve_candidates(query, 3).await;

        if candidates.is_empty() {
            return None;
        }

        // Step 2: if the top score exceeds the threshold, return it directly
        if candidates[0].1 > 0.85 {
            let (skill, _) = &candidates[0];
            return Some(RoutingResult {
                skill_id: skill.id.to_string(),
                confidence: candidates[0].1,
                parameters: extract_parameters(query, &skill.id),
                reasoning: format!("High semantic match ({}%)", (candidates[0].1 * 100.0) as i32),
            });
        }

        // Step 3: otherwise let the LLM make the fine-grained selection
        self.llm_select_skill(query, candidates, context).await
    }

    /// Fine-grained selection by the LLM
    async fn llm_select_skill(
        &self,
        query: &str,
        candidates: Vec<(SkillManifest, f32)>,
        context: &ConversationContext,
    ) -> Option<RoutingResult> {
        let prompt = self.build_selection_prompt(query, &candidates, context);

        // Call the LLM to make the selection.
        // `?` compiles here only if `complete` returns an `Option`;
        // if it returns a `Result`, use `.ok()?` instead.
        let response = self.llm.complete(&prompt).await?;

        // Parse the LLM response
        parse_llm_routing_response(&response, candidates)
    }

    fn build_selection_prompt(
        &self,
        query: &str,
        candidates: &[(SkillManifest, f32)],
        context: &ConversationContext,
    ) -> String {
        format!(
            r#"You are a skill router. Analyze the user query and select the best skill to handle it.

## User Query
{}

## Conversation Context
{}

## Candidate Skills
{}

## Instructions
1. Analyze the user's intent and required capabilities
2. Select the MOST appropriate skill from the candidates
3. Extract any parameters mentioned in the query
4. If no skill is appropriate, set "selected_skill" to null

## Response Format (JSON)
{{
    "selected_skill": "skill_id or null",
    "confidence": 0.0-1.0,
    "parameters": {{}},
    "reasoning": "Brief explanation"
}}
"#,
            query,
            context.summary(),
            candidates.iter()
                .map(|(s, score)| format!("- {} ({}%): {}", s.id, (score * 100.0) as i32, s.description))
                .collect::<Vec<_>>()
                .join("\n")
        )
    }
}

/// Cosine similarity; the small epsilon guards against division by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b + 1e-10)
}
```

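
The listing above depends on project types (`SkillRegistry`, `Embedder`, `SkillManifest`), so it cannot be run in isolation. The following standalone sketch reproduces just the ranking and the 0.85 shortcut on hand-written three-dimensional vectors. The vectors, `toy_index`, `rank`, and the `Decision` enum are invented for illustration; real embeddings would come from the chosen model.

```rust
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b + 1e-10)
}

/// Ranks skills by similarity to the query vector and keeps the best `k`.
fn rank<'a>(query: &[f32], index: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scores: Vec<(&str, f32)> = index
        .iter()
        .map(|(id, v)| (*id, cosine_similarity(query, v)))
        .collect();
    scores.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending
    scores.truncate(k);
    scores
}

/// Which path `route` would take for a ranked candidate list.
#[derive(Debug, PartialEq)]
enum Decision {
    NoCandidates,
    DirectMatch,
    AskLlm,
}

fn decide(ranked: &[(&str, f32)], threshold: f32) -> Decision {
    match ranked.first() {
        None => Decision::NoCandidates,
        Some((_, score)) if *score > threshold => Decision::DirectMatch,
        Some(_) => Decision::AskLlm,
    }
}

/// Hand-written toy vectors (assumed data, not real embeddings).
fn toy_index() -> Vec<(&'static str, Vec<f32>)> {
    vec![
        ("finance-tracker", vec![1.0, 0.0, 0.0]),
        ("market-data", vec![0.6, 0.8, 0.0]),
        ("news-collector", vec![0.0, 0.0, 1.0]),
    ]
}
```

A query vector close to one skill, such as `[0.9, 0.1, 0.0]`, clears the threshold and is routed directly. A query that sits between skills, such as `[0.5, 0.5, 0.5]`, stays below it and falls through to the LLM step.
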
### 3.3 System Prompt Enhancement

```rust
// In kernel.rs

/// Build the skill-aware system prompt
fn build_skill_aware_system_prompt(&self, base_prompt: Option<&String>) -> String {
    let mut prompt = base_prompt
        .cloned()
        .unwrap_or_else(|| "You are ZCLAW, an intelligent AI assistant.".to_string());

    prompt.push_str("\n\n## Your Capabilities\n\n");
    prompt.push_str("You have access to specialized skills. Use the `execute_skill` tool when:\n");
    prompt.push_str("- The user's request matches a skill's domain\n");
    prompt.push_str("- You need specialized expertise for a task\n");
    prompt.push_str("- The task would benefit from a structured workflow\n\n");

    prompt.push_str("**Important**: You should autonomously decide when to use skills based on your understanding of the user's intent. ");
    prompt.push_str("Do not wait for explicit skill names - recognize the need and act.\n\n");

    prompt.push_str("## Available Skills\n\n");

    // Inject a skill summary (not the full list, to save tokens).
    // Note: `block_on` blocks the calling thread, so avoid calling this
    // function from a worker thread of an async runtime.
    let skills = futures::executor::block_on(self.skills.list());
    for skill in skills.iter().take(20) { // Only the first 20 are listed; no relevance ranking is applied here
        // Cut after at most 100 characters, always on a char boundary.
        // (The original `take(100).last()` dropped the final character of short descriptions.)
        let end = skill.description
            .char_indices()
            .nth(100)
            .map(|(i, _)| i)
            .unwrap_or(skill.description.len());
        prompt.push_str(&format!(
            "- **{}**: {}\n",
            skill.id.as_str(),
            &skill.description[..end]
        ));
    }

    if skills.len() > 20 {
        prompt.push_str(&format!("\n... and {} more skills available.\n", skills.len() - 20));
    }

    prompt
}
```

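
Truncating descriptions needs care because skill descriptions may contain multi-byte characters, and slicing a `&str` at a byte offset inside a character panics. A small helper makes the rule explicit and testable; `truncate_chars` and `skill_summary_line` are hypothetical names for this sketch.

```rust
/// Returns at most `max_chars` characters of `s`, never splitting a UTF-8 code point.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // `byte_idx` is the start of the first character to drop.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than or exactly `max_chars` characters: keep everything.
        None => s,
    }
}

/// Formats one line of the skill summary injected into the system prompt.
fn skill_summary_line(id: &str, description: &str, max_chars: usize) -> String {
    format!("- **{}**: {}\n", id, truncate_chars(description, max_chars))
}
```

Using `nth(max_chars)` rather than `take(max_chars).last()` matters: the latter yields the start of the last kept character, so slicing up to it silently drops one character.
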
---

## 4. Implementation Plan

### Phase 1: Foundation (current)

- [x] Inject the skill list into the system prompt
- [x] Add a `triggers` field to SkillManifest
- [x] Update the SKILL.md parser

### Phase 2: Semantic Routing

1. **Integrate an embedding model**
   - Use a local model (e.g. `all-MiniLM-L6-v2`)
   - Or call an LLM API to obtain embeddings

2. **Build the skill vector index**
   - Precompute embeddings for all skill descriptions at startup
   - Support incremental updates

3. **Implement the hybrid router**
   - Semantic retrieval of Top-K candidates
   - Fine-grained selection by the LLM

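
The second item, an index that is precomputed at startup and updated incrementally, could look roughly like the sketch below. It assumes the index lives in memory and that an entry is re-embedded only when its description text changes. `SkillIndex` and `toy_embed` are hypothetical; `toy_embed` is a stand-in that returns simple text statistics and is not a real embedding model.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory index: skill id -> (source description, embedding).
struct SkillIndex<E: Fn(&str) -> Vec<f32>> {
    embed: E,
    entries: HashMap<String, (String, Vec<f32>)>,
}

impl<E: Fn(&str) -> Vec<f32>> SkillIndex<E> {
    /// Startup path: embed every skill description once.
    fn build(embed: E, skills: &[(&str, &str)]) -> Self {
        let mut index = SkillIndex { embed, entries: HashMap::new() };
        for (id, description) in skills {
            index.upsert(id, description);
        }
        index
    }

    /// Incremental path: re-embed only when the description changed.
    /// Returns true if an embedding was computed.
    fn upsert(&mut self, id: &str, description: &str) -> bool {
        if let Some((old, _)) = self.entries.get(id) {
            if old.as_str() == description {
                return false;
            }
        }
        let vector = (self.embed)(description);
        self.entries.insert(id.to_string(), (description.to_string(), vector));
        true
    }

    /// Drops a skill that was uninstalled; returns true if it was present.
    fn remove(&mut self, id: &str) -> bool {
        self.entries.remove(id).is_some()
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}

/// Stand-in embedder for the sketch: NOT a real model, just text statistics.
fn toy_embed(text: &str) -> Vec<f32> {
    vec![
        text.chars().count() as f32,
        text.split_whitespace().count() as f32,
    ]
}
```

Storing the source description next to its vector is one simple way to detect changes; a content hash would serve the same purpose with less memory.
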
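### Phase 3: Smart Orchestration

1. **Multi-skill coordination**
   - Identify tasks that require more than one skill
   - Automatically arrange the execution order

2. **Context awareness**
   - Adjust skill selection based on the conversation history
   - Remember user preferences

3. **Self-learning**
   - Record user feedback
   - Tune the routing strategy
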
---

## 5. Technology Choices

### 5.1 Embedding Model

| Option | Pros | Cons |
|------|------|------|
| **Local `all-MiniLM-L6-v2`** | Fast, offline, free | Requires an extra dependency |
| **LLM API embedding** | High quality | Requires network access, has a cost |
| **OpenAI text-embedding-3-small** | High quality, multilingual | Paid |

**Recommendation**: Use the LLM provider's embedding API if it offers one; otherwise fall back to a local model.

### 5.2 Vector Storage

| Option | Suitable for |
|------|---------|
| **In-memory HashMap** | Fewer than 100 skills |
| **SQLite + vec** | Persistence, simplicity |
| **Qdrant/Chroma** | Large scale, filtering required |

**Recommendation**: For 77 skills, an in-memory HashMap is sufficient.

---

## 6. References

- [LLM Skills vs Tools: The Missing Layer in Agent Design](https://www.abstractalgorithms.dev/llm-skills-vs-tools-in-agent-design)
- [Tool Selection for LLM Agents: Routing Strategies](https://mbrenndoerfer.com/writing/tool-selection-llm-agents-routing-strategies)
- [Semantic Tool Selection](https://vllm-semantic-router.com/zh-Hans/blog/semantic-tool-selection)

---

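## 7. Summary

**Core principles**:
1. **Let the LLM decide autonomously** - do not hard-code trigger words
2. **Semantic understanding beats keyword matching** - understand the user's intent
3. **Hybrid is the best practice** - embedding-based filtering plus an LLM decision
4. **Rich descriptions are key** - skill descriptions need examples, boundaries, and capabilities

**Next steps**:
1. Implement a prototype of the semantic router
2. Enrich the skill descriptions
3. Test and optimize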