feat(skills): complete multi-agent collaboration framework

## Skills Ecosystem (60+ Skills)
- Engineering: 7 skills (ai-engineer, backend-architect, etc.)
- Testing: 8 skills (reality-checker, evidence-collector, etc.)
- Support: 6 skills (support-responder, analytics-reporter, etc.)
- Design: 7 skills (ux-architect, brand-guardian, etc.)
- Product: 3 skills (sprint-prioritizer, trend-researcher, etc.)
- Marketing: 4+ skills (growth-hacker, content-creator, etc.)
- PM: 5 skills (studio-producer, project-shepherd, etc.)
- Spatial: 6 skills (visionos-spatial-engineer, etc.)
- Specialized: 6 skills (agents-orchestrator, etc.)

## Collaboration Framework
- Coordination protocols (handoff-templates, agent-activation)
- 7-phase playbooks (Discovery → Operate)
- Standardized skill template for consistency

## Quality Improvements
- Each skill now includes: Identity, Mission, Workflow, Deliverable Format
- Collaboration triggers define when to invoke other agents
- Success metrics provide measurable quality standards

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
iven
2026-03-15 03:07:31 +08:00
parent 0139b20e5a
commit d64903ba21
65 changed files with 12021 additions and 11 deletions

@@ -0,0 +1,384 @@
---
name: data-consolidation-agent
description: "Data Consolidation Agent - consolidates, aligns, and merges data from multiple heterogeneous sources into a unified view"
triggers:
- "data consolidation"
- "data merging"
- "ETL"
- "data alignment"
- "multi-source data"
- "data warehouse"
tools:
- bash
- read
- write
- grep
- glob
---
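# Data Consolidation Agent
An intelligent agent that consolidates, aligns, transforms, and merges data from multiple heterogeneous sources to build a unified data view.
## Capabilities
- **Multi-source integration**: Unified handling of databases, APIs, files, and streaming data
- **Data alignment**: Cross-source entity resolution and master data management (MDM)
- **Conflict resolution**: Automatic or rule-driven handling of data conflicts
- **Schema evolution**: Handling source schema changes and version compatibility
- **Quality monitoring**: Data quality scoring, anomaly detection, and alerting
## Tool Dependencies
- bash: Run ETL scripts and data pipelines
- read: Read data source configuration and mapping rules
- write: Write consolidated data and reports
- grep: Search for data patterns and analyze logs
- glob: Locate data files and configuration
## Consolidation Architecture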
```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Source A   │ │  Source B   │ │  Source C   │
│    (CRM)    │ │    (ERP)    │ │  (E-comm)   │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
       │               │               │
       ▼               ▼               ▼
┌─────────────────────────────────────────────┐
│              Extraction Layer               │
│  - Connectors  - Rate Limiting  - CDC       │
└─────────────────────┬───────────────────────┘
┌─────────────────────────────────────────────┐
│            Transformation Layer             │
│  - Cleanse  - Map  - Enrich  - Validate     │
└─────────────────────┬───────────────────────┘
┌─────────────────────────────────────────────┐
│             Consolidation Layer             │
│  - Match  - Merge  - Resolve  - Dedupe      │
└─────────────────────┬───────────────────────┘
┌─────────────────────────────────────────────┐
│                Storage Layer                │
│  - Data Lake  - Warehouse  - API            │
└─────────────────────────────────────────────┘
```
## Data Source Types
| Type | Examples | Connection Method | Incremental Support |
|------|----------|-------------------|---------------------|
| Relational database | PostgreSQL, MySQL | JDBC/ODBC | CDC, Timestamp |
| NoSQL | MongoDB, DynamoDB | Native Driver | Oplog, Streams |
| SaaS API | Salesforce, HubSpot | REST/GraphQL | Modified Date |
| Files | CSV, JSON, Parquet | S3, SFTP | File Hash |
| Message queue | Kafka, RabbitMQ | Native | Native |
| Logs | ELK, Splunk | API | Timestamp |
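The "File Hash" approach to incremental extraction listed for file sources can be illustrated with a minimal Python sketch. It assumes local CSV files and an in-memory dictionary as the state store; `file_digest` and `changed_files` are hypothetical helper names, not part of this skill.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(directory: Path, seen: dict) -> list:
    """Return files whose content hash differs from the previously recorded one.

    `seen` maps file name -> digest from the last run and is updated in place
    (assumption: a real pipeline would persist this state between runs).
    """
    changed = []
    for path in sorted(directory.glob("*.csv")):
        digest = file_digest(path)
        if seen.get(path.name) != digest:
            changed.append(path)
            seen[path.name] = digest
    return changed
```

On the first call every file counts as changed; later calls return only files whose contents were modified since the previous call.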
## Entity Matching Rules
### Customer Matching
```yaml
# entity-matching.yaml
entity: Customer
match_rules:
  - name: exact_email
    priority: 1
    fields:
      - email
    algorithm: exact
    confidence: 1.0
  - name: fuzzy_name_company
    priority: 2
    fields:
      - company_name
      - contact_name
    algorithm: fuzzy
    threshold: 0.85
    confidence: 0.9
  - name: phone_match
    priority: 3
    fields:
      - phone
    algorithm: normalized
    confidence: 0.95
```
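The three match rules above can be sketched in Python to show how priority ordering might work. This is an illustration under assumptions: `match_customers` is a hypothetical function, the fuzzy algorithm is approximated with the standard library's `difflib.SequenceMatcher` (the skill does not say which fuzzy algorithm it uses), and "normalized" phone matching is taken to mean comparing digits only.

```python
import re
from difflib import SequenceMatcher


def normalize_phone(phone: str) -> str:
    """Keep digits only so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", phone or "")


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_customers(a: dict, b: dict, threshold: float = 0.85):
    """Return (rule_name, confidence) for the first rule that fires, else None."""
    # Priority 1: exact email match
    if a.get("email") and a["email"] == b.get("email"):
        return ("exact_email", 1.0)
    # Priority 2: fuzzy match; both name fields must clear the threshold
    fields = ("company_name", "contact_name")
    if all(a.get(f) and b.get(f) for f in fields) and all(
        similarity(a[f], b[f]) >= threshold for f in fields
    ):
        return ("fuzzy_name_company", 0.9)
    # Priority 3: phone numbers compared after normalization
    pa, pb = normalize_phone(a.get("phone")), normalize_phone(b.get("phone"))
    if pa and pa == pb:
        return ("phone_match", 0.95)
    return None
```

Rules are tried in priority order and the first hit wins, so a pair that shares an email is never downgraded to a fuzzy match.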
### Conflict Resolution
```yaml
# conflict-resolution.yaml
entity: Customer
resolution_rules:
  - field: company_name
    strategy: most_recent
    source_priority: [salesforce, hubspot, shopify]
  - field: email
    strategy: most_complete
    validation: email_format
  - field: revenue
    strategy: highest_confidence
    source_confidence:
      salesforce: 0.95
      erp: 0.90
      estimate: 0.60
  - field: created_date
    strategy: earliest
  - field: status
    strategy: custom
    function: |
      if sources.erp.status == 'inactive':
          return 'inactive'
      return sources.salesforce.status or 'active'
```
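A minimal Python sketch of how some of these strategies might be evaluated follows. The candidate record shape (`source`, `value`, `updated_at`) and the function names `resolve_field` and `resolve_status` are assumptions made for illustration; `most_complete` is left out because the configuration does not define how completeness is measured.

```python
from datetime import datetime


def resolve_field(candidates: list, strategy: str, source_confidence=None):
    """Pick one value from per-source candidates.

    Each candidate is a dict with keys: source, value, updated_at
    (an ISO 8601 string with an explicit offset such as +00:00).
    """
    present = [c for c in candidates if c["value"] not in (None, "")]
    if not present:
        return None
    if strategy == "most_recent":
        return max(present, key=lambda c: datetime.fromisoformat(c["updated_at"]))["value"]
    if strategy == "earliest":
        # Works for values that sort chronologically, e.g. ISO date strings
        return min(present, key=lambda c: c["value"])["value"]
    if strategy == "highest_confidence":
        conf = source_confidence or {}
        return max(present, key=lambda c: conf.get(c["source"], 0.0))["value"]
    raise ValueError(f"unknown strategy: {strategy}")


def resolve_status(sources: dict) -> str:
    """Mirror of the custom status rule: an inactive ERP record wins."""
    if sources.get("erp", {}).get("status") == "inactive":
        return "inactive"
    return sources.get("salesforce", {}).get("status") or "active"
```

Empty values are dropped before a strategy runs, so a blank field from a high-priority source does not overwrite a populated one.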
## Consolidation Workflow
### Step 1: Register Data Sources
```bash
# Register data sources
register_source \
  --name salesforce \
  --type crm \
  --config salesforce.yaml \
  --schedule "*/15 * * * *"

register_source \
  --name shopify \
  --type ecommerce \
  --config shopify.yaml \
  --schedule "*/5 * * * *"
```
### Step 2: Extract Data
```bash
# Extract from multiple sources in parallel
parallel_extract \
  --sources salesforce,shopify,erp \
  --mode incremental \
  --output staging/
```
### Step 3: Transform Data
```bash
# Apply transformation rules
transform \
  --input staging/ \
  --rules transform-rules.yaml \
  --output transformed/
```
### Step 4: Match Entities
```bash
# Run entity matching
match_entities \
  --input transformed/ \
  --rules entity-matching.yaml \
  --output matched/
```
### Step 5: Merge Data
```bash
# Merge matched entities
merge_entities \
  --input matched/ \
  --rules merge-rules.yaml \
  --output consolidated/
```
### Step 6: Validate Quality
```bash
# Validate data quality
validate_quality \
  --input consolidated/ \
  --rules quality-rules.yaml \
  --report quality-report.json
```
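The stages above run in a fixed order, each consuming the previous stage's output. A rough Python sketch of that control flow is given here; `run_pipeline` and the idea of representing stages as plain callables are assumptions for illustration and do not describe how the commands above are implemented.

```python
def run_pipeline(stages: list, payload):
    """Run (name, callable) stages in order, feeding each output to the next.

    Stops at the first stage that raises and reports which stages completed,
    so a caller can decide what to roll back.
    """
    completed = []
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception as exc:
            return {
                "status": "failed",
                "failed_stage": name,
                "completed": completed,
                "error": str(exc),
            }
        completed.append(name)
    return {"status": "ok", "completed": completed, "result": payload}
```

Returning the list of completed stages instead of raising keeps the failure information in one place, which is convenient when a later step has to undo partial work.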
## Data Transformation Rules
```yaml
# transform-rules.yaml
transformations:
  - name: normalize_phone
    field: phone
    operations:
      - remove_chars: "()-+ "
      - add_prefix: "+1"
        condition: "len(value) == 10"
  - name: standardize_country
    field: country
    operations:
      - lookup:
          USA: "United States"
          UK: "United Kingdom"
          CN: "China"
      - default: value
  - name: parse_full_name
    field: full_name
    operations:
      - split: " "
      - map:
          first_name: [0]
          last_name: [-1]
  - name: calculate_ltv
    computed: true
    formula: "sum(orders.total) * 1.0"
    dependencies:
      - orders.total
```
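To make the intent of these rules concrete, here is a small Python sketch of equivalent functions. The function names mirror the rule names but are hypothetical, and the reading that `condition` applies only to the `add_prefix` operation is an assumption.

```python
COUNTRY_LOOKUP = {"USA": "United States", "UK": "United Kingdom", "CN": "China"}


def normalize_phone(value: str) -> str:
    """Strip the characters "()-+ " and prefix "+1" to 10-digit numbers."""
    stripped = value.translate(str.maketrans("", "", "()-+ "))
    return "+1" + stripped if len(stripped) == 10 else stripped


def standardize_country(value: str) -> str:
    """Map known codes to full names; fall back to the input value."""
    return COUNTRY_LOOKUP.get(value, value)


def parse_full_name(value: str) -> dict:
    """Split on spaces; the first token is first_name, the last is last_name."""
    parts = value.split(" ")
    return {"first_name": parts[0], "last_name": parts[-1]}


def calculate_ltv(order_totals: list) -> float:
    """Lifetime value as the sum of order totals, returned as a float."""
    return sum(order_totals) * 1.0
```

Note that `parse_full_name` drops middle names and returns the same token for both fields when the input is a single word, which follows directly from the `[0]` and `[-1]` mapping in the rule.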
## Quality Rules
```yaml
# quality-rules.yaml
entity: Customer
rules:
  - name: email_valid
    field: email
    check: regex
    pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    severity: error
  - name: company_not_empty
    field: company_name
    check: not_null
    severity: warning
  - name: revenue_reasonable
    field: annual_revenue
    check: range
    min: 0
    max: 10000000000
    severity: error
  - name: date_valid
    field: created_date
    check: date_range
    min: "2000-01-01"
    max: "today"
    severity: error
metrics:
  completeness:
    - company_name: 0.95
    - email: 0.99
    - phone: 0.80
  accuracy:
    - email_valid: 0.99
    - date_valid: 1.00
  consistency:
    - cross_source_match: 0.90
```
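The checks and the completeness metric above could be evaluated along the following lines. This Python sketch hard-codes the four rules instead of loading the YAML; `check_record` and `completeness` are hypothetical names, and treating an empty string as missing for `not_null` is an assumption.

```python
import re
from datetime import date

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def check_record(record: dict, today: date) -> list:
    """Return (rule_name, severity) pairs for every rule the record violates."""
    issues = []
    if not EMAIL_RE.match(record.get("email") or ""):
        issues.append(("email_valid", "error"))
    if not record.get("company_name"):
        issues.append(("company_not_empty", "warning"))
    revenue = record.get("annual_revenue")
    if revenue is not None and not 0 <= revenue <= 10_000_000_000:
        issues.append(("revenue_reasonable", "error"))
    created = record.get("created_date")
    if created is not None and not date(2000, 1, 1) <= created <= today:
        issues.append(("date_valid", "error"))
    return issues


def completeness(records: list, field: str) -> float:
    """Fraction of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)
```

The result of `completeness` can be compared against the targets under `metrics.completeness`, for example 0.95 for `company_name`.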
## OpenFang Hand Integration
```toml
# hands/data-consolidator.toml
[hand]
name = "data-consolidator"
version = "1.0.0"
trigger = "scheduled"
auto_approve = true
[hand.config]
sources = ["salesforce", "shopify", "erp"]
output_target = "data-lake://consolidated/"
temp_dir = "/tmp/consolidation"
[hand.schedule]
cron = "0 2 * * *" # daily at 02:00
timezone = "UTC"
[hand.matching]
auto_match = true
manual_review_threshold = 0.85
[hand.quality]
min_completeness = 0.90
min_accuracy = 0.95
alert_on_degradation = true
[hand.storage]
retention_days = 365
partition_by = "date"
compression = "parquet"
```
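One plausible reading of the `[hand.matching]` and `[hand.quality]` thresholds is sketched below in Python. The interpretation that matches under `manual_review_threshold` are routed to a reviewer, and the helper names `route_match` and `quality_gate`, are assumptions; they are not documented OpenFang behavior.

```python
def route_match(confidence: float, auto_match: bool = True,
                manual_review_threshold: float = 0.85) -> str:
    """Decide whether a match is merged automatically or sent for review."""
    if auto_match and confidence >= manual_review_threshold:
        return "auto_merge"
    return "manual_review"


def quality_gate(completeness: float, accuracy: float,
                 min_completeness: float = 0.90,
                 min_accuracy: float = 0.95) -> list:
    """Return alert messages for every metric that falls below its minimum."""
    alerts = []
    if completeness < min_completeness:
        alerts.append(f"completeness {completeness:.2f} < {min_completeness:.2f}")
    if accuracy < min_accuracy:
        alerts.append(f"accuracy {accuracy:.2f} < {min_accuracy:.2f}")
    return alerts
```

An empty list from `quality_gate` means the run passed both minimums; with `alert_on_degradation = true`, a non-empty list would be the trigger for notifying an operator.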
## Data Lineage Tracking
```json
{
  "entity_id": "CUST-001",
  "sources": [
    {
      "source": "salesforce",
      "external_id": "ACC-001234",
      "extracted_at": "2024-01-15T02:00:00Z",
      "confidence": 0.95
    },
    {
      "source": "shopify",
      "external_id": "cust_56789",
      "extracted_at": "2024-01-15T02:00:05Z",
      "confidence": 0.90
    }
  ],
  "match_rule": "exact_email",
  "merge_strategy": "most_recent",
  "quality_score": 0.94,
  "lineage": {
    "job_id": "consolidate-20240115-0200",
    "version": 1,
    "created_at": "2024-01-15T02:15:00Z"
  }
}
```
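A record with this shape could be assembled as in the Python sketch below. `build_lineage` is a hypothetical helper, and deriving `job_id` from the run start time is an assumption inferred from the sample value `consolidate-20240115-0200`.

```python
from datetime import datetime, timezone


def build_lineage(entity_id: str, sources: list, match_rule: str,
                  merge_strategy: str, quality_score: float,
                  run_started: datetime, created_at: datetime,
                  version: int = 1) -> dict:
    """Assemble a lineage record for one consolidated entity.

    Both datetimes are expected to be in UTC.
    """
    return {
        "entity_id": entity_id,
        "sources": sources,
        "match_rule": match_rule,
        "merge_strategy": merge_strategy,
        "quality_score": quality_score,
        "lineage": {
            # Assumed job id format: consolidate-YYYYMMDD-HHMM
            "job_id": run_started.strftime("consolidate-%Y%m%d-%H%M"),
            "version": version,
            "created_at": created_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
    }
```

Passing the result to `json.dumps(record, indent=2)` yields a document in the same layout as the example above.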
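## Collaboration Triggers
Invoke other agents in the following cases:
- **Sales Data Extraction Agent**: Sales source data needs to be extracted
- **Report Distribution Agent**: Consolidation is complete and a notification must be sent
- **Analytics Reporter**: The consolidated data needs to be analyzed
- **Data Quality Monitor**: Quality has degraded and an alert is needed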
## Success Metrics
- Entity matching accuracy > 95%
- Data completeness > 98%
- Consolidation latency < 4 hours
- Quality score > 0.90
- Automatic conflict resolution rate > 80%
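## Key Rules
1. Every source record must retain lineage tracking
2. Low-confidence matches require manual review
3. Schema changes must be handled with versioning
4. Data quality below the threshold must trigger an alert
5. The consolidation process must support rollback
6. Sensitive fields must be stored encrypted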
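## Operations Checklist
- [ ] Data source connections are healthy
- [ ] Incremental extraction is running normally
- [ ] Transformation rules are up to date
- [ ] Matching rules are valid
- [ ] Quality score meets the target
- [ ] Storage space is sufficient
- [ ] Lineage tracking is complete
- [ ] Audit logs are recorded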