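Context Compaction System

Maturity: L4 - Production (integrated into the kernel AgentLoop; duplicate frontend compaction removed)
Last updated: 2026-04-01
Owner: Intelligence Layer Team

Overview

The context compaction system addresses the core challenge of unbounded conversation length:

Token limit management - monitors conversation length so it never exceeds the model's limit
Smart summarization - compresses older conversation history into a concise summary
Information retention - keeps key decisions, preferences, and context from being lost
Transparent compaction - users never have to manage conversation history by hand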

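Core concepts

Compaction configuration (CompactionConfig)

interface CompactionConfig {
  soft_threshold_tokens: number;      // soft threshold (triggers a compaction suggestion)
  hard_threshold_tokens: number;      // hard threshold (forces compaction)
  reserve_tokens: number;             // tokens reserved for the response
  memory_flush_enabled: boolean;      // whether to flush memories before compacting
  keep_recent_messages: number;       // number of most recent messages to keep
  summary_max_tokens: number;         // maximum number of tokens in the summary
  use_llm: boolean;                   // whether to generate the summary with an LLM
  llm_fallback_to_rules: boolean;     // fall back to rule-based summary if the LLM fails
}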
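
The two flags use_llm and llm_fallback_to_rules read naturally as a try-then-fall-back policy. The sketch below is one possible interpretation of that policy, not the project's implementation: summarizeOld and both summarizer callbacks are hypothetical names introduced purely for illustration.

```typescript
// Hypothetical sketch of how `use_llm` and `llm_fallback_to_rules` might interact.
// None of the function names below come from the project.
interface SummaryFlags {
  use_llm: boolean;
  llm_fallback_to_rules: boolean;
}

type LlmSummarizer = (texts: string[]) => Promise<string>;
type RuleSummarizer = (texts: string[]) => string;

async function summarizeOld(
  texts: string[],
  flags: SummaryFlags,
  llm: LlmSummarizer,
  rules: RuleSummarizer,
): Promise<string> {
  // Rule-based summarization is used whenever the LLM path is disabled.
  if (!flags.use_llm) return rules(texts);
  try {
    return await llm(texts);
  } catch (err) {
    // Swallow the LLM failure only when the fallback flag allows it.
    if (flags.llm_fallback_to_rules) return rules(texts);
    throw err;
  }
}
```

With llm_fallback_to_rules set to false, an LLM failure would surface to the caller instead of silently degrading the summary, which may be preferable while testing.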

Compaction check (CompactionCheck)

interface CompactionCheck {
  should_compact: boolean;            // whether compaction is needed
  current_tokens: number;             // current token count
  threshold: number;                  // threshold that was applied
  urgency: 'none' | 'soft' | 'hard';  // urgency level
}
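
To make the relationship between the two thresholds and the urgency field concrete, here is a minimal sketch of a threshold check. It rests on assumptions the source does not confirm: that comparisons are inclusive (>=), that should_compact is already true at the soft threshold, and that threshold reports whichever limit was applied. The real logic lives in compactor.rs and may differ.

```typescript
// Sketch only: assumed semantics, not the project's implementation.
type Urgency = 'none' | 'soft' | 'hard';

interface ThresholdConfig {
  soft_threshold_tokens: number;
  hard_threshold_tokens: number;
}

interface CompactionCheck {
  should_compact: boolean;
  current_tokens: number;
  threshold: number;
  urgency: Urgency;
}

function checkThreshold(currentTokens: number, config: ThresholdConfig): CompactionCheck {
  // Test the hard threshold first so it takes priority over the soft one.
  if (currentTokens >= config.hard_threshold_tokens) {
    return {
      should_compact: true,
      current_tokens: currentTokens,
      threshold: config.hard_threshold_tokens,
      urgency: 'hard',
    };
  }
  if (currentTokens >= config.soft_threshold_tokens) {
    return {
      should_compact: true,
      current_tokens: currentTokens,
      threshold: config.soft_threshold_tokens,
      urgency: 'soft',
    };
  }
  // Below both thresholds: report the next limit the conversation would hit.
  return {
    should_compact: false,
    current_tokens: currentTokens,
    threshold: config.soft_threshold_tokens,
    urgency: 'none',
  };
}
```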

Compaction result (CompactionResult)

interface CompactionResult {
  compacted_messages: CompactableMessage[];  // message list after compaction
  summary: string;                           // generated summary
  original_count: number;                    // number of messages before compaction
  retained_count: number;                    // number of messages retained
  flushed_memories: number;                  // number of memories flushed
  tokens_before_compaction: number;          // token count before compaction
  tokens_after_compaction: number;           // token count after compaction
}
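
The before/after fields make it easy to report how much a compaction actually saved. The helper below is a small illustrative sketch; describeSavings is a hypothetical name, and the numbers in the usage note are made up for the example rather than measured.

```typescript
// Illustrative helper, not part of the project's API.
interface CompactionStats {
  original_count: number;
  retained_count: number;
  tokens_before_compaction: number;
  tokens_after_compaction: number;
}

interface Savings {
  savedTokens: number;      // tokens removed by compaction
  ratio: number;            // after / before (lower means more compression)
  foldedMessages: number;   // messages replaced by the summary
}

function describeSavings(stats: CompactionStats): Savings {
  const before = stats.tokens_before_compaction;
  const after = stats.tokens_after_compaction;
  return {
    savedTokens: before - after,
    // Guard against division by zero for an empty conversation.
    ratio: before === 0 ? 1 : after / before,
    foldedMessages: stats.original_count - stats.retained_count,
  };
}
```

For instance, a result with 18000 tokens before and 4500 after, going from 30 messages to 6 retained, would give savedTokens 13500, ratio 0.25, and foldedMessages 24.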

Compaction flow

┌─────────────────────────────────────────────────────────┐
│                   Context Compaction                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   ┌──────────────┐                                      │
│   │ New message  │                                      │
│   └──────┬───────┘                                      │
│          │                                              │
│          ▼                                              │
│   ┌──────────────┐ soft_threshold┌────────────────────┐ │
│   │ Count tokens │──────────────▶│ Suggest compaction │ │
│   └──────┬───────┘               └────────────────────┘ │
│          │                                              │
│          │ hard_threshold                               │
│          ▼                                              │
│   ┌──────────────────┐                                  │
│   │ Force compaction │                                  │
│   └──────┬───────────┘                                  │
│          │                                              │
│          ▼                                              │
│   ┌────────────────────────────────────────────────┐    │
│   │ 1. Keep the most recent N messages             │    │
│   │ 2. Summarize the older messages                │    │
│   │ 3. Optional: extract memories to Memory Store  │    │
│   │ 4. Replace the older messages with the summary │    │
│   └────────────────────────────────────────────────┘    │
│          │                                              │
│          ▼                                              │
│   ┌──────────────────┐                                  │
│   │ Compaction done  │                                  │
│   └──────────────────┘                                  │
│                                                         │
└─────────────────────────────────────────────────────────┘
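
Steps 1, 2, and 4 of the flow can be sketched in a few lines. This is a simplified illustration under stated assumptions: the shape of CompactableMessage (role plus content) is not shown in this document and is assumed here, the summary is inserted as a single 'system' message, retained_count is taken to mean the number of recent messages kept, and the summarizer is passed in as a callback. Step 3 (memory extraction) is left out.

```typescript
// Assumed message shape; the real CompactableMessage may carry more fields.
interface CompactableMessage {
  role: string;
  content: string;
}

interface CompactionOutcome {
  compacted_messages: CompactableMessage[];
  summary: string;
  original_count: number;
  retained_count: number;
}

function compactMessages(
  messages: CompactableMessage[],
  keepRecent: number,
  summarize: (older: CompactableMessage[]) => string,
): CompactionOutcome {
  // Step 1: split off the most recent N messages.
  const splitAt = Math.max(0, messages.length - keepRecent);
  const older = messages.slice(0, splitAt);
  const recent = messages.slice(splitAt);

  // Nothing old enough to summarize: return the conversation unchanged.
  if (older.length === 0) {
    return {
      compacted_messages: recent,
      summary: '',
      original_count: messages.length,
      retained_count: recent.length,
    };
  }

  // Step 2: summarize the older messages.
  const summary = summarize(older);

  // Step 4: replace the older messages with one summary message.
  const summaryMessage: CompactableMessage = { role: 'system', content: summary };
  return {
    compacted_messages: [summaryMessage, ...recent],
    summary,
    original_count: messages.length,
    retained_count: recent.length,
  };
}
```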

Token estimation algorithm

Mixed CJK + English estimation

// Rust implementation (compactor.rs)
pub fn estimate_tokens(text: &str) -> usize {
    let mut tokens: f64 = 0.0;
    for ch in text.chars() {
        tokens += match ch as u32 {
            // CJK Unified Ideographs and Extension A: 1.5 tokens each
            0x4E00..=0x9FFF | 0x3400..=0x4DBF => 1.5,
            // CJK symbols and punctuation: 1.0 token each
            0x3000..=0x303F => 1.0,
            // Space, newline, tab: 0.25 token each
            0x20 | 0x0A | 0x09 => 0.25,
            // Everything else (ASCII and any other character): 0.3 token each,
            // slightly above the usual "4 chars ≈ 1 token" (0.25) rule of thumb
            _ => 0.3,
        };
    }
    // Round up so the estimate never falls below the accumulated total
    tokens.ceil() as usize
}

Design principle: overestimate rather than underestimate. An overestimate triggers compaction earlier than strictly necessary, but it cannot cause the API to reject a request for exceeding the context limit.
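
For quick experiments outside the Tauri runtime, the same per-character weights can be mirrored in TypeScript. This port is a sketch written for illustration, not code from the project; the frontend normally obtains the estimate through the compactor_estimate_tokens command.

```typescript
// Illustrative TypeScript mirror of the Rust estimator above (not project code).
function estimateTokens(text: string): number {
  let tokens = 0;
  // for...of iterates by Unicode code point, matching Rust's chars() for valid text.
  for (const ch of text) {
    const code = ch.codePointAt(0) ?? 0;
    if ((code >= 0x4e00 && code <= 0x9fff) || (code >= 0x3400 && code <= 0x4dbf)) {
      tokens += 1.5; // CJK ideographs
    } else if (code >= 0x3000 && code <= 0x303f) {
      tokens += 1.0; // CJK symbols and punctuation
    } else if (ch === ' ' || ch === '\n' || ch === '\t') {
      tokens += 0.25; // whitespace
    } else {
      tokens += 0.3; // ASCII and everything else
    }
  }
  return Math.ceil(tokens);
}
```

Worked by hand: 'hello world' is ten letters (10 × 0.3) plus one space (0.25), or 3.25, which rounds up to 4. '你好。' is two ideographs (2 × 1.5) plus one CJK full stop (1.0), giving exactly 4.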

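Summary generation

Rule-based summary (current implementation)

fn generate_summary(&self, messages: &[CompactableMessage]) -> String {
    let mut sections: Vec<String> = vec!["[Summary of the earlier conversation]".to_string()];

    // Split by speaker. Assumes CompactableMessage has a `role: String` field with
    // the values "user" / "assistant"; its definition is not shown in this document.
    let user_messages: Vec<&CompactableMessage> =
        messages.iter().filter(|m| m.role == "user").collect();
    let assistant_messages: Vec<&CompactableMessage> =
        messages.iter().filter(|m| m.role == "assistant").collect();

    // 1. Extract the topics discussed
    let topics = extract_topics(&user_messages);
    sections.push(format!("Topics discussed: {}", topics.join("; ")));

    // 2. Extract key conclusions
    let conclusions = extract_conclusions(&assistant_messages);
    sections.push(format!("Key conclusions:\n- {}", conclusions.join("\n- ")));

    // 3. Extract technical context (code snippets, etc.)
    let tech_context = extract_technical_context(messages);
    sections.push(format!("Technical context: {}", tech_context.join(", ")));

    // 4. Statistics
    sections.push(format!(
        "({} messages compacted: {} user, {} assistant)",
        messages.len(),
        user_messages.len(),
        assistant_messages.len()
    ));

    sections.join("\n")
}

Summary example

[Summary of the earlier conversation]
Topics discussed: How to implement an async HTTP server in Rust; performance tuning advice
Key conclusions:
- Use tokio as the async runtime
- Consider a connection pool to reduce overhead
- Enable HTTP/2 support to improve performance
Technical context: code snippet (rust), code snippet (toml)
(24 messages compacted: 12 user, 12 assistant)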

Technical implementation

Core files

File	Purpose
`desktop/src-tauri/src/intelligence/compactor.rs`	Core compaction implementation in Rust
`desktop/src/lib/intelligence-backend.ts`	TypeScript API wrapper
`desktop/src/domains/intelligence/store.ts`	State management

Tauri Commands

// Command signatures only; function bodies are omitted.
#[tauri::command]
pub fn compactor_estimate_tokens(text: String) -> usize;

#[tauri::command]
pub fn compactor_estimate_messages_tokens(messages: Vec<CompactableMessage>) -> usize;

#[tauri::command]
pub fn compactor_check_threshold(
    messages: Vec<CompactableMessage>,
    config: Option<CompactionConfig>,
) -> CompactionCheck;

#[tauri::command]
pub fn compactor_compact(
    messages: Vec<CompactableMessage>,
    agent_id: String,
    conversation_id: Option<String>,
    config: Option<CompactionConfig>,
) -> CompactionResult;

Frontend API

// intelligence-backend.ts (type-level view of the exported API)
export declare const compactor: {
  estimateTokens(text: string): Promise<number>;
  estimateMessagesTokens(messages: CompactableMessage[]): Promise<number>;
  checkThreshold(messages: CompactableMessage[], config?: CompactionConfig): Promise<CompactionCheck>;
  compact(messages: CompactableMessage[], agentId: string, conversationId?: string, config?: CompactionConfig): Promise<CompactionResult>;
};
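
A wrapper like this typically forwards each call to the matching Tauri command. The sketch below shows one way that could look; it is an assumption about the wiring, not the contents of intelligence-backend.ts. The invoke function is injected (in a real app it would be Tauri's invoke, whose import path depends on the Tauri version) so the wrapper can be exercised without a Tauri runtime, and createCompactor is a hypothetical name.

```typescript
// Sketch only: `createCompactor` is hypothetical; `invoke` is injected for testability.
type Invoke = (cmd: string, args?: Record<string, unknown>) => Promise<unknown>;

// Assumed message shape; the real CompactableMessage is not shown in this document.
interface CompactableMessage {
  role: string;
  content: string;
}

function createCompactor(invoke: Invoke) {
  return {
    estimateTokens(text: string): Promise<number> {
      return invoke('compactor_estimate_tokens', { text }) as Promise<number>;
    },
    compact(
      messages: CompactableMessage[],
      agentId: string,
      conversationId?: string,
      config?: unknown,
    ): Promise<unknown> {
      // By default Tauri expects camelCase argument keys on the JavaScript side,
      // which it maps to the snake_case parameters of the Rust command.
      return invoke('compactor_compact', { messages, agentId, conversationId, config });
    },
  };
}
```

Injecting invoke also makes it straightforward to unit-test callers with a fake that records the command names and arguments.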

Usage scenarios

Scenario 1: automatic compaction

// Check before sending a message
const check = await intelligence.compactor.checkThreshold(messages);

if (check.urgency === 'hard') {
  // Hard threshold reached: compact immediately
  const result = await intelligence.compactor.compact(messages, agentId);
  setMessages(result.compacted_messages);
  console.log(`Compaction done: ${result.tokens_before_compaction} → ${result.tokens_after_compaction} tokens`);
} else if (check.urgency === 'soft') {
  // Soft threshold reached: suggest compaction to the user, or wait
  showCompactionSuggestion();
}

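Scenario 2: manual compaction

// Compaction triggered explicitly by the user
const result = await intelligence.compactor.compact(
  messages,
  agentId,
  conversationId,
  {
    // Only the overridden fields are shown. CompactionConfig as declared above
    // has no optional fields, so a complete config object may be required here.
    soft_threshold_tokens: 12000,
    keep_recent_messages: 10,
  }
);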

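Scenario 3: compaction + memory extraction

// Extract memories before compacting
if (config.memory_flush_enabled) {
  // Older messages = everything before the most recent keep_recent_messages
  const oldMessages = messages.slice(0, Math.max(0, messages.length - config.keep_recent_messages));
  const memories = await extractMemoriesFromOldMessages(oldMessages);
  for (const memory of memories) {
    await intelligence.memory.store(memory);
  }
}

// Then run the compaction
const result = await intelligence.compactor.compact(messages, agentId);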

Integration with other components

┌─────────────────────────────────────────────────────┐
│                  Context Compactor                  │
├─────────────────────────────────────────────────────┤
│                                                     │
│   ┌──────────────┐     ┌───────────────────┐        │
│   │ ChatStore    │────▶│ Token check       │        │
│   └──────────────┘     └───────────────────┘        │
│          │                    │                     │
│          │                    ▼                     │
│   ┌──────────────┐     ┌───────────────────┐        │
│   │ Memory Store │◀────│ Memory extraction │        │
│   └──────────────┘     └───────────────────┘        │
│          │                                          │
│          │                                          │
│          ▼                                          │
│   ┌──────────────────────────────────────────────┐  │
│   │ Summary generation                           │  │
│   │   - Rule-based extraction (current)          │  │
│   │   - LLM summary (optional)                   │  │
│   └──────────────────────────────────────────────┘  │
│          │                                          │
│          ▼                                          │
│   ┌────────────────────┐                            │
│   │ Compacted messages │                            │
│   └────────────────────┘                            │
│                                                     │
└─────────────────────────────────────────────────────┘

Configuration examples

Development mode (frequent compaction for testing)

{
  soft_threshold_tokens: 5000,
  hard_threshold_tokens: 8000,
  reserve_tokens: 2000,
  memory_flush_enabled: true,
  keep_recent_messages: 4,
  summary_max_tokens: 400,
  use_llm: false,
  llm_fallback_to_rules: true,
}

Production mode (longer context)

{
  soft_threshold_tokens: 15000,
  hard_threshold_tokens: 20000,
  reserve_tokens: 4000,
  memory_flush_enabled: true,
  keep_recent_messages: 6,
  summary_max_tokens: 800,
  use_llm: false,
  llm_fallback_to_rules: true,
}

Large-context mode (32K models)

{
  soft_threshold_tokens: 25000,
  hard_threshold_tokens: 30000,
  reserve_tokens: 6000,
  memory_flush_enabled: true,
  keep_recent_messages: 10,
  summary_max_tokens: 1200,
  use_llm: true,  // request LLM summaries (not yet implemented; see Current limitations)
  llm_fallback_to_rules: true,
}
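
When tuning these numbers for a particular model, a quick sanity check of the token budget can help. The sketch below assumes one possible reading of the fields, which the source does not spell out: that reserve_tokens is added on top of hard_threshold_tokens, so their sum should fit inside the model's context window. checkBudget and the contextWindow parameter are hypothetical.

```typescript
// Sketch under an assumed reading of the config fields; not project code.
interface BudgetConfig {
  soft_threshold_tokens: number;
  hard_threshold_tokens: number;
  reserve_tokens: number;
}

function checkBudget(config: BudgetConfig, contextWindow: number): string[] {
  const problems: string[] = [];
  // The soft threshold should fire before the hard one.
  if (config.soft_threshold_tokens >= config.hard_threshold_tokens) {
    problems.push('soft threshold must be below hard threshold');
  }
  // Assumption: history at the hard threshold plus the reserved reply must fit the window.
  const worstCase = config.hard_threshold_tokens + config.reserve_tokens;
  if (worstCase > contextWindow) {
    problems.push(`hard threshold + reserve (${worstCase}) exceeds context window (${contextWindow})`);
  }
  return problems;
}
```

Under that reading, the production example totals 24000 tokens and fits a 32768-token window, while the large-context example totals 36000 and would not. If reserve_tokens is instead meant to be counted inside the hard threshold, the check does not apply as written, so it is worth confirming the intended semantics in compactor.rs.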

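Limitations and future improvements

Current limitations

Limited rule-based summary quality - the rules cannot capture complex semantics and may drop important details
No incremental compaction - every run reprocesses all older messages
No compaction preview - users cannot review the summary before compaction is applied
LLM summaries not implemented - the use_llm: true option exists but currently has no effect

Future improvements

LLM-enhanced summaries - use a lightweight model to produce higher-quality summaries
Incremental compaction - process only newly added messages and keep the previous summary
Compaction preview - show the summary and let the user edit it
Smart retention - decide which messages to keep based on their importance
Compaction history - store compaction records so earlier states can be traced back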
