Commit Graph

22 Commits

Author SHA1 Message Date
iven
26a833d1c8 fix: resolve 17 P2 defects and 5 P3 defects from pre-launch audit
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Batch fix covering multiple modules:
- P2-01: HandRegistry Semaphore-based max_concurrent enforcement
- P2-03: Populate toolCount/metricCount from Hand trait methods
- P2-06: heartbeat_update_config minimum interval validation
- P2-07: ReflectionResult used_fallback marker for rule-based fallback
- P2-08/09: identity_propose_change parameter naming consistency
- P2-10: ClassroomMetadata is_placeholder flag for LLM failure
- P2-11: classroomStore userDidCloseDuringGeneration intent tracking
- P2-12: workflowStore pipeline_create sends actionType
- P2-13/14: PipelineInfo step_count + PipelineStepInfo for proper step mapping
- P2-15: Pipe transform support in context.resolve (8 transforms)
- P2-16: Mustache {{...}} → \${...} auto-normalization
- P2-17: SaaSLogin password placeholder 6→8
- P2-19: serialize_skill_md + update_skill preserve tools field
- P2-22: ToolOutputGuard sensitive patterns from warn→block
- P2-23: Mutex::unwrap() → unwrap_or_else in relay/service.rs
- P3-01/03/07/08/09: Various P3 fixes
- DEFECT_LIST.md: comprehensive status sync (43/51 fixed, 8 remaining)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 00:49:16 +08:00
iven
90855dc83e fix(desktop): resolve 2 release-blocking P1 defects
P1-04: GenerationPipeline hardcoded model="default" causing classroom
generation 404. Added model field to GenerationPipeline struct, passed
from kernel config via with_driver(driver, model). Static scene
generation now receives model parameter.

P1-03: LLM API concurrent 500 DATABASE_ERROR. Added transient DB error
retry (PoolTimedOut/Io) in create_relay_task with 200ms backoff.
Recommend setting ZCLAW_DB_MIN_CONNECTIONS=10 for burst resilience.
2026-04-05 19:18:41 +08:00
iven
5c48d62f7e fix(saas): harden model group failover + relay reliability
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- cache: insert-then-retain pattern avoids empty-window race during refresh
- relay: manage_task_status flag for proper failover state transitions
- relay: retry_task re-resolves model groups instead of blind provider reuse
- relay: filter empty-member groups from available models list
- relay: quota cache stale entry cleanup (TTL 5x expiry)
- error: from_sqlx_unique helper for 409 vs 500 distinction
- model_config: unique constraint handling, duplicate member check
- model_config: failover_strategy whitelist, model_id vs group name conflict check
- model_config: group-scoped member removal with group_id validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 12:26:55 +08:00
iven
be0a78a523 feat(saas): add model groups for cross-provider failover
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Model Groups provide logical model names that map to multiple physical
models across providers, with automatic failover when one provider's
key pool is exhausted.

Backend:
- New model_groups + model_group_members tables with FK constraints
- Full CRUD API (7 endpoints) with admin-only write permissions
- Cache layer: DashMap-backed CachedModelGroup with load_from_db
- Relay integration: ModelResolution enum for Direct/Group routing
- Cross-provider failover: sort_candidates_by_quota + OnceLock cache
- Relay failure path: record failure usage + relay_dequeue (fixes
  queue counter leak that caused connection pool exhaustion)
- add_group_member: validate model_id exists before insert

Frontend:
- saas-relay-client: accept getModel() callback for dynamic model selection
- connectionStore: prefer conversationStore.currentModel over first available

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 09:56:21 +08:00
iven
1048901665 fix(saas): industry template audit fixes + pgvector optional + relay timeout
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- Fix seed template tools to match actual runtime tool names
  (file_read/file_write/shell_exec/web_fetch)
- Persist system_prompt/temperature/max_tokens via identity system
  in agentStore.createFromTemplate()
- Fire-and-forget assignTemplate() in AgentOnboardingWizard
- Fix saas-relay-client unused variable warning
- Make pgvector extension optional in knowledge_base migration
- Increase StreamBridge timeout from 30s to 90s for thinking models
2026-04-03 15:10:13 +08:00
iven
52bdafa633 refactor(crates): kernel/generation module split + DeerFlow optimizations + middleware + dead code cleanup
- Split zclaw-kernel/kernel.rs (1486 lines) into 9 domain modules
- Split zclaw-kernel/generation.rs (1080 lines) into 3 modules
- Add DeerFlow-inspired middleware: DanglingTool, SubagentLimit, ToolError, ToolOutputGuard
- Add PromptBuilder for structured system prompt assembly
- Add FactStore (zclaw-memory) for persistent fact extraction
- Add task builtin tool for agent task management
- Driver improvements: Anthropic/OpenAI extended thinking, Gemini safety settings
- Replace let _ = with proper log::warn! across SaaS handlers
- Remove unused dependency (url) from zclaw-hands
2026-04-03 00:28:03 +08:00
iven
28299807b6 fix(desktop): DeerFlow UI — ChatArea refactor + ai-elements + dead CSS cleanup
ChatArea retry button uses setInput instead of direct sendToGateway,
fix bootstrap spinner stuck for non-logged-in users,
remove dead CSS (aurora-title/sidebar-open/quick-action-chips),
add ai components (ReasoningBlock/StreamingText/ChatMode/ModelSelector/TaskProgress),
add ClassroomPlayer + ResizableChatLayout + artifact panel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 19:24:44 +08:00
iven
2ff696289f fix(saas): reduce DB connection pool pressure in relay path
Some checks failed
CI / Rust Check (push) Has been cancelled
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
1. key_pool: merge 3 serial UPDATE queries into 2 (cumulative stats +
   last_used_at combined into single UPDATE)
2. service: reduce SSE spawn sleep from 3s to 500ms and add 5s timeout
   on DB operations to prevent connection hoarding
2026-03-31 13:47:43 +08:00
iven
3e5d64484e fix(relay): fix llm_routing read path bug and add User-Agent header for Coding Plan
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
1. connectionStore.ts: storedAccount.account.llm_routing → storedAccount.llm_routing
   - saveSaaSSession stores SaaSAccountInfo directly, not { account: SaaSAccountInfo }
   - This bug caused admin llm_routing config to never take effect

2. relay/service.rs: add User-Agent: claude-code/1.0 header
   - Kimi Coding Plan requires recognized coding agent User-Agent
   - Default reqwest UA is rejected with 403

3. Docs: add llm_routing routing mode explanation and troubleshooting entries
2026-03-31 12:02:32 +08:00
iven
eb956d0dce feat: 新增管理后台前端项目及安全加固
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
refactor(saas): 重构认证中间件与限流策略
- 登录限流调整为5次/分钟/IP
- 注册限流调整为3次/小时/IP
- GET请求不计入限流

fix(saas): 修复调度器时间戳处理
- 使用NOW()替代文本时间戳
- 兼容TEXT和TIMESTAMPTZ列类型

feat(saas): 实现环境变量插值
- 支持${ENV_VAR}语法解析
- 数据库密码支持环境变量注入

chore: 新增前端管理界面
- 基于React+Ant Design Pro
- 包含路由守卫/错误边界
- 对接58个API端点

docs: 更新安全加固文档
- 新增密钥管理规范
- 记录P0安全项审计结果
- 补充TLS终止说明

test: 完善配置解析单元测试
- 新增环境变量插值测试用例
2026-03-31 00:11:33 +08:00
iven
544358764e fix(relay): 移除 SSE usage 记录中重复的 sleep
service.rs L316-317 有两行相同的 tokio::time::sleep(3s),
导致 SSE 流结束后实际等待 6 秒而非 3 秒才记录 usage。
2026-03-30 14:26:22 +08:00
iven
ba2c6a6105 fix(saas): P1 审计修复 — 连接池断路器 + Worker重试 + XSS防护 + 状态机SQL解析器
P1 修复内容:
- F7: health handler 连接池容量检查 (80%阈值返回503 degraded)
- F9: SSE spawned task 并发限制 (Semaphore 16 permits)
- F10: Key Pool 单次 JOIN 查询优化 (消除 N+1)
- F12: CORS panic → 配置错误
- F14: 连接池使用率计算修正 (ratio = used*100/total)
- F15: SQL 迁移解析器替换为状态机 (支持 $$, DO $body$, 存储过程)
- Worker 重试机制: 失败任务通过 mpsc channel 重新入队
- DOMPurify XSS 防护 (PipelineResultPreview)
- Admin V2: ErrorBoundary + SWR全局配置 + 请求优化
2026-03-30 14:21:39 +08:00
iven
834aa12076 fix: P0 panic风险修复 + P1编译warnings清零 + P2代码/文档清理
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
P0 安全性:
- account/handlers.rs: .unwrap() → .expect() 语义化错误信息
- relay/handlers.rs: SSE Response .unwrap() → .expect()

P1 编译质量 (6 warnings → 0):
- kernel.rs: 移除未使用的 Capability import 和 config_clone 变量
- pipeline_commands.rs: 未使用变量 id → _id
- db.rs: 移除多余括号
- relay/service.rs: 移除未使用的 StreamExt import
- telemetry/service.rs: 抑制 param_idx 未读赋值警告
- main.rs: TcpKeepalive::with_retries() Linux-only 条件编译

P2 代码清理:
- 移除 handStore/HandsPanel/HandTaskPanel/gateway-api/SchedulerPanel 调试 console.log
- SchedulerPanel: 修复 updateWorkflow 未解构导致 TS 编译错误
- 文档清理 zclaw-channels 已移除 crate 的引用
2026-03-30 11:33:47 +08:00
iven
7de294375b feat(auth): 添加异步密码哈希和验证函数
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
refactor(relay): 复用HTTP客户端和请求体序列化结果

feat(kernel): 添加获取单个审批记录的方法

fix(store): 改进SaaS连接错误分类和降级处理

docs: 更新审计文档和系统架构文档

refactor(prompt): 优化SQL查询参数化绑定

refactor(migration): 使用静态SQL和COALESCE更新配置项

feat(commands): 添加审批执行状态追踪和事件通知

chore: 更新启动脚本以支持Admin后台

fix(auth-guard): 优化授权状态管理和错误处理

refactor(db): 使用异步密码哈希函数

refactor(totp): 使用异步密码验证函数

style: 清理无用文件和注释

docs: 更新功能全景和审计文档

refactor(service): 优化HTTP客户端重用和请求处理

fix(connection): 改进SaaS不可用时的降级处理

refactor(handlers): 使用异步密码验证函数

chore: 更新依赖和工具链配置
2026-03-29 21:45:29 +08:00
iven
a0ca35c9dd feat(saas): SQL 迁移系统 + TIMESTAMPTZ + 热路径重构
P0: SQL 迁移系统
- crates/zclaw-saas/migrations/ — 独立 SQL 迁移文件目录
- 20260329000001_initial_schema.sql — TIMESTAMPTZ 完整 schema
- 20260329000002_seed_data.sql — 角色种子数据
- db.rs: 移除 335 行内联 SCHEMA_SQL,改为文件加载
- 版本追踪: saas_schema_version 表管理迁移状态
- 向后兼容: 已有 TEXT 时间戳数据库不受影响

P1: 安全重构
- relay/service.rs: update_task_status 从 format!() 改为 3 条独立参数化查询
- config.rs: 移除 TODO 注释,补充字段文档说明
- state.rs: 添加 dispatch_log_operation 异步日志派发方法

P2: Worker 集成
- state.rs: WorkerDispatcher 接入 AppState
- 所有异步后台任务基础设施就绪
2026-03-29 19:41:03 +08:00
iven
8b9d506893 refactor(saas): 架构重构 + 性能优化 — 借鉴 loco-rs 模式
Phase 0: 知识库
- docs/knowledge-base/loco-rs-patterns.md — loco-rs 10 个可借鉴模式研究

Phase 1: 数据层重构
- crates/zclaw-saas/src/models/ — 15 个 FromRow 类型化模型
- Login 3 次查询合并为 1 次 AccountLoginRow 查询
- 所有 service 文件从元组解构迁移到 FromRow 结构体

Phase 2: Worker + Scheduler 系统
- crates/zclaw-saas/src/workers/ — Worker trait + 5 个具体实现
- crates/zclaw-saas/src/scheduler.rs — TOML 声明式调度器
- crates/zclaw-saas/src/tasks/ — CLI 任务系统

Phase 3: 性能修复
- Relay N+1 查询 → 精准 SQL (relay/handlers.rs)
- Config RwLock → AtomicU32 无锁 rate limit (state.rs, middleware.rs)
- SSE std::sync::Mutex → tokio::sync::Mutex (relay/service.rs)
- /auth/refresh 阻塞清理 → Scheduler 定期执行

Phase 4: 多环境配置
- config/saas-{development,production,test}.toml
- ZCLAW_ENV 环境选择 + ZCLAW_SAAS_CONFIG 精确覆盖
- scheduler 配置集成到 TOML
2026-03-29 19:21:48 +08:00
iven
5fdf96c3f5 chore: 提交所有工作进度 — SaaS 后端增强、Admin UI、桌面端集成
包含大量 SaaS 平台改进、Admin 管理后台更新、桌面端集成完善、
文档同步、测试文件重构等内容。为 QA 测试准备干净工作树。
2026-03-29 10:46:41 +08:00
iven
452ff45a5f feat(saas): P2 增强 — TOTP 2FA、Relay 重试、配置同步升级
- TOTP 2FA: totp-rs v5.7.1 + data-encoding Base32, setup/verify/disable 流程,
  登录时 TOTP 验证集成, SaasError::Totp 返回 400
- Relay 重试: 指数退避 (base_delay_ms * 2^attempt), 错误分类 (4xx 不重试),
  Admin POST /tasks/:id/retry 端点
- 配置同步: push (客户端覆盖) / merge (SaaS 优先) / diff (只读对比),
  实际写入 config_items 表
- 集成测试: 27 个测试全部通过 (新增 6 个 P2 测试)
- 文档: 更新 SaaS 平台总览 (模块完成度 + API 端点列表)
2026-03-27 17:58:14 +08:00
iven
d760b9ca10 feat(saas): Phase 1 后端能力补强 — API Token 认证、真实 SSE 流式、速率限制
Phase 1.1: API Token 认证中间件
- auth_middleware 新增 zclaw_ 前缀 token 分支 (SHA-256 验证)
- 合并 token 自身权限与角色权限,异步更新 last_used_at
- 添加 GET /api/v1/auth/me 端点返回当前用户信息
- get_role_permissions 改为 pub(crate) 供中间件调用

Phase 1.2: 真实 SSE 流式中转
- RelayResponse::Sse 改为 axum::body::Body (bytes_stream)
- 流式请求超时提升至 300s,转发 SSE headers (Cache-Control, Connection)
- 添加 futures 依赖用于 StreamExt

Phase 1.3: 滑动窗口速率限制中间件
- 按 account_id 做 per-minute 限流 (默认 60 rpm + 10 burst)
- 超限返回 429 + Retry-After header
- RateLimitConfig 支持配置化,DashMap 存储时间戳

21 tests passed, zero warnings.
2026-03-27 13:49:45 +08:00
iven
900430d93e fix(saas): 修复安全审查发现的 Critical/High/Medium 问题
- Critical: 移除注册接口的 role 字段,固定为 "user" 防止权限提升
- High: 生产环境未配置 cors_origins 时拒绝启动而非默认全开放
- Medium: 增强 SSRF 防护 — 阻止 IPv6 映射地址、私有 IP 网段、十进制 IP 格式
2026-03-27 13:09:59 +08:00
iven
94bf387aee fix(saas): 安全修复 — IDOR防护、SSRF防护、JWT密钥强制、错误信息脱敏、CORS配置化
- account: admin 权限守卫 (list_accounts/get_account/update_status/list_logs)
- relay: SSRF 防护 (禁止内网地址、限制 http scheme、30s 超时)
- config: 生产环境强制 ZCLAW_SAAS_JWT_SECRET 环境变量
- error: 500 错误不再泄露内部细节给客户端
- main: CORS 支持配置白名单 origins
- 全部 21 个测试通过 (7 unit + 14 integration)
2026-03-27 13:07:20 +08:00
iven
a99a3df9dd feat(saas): Phase 3 — 模型请求中转服务
- OpenAI 兼容 API 代理 (/api/v1/relay/chat/completions)
- 中转任务管理 (创建/查询/状态跟踪)
- 可用模型列表端点 (仅 enabled providers+models)
- 任务生命周期 (queued → processing → completed/failed)
- 用量自动记录 (token 统计 + 错误追踪)
- 3 个新集成测试覆盖中转端点
2026-03-27 12:58:02 +08:00