Commit Graph

44 Commits

Author SHA1 Message Date
iven
76cdfd0c00 fix(saas): SSE 用量统计一致性修复 — 回写 usage_records + 消除 relay_requests 双重计数
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- service.rs: SSE 流结束后回写 usage_records 真实 token (status=success)
- service.rs: spawned task 中调用 increment_usage 统一递增 tokens + relay_requests
- handlers.rs: 移除 SSE 路径的 increment_dimension("relay_requests") 消除双重计数
- 从 request_body 提取 model_id 用于 usage_records 精准归因
2026-04-15 01:40:27 +08:00
iven
9c59e6e82a fix(saas): SSE relay token capture 修复 — stream_done 标志 + 前缀兼容
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- SseUsageCapture 增加 stream_done 标志,[DONE] 和 stream 结束时设置
- parse_sse_line 兼容 "data:" 和 "data: " 两种前缀
- 增加 total_tokens 兜底解析(某些 provider 不返回 prompt_tokens)
- 轮询逻辑优先检测 stream_done,而非依赖 total > 0 条件
- 超时时增加 warn 日志记录实际 token 值

根因: 上游 provider 不在 SSE chunk 中返回 usage 时,轮询稳定逻辑
(total > 0 条件) 永远不满足,导致 token 始终为 0。
2026-04-15 00:15:03 +08:00
iven
dd854479eb fix: 三端联调测试 2 P1 + 2 P2 + 4 P3 修复
P1-07: billing get_or_create_usage 同步 max_* 列到当前计划限额
P1-08: relay handler 增加直接配额检查 (relay_requests/input/output_tokens)
P2-09: relay failover 成功后记录 tokens 并标记 completed
P2-10: Tauri agentStore saas-relay 模式下从 SaaS API 获取真实用量
P2-14: super_admin 合成 subscription + check_quota 放行
P3-19: 新建 ApiKeys.tsx 页面替代 ModelServices 路由
P3-15: antd destroyOnClose → destroyOnHidden (3处)
P3-16: ProTable onSearch → onSubmit (2处)
2026-04-14 17:48:22 +08:00
iven
4c3136890b fix: 三端联调测试 2 P0 + 6 P1 + 2 P2 修复
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
P0-1: SaaS relay 模型别名解析 — "glm-4-flash" → "glm-4-flash-250414" (resolve_model)
P0-2: config.rs interpolate_env_vars UTF-8 修复 (chars 迭代器替代 bytes as char)
      + DB 启动编码检查 + docker-compose UTF-8 编码参数

P1-3: UI 模型选择器覆盖 Agent 默认模型 (model_override 全链路: TS→Tauri→Rust kernel)
P1-6: 知识搜索管道修复 — seed_knowledge 创建 chunks + 默认分类 (seed/uploaded/distillation)
P1-7: 用量限额从当前 Plan 读取 (非 stale usage 表)
P1-8: relay 双维度配额检查 (relay_requests + input_tokens)

P2-9: SSE 路径 token 计数修复 — 流结束检测替代固定 500ms sleep + billing increment
2026-04-14 00:17:08 +08:00
iven
5599cefc41 feat(saas): 接通 embedding 模型管理全栈
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
数据库 migration 已有 is_embedding/model_type 列但全栈未使用。
打通 4 层: ModelRow → ModelInfo/CRUD → CachedModel → Admin 前端。
relay/models 端点也返回 is_embedding 字段,前端可按类型过滤。
2026-04-12 08:10:50 +08:00
iven
bd6cf8e05f fix(saas): add ::bigint cast to all SUM() aggregates for PG NUMERIC compat
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
PostgreSQL SUM() on bigint returns NUMERIC, causing sqlx decode errors
when Rust expects i64/Option<i64>. Root cause: key_pool.rs
select_best_key() token_count SUM was missing ::bigint, causing
DATABASE_ERROR on every relay request.

Fixed in 4 files:
- relay/key_pool.rs: SUM(token_count) — root cause of relay failure
- relay/service.rs: SUM(remaining_rpm) in sort_candidates_by_quota
- account/handlers.rs: SUM(input/output_tokens) in dashboard stats
- workers/aggregate_usage.rs: SUM(input/output_tokens) in aggregation
2026-04-09 22:16:27 +08:00
iven
a081a97678 fix(relay): audit fixes — abort signal, model selector guard, SSE CRLF, SQL format
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Addresses findings from deep code audit:

H-1: Pass abortController.signal to saasClient.chatCompletion() so
     user-cancelled streams actually abort the HTTP connection (was only
     stopping the read loop, leaving server-side SSE connection open).

H-2: ModelSelector now shows only when (!isTauriRuntime() || isLoggedIn).
     Prevents decorative model list in Tauri local kernel mode where model
     selection has no effect (violates CLAUDE.md §5.2).

M-1: Normalize CRLF to LF before SSE event boundary parsing (\n\n).
     Prevents buffer overflow when behind nginx/CDN with CRLF line endings.

M-2: SQL window_minute comparison uses to_char(NOW()-interval, format)
     instead of (NOW()-interval)::TEXT, matching the stored format exactly.

M-3: sort_candidates_by_quota uses same sliding 60s window as select_best_key.

LOW: Fix misleading invalidate_cache doc comment.
2026-04-09 19:51:34 +08:00
iven
e6eb97dcaa perf(relay): full-chain optimization — key pool, model sync, SSE stream
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Phase 1 (Key Pool correctness):
- RPM: fixed-minute window → sliding 60s aggregation (prevents 2x burst)
- Remove fallback-to-provider-key bypass when all keys rate-limited
- SSE semaphore: 16→64 permits, cleanup delay 60s→5s
- Default 429 cooldown: 5min→60s (better for Coding Plan quotas)
- Expire old key_usage_window rows on record

Phase 2 (Frontend model sync):
- currentModel empty-string fallback to glm-4-flash-250414 in relay client
- Merge duplicate listModels() calls in connectionStore SaaS path
- Show ModelSelector in Tauri mode when models available
- Clear currentModel on SaaS logout

Phase 3 (Relay performance):
- Key Pool: DashMap in-memory cache (TTL 5s) for select_best_key
- Cache invalidation on 429 marking

Phase 4 (SSE stream):
- AbortController integration for user-cancelled streams
- SSE parsing: split by event boundaries (\n\n) instead of per-line
- streamStore cancelStream adapts to 0-arg and 1-arg cancel fns
2026-04-09 19:34:02 +08:00
iven
0883bb28ff fix: validation hardening — agent import prompt limit, relay retry tracking, heartbeat validation
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- agent_import: add system_prompt length validation (max 50K chars)
  to prevent excessive token consumption from imported configs
- relay retry_task: wrap JoinHandle to log abort on server shutdown
- device_heartbeat: validate device_id length (1-64 chars) matching
  register endpoint constraints
2026-04-09 17:24:36 +08:00
iven
ade534d1ce feat: 添加MCP调试插件并优化流式超时处理
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
refactor(relay): 将Provider Key管理路由移至model_config模块
fix(saas): 修复demo_keys与provider_keys的匹配逻辑
perf(runtime): 将流式响应超时从60秒延长至180秒以适配思考型模型
docs: 新增模块化审计和上线前功能测试方案文档
chore: 添加tauri-plugin-mcp依赖及相关配置
2026-04-08 13:39:06 +08:00
iven
f9303ae0c3 fix(saas): SQL type cast fixes for E2E relay flow
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- key_pool.rs: cast cooldown_until to timestamptz for comparison with NOW()
- key_pool.rs: cast request_count to bigint (INT4→INT8) for sqlx decoding
- service.rs: cast cooldown_until to timestamptz in quota sort query
- scheduler.rs: cast last_seen_at to timestamptz in device cleanup
- totp.rs: use DateTime<Utc> instead of rfc3339 string for updated_at
2026-04-07 22:24:19 +08:00
iven
7de486bfca test(saas): Phase 1 integration tests — billing + scheduled_task + knowledge (68 tests)
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- Fix TIMESTAMPTZ decode errors: add ::TEXT cast to all SELECT queries
  where Row structs use String for TIMESTAMPTZ columns (~22 locations)
- Fix Axum 0.7 route params: {id} → :id in billing/knowledge/scheduled_task routes
- Fix JSONB bind: scheduled_task INSERT uses ::jsonb cast for input_payload
- Add billing_test.rs (14 tests): plans, subscription, usage, payments, invoices
- Add scheduled_task_test.rs (12 tests): CRUD, validation, isolation
- Add knowledge_test.rs (20 tests): categories, items, versions, search, analytics, permissions
- Fix auth test regression: 6 tests were failing due to TIMESTAMPTZ type mismatch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 14:25:34 +08:00
iven
828be3cc9e fix: resolve 6 remaining defects (P2-18, P2-21, P3-04, P3-05, P3-06, P3-02)
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- P2-18: TOTP QR code local generation via qrcode lib (no external service)
- P2-21: Suspend foreign LLM providers (OpenAI/Anthropic/Gemini) for early stage
- P3-04: get_progress() now calculates actual percentage from completed/total steps
- P3-05: saveSaaSSession calls now have .catch() error logging
- P3-06: SaaS relay chatStream passes session_key/agent_id to backend
- P3-02: Whiteboard unification plan document created

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 09:52:28 +08:00
iven
26a833d1c8 fix: resolve 17 P2 defects and 5 P3 defects from pre-launch audit
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Batch fix covering multiple modules:
- P2-01: HandRegistry Semaphore-based max_concurrent enforcement
- P2-03: Populate toolCount/metricCount from Hand trait methods
- P2-06: heartbeat_update_config minimum interval validation
- P2-07: ReflectionResult used_fallback marker for rule-based fallback
- P2-08/09: identity_propose_change parameter naming consistency
- P2-10: ClassroomMetadata is_placeholder flag for LLM failure
- P2-11: classroomStore userDidCloseDuringGeneration intent tracking
- P2-12: workflowStore pipeline_create sends actionType
- P2-13/14: PipelineInfo step_count + PipelineStepInfo for proper step mapping
- P2-15: Pipe transform support in context.resolve (8 transforms)
- P2-16: Mustache {{...}} → \${...} auto-normalization
- P2-17: SaaSLogin password placeholder 6→8
- P2-19: serialize_skill_md + update_skill preserve tools field
- P2-22: ToolOutputGuard sensitive patterns from warn→block
- P2-23: Mutex::unwrap() → unwrap_or_else in relay/service.rs
- P3-01/03/07/08/09: Various P3 fixes
- DEFECT_LIST.md: comprehensive status sync (43/51 fixed, 8 remaining)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 00:49:16 +08:00
iven
90855dc83e fix(desktop): resolve 2 release-blocking P1 defects
P1-04: GenerationPipeline hardcoded model="default" causing classroom
generation 404. Added model field to GenerationPipeline struct, passed
from kernel config via with_driver(driver, model). Static scene
generation now receives model parameter.

P1-03: LLM API concurrent 500 DATABASE_ERROR. Added transient DB error
retry (PoolTimedOut/Io) in create_relay_task with 200ms backoff.
Recommend setting ZCLAW_DB_MIN_CONNECTIONS=10 for burst resilience.
2026-04-05 19:18:41 +08:00
iven
5c48d62f7e fix(saas): harden model group failover + relay reliability
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- cache: insert-then-retain pattern avoids empty-window race during refresh
- relay: manage_task_status flag for proper failover state transitions
- relay: retry_task re-resolves model groups instead of blind provider reuse
- relay: filter empty-member groups from available models list
- relay: quota cache stale entry cleanup (TTL 5x expiry)
- error: from_sqlx_unique helper for 409 vs 500 distinction
- model_config: unique constraint handling, duplicate member check
- model_config: failover_strategy whitelist, model_id vs group name conflict check
- model_config: group-scoped member removal with group_id validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 12:26:55 +08:00
iven
be0a78a523 feat(saas): add model groups for cross-provider failover
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Model Groups provide logical model names that map to multiple physical
models across providers, with automatic failover when one provider's
key pool is exhausted.

Backend:
- New model_groups + model_group_members tables with FK constraints
- Full CRUD API (7 endpoints) with admin-only write permissions
- Cache layer: DashMap-backed CachedModelGroup with load_from_db
- Relay integration: ModelResolution enum for Direct/Group routing
- Cross-provider failover: sort_candidates_by_quota + OnceLock cache
- Relay failure path: record failure usage + relay_dequeue (fixes
  queue counter leak that caused connection pool exhaustion)
- add_group_member: validate model_id exists before insert

Frontend:
- saas-relay-client: accept getModel() callback for dynamic model selection
- connectionStore: prefer conversationStore.currentModel over first available

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 09:56:21 +08:00
iven
1048901665 fix(saas): industry template audit fixes + pgvector optional + relay timeout
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- Fix seed template tools to match actual runtime tool names
  (file_read/file_write/shell_exec/web_fetch)
- Persist system_prompt/temperature/max_tokens via identity system
  in agentStore.createFromTemplate()
- Fire-and-forget assignTemplate() in AgentOnboardingWizard
- Fix saas-relay-client unused variable warning
- Make pgvector extension optional in knowledge_base migration
- Increase StreamBridge timeout from 30s to 90s for thinking models
2026-04-03 15:10:13 +08:00
iven
52bdafa633 refactor(crates): kernel/generation module split + DeerFlow optimizations + middleware + dead code cleanup
- Split zclaw-kernel/kernel.rs (1486 lines) into 9 domain modules
- Split zclaw-kernel/generation.rs (1080 lines) into 3 modules
- Add DeerFlow-inspired middleware: DanglingTool, SubagentLimit, ToolError, ToolOutputGuard
- Add PromptBuilder for structured system prompt assembly
- Add FactStore (zclaw-memory) for persistent fact extraction
- Add task builtin tool for agent task management
- Driver improvements: Anthropic/OpenAI extended thinking, Gemini safety settings
- Replace let _ = with proper log::warn! across SaaS handlers
- Remove unused dependency (url) from zclaw-hands
2026-04-03 00:28:03 +08:00
iven
28299807b6 fix(desktop): DeerFlow UI — ChatArea refactor + ai-elements + dead CSS cleanup
ChatArea retry button uses setInput instead of direct sendToGateway,
fix bootstrap spinner stuck for non-logged-in users,
remove dead CSS (aurora-title/sidebar-open/quick-action-chips),
add ai components (ReasoningBlock/StreamingText/ChatMode/ModelSelector/TaskProgress),
add ClassroomPlayer + ResizableChatLayout + artifact panel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 19:24:44 +08:00
iven
11e3d37468 feat(billing): activate real-time quota enforcement pipeline
- Wire relay handler to increment_usage() for JSON responses (tokens + relay_requests)
- Wire relay handler to increment_dimension("relay_requests") for SSE streams
- Add increment_dimension() function for hand_executions/pipeline_runs dimensions
- Schedule AggregateUsageWorker hourly for reconciliation (run_on_start=true)
- Mount mock payment routes in dev mode (ZCLAW_SAAS_DEV=true)

Previously the quota middleware always allowed requests because usage
counters were never incremented. Now relay requests update billing_usage_quotas
in real-time, with the aggregator providing hourly reconciliation.
2026-04-02 01:52:01 +08:00
iven
9905a8d0d5 fix(saas-relay): eliminate DATABASE_ERROR by removing DB queries from critical path
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Root cause: each relay request executes 13-17 serial DB queries, exhausting
the 50-connection pool under concurrency. When pool is exhausted, sqlx returns
PoolTimedOut which maps to 500 DATABASE_ERROR.

Fixes:
1. log_operation → dispatch_log_operation (async Worker dispatch, non-blocking)
2. record_usage → tokio::spawn (3 DB queries moved off critical path)
3. DB pool: max_connections 50→100 (env-configurable), acquire_timeout 5s→8s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 14:08:21 +08:00
iven
2ff696289f fix(saas): reduce DB connection pool pressure in relay path
Some checks failed
CI / Rust Check (push) Has been cancelled
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
1. key_pool: merge 3 serial UPDATE queries into 2 (cumulative stats +
   last_used_at combined into single UPDATE)
2. service: reduce SSE spawn sleep from 3s to 500ms and add 5s timeout
   on DB operations to prevent connection hoarding
2026-03-31 13:47:43 +08:00
iven
3e5d64484e fix(relay): fix llm_routing read path bug and add User-Agent header for Coding Plan
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
1. connectionStore.ts: storedAccount.account.llm_routing → storedAccount.llm_routing
   - saveSaaSSession stores SaaSAccountInfo directly, not { account: SaaSAccountInfo }
   - This bug caused admin llm_routing config to never take effect

2. relay/service.rs: add User-Agent: claude-code/1.0 header
   - Kimi Coding Plan requires recognized coding agent User-Agent
   - Default reqwest UA is rejected with 403

3. Docs: add llm_routing routing mode explanation and troubleshooting entries
2026-03-31 12:02:32 +08:00
iven
ee51d5abcd feat(admin-v2): add ProTable search, scenarios/quick_commands form, tests, remove quota_reset_interval
- Enable ProTable search on Accounts (username/email), Models (model_id/alias),
  Providers (display_name/name) with hideInSearch for non-searchable columns
- Add scenarios (Select tags) and quick_commands (Form.List) to AgentTemplates
  create form, plus service type updates
- Remove unused quota_reset_interval from ProviderKey model, key_pool SQL,
  handlers, and frontend types; add migration + bump schema to v11
- Add Vitest config, test setup, request interceptor tests (7 cases),
  authStore tests (8 cases) — all 15 passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 11:13:16 +08:00
iven
8e6abc91e1 feat(key-pool): add LRU sorting via last_used_at column
- Add migration to add last_used_at TIMESTAMPTZ column to provider_keys
- Update select_best_key() SQL to sort by last_used_at ASC NULLS FIRST
- Update record_key_usage() to set last_used_at = NOW() on each use
- Bump SCHEMA_VERSION to 10
2026-03-31 10:14:49 +08:00
iven
eb956d0dce feat: 新增管理后台前端项目及安全加固
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
refactor(saas): 重构认证中间件与限流策略
- 登录限流调整为5次/分钟/IP
- 注册限流调整为3次/小时/IP
- GET请求不计入限流

fix(saas): 修复调度器时间戳处理
- 使用NOW()替代文本时间戳
- 兼容TEXT和TIMESTAMPTZ列类型

feat(saas): 实现环境变量插值
- 支持${ENV_VAR}语法解析
- 数据库密码支持环境变量注入

chore: 新增前端管理界面
- 基于React+Ant Design Pro
- 包含路由守卫/错误边界
- 对接58个API端点

docs: 更新安全加固文档
- 新增密钥管理规范
- 记录P0安全项审计结果
- 补充TLS终止说明

test: 完善配置解析单元测试
- 新增环境变量插值测试用例
2026-03-31 00:11:33 +08:00
iven
544358764e fix(relay): 移除 SSE usage 记录中重复的 sleep
service.rs L316-317 有两行相同的 tokio::time::sleep(3s),
导致 SSE 流结束后实际等待 6 秒而非 3 秒才记录 usage。
2026-03-30 14:26:22 +08:00
iven
ba2c6a6105 fix(saas): P1 审计修复 — 连接池断路器 + Worker重试 + XSS防护 + 状态机SQL解析器
P1 修复内容:
- F7: health handler 连接池容量检查 (80%阈值返回503 degraded)
- F9: SSE spawned task 并发限制 (Semaphore 16 permits)
- F10: Key Pool 单次 JOIN 查询优化 (消除 N+1)
- F12: CORS panic → 配置错误
- F14: 连接池使用率计算修正 (ratio = used*100/total)
- F15: SQL 迁移解析器替换为状态机 (支持 $$, DO $body$, 存储过程)
- Worker 重试机制: 失败任务通过 mpsc channel 重新入队
- DOMPurify XSS 防护 (PipelineResultPreview)
- Admin V2: ErrorBoundary + SWR全局配置 + 请求优化
2026-03-30 14:21:39 +08:00
iven
bc8c77e7fe fix(security): P0 审计修复 — 6项关键安全/编译问题
F1: kernel.rs multi-agent 编译错误 — 重排 spawn_agent 中 A2A 注册顺序,
    在 config 被 registry.register() 消费前使用
F2: saas-config.toml 从 git 追踪中移除 — 包含数据库密码已进入版本历史
F3: config.rs 硬编码开发密钥改用 #[cfg(debug_assertions)] 编译时门控 —
    dev fallback 密钥不再进入 release 构建
F4: 公共认证端点添加 IP 速率限制 (20 RPM) — 防止暴力破解
F5: SSE relay 路由分离出全局 15s TimeoutLayer — 避免长流式响应被截断
F6: Provider API 密钥入库前 AES-256-GCM 加密 — 明文存储修复

附带:完整审计报告 docs/superpowers/specs/2026-03-30-comprehensive-audit-report.md
2026-03-30 13:32:22 +08:00
iven
834aa12076 fix: P0 panic风险修复 + P1编译warnings清零 + P2代码/文档清理
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
P0 安全性:
- account/handlers.rs: .unwrap() → .expect() 语义化错误信息
- relay/handlers.rs: SSE Response .unwrap() → .expect()

P1 编译质量 (6 warnings → 0):
- kernel.rs: 移除未使用的 Capability import 和 config_clone 变量
- pipeline_commands.rs: 未使用变量 id → _id
- db.rs: 移除多余括号
- relay/service.rs: 移除未使用的 StreamExt import
- telemetry/service.rs: 抑制 param_idx 未读赋值警告
- main.rs: TcpKeepalive::with_retries() Linux-only 条件编译

P2 代码清理:
- 移除 handStore/HandsPanel/HandTaskPanel/gateway-api/SchedulerPanel 调试 console.log
- SchedulerPanel: 修复 updateWorkflow 未解构导致 TS 编译错误
- 文档清理 zclaw-channels 已移除 crate 的引用
2026-03-30 11:33:47 +08:00
iven
13c0b18bbc feat: Batch 5-9 — GrowthIntegration桥接、验证补全、死代码清理、Pipeline模板、Speech/Twitter真实实现
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Batch 5 (P0): GrowthIntegration 接入 Tauri
- Kernel 新增 set_viking()/set_extraction_driver() 桥接 SqliteStorage
- 中间件链共享存储,MemoryExtractor 接入 LLM 驱动

Batch 6 (P1): 输入验证 + Heartbeat
- Relay 验证补全(stream 兼容检查、API key 格式校验)
- UUID 类型校验、SessionId 错误返回
- Heartbeat 默认开启 + 首次聊天自动初始化

Batch 7 (P2): 死代码清理
- zclaw-channels 整体移除(317 行)
- multi-agent 特性门控、admin 方法标注

Batch 8 (P2): Pipeline 模板
- PipelineMetadata 新增 annotations 字段
- pipeline_templates 命令 + 2 个示例模板
- fallback driver base_url 修复(doubao/qwen/deepseek 端点)

Batch 9 (P1): SpeechHand/TwitterHand 真实实现
- SpeechHand: tts_method 字段 + Browser TTS 前端集成 (Web Speech API)
- TwitterHand: 12 个 action 全部替换为 Twitter API v2 真实 HTTP 调用
- chatStore/useAutomationEvents 双路径 TTS 触发
2026-03-30 09:24:50 +08:00
iven
7de294375b feat(auth): 添加异步密码哈希和验证函数
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
refactor(relay): 复用HTTP客户端和请求体序列化结果

feat(kernel): 添加获取单个审批记录的方法

fix(store): 改进SaaS连接错误分类和降级处理

docs: 更新审计文档和系统架构文档

refactor(prompt): 优化SQL查询参数化绑定

refactor(migration): 使用静态SQL和COALESCE更新配置项

feat(commands): 添加审批执行状态追踪和事件通知

chore: 更新启动脚本以支持Admin后台

fix(auth-guard): 优化授权状态管理和错误处理

refactor(db): 使用异步密码哈希函数

refactor(totp): 使用异步密码验证函数

style: 清理无用文件和注释

docs: 更新功能全景和审计文档

refactor(service): 优化HTTP客户端重用和请求处理

fix(connection): 改进SaaS不可用时的降级处理

refactor(handlers): 使用异步密码验证函数

chore: 更新依赖和工具链配置
2026-03-29 21:45:29 +08:00
iven
a0ca35c9dd feat(saas): SQL 迁移系统 + TIMESTAMPTZ + 热路径重构
P0: SQL 迁移系统
- crates/zclaw-saas/migrations/ — 独立 SQL 迁移文件目录
- 20260329000001_initial_schema.sql — TIMESTAMPTZ 完整 schema
- 20260329000002_seed_data.sql — 角色种子数据
- db.rs: 移除 335 行内联 SCHEMA_SQL,改为文件加载
- 版本追踪: saas_schema_version 表管理迁移状态
- 向后兼容: 已有 TEXT 时间戳数据库不受影响

P1: 安全重构
- relay/service.rs: update_task_status 从 format!() 改为 3 条独立参数化查询
- config.rs: 移除 TODO 注释,补充字段文档说明
- state.rs: 添加 dispatch_log_operation 异步日志派发方法

P2: Worker 集成
- state.rs: WorkerDispatcher 接入 AppState
- 所有异步后台任务基础设施就绪
2026-03-29 19:41:03 +08:00
iven
8b9d506893 refactor(saas): 架构重构 + 性能优化 — 借鉴 loco-rs 模式
Phase 0: 知识库
- docs/knowledge-base/loco-rs-patterns.md — loco-rs 10 个可借鉴模式研究

Phase 1: 数据层重构
- crates/zclaw-saas/src/models/ — 15 个 FromRow 类型化模型
- Login 3 次查询合并为 1 次 AccountLoginRow 查询
- 所有 service 文件从元组解构迁移到 FromRow 结构体

Phase 2: Worker + Scheduler 系统
- crates/zclaw-saas/src/workers/ — Worker trait + 5 个具体实现
- crates/zclaw-saas/src/scheduler.rs — TOML 声明式调度器
- crates/zclaw-saas/src/tasks/ — CLI 任务系统

Phase 3: 性能修复
- Relay N+1 查询 → 精准 SQL (relay/handlers.rs)
- Config RwLock → AtomicU32 无锁 rate limit (state.rs, middleware.rs)
- SSE std::sync::Mutex → tokio::sync::Mutex (relay/service.rs)
- /auth/refresh 阻塞清理 → Scheduler 定期执行

Phase 4: 多环境配置
- config/saas-{development,production,test}.toml
- ZCLAW_ENV 环境选择 + ZCLAW_SAAS_CONFIG 精确覆盖
- scheduler 配置集成到 TOML
2026-03-29 19:21:48 +08:00
iven
5fdf96c3f5 chore: 提交所有工作进度 — SaaS 后端增强、Admin UI、桌面端集成
包含大量 SaaS 平台改进、Admin 管理后台更新、桌面端集成完善、
文档同步、测试文件重构等内容。为 QA 测试准备干净工作树。
2026-03-29 10:46:41 +08:00
iven
452ff45a5f feat(saas): P2 增强 — TOTP 2FA、Relay 重试、配置同步升级
- TOTP 2FA: totp-rs v5.7.1 + data-encoding Base32, setup/verify/disable 流程,
  登录时 TOTP 验证集成, SaasError::Totp 返回 400
- Relay 重试: 指数退避 (base_delay_ms * 2^attempt), 错误分类 (4xx 不重试),
  Admin POST /tasks/:id/retry 端点
- 配置同步: push (客户端覆盖) / merge (SaaS 优先) / diff (只读对比),
  实际写入 config_items 表
- 集成测试: 27 个测试全部通过 (新增 6 个 P2 测试)
- 文档: 更新 SaaS 平台总览 (模块完成度 + API 端点列表)
2026-03-27 17:58:14 +08:00
iven
8cce2283f7 fix(saas): P0 安全修复 + P1 功能补全 — 角色提升、Admin 引导、IP 记录、密码修改
P0 安全修复:
- 修复 account update 自角色提升漏洞: 非 admin 用户更新自己时剥离 role 字段
- 添加 Admin 引导机制: accounts 表为空时自动从环境变量创建 super_admin

P1 功能补全:
- 所有 17 个 log_operation 调用点传入真实客户端 IP (ConnectInfo + X-Forwarded-For)
- AuthContext 新增 client_ip 字段, middleware 层自动提取
- main.rs 使用 into_make_service_with_connect_info 启用 SocketAddr 注入
- 新增 PUT /api/v1/auth/password 密码修改端点 (验证旧密码 + argon2 哈希)
- 桌面端 SaaS 设置页添加密码修改 UI (折叠式表单)
- SaaSClient 添加 changePassword() 方法
- 集成测试修复: 注入模拟 ConnectInfo 适配 onshot 测试模式
2026-03-27 14:45:47 +08:00
iven
d760b9ca10 feat(saas): Phase 1 后端能力补强 — API Token 认证、真实 SSE 流式、速率限制
Phase 1.1: API Token 认证中间件
- auth_middleware 新增 zclaw_ 前缀 token 分支 (SHA-256 验证)
- 合并 token 自身权限与角色权限,异步更新 last_used_at
- 添加 GET /api/v1/auth/me 端点返回当前用户信息
- get_role_permissions 改为 pub(crate) 供中间件调用

Phase 1.2: 真实 SSE 流式中转
- RelayResponse::Sse 改为 axum::body::Body (bytes_stream)
- 流式请求超时提升至 300s,转发 SSE headers (Cache-Control, Connection)
- 添加 futures 依赖用于 StreamExt

Phase 1.3: 滑动窗口速率限制中间件
- 按 account_id 做 per-minute 限流 (默认 60 rpm + 10 burst)
- 超限返回 429 + Retry-After header
- RateLimitConfig 支持配置化,DashMap 存储时间戳

21 tests passed, zero warnings.
2026-03-27 13:49:45 +08:00
iven
a0d59b1947 fix(saas): 统一权限体系 — check_permission 辅助函数 + admin:full 超级权限
- 新增 check_permission() 统一权限检查,admin:full 自动通过所有检查
- 统一种子角色权限名称与 handler 检查一致 (provider:manage, model:manage, config:write)
- super_admin 拥有 admin:full + 所有模块管理权限
- 全部 handler 迁移到 check_permission(),消除手动 contains 检查
2026-03-27 13:12:09 +08:00
iven
900430d93e fix(saas): 修复安全审查发现的 Critical/High/Medium 问题
- Critical: 移除注册接口的 role 字段,固定为 "user" 防止权限提升
- High: 生产环境未配置 cors_origins 时拒绝启动而非默认全开放
- Medium: 增强 SSRF 防护 — 阻止 IPv6 映射地址、私有 IP 网段、十进制 IP 格式
2026-03-27 13:09:59 +08:00
iven
94bf387aee fix(saas): 安全修复 — IDOR防护、SSRF防护、JWT密钥强制、错误信息脱敏、CORS配置化
- account: admin 权限守卫 (list_accounts/get_account/update_status/list_logs)
- relay: SSRF 防护 (禁止内网地址、限制 http scheme、30s 超时)
- config: 生产环境强制 ZCLAW_SAAS_JWT_SECRET 环境变量
- error: 500 错误不再泄露内部细节给客户端
- main: CORS 支持配置白名单 origins
- 全部 21 个测试通过 (7 unit + 14 integration)
2026-03-27 13:07:20 +08:00
iven
a99a3df9dd feat(saas): Phase 3 — 模型请求中转服务
- OpenAI 兼容 API 代理 (/api/v1/relay/chat/completions)
- 中转任务管理 (创建/查询/状态跟踪)
- 可用模型列表端点 (仅 enabled providers+models)
- 任务生命周期 (queued → processing → completed/failed)
- 用量自动记录 (token 统计 + 错误追踪)
- 3 个新集成测试覆盖中转端点
2026-03-27 12:58:02 +08:00
iven
a2f8112d69 feat(saas): Phase 1 — 基础框架与账号管理模块
- 新增 zclaw-saas crate 作为 workspace 成员
- 配置系统 (TOML + 环境变量覆盖)
- 错误类型体系 (SaasError 16 变体, IntoResponse)
- SQLite 数据库 (12 表 schema, 内存/文件双模式, 3 系统角色种子数据)
- JWT 认证 (签发/验证/刷新)
- Argon2id 密码哈希
- 认证中间件 (公开/受保护路由分层)
- 账号管理 CRUD + API Token 管理 + 操作日志
- 7 单元测试 + 5 集成测试全部通过
2026-03-27 12:58:01 +08:00