Commit Graph

14 Commits

Author SHA1 Message Date
iven
4ee587d070 fix(relay,store): Provider Key 自动恢复 + Agent 创建友好错误 + 登录后重连
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
P0-1: key_pool.rs 新增 cooldown 过期 Key 自动恢复逻辑。
当所有 Key 的 is_active=false 且 cooldown_until 已过期时,
自动重新激活并重试选择,避免 relay/models 返回空数组导致聊天失败。

P0-2: agentStore.ts createClone/createFromTemplate 错误信息
从原始 HTTP 错误改为可操作的中文提示(502/503/401 分类处理)。

P1-2: auth.ts login 成功后触发 connectionStore.connect(),
确保 kernel 使用新 JWT 而非旧 token。
2026-04-19 13:16:12 +08:00
iven
b69dc6115d fix(relay): API Key 解密失败自愈 — 启动迁移 + 容错跳过
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
根因: select_best_key 遇到解密失败时直接 500 返回,
不会尝试下一个 key。如果 DB 中有旧的加密格式 key,
整个 relay 请求被阻断。

修复:
- key_pool: 解密失败时 warn + skip 到下一个 key,不再 500
- key_pool: 新增 heal_provider_keys() 启动自愈迁移
  - 逐个尝试解密所有加密 key
  - 解密成功 → 用当前密钥重新加密(幂等)
  - 解密失败 → 标记 is_active=false + warn
- main.rs: 启动时调用自愈迁移(在 TOTP 迁移之后)
2026-04-16 02:40:44 +08:00
iven
bd6cf8e05f fix(saas): add ::bigint cast to all SUM() aggregates for PG NUMERIC compat
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
PostgreSQL SUM() on bigint returns NUMERIC, causing sqlx decode errors
when Rust expects i64/Option<i64>. Root cause: key_pool.rs
select_best_key() token_count SUM was missing ::bigint, causing
DATABASE_ERROR on every relay request.

Fixed in 4 files:
- relay/key_pool.rs: SUM(token_count) — root cause of relay failure
- relay/service.rs: SUM(remaining_rpm) in sort_candidates_by_quota
- account/handlers.rs: SUM(input/output_tokens) in dashboard stats
- workers/aggregate_usage.rs: SUM(input/output_tokens) in aggregation
2026-04-09 22:16:27 +08:00
iven
a081a97678 fix(relay): audit fixes — abort signal, model selector guard, SSE CRLF, SQL format
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Addresses findings from deep code audit:

H-1: Pass abortController.signal to saasClient.chatCompletion() so
     user-cancelled streams actually abort the HTTP connection (was only
     stopping the read loop, leaving server-side SSE connection open).

H-2: ModelSelector now shows only when (!isTauriRuntime() || isLoggedIn).
     Prevents decorative model list in Tauri local kernel mode where model
     selection has no effect (violates CLAUDE.md §5.2).

M-1: Normalize CRLF to LF before SSE event boundary parsing (\n\n).
     Prevents buffer overflow when behind nginx/CDN with CRLF line endings.

M-2: SQL window_minute comparison uses to_char(NOW()-interval, format)
     instead of (NOW()-interval)::TEXT, matching the stored format exactly.

M-3: sort_candidates_by_quota uses same sliding 60s window as select_best_key.

LOW: Fix misleading invalidate_cache doc comment.
2026-04-09 19:51:34 +08:00
iven
e6eb97dcaa perf(relay): full-chain optimization — key pool, model sync, SSE stream
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Phase 1 (Key Pool correctness):
- RPM: fixed-minute window → sliding 60s aggregation (prevents 2x burst)
- Remove fallback-to-provider-key bypass when all keys rate-limited
- SSE semaphore: 16→64 permits, cleanup delay 60s→5s
- Default 429 cooldown: 5min→60s (better for Coding Plan quotas)
- Expire old key_usage_window rows on record

Phase 2 (Frontend model sync):
- currentModel empty-string fallback to glm-4-flash-250414 in relay client
- Merge duplicate listModels() calls in connectionStore SaaS path
- Show ModelSelector in Tauri mode when models available
- Clear currentModel on SaaS logout

Phase 3 (Relay performance):
- Key Pool: DashMap in-memory cache (TTL 5s) for select_best_key
- Cache invalidation on 429 marking

Phase 4 (SSE stream):
- AbortController integration for user-cancelled streams
- SSE parsing: split by event boundaries (\n\n) instead of per-line
- streamStore cancelStream adapts to 0-arg and 1-arg cancel fns
2026-04-09 19:34:02 +08:00
iven
f9303ae0c3 fix(saas): SQL type cast fixes for E2E relay flow
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- key_pool.rs: cast cooldown_until to timestamptz for comparison with NOW()
- key_pool.rs: cast request_count to bigint (INT4→INT8) for sqlx decoding
- service.rs: cast cooldown_until to timestamptz in quota sort query
- scheduler.rs: cast last_seen_at to timestamptz in device cleanup
- totp.rs: use DateTime<Utc> instead of rfc3339 string for updated_at
2026-04-07 22:24:19 +08:00
iven
7de486bfca test(saas): Phase 1 integration tests — billing + scheduled_task + knowledge (68 tests)
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
- Fix TIMESTAMPTZ decode errors: add ::TEXT cast to all SELECT queries
  where Row structs use String for TIMESTAMPTZ columns (~22 locations)
- Fix Axum 0.7 route params: {id} → :id in billing/knowledge/scheduled_task routes
- Fix JSONB bind: scheduled_task INSERT uses ::jsonb cast for input_payload
- Add billing_test.rs (14 tests): plans, subscription, usage, payments, invoices
- Add scheduled_task_test.rs (12 tests): CRUD, validation, isolation
- Add knowledge_test.rs (20 tests): categories, items, versions, search, analytics, permissions
- Fix auth test regression: 6 tests were failing due to TIMESTAMPTZ type mismatch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 14:25:34 +08:00
iven
28299807b6 fix(desktop): DeerFlow UI — ChatArea refactor + ai-elements + dead CSS cleanup
ChatArea retry button uses setInput instead of direct sendToGateway,
fix bootstrap spinner stuck for non-logged-in users,
remove dead CSS (aurora-title/sidebar-open/quick-action-chips),
add ai components (ReasoningBlock/StreamingText/ChatMode/ModelSelector/TaskProgress),
add ClassroomPlayer + ResizableChatLayout + artifact panel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 19:24:44 +08:00
iven
2ff696289f fix(saas): reduce DB connection pool pressure in relay path
Some checks failed
CI / Rust Check (push) Has been cancelled
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
1. key_pool: merge 3 serial UPDATE queries into 2 (cumulative stats +
   last_used_at combined into single UPDATE)
2. service: reduce SSE spawn sleep from 3s to 500ms and add 5s timeout
   on DB operations to prevent connection hoarding
2026-03-31 13:47:43 +08:00
iven
ee51d5abcd feat(admin-v2): add ProTable search, scenarios/quick_commands form, tests, remove quota_reset_interval
- Enable ProTable search on Accounts (username/email), Models (model_id/alias),
  Providers (display_name/name) with hideInSearch for non-searchable columns
- Add scenarios (Select tags) and quick_commands (Form.List) to AgentTemplates
  create form, plus service type updates
- Remove unused quota_reset_interval from ProviderKey model, key_pool SQL,
  handlers, and frontend types; add migration + bump schema to v11
- Add Vitest config, test setup, request interceptor tests (7 cases),
  authStore tests (8 cases) — all 15 passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 11:13:16 +08:00
iven
8e6abc91e1 feat(key-pool): add LRU sorting via last_used_at column
- Add migration to add last_used_at TIMESTAMPTZ column to provider_keys
- Update select_best_key() SQL to sort by last_used_at ASC NULLS FIRST
- Update record_key_usage() to set last_used_at = NOW() on each use
- Bump SCHEMA_VERSION to 10
2026-03-31 10:14:49 +08:00
iven
ba2c6a6105 fix(saas): P1 审计修复 — 连接池断路器 + Worker重试 + XSS防护 + 状态机SQL解析器
P1 修复内容:
- F7: health handler 连接池容量检查 (80%阈值返回503 degraded)
- F9: SSE spawned task 并发限制 (Semaphore 16 permits)
- F10: Key Pool 单次 JOIN 查询优化 (消除 N+1)
- F12: CORS panic → 配置错误
- F14: 连接池使用率计算修正 (ratio = used*100/total)
- F15: SQL 迁移解析器替换为状态机 (支持 $$, DO $body$, 存储过程)
- Worker 重试机制: 失败任务通过 mpsc channel 重新入队
- DOMPurify XSS 防护 (PipelineResultPreview)
- Admin V2: ErrorBoundary + SWR全局配置 + 请求优化
2026-03-30 14:21:39 +08:00
iven
8b9d506893 refactor(saas): 架构重构 + 性能优化 — 借鉴 loco-rs 模式
Phase 0: 知识库
- docs/knowledge-base/loco-rs-patterns.md — loco-rs 10 个可借鉴模式研究

Phase 1: 数据层重构
- crates/zclaw-saas/src/models/ — 15 个 FromRow 类型化模型
- Login 3 次查询合并为 1 次 AccountLoginRow 查询
- 所有 service 文件从元组解构迁移到 FromRow 结构体

Phase 2: Worker + Scheduler 系统
- crates/zclaw-saas/src/workers/ — Worker trait + 5 个具体实现
- crates/zclaw-saas/src/scheduler.rs — TOML 声明式调度器
- crates/zclaw-saas/src/tasks/ — CLI 任务系统

Phase 3: 性能修复
- Relay N+1 查询 → 精准 SQL (relay/handlers.rs)
- Config RwLock → AtomicU32 无锁 rate limit (state.rs, middleware.rs)
- SSE std::sync::Mutex → tokio::sync::Mutex (relay/service.rs)
- /auth/refresh 阻塞清理 → Scheduler 定期执行

Phase 4: 多环境配置
- config/saas-{development,production,test}.toml
- ZCLAW_ENV 环境选择 + ZCLAW_SAAS_CONFIG 精确覆盖
- scheduler 配置集成到 TOML
2026-03-29 19:21:48 +08:00
iven
5fdf96c3f5 chore: 提交所有工作进度 — SaaS 后端增强、Admin UI、桌面端集成
包含大量 SaaS 平台改进、Admin 管理后台更新、桌面端集成完善、
文档同步、测试文件重构等内容。为 QA 测试准备干净工作树。
2026-03-29 10:46:41 +08:00