P0-2: GET /usage 500 "text >= timestamptz" — usage_records.created_at
is TEXT in actual DB despite migration declaring TIMESTAMPTZ. Fixed by
using dynamic SQL with ::timestamptz explicit casts for all date
comparisons, avoiding sqlx NULL-without-type-OID binding issues.
P0-3: POST /auth/refresh 500 — refresh_tokens.expires_at/used_at are
TEXT columns. Added ::timestamptz cast to SQL queries in auth handlers
and cleanup worker.
- P0-01: Account lockout now enforced via SQL-level comparison
(locked_until > NOW()) instead of broken RFC3339 text parsing
- P1-01: Logout handler accepts JSON body with optional refresh_token,
revokes ALL refresh tokens for the account (not just current)
- P1-03: Provider display_name is now optional, falls back to name
All 6 smoke tests pass (S1-S6).
- API Key encryption at rest: verify enc: prefix in DB for provider keys
and main provider api_key
- Key pool: toggle active/inactive + delete with DB state verification
- Model Groups: full CRUD lifecycle + cascade delete + user permission
- Quota enforcement: relay_requests exhaustion verified at DB level
(middleware test infra issue noted — DB state confirmed correct)
- Provider disable: model hidden from relay/models list after disable
BUG-009 (P1): Add frontend DataMasking in saas-relay-client.ts
- Masks ID cards, phones, emails, money, company names before relay
- Unmasks tokens in AI response so user sees original data
- Mirrors Rust DataMasking middleware patterns
BUG-010 (P3): Send button transforms to Stop during streaming
- Shows square icon when isStreaming, calls cancelStream()
- Normal arrow icon when idle, calls handleSend()
BUG-011 (P2): Add ::timestamptz casts for old TEXT timestamp columns
- account/handlers.rs: dashboard stats query
- telemetry/service.rs: reported_at comparisons
- workers/aggregate_usage.rs: usage aggregation query
- saasStore.ts: replace require('./chat/conversationStore') with await import()
to fix ReferenceError in Vite ESM environment (P1)
- main.rs: fix health check pool usage formula from max_connections - num_idle
to pool.size() - num_idle, preventing false "degraded" status (P1)
- error.rs: show detailed error messages in ZCLAW_SAAS_DEV=true mode
- Update bug tracker with BUG-003 through BUG-007
PostgreSQL SUM() on bigint returns NUMERIC, causing sqlx decode errors
when Rust expects i64/Option<i64>. Root cause: key_pool.rs
select_best_key() token_count SUM was missing ::bigint, causing
DATABASE_ERROR on every relay request.
Fixed in 4 files:
- relay/key_pool.rs: SUM(token_count) — root cause of relay failure
- relay/service.rs: SUM(remaining_rpm) in sort_candidates_by_quota
- account/handlers.rs: SUM(input/output_tokens) in dashboard stats
- workers/aggregate_usage.rs: SUM(input/output_tokens) in aggregation
Addresses findings from deep code audit:
H-1: Pass abortController.signal to saasClient.chatCompletion() so
user-cancelled streams actually abort the HTTP connection (was only
stopping the read loop, leaving server-side SSE connection open).
H-2: ModelSelector now shows only when (!isTauriRuntime() || isLoggedIn).
Prevents decorative model list in Tauri local kernel mode where model
selection has no effect (violates CLAUDE.md §5.2).
M-1: Normalize CRLF to LF before SSE event boundary parsing (\n\n).
Prevents buffer overflow when behind nginx/CDN with CRLF line endings.
M-2: SQL window_minute comparison uses to_char(NOW()-interval, format)
instead of (NOW()-interval)::TEXT, matching the stored format exactly.
M-3: sort_candidates_by_quota uses same sliding 60s window as select_best_key.
LOW: Fix misleading invalidate_cache doc comment.
Root cause: start_consumer() was called in new() before any register() calls,
so the consumer's cloned HashMap was always empty. Workers like log_operation
and record_usage were never found, causing "Unknown worker" errors.
- Add WorkerDispatcher::start() method to be called after all register()s
- Update main.rs to call dispatcher.start() after 7 workers registered
- key_pool.rs: cast cooldown_until to timestamptz for comparison with NOW()
- key_pool.rs: cast request_count to bigint (INT4→INT8) for sqlx decoding
- service.rs: cast cooldown_until to timestamptz in quota sort query
- scheduler.rs: cast last_seen_at to timestamptz in device cleanup
- totp.rs: use DateTime<Utc> instead of rfc3339 string for updated_at
- P2-18: TOTP QR code local generation via qrcode lib (no external service)
- P2-21: Suspend foreign LLM providers (OpenAI/Anthropic/Gemini) for early stage
- P3-04: get_progress() now calculates actual percentage from completed/total steps
- P3-05: saveSaaSSession calls now have .catch() error logging
- P3-06: SaaS relay chatStream passes session_key/agent_id to backend
- P3-02: Whiteboard unification plan document created
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
P1-04: GenerationPipeline hardcoded model="default" causing classroom
generation 404. Added model field to GenerationPipeline struct, passed
from kernel config via with_driver(driver, model). Static scene
generation now receives model parameter.
P1-03: LLM API concurrent 500 DATABASE_ERROR. Added transient DB error
retry (PoolTimedOut/Io) in create_relay_task with 200ms backoff.
Recommend setting ZCLAW_DB_MIN_CONNECTIONS=10 for burst resilience.
- Add IF NOT EXISTS to accounts_template_assignment ALTER COLUMN
- Add IF NOT EXISTS to webhooks CREATE INDEX statements
- Add created_at/updated_at columns + ON CONFLICT DO NOTHING to industry templates
- Bump SCHEMA_VERSION 13→14 to force migration re-run on existing DB
- 16 down SQL files in migrations/down/ for each incremental migration
- db::run_down_migrations() executes rollback files in reverse order
- migrate_down CLI task: task=migrate_down timestamp=20260402
- Initial schema and seed data excluded (would be destructive)
- cache: insert-then-retain pattern avoids empty-window race during refresh
- relay: manage_task_status flag for proper failover state transitions
- relay: retry_task re-resolves model groups instead of blind provider reuse
- relay: filter empty-member groups from available models list
- relay: quota cache stale entry cleanup (TTL 5x expiry)
- error: from_sqlx_unique helper for 409 vs 500 distinction
- model_config: unique constraint handling, duplicate member check
- model_config: failover_strategy whitelist, model_id vs group name conflict check
- model_config: group-scoped member removal with group_id validation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sprint 2: 产品体验打磨 + 行业模板
- Create PipelineResultPreview component with tab-based output switching
- Connect workflow/hand messages to PresentationContainer in ChatArea
- Add auto-trigger first Hand after onboarding (industry-specific queries)
- Seed 3 industry agent templates (education, healthcare, design-shantou)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Model Groups provide logical model names that map to multiple physical
models across providers, with automatic failover when one provider's
key pool is exhausted.
Backend:
- New model_groups + model_group_members tables with FK constraints
- Full CRUD API (7 endpoints) with admin-only write permissions
- Cache layer: DashMap-backed CachedModelGroup with load_from_db
- Relay integration: ModelResolution enum for Direct/Group routing
- Cross-provider failover: sort_candidates_by_quota + OnceLock cache
- Relay failure path: record failure usage + relay_dequeue (fixes
queue counter leak that caused connection pool exhaustion)
- add_group_member: validate model_id exists before insert
Frontend:
- saas-relay-client: accept getModel() callback for dynamic model selection
- connectionStore: prefer conversationStore.currentModel over first available
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- saas test harness: align WorkerDispatcher::new and AppState::new
signatures with SpawnLimiter addition and init_db(&DatabaseConfig)
- growth sqlite: add CJK fallback (LIKE-based) when FTS5 unicode61
tokenizer fails on Chinese queries (unicode61 doesn't index CJK)
- ChatArea.tsx: remove duplicate useState(searchOpen) declaration on line 70
- scheduled_task/mod.rs: fix route from /api/scheduler/tasks to /api/v1/scheduler/tasks
(matches admin-v2 service baseURL pattern and all other modules)
- scheduled_task/handlers.rs: remove @reserved annotations (now has Admin V2 frontend)
- scheduled_task/handlers.rs: update doc comments with correct /api/v1/ paths
- Add welcomeMessage/quickCommands fields to Clone interface
- Persist template welcome/quick data via updateClone after creation
- FirstConversationPrompt: prefer template-provided welcome message
over dynamically generated one
- FirstConversationPrompt: render template quick_commands as chips
instead of hardcoded QUICK_ACTIONS when available
- Tighten assign/unassign template endpoint permissions from model:read
to relay:use (self-service operation for all authenticated users)
- Fix seed template tools to match actual runtime tool names
(file_read/file_write/shell_exec/web_fetch)
- Persist system_prompt/temperature/max_tokens via identity system
in agentStore.createFromTemplate()
- Fire-and-forget assignTemplate() in AgentOnboardingWizard
- Fix saas-relay-client unused variable warning
- Make pgvector extension optional in knowledge_base migration
- Increase StreamBridge timeout from 30s to 90s for thinking models
- create_item: wrap item + version INSERT in transaction for atomicity
- update_item handler: validate content length (100KB) before DB hit
- KnowledgeChunk: document missing embedding field, safe per explicit SELECT usage
Server side:
- POST /api/v1/billing/usage/increment endpoint with dimension whitelist
(hand_executions, pipeline_runs, relay_requests) and count validation (1-100)
- Returns updated usage quota after increment
Desktop side:
- New saas-billing.ts mixin with incrementUsageDimension() and
reportUsageFireAndForget() (non-blocking, safe for finally blocks)
- handStore.triggerHand: reports hand_executions after successful run
- PipelinesPanel.handleRunComplete: reports pipeline_runs on completion
- SaaSClient type declarations for new billing methods
Billing pipeline now covers all three dimensions:
relay_requests → relay handler (server-side, real-time)
hand_executions → handStore (client-side, fire-and-forget)
pipeline_runs → PipelinesPanel (client-side, fire-and-forget)
- Wire relay handler to increment_usage() for JSON responses (tokens + relay_requests)
- Wire relay handler to increment_dimension("relay_requests") for SSE streams
- Add increment_dimension() function for hand_executions/pipeline_runs dimensions
- Schedule AggregateUsageWorker hourly for reconciliation (run_on_start=true)
- Mount mock payment routes in dev mode (ZCLAW_SAAS_DEV=true)
Previously the quota middleware always allowed requests because usage
counters were never incremented. Now relay requests update billing_usage_quotas
in real-time, with the aggregator providing hourly reconciliation.
- payment.rs: create_payment, handle_payment_callback, query_payment_status
- Mock pay page for development mode with HTML confirm/cancel flow
- Payment callback handler with subscription auto-creation on success
- Alipay form-urlencoded and WeChat JSON callback parsing
- 7 new routes including callback and mock-pay endpoints
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Markdown-aware content splitting (512 token chunks with 64 overlap)
- CJK keyword extraction from chunk content with stop-word filtering
- Full refresh strategy (delete old chunks → re-insert on update)
- Phase 2 placeholder for vector embedding API integration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>