perf(relay): full-chain optimization — key pool, model sync, SSE stream

Phase 1 (Key Pool correctness):
- RPM: fixed-minute window → sliding 60s aggregation (prevents 2x burst)
- Remove fallback-to-provider-key bypass when all keys rate-limited
- SSE semaphore: 16→64 permits, cleanup delay 60s→5s
- Default 429 cooldown: 5min→60s (better for Coding Plan quotas)
- Expire old key_usage_window rows on record
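The sliding-window change above replaces a fixed-minute counter (which resets at the minute boundary and so allows up to 2x RPM in a burst straddling that boundary) with a trailing 60-second aggregation. The relay side is Rust, but the idea can be sketched in TypeScript; the class name and shape here are illustrative, not the actual relay code:

```typescript
// Illustrative sketch of a sliding 60-second RPM window.
// A fixed-minute window resets at :00, so a client can burst up to
// 2x the limit across the boundary; a trailing window cannot.
class SlidingWindow {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs = 60_000) {}

  tryAcquire(now: number = Date.now()): boolean {
    // Drop events that have fallen out of the trailing window.
    const cutoff = now - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Expiring old `key_usage_window` rows on record (last bullet) is the persisted analogue of the pruning loop above.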

Phase 2 (Frontend model sync):
- currentModel empty-string fallback to glm-4-flash-250414 in relay client
- Merge duplicate listModels() calls in connectionStore SaaS path
- Show ModelSelector in Tauri mode when models available
- Clear currentModel on SaaS logout
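The empty-string fallback in the first bullet reduces to a small resolver; this is a hypothetical sketch (the function name is illustrative), showing only that a blank or unset `currentModel` resolves to the documented default:

```typescript
// Illustrative: fall back to the default model when currentModel is
// unset or empty (e.g. right after a SaaS logout clears it).
const DEFAULT_MODEL = 'glm-4-flash-250414';

function resolveModel(currentModel: string | undefined): string {
  return currentModel && currentModel.trim() !== '' ? currentModel : DEFAULT_MODEL;
}
```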

Phase 3 (Relay performance):
- Key Pool: DashMap in-memory cache (TTL 5s) for select_best_key
- Cache invalidation on 429 marking
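The relay's cache is a Rust `DashMap`; the behavior it buys (a short TTL on the `select_best_key` result, with explicit invalidation when a key is marked 429) can be modeled in TypeScript. Everything below is an illustrative sketch, not the relay implementation:

```typescript
// Illustrative TTL cache: entries expire after ttlMs (5s in the relay),
// and a 429 explicitly invalidates the cached selection so the next
// request re-runs the full key-scoring pass instead of reusing a
// now-cooling-down key.
class TtlCache<K, V> {
  private entries = new Map<K, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: K, now = Date.now()): V | undefined {
    const e = this.entries.get(key);
    if (!e) return undefined;
    if (e.expiresAt <= now) {
      this.entries.delete(key); // lazy expiry on read
      return undefined;
    }
    return e.value;
  }

  set(key: K, value: V, now = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  invalidate(key: K): void {
    this.entries.delete(key); // called when the key gets a 429
  }
}
```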

Phase 4 (SSE stream):
- AbortController integration for user-cancelled streams
- SSE parsing: split by event boundaries (\n\n) instead of per-line
- streamStore cancelStream adapts to 0-arg and 1-arg cancel fns
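Splitting on event boundaries matters because a network chunk can end mid-payload: a per-line parser may hand half a `data:` JSON object to the consumer. SSE events are delimited by a blank line, so buffering until `\n\n` guarantees each dispatch is a complete event. A minimal sketch of that technique (function names are illustrative):

```typescript
// Illustrative SSE parser: buffer incoming chunks and dispatch only on
// blank-line event boundaries (\n\n), never on raw line breaks.
function makeSseParser(onEvent: (data: string) => void) {
  let buffer = '';
  return (chunk: string) => {
    buffer += chunk;
    let idx: number;
    while ((idx = buffer.indexOf('\n\n')) !== -1) {
      const rawEvent = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 2);
      // Join the data: lines of this complete event.
      const data = rawEvent
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).trimStart())
        .join('\n');
      if (data) onEvent(data);
    }
  };
}
```

A payload split across two chunks (`data: {"a":` then `1}\n\n`) is dispatched once, intact, only after the boundary arrives.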
Author: iven
Date: 2026-04-09 19:34:02 +08:00
Parent: 5c6964f52a
Commit: e6eb97dcaa
7 changed files with 191 additions and 105 deletions


@@ -581,11 +581,20 @@ export const useStreamStore = create<StreamState>()(
if (!isStreaming) return;
// 1. Tell backend to abort — use sessionKey (which is the sessionId in Tauri)
// Also abort the frontend SSE fetch via cancelStream()
try {
const client = getClient();
const client = getClient() as unknown as Record<string, unknown>;
if ('cancelStream' in client) {
const sessionId = useConversationStore.getState().sessionKey || activeRunId || '';
(client as { cancelStream: (id: string) => void }).cancelStream(sessionId);
const fn = client.cancelStream;
if (typeof fn === 'function') {
// Call with or without sessionId depending on arity
if (fn.length > 0) {
const sessionId = useConversationStore.getState().sessionKey || activeRunId || '';
(fn as (id: string) => void)(sessionId);
} else {
(fn as () => void)();
}
}
}
} catch {
// Backend cancel is best-effort; proceed with local cleanup


@@ -441,9 +441,10 @@ export const useConnectionStore = create<ConnectionStore>((set, get) => {
// Configure the singleton client (cookie auth — no token needed)
saasClient.setBaseUrl(session.saasUrl);
// Health check via GET /api/v1/relay/models
// Health check + model list: merged single listModels() call
let relayModels: Array<{ id: string; alias?: string }> | null = null;
try {
await saasClient.listModels();
relayModels = await saasClient.listModels();
} catch (err: unknown) {
// Handle expired session — clear auth and trigger re-login
const status = (err as { status?: number })?.status;
@@ -473,15 +474,8 @@ export const useConnectionStore = create<ConnectionStore>((set, get) => {
// baseUrl = saasUrl + /api/v1/relay → kernel appends /chat/completions
// apiKey = SaaS JWT token → sent as Authorization: Bearer <jwt>
// Fetch available models from SaaS relay (shared by both branches)
let relayModels: Array<{ id: string }>;
try {
relayModels = await saasClient.listModels();
} catch {
throw new Error('无法获取可用模型列表,请确认管理后台已配置 Provider 和模型');
}
if (relayModels.length === 0) {
// Models already fetched during health check above
if (!relayModels || relayModels.length === 0) {
throw new Error('SaaS 平台没有可用模型,请先在管理后台配置 Provider 和模型');
}


@@ -425,6 +425,12 @@ export const useSaaSStore = create<SaaSStore>((set, get) => {
stopTelemetryCollector();
stopPromptOTASync();
// Clear currentModel so next connection uses fresh model resolution
try {
const { useConversationStore } = require('./chat/conversationStore');
useConversationStore.getState().setCurrentModel('');
} catch { /* non-critical */ }
set({
isLoggedIn: false,
account: null,