perf(relay): full-chain optimization — key pool, model sync, SSE stream
Some checks failed
CI / Lint & TypeCheck (push) Has been cancelled
CI / Unit Tests (push) Has been cancelled
CI / Build Frontend (push) Has been cancelled
CI / Rust Check (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
Phase 1 (Key Pool correctness):
- RPM: fixed-minute window → sliding 60s aggregation (prevents 2x burst)
- Remove fallback-to-provider-key bypass when all keys rate-limited
- SSE semaphore: 16→64 permits, cleanup delay 60s→5s
- Default 429 cooldown: 5min→60s (better for Coding Plan quotas)
- Expire old key_usage_window rows on record

Phase 2 (Frontend model sync):
- currentModel empty-string fallback to glm-4-flash-250414 in relay client
- Merge duplicate listModels() calls in connectionStore SaaS path
- Show ModelSelector in Tauri mode when models available
- Clear currentModel on SaaS logout

Phase 3 (Relay performance):
- Key Pool: DashMap in-memory cache (TTL 5s) for select_best_key
- Cache invalidation on 429 marking

Phase 4 (SSE stream):
- AbortController integration for user-cancelled streams
- SSE parsing: split by event boundaries (\n\n) instead of per-line
- streamStore cancelStream adapts to 0-arg and 1-arg cancel fns
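
To make the Phase 1 RPM change concrete, here is a minimal sketch of a sliding 60s limiter. It is an illustration under assumptions, not the code in this commit: the commit aggregates `key_usage_window` rows, whereas this sketch keeps timestamps in memory, and `SlidingRpm`, `try_record`, and `burst_demo` are hypothetical names. It shows why a fixed-minute window allows a 2x burst across the minute boundary and a trailing window does not.

```rust
use std::collections::VecDeque;

/// Hypothetical in-memory limiter (assumption: the real code aggregates DB rows).
struct SlidingRpm {
    limit: usize,
    window_secs: u64,
    hits: VecDeque<u64>, // request timestamps in seconds, oldest first
}

impl SlidingRpm {
    fn new(limit: usize) -> Self {
        Self { limit, window_secs: 60, hits: VecDeque::new() }
    }

    /// Records a request at `now` (seconds) if the trailing window has room.
    fn try_record(&mut self, now: u64) -> bool {
        // Expire entries that have fallen out of the trailing window.
        while let Some(&oldest) = self.hits.front() {
            if now.saturating_sub(oldest) >= self.window_secs {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        if self.hits.len() < self.limit {
            self.hits.push_back(now);
            true
        } else {
            false
        }
    }
}

/// Two requests at t=59, one at t=60, one at t=119, with a limit of 2 per minute.
fn burst_demo() -> [bool; 4] {
    let mut rpm = SlidingRpm::new(2);
    [
        rpm.try_record(59),
        rpm.try_record(59),
        // A fixed-minute window would reset at t=60 and admit this request.
        rpm.try_record(60),
        // The t=59 requests are now 60s old, so capacity has returned.
        rpm.try_record(119),
    ]
}
```

With a fixed-minute counter, the request at t=60 would be admitted because the counter resets on the minute boundary; the trailing window rejects it until the earlier requests age out.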
@@ -19,8 +19,8 @@ const STREAMBRIDGE_HEARTBEAT_INTERVAL: Duration = Duration::from_secs(15);
 /// In practice, the thinking→content gap for Kimi for Coding can reach 60s+, so a more lenient timeout is needed.
 const STREAMBRIDGE_TIMEOUT: Duration = Duration::from_secs(180);
 
-/// Time window for delayed cleanup after the stream ends
-const STREAMBRIDGE_CLEANUP_DELAY: Duration = Duration::from_secs(60);
+/// Time window for delayed cleanup after the stream ends (shortened to 5s; only used to release Arc references)
+const STREAMBRIDGE_CLEANUP_DELAY: Duration = Duration::from_secs(5);
 
 /// Returns whether an HTTP status code is a retryable transient error (5xx + 429)
 fn is_retryable_status(status: u16) -> bool {
@@ -357,7 +357,7 @@ pub async fn execute_relay(
 // After the SSE stream ends, asynchronously record usage + key usage
 // Use a global Arc<Semaphore> to limit concurrently spawned tasks, so high concurrency cannot exhaust the connection pool
 static SSE_SPAWN_SEMAPHORE: std::sync::OnceLock<Arc<tokio::sync::Semaphore>> = std::sync::OnceLock::new();
-let semaphore = SSE_SPAWN_SEMAPHORE.get_or_init(|| Arc::new(tokio::sync::Semaphore::new(16)));
+let semaphore = SSE_SPAWN_SEMAPHORE.get_or_init(|| Arc::new(tokio::sync::Semaphore::new(64)));
 let permit = match semaphore.clone().try_acquire_owned() {
     Ok(p) => p,
     Err(_) => {