fix: 发布前审计 Batch 1 — Pipeline 内存泄漏/超时 + Director 死锁 + Rate Limit Worker

Pipeline executor:
- 添加 cleanup() 方法,MAX_COMPLETED_RUNS=100 上限淘汰旧记录
- 每步执行添加 tokio::time::timeout(使用 PipelineSpec.timeout_secs,默认 300s)
- Delay ms 上限 60000,超出 warn 并截断

Director send_to_agent:
- 重构为 oneshot::channel 响应模式,避免 inbox + pending_requests 锁竞争
- 添加 ensure_inbox_reader() 独立任务分发响应到对应 oneshot sender

cleanup_rate_limit Worker:
- 实现 Worker body: DELETE FROM rate_limit_events WHERE created_at < NOW() - INTERVAL '1 hour'

651 tests passed, 0 failed
This commit is contained in:
iven
2026-04-18 14:09:16 +08:00
parent 35a11504d7
commit f3fb5340b5
3 changed files with 171 additions and 57 deletions

View File

@@ -1,4 +1,7 @@
//! 清理过期 Rate Limit 条目 Worker
//!
//! rate_limit_events 表中的持久化条目会无限增长。
//! 此 Worker 定期删除超过 1 小时的旧条目,防止数据库膨胀。
use async_trait::async_trait;
use sqlx::PgPool;
@@ -21,10 +24,31 @@ impl Worker for CleanupRateLimitWorker {
"cleanup_rate_limit"
}
async fn perform(&self, _db: &PgPool, _args: Self::Args) -> SaasResult<()> {
// Rate limit entries are in-memory (DashMap), not in DB
// This worker is a placeholder for when rate limits are persisted
// Currently the cleanup happens in main.rs background task
async fn perform(&self, db: &PgPool, args: Self::Args) -> SaasResult<()> {
let retention_secs = args.window_secs.max(3600); // 至少保留 1 小时
let result = sqlx::query(
"DELETE FROM rate_limit_events WHERE created_at < NOW() - ($1 || ' seconds')::interval"
)
.bind(retention_secs.to_string())
.execute(db)
.await;
match result {
Ok(r) => {
let deleted = r.rows_affected();
if deleted > 0 {
tracing::info!(
"[cleanup_rate_limit] Deleted {} expired rate limit events (retention: {}s)",
deleted, retention_secs
);
}
}
Err(e) => {
tracing::error!("[cleanup_rate_limit] Failed to clean up rate limit events: {}", e);
}
}
Ok(())
}
}