perf(runtime): Hermes Phase 1-3 — prompt caching + parallel tools + smart retry

Phase 1: Anthropic prompt caching
- Add cache_control ephemeral on system prompt blocks
- Track cache_creation/cache_read tokens in CompletionResponse + StreamChunk

Phase 2A: Parallel tool execution
- Add ToolConcurrency enum (ReadOnly/Exclusive/Interactive)
- JoinSet + Semaphore(3) for bounded parallel tool calls
- 7 tools annotated with correct concurrency level
- AtomicU32 for lock-free failure tracking in ToolErrorMiddleware

Phase 2B: Tool output pruning
- prune_tool_outputs() trims old ToolResult > 2000 chars to 500 chars
- Integrated into CompactionMiddleware before token estimation

Phase 3: Error classification + smart retry
- LlmErrorKind + ClassifiedLlmError for structured error mapping
- RetryDriver decorator with jittered exponential backoff
- Kernel wraps all LLM calls with RetryDriver
- CONTEXT_OVERFLOW recovery triggers emergency compaction in loop_runner
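
The Phase 2B pruning rule can be sketched roughly as below. This is an illustration under assumptions, not the code from this commit: only the name `prune_tool_outputs` and the 2000/500 character thresholds come from the commit message. The `Message` enum, the `keep_recent` parameter (used here to decide what counts as "old"), and the `[output pruned]` marker are hypothetical.

```rust
/// Hypothetical message type; the real conversation type is not shown in this commit.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
    ToolResult(String),
}

const PRUNE_THRESHOLD_CHARS: usize = 2000; // from the commit message
const PRUNE_KEEP_CHARS: usize = 500; // from the commit message

/// Truncate oversized tool results in all but the last `keep_recent` messages.
/// Returns how many messages were pruned.
fn prune_tool_outputs(messages: &mut [Message], keep_recent: usize) -> usize {
    // Everything before `cutoff` is treated as "old" (an assumption of this sketch).
    let cutoff = messages.len().saturating_sub(keep_recent);
    let mut pruned = 0;
    for msg in &mut messages[..cutoff] {
        if let Message::ToolResult(text) = msg {
            // Count chars, not bytes, so multi-byte text is never split mid-character.
            if text.chars().count() > PRUNE_THRESHOLD_CHARS {
                let mut short: String = text.chars().take(PRUNE_KEEP_CHARS).collect();
                short.push_str("\n[output pruned]");
                *text = short;
                pruned += 1;
            }
        }
    }
    pruned
}
```

Running this before token estimation, as the commit message describes for CompactionMiddleware, would mean the estimate already reflects the shortened tool results.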
@@ -223,6 +223,33 @@ impl Serialize for ZclawError {
/// Result type alias for ZCLAW operations
pub type Result<T> = std::result::Result<T, ZclawError>;

/// Fine-grained classification of LLM call errors, used to guide retry and recovery strategy
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum LlmErrorKind {
    Auth,
    AuthPermanent,
    BillingExhausted,
    RateLimited,
    Overloaded,
    ServerError,
    Timeout,
    ContextOverflow,
    ModelNotFound,
    Unknown,
}

/// A classified LLM error with recovery hints attached
#[derive(Debug, Clone)]
pub struct ClassifiedLlmError {
    pub kind: LlmErrorKind,
    pub retryable: bool,
    pub should_compress: bool,
    pub should_rotate_credential: bool,
    pub retry_after: Option<std::time::Duration>,
    pub message: String,
}

#[cfg(test)]
mod tests {
    use super::*;
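
One way the two types in this hunk might be populated is sketched below. The function `classify_status`, the HTTP-status-to-kind table, and the rules for setting each hint are assumptions made for illustration; the commit's actual mapping is not visible in this hunk. The types are re-declared locally without the serde derives so the sketch compiles with the standard library alone.

```rust
use std::time::Duration;

// Local copies of the types from the diff, minus the serde derives.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LlmErrorKind {
    Auth,
    AuthPermanent,
    BillingExhausted,
    RateLimited,
    Overloaded,
    ServerError,
    Timeout,
    ContextOverflow,
    ModelNotFound,
    Unknown,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct ClassifiedLlmError {
    kind: LlmErrorKind,
    retryable: bool,
    should_compress: bool,
    should_rotate_credential: bool,
    retry_after: Option<Duration>,
    message: String,
}

/// Hypothetical classifier: maps an HTTP status and provider message to recovery hints.
/// The status table below is an assumption, not the mapping used by this commit.
fn classify_status(status: u16, message: &str, retry_after: Option<Duration>) -> ClassifiedLlmError {
    use LlmErrorKind::*;
    let lower = message.to_lowercase();
    let kind = match status {
        401 => Auth,
        402 => BillingExhausted,
        403 => AuthPermanent,
        404 => ModelNotFound,
        408 | 504 => Timeout,
        429 => RateLimited,
        // Assumed: an over-long prompt surfaces as 400/413 with "context" in the message.
        400 | 413 if lower.contains("context") => ContextOverflow,
        503 | 529 => Overloaded,
        500..=599 => ServerError,
        _ => Unknown,
    };
    // Only transient failures are retried as-is; context overflow needs compaction first.
    let retryable = matches!(kind, RateLimited | Overloaded | ServerError | Timeout);
    ClassifiedLlmError {
        kind,
        retryable,
        should_compress: kind == ContextOverflow,
        should_rotate_credential: matches!(kind, Auth | BillingExhausted),
        retry_after: if retryable { retry_after } else { None },
        message: message.to_string(),
    }
}
```

Under these assumptions, a caller would branch on `should_compress` to trigger compaction for a context overflow rather than retrying the same oversized request unchanged.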
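
The "jittered exponential backoff" named in Phase 3 could compute its delays along the following lines. This is a sketch under assumptions: the commit does not say which jitter scheme RetryDriver uses, so the "full jitter" variant is shown, and `backoff_ceiling`, `backoff_delay`, and `XorShift64` are hypothetical names. The small xorshift generator only keeps the example dependency-free; real code would more likely draw the sample from the `rand` crate.

```rust
use std::time::Duration;

/// Upper bound for a given attempt: base * 2^attempt, capped at `cap`.
fn backoff_ceiling(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // checked_shl and saturating_mul keep large attempt counts from overflowing.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.saturating_mul(factor).min(cap)
}

/// "Full jitter": scale the ceiling by a sample in [0, 1].
/// A server-provided `retry_after` hint, if present, acts as a lower bound.
fn backoff_delay(
    attempt: u32,
    base: Duration,
    cap: Duration,
    jitter: f64,
    retry_after: Option<Duration>,
) -> Duration {
    // Treat NaN as zero so mul_f64 never sees a non-finite factor.
    let j = if jitter.is_nan() { 0.0 } else { jitter.clamp(0.0, 1.0) };
    let jittered = backoff_ceiling(attempt, base, cap).mul_f64(j);
    match retry_after {
        Some(hint) => jittered.max(hint),
        None => jittered,
    }
}

/// Tiny xorshift64 generator (seed must be non-zero); stands in for a real RNG.
struct XorShift64(u64);

impl XorShift64 {
    /// Next sample in [0, 1).
    fn next_unit(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // The top 53 bits map onto the f64 mantissa range.
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}
```

A retry loop would call `backoff_delay(attempt, base, cap, rng.next_unit(), err.retry_after)` after each retryable failure and sleep for the result, so that concurrent clients spread out their retries instead of retrying in lockstep.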