iven
9060935401
perf(runtime): Hermes Phase 1-3 — prompt caching + parallel tools + smart retry
Phase 1: Anthropic prompt caching
- Add cache_control ephemeral on system prompt blocks
- Track cache_creation/cache_read tokens in CompletionResponse + StreamChunk
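
A minimal sketch of how the cache token counters might be tracked and summarized. The struct name `CacheUsage`, its field names, and the `cache_hit_ratio` helper are hypothetical, not taken from the commit; the sketch also assumes that the uncached input count excludes cached tokens, so the three counters add up to the full prompt size.

```rust
/// Hypothetical per-response token accounting (names are illustrative).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct CacheUsage {
    /// Prompt tokens that were neither written to nor read from the cache.
    pub input_tokens: u64,
    /// Tokens written to the cache on this request.
    pub cache_creation_tokens: u64,
    /// Tokens served from the cache on this request.
    pub cache_read_tokens: u64,
}

impl CacheUsage {
    /// Accumulates the counters of another completed response.
    pub fn merge(&mut self, other: CacheUsage) {
        self.input_tokens += other.input_tokens;
        self.cache_creation_tokens += other.cache_creation_tokens;
        self.cache_read_tokens += other.cache_read_tokens;
    }

    /// Fraction of prompt tokens served from the cache, or None if no
    /// prompt tokens were recorded.
    pub fn cache_hit_ratio(&self) -> Option<f64> {
        let total = self.input_tokens + self.cache_creation_tokens + self.cache_read_tokens;
        if total == 0 {
            None
        } else {
            Some(self.cache_read_tokens as f64 / total as f64)
        }
    }
}
```

A ratio close to 1.0 on later turns would suggest the system prompt blocks are being reused rather than re-billed.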
Phase 2A: Parallel tool execution
- Add ToolConcurrency enum (ReadOnly/Exclusive/Interactive)
- JoinSet + Semaphore(3) for bounded parallel tool calls
- 7 tools annotated with correct concurrency level
- AtomicU32 for lock-free failure tracking in ToolErrorMiddleware
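
The scheduling idea in Phase 2A can be sketched with the standard library alone. This is a simplification under stated assumptions: the commit uses JoinSet + Semaphore(3), which bounds in-flight tasks, whereas the sketch below groups calls into fixed batches of at most three. `plan_batches`, `MAX_PARALLEL`, and `FailureCounter` are hypothetical names, and treating Interactive like Exclusive (run alone) is an assumption.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolConcurrency {
    ReadOnly,
    Exclusive,
    Interactive,
}

/// Assumed cap, mirroring the Semaphore(3) mentioned in the commit.
pub const MAX_PARALLEL: usize = 3;

/// Groups tool-call indices into batches that preserve call order.
/// Consecutive ReadOnly calls share a batch (up to MAX_PARALLEL);
/// Exclusive and Interactive calls each get a batch of their own.
pub fn plan_batches(levels: &[ToolConcurrency]) -> Vec<Vec<usize>> {
    let mut batches: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    for (i, level) in levels.iter().enumerate() {
        match level {
            ToolConcurrency::ReadOnly => {
                current.push(i);
                if current.len() == MAX_PARALLEL {
                    batches.push(std::mem::take(&mut current));
                }
            }
            ToolConcurrency::Exclusive | ToolConcurrency::Interactive => {
                // Flush pending read-only calls before the exclusive one.
                if !current.is_empty() {
                    batches.push(std::mem::take(&mut current));
                }
                batches.push(vec![i]);
            }
        }
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

/// Lock-free consecutive-failure counter, shareable across tasks by reference.
pub struct FailureCounter {
    consecutive: AtomicU32,
}

impl FailureCounter {
    pub const fn new() -> Self {
        Self { consecutive: AtomicU32::new(0) }
    }

    /// Records a failure and returns the updated count.
    pub fn record_failure(&self) -> u32 {
        self.consecutive.fetch_add(1, Ordering::Relaxed) + 1
    }

    /// Resets the count after a successful tool call.
    pub fn record_success(&self) {
        self.consecutive.store(0, Ordering::Relaxed);
    }

    pub fn get(&self) -> u32 {
        self.consecutive.load(Ordering::Relaxed)
    }
}
```

Relaxed ordering is enough here because the counter is a standalone statistic and does not guard access to other data.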
Phase 2B: Tool output pruning
- prune_tool_outputs() truncates old ToolResult entries longer than 2000 chars to 500 chars
- Integrated into CompactionMiddleware before token estimation
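
A rough sketch of the pruning rule described in Phase 2B. The `Message` enum and the `keep_recent` parameter are assumptions made for illustration; the commit does not say how "old" is decided, so here every message except the last `keep_recent` counts as old. Only the 2000 and 500 character limits come from the commit text.

```rust
/// Hypothetical, simplified message type.
pub enum Message {
    Text(String),
    ToolResult(String),
}

pub const PRUNE_THRESHOLD_CHARS: usize = 2000;
pub const PRUNE_TARGET_CHARS: usize = 500;

/// Truncates ToolResult bodies longer than PRUNE_THRESHOLD_CHARS to
/// PRUNE_TARGET_CHARS characters, skipping the last `keep_recent`
/// messages. Returns how many results were pruned.
pub fn prune_tool_outputs(messages: &mut [Message], keep_recent: usize) -> usize {
    let cutoff = messages.len().saturating_sub(keep_recent);
    let mut pruned = 0;
    for msg in &mut messages[..cutoff] {
        if let Message::ToolResult(body) = msg {
            if body.chars().count() > PRUNE_THRESHOLD_CHARS {
                // Find the byte offset of the first character to drop so
                // the cut never lands inside a multi-byte character.
                let end = body
                    .char_indices()
                    .nth(PRUNE_TARGET_CHARS)
                    .map(|(offset, _)| offset)
                    .unwrap_or(body.len());
                body.truncate(end);
                pruned += 1;
            }
        }
    }
    pruned
}
```

Running this before token estimation, as the commit describes, means the estimate already reflects the shortened history.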
Phase 3: Error classification + smart retry
- LlmErrorKind + ClassifiedLlmError for structured error mapping
- RetryDriver decorator with jittered exponential backoff
- Kernel wraps all LLM calls with RetryDriver
- CONTEXT_OVERFLOW recovery triggers emergency compaction in loop_runner
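
The Phase 3 retry policy could look roughly like the following. Apart from the `LlmErrorKind` name and the context-overflow case, everything here is an assumption: the other variants, the status-code and body heuristics in `classify`, the `RetryAction` type, and the 500 ms base / 30 s cap. The jitter value is passed in as a number in [0, 1] so the sketch stays deterministic and free of external crates; a real driver would draw it from a random source.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LlmErrorKind {
    RateLimited,
    Overloaded,
    Timeout,
    ContextOverflow,
    Auth,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RetryAction {
    /// Wait for the given delay, then retry the call.
    Retry(Duration),
    /// Shrink the conversation (emergency compaction) before retrying.
    Compact,
    /// Surface the error to the caller.
    GiveUp,
}

const BASE_DELAY: Duration = Duration::from_millis(500); // assumed
const MAX_DELAY: Duration = Duration::from_secs(30); // assumed

/// Maps an HTTP status and error body to an error kind (heuristic sketch).
pub fn classify(status: u16, body: &str) -> LlmErrorKind {
    let lower = body.to_ascii_lowercase();
    match status {
        429 => LlmErrorKind::RateLimited,
        500 | 502 | 503 | 529 => LlmErrorKind::Overloaded,
        408 | 504 => LlmErrorKind::Timeout,
        401 | 403 => LlmErrorKind::Auth,
        400 | 413 if lower.contains("too long") || lower.contains("context") => {
            LlmErrorKind::ContextOverflow
        }
        _ => LlmErrorKind::Other,
    }
}

/// Full-jitter exponential backoff: jitter * min(cap, base * 2^attempt).
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    let exponential = base.saturating_mul(1u32 << attempt.min(16));
    exponential.min(cap).mul_f64(jitter.clamp(0.0, 1.0))
}

/// Decides what to do after a failed call on the given zero-based attempt.
pub fn decide(kind: LlmErrorKind, attempt: u32, max_attempts: u32, jitter: f64) -> RetryAction {
    match kind {
        LlmErrorKind::ContextOverflow => RetryAction::Compact,
        LlmErrorKind::RateLimited | LlmErrorKind::Overloaded | LlmErrorKind::Timeout
            if attempt < max_attempts =>
        {
            RetryAction::Retry(backoff_delay(attempt, BASE_DELAY, MAX_DELAY, jitter))
        }
        _ => RetryAction::GiveUp,
    }
}
```

Keeping classification separate from the retry decision lets the loop handle context overflow by compacting instead of waiting, since repeating the same oversized request would fail again.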
2026-04-24 08:39:56 +08:00