Files
openfang/docs/security.md
iven 92e5def702
Some checks failed
CI / Check / macos-latest (push) Has been cancelled
CI / Check / ubuntu-latest (push) Has been cancelled
CI / Check / windows-latest (push) Has been cancelled
CI / Test / macos-latest (push) Has been cancelled
CI / Test / ubuntu-latest (push) Has been cancelled
CI / Test / windows-latest (push) Has been cancelled
CI / Clippy (push) Has been cancelled
CI / Format (push) Has been cancelled
CI / Security Audit (push) Has been cancelled
CI / Secrets Scan (push) Has been cancelled
CI / Install Script Smoke Test (push) Has been cancelled
初始化提交
2026-03-01 16:24:24 +08:00

1491 lines
46 KiB
Markdown

# OpenFang Security Architecture
This document provides a comprehensive technical reference for every security
system in the OpenFang Agent Operating System. All struct names, function
signatures, constant values, and algorithm descriptions are drawn directly from
the source code.
---
## Table of Contents
1. [Security Overview](#1-security-overview)
2. [Capability-Based Security](#2-capability-based-security)
3. [WASM Dual Metering](#3-wasm-dual-metering)
4. [Merkle Hash Chain Audit Trail](#4-merkle-hash-chain-audit-trail)
5. [Information Flow Taint Tracking](#5-information-flow-taint-tracking)
6. [Ed25519 Manifest Signing](#6-ed25519-manifest-signing)
7. [SSRF Protection](#7-ssrf-protection)
8. [Secret Zeroization](#8-secret-zeroization)
9. [OFP Mutual Authentication](#9-ofp-mutual-authentication)
10. [Security Headers](#10-security-headers)
11. [GCRA Rate Limiter](#11-gcra-rate-limiter)
12. [Path Traversal Prevention](#12-path-traversal-prevention)
13. [Subprocess Sandbox](#13-subprocess-sandbox)
14. [Prompt Injection Scanner](#14-prompt-injection-scanner)
15. [Loop Guard](#15-loop-guard)
16. [Session Repair](#16-session-repair)
17. [Health Endpoint Redaction](#17-health-endpoint-redaction)
18. [Security Configuration](#18-security-configuration)
19. [Security Dependencies](#19-security-dependencies)
---
## 1. Security Overview
OpenFang implements **defense-in-depth** security. No single mechanism is
trusted to be the sole protector; instead, 16 independent systems form
overlapping layers so that a failure in any one layer is caught by others.
| # | System | Crate | Protects Against |
|---|--------|-------|------------------|
| 1 | Capability-Based Security | `openfang-types` | Unauthorized actions by agents |
| 2 | WASM Dual Metering | `openfang-runtime` | Infinite loops, CPU DoS |
| 3 | Merkle Audit Trail | `openfang-runtime` | Tampered audit logs |
| 4 | Taint Tracking | `openfang-types` | Prompt injection, data exfiltration |
| 5 | Ed25519 Manifest Signing | `openfang-types` | Supply chain attacks |
| 6 | SSRF Protection | `openfang-runtime` | Server-Side Request Forgery |
| 7 | Secret Zeroization | `openfang-runtime`, `openfang-channels` | Memory forensics, key leakage |
| 8 | OFP Mutual Auth | `openfang-wire` | Unauthorized peer connections |
| 9 | Security Headers | `openfang-api` | XSS, clickjacking, MIME sniffing |
| 10 | GCRA Rate Limiter | `openfang-api` | API abuse, denial of service |
| 11 | Path Traversal Prevention | `openfang-runtime` | Directory traversal attacks |
| 12 | Subprocess Sandbox | `openfang-runtime` | Secret leakage via child processes |
| 13 | Prompt Injection Scanner | `openfang-skills` | Malicious skill prompts |
| 14 | Loop Guard | `openfang-runtime` | Stuck agent tool loops |
| 15 | Session Repair | `openfang-runtime` | Corrupted LLM conversation history |
| 16 | Health Endpoint Redaction | `openfang-api` | Information leakage |
---
## 2. Capability-Based Security
**Source:** `openfang-types/src/capability.rs`
OpenFang uses capability-based security. An agent can only perform actions
it has been explicitly granted permission to do. Capabilities are immutable
after agent creation and are enforced at the kernel level.
### 2.1 Capability Variants
The `Capability` enum defines every permission type:
```rust
pub enum Capability {
// Filesystem
FileRead(String), // Glob pattern, e.g. "/data/*"
FileWrite(String),
// Network
NetConnect(String), // Host:port pattern, e.g. "*.openai.com:443"
NetListen(u16),
// Tools
ToolInvoke(String), // Specific tool ID
ToolAll, // All tools (dangerous)
// LLM
LlmQuery(String),
LlmMaxTokens(u64),
// Agent interaction
AgentSpawn,
AgentMessage(String),
AgentKill(String),
// Memory
MemoryRead(String),
MemoryWrite(String),
// Shell
ShellExec(String),
EnvRead(String),
// OFP Wire Protocol
OfpDiscover,
OfpConnect(String),
OfpAdvertise,
// Economic
EconSpend(f64),
EconEarn,
EconTransfer(String),
}
```
### 2.2 Pattern Matching
The `capability_matches(granted, required)` function implements glob-style
matching:
- **Exact match:** `"api.openai.com:443"` matches `"api.openai.com:443"`
- **Full wildcard:** `"*"` matches anything
- **Prefix wildcard:** `"*.openai.com:443"` matches `"api.openai.com:443"`
- **Suffix wildcard:** `"api.*"` matches `"api.openai.com"`
- **Middle wildcard:** `"api.*.com"` matches `"api.openai.com"`
- **ToolAll special case:** `ToolAll` grants any `ToolInvoke(_)`
- **Numeric bounds:** `LlmMaxTokens(10000)` grants `LlmMaxTokens(5000)` (granted >= required)
### 2.3 Enforcement Point
In the WASM sandbox, every host call is checked **before** execution by
`check_capability()` in `host_functions.rs`:
```rust
fn check_capability(
capabilities: &[Capability],
required: &Capability,
) -> Result<(), serde_json::Value> {
for granted in capabilities {
if capability_matches(granted, required) {
return Ok(());
}
}
Err(json!({"error": format!("Capability denied: {required:?}")}))
}
```
If no granted capability matches the required one, the operation returns a
JSON error immediately -- the tool is never invoked.
### 2.4 Capability Inheritance
When an agent spawns a child agent, `validate_capability_inheritance()` ensures
the child's capabilities are a **subset** of the parent's. This prevents
privilege escalation:
```rust
pub fn validate_capability_inheritance(
parent_caps: &[Capability],
child_caps: &[Capability],
) -> Result<(), String> {
for child_cap in child_caps {
let is_covered = parent_caps
.iter()
.any(|parent_cap| capability_matches(parent_cap, child_cap));
if !is_covered {
return Err(format!(
"Privilege escalation denied: child requests {:?} \
but parent does not have a matching grant",
child_cap
));
}
}
Ok(())
}
```
The `host_agent_spawn()` function in `host_functions.rs` calls
`kernel.spawn_agent_checked(manifest_toml, Some(&state.agent_id), &state.capabilities)`
which invokes this validation before the child is created.
---
## 3. WASM Dual Metering
**Source:** `openfang-runtime/src/sandbox.rs`
Untrusted WASM modules run inside a Wasmtime sandbox with **two
independent** metering mechanisms running simultaneously.
### 3.1 Fuel Metering (Deterministic)
Fuel metering counts WASM instructions. The engine deducts fuel for every
instruction executed. When the budget is exhausted, execution traps with
`Trap::OutOfFuel`.
```rust
// SandboxConfig defaults
pub fuel_limit: u64, // Default: 1_000_000
// Applied at execution time
if config.fuel_limit > 0 {
store.set_fuel(config.fuel_limit)?;
}
```
After execution, fuel consumed is reported:
```rust
let fuel_remaining = store.get_fuel().unwrap_or(0);
let fuel_consumed = config.fuel_limit.saturating_sub(fuel_remaining);
```
### 3.2 Epoch Interruption (Wall-Clock)
A watchdog thread sleeps for the configured timeout, then increments the
engine epoch. When the epoch advances past the store's deadline, execution
traps with `Trap::Interrupt`.
```rust
store.set_epoch_deadline(1);
let engine_clone = engine.clone();
let timeout = config.timeout_secs.unwrap_or(30);
let _watchdog = std::thread::spawn(move || {
std::thread::sleep(std::time::Duration::from_secs(timeout));
engine_clone.increment_epoch();
});
```
### 3.3 Why Both?
| Property | Fuel | Epoch |
|----------|------|-------|
| **Metric** | Instruction count | Wall-clock time |
| **Precision** | Deterministic, reproducible | Non-deterministic |
| **Catches** | CPU-intensive loops | Host call blocking, I/O waits |
| **Evasion** | Can waste time in host calls | Can busy-loop cheaply |
Together they form a complete defense: fuel catches compute-intensive loops,
while epochs catch host-call abuse or environmental slowdowns.
### 3.4 SandboxConfig
```rust
pub struct SandboxConfig {
pub fuel_limit: u64, // Default: 1_000_000
pub max_memory_bytes: usize, // Default: 16 MB
pub capabilities: Vec<Capability>,
pub timeout_secs: Option<u64>, // Default: 30 seconds
}
```
### 3.5 Error Types
```rust
pub enum SandboxError {
Compilation(String),
Instantiation(String),
Execution(String),
FuelExhausted, // Trap::OutOfFuel
AbiError(String),
}
```
---
## 4. Merkle Hash Chain Audit Trail
**Source:** `openfang-runtime/src/audit.rs`
Every security-critical action is appended to a tamper-evident Merkle hash
chain, similar to a blockchain. Each entry contains the SHA-256 hash of its
own contents concatenated with the hash of the previous entry.
### 4.1 Auditable Actions
```rust
pub enum AuditAction {
ToolInvoke,
CapabilityCheck,
AgentSpawn,
AgentKill,
AgentMessage,
MemoryAccess,
FileAccess,
NetworkAccess,
ShellExec,
AuthAttempt,
WireConnect,
ConfigChange,
}
```
### 4.2 Entry Structure
```rust
pub struct AuditEntry {
pub seq: u64, // Monotonically increasing sequence number
pub timestamp: String, // ISO-8601
pub agent_id: String,
pub action: AuditAction,
pub detail: String, // e.g. tool name, file path
pub outcome: String, // "ok", "denied", error message
pub prev_hash: String, // SHA-256 of previous entry (or 64 zeros)
pub hash: String, // SHA-256 of this entry + prev_hash
}
```
### 4.3 Hash Computation
Each entry's hash is computed from all of its fields concatenated with the
previous entry's hash:
```rust
fn compute_entry_hash(
seq: u64, timestamp: &str, agent_id: &str,
action: &AuditAction, detail: &str,
outcome: &str, prev_hash: &str,
) -> String {
let mut hasher = Sha256::new();
hasher.update(seq.to_string().as_bytes());
hasher.update(timestamp.as_bytes());
hasher.update(agent_id.as_bytes());
hasher.update(action.to_string().as_bytes());
hasher.update(detail.as_bytes());
hasher.update(outcome.as_bytes());
hasher.update(prev_hash.as_bytes());
hex::encode(hasher.finalize())
}
```
### 4.4 Chain Integrity Verification
`AuditLog::verify_integrity()` walks the entire chain and recomputes every
hash. If any entry has been tampered with, the recomputed hash will not match
the stored hash, or the `prev_hash` linkage will be broken:
```rust
pub fn verify_integrity(&self) -> Result<(), String> {
let entries = self.entries.lock().unwrap_or_else(|e| e.into_inner());
let mut expected_prev = "0".repeat(64); // Genesis sentinel
for entry in entries.iter() {
if entry.prev_hash != expected_prev {
return Err(format!(
"chain break at seq {}: expected prev_hash {} but found {}",
entry.seq, expected_prev, entry.prev_hash
));
}
let recomputed = compute_entry_hash(/* ... */);
if recomputed != entry.hash {
return Err(format!(
"hash mismatch at seq {}: expected {} but found {}",
entry.seq, recomputed, entry.hash
));
}
expected_prev = entry.hash.clone();
}
Ok(())
}
```
### 4.5 Thread Safety
`AuditLog` uses `Mutex<Vec<AuditEntry>>` and `Mutex<String>` for the tip hash.
Both locks use `unwrap_or_else(|e| e.into_inner())` to recover from poisoned
mutexes, ensuring the audit log remains available even after a panic.
### 4.6 API
| Method | Description |
|--------|-------------|
| `AuditLog::new()` | Creates an empty log with genesis sentinel (`"0" * 64`) |
| `record(agent_id, action, detail, outcome)` | Appends an entry, returns its hash |
| `verify_integrity()` | Validates the entire chain |
| `tip_hash()` | Returns the hash of the most recent entry |
| `len()` / `is_empty()` | Entry count |
| `recent(n)` | Returns the most recent `n` entries (cloned) |
---
## 5. Information Flow Taint Tracking
**Source:** `openfang-types/src/taint.rs`
OpenFang implements a lattice-based taint propagation model that prevents
tainted values from flowing into sensitive sinks without explicit
declassification. This guards against prompt injection, data exfiltration,
and confused-deputy attacks.
### 5.1 Taint Labels
```rust
pub enum TaintLabel {
ExternalNetwork, // Data from external network requests
UserInput, // Direct user input
Pii, // Personally identifiable information
Secret, // API keys, tokens, passwords
UntrustedAgent, // Data from sandboxed/untrusted agents
}
```
### 5.2 Tainted Values
```rust
pub struct TaintedValue {
pub value: String, // The payload
pub labels: HashSet<TaintLabel>, // Attached taint labels
pub source: String, // Human-readable origin
}
```
Key methods:
| Method | Description |
|--------|-------------|
| `TaintedValue::new(value, labels, source)` | Create with labels |
| `TaintedValue::clean(value, source)` | Create with no labels (untainted) |
| `merge_taint(&mut self, other)` | Union of labels (for concatenation) |
| `check_sink(&self, sink)` | Check if value can flow to sink |
| `declassify(&mut self, label)` | Remove a specific label (explicit security decision) |
| `is_tainted(&self) -> bool` | True if any labels present |
### 5.3 Taint Sinks
A `TaintSink` defines which labels are **blocked** from reaching it:
| Sink | Blocked Labels | Rationale |
|------|---------------|-----------|
| `TaintSink::shell_exec()` | `ExternalNetwork`, `UntrustedAgent`, `UserInput` | Prevents command injection |
| `TaintSink::net_fetch()` | `Secret`, `Pii` | Prevents data exfiltration |
| `TaintSink::agent_message()` | `Secret` | Prevents secret leakage to other agents |
### 5.4 Violation Handling
When `check_sink()` finds a blocked label, it returns a `TaintViolation`:
```rust
pub struct TaintViolation {
pub label: TaintLabel, // The offending label
pub sink_name: String, // "shell_exec", "net_fetch", etc.
pub source: String, // Where the tainted value came from
}
```
Display: `taint violation: label 'Secret' from source 'env_var' is not allowed to reach sink 'net_fetch'`
### 5.5 Declassification
Declassification is an **explicit security decision**. The caller asserts
that the value has been sanitized:
```rust
tainted.declassify(&TaintLabel::ExternalNetwork);
tainted.declassify(&TaintLabel::UserInput);
// After declassification, value can flow to shell_exec
assert!(tainted.check_sink(&TaintSink::shell_exec()).is_ok());
```
### 5.6 Taint Propagation
When two values are combined (concatenation, interpolation), the result must
carry the union of both label sets:
```rust
let mut combined = TaintedValue::new(/* ... */);
combined.merge_taint(&other_value);
// combined.labels is now the union of both
```
---
## 6. Ed25519 Manifest Signing
**Source:** `openfang-types/src/manifest_signing.rs`
Agent manifests define an agent's capabilities, tools, and configuration.
A compromised manifest can grant elevated privileges. This module provides
Ed25519-based cryptographic signing.
### 6.1 Signing Scheme
1. Compute SHA-256 of the manifest content (raw TOML text).
2. Sign the hash with Ed25519 (via `ed25519-dalek`).
3. Bundle the signature, public key, and content hash into a `SignedManifest` envelope.
### 6.2 SignedManifest Structure
```rust
pub struct SignedManifest {
pub manifest: String, // Raw TOML content
pub content_hash: String, // Hex SHA-256 of manifest
pub signature: Vec<u8>, // Ed25519 signature (64 bytes)
pub signer_public_key: Vec<u8>, // Ed25519 public key (32 bytes)
pub signer_id: String, // Human-readable signer ID
}
```
### 6.3 Signing
```rust
let signing_key = SigningKey::generate(&mut OsRng);
let signed = SignedManifest::sign(manifest_toml, &signing_key, "admin@org.com");
```
Internally:
```rust
pub fn sign(manifest: impl Into<String>, signing_key: &SigningKey, signer_id: impl Into<String>) -> Self {
let manifest = manifest.into();
let content_hash = hash_manifest(&manifest); // SHA-256
let signature = signing_key.sign(content_hash.as_bytes());
let verifying_key = signing_key.verifying_key();
Self {
manifest,
content_hash,
signature: signature.to_bytes().to_vec(),
signer_public_key: verifying_key.to_bytes().to_vec(),
signer_id: signer_id.into(),
}
}
```
### 6.4 Verification
Two-phase verification:
1. **Hash check:** Recompute SHA-256 of `manifest` and compare to `content_hash`.
2. **Signature check:** Verify the Ed25519 signature over `content_hash` using `signer_public_key`.
```rust
pub fn verify(&self) -> Result<(), String> {
let recomputed = hash_manifest(&self.manifest);
if recomputed != self.content_hash {
return Err("content hash mismatch: ...");
}
let verifying_key = VerifyingKey::from_bytes(&pk_bytes)?;
let signature = Signature::from_bytes(&sig_bytes);
verifying_key.verify(self.content_hash.as_bytes(), &signature)
.map_err(|e| format!("signature verification failed: {}", e))
}
```
### 6.5 Tamper Detection
- Modifying the manifest content after signing causes a **content hash mismatch**.
- Replacing the public key with a different key causes a **signature verification failure**.
- Both attacks are caught by `verify()`.
---
## 7. SSRF Protection
**Source:** `openfang-runtime/src/host_functions.rs`
The `host_net_fetch` function (WASM host call for network requests) includes
comprehensive Server-Side Request Forgery protection.
### 7.1 Scheme Validation
Only `http://` and `https://` schemes are allowed. All others (`file://`,
`gopher://`, `ftp://`) are blocked immediately:
```rust
if !url.starts_with("http://") && !url.starts_with("https://") {
return Err(json!({"error": "Only http:// and https:// URLs are allowed"}));
}
```
### 7.2 Hostname Blocklist
Before DNS resolution, these hostnames are blocked:
- `localhost`
- `metadata.google.internal`
- `metadata.aws.internal`
- `instance-data`
- `169.254.169.254` (AWS/GCP metadata endpoint)
### 7.3 DNS Resolution Check
After the hostname blocklist, the function resolves the hostname to IP
addresses and checks **every resolved IP** against private ranges. This
defeats DNS rebinding attacks:
```rust
let socket_addr = format!("{hostname}:{port}");
if let Ok(addrs) = socket_addr.to_socket_addrs() {
for addr in addrs {
let ip = addr.ip();
if ip.is_loopback() || ip.is_unspecified() || is_private_ip(&ip) {
return Err(json!({"error": format!(
"SSRF blocked: {hostname} resolves to private IP {ip}"
)}));
}
}
}
```
### 7.4 Private IP Detection
The `is_private_ip()` function covers:
**IPv4:**
- `10.0.0.0/8` -- RFC 1918
- `172.16.0.0/12` -- RFC 1918
- `192.168.0.0/16` -- RFC 1918
- `169.254.0.0/16` -- Link-local (AWS metadata)
**IPv6:**
- `fc00::/7` -- Unique Local Address
- `fe80::/10` -- Link-local
```rust
fn is_private_ip(ip: &std::net::IpAddr) -> bool {
match ip {
IpAddr::V4(v4) => {
let octets = v4.octets();
matches!(
octets,
[10, ..] | [172, 16..=31, ..] | [192, 168, ..] | [169, 254, ..]
)
}
IpAddr::V6(v6) => {
let segments = v6.segments();
(segments[0] & 0xfe00) == 0xfc00 || (segments[0] & 0xffc0) == 0xfe80
}
}
}
```
### 7.5 Host Extraction
`extract_host_from_url()` parses the URL to extract `host:port` for both
SSRF checking and capability matching:
```
https://api.openai.com/v1/chat -> api.openai.com:443
http://localhost:8080/api -> localhost:8080
http://example.com -> example.com:80
```
---
## 8. Secret Zeroization
**Source:** All LLM driver modules, channel adapters, and web search modules.
OpenFang uses `Zeroizing<String>` from the `zeroize` crate on every field
that holds secret material. When the value is dropped, its memory is
overwritten with zeros, preventing secrets from lingering in memory.
### 8.1 How It Works
`Zeroizing<T>` is a smart-pointer wrapper from the `zeroize` crate. It
implements `Deref<Target=T>` for transparent usage and `Drop` for automatic
zeroization:
```rust
// On Drop, the inner String's buffer is overwritten with zeros
let key = Zeroizing::new("sk-secret-key".to_string());
// Use key transparently via Deref
client.post(url).header("authorization", format!("Bearer {}", &*key));
// When key goes out of scope, memory is zeroed
```
### 8.2 Fields Using Zeroization
**LLM Drivers** (`openfang-runtime/src/drivers/`):
| Driver | Field |
|--------|-------|
| `AnthropicDriver` | `api_key: Zeroizing<String>` |
| `GeminiDriver` | `api_key: Zeroizing<String>` |
| `OpenAiCompatDriver` | `api_key: Zeroizing<String>` |
**Channel Adapters** (`openfang-channels/src/`):
| Adapter | Field(s) |
|---------|----------|
| `DiscordAdapter` | `token: Zeroizing<String>` |
| `EmailAdapter` | `password: Zeroizing<String>` |
| `BlueskyAdapter` | `app_password: Zeroizing<String>` |
| `DingTalkAdapter` | `access_token: Zeroizing<String>`, `secret: Zeroizing<String>` |
| `FeishuAdapter` | `app_secret: Zeroizing<String>` |
| `FlockAdapter` | `bot_token: Zeroizing<String>` |
| `GitterAdapter` | `token: Zeroizing<String>` |
| `GotifyAdapter` | `app_token: Zeroizing<String>`, `client_token: Zeroizing<String>` |
**Web Search** (`openfang-runtime/src/web_search.rs`):
```rust
fn resolve_api_key(env_var: &str) -> Option<Zeroizing<String>> {
std::env::var(env_var).ok().filter(|k| !k.is_empty()).map(Zeroizing::new)
}
```
**Embedding** (`openfang-runtime/src/embedding.rs`):
| Struct | Field |
|--------|-------|
| `EmbeddingClient` | `api_key: Zeroizing<String>` |
### 8.3 Why It Matters
Without zeroization, secrets remain in memory after use until the OS
reclaims the page. An attacker with access to a core dump, swap file, or
memory forensics tool can recover API keys. `Zeroizing<String>` ensures
the secret is overwritten as soon as it is no longer needed.
---
## 9. OFP Mutual Authentication
**Source:** `openfang-wire/src/peer.rs`
The OpenFang Wire Protocol (OFP) uses HMAC-SHA256 with nonce-based mutual
authentication over TCP connections.
### 9.1 Pre-Shared Key Requirement
OFP refuses to start without a `shared_secret`:
```rust
if config.shared_secret.is_empty() {
return Err(WireError::HandshakeFailed(
"OFP requires shared_secret. Set [network] shared_secret in config.toml".into(),
));
}
```
### 9.2 HMAC Functions
```rust
type HmacSha256 = Hmac<Sha256>;
fn hmac_sign(secret: &str, data: &[u8]) -> String {
let mut mac = HmacSha256::new_from_slice(secret.as_bytes())
.expect("HMAC accepts any key size");
mac.update(data);
hex::encode(mac.finalize().into_bytes())
}
fn hmac_verify(secret: &str, data: &[u8], signature: &str) -> bool {
let expected = hmac_sign(secret, data);
subtle::ConstantTimeEq::ct_eq(expected.as_bytes(), signature.as_bytes()).into()
}
```
**Constant-time comparison** (`subtle::ConstantTimeEq`) prevents
timing side-channel attacks.
### 9.3 Handshake Protocol
**Initiator (client):**
1. Generate a random UUID nonce.
2. Compute `auth_data = nonce + node_id`.
3. Compute `auth_hmac = hmac_sign(shared_secret, auth_data)`.
4. Send `Handshake { node_id, node_name, protocol_version, agents, nonce, auth_hmac }`.
**Responder (server):**
1. Receive the `Handshake` message.
2. Verify the incoming HMAC: `hmac_verify(shared_secret, nonce + node_id, auth_hmac)`.
3. If verification fails, return error code 403.
4. Generate a new UUID nonce for the ack.
5. Compute `ack_auth_data = ack_nonce + self.node_id`.
6. Compute `ack_hmac = hmac_sign(shared_secret, ack_auth_data)`.
7. Send `HandshakeAck { node_id, node_name, protocol_version, agents, nonce: ack_nonce, auth_hmac: ack_hmac }`.
**Initiator (verification):**
1. Receive `HandshakeAck`.
2. Verify: `hmac_verify(shared_secret, ack_nonce + node_id, ack_hmac)`.
3. If verification fails, return `WireError::HandshakeFailed`.
### 9.4 Security Properties
| Property | How It Is Achieved |
|----------|-------------------|
| **Mutual authentication** | Both sides prove knowledge of the shared secret |
| **Replay protection** | Random UUID nonces per handshake |
| **Timing-attack resistance** | `subtle::ConstantTimeEq` for HMAC comparison |
| **Mandatory secret** | OFP refuses to start with an empty `shared_secret` |
| **Message size limit** | `MAX_MESSAGE_SIZE = 16 MB` prevents memory DoS |
| **Protocol version check** | `PROTOCOL_VERSION` mismatch returns `WireError::VersionMismatch` |
---
## 10. Security Headers
**Source:** `openfang-api/src/middleware.rs`
The `security_headers` middleware is applied to **all** API responses:
```rust
pub async fn security_headers(request: Request<Body>, next: Next) -> Response<Body> {
let mut response = next.run(request).await;
let headers = response.headers_mut();
headers.insert("x-content-type-options", "nosniff".parse().unwrap());
headers.insert("x-frame-options", "DENY".parse().unwrap());
headers.insert("x-xss-protection", "1; mode=block".parse().unwrap());
headers.insert("content-security-policy", /* CSP policy */);
headers.insert("referrer-policy", "strict-origin-when-cross-origin".parse().unwrap());
headers.insert("cache-control", "no-store, no-cache, must-revalidate".parse().unwrap());
response
}
```
| Header | Value | Protects Against |
|--------|-------|------------------|
| `X-Content-Type-Options` | `nosniff` | MIME type sniffing attacks |
| `X-Frame-Options` | `DENY` | Clickjacking via iframes |
| `X-XSS-Protection` | `1; mode=block` | Reflected XSS (legacy browsers) |
| `Content-Security-Policy` | See below | XSS, code injection, data exfiltration |
| `Referrer-Policy` | `strict-origin-when-cross-origin` | Referrer leakage |
| `Cache-Control` | `no-store, no-cache, must-revalidate` | Sensitive data caching |
### 10.1 CSP Breakdown
| Directive | Value | Purpose |
|-----------|-------|---------|
| `default-src` | `'self'` | Deny all external resources by default |
| `script-src` | `'self' 'unsafe-inline' 'unsafe-eval' cdn.jsdelivr.net` | Allow scripts from self and CDN |
| `style-src` | `'self' 'unsafe-inline' cdn.jsdelivr.net fonts.googleapis.com` | Allow styles from self, CDN, Google Fonts |
| `img-src` | `'self' data:` | Allow images from self and data URIs |
| `connect-src` | `'self' ws: wss:` | Allow WebSocket connections |
| `font-src` | `'self' cdn.jsdelivr.net fonts.gstatic.com` | Allow fonts from CDN |
| `object-src` | `'none'` | Block all plugins (Flash, Java, etc.) |
| `base-uri` | `'self'` | Prevent base tag hijacking |
| `form-action` | `'self'` | Restrict form submission targets |
---
## 11. GCRA Rate Limiter
**Source:** `openfang-api/src/rate_limiter.rs`
OpenFang uses the Generic Cell Rate Algorithm (GCRA) for cost-aware API
rate limiting via the `governor` crate.
### 11.1 Algorithm
GCRA is a leaky-bucket variant that tracks a single "virtual scheduling time"
(TAT -- Theoretical Arrival Time) per key. Each request consumes a number of
tokens proportional to its cost. The bucket refills at a constant rate.
**Budget:** 500 tokens per minute per IP address.
```rust
pub fn create_rate_limiter() -> Arc<KeyedRateLimiter> {
Arc::new(RateLimiter::keyed(Quota::per_minute(NonZeroU32::new(500).unwrap())))
}
```
### 11.2 Operation Costs
Each API operation has a configurable token cost:
```rust
pub fn operation_cost(method: &str, path: &str) -> NonZeroU32 {
match (method, path) {
(_, "/api/health") => 1,
("GET", "/api/status") => 1,
("GET", "/api/version") => 1,
("GET", "/api/tools") => 1,
("GET", "/api/agents") => 2,
("GET", "/api/skills") => 2,
("GET", "/api/peers") => 2,
("GET", "/api/config") => 2,
("GET", "/api/usage") => 3,
("GET", p) if p.starts_with("/api/audit") => 5,
("GET", p) if p.starts_with("/api/marketplace")=> 10,
("POST", "/api/agents") => 50,
("POST", p) if p.contains("/message") => 30,
("POST", p) if p.contains("/run") => 100,
("POST", "/api/skills/install") => 50,
("POST", "/api/skills/uninstall") => 10,
("POST", "/api/migrate") => 100,
("PUT", p) if p.contains("/update") => 10,
_ => 5,
}
}
```
The cost hierarchy is intentional: read-only health checks cost 1 token while
expensive operations like workflow runs cost 100, meaning a client can perform
500 health checks per minute but only 5 workflow runs.
### 11.3 Middleware
```rust
pub async fn gcra_rate_limit(
State(limiter): State<Arc<KeyedRateLimiter>>,
request: Request<Body>,
next: Next,
) -> Response<Body> {
let ip = /* extract from ConnectInfo, default 127.0.0.1 */;
let cost = operation_cost(&method, &path);
if limiter.check_key_n(&ip, cost).is_err() {
tracing::warn!(ip, cost, path, "GCRA rate limit exceeded");
return Response::builder()
.status(StatusCode::TOO_MANY_REQUESTS)
.header("retry-after", "60")
.body(/* JSON error */)
.unwrap_or_default();
}
next.run(request).await
}
```
### 11.4 Rate Limiter Type
```rust
pub type KeyedRateLimiter = RateLimiter<IpAddr, DashMapStateStore<IpAddr>, DefaultClock>;
```
The `DashMapStateStore` provides concurrent per-IP state with automatic stale
entry cleanup.
---
## 12. Path Traversal Prevention
**Source:** `openfang-runtime/src/host_functions.rs`
Two functions provide defense-in-depth against directory traversal.
### 12.1 safe_resolve_path (for reads)
Used for `fs_read` and `fs_list` operations where the target file must exist:
```rust
fn safe_resolve_path(path: &str) -> Result<std::path::PathBuf, serde_json::Value> {
let p = Path::new(path);
// Phase 1: Reject any path with ".." components
for component in p.components() {
if matches!(component, Component::ParentDir) {
return Err(json!({"error": "Path traversal denied: '..' components forbidden"}));
}
}
// Phase 2: Canonicalize to resolve symlinks and normalize
std::fs::canonicalize(p)
.map_err(|e| json!({"error": format!("Cannot resolve path: {e}")}))
}
```
### 12.2 safe_resolve_parent (for writes)
Used for `fs_write` operations where the target file may not exist yet:
```rust
fn safe_resolve_parent(path: &str) -> Result<std::path::PathBuf, serde_json::Value> {
let p = Path::new(path);
// Phase 1: Reject ".." in any component
for component in p.components() {
if matches!(component, Component::ParentDir) {
return Err(json!({"error": "Path traversal denied: '..' components forbidden"}));
}
}
// Phase 2: Canonicalize the parent directory
let parent = p.parent().filter(|par| !par.as_os_str().is_empty())
.ok_or_else(|| json!({"error": "Invalid path: no parent directory"}))?;
let canonical_parent = std::fs::canonicalize(parent)?;
// Phase 3: Belt-and-suspenders check on filename
let file_name = p.file_name()
.ok_or_else(|| json!({"error": "Invalid path: no file name"}))?;
if file_name.to_string_lossy().contains("..") {
return Err(json!({"error": "Path traversal denied in file name"}));
}
Ok(canonical_parent.join(file_name))
}
```
### 12.3 Enforcement Order
1. **Capability check** runs first with the raw path.
2. **Path traversal check** runs second.
3. **Operation** runs only if both pass.
This ordering ensures that even if a capability is misconfigured with a broad
pattern like `"*"`, path traversal is still blocked.
---
## 13. Subprocess Sandbox
**Source:** `openfang-runtime/src/subprocess_sandbox.rs`
When the runtime spawns child processes (e.g., for the shell tool or skill
execution), the inherited environment must be stripped to prevent accidental
leakage of secrets.
### 13.1 Environment Clearing
```rust
pub fn sandbox_command(cmd: &mut tokio::process::Command, allowed_env_vars: &[String]) {
cmd.env_clear(); // Remove ALL inherited env vars
// Re-add platform-independent safe vars
for var in SAFE_ENV_VARS {
if let Ok(val) = std::env::var(var) {
cmd.env(var, val);
}
}
// Re-add Windows-specific safe vars (on Windows)
#[cfg(windows)]
for var in SAFE_ENV_VARS_WINDOWS { /* ... */ }
// Re-add caller-specified allowed vars
for var in allowed_env_vars { /* ... */ }
}
```
### 13.2 Safe Environment Variables
**All platforms:**
```rust
pub const SAFE_ENV_VARS: &[&str] = &[
"PATH", "HOME", "TMPDIR", "TMP", "TEMP", "LANG", "LC_ALL", "TERM",
];
```
**Windows-only:**
```rust
pub const SAFE_ENV_VARS_WINDOWS: &[&str] = &[
"USERPROFILE", "SYSTEMROOT", "APPDATA", "LOCALAPPDATA",
"COMSPEC", "WINDIR", "PATHEXT",
];
```
Variables not in these lists and not in `allowed_env_vars` are **never**
passed to the child process. This means `OPENAI_API_KEY`, `GEMINI_API_KEY`,
database credentials, and all other secrets are stripped.
### 13.3 Executable Path Validation
```rust
pub fn validate_executable_path(path: &str) -> Result<(), String> {
let p = Path::new(path);
for component in p.components() {
if let std::path::Component::ParentDir = component {
return Err(format!(
"executable path '{}' contains '..' component which is not allowed",
path
));
}
}
Ok(())
}
```
This prevents an agent from escaping its working directory via crafted paths
like `../../bin/dangerous`.
### 13.4 Shell Injection Prevention
The `host_shell_exec` function uses `Command::new(command).args(&args)` which
does **not** invoke a shell. Each argument is passed directly to the
process, preventing shell injection via metacharacters like `;`, `|`, `&&`.
---
## 14. Prompt Injection Scanner
**Source:** `openfang-skills/src/verify.rs`
The `SkillVerifier` provides two scanning functions: `security_scan()` for
skill manifests and `scan_prompt_content()` for skill prompt text (SKILL.md
body).
### 14.1 Manifest Security Scan
`SkillVerifier::security_scan(manifest)` inspects a skill's declared
requirements:
| Check | Severity | Trigger |
|-------|----------|---------|
| Node.js runtime | Warning | `runtime_type == SkillRuntime::Node` |
| Shell execution capability | Critical | Capability contains `shellexec` or `shell_exec` |
| Unrestricted network | Warning | Capability contains `netconnect(*)` |
| Shell tool | Critical | Tool is `shell_exec` or `bash` |
| Filesystem write tool | Warning | Tool is `file_write` or `file_delete` |
| Too many tools | Info | More than 10 tools required |
### 14.2 Prompt Injection Scan
`SkillVerifier::scan_prompt_content(content)` detects common attack patterns
in skill prompt text:
**Critical -- Prompt override attempts:**
```
"ignore previous instructions", "ignore all previous",
"disregard previous", "forget your instructions",
"you are now", "new instructions:", "system prompt override",
"ignore the above", "do not follow", "override system"
```
**Warning -- Data exfiltration patterns:**
```
"send to http", "send to https", "post to http", "post to https",
"exfiltrate", "forward all", "send all data",
"base64 encode and send", "upload to"
```
**Warning -- Shell command references:**
```
"rm -rf", "chmod ", "sudo "
```
**Info -- Excessive length:**
Content over 50,000 bytes triggers an info-level warning about potential LLM
performance degradation.
### 14.3 SHA256 Checksum Verification
```rust
pub fn verify_checksum(data: &[u8], expected_sha256: &str) -> bool {
let actual = Self::sha256_hex(data);
actual == expected_sha256.to_lowercase()
}
```
Skills installed from ClawHub have their content verified against a known
SHA256 hash to detect tampering during download.
### 14.4 Warning Structure
```rust
pub struct SkillWarning {
pub severity: WarningSeverity, // Info, Warning, Critical
pub message: String,
}
```
---
## 15. Loop Guard
**Source:** `openfang-runtime/src/loop_guard.rs`
The `LoopGuard` tracks tool calls within a single agent loop execution to
detect when the agent is stuck calling the same tool repeatedly.
### 15.1 Configuration
```rust
pub struct LoopGuardConfig {
pub warn_threshold: u32, // Default: 3
pub block_threshold: u32, // Default: 5
pub global_circuit_breaker: u32, // Default: 30
}
```
### 15.2 Detection Algorithm
1. For each tool call, compute SHA-256 of `tool_name + "|" + serialized_params`.
2. Increment the count for that hash in a `HashMap<String, u32>`.
3. Increment `total_calls`.
4. Return a graduated verdict:
```rust
pub fn check(&mut self, tool_name: &str, params: &serde_json::Value) -> LoopGuardVerdict {
self.total_calls += 1;
// Global circuit breaker
if self.total_calls > self.config.global_circuit_breaker {
return LoopGuardVerdict::CircuitBreak(/* ... */);
}
let hash = Self::compute_hash(tool_name, params);
let count = self.call_counts.entry(hash).or_insert(0);
*count += 1;
if *count >= self.config.block_threshold {
LoopGuardVerdict::Block(/* ... */)
} else if *count >= self.config.warn_threshold {
LoopGuardVerdict::Warn(/* ... */)
} else {
LoopGuardVerdict::Allow
}
}
```
### 15.3 Verdict Types
| Verdict | Meaning | Action |
|---------|---------|--------|
| `Allow` | Normal operation | Run the tool |
| `Warn(msg)` | Same call repeated >= 3 times | Run, append warning to result |
| `Block(msg)` | Same call repeated >= 5 times | Skip execution, return error |
| `CircuitBreak(msg)` | > 30 total tool calls | Terminate the entire agent loop |
### 15.4 Hash Computation
```rust
fn compute_hash(tool_name: &str, params: &serde_json::Value) -> String {
let mut hasher = Sha256::new();
hasher.update(tool_name.as_bytes());
hasher.update(b"|");
let params_str = serde_json::to_string(params).unwrap_or_default();
hasher.update(params_str.as_bytes());
hex::encode(hasher.finalize())
}
```
Note: `serde_json::to_string` produces deterministic output (object keys are
sorted), ensuring that semantically identical parameters produce the same hash.
### 15.5 Key Property
Calls with **different parameters** are tracked separately. An agent that
calls `web_search` with 10 different queries will not trigger the guard, but
an agent that calls `web_search({"query": "test"})` 5 times will be blocked.
---
## 16. Session Repair
**Source:** `openfang-runtime/src/session_repair.rs`
Before sending message history to the LLM, this module validates and repairs
common structural issues that would cause API errors.
### 16.1 Three-Phase Repair
```rust
pub fn validate_and_repair(messages: &[Message]) -> Vec<Message>
```
**Phase 1 -- Collect ToolUse IDs:**
Scan all messages for `ContentBlock::ToolUse { id, .. }` blocks and collect
their IDs into a `HashSet<String>`.
**Phase 2 -- Filter orphans and empties:**
- **Orphaned ToolResults:** `ContentBlock::ToolResult { tool_use_id, .. }`
blocks where `tool_use_id` is not in the ToolUse ID set are dropped.
- **Empty messages:** Messages with empty text or no content blocks are
dropped.
**Phase 3 -- Merge consecutive same-role messages:**
The Anthropic API requires strict role alternation (user, assistant, user,
assistant...). If two consecutive messages have the same role, they are
merged into a single message with combined content blocks.
### 16.2 Why Each Repair Is Needed
| Issue | Cause | Effect Without Repair |
|-------|-------|----------------------|
| Orphaned ToolResult | Compaction or truncation removed the ToolUse | API error: "tool_use_id not found" |
| Empty messages | Cancelled generation, empty user submission | API error: empty content |
| Consecutive same-role | Manual history editing, session repair itself | API error: role alternation violation |
### 16.3 Content Merging
When merging consecutive same-role messages, both are converted to block
format and concatenated:
```rust
fn merge_content(dst: &mut MessageContent, src: MessageContent) {
let dst_blocks = content_to_blocks(std::mem::replace(dst, MessageContent::Text(String::new())));
let src_blocks = content_to_blocks(src);
let mut combined = dst_blocks;
combined.extend(src_blocks);
*dst = MessageContent::Blocks(combined);
}
```
---
## 17. Health Endpoint Redaction
**Source:** `openfang-api/src/routes.rs`
OpenFang provides two health endpoints with different information levels.
### 17.1 Public Endpoint: `GET /api/health`
**No authentication required.** Returns only liveness information:
```json
{
"status": "ok",
"version": "0.1.0"
}
```
This endpoint does not expose agent count, database details, configuration
warnings, uptime, or any internal system information. It is suitable for
load balancer health checks.
### 17.2 Detail Endpoint: `GET /api/health/detail`
**Requires authentication.** Returns full diagnostics:
```json
{
"status": "ok",
"version": "0.1.0",
"uptime_seconds": 3600,
"panic_count": 0,
"restart_count": 2,
"agent_count": 15,
"database": "connected",
"config_warnings": []
}
```
### 17.3 Localhost Fallback
When no API key is configured, the `auth` middleware restricts all
non-health endpoints to loopback addresses only:
```rust
if api_key.is_empty() {
let is_loopback = request.extensions()
.get::<ConnectInfo<SocketAddr>>()
.map(|ci| ci.0.ip().is_loopback())
.unwrap_or(false);
if !is_loopback {
return Response::builder()
.status(StatusCode::FORBIDDEN)
.body(/* "No API key configured. Remote access denied." */)
...;
}
}
```
---
## 18. Security Configuration
### 18.1 config.toml Reference
```toml
# API Authentication
api_key = "your-secret-api-key" # Empty = localhost-only mode
# OFP Wire Protocol
[network]
shared_secret = "your-pre-shared-key" # Required for OFP
# WASM Sandbox
[sandbox]
fuel_limit = 1000000 # CPU instruction budget per execution
timeout_secs = 30 # Wall-clock timeout per execution
max_memory_bytes = 16777216 # 16 MB max WASM memory
# Rate Limiting
# 500 tokens/minute/IP (not currently configurable via config.toml)
# Web Search SSRF Protection
[web]
# SSRF protection is always on and cannot be disabled
```
### 18.2 Environment Variables for Secrets
| Variable | Used By |
|----------|---------|
| `OPENAI_API_KEY` | OpenAI-compat driver |
| `ANTHROPIC_API_KEY` | Anthropic driver |
| `GEMINI_API_KEY` or `GOOGLE_API_KEY` | Gemini driver |
| `DEEPSEEK_API_KEY` | DeepSeek provider |
| `GROQ_API_KEY` | Groq provider |
| `BRAVE_API_KEY` | Brave web search |
| `TAVILY_API_KEY` | Tavily web search |
| `PERPLEXITY_API_KEY` | Perplexity web search |
All environment variable API keys are wrapped in `Zeroizing<String>` when
loaded into driver structs.
### 18.3 Capability Declaration (Agent Manifest)
Capabilities are declared in the agent's TOML manifest:
```toml
[agent]
name = "my-agent"
[[capabilities]]
type = "FileRead"
value = "/data/*"
[[capabilities]]
type = "NetConnect"
value = "*.openai.com:443"
[[capabilities]]
type = "ToolInvoke"
value = "web_search"
[[capabilities]]
type = "LlmMaxTokens"
value = 4096
```
### 18.4 Loop Guard Tuning
The default `LoopGuardConfig` values are:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `warn_threshold` | 3 | Identical calls before warning |
| `block_threshold` | 5 | Identical calls before blocking |
| `global_circuit_breaker` | 30 | Total calls before circuit break |
### 18.5 Subprocess Sandbox Allowlists
To pass specific environment variables to subprocesses:
```rust
sandbox_command(&mut cmd, &["MY_CUSTOM_VAR".to_string()]);
```
Only variables explicitly listed in `allowed_env_vars` (plus the safe
defaults) will be inherited.
---
## 19. Security Dependencies
| Crate | Purpose |
|-------|---------|
| `sha2` | SHA-256 hashing (audit trail, loop guard, SSRF, checksums) |
| `hmac` | HMAC-SHA256 for OFP authentication |
| `hex` | Hex encoding/decoding of hashes and signatures |
| `subtle` | Constant-time comparison (`ConstantTimeEq`) for HMAC verification |
| `ed25519-dalek` | Ed25519 signing/verification for manifest signing |
| `rand` | Cryptographic RNG for key generation (`OsRng`) |
| `zeroize` | `Zeroizing<T>` wrapper for automatic secret memory wiping |
| `governor` | GCRA rate limiting algorithm |
| `wasmtime` | WASM sandbox with fuel + epoch metering |
| `uuid` | Nonce generation for OFP handshakes |
| `chrono` | ISO-8601 timestamps for audit entries |
| `reqwest` | HTTP client (used inside SSRF-protected `host_net_fetch`) |
### 19.1 Why These Specific Crates
- **sha2/hmac:** Part of the RustCrypto project, audited, widely used in production Rust.
- **ed25519-dalek:** De facto standard Ed25519 library in Rust, extensively audited.
- **subtle:** Provides constant-time operations to prevent timing side-channels.
- **zeroize:** Official RustCrypto approach to zeroing secrets; integrates with `Drop`.
- **governor:** Battle-tested GCRA implementation with `DashMap`-backed concurrent state.
---
## Threat Model Summary
| Threat | Mitigated By |
|--------|-------------|
| Agent requests unauthorized file access | Capability-based security (Section 2) |
| Agent spawns child with elevated privileges | Capability inheritance validation (Section 2.4) |
| WASM skill runs infinite loop | Dual metering: fuel + epoch (Section 3) |
| Attacker tampers with audit log | Merkle hash chain (Section 4) |
| Prompt injection via external data | Taint tracking (Section 5) |
| Data exfiltration via LLM | Taint sinks block Secret/PII to net_fetch (Section 5.3) |
| Tampered agent manifest | Ed25519 signing (Section 6) |
| SSRF to cloud metadata | Private IP + hostname blocking + DNS check (Section 7) |
| API key recovery from memory dump | Zeroizing<String> (Section 8) |
| Unauthorized peer-to-peer connections | HMAC-SHA256 mutual auth (Section 9) |
| XSS / clickjacking on API | Security headers (Section 10) |
| API brute force / DoS | GCRA rate limiter (Section 11) |
| Path traversal via `../` | safe_resolve_path / safe_resolve_parent (Section 12) |
| Secret leakage to child processes | env_clear() + allowlist (Section 13) |
| Malicious skills from ClawHub | Prompt injection scanner + SHA256 checksum (Section 14) |
| Agent stuck in tool loop | LoopGuard with graduated response (Section 15) |
| Corrupted LLM session history | Session repair (Section 16) |
| Information leakage from health endpoint | Redacted public endpoint (Section 17) |
| Timing attacks on HMAC verification | subtle::ConstantTimeEq (Section 9.2) |
| Shell injection via metacharacters | Command::new (no shell) + env_clear (Section 13.4) |
| DNS rebinding for SSRF bypass | Resolved IP check, not hostname check (Section 7.3) |