# Workflow Engine Guide

## Overview

The OpenFang workflow engine enables multi-step agent pipelines -- orchestrated sequences of tasks where each step routes work to a specific agent, and output from one step flows as input to the next. Workflows let you compose complex behaviors from simple, single-purpose agents without writing any Rust code.

Use workflows when you need to:

- Chain multiple agents together in a processing pipeline (e.g., research, then write, then review).
- Fan work out to several agents in parallel and collect their results.
- Conditionally branch execution based on an earlier step's output.
- Iterate a step in a loop until a quality gate is met.
- Build reproducible, auditable multi-agent processes that can be triggered via API or CLI.

The implementation lives in `openfang-kernel/src/workflow.rs`. The workflow engine is decoupled from the kernel through closures -- it never directly owns or references the kernel, making it testable in isolation.

---

## Core Types

| Rust type | Description |
|---|---|
| `WorkflowId(Uuid)` | Unique identifier for a workflow definition. |
| `WorkflowRunId(Uuid)` | Unique identifier for a running workflow instance. |
| `Workflow` | A named definition containing a list of `WorkflowStep` entries. |
| `WorkflowStep` | A single step: agent reference, prompt template, mode, timeout, error handling. |
| `WorkflowRun` | A running instance: tracks state, step results, final output, timestamps. |
| `WorkflowRunState` | Enum: `Pending`, `Running`, `Completed`, `Failed`. |
| `StepResult` | Result from one step: agent info, output text, token counts, duration. |
| `WorkflowEngine` | The engine itself: stores definitions and runs in `Arc`-shared state. |

---

## Workflow Definition

Workflows are registered via the REST API as JSON. The top-level structure is:

```json
{
  "name": "my-pipeline",
  "description": "Describe what the workflow does",
  "steps": [ ...
  ]
}
```

The corresponding Rust struct is:

```rust
pub struct Workflow {
    pub id: WorkflowId,             // Auto-assigned on creation
    pub name: String,               // Human-readable name
    pub description: String,        // What this workflow does
    pub steps: Vec<WorkflowStep>,   // Ordered list of steps
    pub created_at: DateTime<Utc>,  // Auto-assigned on creation
}
```

---

## Step Configuration

Each step in the `steps` array has the following fields:

| JSON field | Rust field | Type | Default | Description |
|---|---|---|---|---|
| `name` | `name` | `String` | `"step"` | Step name for logging and display. |
| `agent_name` | `agent` | `StepAgent::ByName` | -- | Reference an agent by its name (first match). Mutually exclusive with `agent_id`. |
| `agent_id` | `agent` | `StepAgent::ById` | -- | Reference an agent by its UUID. Mutually exclusive with `agent_name`. |
| `prompt` | `prompt_template` | `String` | `"{{input}}"` | Prompt template with variable placeholders. |
| `mode` | `mode` | `StepMode` | `"sequential"` | Execution mode (see below). |
| `timeout_secs` | `timeout_secs` | `u64` | `120` | Maximum time in seconds before the step times out. |
| `error_mode` | `error_mode` | `ErrorMode` | `"fail"` | How to handle errors (see below). |
| `max_retries` | (inside `ErrorMode::Retry`) | `u32` | `3` | Number of retries when `error_mode` is `"retry"`. |
| `output_var` | `output_var` | `Option<String>` | `null` | If set, stores this step's output in a named variable for later reference. |
| `condition` | (inside `StepMode::Conditional`) | `String` | `""` | Substring to match in the previous output (case-insensitive). |
| `max_iterations` | (inside `StepMode::Loop`) | `u32` | `5` | Maximum loop iterations before forced termination. |
| `until` | (inside `StepMode::Loop`) | `String` | `""` | Substring to match in the output to terminate the loop (case-insensitive). |

### Agent Resolution

Every step must specify exactly one of `agent_name` or `agent_id`.
The `StepAgent` enum is:

```rust
pub enum StepAgent {
    ById { id: String },     // UUID of an existing agent
    ByName { name: String }, // Name match (first agent with this name)
}
```

If the agent cannot be resolved at execution time, the workflow fails with `"Agent not found for step '<name>'"`.

---

## Step Modes

The `mode` field controls how a step executes relative to the other steps in the workflow.

### Sequential (default)

```json
{ "mode": "sequential" }
```

The step runs after the previous step completes. The previous step's output becomes `{{input}}` for this step. This is the default mode when `mode` is omitted.

### Fan-Out

```json
{ "mode": "fan_out" }
```

Fan-out steps run **in parallel**. The engine collects all consecutive `fan_out` steps and launches them simultaneously using `futures::future::join_all`. All fan-out steps receive the same `{{input}}` -- the output from the last step that ran before the fan-out group. If any fan-out step fails or times out, the entire workflow fails immediately.

### Collect

```json
{ "mode": "collect" }
```

The `collect` step gathers all outputs from the preceding fan-out group. It does not execute an agent -- it is a **data-only** step that joins the accumulated outputs with the separator `"\n\n---\n\n"` and sets the result as `{{input}}` for subsequent steps.

A typical fan-out/collect pattern:

```
step 1: fan_out    --> runs in parallel
step 2: fan_out    --> runs in parallel
step 3: collect    --> joins outputs from steps 1 and 2
step 4: sequential --> receives joined output as {{input}}
```

### Conditional

```json
{ "mode": "conditional", "condition": "ERROR" }
```

The step only executes if the previous step's output **contains** the `condition` substring (case-insensitive comparison via `to_lowercase().contains()`). If the condition is not met, the step is skipped entirely and `{{input}}` is not modified. When the condition is met, the step executes like a sequential step.
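The conditional check can be sketched as a standalone helper. This is an illustrative sketch, not the engine's actual code: the `condition_met` name and the choice to lowercase both sides are assumptions based on the `to_lowercase().contains()` behavior described above.

```rust
/// Illustrative sketch of the conditional-step gate: a case-insensitive
/// substring check of the previous step's output against `condition`.
/// (Hypothetical helper; the real engine's signature may differ.)
fn condition_met(previous_output: &str, condition: &str) -> bool {
    previous_output
        .to_lowercase()
        .contains(&condition.to_lowercase())
}

fn main() {
    // "ERROR" matches regardless of case in the previous output.
    assert!(condition_met("fatal error: disk full", "ERROR"));
    // No match means the conditional step is skipped.
    assert!(!condition_met("all checks passed", "ERROR"));
    println!("ok");
}
```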
### Loop

```json
{ "mode": "loop", "max_iterations": 5, "until": "APPROVED" }
```

The step repeats up to `max_iterations` times. After each iteration, the engine checks whether the output **contains** the `until` substring (case-insensitive). If found, the loop terminates early. Each iteration feeds its output back as `{{input}}` for the next iteration. Step results are recorded with names like `"refine (iter 1)"`, `"refine (iter 2)"`, and so on. If the `until` condition is never met, the loop runs exactly `max_iterations` times and continues to the next step with the last iteration's output.

---

## Variable Substitution

Prompt templates support two kinds of variable references:

### `{{input}}` -- Previous step output

Always available. Contains the output from the immediately preceding step (or the workflow's initial input for the first step).

### `{{variable_name}}` -- Named variables

When a step has `"output_var": "my_var"`, its output is stored in a variable map under the key `my_var`. Any subsequent step can reference it with `{{my_var}}` in its prompt template.

The expansion logic (from `WorkflowEngine::expand_variables`):

```rust
fn expand_variables(template: &str, input: &str, vars: &HashMap<String, String>) -> String {
    let mut result = template.replace("{{input}}", input);
    for (key, value) in vars {
        result = result.replace(&format!("{{{{{key}}}}}"), value);
    }
    result
}
```

Variables persist for the entire workflow run. A later step can overwrite a variable by using the same `output_var` name.
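The expansion logic can be exercised standalone. The template text and variable name below are illustrative, not taken from any real workflow:

```rust
use std::collections::HashMap;

// The substitution behavior described above, reproduced as a free
// function so it can be run in isolation: `{{input}}` is replaced
// first, then each named variable in the map.
fn expand_variables(template: &str, input: &str, vars: &HashMap<String, String>) -> String {
    let mut result = template.replace("{{input}}", input);
    for (key, value) in vars {
        // format!("{{{{{key}}}}}") produces the literal "{{key}}".
        result = result.replace(&format!("{{{{{key}}}}}"), value);
    }
    result
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("research_output".to_string(), "three cited findings".to_string());

    let expanded = expand_variables(
        "Summarize {{input}} using {{research_output}}",
        "the draft",
        &vars,
    );
    assert_eq!(expanded, "Summarize the draft using three cited findings");
    println!("{expanded}");
}
```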
**Example**: A three-step workflow where step 3 references outputs from both step 1 and step 2:

```json
{
  "steps": [
    { "name": "research", "output_var": "research_output", "prompt": "Research: {{input}}" },
    { "name": "outline", "output_var": "outline_output", "prompt": "Outline based on: {{input}}" },
    { "name": "combine", "prompt": "Write article.\nResearch: {{research_output}}\nOutline: {{outline_output}}" }
  ]
}
```

---

## Error Handling

Each step has an `error_mode` that controls behavior when the step fails or times out.

### Fail (default)

```json
{ "error_mode": "fail" }
```

The workflow aborts immediately. The run state is set to `Failed`, the error message is recorded, and `completed_at` is set. The error message format is `"Step '<name>' failed: <error>"` or `"Step '<name>' timed out after <timeout_secs>s"`.

### Skip

```json
{ "error_mode": "skip" }
```

The step is silently skipped on error or timeout. A warning is logged, but the workflow continues. The `{{input}}` for the next step remains unchanged (it keeps the value from before the skipped step). No `StepResult` is recorded for the skipped step.

### Retry

```json
{ "error_mode": "retry", "max_retries": 3 }
```

The step is retried up to `max_retries` times after the initial attempt (so `max_retries: 3` means up to 4 total attempts: 1 initial + 3 retries). Each attempt gets the full `timeout_secs` budget independently. If all attempts fail, the workflow aborts with `"Step '<name>' failed after retries: <error>"`.

### Timeout Behavior

Every step execution is wrapped in `tokio::time::timeout(Duration::from_secs(step.timeout_secs), ...)`. The default timeout is 120 seconds. Timeouts are treated as errors and handled according to the step's `error_mode`. For fan-out steps, each parallel step gets its own timeout individually.

---

## Examples

### Example 1: Code Review Pipeline

A sequential pipeline where code is analyzed, reviewed, and a summary is produced.
```json { "name": "code-review-pipeline", "description": "Analyze code, review for issues, and produce a summary report", "steps": [ { "name": "analyze", "agent_name": "code-reviewer", "prompt": "Analyze the following code for bugs, style issues, and security vulnerabilities:\n\n{{input}}", "mode": "sequential", "timeout_secs": 180, "error_mode": "fail", "output_var": "analysis" }, { "name": "security-check", "agent_name": "security-auditor", "prompt": "Review this code analysis for security issues. Flag anything critical:\n\n{{analysis}}", "mode": "sequential", "timeout_secs": 120, "error_mode": "retry", "max_retries": 2, "output_var": "security_review" }, { "name": "summary", "agent_name": "writer", "prompt": "Write a concise code review summary.\n\nCode Analysis:\n{{analysis}}\n\nSecurity Review:\n{{security_review}}", "mode": "sequential", "timeout_secs": 60, "error_mode": "fail" } ] } ``` ### Example 2: Research and Write Article Research a topic, outline it, then write -- with a conditional fact-check step. ```json { "name": "research-and-write", "description": "Research a topic, outline, write, and optionally fact-check", "steps": [ { "name": "research", "agent_name": "researcher", "prompt": "Research the following topic thoroughly. 
Cite sources where possible:\n\n{{input}}",
      "mode": "sequential",
      "timeout_secs": 300,
      "error_mode": "retry",
      "max_retries": 1,
      "output_var": "research"
    },
    {
      "name": "outline",
      "agent_name": "planner",
      "prompt": "Create a detailed article outline based on this research:\n\n{{research}}",
      "mode": "sequential",
      "timeout_secs": 60,
      "output_var": "outline"
    },
    {
      "name": "write",
      "agent_name": "writer",
      "prompt": "Write a complete article.\n\nOutline:\n{{outline}}\n\nResearch:\n{{research}}",
      "mode": "sequential",
      "timeout_secs": 300,
      "output_var": "article"
    },
    {
      "name": "fact-check",
      "agent_name": "analyst",
      "prompt": "Fact-check this article and note any claims that need verification:\n\n{{article}}",
      "mode": "conditional",
      "condition": "claim",
      "timeout_secs": 120,
      "error_mode": "skip"
    }
  ]
}
```

The fact-check step only runs if the article contains the word "claim" (case-insensitive). If the fact-check agent fails, the workflow continues with the article as-is.

### Example 3: Multi-Agent Brainstorm (Fan-Out + Collect)

Three agents brainstorm in parallel, then a fourth agent synthesizes their ideas.

```json
{
  "name": "brainstorm",
  "description": "Parallel brainstorm with 3 agents, then synthesize",
  "steps": [
    {
      "name": "creative-ideas",
      "agent_name": "writer",
      "prompt": "Brainstorm 5 creative ideas for: {{input}}",
      "mode": "fan_out",
      "timeout_secs": 60,
      "output_var": "creative"
    },
    {
      "name": "technical-ideas",
      "agent_name": "architect",
      "prompt": "Brainstorm 5 technically feasible ideas for: {{input}}",
      "mode": "fan_out",
      "timeout_secs": 60,
      "output_var": "technical"
    },
    {
      "name": "business-ideas",
      "agent_name": "analyst",
      "prompt": "Brainstorm 5 ideas with strong business potential for: {{input}}",
      "mode": "fan_out",
      "timeout_secs": 60,
      "output_var": "business"
    },
    {
      "name": "gather",
      "agent_name": "planner",
      "prompt": "unused",
      "mode": "collect"
    },
    {
      "name": "synthesize",
      "agent_name": "orchestrator",
      "prompt": "You received brainstorm results from three perspectives.
Synthesize them into the top 5 actionable ideas, ranked by impact:\n\n{{input}}",
      "mode": "sequential",
      "timeout_secs": 120
    }
  ]
}
```

The three fan-out steps run in parallel. The `collect` step joins their outputs with `---` separators. The `synthesize` step receives the combined output.

### Example 4: Iterative Refinement (Loop)

An agent refines a draft until it meets a quality bar.

```json
{
  "name": "iterative-refinement",
  "description": "Refine a document until approved or max iterations reached",
  "steps": [
    {
      "name": "first-draft",
      "agent_name": "writer",
      "prompt": "Write a first draft about: {{input}}",
      "mode": "sequential",
      "timeout_secs": 120,
      "output_var": "draft"
    },
    {
      "name": "review-and-refine",
      "agent_name": "code-reviewer",
      "prompt": "Review this draft. If it meets quality standards, respond with APPROVED at the start. Otherwise, provide specific feedback and a revised version:\n\n{{input}}",
      "mode": "loop",
      "max_iterations": 4,
      "until": "APPROVED",
      "timeout_secs": 180,
      "error_mode": "retry",
      "max_retries": 1
    }
  ]
}
```

The loop runs the reviewer up to 4 times. Each iteration receives the previous iteration's output as `{{input}}`. Once the reviewer includes "APPROVED" in its response, the loop terminates early.

---

## Trigger Engine

The trigger engine (`openfang-kernel/src/triggers.rs`) provides event-driven automation. Triggers watch the kernel's event bus and automatically send messages to agents when matching events arrive.

### Core Types

| Rust type | Description |
|---|---|
| `TriggerId(Uuid)` | Unique identifier for a trigger. |
| `Trigger` | A registered trigger: agent, pattern, prompt template, fire count, limits. |
| `TriggerPattern` | Enum defining which events to match. |
| `TriggerEngine` | The engine: `DashMap`-backed concurrent storage with an agent-to-trigger index. |
### Trigger Definition

```rust
pub struct Trigger {
    pub id: TriggerId,
    pub agent_id: AgentId,        // Which agent receives the message
    pub pattern: TriggerPattern,  // What events to match
    pub prompt_template: String,  // Template with {{event}} placeholder
    pub enabled: bool,            // Can be toggled on/off
    pub created_at: DateTime<Utc>,
    pub fire_count: u64,          // How many times it has fired
    pub max_fires: u64,           // 0 = unlimited
}
```

### Event Patterns

The `TriggerPattern` enum supports 9 matching modes:

| Pattern | JSON | Description |
|---|---|---|
| `All` | `"all"` | Matches every event (wildcard). |
| `Lifecycle` | `"lifecycle"` | Matches any lifecycle event (spawned, started, terminated, etc.). |
| `AgentSpawned` | `{"agent_spawned": {"name_pattern": "coder"}}` | Matches when an agent whose name contains `name_pattern` is spawned. Use `"*"` for any agent. |
| `AgentTerminated` | `"agent_terminated"` | Matches when any agent terminates or crashes. |
| `System` | `"system"` | Matches any system event (health checks, quota warnings, etc.). |
| `SystemKeyword` | `{"system_keyword": {"keyword": "quota"}}` | Matches system events whose debug representation contains the keyword (case-insensitive). |
| `MemoryUpdate` | `"memory_update"` | Matches any memory change event. |
| `MemoryKeyPattern` | `{"memory_key_pattern": {"key_pattern": "config"}}` | Matches memory updates where the key contains `key_pattern`. Use `"*"` for any key. |
| `ContentMatch` | `{"content_match": {"substring": "error"}}` | Matches any event whose human-readable description contains the substring (case-insensitive). |

### Pattern Matching Details

The `matches_pattern` function determines how each pattern evaluates:

- **`All`**: Always returns `true`.
- **`Lifecycle`**: Checks for `EventPayload::Lifecycle(_)`.
- **`AgentSpawned`**: Checks for `LifecycleEvent::Spawned` where `name.contains(name_pattern)` or `name_pattern == "*"`.
- **`AgentTerminated`**: Checks for `LifecycleEvent::Terminated` or `LifecycleEvent::Crashed`.
- **`System`**: Checks for `EventPayload::System(_)`.
- **`SystemKeyword`**: Formats the system event via the `Debug` trait, lowercases it, and checks `contains(keyword)`.
- **`MemoryUpdate`**: Checks for `EventPayload::MemoryUpdate(_)`.
- **`MemoryKeyPattern`**: Checks `delta.key.contains(key_pattern)` or `key_pattern == "*"`.
- **`ContentMatch`**: Uses the `describe_event()` function to produce a human-readable string, then checks `contains(substring)` (case-insensitive).

### Prompt Template and `{{event}}`

When a trigger fires, the engine replaces `{{event}}` in the `prompt_template` with a human-readable event description. The `describe_event()` function produces strings like:

- `"Agent 'coder' (id: <uuid>) was spawned"`
- `"Agent terminated: shutdown requested"`
- `"Agent crashed: out of memory"`
- `"Kernel started"`
- `"Quota warning: agent <id>, tokens at 85.0%"`
- `"Health check failed: agent <id>, unresponsive for 30s"`
- `"Memory Created on key 'config' for agent <id>"`
- `"Tool 'web_search' succeeded (450ms): ..."`

### Max Fires and Auto-Disable

When `max_fires` is set to a value greater than 0, the trigger automatically disables itself (sets `enabled = false`) once `fire_count >= max_fires`. Setting `max_fires` to 0 means the trigger fires indefinitely.

### Trigger Use Cases

**Monitor agent health:**

```json
{
  "agent_id": "<uuid>",
  "pattern": {"content_match": {"substring": "health check failed"}},
  "prompt_template": "ALERT: {{event}}. Investigate and report the status of all agents.",
  "max_fires": 0
}
```

**React to new agent spawns:**

```json
{
  "agent_id": "<uuid>",
  "pattern": {"agent_spawned": {"name_pattern": "*"}},
  "prompt_template": "A new agent was just created: {{event}}. Update the fleet roster.",
  "max_fires": 0
}
```

**One-shot quota alert:**

```json
{
  "agent_id": "<uuid>",
  "pattern": {"system_keyword": {"keyword": "quota"}},
  "prompt_template": "Quota event detected: {{event}}.
Recommend corrective action.",
  "max_fires": 1
}
```

---

## API Endpoints

### Workflow Endpoints

#### `POST /api/workflows` -- Create a workflow

Register a new workflow definition.

**Request body:**

```json
{
  "name": "my-pipeline",
  "description": "Description of the workflow",
  "steps": [
    {
      "name": "step-1",
      "agent_name": "researcher",
      "prompt": "Research: {{input}}",
      "mode": "sequential",
      "timeout_secs": 120,
      "error_mode": "fail",
      "output_var": "research"
    }
  ]
}
```

**Response (201 Created):**

```json
{ "workflow_id": "<uuid>" }
```

#### `GET /api/workflows` -- List all workflows

Returns an array of registered workflow summaries.

**Response (200 OK):**

```json
[
  {
    "id": "<uuid>",
    "name": "my-pipeline",
    "description": "Description of the workflow",
    "steps": 3,
    "created_at": "2026-01-15T10:30:00Z"
  }
]
```

#### `POST /api/workflows/:id/run` -- Execute a workflow

Start a synchronous workflow execution. The call blocks until the workflow completes or fails.

**Request body:**

```json
{ "input": "The initial input text for the first step" }
```

**Response (200 OK):**

```json
{
  "run_id": "<uuid>",
  "output": "Final output from the last step",
  "status": "completed"
}
```

**Response (500 Internal Server Error):**

```json
{ "error": "Workflow execution failed" }
```

#### `GET /api/workflows/:id/runs` -- List workflow runs

Returns all workflow runs (not filtered by workflow ID in the current implementation).

**Response (200 OK):**

```json
[
  {
    "id": "<uuid>",
    "workflow_name": "my-pipeline",
    "state": "completed",
    "steps_completed": 3,
    "started_at": "2026-01-15T10:30:00Z",
    "completed_at": "2026-01-15T10:32:15Z"
  }
]
```

### Trigger Endpoints

#### `POST /api/triggers` -- Create a trigger

Register a new event trigger for an agent.
**Request body:**

```json
{
  "agent_id": "<uuid>",
  "pattern": "lifecycle",
  "prompt_template": "A lifecycle event occurred: {{event}}",
  "max_fires": 0
}
```

**Response (201 Created):**

```json
{ "trigger_id": "<uuid>", "agent_id": "<uuid>" }
```

#### `GET /api/triggers` -- List all triggers

Optionally filter by agent: `GET /api/triggers?agent_id=<uuid>`

**Response (200 OK):**

```json
[
  {
    "id": "<uuid>",
    "agent_id": "<uuid>",
    "pattern": "lifecycle",
    "prompt_template": "Event: {{event}}",
    "enabled": true,
    "fire_count": 5,
    "max_fires": 0,
    "created_at": "2026-01-15T10:00:00Z"
  }
]
```

#### `PUT /api/triggers/:id` -- Enable/disable a trigger

Toggle a trigger's enabled state.

**Request body:**

```json
{ "enabled": false }
```

**Response (200 OK):**

```json
{ "status": "updated", "trigger_id": "<uuid>", "enabled": false }
```

#### `DELETE /api/triggers/:id` -- Delete a trigger

**Response (200 OK):**

```json
{ "status": "removed", "trigger_id": "<uuid>" }
```

**Response (404 Not Found):**

```json
{ "error": "Trigger not found" }
```

---

## CLI Commands

All workflow and trigger CLI commands require a running OpenFang daemon.

### Workflow Commands

```
openfang workflow list
```

Lists all registered workflows with their ID, name, step count, and creation date.

```
openfang workflow create <file>
```

Creates a workflow from a JSON file. The file should contain the same JSON structure as the `POST /api/workflows` request body.

```
openfang workflow run <workflow-id> <input>
```

Executes a workflow by its UUID with the given input text. Blocks until completion and prints the output.

### Trigger Commands

```
openfang trigger list [--agent-id <uuid>]
```

Lists all registered triggers. Optionally filter by agent ID.

```
openfang trigger create [--prompt