IAF Architecture Map
AI Operations Navigation Guide — which files to touch, which to leave alone, and where within them to make changes.
This document serves as the operations navigation guide for external LLMs (Claude Code, GPT Agent, etc.) taking over IAF. It answers not "what functions are in each file," but "which files can be touched, which should be left alone, and which areas to modify when touching them."
Core principle: Do not touch infrastructure unless absolutely necessary. 99% of requirements can be fulfilled by adjusting Adjustable components.
v0.4 changes: Integrated full external LLM takeover plan. Added MANIFEST.json, validate.py, call_log.jsonl, tool hot-reload, Tube retry/data-passing, auto_commit.sh and other component descriptions. Maintains independent agent engine copy design (possibility management) without introducing engine inheritance.
0. External LLM Takeover Entry Point
When an external LLM first takes over an IAF instance, read these three files in order to begin work:
| Order | File | Question It Answers |
|---|---|---|
| 1 | MANIFEST.json | What does the system look like right now? What agents, tubes, and dispatches exist? |
| 2 | PLAYBOOK.md | What file operations correspond to each intent? What are the steps? |
| 3 | This document | How does the system work? What is each file's role? Which can be touched and which cannot? |
0.1 MANIFEST.json (System Map)
Auto-generated by generate_manifest.py, which scans the directory tree; also refreshed automatically when chat_server.py starts.
{
"framework": "IAF",
"version": "0.1.0",
"generated_at": "2026-04-01T14:30:00Z",
"structure": {
"agents_dir": "agents/",
"template_dir": "template/",
"dispatch_dir": "dispatch/",
"tube_dir": "tube/",
"global_config": "config.json",
"tube_config": "tube/tubes.json",
"tube_log": "tube/tube_log.jsonl",
"pages_dir": "pages/"
},
"agents": {
"charlie": {
"config": "agents/charlie/agent_config.json",
"soul": "agents/charlie/SOUL.md",
"tools": ["file_tools.py", "shell_tools.py", "dispatch_tools.py", "tube_tools.py"],
"history": "agents/charlie/history.jsonl",
"call_log": "agents/charlie/call_log.jsonl",
"model": "google/gemini-3-flash-preview"
}
},
"dispatches": {
"roundtable": {
"config": "dispatch/roundtable/dispatch_config.json",
"agents": ["charlie", "mcmillan"],
"ui": "dispatch/roundtable/roundtable.html"
}
},
"tubes": {
"morning_news": {
"enabled": true,
"triggers": ["cron:0 3 * * *", "manual"],
"steps": ["agent:charlie"]
}
},
"conventions": {
"tool_file_pattern": "*_tools.py",
"tool_export_variable": "TOOLS",
"context_strategy_dir": "context/",
"skill_dir": "skills/"
}
}
External LLM manual refresh: python3 generate_manifest.py
0.2 PLAYBOOK.md (Operations Manual)
Plain-text operations manual covering complete steps for all common operations. See standalone file.
0.3 validate.py (Validation Script)
External LLM calls via Bash after modifying files to confirm changes are valid.
python3 validate.py agent charlie # Validate single agent
python3 validate.py tool agents/charlie/tools/http_tools.py # Validate tool file
python3 validate.py tube # Validate tubes.json
python3 validate.py all # Global validation
Output format: Success → OK, Failure → FAIL: N error(s) + line-by-line error descriptions. Plain text, no colours.
0.4 auto_commit.sh (Safety Snapshot)
External LLM calls before performing batch modifications:
bash auto_commit.sh "Pre-modification snapshot: adding http tools to charlie"
Rollback: git revert HEAD or git checkout -- {file}
1. Global Classification
All files in the framework fall into two categories:
| Category | Meaning | AI Operation Mode |
|---|---|---|
| Infrastructure | How the framework operates. The plumbing. | Read once to build understanding, then don't touch |
| Adjustable | What the system does. The building blocks and wiring. | May be operated on every task |
Decision rule: Upon receiving a requirement, first confirm whether it can be achieved by adjusting Adjustable components. Only when the requirement involves the framework's capability boundary itself (new LLM response formats, new storage mechanisms, new communication protocols) should you consider touching Infrastructure.
2. Infrastructure Inventory (Do Not Touch)
2.1 Shared Layer lib/
| File | Lines | Role | Interface AI Needs to Know |
|---|---|---|---|
| lib/llm_client.py | ~76 | HTTP LLM calls + retry + error classification | call_llm(url, key, model, messages, tools=None) → response |
| lib/token_utils.py | ~13 | Token count estimation | estimate_tokens(text) → int |
2.2 Web Service Layer
| File | Lines | Role | What AI Needs to Know |
|---|---|---|---|
| chat_server.py | ~220 | Flask router + Tube Runner startup + MANIFEST generation | Contains no business logic, only route dispatch. Auto-starts TubeRunner background thread and generates MANIFEST.json on startup |
| dispatch_routes.py | ~330 | Dispatch layer Flask Blueprint | Calls dispatch.py public functions, contains no orchestration logic |
| tube_routes.py | ~280 | Tube layer Flask Blueprint | Manages tube listing, status queries, manual triggers, log read/clear |
2.3 Agent Engine template/
| File | Lines | Role | What AI Needs to Know |
|---|---|---|---|
| template/core/direct_llm.py | ~290 | Core loop engine | call_agent(message, mode, max_loops) → response. Loads system prompt via context_files. Includes call_log.jsonl structured logging |
| template/core/tool_executor.py | ~60 | Tool auto-discovery registry + hot-reload | Scans tools/*_tools.py, imports TOOLS dict. Checks tools/ directory mtime before each execute() call, auto-rescans on changes |
| template/context/sliding_window.py | ~47 | Default context trimming strategy | trim(messages, max_tokens) → trimmed_messages |
| template/tools/file_tools.py | ~78 | Default toolset (read/write files, list directory) | Can be replaced or extended after copying to new Agent |
| template/tools/TOOL_CONTRACT.md | — | Tool file format contract | External LLM references this when writing new tools |
Note:
template/ is the copy source. When creating a new Agent: cp -r template/ agents/xxx/, after which files in agents/xxx/ become Adjustable. Do not modify template/ itself.
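TOOL_CONTRACT.md is the authoritative format reference; the sketch below shows one plausible shape for a tool file, assuming the TOOLS-dict convention named in MANIFEST's conventions block. The tool name, docstring, and definition schema here are illustrative assumptions, not the real contract.

```python
# hypothetical_tools.py (illustrative only; consult TOOL_CONTRACT.md for the real schema)
# Assumption: a tool file exports a TOOLS dict mapping tool names to an
# LLM-facing definition plus the Python callable that implements it.
from datetime import datetime, timezone


def get_time() -> str:
    """Return the current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


TOOLS = {
    "get_time": {
        "definition": {
            "type": "function",
            "function": {
                "name": "get_time",
                "description": "Return the current UTC time as an ISO-8601 string.",
                "parameters": {"type": "object", "properties": {}, "required": []},
            },
        },
        "function": get_time,
    },
}
```

With the naming convention *_tools.py, dropping a file like this into agents/xxx/tools/ would be picked up by tool_executor's auto-discovery on the next call.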
direct_llm.py five-layer message assembly model:
Layer 1: context_files content → concatenated as system prompt
Layer 2: skills trigger matching → injected as user+assistant dialogue pairs when matched
Layer 3: history.jsonl → loaded in chat mode only, skipped in batch mode
Layer 4: current user message
Layer 5: trim → ensures total stays within max_context - 8000
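The five layers compose into a single messages array; the sketch below shows the assembly order in simplified form. Function and parameter names are hypothetical, and file loading plus skill matching are stubbed out.

```python
def assemble_messages(context_texts, skill_pairs, history, user_message,
                      trim, max_tokens):
    """Simplified sketch of the five-layer assembly order.

    context_texts: list[str]        -> Layer 1, joined into the system prompt
    skill_pairs:   list[tuple]      -> Layer 2, (trigger, instruction) pairs
    history:       list[dict]       -> Layer 3, prior turns (empty in batch mode)
    user_message:  str              -> Layer 4, the current message
    trim:          callable         -> Layer 5, keeps total within the budget
    """
    # Layer 1: context files concatenated as the system prompt
    messages = [{"role": "system", "content": "\n\n".join(context_texts)}]
    # Layer 2: matched skills injected as user/assistant dialogue pairs
    for trigger_msg, instruction in skill_pairs:
        messages.append({"role": "user", "content": trigger_msg})
        messages.append({"role": "assistant", "content": instruction})
    # Layer 3: persisted history (chat mode only)
    messages.extend(history)
    # Layer 4: the current user message
    messages.append({"role": "user", "content": user_message})
    # Layer 5: trim to budget (doc: max_context - 8000)
    return trim(messages, max_tokens)
```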
direct_llm.py path resolution rules (shared by context_files and skill_file):
| Priority | Resolution Method | Example |
|---|---|---|
| 1 | Agent directory relative path | "SOUL.md" → agents/xxx/SOUL.md |
| 2 | Framework root relative path | "dispatch/roundtable/rules/default.md" |
| 3 | Absolute path | "/data/external/reference.md" |
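The three-tier lookup can be sketched as follows (function name hypothetical; the real resolver lives inside direct_llm.py):

```python
import os


def resolve_path(ref, agent_dir, project_root):
    """Resolve a context_files / skill_file reference using the priority order:
    1) relative to the agent directory, 2) relative to the framework root,
    3) as an absolute path. Returns None if no candidate exists."""
    candidates = [
        os.path.join(agent_dir, ref),     # 1. agent-relative
        os.path.join(project_root, ref),  # 2. framework-root-relative
        ref,                              # 3. absolute path as-is
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```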
direct_llm.py structured logging (call_log.jsonl):
Each call_agent() invocation records the following events in agents/xxx/call_log.jsonl:
| Event | Key Fields | Meaning |
|---|---|---|
| call_started | model, mode, message_preview | Agent was called |
| llm_call | loop, tokens_est, duration_ms | One LLM API call |
| tool_call | loop, tool_name, args_summary, result_length, is_error | One tool execution |
| call_completed | loops_used, reply_length, total_duration_ms | Agent returned result |
| call_failed | error | Agent call failed |
External LLM viewing: tail -20 agents/xxx/call_log.jsonl or grep "tool_error" agents/xxx/call_log.jsonl
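Beyond tail/grep, a quick pass over the log gives per-event counts. This is a convenience sketch, not part of the framework; it only assumes each line is a JSON object with an "event" field, as the table above describes.

```python
import json
from collections import Counter


def summarize_call_log(lines):
    """Count call_log.jsonl events from an iterable of raw JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        counts[json.loads(line).get("event", "unknown")] += 1
    return dict(counts)
```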
2.4 Dispatch Strategy Infrastructure
Within each dispatch strategy folder, the following files are Infrastructure:
| File | Lines | Role | What AI Needs to Know |
|---|---|---|---|
| dispatch_base.py | ~478 | Tool loop, LLM response parsing, staging management, status tracking | See the key interfaces listed below |
| session_manager.py | ~200 | JSONL session CRUD + staging formatting | create_session(), append_to_session(), load_session(), list_sessions(), delete_session(), format_session_history() |
| context_injector.py | ~80 | Reads files by context_files path list, builds messages array | build_context(agent_id, config, project_root, user_message) → (messages, provider, model) |
| context/sliding_window.py | ~150 | Configuration-driven context trimming | trim_records(records, max_tokens, trim_strategy) → trimmed_records |
dispatch_base.py key interfaces:
get_llm_caller(project_root) → call_llm_fn or None
load_global_config(project_root) → dict
resolve_llm_endpoint(provider, global_config) → (url, key)
load_agent_tools(agent_id, project_root) → (tool_definitions, tool_functions)
call_with_tool_loop(messages, url, key, model, call_llm_fn, tool_definitions, tool_functions, max_tool_loops) → (content, tool_history)
write_agent_memory(agent_id, tool_history) → None
write_staging_history(project_root, session_id) → None
clear_staging() → None
set_status(session_id, round_num, agent_id, agent_name, status) → None
clear_status() → None
get_status() → dict
2.5 Tube Layer Infrastructure
2.5.1 Core Engine
| File | Lines | Role | What AI Needs to Know |
|---|---|---|---|
| tube/tube_runner.py | ~370 | Main loop engine: polls tubes.json, checks trigger conditions, executes steps serially (with retry and failure strategies), writes logs, manages staging data passing | TubeRunner(interval=15).run() — runs as daemon thread in chat_server |
Runtime mechanism key points:
- Polls tubes.json every 15 seconds (hot-loaded, config changes don't require restart)
- Each triggered tube executes in an independent thread, multiple tubes can run in parallel
- running_tubes dictionary prevents duplicate triggering of the same tube
- Steps default to serial: next step runs only if previous exits with code 0; non-0 follows on_fail config (stop or continue)
- Supports step-level retry (retry.max + retry.delay_sec)
- Inter-step data passes via staging files ($PREV_OUTPUT placeholder)
- type=tube steps execute inline recursively (depth limit 5 levels)
- Other type steps build commands via targets/ modules, executed as subprocesses
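Assembled from the bullet points above, a tubes.json entry might look like the sketch below. Field names (triggers, steps, retry.max, retry.delay_sec, on_fail, $PREV_OUTPUT) are inferred from this document; the exact trigger object shape is an assumption, so run python3 validate.py tube before trusting it.

```json
{
  "morning_news": {
    "enabled": true,
    "triggers": [
      {"type": "cron", "expr": "0 3 * * *"},
      {"type": "manual"}
    ],
    "steps": [
      {
        "type": "agent",
        "target": "charlie",
        "payload": "Summarize overnight news.",
        "retry": {"max": 2, "delay_sec": 30},
        "on_fail": "stop"
      },
      {
        "type": "agent",
        "target": "mcmillan",
        "payload": "Edit this draft: $PREV_OUTPUT",
        "on_fail": "continue"
      }
    ]
  }
}
```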
2.5.2 Pluggable Trigger Sources triggers/
Each .py file is a trigger source type with unified interface:
def check(config: dict, state: dict) -> bool
| File | Lines | Role | Config Example |
|---|---|---|---|
| triggers/cron.py | ~44 | Scheduled trigger (depends on croniter) | {"expr": "0 3 * * *"} |
| triggers/manual.py | ~32 | Flag file trigger (API or file creation) | {} (no config needed) |
state fields (provided by tube_runner):
| Field | Type | Description |
|---|---|---|
| now | datetime (UTC) | Current time |
| last_triggered | datetime or None | Last trigger time for this tube |
| tube_id | str | Current tube ID |
| flag_dir | str | Manual trigger flag file directory path |
User extension: Drop a .py file in triggers/, implement the check() interface. Zero code changes.
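As an illustration of the check() interface, here is a hypothetical trigger source that fires when a watched file exists (this example is not shipped with IAF; the filename would become the trigger type in tubes.json):

```python
# triggers/file_exists.py (hypothetical example trigger source)
import os


def check(config: dict, state: dict) -> bool:
    """Fire when the configured path exists. Config sketch: {"path": "/tmp/go"}.

    state is supplied by tube_runner and carries now, last_triggered,
    tube_id, and flag_dir; this trigger does not need any of them.
    """
    path = config.get("path", "")
    return bool(path) and os.path.exists(path)
```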
2.5.3 Pluggable Drive Targets targets/
Each .py file is a drive target type with unified interface:
def build_command(step: dict, project_root: str) -> list[str]
| File | Lines | Role | step.type |
|---|---|---|---|
| targets/agent.py | ~34 | Builds Agent subprocess command | "agent" |
| targets/dispatch.py | ~40 | Builds Dispatch subprocess command | "dispatch" |
User extension: Drop a .py file in targets/, implement the build_command() interface. Zero code changes. Exception: type=tube steps don't go through targets/ modules; they're handled inline recursively by tube_runner.
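A hypothetical drive target, for illustration: a type=script step that runs an arbitrary script under python3. The step fields used here (target, payload) follow this document's conventions, but a real target should mirror targets/agent.py rather than this sketch.

```python
# targets/script.py (hypothetical example drive target)
import os


def build_command(step: dict, project_root: str) -> list:
    """Build the subprocess argv for a step such as
    {"type": "script", "target": "jobs/cleanup.py", "payload": "--dry-run"}."""
    cmd = ["python3", os.path.join(project_root, step["target"])]
    if step.get("payload"):
        cmd.append(step["payload"])
    return cmd
```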
2.5.4 Tube API Endpoints
| Method | Path | Function |
|---|---|---|
| GET | /api/tubes | List all tube definitions + real-time running/idle status |
| GET | /api/tube/status | Lightweight status query (id, enabled, status only) |
| POST | /api/tube/trigger | Manual trigger (body: {"tube_id": "xxx"}) |
| GET | /api/tube/log?tail=50&tube_id=xxx | Query logs (filterable) |
| GET | /api/tube/log/grouped?per_tube=10 | Return logs grouped by tube |
| DELETE | /api/tube/log?tube_id=xxx | Clear logs (can clear per tube) |
2.5.5 tube_log.jsonl Event Types
| Event | Key Fields | Meaning |
|---|---|---|
| runner_started | interval | tube_runner started |
| runner_stopped | — | tube_runner stopped |
| tube_triggered | tube_id, step_count | Tube was triggered |
| step_started | tube_id, step_index, step_type, step_target, payload | A step began |
| step_completed | tube_id, step_index, step_type, step_target, exit_code, duration_sec | A step succeeded |
| step_failed | tube_id, step_index, step_type, step_target, exit_code, duration_sec, stderr_tail | A step failed |
| step_retry | tube_id, step_index, attempt, max_attempts, delay_sec | A step is being retried |
| tube_completed | tube_id, duration_sec | All steps completed |
| tube_stopped | tube_id, stopped_at_step, duration_sec | Stopped due to step failure |
| trigger_error | tube_id, trigger_type, error | Trigger check exception |
2.6 Framework-Level Utility Tools
| File | Role | User |
|---|---|---|
| generate_manifest.py | Scans directories to generate MANIFEST.json | External LLM via Bash |
| validate.py | Validates agent/tool/tube config legality | External LLM via Bash |
| auto_commit.sh | Git snapshot (safety net before batch modifications) | External LLM via Bash |
These three files are Infrastructure, but external LLMs only call them, never modify them.
3. Adjustable Components Inventory (AI Operation Targets)
3.1 Agent Adjustable Components
Within each agents/xxx/ folder:
| File | Purpose | Modification Scenarios |
|---|---|---|
| agent_config.json | Model, provider, context sources, skill trigger rules | Change model, modify context file list, add/remove skill triggers |
| SOUL.md | Agent identity definition (referenced via context_files) | Change personality, role, behavioural guidelines |
| skills/*.md | Task instruction files | Add/remove instruction content |
| tools/*_tools.py | Agent's available tools | Add/remove tools (drop file = effective, auto-discovered via hot-reload) |
| context/sliding_window.py | This Agent's trimming strategy | When different trimming behaviour than default is needed |
Engine code (core/direct_llm.py, core/tool_executor.py) is semi-infrastructure: not modified by default after copying from template. Only modify when the user needs this Agent to be fundamentally different from others at the engine level — this is the core promise of "possibility management."
Engine modification protocol: Each agent has an independent engine copy. When modifying engine logic: 1) Modify one agent's copy and test. 2) After confirmation, diff against other agents' same-named files. 3) Sync individually to agents that need the same modification (not all agents need syncing; some may have special logic). 4) After syncing, run python3 validate.py all
3.2 Dispatch Adjustable Components
Within each dispatch/xxx/ folder:
| File | Purpose | Modification Scenarios |
|---|---|---|
| dispatch.py | Orchestration logic | Change collaboration mode (round-robin → debate → star topology → serial pipeline) |
| dispatch_config.json | Participating agents, rounds, trimming strategy, context_files | Add/remove agents, change rounds, change context sources |
| rules/*.md | Agent's role definition in this collaboration | Change an agent's role within the collaboration |
Key design: Dispatch does not call Agent's call_agent. It directly calls lib/llm_client.call_llm(), assembling context itself via context_injector. The Agent folder serves Dispatch only as a data source (SOUL.md, rules, config), not as an executor.
3.3 Tube Adjustable Components
| File | Purpose | Modification Scenarios |
|---|---|---|
| tube/tubes.json | Declarative definition of all tubes — single source of topology truth | Add/remove tubes, change triggers, change steps, change signal topology |
| pages/tube-dashboard.html | Tube visual monitoring page | Customise UI display |
3.4 UI Adjustable Components
| File | Purpose | Modification Scenarios |
|---|---|---|
| index.html | Yellow pages, feature entry index | Generally unchanged (auto-discovery) |
| chat.html | Basic chat interface | Customise chat UI |
| pages/*.html | User-built feature pages | Add page = drop file |
| dispatch/xxx/*.html | Dispatch-specific UI | Customise collaboration interface |
4. AI Operations Decision Tree
Received requirement
│
├─ Need a new Agent?
│ → bash auto_commit.sh "snapshot before adding agent"
│ → cp -r template/ agents/xxx/
│ → Edit SOUL.md
│ → Edit agent_config.json (set context_files, skills, model)
│ → Don't touch engine code
│ → python3 validate.py agent xxx
│ → python3 generate_manifest.py
│
├─ Need to change Agent's loaded files?
│ → Edit agent_config.json's context_files array
│ → Add path = add file, remove path = remove file
│ → Don't touch direct_llm.py
│
├─ Need to add tools to Agent?
│ → Reference template/tools/TOOL_CONTRACT.md for format
│ → Drop *_tools.py file in agents/xxx/tools/
│ → Hot-reload auto-discovery, no code changes
│ → python3 validate.py tool agents/xxx/tools/new_tools.py
│
├─ Need Agent to execute shell commands?
│ → cp template/tools/shell_tools.py agents/xxx/tools/
│
├─ Need Agent to trigger Tubes?
│ → cp template/tools/tube_tools.py agents/xxx/tools/
│
├─ Need Agent to initiate multi-Agent collaboration?
│ → cp template/tools/dispatch_tools.py agents/xxx/tools/
│
├─ Need to add skills to Agent?
│ → Drop .md file in agents/xxx/skills/
│ → Method A: Add trigger rule in agent_config.json skills array (exact match)
│ → Method B: Add entry in skill_router.md (semantic trigger, needs file_tools)
│ → Method C: Add directly to context_files (full injection, always loaded)
│
├─ Need a new Dispatch strategy?
│ → cp -r dispatch/roundtable/ dispatch/xxx/
│ → Edit dispatch.py ORCHESTRATION LOGIC section
│ → Edit dispatch_config.json, rules/
│ → Don't touch dispatch_base.py, session_manager.py, context_injector.py
│
├─ Need to adjust signal topology?
│ → Edit tube/tubes.json
│ → Don't touch tube_runner.py, triggers/, targets/
│ → python3 validate.py tube
│
├─ Need a new Tube trigger source type?
│ → Drop .py file in tube/triggers/, implement check(config, state) → bool
│ → Use filename as type in tubes.json
│ → Don't touch tube_runner.py
│
├─ Need a new Tube drive target type?
│ → Drop .py file in tube/targets/, implement build_command(step, project_root) → list
│ → Use filename as type in steps of tubes.json
│ → Don't touch tube_runner.py
│
├─ Need to manually trigger a Tube?
│ → curl -X POST http://127.0.0.1:5000/api/tube/trigger \
│ -H "Content-Type: application/json" \
│ -d '{"tube_id": "xxx"}'
│ → or: touch tube/manual_triggers/xxx
│ → or: Agent triggers via tube_tools.py
│
├─ Need to check system runtime status?
│ → Agent chat history: tail agents/xxx/history.jsonl
│ → Agent run log: tail agents/xxx/call_log.jsonl
│ → Tube execution log: tail tube/tube_log.jsonl
│ → Tube step output: Read tube/staging/{tube_id}_{timestamp}/
│ → Error troubleshooting: grep "error\|failed" tube/tube_log.jsonl
│ → Global validation: python3 validate.py all
│
├─ Need to modify Agent engine logic?
│ → bash auto_commit.sh "snapshot before engine modification"
│ → Modify one agent's core/ copy first and test
│ → diff against other agents' same-named files
│ → Sync individually (note: not all agents need syncing)
│ → python3 validate.py all
│
├─ Need a new UI page?
│ → Drop .html file in pages/
│ → Auto-discovered by yellow pages
│
├─ Need to rollback changes?
│ → git diff → see what changed
│ → git log --oneline -10 → see snapshot history
│ → git checkout -- {file} → restore single file
│ → git revert HEAD → rollback most recent commit
│
└─ None of the above?
→ May need to touch Infrastructure
→ First confirm: can this really not be achieved via Adjustable components?
→ If confirmed, read the corresponding Infrastructure source code, understand, then modify
→ After modification: python3 validate.py all to verify no other modules affected
5. Agent Capability Three-Layer Model Quick Reference
| Layer | Carrier | Config Location | Load Timing | Purpose |
|---|---|---|---|---|
| Identity & Knowledge | .md / .txt files | agent_config.json context_files | Every call | Information the Agent always knows |
| Conditional Instructions | .md files + trigger rules | agent_config.json skills | On keyword match | Instructions injected for specific scenarios |
| Execution Capability | *_tools.py files | Auto-discovered (drop file = effective, hot-reload) | Every call | Operations the Agent can perform |
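For orientation, an agent_config.json might tie the three layers together as in the sketch below. Key names (display_name, provider, model, context_files, skills, skill_file, trigger) are pieced together from mentions elsewhere in this document and are not authoritative; verify any real config with python3 validate.py agent xxx.

```json
{
  "display_name": "Charlie",
  "provider": "openrouter",
  "model": "google/gemini-3-flash-preview",
  "context_files": ["SOUL.md", "skill_router.md"],
  "skills": [
    {"trigger": "weekly report", "skill_file": "skills/weekly_report.md"}
  ]
}
```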
6. File Change Impact Scope Quick Reference
| File Changed | Impact Scope |
|---|---|
| agents/xxx/SOUL.md | That Agent only |
| agents/xxx/agent_config.json | That Agent only |
| agents/xxx/skills/*.md | That Agent only |
| agents/xxx/tools/*.py | That Agent only (hot-reload, effective next call) |
| agents/xxx/core/direct_llm.py | That Agent only (others have independent copies, unaffected) |
| dispatch/xxx/dispatch.py | That Dispatch strategy only |
| dispatch/xxx/dispatch_config.json | That Dispatch strategy only |
| dispatch/xxx/rules/*.md | That Dispatch strategy only |
| tube/tubes.json | Signal topology (does not affect Agent and Dispatch internals) |
| tube/triggers/new.py | Only tubes using that trigger type |
| tube/targets/new.py | Only tubes using that target type |
| pages/*.html | That page only |
| MANIFEST.json | No runtime impact (read-only for external LLMs) |
| lib/llm_client.py | Global — all Agent and Dispatch LLM calls |
| chat_server.py | Global — all API routes + Tube Runner startup |
| tube/tube_runner.py | Global — all Tube polling and execution |
| template/ | No direct impact — only affects future new Agents |
Key characteristic: The first 13 rows' impact scope is all "that module only" or "only tubes using it." This is the value of module isolation — AI can safely make local modifications with no global side effects.
7. Hot-Reload Mechanism Quick Reference
| File Type | Hot-Reload Method | Effective Timing |
|---|---|---|
| agent_config.json | _load_config() re-reads on every call_agent() | Next agent call |
| SOUL.md / context_files | _load_context_files() re-reads every time | Next agent call |
| tools/*_tools.py | tool_executor checks tools/ directory mtime | Next agent call |
| tubes.json | tube_runner _load_tubes() on every polling cycle | Next polling cycle (≤15 seconds) |
| dispatch_config.json | Re-read on every dispatch call | Next dispatch call |
| rules/*.md | Re-read via context_injector every time | Next dispatch call |
| pages/*.html | Flask route re-reads on every request | Next page visit |
Conclusion: After an external LLM modifies files, no server restart is needed. All Adjustable components are hot-loaded.
8. Dangerous Operations Checklist (Always Git Commit Before Modifying)
| Operation | Risk Level | Description |
|---|---|---|
| Delete agents/ directory | High | Agent data and history permanently lost |
| Modify config.json provider key | High | All agents' LLM calls may fail |
| Modify any agent's core/direct_llm.py | Medium | Affects only that agent, but may break core loop |
| Modify dispatch/xxx/dispatch_base.py | High | Affects all collaboration sessions for that strategy |
| Modify lib/llm_client.py | High | Affects all global LLM calls |
| Modify tube/tube_runner.py | High | Affects all global Tube execution |
| Delete or heavily modify existing tubes in tubes.json | Medium | May interrupt running automation flows |
Before operation: bash auto_commit.sh "description"
After operation: python3 validate.py all
9. External LLM Full Takeover Acceptance Scenario
1. Read MANIFEST.json
→ Understand what agents, tubes, dispatches the system has
2. Read PLAYBOOK.md
→ Know how to operate
3. Bash: bash auto_commit.sh "pre-modification snapshot"
→ Safety backup
4. Bash: cp -r template/ agents/monitor/
→ Create new agent
5. Write agents/monitor/SOUL.md
→ Write "You are a website monitoring agent, check site status when called..."
6. Edit agents/monitor/agent_config.json
→ Set display_name, model
7. Read template/tools/TOOL_CONTRACT.md
→ Understand tool file format
8. Write agents/monitor/tools/http_tools.py
→ Write HTTP check tool conforming to contract
9. Bash: python3 validate.py agent monitor
→ Output "OK"
10. Edit tube/tubes.json
→ Add a tube triggering monitor agent every 10 minutes
11. Bash: python3 validate.py tube
→ Output "OK"
12. Bash: python3 generate_manifest.py
→ Update MANIFEST.json
13. Bash: bash auto_commit.sh "Added monitor agent and scheduled check tube"
→ Save modifications
14. After 10 minutes:
Read tube/tube_log.jsonl → Confirm tube triggered
Read agents/monitor/call_log.jsonl → Confirm agent ran
15. If issues:
Bash: git diff → See what changed
Bash: git revert HEAD → Rollback