Forge Architecture Debate — All-Model Convergence

Four models (Grok CTO, Gemini, Opus, Codex) debate the architecture for Forge as a meta-orchestrator. Updated: April 1, 2026

70% System Built

Products Managed

Critical Gaps Found

Repos (antilles + lottery)

Priority Items

The Architecture (Converged)

YOU (Telegram / CLI)
        │
        ▼
FORGE (Meta-Orchestrator)
├── Hermes (brain, supervises all tracks event-driven)
├── Memory Router (8 providers, CXDB, hermes-memory)
├── API Bridge (connects OpenClaw ↔ Memory ↔ Toolkit)
├── Research Harness (Swarma, MiroShark, Evals, Auto-research)
│
├── tracks.yaml (maps tracks to product repos + branches)
│   ├── rmt-reputation → antilles-v2 / feature/rmt-core-porting
│   ├── identity → antilles-v2 / feature/identity-foundation
│   ├── x402 → antilles-v2 / agent/x402-trust-channels
│   └── lottery → lottery / main
│
└── Per-Track Instances:
    ├── Research State Machine (STRATEGY → EXPERIMENT → VERDICT)
    ├── Sprint Dev Pipeline (SEED → BUILD → VALIDATE → COMMIT)
    ├── Eval Matrix (5-dimensional: Scenario×Actor×Scale×Condition×Metric)
    └── Working Memory (synced to product repo docs/)
        │
        ▼
PRODUCT REPOS (antilles-v2, lottery, future...)
├── Receive PRs from Forge sprint pipeline
├── Receive working memory summaries
├── Deploy to testnet/mainnet via Forge commands
└── Stay clean — only production-ready code
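
The two per-track pipelines named in the diagram can be sketched as ordered stage lists. The stage names come from the diagram; the strictly linear advance rule and the nextStage helper are assumptions for illustration, not the actual Forge implementation.

```typescript
// Stage names are taken from the architecture diagram; the strictly linear
// advance rule is an assumption made for this sketch.
const RESEARCH_STAGES = ["STRATEGY", "EXPERIMENT", "VERDICT"] as const;
const SPRINT_STAGES = ["SEED", "BUILD", "VALIDATE", "COMMIT"] as const;

type ResearchStage = (typeof RESEARCH_STAGES)[number];
type SprintStage = (typeof SPRINT_STAGES)[number];

// Returns the stage after `current`, or null once the last stage is reached.
function nextStage<T extends string>(stages: readonly T[], current: T): T | null {
  const index = stages.indexOf(current);
  if (index === -1) {
    throw new Error(`Unknown stage: ${current}`);
  }
  return stages[index + 1] ?? null;
}

const afterBuild: SprintStage | null = nextStage(SPRINT_STAGES, "BUILD");
const afterVerdict: ResearchStage | null = nextStage(RESEARCH_STAGES, "VERDICT");
```

A real state machine would likely also need failure and retry transitions, which the diagram does not specify.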

The Flow: Research → Ship

1. Research Phase

You say: "optimize sybil detection"

Forge runs: Swarma loops, MiroShark sims, auto-research, evals

Findings stay in Forge (CXDB, hermes-memory)

You get: Telegram notification with narrative summary

2. Approval Gate

Hermes: "Detection improved 93.7%→99.2%. Approve build?"

You tap: [Approve] on Telegram

Grok CTO validates the finding
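
A minimal sketch of the gate, assuming both the Telegram approval and the Grok CTO validation must pass before the sprint pipeline starts (the text lists both steps but does not say whether either alone is sufficient). The Finding and GateState shapes are hypothetical.

```typescript
// Hypothetical shapes: none of these names come from the Forge codebase.
interface Finding {
  metric: string;
  beforePct: number;
  afterPct: number;
}

interface GateState {
  userApproved: boolean;       // the [Approve] tap on Telegram
  validatorConfirmed: boolean; // the Grok CTO validation step
}

// Formats the question Hermes sends for a finding.
function approvalPrompt(finding: Finding): string {
  return `${finding.metric} improved ${finding.beforePct}%→${finding.afterPct}%. Approve build?`;
}

// Assumption: both signals are required before the sprint pipeline starts.
function mayStartSprint(state: GateState): boolean {
  return state.userApproved && state.validatorConfirmed;
}

const gateMessage = approvalPrompt({ metric: "Detection", beforePct: 93.7, afterPct: 99.2 });
const approvedOnly = mayStartSprint({ userApproved: true, validatorConfirmed: false });
```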

3. Sprint Pipeline

Forge creates worktree of antilles-v2 branch

Code written in that worktree (not in Forge)

Tests run, cross-model review

PR submitted to antilles-v2
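
The worktree step can be sketched as argument lists for git. `git worktree add` and `git worktree remove` are real subcommands; the helper names and directory paths below are hypothetical, and the sketch only builds the arguments without running git.

```typescript
// Builds argv for `git -C <repoDir> worktree add <worktreeDir> <branch>`.
// The directory layout is hypothetical; only the git subcommands are real.
function worktreeAddArgs(repoDir: string, worktreeDir: string, branch: string): string[] {
  return ["-C", repoDir, "worktree", "add", worktreeDir, branch];
}

// Builds argv for removing the worktree once the PR has been submitted.
function worktreeRemoveArgs(repoDir: string, worktreeDir: string): string[] {
  return ["-C", repoDir, "worktree", "remove", worktreeDir];
}

const addArgs = worktreeAddArgs(
  "/repos/antilles-v2",
  "/tmp/forge-worktrees/rmt-reputation",
  "feature/rmt-core-porting",
);
const removeArgs = worktreeRemoveArgs("/repos/antilles-v2", "/tmp/forge-worktrees/rmt-reputation");
```

By default git refuses to check out a branch that is already checked out in another worktree, so two sprints on the same branch would need separate branches or serialization.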

4. Deploy

You say: "ship rmt testnet"

Forge deploys from antilles-v2 to Sepolia

Health checks, verification

Dashboard updated with deployment status

4-Model Debate: Critical Gaps Found

Found by both Grok and Gemini

Gap	Risk	Proposed Fix	Status
1. Concurrency Locking	HIGH — Two tracks modify same file → conflict	Lock manager or serialization queue	NEEDS DESIGN
2. OpenClaw Coexistence	MEDIUM — Parallel systems, race conditions	Migrate OpenClaw dispatch into Forge OR make it pure relay	NEEDS DECISION
3. External Repo Updates	MEDIUM — Manual commits make Forge stale	Git webhooks or periodic sync	NEEDS DESIGN
4. CI/CD Integration	MEDIUM — Unclear if Forge replaces GH Actions	Forge creates PRs, GH Actions handles post-merge	NEEDS DECISION
5. Security/Approvals	HIGH — No RBAC for production deploys	Role-based access + multi-party approval	NEEDS DESIGN
6. Cost Sustainability	MEDIUM — Research loops could overrun	Paperclip thresholds + auto-scaling	PARTIAL (Paperclip exists)
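
Gap 1 is still marked NEEDS DESIGN, so the following is only one possible shape for a lock manager: an in-memory table that grants a file to one track at a time. FileLockTable, its methods, and the file path are hypothetical; a real design would also need persistence and lock expiry.

```typescript
class FileLockTable {
  // Maps "repo:file" to the track that currently holds the lock.
  private owners = new Map<string, string>();

  private static key(repo: string, file: string): string {
    return `${repo}:${file}`;
  }

  // Grants the lock if the file is free or already held by the same track.
  tryAcquire(repo: string, file: string, track: string): boolean {
    const key = FileLockTable.key(repo, file);
    const owner = this.owners.get(key);
    if (owner !== undefined && owner !== track) {
      return false;
    }
    this.owners.set(key, track);
    return true;
  }

  // Releases the lock only if `track` is the current holder.
  release(repo: string, file: string, track: string): boolean {
    const key = FileLockTable.key(repo, file);
    if (this.owners.get(key) !== track) {
      return false;
    }
    this.owners.delete(key);
    return true;
  }
}

// Hypothetical file path; two tracks from tracks.yaml contend for it.
const lockTable = new FileLockTable();
const firstGrant = lockTable.tryAcquire("antilles-v2", "src/oracle.ts", "rmt-reputation");
const secondGrant = lockTable.tryAcquire("antilles-v2", "src/oracle.ts", "identity");
```

Here the second track is refused until the first releases the file, which is the conflict the gap describes.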

Opus: Additional Concerns

Research harness has stale copy of production algorithms — needs submodule or workspace reference
47 ARCH findings block the pipeline; fix them in antilles-v2, not in Forge
Hermes profiles have 0 memories — event-driven supervision not wired
90-day acceptance window too long for bots — needs redesign

Codex: Awaiting Response

Codex is processing — will update when response arrives

Implementation Priority

P0: tracks.yaml Enhancement (4 hours)

Add repo + branch fields. Scope dispatch-review per-track. Foundation for everything else.
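
A sketch of what the repo and branch fields might look like once loaded, written as a typed object. The repo and branch values are copied from the architecture diagram; the field names, TrackTarget, and both helpers are assumptions, since the final tracks.yaml schema is not shown here.

```typescript
interface TrackTarget {
  repo: string;
  branch: string;
}

// Repo and branch values are copied from the architecture diagram; the
// object shape stands in for whatever schema tracks.yaml ends up with.
const TRACKS: Record<string, TrackTarget> = {
  "rmt-reputation": { repo: "antilles-v2", branch: "feature/rmt-core-porting" },
  identity: { repo: "antilles-v2", branch: "feature/identity-foundation" },
  x402: { repo: "antilles-v2", branch: "agent/x402-trust-channels" },
  lottery: { repo: "lottery", branch: "main" },
};

// Looks up where a track's code should land; fails loudly on unknown tracks.
function resolveTrack(track: string): TrackTarget {
  const target = TRACKS[track];
  if (target === undefined) {
    throw new Error(`Unknown track: ${track}`);
  }
  return target;
}

// Lists the tracks that share a repo, i.e. the ones that can collide.
function tracksForRepo(repo: string): string[] {
  return Object.keys(TRACKS).filter((track) => resolveTrack(track).repo === repo);
}

const x402Target = resolveTrack("x402");
const antillesTracks = tracksForRepo("antilles-v2");
```

Three of the four tracks resolve to antilles-v2, which is why per-track scoping of dispatch-review matters.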

P0: Hermes Event-Driven Supervision (8 hours)

HermesSupervisorHandler in Memory Router event bus. Replaces cron-based approach. Proactive Telegram alerts.
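
To illustrate the difference from cron polling, here is a minimal sketch of a handler that subscribes to an event bus and reacts as events arrive. Every name below (EventBus, ForgeEvent, SupervisorSketch) is hypothetical; the real Memory Router event bus API and the HermesSupervisorHandler interface are not shown in this document.

```typescript
// All names here are hypothetical stand-ins for the Memory Router event bus.
type ForgeEventKind = "progress" | "verdict" | "error";

interface ForgeEvent {
  track: string;
  kind: ForgeEventKind;
  detail: string;
}

type ForgeEventHandler = (evt: ForgeEvent) => void;

class EventBus {
  private handlers: ForgeEventHandler[] = [];

  subscribe(handler: ForgeEventHandler): void {
    this.handlers.push(handler);
  }

  // Delivers the event synchronously to every subscriber.
  publish(evt: ForgeEvent): void {
    for (const handler of this.handlers) {
      handler(evt);
    }
  }
}

// Reacts as events arrive instead of polling on a cron schedule.
class SupervisorSketch {
  readonly alerts: string[] = [];

  constructor(bus: EventBus) {
    bus.subscribe((evt) => this.onEvent(evt));
  }

  private onEvent(evt: ForgeEvent): void {
    // Assumption: routine progress is not worth a Telegram alert.
    if (evt.kind === "progress") {
      return;
    }
    this.alerts.push(`[${evt.track}] ${evt.kind}: ${evt.detail}`);
  }
}

const bus = new EventBus();
const supervisor = new SupervisorSketch(bus);
bus.publish({ track: "identity", kind: "progress", detail: "experiment running" });
bus.publish({ track: "rmt-reputation", kind: "verdict", detail: "detection 99.2%" });
```

In this sketch the alerts array stands in for the Telegram send; only the verdict event produces an alert.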

P1: Memory Bridge (4 hours)

Research findings → target repo WORKING_MEMORY.md automatically. Curated summaries, not raw data.
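
One way to read "curated summaries, not raw data" is to keep only the strongest few findings per track and render them as short lines. The ResearchFinding shape, the confidence field, the limit, and the output layout are all assumptions for this sketch; the function only builds the text and does not write WORKING_MEMORY.md.

```typescript
// Hypothetical shape; `confidence` (0 to 1) is an assumed ranking signal.
interface ResearchFinding {
  title: string;
  summary: string;
  confidence: number;
}

// Keeps the `limit` highest-confidence findings and renders them as a
// short section suitable for appending to a WORKING_MEMORY.md file.
function renderWorkingMemory(track: string, findings: ResearchFinding[], limit = 3): string {
  const curated = [...findings]
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, limit);
  const lines = curated.map((finding) => `- ${finding.title}: ${finding.summary}`);
  return [`## ${track}`, ...lines].join("\n") + "\n";
}

const memorySection = renderWorkingMemory(
  "rmt-reputation",
  [
    { title: "Threshold sweep", summary: "no clear winner yet", confidence: 0.4 },
    { title: "Sybil detection", summary: "improved 93.7% to 99.2%", confidence: 0.9 },
  ],
  1,
);
```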

P1: Test Deploy Pipeline (8 hours)

VALIDATE stage creates worktree of target repo, runs tests + testnet deploy.

P2: 47 ARCH Findings (4-8 hours)

Fix in antilles-v2 where they belong. Hardcoded values → config, missing error handling → try/catch.

What's Built vs What's Missing

Component	Status	Details
Forge CLI	BUILT	30+ commands including new-track (built today)
Telegram Dispatch (OpenClaw)	WORKING	Relays to/from Telegram, creates worktrees
Hermes (6 profiles)	PARTIAL	Running but passive — needs event-driven supervision
Memory Router	WORKING	8 providers, event bus, 17 hermes-memory entries
Research Loops (4 tracks)	RUNNING	Swarma 4-model rotation, all tracks polling
RMT Contracts + Oracle	BUILT	11 contracts, 34 oracle files, 92 tests
Identity Portal	BUILT	304 commits, 150+ React components, full backend
RMT Dashboard	BUILT	60 React components with D3.js graph viz
ERC-8004 Fetcher	BUILT	Needs Base mainnet config only
tracks.yaml Scoping	TODO	Needs repo + branch fields + dispatch-review scoping
HermesSupervisorHandler	TODO	Seed brief created, needs sprint pipeline
Memory Bridge to Repos	TODO	scanPaths added today, handler needed
Test Deploy Pipeline	TODO	VALIDATE stage needs worktree-into-target-repo
Multi-Chain Indexer	TODO	HelixaApiFetcher + ERC8183Fetcher needed (~220 LOC)
Eval Matrix (5-dim)	TODO	Framework designed, needs eval-matrix.ts (~300 LOC)
Concurrency Lock Manager	NOT DESIGNED	Critical gap — two tracks same repo
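
For the Eval Matrix row, the five dimensions (Scenario×Actor×Scale×Condition×Metric) suggest a cartesian product of cases. The sketch below assumes that reading; the dimension values are made up for illustration, and this is not the planned eval-matrix.ts.

```typescript
interface EvalCase {
  scenario: string;
  actor: string;
  scale: string;
  condition: string;
  metric: string;
}

// One list of candidate values per dimension.
type EvalDimensions = { [K in keyof EvalCase]: string[] };

// Expands the dimensions into every combination (cartesian product).
function expandMatrix(dims: EvalDimensions): EvalCase[] {
  const result: EvalCase[] = [];
  for (const scenario of dims.scenario) {
    for (const actor of dims.actor) {
      for (const scale of dims.scale) {
        for (const condition of dims.condition) {
          for (const metric of dims.metric) {
            result.push({ scenario, actor, scale, condition, metric });
          }
        }
      }
    }
  }
  return result;
}

// Dimension values below are illustrative only.
const evalCases = expandMatrix({
  scenario: ["sybil-ring", "organic-growth"],
  actor: ["bot", "human"],
  scale: ["100", "10000"],
  condition: ["baseline"],
  metric: ["detection-rate", "false-positive-rate"],
});
```

The case count is the product of the dimension sizes, so even small lists grow quickly; that ties back to the cost concern in gap 6.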

Competitive Landscape (Today's Research)

Competitor	Threat	Agents	Our Response
ERC-8004 ecosystem	HIGH	130K+	We complete it (sybil oracle + trust delegation)
ERC-8183 + Virtuals	HIGH	18K	Our hooks add trust gating
Helixa (CredOracle)	MED-HIGH	132K indexed	Centralized, no sybil detection, cap at 50
Cred Protocol	MED	200M addr	Credit scoring only
FairScale	DEAD	—	Treasury liquidated

Today's Research Session Summary

Full-day strategic research produced:

4-model converged strategic plan (FICO revenue, free sybil oracle, ERC-8183 hooks as GTM)
Complete competitive landscape (14 protocols analyzed)
Helixa deep dive (132K agents, centralized oracle, MIT-licensed code)
Virtuals Protocol economics analysis (REJECT bonding curves, ADOPT ve-staking)
Generalized eval framework extracted from lottery → all tracks
Architecture debate: Forge as meta-orchestrator with tracks.yaml scoping
20+ model review files saved
Master research document: 629+ lines

Forge Architecture Debate Dashboard — Generated April 1, 2026