Multi-Instance Mesh Guide
The @randal/mesh package lets multiple Randal instances discover each other,
share work, and route jobs to the best-suited instance. This turns a
collection of single agents into a coordinated mesh.
Concept overview
A mesh is a peer-to-peer network of Randal instances. Each instance:
- Registers itself with the mesh on startup.
- Advertises its role, expertise profile, current load, and health.
- Accepts delegated jobs from other instances.
- Routes incoming jobs to the best peer when a better match exists.
There is no central controller. Every instance maintains a local view of the mesh by exchanging lightweight heartbeats over HTTP.
┌──────────────┐    heartbeat     ┌──────────────┐
│  Instance    │ ◄──────────────► │  Instance    │
│  platform-   │                  │  product-    │
│  infra       │                  │  engineering │
└──────┬───────┘                  └──────┬───────┘
       │            heartbeat            │
       └────────────────┬────────────────┘
                        ▼
                ┌──────────────┐
                │  Instance    │
                │  security-   │
                │  compliance  │
                └──────────────┘
Instance registration and discovery
When mesh.enabled is true, the instance:
- Reads mesh.endpoint to determine its own reachable URL.
- Contacts known peers (listed in .env or discovered via DNS/mDNS).
- Exchanges a registration payload containing:
  - Instance name
  - Role and expertise profile
  - Gateway endpoint
  - Current load (active jobs / capacity)
  - Model availability
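The payload fields listed above can be sketched as a TypeScript type plus a small builder. This is only an illustration: `RegistrationPayload`, `InstanceState`, `buildRegistration`, and every field name are hypothetical and are not taken from the actual @randal/mesh types.

```typescript
// Hypothetical payload shape. Field names are assumptions based on the list
// above, not the actual @randal/mesh types.
interface RegistrationPayload {
  name: string;
  role?: string;
  expertise?: string;
  endpoint: string;
  load: number; // active jobs / capacity, clamped to [0, 1]
  models: string[]; // models this instance can serve
}

// Hypothetical snapshot of local state used to build the payload.
interface InstanceState {
  name: string;
  role?: string;
  expertise?: string;
  endpoint: string;
  activeJobs: number;
  capacity: number;
  models: string[];
}

function buildRegistration(state: InstanceState): RegistrationPayload {
  // Treat a non-positive capacity as fully loaded so the ratio stays finite.
  const ratio = state.capacity > 0 ? state.activeJobs / state.capacity : 1;
  return {
    name: state.name,
    role: state.role,
    expertise: state.expertise,
    endpoint: state.endpoint,
    load: Math.min(1, Math.max(0, ratio)),
    models: state.models.slice(),
  };
}
```

Clamping the load to the range 0 to 1 is a design choice of this sketch; it keeps the value directly usable as a load ratio during routing.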
Bootstrap methods
| Method | Config | Description |
|---|---|---|
| Static peers | MESH_PEERS=url1,url2 env var | Comma-separated list of peer gateway URLs |
| DNS SRV | MESH_DNS_SRV=_randal._tcp.local | DNS service discovery |
| mDNS | Automatic on local networks | Zero-config LAN discovery |
On startup, the instance sends POST /api/mesh/register to each known peer
and begins periodic heartbeats.
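A minimal sketch of the static-peer bootstrap described above, assuming the MESH_PEERS value is a plain comma-separated string. The helper names `parsePeers` and `registerUrls` are hypothetical, and the normalization rules (trimming, dropping trailing slashes, skipping the instance's own endpoint) are assumptions rather than documented behavior.

```typescript
// Parse a comma-separated MESH_PEERS value into a de-duplicated list of peer
// URLs. Empty entries and the instance's own endpoint are skipped.
function parsePeers(raw: string | undefined, selfEndpoint: string): string[] {
  const normalize = (url: string): string => url.trim().replace(/\/+$/, "");
  const self = normalize(selfEndpoint);
  const seen = new Set<string>();
  for (const entry of (raw ?? "").split(",")) {
    const url = normalize(entry);
    if (url !== "" && url !== self) {
      seen.add(url);
    }
  }
  return Array.from(seen);
}

// Build the registration URL for each peer; the path comes from the guide.
function registerUrls(peers: string[]): string[] {
  return peers.map((peer) => `${peer}/api/mesh/register`);
}
```

In a Node process, the first argument would typically be `process.env.MESH_PEERS`, and each resulting URL would receive the registration payload in a POST request.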
Agent profiles
Each instance declares an expertise profile that the mesh uses for intelligent task routing. The profile has two tiers:
mesh.role — broad domain (recommended)
One of 10 predefined domain slugs. Used for pre-filtering candidates and analytics categorization.
| Domain Slug | Description | Typical expertise areas |
|---|---|---|
| product-engineering | Full-stack development | React, TypeScript, APIs, databases, architecture |
| platform-infrastructure | DevOps and SRE | Docker, Kubernetes, CI/CD, Terraform, observability |
| security-compliance | Application and infra security | AppSec, OWASP, GDPR, SOC2, penetration testing |
| data-intelligence | Data engineering and analytics | ETL, ML, BigQuery, Spark, dashboards, BI |
| design-experience | UX/UI and accessibility | Figma, design systems, a11y, i18n, prototyping |
| content-communications | Technical writing and comms | Docs, blog, release notes, marketing copy |
| revenue-growth | Sales and business development | GTM, partnerships, pricing, conversion funnels |
| customer-operations | Support and success | Zendesk, onboarding, SLAs, churn, NPS |
| strategy-finance | Product management and finance | Roadmaps, OKRs, budgets, sprint planning |
| legal-governance | Legal and policy | Contracts, NDAs, IP, licensing, regulatory |
mesh:
  role: product-engineering
mesh.expertise — rich skill description (recommended)
A natural language description of the agent's detailed skills. This text is embedded (vectorized) at startup and used for semantic matching at routing time.
Three formats are supported:
Inline string:
mesh:
  expertise: >
    Expert in React, TypeScript, and frontend architecture.
    Deep knowledge of Next.js SSR, design systems, and
    responsive UI patterns.
File reference:
mesh:
  expertise:
    file: ./profiles/frontend-eng.md
Combined (file + additional context):
mesh:
  expertise:
    file: ./profiles/frontend-eng.md
    additional: "Also experienced with the internal billing system and Stripe integration"
The file format follows the same pattern as identity.knowledge — point to a
markdown file containing a detailed expertise description. At boot, the file
is read, concatenated with any additional text, and the full text is
embedded for semantic matching.
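The three formats can be resolved into a single text before embedding roughly as follows. This is a sketch under stated assumptions: the `ExpertiseConfig` type and `resolveExpertise` function are hypothetical, the file reader is injected so the example does not depend on a particular runtime, and the blank-line separator between the file content and the additional text is a guess rather than documented behavior.

```typescript
// Hypothetical config type mirroring the inline, file, and combined formats.
type ExpertiseConfig = string | { file: string; additional?: string };

// Resolve the configured expertise into the text that would be embedded.
// `readFile` is supplied by the caller; in Node it could wrap
// fs.readFileSync(path, "utf8").
function resolveExpertise(
  config: ExpertiseConfig | undefined,
  readFile: (path: string) => string,
): string | undefined {
  if (config === undefined) {
    return undefined; // no expertise profile configured
  }
  if (typeof config === "string") {
    return config.trim(); // inline string format
  }
  // File format, optionally followed by the additional text.
  const parts = [readFile(config.file), config.additional ?? ""]
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  return parts.join("\n\n"); // separator is an assumption of this sketch
}
```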
Routing algorithm
When a job arrives, the mesh router scores every available instance and picks the best one. The score is a weighted sum of four factors:
score = w_e × expertise_match
+ w_r × reliability_score
+ w_l × (1 - load_ratio)
+ w_m × model_match
Weights
| Factor | Key | Default | Description |
|---|---|---|---|
| Expertise match | expertise | 0.4 | Semantic similarity between task and agent expertise profile (2-tier fallback) |
| Reliability score | reliability | 0.3 | Historical success rate for this domain (from @randal/analytics) |
| Load availability | load | 0.2 | Inverse of current load ratio (0 = fully loaded, 1 = idle) |
| Model match | modelMatch | 0.1 | 1.0 if the instance has access to the requested model |
Configure weights in your config:
mesh:
  routingWeights:
    expertise: 0.4
    reliability: 0.3
    load: 0.2
    modelMatch: 0.1
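The weighted sum can be written out directly. The sketch below uses the default weights from the table; the `RoutingWeights` and `RoutingFactors` types and the `routingScore` function are hypothetical names chosen for illustration, and all four factors are assumed to lie between 0 and 1.

```typescript
// Weight keys match the routingWeights config; the type name is hypothetical.
interface RoutingWeights {
  expertise: number;
  reliability: number;
  load: number;
  modelMatch: number;
}

// Per-candidate inputs to the score. Names are assumptions of this sketch.
interface RoutingFactors {
  expertiseMatch: number;
  reliabilityScore: number;
  loadRatio: number; // 0 = idle, 1 = fully loaded
  modelMatch: number; // 1.0 if the requested model is available
}

const DEFAULT_WEIGHTS: RoutingWeights = {
  expertise: 0.4,
  reliability: 0.3,
  load: 0.2,
  modelMatch: 0.1,
};

function routingScore(
  factors: RoutingFactors,
  weights: RoutingWeights = DEFAULT_WEIGHTS,
): number {
  return (
    weights.expertise * factors.expertiseMatch +
    weights.reliability * factors.reliabilityScore +
    weights.load * (1 - factors.loadRatio) +
    weights.modelMatch * factors.modelMatch
  );
}
```

For example, with an expertise match of 0.8, a reliability score of 0.9, a load ratio of 0.25, and a model match of 1.0, the default weights give 0.32 + 0.27 + 0.15 + 0.10 = 0.84.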
2-tier expertise scoring
The expertise match factor uses a cascading fallback strategy:
- Semantic (Tier 1): If both the task prompt and the agent's expertise profile have been embedded (requires OPENROUTER_API_KEY), the router computes cosine similarity between the two vectors. This is the most accurate tier: it understands that "fix the login flow" matches an agent with "authentication and session management" expertise, even though the words differ.
- Role match (Tier 2): If embeddings are unavailable, the router performs an exact match on mesh.role against the auto-detected task domain. Score: 1.0 for exact match, 0.2 for no match.
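The cascade can be sketched as a cosine similarity with a role-match fallback. The function and field names below are hypothetical. Returning 0.5 when an agent has neither an embedding nor a role is an interpretation of the neutral score this guide mentions for instances without profile fields, not confirmed behavior.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction to compare
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical inputs to the expertise factor.
interface ExpertiseInput {
  taskEmbedding?: number[];
  agentEmbedding?: number[];
  taskDomain?: string;
  agentRole?: string;
}

function expertiseMatch(input: ExpertiseInput): number {
  // Tier 1: semantic match when both embeddings are available.
  if (input.taskEmbedding !== undefined && input.agentEmbedding !== undefined) {
    return cosineSimilarity(input.taskEmbedding, input.agentEmbedding);
  }
  // Assumption: an agent with no role gets the neutral score of 0.5.
  if (input.agentRole === undefined) {
    return 0.5;
  }
  // Tier 2: exact role match against the detected task domain.
  return input.agentRole === input.taskDomain ? 1.0 : 0.2;
}
```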
Routing decision flow
- Auto-detect domain: Classify the task's domain from keywords using the 10-domain taxonomy, or accept an explicit domain hint from the caller.
- Embed the task: If the embedding service is available, vectorize the task description (single API call, <500ms).
- Pre-filter candidates: If enough peers exist (>2), narrow to those whose role matches the detected domain. If no role matches, keep all candidates.
- Score all candidates: Compute the weighted sum for each remaining peer (including self).
- Route: If the top-scoring peer is self, execute locally. If remote, delegate via POST /api/mesh/delegate and stream results back.
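The pre-filter, scoring, and routing steps above might look like the following sketch. The `Candidate` type and both function names are hypothetical. Two assumptions are made: the ">2" threshold counts every candidate including self, and ties go to the first candidate in the list.

```typescript
// Hypothetical view of one mesh member as seen by the router.
interface Candidate {
  name: string;
  role?: string;
  healthy: boolean;
  isSelf: boolean;
}

// Pre-filter: with more than two candidates, keep only those whose role
// matches the detected domain; if none match, keep everyone.
function preFilter(candidates: Candidate[], domain: string | undefined): Candidate[] {
  if (domain === undefined || candidates.length <= 2) {
    return candidates;
  }
  const matching = candidates.filter((c) => c.role === domain);
  return matching.length > 0 ? matching : candidates;
}

// Score the remaining healthy candidates and decide where the job runs.
function pickTarget(
  candidates: Candidate[],
  domain: string | undefined,
  score: (candidate: Candidate) => number,
): { target: Candidate; action: "local" | "delegate" } | undefined {
  let best: Candidate | undefined;
  let bestScore = -Infinity;
  for (const candidate of preFilter(candidates, domain)) {
    if (!candidate.healthy) {
      continue; // unhealthy instances are skipped during delegation
    }
    const value = score(candidate);
    if (value > bestScore) {
      best = candidate;
      bestScore = value;
    }
  }
  if (best === undefined) {
    return undefined;
  }
  return { target: best, action: best.isSelf ? "local" : "delegate" };
}
```

The `score` callback stands in for the weighted sum described in the routing algorithm, so the selection logic can be exercised without embeddings or analytics data.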
Health monitoring
Each instance sends heartbeats to all known peers at a configurable interval (default: 30 seconds). A heartbeat contains:
{
  "name": "eng-agent",
  "role": "product-engineering",
  "expertise": "React, TypeScript, frontend architecture...",
  "endpoint": "http://eng-agent:7600",
  "load": 0.35,
  "activeJobs": 2,
  "uptime": 86400,
  "version": "0.1"
}
An instance is marked unhealthy if it misses 3 consecutive heartbeats (~90 seconds by default). Unhealthy instances receive a routing score of 0 and are skipped during delegation.
When an unhealthy instance resumes heartbeats, it is automatically re-admitted to the mesh.
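One way to track missed heartbeats is to compare the time since the last heartbeat against the interval multiplied by the miss limit. The `HealthTracker` class below is a hypothetical sketch of that idea using the defaults stated above (30 seconds, 3 misses); the actual mesh may count missed beats differently. Timestamps are passed in explicitly so the logic is easy to test.

```typescript
// Hypothetical tracker: a peer is unhealthy once the time since its last
// heartbeat reaches intervalMs * missedLimit (90 seconds with the defaults).
class HealthTracker {
  private readonly lastSeen = new Map<string, number>();
  private readonly intervalMs: number;
  private readonly missedLimit: number;

  constructor(intervalMs: number = 30_000, missedLimit: number = 3) {
    this.intervalMs = intervalMs;
    this.missedLimit = missedLimit;
  }

  // Record a heartbeat; a peer that resumes heartbeats becomes healthy again.
  recordHeartbeat(name: string, nowMs: number): void {
    this.lastSeen.set(name, nowMs);
  }

  isHealthy(name: string, nowMs: number): boolean {
    const seen = this.lastSeen.get(name);
    if (seen === undefined) {
      return false; // never heard from this peer
    }
    return nowMs - seen < this.intervalMs * this.missedLimit;
  }
}
```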
Cross-instance job delegation
Delegation follows these rules:
- Depth limit: Delegated jobs carry a depth counter. An instance will not re-delegate a job that has already been delegated runner.maxDelegationDepth times (default: 2).
- Sticky sessions: Once a job is delegated to a peer, follow-up messages in the same conversation are routed to the same peer unless it becomes unhealthy.
- Fallback: If the chosen peer rejects or times out, the originating instance falls back to local execution.
- Streaming: Delegated jobs stream events back via SSE so the end user sees real-time progress.
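The depth limit and sticky-session rules can be combined into a single planning step, sketched below. All names (`DelegatedJob`, `StickyRoutes`, `planDelegation`) and the `conversationId` field are hypothetical. Timeout handling and SSE streaming are left out; this sketch only decides whether a job runs locally or goes to a peer.

```typescript
// Hypothetical job envelope carrying the delegation depth counter.
interface DelegatedJob {
  id: string;
  conversationId: string;
  depth: number; // how many times this job has already been delegated
}

type Plan =
  | { action: "local" }
  | { action: "delegate"; peer: string; job: DelegatedJob };

// Remembers which peer handles each conversation (sticky sessions).
class StickyRoutes {
  private readonly routes = new Map<string, string>();

  pin(conversationId: string, peer: string): void {
    this.routes.set(conversationId, peer);
  }

  // Return the pinned peer, or drop the pin if that peer became unhealthy.
  resolve(conversationId: string, isHealthy: (peer: string) => boolean): string | undefined {
    const peer = this.routes.get(conversationId);
    if (peer === undefined) {
      return undefined;
    }
    if (!isHealthy(peer)) {
      this.routes.delete(conversationId);
      return undefined;
    }
    return peer;
  }
}

function planDelegation(
  job: DelegatedJob,
  bestPeer: string | undefined,
  routes: StickyRoutes,
  isHealthy: (peer: string) => boolean,
  maxDelegationDepth: number = 2,
): Plan {
  // Depth limit: never re-delegate a job that has reached the limit.
  if (job.depth >= maxDelegationDepth) {
    return { action: "local" };
  }
  // Sticky session first, then the router's current best peer.
  const peer = routes.resolve(job.conversationId, isHealthy) ?? bestPeer;
  if (peer === undefined || !isHealthy(peer)) {
    return { action: "local" }; // fall back to local execution
  }
  routes.pin(job.conversationId, peer);
  return { action: "delegate", peer, job: { ...job, depth: job.depth + 1 } };
}
```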
API endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/mesh/register | POST | Register this instance with a peer |
| /api/mesh/heartbeat | POST | Send health heartbeat |
| /api/mesh/delegate | POST | Delegate a job to this instance |
| /api/mesh/status | GET | Return mesh topology and health |
CLI commands
randal mesh status
Display the current mesh topology:
$ randal mesh status
Mesh Status
──────────────────────────────────────────────────────────────────────────────────
Instance      Role                     Expertise                  Load  Health
──────────────────────────────────────────────────────────────────────────────────
local (self)  platform-infrastructure  K8s, Terraform, CI/CD...   0.15  healthy
eng-agent     product-engineering      React, TypeScript, APIs..  0.42  healthy
sec-agent     security-compliance      AppSec, OWASP, audits...   0.00  healthy
docs-agent    content-communications   Tech writing, guides...    0.78  degraded
──────────────────────────────────────────────────────────────────────────────────
Total instances: 4 │ Healthy: 3 │ Unhealthy: 1
randal mesh route
Preview which instance would handle a given prompt:
$ randal mesh route "Fix the Docker build"
Routing Analysis
───────────────────────────────────────────────────────────
Domain detected: platform-infrastructure
Instance        Expert   Rel      Load     Model    Score
───────────────────────────────────────────────────────────
local (self)    0.920    0.270    0.170    0.100    0.94
eng-agent       0.310    0.210    0.200    0.100    0.51
sec-agent       0.050    0.150    0.120    0.100    0.37
───────────────────────────────────────────────────────────
→ Routing to: local (self)
Configuration examples
Minimal mesh instance
No profile fields required — the instance participates in the mesh but receives a neutral expertise score (0.5) during routing.
name: worker-1
runner:
  workdir: ./workspace
mesh:
  enabled: true
  endpoint: http://localhost:7600
Infrastructure agent with expertise profile
name: infra-agent
runner:
  workdir: ./workspace
  defaultModel: anthropic/claude-sonnet-4
mesh:
  enabled: true
  role: platform-infrastructure
  expertise: >
    Kubernetes cluster management, Terraform IaC, GitHub Actions CI/CD,
    Docker containerization, Prometheus/Grafana observability stack,
    AWS EKS and GCP GKE administration.
  endpoint: http://infra-agent:7600
  routingWeights:
    expertise: 0.5
    reliability: 0.25
    load: 0.15
    modelMatch: 0.1
File-based expertise profile
name: frontend-agent
runner:
  workdir: ./workspace
mesh:
  enabled: true
  role: product-engineering
  expertise:
    file: ./profiles/frontend-eng.md
    additional: "Also experienced with the internal billing system"
  endpoint: http://frontend-agent:7600
Three-node mesh (docker-compose)
Each configs/*.yaml file should have mesh.role and mesh.expertise set
for optimal routing. See the examples above for the config format.
# docker-compose.yml
services:
  frontend-agent:
    image: ghcr.io/drewbietron/randal:latest
    environment:
      MESH_PEERS: http://backend-agent:7600,http://infra-agent:7600
    volumes:
      - ./configs/frontend.yaml:/app/randal.config.yaml
  backend-agent:
    image: ghcr.io/drewbietron/randal:latest
    environment:
      MESH_PEERS: http://frontend-agent:7600,http://infra-agent:7600
    volumes:
      - ./configs/backend.yaml:/app/randal.config.yaml
  infra-agent:
    image: ghcr.io/drewbietron/randal:latest
    environment:
      MESH_PEERS: http://frontend-agent:7600,http://backend-agent:7600
    volumes:
      - ./configs/infra.yaml:/app/randal.config.yaml
Tips
- Start with 2 instances and add more as your workload grows.
- Use randal mesh route to verify routing before deploying.
- Write detailed expertise descriptions: the more specific, the better the semantic routing. Include technologies, frameworks, and domain knowledge.
- Use randal mesh route 'your task' to preview how the expertise matcher scores your peers.
- Combine with @randal/analytics for reliability-informed routing.
- Monitor the /api/mesh/status endpoint from your infrastructure tooling.
- Set MESH_PEERS via environment variables so the same config image works across environments.