Tutorial 01 — Building an Agent¶
Quick commands¶
HITL is declarative via spec.governance in an overlay — there is no --hitl CLI flag.
From this directory (docs/tutorials/01-building-an-agent):
# Live chat with tools
mas-ctl chat agent.yaml \
-o overlays/tools.yaml \
-o overlays/skills.yaml \
-o overlays/memory.yaml \
-q "What is the current price of Apple?"
# HITL — governance overlay sets hitl_on_tool + hitl_mode: interactive
mas-ctl chat agent.yaml \
-o overlays/tools.yaml \
-o overlays/governance-hitl.yaml \
-q "What is the current price of Apple?"
# Offline HITL demo (no live LLM) — mock-llm overlay
mas-ctl chat agent.yaml -i \
-o overlays/tools.yaml \
-o overlays/mock-llm.yaml \
-o overlays/governance-hitl.yaml
# Operator steering mid-run (interactive session)
mas-ctl chat agent.yaml -i -o overlays/tools.yaml --trace
# then type: /steer Use web search for stock prices, not fruit.
# TUI — same manifest/overlays
mas-ctl tui agent.yaml \
-o overlays/tools.yaml \
-o overlays/governance-hitl.yaml
Deployment: deployments/local-inproc.yaml (default runtime).
Packages:
mas-ctl(chat/validate),mas-runtime(contracts/plugins),mas-lab(plots/benchmarks) Time: ~30 min hands-on Goal: Go from an empty YAML file to a working QA agent with tools, a skill, and memory — then run it via CLI and API. Prerequisite: complete Tutorial 0 first. It coverstask install,$XDG_CONFIG_HOME/mas/config.yaml, infra bundles, data/trace cache paths, and API keys. Offline validation steps run without an LLM; live runs needdefault_infraor--infra-refplusOPENAI_API_KEYin.env.
Overview¶
The MAS Framework defines agents declaratively. An agent manifest is a YAML file
(kind: Agent) that describes what the agent does — not how it does it.
The runtime handles the rest: design pattern execution, tool invocation,
context management, observability.
In this tutorial you will:
- Write a minimal agent manifest (3 fields) and run it
- Add tools via an overlay — no manifest duplication
- Stack a skill overlay on top
- Stack memory + event tracing — see the agent think in real time
- Explore the CLI modes
- Serve the agent card via HTTP
- Inspect the trace: raw
events.jsonl→ trajectory / multilevel plots
Each step adds an --overlay to the CLI command — the base agent.yaml
never changes.
Every step builds on the previous one — you can stop at any point and have a working agent.
Step 1 — The minimal manifest¶
This is agent.yaml — the base manifest for the entire tutorial:
apiVersion: mas/v1
kind: Agent
metadata:
name: qa-agent
spec:
description: "Answer general knowledge questions."
context:
intent: "Answer general knowledge questions."
role: |
Answer questions clearly and concisely.
That's it. Everything else has sensible defaults:
| Field | Default | Meaning |
|---|---|---|
spec.models[0].model |
gpt-4 (overridden by flavour) |
Main LLM — works without flavour; flavour overrides |
spec.design_pattern.type |
react |
ReAct loop — up to 25 steps per turn |
spec.context_manager.type |
sliding-window |
Keep last 20 turns in context |
Key insight: The manifest is a specification, not a configuration file. It declares the agent's capabilities and intent. The runtime resolves how to execute it based on the active flavour (local, mock, cloud…).
How the runtime connects the manifest to an LLM¶
Four layers separate concerns:
agent.yaml flavour/local.yaml infra/ .env.local
┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────┐
│ WHAT │ │ HOW │ │ WHERE │ │ SECRET │
│ model │ │ protocol │ │ model serving │ │ API key │
│ context │ │ plugins │ │ tool providers│ │ value │
│ tools │ │ → infra ref │ │ OTel collector│ │ │
└──────────┘ └──────────────┘ └──────────────┘ └──────────┘
(agent identity) (deployment profile) (providers) (personal)
- Agent — what it does: model (
gpt-4o-mini), context, tools, plugins - Flavour — how it's deployed: protocol + plugins (local in-process vs remote agent-remote). Infra refs come from workspace/CLI, not the flavour
- Infrastructure — where services are: model proxy endpoint, tool providers, OTel collector
- Credentials — the secret value (API key) in
.env.local, never committed
The model name in the manifest (gpt-4o-mini) is a canonical name — it never
changes regardless of which cloud backend is used. Infrastructure maps it to the
wire name (gpt-4o-mini, vertex_ai/..., etc.) at startup.
Tools and skills are resolved by proximity: the runtime looks for them next to
the manifest first, then falls back to the standard library.
This tutorial ships two flavours — same agent, different postures:
# flavours/local.yaml — development
spec:
protocol:
mode: local # in-process — no network calls
telemetry:
backend: file # lightweight file-based traces
path: logs/
# flavours/prod.yaml — production-like
spec:
agent_comm:
protocol: local # in-process delegation (MAS topology)
mode: remote
tools:
remote_tools_enabled: true # tools served via tool servers
allowed: [web-search, calculator, memory-search]
telemetry:
backend: otlp # OpenTelemetry (vs file)
observability:
trace_content: false # don't log prompt content
The agent manifest is identical in both cases — only the deployment changes.
Before a live LLM run, set your API key (Tutorial 0 §4):
# At repo root or tutorial directory
echo 'OPENAI_API_KEY=sk-…' >> .env # gitignored — never commit
source .env
With default_infra: standard:production in $XDG_CONFIG_HOME/mas/config.yaml, you do not
need --infra-ref on every command. Use --flavour mock for offline runs.
Run it¶
First, validate the manifest against the schema:
cd docs/tutorials/01-building-an-agent
mas-ctl validate agent.yaml
Then run a single query:
mas-ctl chat agent.yaml -v -q "What is the capital of France?"
Or start an interactive REPL (type quit to exit):
mas-ctl chat agent.yaml -i
What happened¶
The runtime loaded the manifest, resolved the local flavour, loaded
infrastructure (model provider, tool provider), instantiated the default
ReAct design pattern, and ran a single-turn LLM call. No tools were declared
yet, so the ReAct loop short-circuited to a direct answer.
Step 2 — Adding tools via overlay¶
Instead of editing agent.yaml, we overlay new capabilities on top.
An overlay is a partial manifest that merges into the base — only the fields
you declare are changed; everything else stays. The --overlay flag applies it:
Validate the overlay first:
mas-ctl validate agent.yaml -o overlays/tools.yaml
Then run:
mas-ctl chat agent.yaml -v \
-o overlays/tools.yaml \
-q "If Tokyo has 14 million people and Paris has 2.1 million, how many times larger is Tokyo?"
overlays/tools.yaml adds tools and updates the role prompt (spec.context.role):
# overlays/tools.yaml
spec:
context:
role: |
Answer questions clearly and concisely.
When you need to perform calculations, use the calc tool.
When you need current information from the web, use the web-search tool.
tools:
- web-search
- calculator
Tools are referenced by semantic name — web-search, calculator. The
runtime binds them to actual Python modules or tool servers at startup; the
agent manifest never references implementation details.
The ReAct loop now has two tools. With -v you see the exchange:
[qa-agent] AGENT -> TOOL (calculator): {'expression': '14000000 / 2100000'}
[qa-agent] TOOL (calculator) -> AGENT: {'result': 6.666...}
Step 3 — Stacking a skill overlay¶
Skills are markdown documents that give the agent domain expertise.
They are referenced by semantic name — just like tools — and resolved
via proximity search (skills/ next to the manifest → mas.library.standard/skills/).
# overlays/skills.yaml
spec:
skills:
- answer-formatting
The runtime finds skills/answer-formatting/SKILL.md automatically.
No skills_dir needed.
Stack it on top of the tools overlay with a second --overlay:
mas-ctl validate agent.yaml \
-o overlays/tools.yaml -o overlays/skills.yaml
mas-ctl chat agent.yaml -v \
-o overlays/tools.yaml \
-o overlays/skills.yaml \
-q "Who is the current president of the United States?"
Overlays are applied left-to-right. Each one merges into the result of the previous merge. The base
agent.yamlis never modified. Fields declared in a later overlay win over earlier ones; list fields (tools, skills) are appended.
The agent now follows the formatting rules from
skills/answer-formatting/SKILL.md — you'll see the structured answer
format with confidence indicator.
Step 4 — Adding memory¶
Memory is a resource, not a tool. The agent accesses it through two paths: proactive context injection (RAG) and reactive tool calls. This tutorial covers the practical setup end-to-end.
Like tools and skills, memory is declared by name — the runtime resolves the right plugin bundle:
# overlays/memory.yaml
spec:
memory: semantic # ephemeral (in-memory) — no filesystem dependency
tools:
- memory-search # exposes memory_search, memory_get, memory_store
Two memory modes are available:
| Name | Storage | Use when |
|---|---|---|
semantic |
In-memory (:memory:) |
Tests, benchmarks, CI — each run starts clean |
semantic-persistent |
$XDG_DATA_HOME/mas/memory/{agent_id}.sqlite |
Interactive demos where memory must survive across sessions |
Both wire the same SemanticMemoryPlugin — hybrid search (vector + FTS),
proactive RAG injection into the user message, and the memory-search tool
provider (memory_search, memory_get, memory_store). The only
difference is whether the SQLite database lives on disk or vanishes when
the process exits.
The demo below uses ephemeral memory (semantic) — Run 1 shows HITL
correction within a session, Run 2 uses a seed overlay to pre-populate
memory at startup.
Stack all three overlays:
mas-ctl validate agent.yaml \
-o overlays/tools.yaml -o overlays/skills.yaml \
-o overlays/memory.yaml
mas-ctl chat agent.yaml -v \
-o overlays/tools.yaml \
-o overlays/skills.yaml \
-o overlays/memory.yaml \
-q "What is the current price of Apple?"
The -v flag prints each agent operation to stderr (LLM calls, tool
round-trips, latencies):
[qa-agent] AGENT -> TOOL (web_search): {'query': 'current price apples'}
[qa-agent] TOOL (web_search) -> AGENT: {'results': [...]} (1205ms)
Run 1 — HITL: Ambiguous question → Correction + Memory store¶
Memory is ephemeral by default (in-memory SQLite) — each process starts clean with no filesystem dependency.
mas-ctl chat agent.yaml -v \
-o overlays/tools.yaml \
-o overlays/skills.yaml \
-o overlays/memory.yaml \
-q "What is the current price of Apple?" \
-q "No, I meant Apple the fruit, not the stock. Remember this for next time."
Turn 1 — The ambiguous question (empty memory):
You: What is the current price of Apple?
The agent interprets "Apple" as the stock (AAPL) — common LLM behaviour with no prior context:
🤖 llm call → gpt-4o-mini (2 messages)
🔧 tool call → web_search({'query': 'current price of Apple stock'})
✓ web_search → {'summary': 'Stock Price - Apple: ... $258.83 ...'}
Agent: The current price of Apple Inc. (AAPL) stock is approximately $258.83.
Turn 2 — The user corrects and the agent stores to semantic memory:
You: No, I meant Apple the fruit, not the stock. Remember this for next time.
The agent calls memory_store to record the correction:
🔧 tool call → memory_store({
'content': "User prefers information about Apple the fruit ...",
'source': 'user_preference'
})
✓ memory_store → {'status': 'stored', 'doc_id': '3b7c6427...'}
Agent: Got it! I'll remember that you're asking about apple the fruit.
Since memory is ephemeral, the stored preference vanishes when the process exits. Run 2 shows how to pre-seed that knowledge into a fresh process.
Run 2 — Pre-seeded memory, single shot (new process)¶
A seed overlay pre-populates semantic memory at startup — same effect as if the agent had learned the preference in a previous session:
# overlays/memory-seed.yaml
spec:
memory_seed:
- source: user_preference
content: >
User always refers to the fruit when asking about 'Apple',
not the stock.
Stack it on top of the memory overlay:
mas-ctl chat agent.yaml -v \
-o overlays/tools.yaml \
-o overlays/skills.yaml \
-o overlays/memory.yaml \
-o overlays/memory-seed.yaml \
-q "What is the current price of Apple?"
The SemanticMemoryPlugin proactively injects the seeded preference into
the user message before the LLM sees it:
→ LLM sees: "[Relevant memories — apply these before answering]
- User always refers to the fruit when asking about 'Apple' ...
What is the current price of Apple?"
🔧 tool call → web_search({'query': 'current price of Apple fruit'})
Agent: The current price of apple fruit is approximately $1.56 per pound.
The proof: same question as Turn 1 of Run 1, opposite answer — in a brand new process with zero conversation history. The seeded memory resolved the ambiguity on the first turn. Without the seed overlay, the agent defaults to stock (as seen in Run 1).
For cross-session persistence (memory survives across processes), use
memory: semantic-persistent— stores at$XDG_DATA_HOME/mas/memory/{agent_id}.sqlite. See the memory modes table above.For deeper implementation details on runtime behavior and manifests, see the local docs listed in the "Going further" section below.
Step 5 — Running via CLI¶
You've already used -q for single queries and multi-turn scripted
conversations. Here are the other modes:
Interactive REPL¶
mas-ctl chat agent.yaml -i \
-o overlays/tools.yaml \
-o overlays/skills.yaml
Tool exchanges are printed automatically in interactive mode.
Single query with pipe¶
echo "What is the GDP of France?" | mas-ctl chat agent.yaml -v
With a different flavour¶
# Offline / no API key needed — cached LLM responses from artifacts/llm_cache.json
mas-ctl chat agent.yaml -i -o overlays/mock-llm.yaml
# Production-like: agent-remote transport, tool servers, OTel telemetry
mas-ctl chat agent.yaml -i # use workspace flavour / infra for prod-like runs
Flavour names are resolved by proximity — --flavour prod finds
flavours/prod.yaml next to the manifest. mock is a built-in
flavour that serves responses from a SHA-256-keyed JSON cache
(artifacts/llm_cache.json), so you can run the tutorial without
network access.
The agent manifest is the same in all cases — only the deployment posture changes.
CLI-only agent (no YAML)¶
You can also add tools, skills, memory, and plugins directly from the command line — useful for quick experiments without writing any YAML files:
mas-ctl chat agent.yaml -v -i \
--tool web-search \
--tool calculator \
--skill answer-formatting \
--memory semantic \
--plugin mas.runtime.plugins.event_log_plugin:EventLogPlugin
This is equivalent to stacking overlays/tools.yaml, overlays/skills.yaml,
and overlays/memory.yaml — plus an EventLogPlugin for emoji-prefixed
event tracing (▶, 🔧, ✓, ◼).
| Flag | Effect |
|---|---|
--tool NAME |
Add a tool by semantic name (repeatable) |
--skill NAME |
Add a skill by name (repeatable) |
--memory NAME |
Enable a memory backend (semantic, semantic-persistent) |
--plugin MOD:CLS |
Load a plugin by module path and class (repeatable) |
--set KEY=VALUE |
Set a spec.context key (repeatable) |
CLI flags are applied after all --overlay files, so they can override
or extend any overlay.
Overlay progression¶
One base manifest, overlays stacked with --overlay:
| Step | Command | What's new |
|---|---|---|
| 1 | agent.yaml alone |
Minimal: context.intent + context.role |
| 2 | + overlays/tools.yaml |
+ tools: web-search, calculator |
| 3 | + overlays/skills.yaml |
+ skills: answer-formatting |
| 4 | + overlays/memory.yaml |
+ memory resource (plugin) + context injection + memory-search tool |
The file agent-final.yaml shows what the runtime sees after merging all four
overlays — a single assembled view for reference. Validate it:
mas-ctl validate agent-final.yaml
Scenario YAML and automated checks¶
This tutorial ships demo/scenario.yaml — a structured walkthrough with
commands and expected exit codes. CI runs the offline steps automatically:
# From repository root
pytest tests/tutorials/test_scenario_commands.py -v -k tuto-01
Live mas-ctl chat steps need TUTORIAL_ONLINE=1 and a configured LLM (Tutorial 0).
Key takeaways¶
- Spec-first: define what the agent does, not how
- Defaults matter: ReAct, sliding-window, model — you get a capable agent with 3 fields
- Overlays compose:
--overlaystacks features without duplication - Tools by name: the agent says
web-search; the runtime binds it to a Python module or tool server — the agent manifest never sees implementation details - Skills are markdown: domain knowledge injected into the system prompt
- Memory is a resource, not a tool: two access paths (proactive RAG injection +
memory_searchtool) share the same plugin - Flavours separate deployment from identity:
local(in-process, file traces) vsprod(gRPC, tool-server, OTel) with zero manifest changes - CLI flags: same manifest —
-qfor scripted queries,-ifor interactive REPL
Step 7 — Inspecting the trace¶
Every agent run writes structured events to events.jsonl (configured by the
flavour's telemetry.path). This is raw observability data — let's turn it
into something useful.
7a — The raw event stream¶
After running the agent with the memory overlay (Step 4), look at the log:
cat logs/events.jsonl | head -5
{"event": "llm_call_start", "agent": "qa-agent", "timestamp": "2026-04-19T14:22:01Z", "trace_id": "abc123"}
{"event": "tool_call_start", "tool": "web-search", "args": {"query": "square root of 729"}, "agent": "qa-agent"}
{"event": "tool_call_end", "tool": "web-search", "duration_ms": 120}
{"event": "llm_call_end", "agent": "qa-agent", "tokens": {"in": 450, "out": 82}}
{"event": "session_end", "agent": "qa-agent", "total_tokens": 532}
Each line is a self-contained JSON event with the agent name, timestamp,
trace ID, and event-specific payload. The dp_state field (when present)
captures the design pattern's internal state — which ReAct iteration,
think/act/observe phase, and reasoning text.
7b — Trajectory diagrams¶
Knowledge-graph normalization (
mas-lab graph normalize, Neo4j push, structural validation) is not part of this open-source repository. OSS tutorials plot trajectories directly fromevents.jsonl.
Render the event stream as a visual trajectory:
# Mermaid (text — paste into any Mermaid renderer)
mas-lab plot trajectory logs/events.jsonl --format mermaid
# SVG file
mas-lab plot trajectory logs/events.jsonl --format svg -o output/trajectory.svg
# Interactive HTML
mas-lab plot trajectory logs/events.jsonl --format html -o output/trajectory.html
sequenceDiagram
actor User
participant qa-agent
User->>qa-agent: What is the square root of 729, and is the result prime?
qa-agent->>qa-agent: 🔧 calculator(sqrt(729)) → 27
qa-agent-->>User: The square root of 729 is 27. 27 is not prime (3 × 9).
This is what you'll automate in Tutorial 3 — experiments define pipelines that extract trajectories and generate plots for every run automatically.
Going further¶
These docs are not required to complete this tutorial — read them when you want to understand the underlying mechanics:
- Glossary — manifest vocabulary
- User guide — install and CLI workflows
- Contributing — extension points
Teaching notes (optional)¶
If you present this tutorial (~20 min), a useful slide arc:
- Motivation — declarative agents vs imperative frameworks
- Minimal manifest — three fields and sensible defaults
- Tool declaration —
kind: Toolspec vs Pythonimpl - Live ReAct demo
- Progressive enrichment — tools → skills → memory
- Agentic memory — Apple ambiguity example across two runs
- Contracts and control — kernel ingress/egress (see Mealy guide)
- CLI vs API — same manifest,
--serve-oasf - Teaser — multiple agents (Tutorial 2)
Next¶
→ Tutorial 2: Creating a Multi-Agent System — compose a 4-agent trip planner MAS with tool/skill catalogs, routing topology, and overlay-driven topology switching.