v0.2 · deepagents runtime · playbooks · learnings · evals

The open framework for production-ready agents and agentic workflows.

Build, ship, and observe AI work in your own infra. Git-backed context. Typed plugins. MCP-native. Full observability — without the hosted-SaaS lock-in.

deepagents runtime + subagents · Playbooks, learnings, evals, budgets · MCP-native + 12 connectors · Self-host · Apache 2.0

AI breaks when it leaves the demo.

Most teams get their first agentic workflow working by stitching prompts into app code, bots, cron jobs, and internal tools. Then things drift.

Vocion gives you one runtime for AI work that has to hold up in production.

Five resources to author AI work.

One runtime to operate it.

Vocion stays small on purpose. These five resources are the authoring surface. Everything else is runtime.

Agents are optional. The runtime works just as well for deterministic, reviewed workflows.

Two compositional primitives.

Authored once. Mounted into every relevant agent.

v0.2 added two primitives that compose on top of the five resources — for the procedural knowledge and continuous improvement that agentic systems need to stay accurate.

v0.2
Playbook

Markdown + YAML the agent reads on demand. Procedural guides for "how we draft a proposal" or "how we triage a meeting." Resources (REFERENCE.html, COMPONENTS.md) ride along. Per-agent playbookTags decide what mounts where. Lazy-loaded — no bloat in the per-turn prompt.
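A playbook might look like the sketch below: YAML frontmatter plus a procedural guide in one SKILL.md. Only playbookTags and the sibling-resource convention come from the text above; the other field names and the steps are illustrative.

```markdown
---
name: ece-proposal
description: How we draft a proposal from a discovery call.
playbookTags: [proposal_drafting]
resources:           # sibling files mounted alongside the guide
  - REFERENCE.html
  - COMPONENTS.md
---

# Drafting a proposal

1. Pull the latest discovery notes for the deal.
2. Follow the section order in REFERENCE.html.
3. Flag any pricing outside the approved bands for human review.
```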

v0.2
Learning

Whitelisted rule buckets ("global", "meeting_triage", "proposal_drafting"…). Rules are added at runtime by the self-improver subagent after the user explicitly approves a candidate. Trigram dedup at 0.72 keeps the store clean. The agent reads its applicable rules as /learnings/<step>.md on every turn.
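The trigram dedup can be pictured as Jaccard similarity over character trigrams, rejecting any candidate within 0.72 of a stored rule. A minimal sketch, assuming this is the comparison being made — function names are illustrative, not Vocion's API:

```python
def trigrams(text: str) -> set[str]:
    """Character trigrams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def is_duplicate(candidate: str, stored: list[str], threshold: float = 0.72) -> bool:
    """Reject a candidate rule that is too close to any stored rule."""
    return any(similarity(candidate, rule) >= threshold for rule in stored)
```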

One runtime, every interface.

Author once. Trigger and review from wherever your team already works. Speak MCP, and every Claude-side client can call your agents as tools.

Run from
web · MCP server · Slack · Teams · CLI · your own apps · scheduled jobs · API triggers
What stays the same underneath
  • context version
  • workflow logic
  • approvals
  • audit trail
  • trace spans
  • output history

No more separate prompt stacks for each surface.

Connect what you already run.

Built for real business systems, not toy demos. Twelve first-class connectors today; typed source plugins when you need more control.

Gmail · HubSpot · Zoom · Slack · Postgres · Stripe · Zendesk · Google Drive · Notion · Salesforce · Custom REST · Webhooks

Starter connectors and source patterns first. Typed source plugins when you need more control.

The operating loop that makes agentic systems usable.

Most AI stacks stop at generation. Vocion ships the five primitives every production agentic system needs — human review, observability, evals, self-improvement, and compute budgets.

Human-in-the-loop

The request_human_review tool pauses a run for approval. Comments on Drive decks and Slack reactions flow into the same queue.
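One way the pause/approve flow could be modeled: a run flips to a paused state when request_human_review fires, the draft lands in a shared queue, and resolving the review updates the run. Everything except the request_human_review name is an illustrative assumption:

```python
from dataclasses import dataclass, field
from enum import Enum

class RunState(Enum):
    RUNNING = "running"
    PAUSED = "paused"      # waiting in the review queue
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Run:
    id: str
    state: RunState = RunState.RUNNING

@dataclass
class ReviewQueue:
    pending: dict[str, str] = field(default_factory=dict)

    def request_human_review(self, run: Run, draft: str) -> None:
        """Pause the run and park its draft for a human."""
        run.state = RunState.PAUSED
        self.pending[run.id] = draft

    def resolve(self, run: Run, approved: bool) -> None:
        """Approve or reject; either way the run leaves the queue."""
        run.state = RunState.APPROVED if approved else RunState.REJECTED
        self.pending.pop(run.id, None)
```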

Full observability

Every LLM call, tool span, and subagent dispatch lands in Langfuse — joined to the context SHA that produced it.

Eval-driven development

npm run eval:run scores datasets via LLM judge. Stamp every run with its context SHA. Pass-rate < 0.8 fails CI.
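The CI gate reduces to a threshold check over judge scores. A minimal sketch, assuming scores in [0, 1] and a hypothetical 0.5 per-example passing bar (the LLM-judge scoring itself is not shown; only the 0.8 pass-rate gate comes from the text):

```python
def pass_rate(scores: list[float], passing: float = 0.5) -> float:
    """Fraction of judged examples scoring at or above `passing`."""
    return sum(s >= passing for s in scores) / len(scores)

def ci_gate(scores: list[float], threshold: float = 0.8) -> int:
    """Exit code for CI: 0 when the run passes, 1 when pass rate < threshold."""
    return 0 if pass_rate(scores) >= threshold else 1
```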

Self-improving

The self-improver subagent watches feedback, proposes rules, and (after your explicit approval) commits them as learning rows the agent reads on every relevant turn.

Compute budgets

Token and dollar caps per agent, per period. Hard cap refuses new runs. Soft cap warns. Cache reads billed at 10% per the model card.
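The cap logic described above can be sketched as an admission check: refuse past the hard cap, warn past a soft fraction of it, and bill cache reads at 10% of the normal token price. Field names and the 0.8 soft ratio are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Budget:
    dollar_cap: float        # hard cap for the period
    soft_ratio: float = 0.8  # warn above this fraction of the cap (assumed)
    spent: float = 0.0

def charge(input_tokens: int, cached_tokens: int, price_per_token: float) -> float:
    """Cost of a call; cache reads billed at 10% per the model card."""
    return (input_tokens * price_per_token) + (cached_tokens * price_per_token * 0.10)

def admit(budget: Budget, estimated_cost: float) -> str:
    """'refuse' at the hard cap, 'warn' past the soft cap, else 'ok'."""
    projected = budget.spent + estimated_cost
    if projected > budget.dollar_cap:
        return "refuse"
    if projected > budget.dollar_cap * budget.soft_ratio:
        return "warn"
    return "ok"
```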

Review changes in PRs, not screenshots.

Every resource lives in git as YAML and markdown.

  1. edit operation.yaml, SKILL.md, or prompt.md
  2. commit the change
  3. review it in a PR
  4. apply it to the runtime
  5. run and review with a stamped context version
context/<org>/
  agents/
    sales-assistant/
      agent.yaml          # slug, prompt, subagents, suggestions
      system-prompt.md
  operations/             # v0.2: typed LLM calls (was skills/)
    draft_followup/
      operation.yaml
      prompt.md
      evals.yaml
  playbooks/              # v0.2: markdown the agent reads on demand
    ece-proposal/
      SKILL.md            # YAML frontmatter + procedural guide
      REFERENCE.html      # sibling resources ride along
  learnings/              # v0.2: whitelisted rule-step buckets
    global.yaml
    meeting_triage.yaml
  evals/                  # v0.2: agent eval datasets
    sales-assistant-baseline.yaml
  workflows/
    discovery_followup/
      workflow.yaml
  objects/
    deal/
      type.yaml

Same folder pattern across every resource: structured definition · LLM-facing content · evals · notes. Easy to author, easy to diff, easy to test.
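For the draft_followup operation in the tree above, the structured definition might look like this. Every field name here is a hypothetical illustration of the "structured definition · LLM-facing content · evals" pattern, not Vocion's actual schema:

```yaml
# context/<org>/operations/draft_followup/operation.yaml
slug: draft_followup
prompt: prompt.md        # LLM-facing content lives in the sibling file
input:
  transcript: string
  deal_id: string
output:
  followup_email: string
evals: evals.yaml        # dataset scored by npm run eval:run
```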

Author → Apply → Run → Review → Audit.

One loop, every interface.

  1. Author

     Edit an operation.yaml, workflow.yaml, SKILL.md playbook, or prompt.md in your editor.

  2. Apply

     Reconcile authored context into the runtime and stamp a new context version.

  3. Run

     Trigger from web, Slack, Teams, CLI, your app, or a scheduled workflow.

  4. Review

     Drafts and paused workflows land in one queue. Approve, reject, revise, resume.

  5. Log

     Trace any output back to the exact context version, inputs, retrieval hits, and runtime path that produced it.

Start with a real business workflow, not a blank canvas.

Vocion ships best when you begin with something your team already does every week.

Start with prompts. Graduate to code when the logic gets real.

Start fast with YAML and markdown. Move to typed plugins when the workflow needs stronger contracts, richer logic, or external actions.

This is not a throwaway prototype path. It is the intended upgrade path.
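The "stronger contracts" end of that path could look like a structural interface that every source plugin satisfies. A minimal sketch; the SourcePlugin name, its methods, and the Postgres example are all illustrative assumptions, not Vocion's plugin API:

```python
from typing import Any, Iterable, Protocol

class SourcePlugin(Protocol):
    """Hypothetical shape of a typed source plugin."""
    slug: str

    def fetch(self, query: dict[str, Any]) -> Iterable[dict[str, Any]]:
        """Return records for the agent's retrieval step."""
        ...

class PostgresDeals:
    """Toy plugin backed by an in-memory list standing in for a real connection."""
    slug = "postgres-deals"

    def __init__(self, rows: list[dict[str, Any]]):
        self.rows = rows

    def fetch(self, query: dict[str, Any]) -> Iterable[dict[str, Any]]:
        stage = query.get("stage")
        return [r for r in self.rows if stage is None or r["stage"] == stage]
```

Because Protocol uses structural subtyping, PostgresDeals needs no inheritance; the type checker enforces the contract at the boundary.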

Built for engineers who want leverage and control.

Vocion is for teams that care about:

reproducibility · reviewability · typed boundaries · runtime consistency · MCP-native · operational visibility · self-hosted deployment

Not just "agents."

Open source by default.

Vocion is Apache 2.0 and designed to run on your infrastructure.

Managed services can sit on top later if you want them. The framework does not depend on them.

Need help shipping it in a real business?

MetaCTO uses Vocion to design and deploy production AI workflows for revenue teams, support orgs, operating teams, and internal platforms. If you want help implementing, hosting, or customizing it, work with the team behind the framework.

Framework first. Services if you want them.

Build agents that survive production.

Subagents, playbooks, learnings, evals, budgets, HITL — out of the box. Your code, your infra, your data.