Building Software with Hermes Agent

Introduction

Hermes Agent v0.13.0 is an open-source AI agent framework by Nous Research. It runs a tool-calling loop against any OpenAI-compatible API, with a plugin system for memory, model providers, messaging gateways, and multi-agent orchestration. The agent has 75 built-in tool implementations (file system, terminal, web search, MCP clients), a SQLite session store with FTS5, a cron scheduler, a kanban plugin for multi-agent dispatch, and 26 configured profiles for different operational roles. The core loop is 12,000 lines in a single Python file (run_agent.py).

The Core Loop

The agent loop is synchronous and straightforward:

while (api_call_count < self.max_iterations and self.iteration_budget.remaining > 0) \
        or self._budget_grace_call:
    if self._interrupt_requested: break
    response = client.chat.completions.create(model=model, messages=messages, tools=tool_schemas)
    if response.tool_calls:
        for tool_call in response.tool_calls:
            result = handle_function_call(tool_call.name, tool_call.args, task_id)
            messages.append(tool_result_message(result))
        api_call_count += 1
    else:
        return response.content

Messages follow the OpenAI format (system/user/assistant/tool). Each tool call result is appended to the message list, and the model decides whether to take another action or respond. The default max is 120 tool iterations per conversation, configurable per session.

This design means the agent can chain arbitrarily long sequences of tool calls — reading files, running commands, searching the web, writing code — within a single conversation turn. There is no separate "planning" phase. The LLM plans implicitly by choosing which tool to call next, and the loop terminates when it judges the work is complete.

Tool System and MCP

Tools are registered at import time via a central registry (tools/registry.py). Each tool file calls registry.register() on load, and model_tools.py discovers them by importing the entire tools/ directory. The resulting tool schemas are sent to the LLM alongside the conversation history.

The 75 built-in tools include:

Category	Tools
File I/O	`read_file`, `write_file`, `patch`, `search_files`
Shell	`terminal`, `process` (background lifecycle)
Web	`web_search`, `web_extract`, `browser_*` (8 tools)
Memory	`memory` (persistent store), `session_search` (FTS5)
Skills	`skill_view`, `skill_manage`, `skills_list`
Agents	`delegate_task` (subagent spawning)
Cron	`cronjob` (create/list/update/remove/run)
Messaging	`send_message`, `clarify`
Vision	`vision_analyze`, `image_generate`
MCP	Native MCP client (stdio + HTTP servers)

The native MCP client is significant. MCP servers are defined in config.yaml and their tools are automatically merged into the agent's tool list at startup. In practice this means any service that exposes an MCP interface becomes a tool the agent can call: querying DapStack tickets, reading kanban boards, searching timeline data, inspecting marimo notebooks. The agent doesn't need a custom integration for each backend — it just needs an MCP server definition.

Current MCP-backed integrations on the live instance:

DapStack (kanban/project management) — 70+ tools for ticket CRUD, sprint management, invoicing, time tracking, CRM
Timeline (quantified-self data) — sensor queries, full-text search across SMS/notifications/screenshots/OCR
Marimo (reactive notebooks) — cell inspection, linting, database queries

Profiles: One Agent, Many Roles

A profile is a named configuration with its own config.yaml, skills/, toolsets, and home directory. The agent resolves paths via get_hermes_home() which returns ~/.hermes/profiles/<profile>/ when a profile is active.

The live instance has 26 profiles:

deepseek   developer   gitvet   human   ingest   landgrid
local      mcp         mdt      nas     nullvec   nullvec-gating
obsidian   offshore-analyst     offshore-orchestrator    offshore-researcher
orchestrator   preperc   ptiles    qwen-local    reviewer
simple_timemachine_viewer   snac    steele-red   timeline   whimper

Each profile pins a model, a toolset, and a skill set. The steele-red profile uses the main model with web and file tools. The offshore-analyst profile uses a cheaper model and only terminal/file tools for lightweight data processing. The developer profile enables delegation for multi-agent coding sessions.

Profiles share the same codebase but diverge on cost, latency, and capability. The qwen-local profile runs against a local llama.cpp server on port 8086 for zero-cost inference on simple tasks.

Scheduled Work Without the Agent

Not everything needs LLM reasoning. The cron system supports a no_agent=True mode where the scheduler runs a script directly and delivers its output verbatim. Zero API calls, zero latency. This is used for watchdogs:

Disk usage alerts (df -h / | tail -1 | awk '{print $5}' — threshold check)
GPU temperature polling (nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader)
Process health checks (is the gateway running? is the timeline collector alive?)

The design rule: if the output shape is fixed and the decision is mechanical (over/under threshold), use no_agent=True. If there's any synthesis, summarization, or conditional logic, use the full agent loop.

For jobs that do need an LLM, the cron system supports chaining: job A's output is injected as context into job B's prompt. The arxiv-digest cron (8 AM daily) runs a web research agent and pipes results into a summary job that emails a curated reading list.

Kanban Orchestration

The most operationally complex pattern is the dispatcher: a recurring cron job that polls DapStack MCP for tickets in the "Ready" column and routes them to the appropriate profile for execution.

The pipeline:

Dispatcher cron fires every 1 minute
Queries all 10 projects (steele.red, timeline, ptiles, preperc, snac, whimper, nullvec, simple_timemachine_viewer, gitvet, gitkracked) for Ready-status tickets
If found, reads the ticket content, identifies the target profile from a project-to-profile mapping (dapstack_profiles.yaml)
Creates a cron job with that profile, the ticket description as the prompt, and deliver: "origin"
The spawned agent runs autonomously and posts results back to the dispatcher's channel

The routing map:

steele.red       -> steele-red profile
timeline          -> timeline profile
ptiles            -> ptiles profile
preperc           -> preperc profile

The dispatcher is gated on an auto-dispatch label. Tickets without the label stay in Ready but are never picked up by automation — they wait for a human to pull them.

What Breaks

The agent cannot verify its own actions. A subagent that runs write_file and reports "file written successfully" may have written to the wrong path, or the write may have failed silently. The parent agent can re-read the file to verify, but this is not automatic. Every subagent result is a self-report, not a verified fact. The mitigation is to require subagents to return verifiable handles (file paths, HTTP status codes, URLs) and have the parent check them.

Long context costs dominate. With 120 tool iterations and large tool schemas (75 tools, each with parameter schemas), the message list grows fast. A single debugging session can consume 500K input tokens. The MCP tools alone add ~50K of schema tokens up front because each MCP server's tool list is regenerated as a full schema on every connect. The current model (deepseek-v4-flash, 1M context) handles this, but cost and latency scale linearly with context size.

The profile system has no resource isolation. All profiles share the same filesystem, same user permissions, and same Docker daemon. An errant command in one profile can affect another. Landlock-based sandboxing exists as a reference implementation but is not wired into the profile system by default.

Kanban tickets need careful prompt design. The dispatch prompt must be fully self-contained because the spawned agent has no access to the original ticket thread. If the ticket says "fix the timeout issue" without specifying which service or what timeout threshold, the agent stalls. The solution is a dispatch prompt template that injects ticket description, project name, relevant skill names, and delivery target explicitly.

What Works

Persistent memory reduces repetition. The memory tool writes to two stores: "user" (profile — name, preferences, communication style) and "memory" (factual notes — environment details, project conventions, tool quirks). Both are injected at the start of every session. This eliminates the need to re-state project conventions, tool paths, or personal preferences across sessions. After six months of use, the memory stores are at ~5K chars each — compact enough to fit in the system prompt without dominating it.

Skills encode procedure, not facts. A skill is a markdown file with YAML frontmatter that describes a workflow: trigger conditions, numbered steps, exact commands, pitfalls, verification steps. Skills are loaded on demand and placed in the system prompt. This separates procedural knowledge (skills, loaded when needed) from factual knowledge (memory, always loaded). The distinction matters — skills for infrequent but standardized tasks (deploy to CloudFront, run a code review, back up configs) prevent the agent from hallucinating the steps, without bloating every session with instructions for tasks you do once a month.

MCP reduces integration surface. Adding a new backend (Timeline, DapStack, Marimo) required zero changes to the agent code — just an MCP server definition in config.yaml. The agent discovers the tools at startup and uses them through the same function-calling interface as built-in tools. This is the right abstraction for tool integration.

The dispatcher pattern eliminates sync overhead. The multi-agent kanban workflow means a human can file a ticket, tag it with auto-dispatch, and get results hours or days later without any real-time coordination. The agent works asynchronously across cron cycles. This is the pattern that makes the system more than a chatbot — it turns the agent into a persistent workforce that doesn't require the human to be present.

Configuration

The config.yaml that powers this instance:

model:
  default: deepseek-v4-flash
  provider: deepseek
  base_url: https://api.deepseek.com/v1
providers:
  deepseek:
    base_url: https://api.deepseek.com/v1
    models:
      deepseek-v4-flash:
        context_length: 1000000
agent:
  max_turns: 120
  gateway_timeout: 1800
  tool_use_enforcement: auto
toolsets:
  - hermes-cli

One model (deepseek-v4-flash, 1M context), one primary provider, no fallback. The 1M context window is the key enabler — without it, long tool-calling chains would hit context limits before completing complex tasks.