The fastest way to lose control of an agent program is to let agents accumulate as prompts. Someone writes a clever instruction, wires it to a tool, gets a good demo, and ships it. Six months later there are forty of these in production, each one a paragraph of text in a repository nobody audits, and no one can answer the basic governance questions: who owns this, what can it touch, what does it cost, and what happens when it is wrong.
The fix is a small one with a large effect. Every agent gets a manifest before it touches production. The manifest is the contract between the agent and the organisation, and writing it forces the decisions that otherwise surface as incidents.
What a skill manifest is
A skill manifest is a single declarative record that defines one agent as a governed object. It is not documentation written after the fact. It is the source of truth the runtime reads, the gateway enforces, and the security reviewer audits. If the manifest does not permit a tool, the agent cannot call it. If the manifest sets a cost cap, the gateway enforces it before the bill arrives.
The fields are deliberately boring, because boring is auditable:
- name and owner. A human or team accountable for this agent. Not “the AI project.” A name.
- workflow. The business process it serves, so the agent maps to something operations recognises.
- allowed users and roles. Who may invoke it, expressed in the same RBAC the rest of the estate uses.
- allowed tools. The exact tool surface, including the tools deliberately withheld. The absence of a write tool is a decision recorded in writing.
- input and output schema. The shape of what goes in and what comes out, so the agent’s contract is machine-checkable.
- knowledge sources. Which indexes and documents it may retrieve from, with permissions-aware filtering so the agent sees only what the calling user may see.
- approval requirements. What a human must sign off, and at which step.
- risk tier. A classification that drives how tightly everything else is set.
- cost cap. A ceiling on cost per completed task, enforced at the gateway.
- logging. Where the traces land, so every run is reconstructable.
- eval tests. The regression suite that gates any change to prompt, model, retriever, or tool.
- failure policy and rollback. What the agent does when it cannot proceed, and how a wrong action is undone.
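Because the fields are fixed, completeness can be checked mechanically. The following is a minimal sketch, not a reference implementation: the key spellings in `REQUIRED_FIELDS` and the helpers `missing_fields` and `is_complete` are hypothetical names chosen for illustration, and a real runtime may name or group the fields differently.

```python
# Assumed key spellings for the fields listed above. Three bullets are split
# into two keys each (name/owner, input/output schema, failure policy/rollback).
REQUIRED_FIELDS = frozenset({
    "name", "owner", "workflow", "allowed_roles", "allowed_tools",
    "input_schema", "output_schema", "knowledge", "approval",
    "risk_tier", "cost_cap", "logging", "evals",
    "failure_policy", "rollback",
})


def missing_fields(manifest: dict) -> list[str]:
    """Return the required keys that are absent or null, sorted for stable output."""
    return sorted(key for key in REQUIRED_FIELDS if manifest.get(key) is None)


def is_complete(manifest: dict) -> bool:
    """A manifest is complete only when no required key is missing."""
    return not missing_fields(manifest)


# A half-written draft: tier 0 is a valid value, so it does not count as missing.
draft = {"name": "faq-helper", "owner": "it-ops", "risk_tier": 0}
print(missing_fields(draft))
```

A check like this could run in CI or at deploy time, so an incomplete manifest blocks the release instead of surfacing later as an incident.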
A worked manifest
Concrete beats abstract. Here is a manifest for a finance agent that resolves invoice exceptions in accounts payable.
name: invoice-exception-resolver
owner: finance-ap@customer
workflow: accounts-payable / exception handling
risk_tier: 2 # financial, reversible, human-in-loop
allowed_roles:
  - ap-clerk
  - ap-manager
allowed_tools:
  - erp.read       # look up POs, vendors, history
  - erp.match      # propose a 3-way match
  - ticket.create  # raise an exception ticket
  # erp.post is intentionally NOT granted
input_schema: { invoice_id, vendor_id, amount, gl_hint? }
output_schema: { match_result, confidence, proposed_action, evidence[] }
knowledge:
  - vendor-master  # permissions-aware
  - po-history
  - ap-policy
approval: execute-with-approval (L3); ap-manager signs postings
cost_cap: 0.60 EUR per completed task
logging: trace-lake (OpenTelemetry GenAI conventions)
evals: 42 cases; promotion gate >= 0.95 match on held-out set
failure_policy: low confidence -> route to human queue, do not guess
rollback: void ticket, requeue to human, alert owner
Read what the manifest decides. The agent can read the ERP and propose a match, but erp.post is not in its tool list, so it has no way to move money. That is the blast-radius decision made in writing rather than discovered in an incident. The 0.60 EUR cost cap is the finance defence, set before the first invoice runs. The permissions-aware knowledge sources answer “can this agent read something the invoking user cannot” with “no.” The eval gate makes promotion arithmetic rather than opinion. The rollback names exactly how a bad ticket is undone.
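The tool restriction can be pictured as a deny-by-default dispatcher. This is a sketch under stated assumptions: `TOOL_REGISTRY`, `call_tool`, and `ToolNotPermitted` are hypothetical names, and the lambdas are stand-ins for real ERP and ticketing calls. Only the three tool names come from the manifest.

```python
class ToolNotPermitted(Exception):
    """Raised when an agent asks for a tool its manifest does not grant."""


# Stand-in implementations (hypothetical); real tools would call the ERP and ticketing systems.
TOOL_REGISTRY = {
    "erp.read": lambda invoice_id: {"invoice_id": invoice_id, "status": "unmatched"},
    "erp.match": lambda invoice_id: {"invoice_id": invoice_id, "match_result": "partial"},
    "ticket.create": lambda summary: {"ticket": summary},
    "erp.post": lambda invoice_id: {"posted": invoice_id},  # exists, but is not granted
}

# Copied from the manifest's allowed_tools list.
ALLOWED_TOOLS = frozenset({"erp.read", "erp.match", "ticket.create"})


def call_tool(name: str, allowed: frozenset = ALLOWED_TOOLS, **kwargs):
    """Deny by default: only tools named in the manifest are dispatched."""
    if name not in allowed:
        raise ToolNotPermitted(f"{name} is not in allowed_tools")
    return TOOL_REGISTRY[name](**kwargs)


print(call_tool("erp.read", invoice_id="INV-1001"))
try:
    call_tool("erp.post", invoice_id="INV-1001")
except ToolNotPermitted as err:
    print("refused:", err)
```

The posting tool exists in the registry here on purpose: the refusal comes from the manifest check, not from the tool being absent from the codebase.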
The risk tier sets everything else
The single most useful field is the risk tier, because it is the one that calibrates all the others. A tier is a short, honest classification of what happens when the agent is wrong: is the action reversible, does it touch regulated data, does it move money, does it reach a customer.
A practical tiering:
- Tier 0: read-only, internal. Wrong answers waste a person’s time and nothing else. Loose caps, light approval, fast iteration.
- Tier 1: reversible internal actions. Creating a ticket, drafting a record. Wrong actions are annoying and cheap to undo.
- Tier 2: reversible actions that touch money or customers, or read regulated data. Tighter caps, human approval, full replay required.
- Tier 3: irreversible or high-value actions, or anything the EU AI Act would classify as high-risk. The manifest here is strict by default: narrow tools, low caps, mandatory approval, tested rollback, and no path to full autonomy without sustained evidence.
The tier is what stops governance from being uniform and therefore useless. You do not want the same approval ceremony on an internal FAQ agent as on an agent that issues refunds. The tier lets the FAQ agent move fast and the refund agent move carefully, from the same framework.
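One way to make the tier drive the other fields is a lookup from tier to the controls it requires. The sketch below is illustrative: the `TierControls` type and `unmet_controls` helper are hypothetical. The Tier 2 and Tier 3 flags follow the tiering above; the Tier 0 and Tier 1 flags, and the choice to have Tier 3 inherit full replay from Tier 2, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierControls:
    human_approval: bool
    full_replay: bool
    tested_rollback: bool


# Tier 2 and 3 follow the text; Tier 0 and 1 values are assumptions for illustration.
TIER_CONTROLS = {
    0: TierControls(human_approval=False, full_replay=False, tested_rollback=False),
    1: TierControls(human_approval=False, full_replay=False, tested_rollback=False),
    2: TierControls(human_approval=True, full_replay=True, tested_rollback=False),
    3: TierControls(human_approval=True, full_replay=True, tested_rollback=True),
}

CONTROL_NAMES = ("human_approval", "full_replay", "tested_rollback")


def unmet_controls(tier: int, declared: set[str]) -> list[str]:
    """List the controls the tier requires that the manifest does not declare."""
    required = TIER_CONTROLS[tier]
    return [
        name for name in CONTROL_NAMES
        if getattr(required, name) and name not in declared
    ]


# A Tier 2 agent that declares approval but no replay is flagged for the gap.
print(unmet_controls(2, {"human_approval"}))
```

The same function serves the FAQ agent and the refund agent; only the tier passed in differs.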
Four manifests across four functions
The manifest shape holds across departments. The fields stay the same; the values change with the work.
- Invoice Exception Resolver (Finance / AP). Reads the ERP, proposes three-way matches, raises exception tickets. No posting tool. Tier 2, human approval on anything that becomes a posting.
- Support Escalation Agent (Support). Reads ticket history and the knowledge base, drafts a resolution or routes to the right queue, can close a ticket it resolved. Tier 1 for routing, Tier 2 for any action that touches a customer-facing system.
- Contract Review Assistant (Legal). Reads contracts and the clause library, flags deviations from playbook, drafts redlines. It does not send anything externally. Tier 1, because every output is reviewed by counsel before it leaves the building.
- IT Service Desk Agent (IT). Reads the asset and identity systems, resets known-safe settings, creates change tickets for anything privileged. Tier 2, with privileged actions held behind approval and a tested rollback.
Same twelve fields. Different owners, tools, knowledge sources, and tiers. That sameness is the point: a reviewer who has read one manifest can read all of them, and a new agent is a form to fill in rather than a fresh argument to have.
Rule of thumb: if an agent in production cannot be described by a complete manifest, it is not governed; it is just deployed.
The manifest is what the gateway enforces
A manifest that is only a document is a wish. The reason it works is that the runtime and the AI gateway read it and enforce it. The allowed-tools list is the actual tool surface the agent is given, not a description of intent. The cost cap is a budget the gateway checks before the call. The knowledge sources are the only indexes the retrieval step will touch, filtered by the caller’s permissions. The eval suite is run automatically before any change to the agent is allowed to ship.
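The cost cap check can be sketched as a per-task budget that is consulted before each model or tool call. The `TaskBudget` class and its method names are hypothetical, and how a gateway estimates the cost of a call in advance is left out; the sketch assumes an estimate is available.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a call would push a task past its manifest cost cap."""


class TaskBudget:
    """Tracks spend for one task against the cap from the manifest."""

    def __init__(self, cap: str):
        # Pass amounts as strings so Decimal stores them exactly.
        self.cap = Decimal(cap)
        self.spent = Decimal("0")

    def authorize(self, estimated_cost: str) -> None:
        """Check the cap before the call; raise instead of overspending."""
        if self.spent + Decimal(estimated_cost) > self.cap:
            raise BudgetExceeded(
                f"cap {self.cap} would be exceeded (spent {self.spent})"
            )

    def record(self, actual_cost: str) -> None:
        """Add the real cost once the call has completed."""
        self.spent += Decimal(actual_cost)


budget = TaskBudget("0.60")  # cap taken from the worked manifest, in EUR
budget.authorize("0.25")
budget.record("0.25")
try:
    budget.authorize("0.40")  # 0.25 + 0.40 would exceed 0.60
except BudgetExceeded as err:
    print("blocked:", err)
```

Checking before the call rather than after is the point: the refusal happens at the gateway, not on the invoice.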
This is what makes the manifest the unit of change control. When someone wants to give the invoice agent the erp.post tool, that is not a code edit buried in a pull request. It is a change to the manifest, which raises the risk tier, which triggers the approval the higher tier requires, which reruns the eval suite, which lands in the audit log. The manifest turns “we gave the agent a new capability” from something that happens quietly into something that happens on the record.
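That chain of consequences can be derived from a diff of two manifest versions. The sketch below is one possible shape, with a hypothetical `required_steps` helper and hypothetical step wording; the assumption that granting erp.post moves the agent to Tier 3 is made only for the example.

```python
def required_steps(old: dict, new: dict) -> list[str]:
    """Return the change-control steps a manifest edit triggers, in order."""
    if old == new:
        return []  # nothing changed, nothing to review
    steps = []
    added = sorted(set(new["allowed_tools"]) - set(old["allowed_tools"]))
    if added:
        steps.append("review new tools: " + ", ".join(added))
    if new["risk_tier"] > old["risk_tier"]:
        steps.append(f"approval for tier {new['risk_tier']}")
    # Any change reruns the regression suite and is written to the audit log.
    steps.append("rerun eval suite")
    steps.append("write audit log entry")
    return steps


current = {
    "allowed_tools": ["erp.read", "erp.match", "ticket.create"],
    "risk_tier": 2,
}
# Assumed outcome for illustration: adding erp.post moves the agent to tier 3.
proposed = {
    "allowed_tools": ["erp.read", "erp.match", "ticket.create", "erp.post"],
    "risk_tier": 3,
}
for step in required_steps(current, proposed):
    print(step)
```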
Where the manifest pays off
Three moments, specifically.
At the security review, the manifest is the artefact that answers the pointed questions in one line each. What can this agent touch? The allowed-tools list. Can it read data the user cannot? No, the knowledge sources are permissions-aware. What does it cost? The cap. The review reads the manifests instead of interrogating the team.
At the promotion decision, the manifest holds the eval gate and the current rung, so moving an agent up the autonomy ladder is a check against named thresholds rather than a vote. The evidence and the bar live in the same place.
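The eval gate from the worked manifest (42 cases, promotion at a match rate of at least 0.95) reduces to a short calculation. The `passes_gate` helper is a hypothetical name, and treating each eval case as a simple pass or fail is an assumption.

```python
def passes_gate(results: list[bool], threshold: float = 0.95) -> bool:
    """True when the share of passing eval cases meets the promotion threshold."""
    if not results:
        return False  # no evidence, no promotion
    return sum(results) / len(results) >= threshold


# With 42 cases, 40 passes clears a 0.95 gate (40/42 is about 0.952) ...
print(passes_gate([True] * 40 + [False] * 2))
# ... and 39 passes does not (39/42 is about 0.929).
print(passes_gate([True] * 39 + [False] * 3))
```

On a 42-case suite the gate therefore tolerates at most two failures, which is the kind of concrete bar a promotion review can check without debate.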
At the incident review, the manifest plus the trace answers “what was this agent allowed to do, and what did it actually do?” without anyone reconstructing intent from memory. The gap between the two is the finding.
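The comparison between what was allowed and what happened can be computed directly from the trace. This sketch assumes a hypothetical trace shape, a list of events each carrying a `tool` key; real trace schemas will differ, and `compare_to_trace` is an illustrative name.

```python
def compare_to_trace(allowed_tools: list[str], trace: list[dict]) -> dict:
    """Split observed tool calls into permitted and unpermitted, and note unused grants."""
    allowed = set(allowed_tools)
    called = [event["tool"] for event in trace]
    return {
        # Calls the manifest did not permit: these are the incident findings.
        "unpermitted": sorted({tool for tool in called if tool not in allowed}),
        # Grants never exercised in this run: candidates for removal.
        "unused": sorted(allowed - set(called)),
    }


trace = [
    {"step": 1, "tool": "erp.read"},
    {"step": 2, "tool": "erp.match"},
    {"step": 3, "tool": "erp.post"},  # should never appear for this agent
]
print(compare_to_trace(["erp.read", "erp.match", "ticket.create"], trace))
```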
An agent program that runs on manifests is slower to start one agent and far faster to run forty. The discipline is front-loaded into a form. Fill it in before production, enforce it at the gateway, change it on the record, and the agents stop being a pile of prompts and start being a governed fleet.