CapableDeputy - Skeptical Engineering

A structurally secure runtime for personal AI agents. A faithful implementation of recognized security models—a reference monitor, an information-flow lattice, and the object-capability model—with the LLM treated as an untrusted component outside the trusted computing base.


Overview

A capable deputy, never a confused one.

Most defenses for LLM agents are perimeter classifiers: they try to detect a bad instruction ("does this look like an attack?"), and a classifier can always be fooled. CapableDeputy takes the opposite approach. It enforces recognized information-security models at the architecture level so that the bad outcome has no reachable path, regardless of why the model misbehaves: a crafted injection, a hallucination, a buggy or malicious tool, or the user's own mistake. A classifier's verdict is a guess that can fail; the capability check is deterministic and does not depend on recognizing the attack.

Every action the agent takes flows through one deterministic capability and information-flow chokepoint, escalates to programmatic execution when the stakes warrant, and surfaces every cross-compartment data flow through a human-auditable approval gate.
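As an illustration only, the single chokepoint described above can be sketched as one function that every proposed action must pass through. The names `Monitor`, `Action`, and `Decision`, the tool strings, and the compartment labels are all hypothetical and are not taken from the CapableDeputy codebase. The sketch also leaves out the escalation to programmatic execution.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class Action:
    tool: str                # hypothetical tool name, e.g. "email.send"
    source_compartment: str  # where the data being acted on came from
    target_compartment: str  # where the action would send it


class Monitor:
    """Single chokepoint: every proposed action goes through decide()."""

    def __init__(self, granted_tools: frozenset[str]) -> None:
        self._granted = granted_tools

    def decide(self, action: Action) -> Decision:
        # 1. Capability check: without an explicit grant, the action is denied.
        if action.tool not in self._granted:
            return Decision.DENY
        # 2. Flow check: a cross-compartment flow is surfaced to a human.
        if action.source_compartment != action.target_compartment:
            return Decision.NEEDS_APPROVAL
        return Decision.ALLOW


monitor = Monitor(frozenset({"calendar.read", "email.send"}))
print(monitor.decide(Action("shell.exec", "work", "work")))
print(monitor.decide(Action("email.send", "work", "external")))
print(monitor.decide(Action("calendar.read", "work", "work")))
```

The decision depends only on the grant set and the compartments, never on the model's output text, which is what makes the check deterministic in this sketch.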

How It Works

Because the guarantees come from the models rather than from per-attack rules, a range of risks is structurally mitigated as a consequence of the design rather than as individually coded features:

Silent data exfiltration - Information-flow taint blocks egress of data a session has read (Denning lattice)
Confused-deputy / ambient-authority abuse - No authority without an explicit, scoped capability (object-capability model)
Over-broad delegated authority - Monotonic attenuation across delegation chains
Unauthorized irreversible actions - Human-in-the-loop approval (Clark-Wilson separation of duty)
Conflict-of-interest data mixing - Cross-compartment access rules (Brewer-Nash)
Prompt-injection-driven misuse - Mitigated as one special case: the model is treated as untrusted regardless of how it was subverted
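Two of the mechanisms in the list above, information-flow taint and monotonic attenuation, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Label`, `Session`, and `Capability` names are hypothetical, and the lattice is simplified to a three-level total order, whereas a Denning lattice may in general be a partial order.

```python
from dataclasses import dataclass
from enum import IntEnum


class Label(IntEnum):
    # Assumed three-level total order; real lattices can be partial orders.
    PUBLIC = 0
    INTERNAL = 1
    SECRET = 2


@dataclass
class Session:
    # Highest label of anything the session has read so far (its taint).
    taint: Label = Label.PUBLIC

    def read(self, data_label: Label) -> None:
        # Lattice join: on a total order this is simply max().
        self.taint = max(self.taint, data_label)

    def may_egress(self, sink_label: Label) -> bool:
        # Data may only flow to a sink at least as restrictive as the taint.
        return self.taint <= sink_label


@dataclass(frozen=True)
class Capability:
    resource: str
    rights: frozenset[str]

    def attenuate(self, rights: frozenset[str]) -> "Capability":
        # Delegation can only narrow: the result is an intersection, so a
        # delegate never holds a right that its delegator lacked.
        return Capability(self.resource, self.rights & rights)


session = Session()
session.read(Label.SECRET)
print(session.may_egress(Label.PUBLIC))  # False: the session is tainted

root = Capability("mailbox", frozenset({"read", "send", "delete"}))
child = root.attenuate(frozenset({"read", "admin"}))
print(sorted(child.rights))  # ['read']: "admin" was never held, so it is dropped
```

Neither check needs to know why the session wants to send data or why a delegate asks for more rights. The taint only rises and the rights only shrink, so the unsafe state is unreachable in this sketch rather than merely detected.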

The design draws on classical security models—Bell-LaPadula, Biba, Brewer-Nash, Clark-Wilson, and the object-capability model—synthesized with the dual-LLM and programmatic-execution patterns from CaMeL (Google DeepMind) and Dromedary (Microsoft). These are tracked frameworks, not loose inspiration: every enforcement mechanism traces to a named model, with any deliberate deviation recorded.

Scope

CapableDeputy is deliberately narrow. It is a runtime control at the intersection of InfoSec, Data & Privacy, and AI governance, and it defends against the one conjunction that none of those three programs can structurally cover on its own: sensitive data, untrusted input, and capable action converging in one agent (the "lethal trifecta"). It goes deep on agentic-effect containment, human oversight, and decision accountability, and is silent by design on model accuracy, bias, and content safety.

Philosophy

The right way to make an AI agent safe is not to ask the model to behave; it is to build a structure in which misbehavior has nowhere to go. CapableDeputy is skeptical of the model by construction, accountable through an append-only provenance record, and boring in exactly the way security infrastructure should be.
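One common way to make a provenance record tamper-evident is a hash chain, in which each entry commits to the hash of the previous one. The sketch below assumes that technique purely for illustration. The source does not say how CapableDeputy stores its record, and `ProvenanceLog` and the event fields are hypothetical.

```python
import hashlib
import json


class ProvenanceLog:
    """Append-only log; each entry commits to the hash of the previous one."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(prev: str, event: dict) -> str:
        # Canonical JSON (sorted keys) so the same event always hashes the same.
        body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        digest = self._digest(prev, event)
        self._entries.append({"prev": prev, "event": event, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute the chain from the start; any edited entry breaks it.
        prev = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev:
                return False
            if self._digest(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = ProvenanceLog()
log.append({"action": "email.send", "decision": "needs_approval"})
log.append({"action": "email.send", "decision": "approved_by_user"})
print(log.verify())
```

A chain like this detects after-the-fact edits but does not prevent them on its own. Preventing truncation or wholesale replacement would need the latest hash to be anchored somewhere the agent cannot write.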