AI Lab
Experiments and prototypes.
A working portfolio of small, opinionated explorations of what enterprise AI products need in order to become reliable enough for regulated institutions to act on: the data agents draw on, the tools they act through, and the guardrails around them. Each one starts from a product question (refusal and evaluation; observability; entitlements, lineage, and cost) and is built as the cheapest artifact that makes the answer legible. Each has a writeup of what it was testing and what fell out, and most have a clickable demo.
-
Prototype
ContextScope
AgentScope for agent belief state. An interface prototype that shows the conversation on the left and the agent's changing case state, as a turn-by-turn diff, on the right.
A clickable prototype for a causal debugger of agent state: what the agent believed, why it believed it, what conflicted, and what that enabled next. Where AgentScope makes a run legible, ContextScope makes the agent's belief legible. Built around the conflict a regulated operation has to handle well: the ledger says the card payment was authorised, but the customer says it was not them.
Interface prototype · Product design · Agent observability · Governance
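As a rough illustration of the turn-by-turn diff, the sketch below compares two snapshots of case state and reports what was added, removed, or changed. It is not the prototype's code: the function name `diff_case_state` and the field names (`payment_status`, `customer_claim`) are hypothetical, chosen to mirror the ledger-versus-customer conflict described above.

```python
from typing import Any


def diff_case_state(before: dict[str, Any], after: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Compare two flat case-state snapshots (hypothetical shape)."""
    # Keys that only exist in the newer snapshot.
    added = {k: after[k] for k in after.keys() - before.keys()}
    # Keys that were dropped between turns.
    removed = {k: before[k] for k in before.keys() - after.keys()}
    # Keys present in both snapshots whose value differs: (old, new).
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots: the ledger's view, then the customer's dispute.
turn_1 = {"payment_status": "authorised", "source": "ledger"}
turn_2 = {
    "payment_status": "disputed",
    "source": "ledger",
    "customer_claim": "not me",
}

diff = diff_case_state(turn_1, turn_2)
print(diff)
```

A flat dictionary keeps the sketch short; a real belief state would likely also need provenance per field, so that the "why it believed it" question can be answered.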
-
Case study
Tax Policy Navigator
What it takes to make a generative answer about UK tax trustworthy enough that a citizen could act on it.
A case study in product judgment for grounded AI in a regulated domain. Why refusal has to be a designed surface, why a citation has to reach down to the individual claim, and where a machine judge stops being enough and a human has to take over. Built and evaluated end to end over the full HMRC Employment Income Manual. The hard problems turned out to be product problems, not retrieval problems.
AI product · Grounded generation · Evaluation · Regulated domains · Trust and safety
-
Prototype
AgentScope
An interface prototype for a developer-facing harness that makes a multi-agent run legible at a glance, including when it fails.
A clickable prototype that visualises a tree of agent runs, each with its own steps, context window, and cost share, on one screen. Three switchable mock runs demonstrate the design across runs of different shapes.
Interface prototype · Developer tools · Multi-agent · Observability
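One way to picture the data behind such a view is a tree in which each run's cost share is its subtree cost divided by the root's total. The sketch below assumes that definition; the `AgentRun` class, its fields, and the example run names are hypothetical and not taken from the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    name: str
    own_cost: float  # hypothetical: cost of this run's own steps only
    children: list["AgentRun"] = field(default_factory=list)

    def total_cost(self) -> float:
        # A run's total is its own cost plus everything it spawned.
        return self.own_cost + sum(c.total_cost() for c in self.children)


def cost_shares(root: AgentRun) -> dict[str, float]:
    """Map each run name to its subtree cost as a fraction of the whole run."""
    total = root.total_cost()
    shares: dict[str, float] = {}

    def visit(run: AgentRun) -> None:
        shares[run.name] = run.total_cost() / total if total else 0.0
        for child in run.children:
            visit(child)

    visit(root)
    return shares


# Hypothetical mock run: a planner that spawns two sub-agents.
root = AgentRun(
    "planner",
    0.5,
    [AgentRun("retriever", 0.25), AgentRun("writer", 0.25)],
)
shares = cost_shares(root)
print(shares)
```

Under this definition the root always shows a share of 1.0, and a child's share includes its own descendants.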
-
Prototype
Data Discovery Agent
An interface prototype for a financial-data catalogue where every recommendation arrives with its lineage, entitlement status, and monthly cost.
A clickable prototype that answers a question most chat-with-data demos refuse to ask: how should a regulated-data platform expose an agent surface when picking the wrong dataset can mean wrong answers, compliance breaches, or runaway cost?
Interface prototype · Product design · Data catalogues · Compliance
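A minimal sketch of what "every recommendation arrives with its lineage, entitlement status, and monthly cost" could look like as a record. The `Recommendation` class, its field names, and the example dataset are all hypothetical, and the cost is left unitless because the source does not name a currency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    dataset: str
    lineage: tuple[str, ...]  # hypothetical: upstream sources, nearest first
    entitled: bool            # whether the requesting user may access the dataset
    monthly_cost: float       # unitless here; currency is not given in the source


def summary_line(rec: Recommendation) -> str:
    """Render the three attributes next to the dataset name on one line."""
    status = "entitled" if rec.entitled else "not entitled"
    path = " <- ".join((rec.dataset, *rec.lineage))
    return f"{path} | {status} | {rec.monthly_cost:.2f}/month"


# Hypothetical recommendation for illustration only.
rec = Recommendation(
    dataset="eod_prices",
    lineage=("vendor_feed",),
    entitled=False,
    monthly_cost=1200.0,
)
line = summary_line(rec)
print(line)
```

Making the three fields required, rather than optional, is the point of the sketch: a recommendation without them cannot be constructed at all.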