AI Lab

Experiments and prototypes.

A working portfolio of small, opinionated explorations of what enterprise AI products need in order to become reliable enough for regulated institutions to act on: the data agents draw on, the tools they act through, and the guardrails around them. Each one starts from a product question (refusal and evaluation; observability; entitlements, lineage, and cost) and is built as the cheapest artifact that makes the answer legible. Each has a writeup of what it was testing and what fell out, and most have a clickable demo.

Prototype

ContextScope

AgentScope for agent belief state. An interface prototype that shows the conversation on the left and the agent's changing case state, as a turn-by-turn diff, on the right.

A clickable prototype for a causal debugger of agent state: what the agent believed, why it believed it, what conflicted, and what that enabled next. Where AgentScope makes a run legible, ContextScope makes the agent's belief legible. Built around the conflict a regulated operation has to handle well: the ledger says the card payment was authorised, but the customer says it was not them.

Interface prototype · Product design · Agent observability · Governance

Read the writeup · Open the prototype

Case study

Tax Policy Navigator

What it takes to make a generative answer about UK tax trustworthy enough that a citizen could act on it.

A case study in product judgment for grounded AI in a regulated domain. Why refusal has to be a designed surface, why a citation has to reach down to the individual claim, and where a machine judge stops being enough and a human has to take over. Built and evaluated end to end over the full HMRC Employment Income Manual. The hard problems turned out to be product problems, not model problems.

AI product · Grounded generation · Evaluation · Regulated domains · Trust and safety

Read the writeup

Prototype

AgentScope

An interface prototype for a developer-facing harness that makes a multi-agent run legible at a glance, including when it fails.

A clickable prototype that visualises a tree of agent runs, each with its own steps, context window, and cost share, on one screen. Three switchable mock runs demonstrate the design across differently shaped runs.

Interface prototype · Developer tools · Multi-agent · Observability

Read the writeup · Open the prototype

Prototype

Data Discovery Agent

An interface prototype for a financial-data catalogue where every recommendation arrives with its lineage, entitlement status, and monthly cost.

A clickable prototype that answers a question most chat-with-data demos refuse to ask: how should a regulated-data platform expose an agent surface when picking the wrong dataset can mean wrong answers, compliance breaches, or runaway cost?

Interface prototype · Product design · Data catalogues · Compliance

Read the writeup · Open the prototype
