AI Lab
Experiments and prototypes.
A working portfolio of small, opinionated explorations of what enterprise AI agents need to act reliably: the data they draw on, the tools they act through, and the guardrails around them. Each one starts from a product question and is built as the cheapest artifact that makes the answer legible. Each has a writeup of what it was testing and what fell out, and most have a clickable demo.
-
Case study
Tax Policy Navigator
What it takes to make a generative answer about UK tax trustworthy enough that a citizen could act on it.
A case study in product judgment for grounded AI in a regulated domain. Why refusal has to be a designed surface, why a citation has to reach down to the individual claim, and where a machine judge stops being enough and a human has to take over. Built and evaluated end to end over the full HMRC Employment Income Manual. The hard problems turned out to be product problems, not retrieval problems.
AI product · Grounded generation · Evaluation · Regulated domains · Trust and safety
-
Live
AgentScope
An interface prototype for a developer-facing harness that makes a multi-agent run legible at a glance, including when it fails.
A clickable prototype that visualises a tree of agent runs, each with its own steps, context window, and cost share, on one screen. Three switchable mock runs demonstrate the design across different run shapes.
Interface prototype · Developer tools · Multi-agent · Observability
-
Live
Data Discovery Agent
An interface prototype for a financial-data catalogue where every recommendation arrives with its lineage, entitlement status, and monthly cost.
A clickable prototype that answers a question most chat-with-data demos refuse to ask. How should a regulated-data platform expose an agent surface when picking the wrong dataset can mean wrong answers, compliance breaches, or runaway cost?
Interface prototype · Product design · Data catalogues · Compliance