
Data Discovery Agent

An interface prototype for a financial-data catalogue where every recommendation arrives with its lineage, entitlement status, and monthly cost.

A clickable prototype that answers a question most chat-with-data demos refuse to ask: how should a regulated-data platform expose an agent surface when picking the wrong dataset can mean wrong answers, compliance breaches, or runaway cost?


Themes: Interface prototype · Product design · Data catalogues · Compliance

The problem

A modern financial-data platform has thousands of datasets, hundreds of fields, several asset classes, and three problems compounding on top of the catalogue itself:

Provenance. Every dataset has a canonical source (Bloomberg PX_LAST, ICE Composite, MSCI ESG Manager) and a vendor. Pick the wrong one and your risk numbers don’t match the desk’s. Most catalogue UIs flatten this into a single field name and lose the chain of custody.
Entitlements. Much of the most useful data is restricted: insider transactions, MNPI-flagged feeds, premium tiers behind extra approval. A platform that pretends restrictions don’t exist gets analysts in trouble. A platform that hides them in a separate compliance app gets ignored.
Cost. Adding ESG fields to a request can mean £8k more per month. The analyst who clicks “submit” on a 200-security request rarely sees the bill until the following quarter, when their team’s data budget is gone.

Today’s catalogue search treats all three as someone else’s problem. Bloomberg, Refinitiv, FactSet, and ICE all ship variants of the same UI: a search box, a faceted filter, a results list. Lineage, entitlements, and cost live in separate apps you’re supposed to consult before each request. Almost nobody does.

Who it’s for

The named persona is the cross-asset analyst at a buy-side fund who needs to assemble a custom dataset by Friday’s risk meeting. They don’t have the entitlement matrix memorised. They don’t know which feed is canonical for their team’s compliance policy. They have a budget number from their PM and no good way to see, before submitting, how much of it a request will consume. The platform either tells them what they need to know, or doesn’t.

The product question

Should a financial-data platform expose an agent surface? And if so, what does the agent need to render alongside every recommendation, given that picking the wrong dataset can mean wrong answers, compliance breaches, or runaway cost? The first question is rhetorical at this point. The second is where every chat-with-data demo cuts corners.
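One way to stop the second question from being answered by omission is to encode the answer in the type the agent returns, so a recommendation cannot be built without its lineage, entitlement tier, and monthly cost. This is a minimal TypeScript sketch under stated assumptions: the `Recommendation` shape, the tier names, and the `formatGbp` and `renderRecommendation` helpers are hypothetical, not the prototype’s code.

```typescript
// Hypothetical tier labels; a real platform would read these from its access-control system.
type EntitlementTier = "Standard" | "Premium" | "Restricted";

// Every field is required, so the compiler rejects a recommendation
// that omits lineage, entitlement, or cost.
interface Recommendation {
  datasetId: string;
  canonicalSource: string; // the feed or field the team's policy treats as authoritative
  vendor: string;
  freshness: string; // human-readable update cadence, e.g. "end of day"
  tier: EntitlementTier;
  monthlyCostGbp: number;
}

// Format whole pounds with thousands separators without depending on locale data.
function formatGbp(amount: number): string {
  return "£" + Math.round(amount).toString().replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

// Render the match and its context as one paragraph, so none of it is left to a separate app.
function renderRecommendation(rec: Recommendation): string {
  return (
    `${rec.datasetId}: canonical source ${rec.canonicalSource} via ${rec.vendor}, ` +
    `updated ${rec.freshness}, ${rec.tier} tier, ${formatGbp(rec.monthlyCostGbp)} per month.`
  );
}
```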

The artifact

This prototype is the answer made clickable.

No model, no API, no backend. The conversational surface is wired to a hand-built fixture set and a deterministic dispatcher. The point is the shape of the experience: what the agent surfaces alongside its recommendations, how it gates restricted queries, how it makes cost visible before commit.
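A deterministic dispatcher can be as small as an ordered list of pattern rules, each pointing at a fixture. The sketch below assumes that shape; the intents, regular expressions, and fixture keys are hypothetical. It shows the two properties the paragraph relies on: the same prompt always routes to the same fixture, and restricted patterns are checked first so they cannot fall through to an ordinary search.

```typescript
// Hypothetical intents; the prototype's real fixture set and routing rules are not shown here.
type Intent = "restricted" | "dsl" | "search";

interface DispatchResult {
  intent: Intent;
  fixtureKey: string; // key into a hand-built fixture table (hypothetical)
}

// Ordered rules: the first matching pattern wins, so routing is deterministic.
// Restricted patterns come first so a restricted query can never reach an ordinary search.
const dispatchRules: ReadonlyArray<{ pattern: RegExp; intent: Intent; fixtureKey: string }> = [
  { pattern: /\b(insider|mnpi|pre-?trade)\b/i, intent: "restricted", fixtureKey: "refusal" },
  { pattern: /^\s*select\b/i, intent: "dsl", fixtureKey: "securities" },
];

function dispatch(prompt: string): DispatchResult {
  for (const rule of dispatchRules) {
    // No /g flag, so RegExp.test carries no lastIndex state between calls.
    if (rule.pattern.test(prompt)) {
      return { intent: rule.intent, fixtureKey: rule.fixtureKey };
    }
  }
  // Default: plain-language search over the fixture catalogue.
  return { intent: "search", fixtureKey: "catalogue" };
}
```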

How to look at it

Press “Guided tour” in the prototype’s header for a 90-second auto-driven walkthrough.
Or land cold and explore. Six starter prompts sit beneath the input on first arrival, including a deliberately restricted one (“find insider trades”) so you can see the compliance surface in action.

The right-hand panel keeps every dataset preview and data request as an artifact the analyst can return to, and that state persists across page refreshes.
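Persistence across refreshes can be sketched against the browser’s Web Storage API, whose `getItem`/`setItem` methods `window.localStorage` provides. The `Artifact` shape, the storage key, and the helper names below are assumptions; the in-memory store stands in for `localStorage` where it is unavailable, such as in tests.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it in a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical artifact shape: a dataset preview or a composed data request.
interface Artifact {
  id: string;
  kind: "preview" | "request";
  title: string;
}

// Hypothetical key, versioned so the stored shape can change later.
const ARTIFACTS_KEY = "dda.artifacts.v1";

function loadArtifacts(store: KeyValueStore): Artifact[] {
  const raw = store.getItem(ARTIFACTS_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as Artifact[]) : [];
  } catch {
    // Corrupt or hand-edited storage: start clean rather than crash the panel.
    return [];
  }
}

function saveArtifact(store: KeyValueStore, artifact: Artifact): Artifact[] {
  // Drop any earlier artifact with the same id, then append the new one,
  // so re-previewing a dataset doesn't create duplicates.
  const next = loadArtifacts(store)
    .filter((a) => a.id !== artifact.id)
    .concat(artifact);
  store.setItem(ARTIFACTS_KEY, JSON.stringify(next));
  return next;
}

// In-memory store for tests or server-side rendering.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In a browser the same helpers would be called with `window.localStorage` in place of `new MemoryStore()`.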

What the user can do

Ask in plain language for products, datasets, securities, or fields.
Browse hierarchies, preview datasets, inspect securities by asset class.
Use a small SQL-shaped DSL (for example, SELECT * FROM equities or SELECT AAPL, MSFT) when natural language is too coarse.
Compose a custom request through the conversational surface, with the running monthly cost shown as you select fields.
Watch the agent refuse restricted queries (insider, MNPI, pre-trade) at the surface, with a named substitute and an entitlement contact, instead of silently failing or returning data that gets the analyst in trouble.
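The DSL examples suggest two query shapes: an asset class after SELECT * FROM, or a comma-separated ticker list after SELECT. Here is a minimal parser for just those two shapes; the ticker pattern and the `DslQuery` type are assumptions, and anything that doesn’t parse returns `null` so the caller can fall back to natural language.

```typescript
// Parsed form of the two query shapes; the grammar itself is an assumption.
type DslQuery =
  | { kind: "assetClass"; assetClass: string }
  | { kind: "tickers"; tickers: string[] };

function parseDsl(input: string): DslQuery | null {
  const text = input.trim();

  // SELECT * FROM <asset class>, with an optional trailing semicolon
  const fromMatch = /^select\s+\*\s+from\s+([a-z_]+)\s*;?$/i.exec(text);
  if (fromMatch) {
    return { kind: "assetClass", assetClass: fromMatch[1].toLowerCase() };
  }

  // SELECT <ticker>, <ticker>, ...
  const listMatch = /^select\s+(.+?)\s*;?$/i.exec(text);
  if (listMatch) {
    const tickers = listMatch[1].split(",").map((t) => t.trim().toUpperCase());
    // Hypothetical ticker pattern: reject anything that doesn't look like one rather than guess.
    if (tickers.every((t) => /^[A-Z][A-Z0-9.]{0,9}$/.test(t))) {
      return { kind: "tickers", tickers };
    }
  }

  // Not DSL: the caller falls back to natural-language handling.
  return null;
}
```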

A few of the product calls behind the prototype

Lineage and cost are part of every recommendation, not a separate app. When the agent matches a product, the response names the canonical source, the vendor, the freshness, the entitlement tier, and the monthly cost in the same paragraph as the match. Bloomberg, Refinitiv, and FactSet all fail at this, and anyone who has actually used those products would notice the difference.
Compliance lives at the agent surface, not as a gate around it. Restricted queries (insider transactions, MNPI flags, pre-trade signals) get a structured refusal that names the public substitute and the entitlement-approval path. The query is logged for audit. This is the move that separates a chat-with-data toy from a chat-with-data product a regulator can sign off.
Cost surfaced before commit, not after. The composer shows a running monthly cost as the analyst selects fields, plus a flag when the selection includes Premium-tier fields. If the cost gets large enough, the agent suggests a cheaper substitute (Standard reference data, ESG-Lite). Most platforms hide cost until invoice time. Surfacing it changes the product.
A small DSL, only where it earns its place. Natural language is the default. The DSL appears only in the securities composer where power users want it. Don’t make every user pay the syntax tax.
Sandbox vs production as a header-level affordance. Billable actions need a surface-level guard. The toggle lives next to the input the analyst reaches for, not buried in settings. Production-mode submissions surface an explicit cost warning.
Persistence is a product decision, not an engineering nicety. The prototype remembers what you previewed and composed across refreshes. A working analyst would expect that, and a prototype that ignored it would model the wrong experience.
Focused on financial data; geospatial is out of scope. The current surface is financial-only, with the depth (lineage, entitlements, cost) that a financial-data customer actually needs.
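The cost-before-commit and sandbox-versus-production calls reduce to a small pure function the composer recomputes on every selection change. In this sketch the field prices, the substitute mapping, and the suggestion threshold are hypothetical inputs; the prototype’s actual figures and threshold rule are not described here.

```typescript
// Hypothetical price-list entry; real prices and tiers would come from vendor contracts.
interface FieldPrice {
  field: string;
  tier: "Standard" | "Premium";
  monthlyCostGbp: number;
  cheaperSubstitute?: string; // e.g. a Standard or "Lite" equivalent, if one exists
}

type SubmitMode = "sandbox" | "production";

interface CostSummary {
  totalGbp: number;
  includesPremium: boolean; // drives the Premium-tier flag
  substitutes: string[]; // suggested only once the total crosses the threshold
  warning: string | null; // explicit warning before a billable production submission
}

function summariseCost(
  selected: FieldPrice[],
  mode: SubmitMode,
  suggestThresholdGbp: number // hypothetical, e.g. derived from the team's budget
): CostSummary {
  const totalGbp = selected.reduce((sum, f) => sum + f.monthlyCostGbp, 0);
  const includesPremium = selected.some((f) => f.tier === "Premium");
  const substitutes =
    totalGbp > suggestThresholdGbp
      ? selected
          .filter((f) => f.cheaperSubstitute !== undefined)
          .map((f) => `${f.field} → ${f.cheaperSubstitute}`)
      : [];
  const warning =
    mode === "production"
      ? `Submitting will add £${totalGbp} per month to your team's data spend.`
      : null;
  return { totalGbp, includesPremium, substitutes, warning };
}
```

Keeping the function pure means one call per click can drive the running total, the Premium flag, the substitute suggestions, and the production warning together.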

What this prototype taught me

Three things that transfer to any chat-with-data product over a regulated corpus:

Provenance, entitlement, and cost are first-class. When they sit in separate apps, analysts ignore them. When they sit in every recommendation, analysts read them. The shift is product, not technology.
Refusal as a feature, with a named substitute. A restricted query that returns “no results” trains the user to keep guessing. A restricted query that returns “this is in restricted set X; the public substitute is Y; here is the entitlement-approval path” turns refusal into a service. This pattern transferred directly into the refusal taxonomy of the tax-policy-navigator case study.
Domain focus beats domain breadth. An earlier draft carried a parallel geospatial domain. It demonstrated breadth, not depth, and diluted the thesis. Cutting it was the single most useful product call in the project.
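The refusal pattern can be sketched as a policy check that runs before any search, returns a structured refusal rather than an empty result, and writes an audit entry, as the compliance product call describes. The policy table, set names, substitutes, and approval paths below are placeholders, and the in-memory array stands in for what would need to be an append-only audit store.

```typescript
// Hypothetical refusal shape: what matched, what to use instead, and how to get access.
interface Refusal {
  restrictedSet: string;
  publicSubstitute: string;
  approvalPath: string;
}

interface AuditEntry {
  timestamp: string; // ISO 8601
  userId: string;
  query: string;
  restrictedSet: string;
}

// Placeholder policy table; real restricted sets would come from the compliance system of record.
const restrictedPolicies: ReadonlyArray<{ pattern: RegExp } & Refusal> = [
  {
    pattern: /\binsider\b/i,
    restrictedSet: "insider-transactions",
    publicSubstitute: "public-filings",
    approvalPath: "entitlements desk",
  },
  {
    pattern: /\bmnpi\b/i,
    restrictedSet: "mnpi-flagged",
    publicSubstitute: "public-news",
    approvalPath: "compliance officer",
  },
  {
    pattern: /\bpre-?trade\b/i,
    restrictedSet: "pre-trade-signals",
    publicSubstitute: "end-of-day-prices",
    approvalPath: "entitlements desk",
  },
];

// Checked before any search runs; a match is logged and refused, never silently filtered.
function checkRestricted(
  query: string,
  userId: string,
  auditLog: AuditEntry[],
  now: Date = new Date()
): Refusal | null {
  for (const { pattern, ...refusal } of restrictedPolicies) {
    if (pattern.test(query)) {
      auditLog.push({ timestamp: now.toISOString(), userId, query, restrictedSet: refusal.restrictedSet });
      return refusal;
    }
  }
  return null;
}

// Turn the refusal into the sentence the analyst sees.
function renderRefusal(r: Refusal): string {
  return (
    `This is in restricted set ${r.restrictedSet}; the public substitute is ` +
    `${r.publicSubstitute}; approval path: ${r.approvalPath}.`
  );
}
```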

What this isn’t yet

A live model behind the conversational surface, real entitlement metadata from a working access-control system, and real per-vendor monthly cost. The prototype tests the shape of the experience; productionising means wiring each of those services in. The shape choices, however, hold across that wiring.