# Prototype Scope
The prototype proves the operating model end-to-end on one representative workflow — bulk card spending-limit updates — with a real chat interface, LLM pipeline, async execution engine, and observability layer.
## Tech stack
| Layer | Choice | Why |
|---|---|---|
| Frontend | Next.js 16 + React 19 + Tailwind 4 | App Router with real-time Convex hooks; dark-mode design system built on CSS tokens |
| Backend / DB | Convex 1.35 | Real-time subscriptions replace polling; mutations, queries, and scheduled actions in one runtime |
| LLM | OpenAI (gpt-4o, text-embedding-3-small) | Structured output mode for intent extraction; embeddings power semantic KB search |
| Schema validation | Zod 4 | Single source of truth for intent shape, shared between LLM output and Convex mutations |
## What has been built
### Chat interface
- Natural-language input with five example prompts for a quick start
- User bubbles are editable inline — click to revise and re-run without losing thread context
- Per-response retry button re-runs the original request with a new idempotency key
- Thumbs up / down feedback on every AI response (answer and bulk op) — wired to the metrics store
### LLM pipeline (interpreter)
- Single system prompt classifies input as either a `question` or a `bulk_op` request
- RAG: top-k semantic search against embedded KB articles (OpenAI `text-embedding-3-small`, 1536-dim) injects relevant policy context into the prompt before calling the LLM
- Structured output (Zod schema) extracts `intent`, `targetGroup`, `newLimit`, `notifyCardholders`
- Unknown or unsupported intents surface a graceful "not supported" thread entry rather than silently failing
- KB gap logging: questions with low-confidence KB matches are recorded as `kb_gap` metric events
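The structured-output step can be pictured with a plain-TypeScript stand-in for the Zod schema. The field names (`intent`, `targetGroup`, `newLimit`, `notifyCardholders`) come from the prototype; the guard itself is an illustrative sketch, not the actual Zod definition.

```typescript
// Hypothetical sketch of the intent shape the Zod schema enforces.
// Field names match the prototype; the validation logic is a plain-TypeScript
// stand-in for Zod's safeParse, for illustration only.
type BulkOpIntent = {
  intent:
    | "bulk_update_card_limit"
    | "bulk_freeze_cards"
    | "bulk_notify_cardholders";
  targetGroup: string;       // e.g. a team or department name
  newLimit: number;          // SGD, per card
  notifyCardholders: boolean;
};

const SUPPORTED_INTENTS = new Set([
  "bulk_update_card_limit",
  "bulk_freeze_cards",
  "bulk_notify_cardholders",
]);

// Type guard mirroring what the schema would accept; anything else
// falls through to the "not supported" thread entry.
function isBulkOpIntent(value: unknown): value is BulkOpIntent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.intent === "string" &&
    SUPPORTED_INTENTS.has(v.intent) &&
    typeof v.targetGroup === "string" &&
    typeof v.newLimit === "number" &&
    typeof v.notifyCardholders === "boolean"
  );
}
```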
### Policy validation
- `bulk_update_card_limit` only — `bulk_freeze_cards` and `bulk_notify_cardholders` are recognised but not yet wired end-to-end
- Frozen and cancelled cards excluded automatically (policy rule P6)
- Operations affecting > 25 eligible cards flagged `approvalRequired: true` (P4)
- Hard cap: 200 cards max per operation (P5); max SGD 5,000 per-card limit
- Exclusion reasons and policy notes stored on the job record and surfaced in the confirmation UI
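A minimal sketch of how the policy rules above compose, assuming illustrative card and result shapes (the real checks live in Convex mutations):

```typescript
// Illustrative card and result shapes — not the actual Convex schema.
type Card = { id: string; status: "active" | "frozen" | "cancelled" };

type PolicyResult = {
  eligible: string[];
  excluded: { id: string; reason: string }[];
  approvalRequired: boolean;
  error?: string;
};

const APPROVAL_THRESHOLD = 25; // P4
const MAX_CARDS = 200;         // P5
const MAX_LIMIT_SGD = 5000;    // P5

function validateBulkLimitUpdate(cards: Card[], newLimit: number): PolicyResult {
  if (newLimit > MAX_LIMIT_SGD) {
    return { eligible: [], excluded: [], approvalRequired: false,
             error: `Per-card limit exceeds SGD ${MAX_LIMIT_SGD} (P5)` };
  }
  // P6: frozen and cancelled cards are excluded automatically, with reasons.
  const excluded = cards
    .filter((c) => c.status !== "active")
    .map((c) => ({ id: c.id, reason: `card is ${c.status} (P6)` }));
  const eligible = cards.filter((c) => c.status === "active").map((c) => c.id);
  if (eligible.length > MAX_CARDS) {
    return { eligible: [], excluded, approvalRequired: false,
             error: `Targets ${eligible.length} cards; hard cap is ${MAX_CARDS} (P5)` };
  }
  // P4: large operations are flagged for approval, not blocked.
  return { eligible, excluded, approvalRequired: eligible.length > APPROVAL_THRESHOLD };
}
```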
### Confirmation screen
- Shows target group, new limit, total resolved, eligible, excluded (with per-card reasons), and approval flag
- Idempotency key prevents duplicate jobs from double-submission or rapid retry
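The idempotency check might look roughly like this in-memory sketch; the real implementation presumably keys on the jobs table, but the dedupe behaviour is the same:

```typescript
// In-memory sketch of idempotency-key deduplication. The class and method
// names are illustrative; in the prototype this check would run inside the
// Convex mutation before a job record is inserted.
class JobStore {
  private jobsByKey = new Map<string, string>();
  private nextId = 1;

  // Returns the existing job id on a duplicate submission instead of
  // creating a second job, so double-clicks and rapid retries are no-ops.
  submit(idempotencyKey: string): { jobId: string; created: boolean } {
    const existing = this.jobsByKey.get(idempotencyKey);
    if (existing) return { jobId: existing, created: false };
    const jobId = `job_${this.nextId++}`;
    this.jobsByKey.set(idempotencyKey, jobId);
    return { jobId, created: true };
  }
}
```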
### Async execution engine
- `confirmJob` fans out one `job_item` record per eligible card, then schedules each item as a Convex scheduled action
- Staggered start times (random 500–3,000 ms) simulate realistic API call pacing
- Deterministic mock card API: 18% transient failure rate, 100% permanent failure for compliance-locked cards
- Exponential backoff retry: 1 s → 2 s → 4 s, max 3 attempts; retryable vs. permanent failure distinction enforced by `failureCode`
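The retry schedule can be sketched as a pure function, reading "max 3 attempts" as up to three retries after the initial call. The `FailureCode` values other than the compliance lock are illustrative:

```typescript
// Sketch of the 1 s → 2 s → 4 s retry schedule. Names are illustrative;
// the real logic lives in the Convex scheduled action.
const MAX_RETRIES = 3;

type FailureCode = "rate_limited" | "timeout" | "compliance_locked";

// Only transient failures are retried; compliance locks are permanent.
function isRetryable(code: FailureCode): boolean {
  return code !== "compliance_locked";
}

// retriesSoFar is 0-based: 0 → 1000 ms, 1 → 2000 ms, 2 → 4000 ms,
// 3 → null (retry budget exhausted, item becomes failed_permanent).
function nextRetryDelayMs(retriesSoFar: number, code: FailureCode): number | null {
  if (!isRetryable(code) || retriesSoFar >= MAX_RETRIES) return null;
  return 1000 * 2 ** retriesSoFar;
}
```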
### Job lifecycle controls
- Cancel: marks all `queued` items `cancelled`; in-flight items run to completion
- Retry failed: creates a scoped re-run targeting only `failed_retryable` items; original job record unchanged
- Job terminal states: `completed`, `completed_with_failures`, `cancelled`, `failed`
- Item states: `queued → in_progress → succeeded | failed_retryable | failed_permanent | cancelled | skipped`
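The item state machine above can be expressed as an allowed-transitions map; whether a retry re-queues existing items or fans out new ones is a prototype detail, so the `failed_retryable → in_progress` edge here is an assumption:

```typescript
// Sketch of the item state machine as an allowed-transitions map.
// A real implementation would enforce this inside the Convex mutation.
type ItemState =
  | "queued" | "in_progress" | "succeeded"
  | "failed_retryable" | "failed_permanent" | "cancelled" | "skipped";

const TRANSITIONS: Record<ItemState, ItemState[]> = {
  queued: ["in_progress", "cancelled", "skipped"],
  // No cancelled edge here: in-flight items run to completion.
  in_progress: ["succeeded", "failed_retryable", "failed_permanent"],
  // Assumption: a "retry failed" run picks these items up again.
  failed_retryable: ["in_progress"],
  succeeded: [],
  failed_permanent: [],
  cancelled: [],
  skipped: [],
};

function canTransition(from: ItemState, to: ItemState): boolean {
  return TRANSITIONS[from].includes(to);
}
```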
### Real-time progress
- Convex subscriptions push live counts to the job progress component — no polling
- Live counts: succeeded · failed · retrying · cancelled · remaining queued
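Deriving the live counts is a pure fold over item states; in the app this runs inside a Convex query, and the subscription pushes fresh results on every write. A sketch, with illustrative names:

```typescript
// Illustrative status union and tally — the actual Convex query would
// aggregate the same way over job_item records.
type Status =
  | "queued" | "in_progress" | "succeeded"
  | "failed_retryable" | "failed_permanent" | "cancelled";

type Progress = {
  succeeded: number;
  failed: number;    // permanent failures
  retrying: number;  // failed_retryable, waiting out backoff
  cancelled: number;
  queued: number;    // remaining queued
};

function tallyProgress(statuses: Status[]): Progress {
  const p: Progress = { succeeded: 0, failed: 0, retrying: 0, cancelled: 0, queued: 0 };
  for (const s of statuses) {
    if (s === "succeeded") p.succeeded++;
    else if (s === "failed_permanent") p.failed++;
    else if (s === "failed_retryable") p.retrying++;
    else if (s === "cancelled") p.cancelled++;
    else if (s === "queued") p.queued++;
  }
  return p;
}
```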
### Job detail page
- Per-item status table with colour-coded badges
- Per-card runbook panel: freeze / unfreeze, block (with confirmation guard), report fraud, list recent transactions — all wired to Convex mutations against the mock card store
### Metrics dashboard (`/metrics`)
- Stats cards: total jobs, AI draft acceptance rate, thumbs-up / thumbs-down counts
- Top-5 KB gap table: questions where the KB returned no strong match, ranked by frequency
- Data sourced from append-only `metrics_events` table (last 1,000 events)
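The dashboard aggregation can be sketched as a fold over the append-only event log. The `thumbs_up` / `thumbs_down` event names here are assumptions; `kb_gap` is the event the prototype records:

```typescript
// Sketch of the metrics aggregation. Event type names other than kb_gap
// are illustrative, as is the payload shape.
type MetricsEvent = { type: string; payload?: { question?: string } };

function summarize(events: MetricsEvent[]) {
  let up = 0;
  let down = 0;
  const gaps = new Map<string, number>();
  for (const e of events.slice(-1000)) { // dashboard reads the last 1,000 events
    if (e.type === "thumbs_up") up++;
    else if (e.type === "thumbs_down") down++;
    else if (e.type === "kb_gap" && e.payload?.question) {
      gaps.set(e.payload.question, (gaps.get(e.payload.question) ?? 0) + 1);
    }
  }
  // Top-5 KB gaps, ranked by frequency.
  const topGaps = [...gaps.entries()].sort((a, b) => b[1] - a[1]).slice(0, 5);
  return { thumbsUp: up, thumbsDown: down, topGaps };
}
```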
### KB article store
- Convex table with OpenAI embeddings, vector index (`by_embedding`, 1536 dims)
- Ingestion script reads from `datasets/reap-help-center.jsonl`; re-ingestion is idempotent on `articleId`
## Still mocked / out of scope
The prototype does NOT:
- Call real Reap APIs — card service is a seeded Convex table; no real card is ever modified
- Send real notifications — `notifyCardholders` is stored but no email or SMS is dispatched
- Enforce authentication or RBAC — all users treated as admin; approval gate is flagged but has no enforcement UI
- Wire `bulk_freeze_cards` / `bulk_notify_cardholders` end-to-end — recognised by the LLM but execution path is not implemented
- Export failure reports as CSV — planned but not yet built
- Support multi-region or multi-currency policy variations — single flat ruleset
- Use durable workflow orchestration (Temporal) — Convex scheduled actions are sufficient for the prototype; durability gaps remain under extreme failure scenarios
## Iteration roadmap
| Iteration | Focus | Key additions |
|---|---|---|
| v0 — Prototype ✓ | One workflow end-to-end: chat → intent → policy check → confirmation → async execution → retry / cancel | Proves the operating model. LLM pipeline with RAG, real-time job progress, deterministic failure simulation, per-card runbooks, metrics dashboard. |
| v1 — Reusable framework | Same job lifecycle for multiple operation types | Wire `bulk_freeze_cards` and `bulk_notify_cardholders` end-to-end. Add RBAC enforcement. Implement CSV failure export. Add approval workflow UI for flagged jobs. |
| v2 — Policy-aware copilot | Richer policy retrieval; better block / approval explanations; escalation notes | Connect to a real policy registry. Add KYC hold rules, regional carve-outs, approval expiry. LLM explains policy blocks in plain language. Draft customer-facing comms for successful operations. |
| v3 — Selective auto-resolution | Auto-resolve only well-understood, policy-safe, low-volume operations with historical metrics validating safety | Auto-approve routine operations (<10 items, no policy conflicts, operation type with >99% historical success rate). Escalate only exceptions. Requires 90 days of v2 data to establish baselines. |