Prototype Scope

The prototype proves the operating model end-to-end on one representative workflow — bulk card spending-limit updates — with a real chat interface, LLM pipeline, async execution engine, and observability layer.

Tech stack

| Layer | Choice | Why |
|---|---|---|
| Frontend | Next.js 16 + React 19 + Tailwind 4 | App Router with real-time Convex hooks; dark-mode design system built on CSS tokens |
| Backend / DB | Convex 1.35 | Real-time subscriptions replace polling; mutations, queries, and scheduled actions in one runtime |
| LLM | OpenAI (gpt-4o, text-embedding-3-small) | Structured output mode for intent extraction; embeddings power semantic KB search |
| Schema validation | Zod 4 | Single source of truth for intent shape, shared between LLM output and Convex mutations |

What has been built

Chat interface

  • Natural-language input with five example prompts for a quick start
  • User bubbles are editable inline — click to revise and re-run without losing thread context
  • Per-response retry button re-runs the original request with a new idempotency key
  • Thumbs up / down feedback on every AI response (answer and bulk op) — wired to the metrics store

LLM pipeline (interpreter)

  • Single system prompt classifies input as either a question or a bulk_op request
  • RAG: top-k semantic search against embedded KB articles (OpenAI text-embedding-3-small, 1536-dim) injects relevant policy context into the prompt before calling the LLM
  • Structured output (Zod schema) extracts intent, targetGroup, newLimit, notifyCardholders
  • Unknown or unsupported intents surface a graceful "not supported" thread entry rather than silently failing
  • KB gap logging: questions with low-confidence KB matches are recorded as kb_gap metric events
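
The real pipeline defines this contract with Zod 4; as a rough, dependency-free sketch of the same intent shape (field names `intent`, `targetGroup`, `newLimit`, `notifyCardholders` are from above; the guard logic itself is illustrative):

```typescript
// Sketch of the structured-output contract the interpreter extracts.
type BulkOpIntent = {
  intent: "bulk_update_card_limit" | "bulk_freeze_cards" | "bulk_notify_cardholders";
  targetGroup: string;       // e.g. "marketing team"
  newLimit: number | null;   // SGD; null for non-limit operations
  notifyCardholders: boolean;
};

const KNOWN_INTENTS = [
  "bulk_update_card_limit",
  "bulk_freeze_cards",
  "bulk_notify_cardholders",
];

// Minimal runtime guard mirroring what the Zod schema enforces; returns null
// for unknown/unsupported intents so the caller can render "not supported".
function parseIntent(raw: unknown): BulkOpIntent | null {
  if (typeof raw !== "object" || raw === null) return null;
  const o = raw as Record<string, unknown>;
  if (!KNOWN_INTENTS.includes(o.intent as string)) return null;
  if (typeof o.targetGroup !== "string") return null;
  if (o.newLimit !== null && typeof o.newLimit !== "number") return null;
  if (typeof o.notifyCardholders !== "boolean") return null;
  return o as BulkOpIntent;
}
```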

Policy validation

  • bulk_update_card_limit only — bulk_freeze_cards and bulk_notify_cardholders are recognised but not yet wired end-to-end
  • Frozen and cancelled cards excluded automatically (policy rule P6)
  • Operations affecting > 25 eligible cards are flagged approvalRequired: true (P4)
  • Hard cap: 200 cards max per operation (P5); max SGD 5,000 per-card limit
  • Exclusion reasons and policy notes stored on the job record and surfaced in the confirmation UI
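
Rules P4–P6 compose into a single validation pass. A sketch of that pass (the card shape and function names are assumptions; thresholds are the ones stated above):

```typescript
// Sketch of the policy checks P4/P5/P6; not the prototype's actual module.
type Card = { id: string; status: "active" | "frozen" | "cancelled" };

const APPROVAL_THRESHOLD = 25; // P4: > 25 eligible cards needs approval
const HARD_CAP = 200;          // P5: max cards per operation
const MAX_LIMIT_SGD = 5000;    // P5: max per-card limit

type PolicyResult =
  | { ok: false; reason: string }
  | {
      ok: true;
      eligible: Card[];
      excluded: { card: Card; reason: string }[]; // surfaced in confirmation UI
      approvalRequired: boolean;
    };

function applyPolicy(cards: Card[], newLimit: number): PolicyResult {
  if (newLimit > MAX_LIMIT_SGD) {
    return { ok: false, reason: "limit exceeds SGD 5,000 per-card cap (P5)" };
  }
  // P6: frozen and cancelled cards are excluded automatically, with reasons.
  const excluded = cards
    .filter((c) => c.status !== "active")
    .map((card) => ({ card, reason: `card is ${card.status} (P6)` }));
  const eligible = cards.filter((c) => c.status === "active");
  if (eligible.length > HARD_CAP) {
    return { ok: false, reason: "more than 200 eligible cards (P5)" };
  }
  return { ok: true, eligible, excluded, approvalRequired: eligible.length > APPROVAL_THRESHOLD };
}
```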

Confirmation screen

  • Shows target group, new limit, total resolved, eligible, excluded (with per-card reasons), and approval flag
  • Idempotency key prevents duplicate jobs from double-submission or rapid retry
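
The idempotency mechanism can be sketched as a key-to-job lookup performed before creation; here the store is an in-memory stand-in for the Convex jobs table, and all names are illustrative:

```typescript
// Double-submits and rapid retries with the same key return the existing job
// instead of creating a duplicate.
const jobsByKey = new Map<string, string>(); // idempotencyKey -> jobId

function confirmJobOnce(idempotencyKey: string, createJob: () => string): string {
  const existing = jobsByKey.get(idempotencyKey);
  if (existing !== undefined) return existing; // duplicate submission: reuse
  const jobId = createJob();
  jobsByKey.set(idempotencyKey, jobId);
  return jobId;
}
```

Note that the per-response retry button (above) deliberately uses a *new* idempotency key, so it creates a fresh job rather than hitting this dedup path.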

Async execution engine

  • confirmJob fans out one job_item record per eligible card, then schedules each item as a Convex scheduled action
  • Staggered start times (random 500–3,000 ms) simulate realistic API call pacing
  • Deterministic mock card API: 18% transient failure rate, 100% permanent failure for compliance-locked cards
  • Exponential backoff retry: 1 s → 2 s → 4 s, max 3 attempts; retryable vs. permanent failure distinction enforced by failureCode
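
The retry schedule above (1 s → 2 s → 4 s, max 3 attempts, failureCode gating) reduces to a small pure function. A sketch, where the failureCode string values are assumptions:

```typescript
// Returns the backoff delay before the given retry attempt, or null if the
// item should not be retried (permanent failure or retries exhausted).
const BASE_DELAY_MS = 1000;
const MAX_ATTEMPTS = 3;

function nextRetryDelayMs(failureCode: string, attempt: number): number | null {
  if (failureCode === "COMPLIANCE_LOCKED") return null; // permanent: never retry
  if (attempt > MAX_ATTEMPTS) return null;              // retries exhausted
  return BASE_DELAY_MS * 2 ** (attempt - 1);            // 1 s, 2 s, 4 s
}
```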

Job lifecycle controls

  • Cancel: marks all queued items cancelled; in-flight items run to completion
  • Retry failed: creates a scoped re-run targeting only failed_retryable items; original job record unchanged
  • Job terminal states: completed, completed_with_failures, cancelled, failed
  • Item states: queued → in_progress → succeeded | failed_retryable | failed_permanent | cancelled | skipped
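
The item states above imply a small state machine. A sketch of the transition table, inferred from the listed lifecycle controls (e.g. cancel only affects queued items, retry re-enters execution from failed_retryable):

```typescript
type ItemState =
  | "queued" | "in_progress" | "succeeded"
  | "failed_retryable" | "failed_permanent" | "cancelled" | "skipped";

// Allowed transitions; terminal states have no outgoing edges.
const TRANSITIONS: Record<ItemState, ItemState[]> = {
  queued: ["in_progress", "cancelled", "skipped"],
  in_progress: ["succeeded", "failed_retryable", "failed_permanent"],
  failed_retryable: ["in_progress"], // "retry failed" re-runs these items
  succeeded: [],
  failed_permanent: [],
  cancelled: [],
  skipped: [],
};

function canTransition(from: ItemState, to: ItemState): boolean {
  return TRANSITIONS[from].includes(to);
}
```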

Real-time progress

  • Convex subscriptions push live counts to the job progress component — no polling
  • Live counts: succeeded · failed · retrying · cancelled · remaining queued
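
In the prototype these counts arrive via a Convex subscription; the reduction the progress component renders can be sketched as a pure fold over item states (mapping "failed" to failed_permanent is an assumption):

```typescript
type ItemStatus =
  | "queued" | "in_progress" | "succeeded"
  | "failed_retryable" | "failed_permanent" | "cancelled";

// Derive the live counters shown in the job progress component.
function liveCounts(items: { status: ItemStatus }[]) {
  const count = (s: ItemStatus) => items.filter((i) => i.status === s).length;
  return {
    succeeded: count("succeeded"),
    failed: count("failed_permanent"),
    retrying: count("failed_retryable"),
    cancelled: count("cancelled"),
    remaining: count("queued"),
  };
}
```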

Job detail page

  • Per-item status table with colour-coded badges
  • Per-card runbook panel: freeze / unfreeze, block (with confirmation guard), report fraud, list recent transactions — all wired to Convex mutations against the mock card store

Metrics dashboard (/metrics)

  • Stats cards: total jobs, AI draft acceptance rate, thumbs-up / thumbs-down counts
  • Top-5 KB gap table: questions where the KB returned no strong match, ranked by frequency
  • Data sourced from append-only metrics_events table (last 1 000 events)
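
A sketch of how a stat like the draft acceptance rate could be derived from that append-only table (the event type names are assumptions, not the prototype's actual schema):

```typescript
type MetricsEvent = {
  type: "draft_accepted" | "draft_rejected" | "thumbs_up" | "thumbs_down" | "kb_gap";
  question?: string;
};

// Acceptance rate over the dashboard's window (last 1,000 events);
// null when there is no accept/reject signal yet.
function acceptanceRate(events: MetricsEvent[]): number | null {
  const recent = events.slice(-1000);
  const accepted = recent.filter((e) => e.type === "draft_accepted").length;
  const rejected = recent.filter((e) => e.type === "draft_rejected").length;
  const total = accepted + rejected;
  return total === 0 ? null : accepted / total;
}
```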

KB article store

  • Convex table with OpenAI embeddings, vector index (by_embedding, 1536 dims)
  • Ingestion script reads from datasets/reap-help-center.jsonl; re-ingestion is idempotent on articleId
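
The "low-confidence KB match" signal behind kb_gap events amounts to thresholding a similarity score. A sketch with cosine similarity; the cutoff value is an assumption, not the prototype's tuned threshold:

```typescript
// Assumed cutoff below which the best KB hit counts as "no strong match".
const KB_GAP_THRESHOLD = 0.75;

// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function isKbGap(bestMatchScore: number): boolean {
  return bestMatchScore < KB_GAP_THRESHOLD; // record a kb_gap metric event
}
```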

Still mocked / out of scope

The prototype does NOT:
  • Call real Reap APIs — card service is a seeded Convex table; no real card is ever modified
  • Send real notifications — notifyCardholders is stored but no email or SMS is dispatched
  • Enforce authentication or RBAC — all users treated as admin; approval gate is flagged but has no enforcement UI
  • Wire bulk_freeze_cards / bulk_notify_cardholders end-to-end — recognised by the LLM but execution path is not implemented
  • Export failure reports as CSV — planned but not yet built
  • Support multi-region or multi-currency policy variations — single flat ruleset
  • Use durable workflow orchestration (Temporal) — Convex scheduled actions are sufficient for the prototype; durability gaps remain under extreme failure scenarios

Iteration roadmap

| Iteration | Focus | Key additions |
|---|---|---|
| v0 — Prototype ✓ | One workflow end-to-end: chat → intent → policy check → confirmation → async execution → retry / cancel | Proves the operating model. LLM pipeline with RAG, real-time job progress, deterministic failure simulation, per-card runbooks, metrics dashboard. |
| v1 — Reusable framework | Same job lifecycle for multiple operation types | Wire bulk_freeze_cards and bulk_notify_cardholders end-to-end. Add RBAC enforcement. Implement CSV failure export. Add approval workflow UI for flagged jobs. |
| v2 — Policy-aware copilot | Richer policy retrieval; better block / approval explanations; escalation notes | Connect to a real policy registry. Add KYC hold rules, regional carve-outs, approval expiry. LLM explains policy blocks in plain language. Draft customer-facing comms for successful operations. |
| v3 — Selective auto-resolution | Auto-resolve only well-understood, policy-safe, low-volume operations with historical metrics validating safety | Auto-approve routine operations (<10 items, no policy conflicts, operation type with >99% historical success rate). Escalate only exceptions. Requires 90 days of v2 data to establish baselines. |