Key Design Decisions and Trade-offs
Decision 1: Bounded planner + deterministic executor
Chosen
AI extracts and explains. Deterministic services validate, approve, and execute.
Example: "Freeze all cards for the London office" → LLM extracts {team: "London", action: "freeze"}; deterministic service resolves the 50 matching card IDs from the card service; executor freezes each one. The LLM never sees or generates a card ID.
Rejected — Fully autonomous agent
Agent calls card APIs directly based on LLM reasoning. A single model error on a 50-card operation is a 50-card problem. Impossible to audit, hard to validate.
Failure case: LLM hallucinates card ID CARD-7731 (off by one digit from CARD-7732) — wrong customer's card is frozen. Operator doesn't notice until the customer calls. No audit trail shows where the ID came from.
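The boundary can be sketched as a schema check plus a deterministic lookup. This is a minimal illustration, not the actual service code: `Intent`, `parseIntent`, `resolveCardIds`, and the in-memory team index are all hypothetical stand-ins for the real card service.

```typescript
// The LLM's only output is a small structured intent; everything else is deterministic.
type Intent = { team: string; action: "freeze" | "unfreeze" };

// Deterministic validation: reject anything outside the schema before execution.
function parseIntent(raw: unknown): Intent {
  const obj = raw as Record<string, unknown>;
  if (typeof obj?.team !== "string" || obj.team.length === 0) {
    throw new Error("invalid intent: missing team");
  }
  if (obj.action !== "freeze" && obj.action !== "unfreeze") {
    throw new Error(`invalid intent: unknown action ${String(obj.action)}`);
  }
  return { team: obj.team, action: obj.action };
}

// Card IDs come from the card service's own index, never from model output.
const cardServiceIndex: Record<string, string[]> = {
  London: ["CARD-1001", "CARD-1002", "CARD-1003"], // stand-in for the real lookup
};

function resolveCardIds(intent: Intent): string[] {
  const ids = cardServiceIndex[intent.team];
  if (!ids) throw new Error(`unknown team: ${intent.team}`);
  return ids;
}

const intent = parseIntent({ team: "London", action: "freeze" });
const ids = resolveCardIds(intent);
```

Because the model can only name a team and an action, the CARD-7731 hallucination from the failure case above is structurally impossible: there is no path for a model-generated ID to reach the executor.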
Decision 2: Item-level tracking (not all-or-nothing transactions)
Chosen
Each card update is an independent item task with its own status. Partial success is reported clearly.
Example: Payroll batch for 47 employees — employee #47's transfer fails due to insufficient balance. The other 46 succeed and are not reversed. Operator retries just #47 after funds are available.
Rejected — Atomic rollback
All 50 cards succeed or the entire job rolls back. Impractical for distributed operations taking minutes. A total rollback on card 47 of 50 is worse for the operator than reporting 46 successes and 4 failures.
Failure case: Employee #47 causes a full rollback — 46 employees who expected to be paid are not, creating urgent complaints and forcing a full retry of the entire batch.
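The core of item-level tracking is that each item carries its own status and retries skip completed items. A minimal sketch, with `ItemTask` and `runBatch` as illustrative names rather than the real schema:

```typescript
type ItemStatus = "pending" | "succeeded" | "failed";
interface ItemTask { id: string; status: ItemStatus; error?: string }

function runBatch(
  tasks: ItemTask[],
  execute: (id: string) => void,
): { succeeded: number; failed: ItemTask[] } {
  for (const task of tasks) {
    if (task.status === "succeeded") continue; // retries skip completed items
    try {
      execute(task.id);
      task.status = "succeeded";
    } catch (e) {
      // Other items are untouched: a failure here never triggers a rollback.
      task.status = "failed";
      task.error = e instanceof Error ? e.message : String(e);
    }
  }
  return {
    succeeded: tasks.filter((t) => t.status === "succeeded").length,
    failed: tasks.filter((t) => t.status === "failed"),
  };
}

// 47 transfers; #47 fails on the first pass, and only #47 runs on the retry.
const tasks: ItemTask[] = Array.from({ length: 47 }, (_, i) => ({
  id: `EMP-${i + 1}`,
  status: "pending",
}));
const firstPass = runBatch(tasks, (id) => {
  if (id === "EMP-47") throw new Error("insufficient balance");
});
const retry = runBatch(tasks, () => {}); // funds now available
```

The first pass reports 46 successes and 1 failure; the retry touches only the failed item, which is exactly the payroll scenario above.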
Decision 3: Async execution with immediate job receipt
Chosen
Return a job receipt immediately after confirmation; process all updates in background; push progress updates.
Example: Agent submits a 50-card freeze, receives JOB-8821 in <200ms, and can immediately start the next ticket while the freeze runs in the background. Progress bar updates every few seconds.
Rejected — Synchronous hold
Block the HTTP connection until all 50 cards update. Connection timeouts on any slow card kill the entire operation. The operator cannot do other work while waiting.
Failure case: Card #23 takes 35s due to a slow upstream call — browser times out at 30s, operation state is unknown, operator must manually check which cards were actually frozen.
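The receipt-then-background pattern reduces to: create a job record, return its ID immediately, and let background work advance a counter that progress queries read. A self-contained sketch with an in-memory store; `JobStore` and the `JOB-` ID format are assumptions for illustration, and the manual loop stands in for a real scheduler:

```typescript
interface Job { id: string; total: number; done: number; status: "running" | "complete" }

class JobStore {
  private jobs = new Map<string, Job>();
  private seq = 0;

  // Returns a receipt immediately; the caller is unblocked before any work runs.
  submit(total: number): Job {
    const job: Job = { id: `JOB-${++this.seq}`, total, done: 0, status: "running" };
    this.jobs.set(job.id, job);
    return job;
  }

  // One unit of background work; in production a scheduler drives this.
  processNext(id: string): void {
    const job = this.jobs.get(id);
    if (!job || job.status === "complete") return;
    job.done += 1;
    if (job.done >= job.total) job.status = "complete";
  }

  // Progress query the UI polls (or subscribes to) for the progress bar.
  progress(id: string): Job | undefined {
    return this.jobs.get(id);
  }
}

const store = new JobStore();
const receipt = store.submit(50); // operator moves on to the next ticket here
for (let i = 0; i < 50; i++) store.processNext(receipt.id); // background work
const final = store.progress(receipt.id);
```

Because state lives in the store rather than an open HTTP connection, a slow card delays one `processNext` call instead of timing out the whole operation, and the operator can always query exactly which items completed.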
Decision 4: Policy enforcement in deterministic rules engine
Chosen
Structured policy registry evaluated deterministically. LLM may retrieve and explain policy; it does not enforce it.
Example: Policy §3.1 blocks limit changes on KYC-held accounts. Rules engine checks this flag on every card before execution — regardless of what the LLM says. Compliance audit shows: "Policy §3.1 v2.4 enforced at 14:03:22 UTC."
Rejected — Policy in LLM system prompt
Include policy documents in the system prompt and rely on the model to apply them. Policy enforcement must be auditable and consistent. LLMs misinterpret nuanced rules and cannot be held accountable the same way code can.
Failure case: Policy has a regional carve-out (APAC accounts exempt from §3.1 under certain conditions). LLM misreads the carve-out and blocks 12 legitimate updates — or worse, allows 3 it shouldn't. Neither outcome is detectable without re-running the prompt.
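A deterministic rules engine encodes the carve-out explicitly, so the same input always yields the same auditable decision. A simplified sketch: the account shape, the single-flag APAC exemption, and `checkLimitChange` are assumptions standing in for the real policy registry, which would carry more conditions:

```typescript
interface Account { kycHold: boolean; region: string }
interface PolicyDecision { allowed: boolean; rule: string; version: string }

// Policy §3.1 v2.4: block limit changes on KYC-held accounts, with the
// regional carve-out written as code rather than left to model judgment.
function checkLimitChange(account: Account): PolicyDecision {
  const base = { rule: "§3.1", version: "v2.4" };
  if (account.kycHold && account.region !== "APAC") {
    return { ...base, allowed: false };
  }
  return { ...base, allowed: true };
}

const blocked = checkLimitChange({ kycHold: true, region: "EMEA" });
const exempt = checkLimitChange({ kycHold: true, region: "APAC" });
```

Every decision carries the rule ID and version, which is what makes the audit line "Policy §3.1 v2.4 enforced at 14:03:22 UTC" possible; re-running the check reproduces the decision exactly, unlike re-running a prompt.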
Decision 5: Convex over a traditional server-side stack
Chosen
Convex as the unified backend — mutations, actions, scheduler, live queries, and vector search in one layer. Eliminates the FastAPI + Redis + Celery + Postgres assembly for the prototype while preserving the same logical architecture.
Example: Convex executor action processing card #31 is retried via ctx.scheduler.runAfter with exponential backoff — the item stays in the in_progress state in the Convex database and the scheduler re-queues it automatically. No watchdog process needed.
Considered — FastAPI + Postgres + Celery
The canonical production stack with explicit durable workers. More natural for Temporal in production and easier to reason about visibility timeouts at scale. Ruled out for the prototype because standing up and wiring four separate services adds lead time without changing the observable behaviour.
Trade-off: At 10× volume the Convex scheduler's per-workspace limits and opaque durability model become constraints — the production path replaces it with Temporal for explicit workflow guarantees across restarts.
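The retry logic the executor applies can be shown independently of Convex. In the real system the re-queue call would be ctx.scheduler.runAfter; this sketch replaces it with a plain delay calculator so the backoff policy is testable standalone, and `RetryState`, `BASE_MS`, and `MAX_ATTEMPTS` are illustrative values, not the production configuration:

```typescript
interface RetryState { attempt: number; status: "in_progress" | "succeeded" | "dead" }

const BASE_MS = 500;
const MAX_ATTEMPTS = 5;

// Exponential backoff: 1s, 2s, 4s, ... capped at 30s after the first failure.
function backoffMs(attempt: number): number {
  return Math.min(BASE_MS * 2 ** attempt, 30_000);
}

// After a failed attempt the item stays in_progress and is re-queued
// (via the scheduler) until the attempt budget is exhausted.
function nextStep(state: RetryState): { state: RetryState; requeueAfterMs?: number } {
  const attempt = state.attempt + 1;
  if (attempt >= MAX_ATTEMPTS) {
    return { state: { attempt, status: "dead" } }; // surfaced as an item failure
  }
  return { state: { attempt, status: "in_progress" }, requeueAfterMs: backoffMs(attempt) };
}

const step = nextStep({ attempt: 0, status: "in_progress" });
```

Because the state machine lives in the database and the delay in the scheduler, no separate watchdog is needed to notice a stalled item; at 10× volume the same state machine would be driven by Temporal instead.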