A lot of the writing about agentic AI right now describes architectures that haven't shipped. Frameworks, papers, diagrams — the pattern of separating an LLM planner from a deterministic execution layer is well-described in academic work and a handful of open-source toolkits. The shape is well-known on paper.
What I haven't seen as much of is a plain, honest account of what that architecture actually looks like once a real product is running on top of it. So that's what this is.
We built an ecommerce automation platform — a system that helps sellers manage product listings across seven marketplaces (Amazon, eBay, Shopify, and others) from one place. We didn't set out to build "an agent." We set out to make a hard product workable. Halfway through, we realized the only way to make it workable was agentic.
This is the version of agentic architecture I trust — the one where the shape emerged from the constraints of the problem, not from a desire to use the latest pattern.
The Problem
A seller managing seven marketplaces is doing the same thing seven times. Create a listing on Amazon. Create the same listing on eBay. Update the price on Shopify when a competitor drops theirs. Sync inventory daily. Pause a listing when stock runs out. Reopen it when it comes back.
The naive solution is a UI that wraps a bunch of APIs — checkboxes for which marketplaces, forms for product data, buttons for each action. We started there. It didn't work.
It didn't work because the actual user request was never "create this listing on Amazon." It was always something more like:
"List this product on Amazon and eBay, sync the inventory daily, and update prices on Shopify when competitors drop theirs by more than 5%."
That's not a button click. That's a workflow. With dependencies, conditions, retries, and steps that need to keep running long after the user closes the tab.
The buttons-and-forms UI fell apart the moment the user wanted any composition. Every new combination meant a new screen. The product was unbuildable as a traditional CRUD app.
The Architecture That Emerged
Once we accepted that every meaningful user action was actually a workflow, the system had to be redesigned around that fact. What we ended up with is four pieces, with a strict separation of who decides what.

Four pieces, each with a job nobody else can do.
1. The activity layer
Every meaningful action — create a listing, update a price, sync inventory, pause a product — is its own activity. Activities are grouped by domain across one or more repositories, but the boundary that matters isn't the repo. It's that each activity is independent and registered with Temporal. Where it lives in the codebase is a deployment detail. What it does and how it can be invoked is the contract.
What matters more than the implementation: each activity declares itself. It tells the system its name, whether it's a safe read or a sensitive write, what parameters it expects, what shape its output takes, and what errors it can produce.
That self-declaration matters a lot. It's the foundation of everything that comes later.
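To make the idea concrete, here is a minimal sketch of what an activity's self-declaration could look like. The type and field names (ActivityDescriptor, Sensitivity, Registry) are illustrative assumptions, not the platform's actual schema:

```go
package main

import "fmt"

// Sensitivity marks whether an activity is a safe read or a sensitive write.
type Sensitivity string

const (
	Read  Sensitivity = "read"
	Write Sensitivity = "write"
)

// ActivityDescriptor is what each activity declares about itself: its name,
// its sensitivity, the parameters it expects, the shape of its output, and
// the errors it can produce. (Illustrative, not the real wire format.)
type ActivityDescriptor struct {
	Name        string
	Sensitivity Sensitivity
	Params      map[string]string // parameter name -> type
	Output      string            // shape of the result
	Errors      []string          // errors this activity can produce
}

// Registry is the shared catalog that the orchestrator, the LLM, and the
// visual interface all plan against.
var Registry = map[string]ActivityDescriptor{}

func Register(d ActivityDescriptor) { Registry[d.Name] = d }

func main() {
	Register(ActivityDescriptor{
		Name:        "create_listing",
		Sensitivity: Write,
		Params:      map[string]string{"marketplace": "string", "product_id": "string"},
		Output:      "listing_id",
		Errors:      []string{"rate_limited", "invalid_product"},
	})
	fmt.Println(Registry["create_listing"].Sensitivity) // write
}
```

One catalog, many consumers: that single map is what lets the LLM, the visual builder, and the docs all stay in sync.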
2. The orchestrator
A Go service running on Temporal. Its job: take a list of tasks, validate them, resolve dependencies, and run them as a durable workflow.
The orchestrator handles all the things that used to be the hard parts of distributed systems — retries, state, idempotency, partial failure recovery, parallel execution. None of that is novel. What's novel is that the LLM doesn't touch any of it.
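The validation and dependency-resolution step can be sketched in a few lines of plain Go. This is Kahn's topological sort over a task list; the Task shape and error messages are my assumptions, not the production code:

```go
package main

import (
	"fmt"
	"sort"
)

// Task is one step in a plan; DependsOn names tasks that must finish first.
type Task struct {
	ID        string
	Activity  string
	DependsOn []string
}

// ResolveOrder validates a plan and returns an execution order in which every
// task runs after its dependencies (Kahn's algorithm). It rejects plans that
// reference unknown tasks or contain a dependency cycle.
func ResolveOrder(plan []Task) ([]string, error) {
	known := map[string]bool{}
	for _, t := range plan {
		known[t.ID] = true
	}
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for _, t := range plan {
		indegree[t.ID] += 0 // ensure every task has an entry
		for _, dep := range t.DependsOn {
			if !known[dep] {
				return nil, fmt.Errorf("task %s depends on unknown task %s", t.ID, dep)
			}
			indegree[t.ID]++
			dependents[dep] = append(dependents[dep], t.ID)
		}
	}
	var ready []string
	for id, d := range indegree {
		if d == 0 {
			ready = append(ready, id)
		}
	}
	sort.Strings(ready) // deterministic order among independent tasks
	var order []string
	for len(ready) > 0 {
		id := ready[0]
		ready = ready[1:]
		order = append(order, id)
		next := dependents[id]
		sort.Strings(next)
		for _, n := range next {
			indegree[n]--
			if indegree[n] == 0 {
				ready = append(ready, n)
			}
		}
	}
	if len(order) != len(plan) {
		return nil, fmt.Errorf("plan contains a dependency cycle")
	}
	return order, nil
}

func main() {
	plan := []Task{
		{ID: "list_amazon", Activity: "create_listing"},
		{ID: "list_ebay", Activity: "create_listing"},
		{ID: "sync_inventory", Activity: "sync_inventory", DependsOn: []string{"list_amazon", "list_ebay"}},
	}
	order, err := ResolveOrder(plan)
	fmt.Println(order, err) // [list_amazon list_ebay sync_inventory] <nil>
}
```

In production this ordering is handed to Temporal, which runs independent tasks in parallel; the sketch only shows the validation logic that happens before anything executes.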
3. The chat gateway
A separate service that holds the LLM, manages conversation context, and exposes the catalog of available activities to the model. When the user types a sentence, the chat gateway translates that intent into a structured task list. Then it hands that list to the orchestrator.
The LLM's role ends at producing the plan. It doesn't execute. It doesn't retry. It doesn't decide what to do when something fails.
4. The visual interface
For users who already know what they want, typing sentences is slower than building a workflow visually. A drag-and-drop interface, n8n-style, lets power users compose explicit task lists and hand them directly to the orchestrator. Same backend. No LLM in the path.
The Decision That Mattered Most
Most agentic systems being built right now make the LLM responsible for both planning and execution. The LLM picks tools, the LLM retries on failure, the LLM tracks state, the LLM decides when to give up. Every one of those is a place the LLM is making a decision it isn't reliable enough to make.

We made a different bet. The LLM plans. Temporal executes. The orchestrator validates.
That separation is the thing that made the system actually work in production. When a workflow fails halfway through — and it will, because marketplaces have outages, rate limits, and inconsistent APIs — the recovery is deterministic. Temporal knows exactly which step failed, has the inputs preserved, and retries the same step. No LLM in that loop. No fresh interpretation of what to do next. The same retry, every time.
If the LLM were the runtime, every retry would be a coin flip. Sometimes it would retry the right thing. Sometimes it would re-plan and pick a different approach. Sometimes it would hallucinate that it had succeeded. None of those are acceptable behaviors when real money and real listings are involved.
The framing I'd use now: the LLM is the language the user speaks. Everything else is the machinery that makes the language safe to act on.
The Permission System
This part doesn't get written about much, but it's where the architecture either holds up or falls apart.
The model is simple: authorize once for the whole workflow, not once per step.
When a plan arrives at the orchestrator — whether it came from the chat gateway or the visual interface — the first thing the orchestrator does is identify the user, look up their role, and check the permissions for every activity in the proposed plan against that role up front. One authorization decision, taken before the first activity runs, covering the entire chain of work.
If any step in the plan is something the user isn't allowed to do, the whole plan is rejected before execution starts. If every step is allowed, the workflow runs end-to-end without checking permissions again.
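The gate itself is almost trivially small, which is part of why it's easy to reason about. A minimal sketch, assuming a role is representable as a set of allowed activity names (the function and parameter names are mine, not the platform's):

```go
package main

import "fmt"

// AuthorizePlan checks every activity in a proposed plan against the role's
// permissions before anything runs. Either the whole plan is allowed, or the
// whole plan is rejected: one decision covering the entire chain of work.
func AuthorizePlan(activities []string, allowed map[string]bool) error {
	for _, a := range activities {
		if !allowed[a] {
			return fmt.Errorf("plan rejected: role may not run %q", a)
		}
	}
	return nil
}

func main() {
	readOnly := map[string]bool{"get_listing": true, "get_inventory": true}
	// A read-only user asking to delete listings: rejected at the gate,
	// before any step executes.
	err := AuthorizePlan([]string{"get_listing", "delete_listing"}, readOnly)
	fmt.Println(err)
}
```

Because the check runs on the plan rather than on each step at execution time, it works identically whether the plan came from the LLM or from the visual builder.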
This matters for a few reasons. Workflows can run for minutes or hours — re-checking auth at every step would mean repeatedly hitting the user store, dealing with token refreshes mid-workflow, and creating windows where a half-executed workflow could leave the system in a broken state if a permission changed between steps. One check at the gate is cleaner, faster, and easier to reason about.
This is what makes the system multi-tenant safe. A user with read-only access can ask the chat gateway to delete every listing they own. The LLM may propose a deletion plan. The orchestrator rejects the entire plan at the gate — nothing executes. The decision about what's allowed is not in the model. It's in the rules the model can't override.

What I Didn't Expect
A few things surprised me along the way.
The visual interface mattered more than the chat did. We built the chat first because it felt like the more impressive feature. But power users — the ones doing this every day — don't want to type. They want to compose explicit workflows once and run them on a schedule. The chat is what brings new users in. The visual interface is what keeps them.
Self-describing activities became the most leveraged part of the system. Once each activity declares its own name, parameters, and output format, you get a lot for free: the LLM has a real catalog to plan against, the visual interface auto-generates its node library, the documentation writes itself, and onboarding a new developer means pointing them at a single activity instead of explaining the whole system.
Authorizing the whole plan up front, instead of per step, kept the system honest. It's easy to default to per-step permission checks because each one feels safer in isolation. In practice, per-step checks created edge cases — a workflow could be halfway through when a permission state changed, and recovery from that is genuinely hard. Authorizing the entire plan once, at the gate, made every workflow either fully allowed or fully refused. No half states. Easier to reason about, easier to audit, easier to debug.
When This Architecture Is the Right Answer
This kind of system is expensive to build. It's only worth it under specific conditions.
It's the right answer when the workflow shape isn't known until the user asks. If users compose actions in different ways, or string steps together that no developer pre-built into a single endpoint, you need an orchestrator that can run arbitrary plans.
It's the right answer when you have multiple user surfaces with different expertise levels. Non-technical users want chat. Power users want explicit visual workflows. Both can sit on the same backend if the backend is designed around plans rather than endpoints.
It's the right answer when state, retries, and audit matter. Workflows that span minutes or hours, touch multiple systems, and must recover cleanly from failure — that's where Temporal earns its complexity.
It's the wrong answer when the workflow is the same every time. If your users always do the same five steps in the same order, write that as a function. You don't need an orchestrator. You don't need an LLM. You're paying for flexibility you'll never use.
It's also the wrong answer when users need certainty about what will happen. An LLM in the loop, even a planning-only one, introduces variability. If the user wants exactly the same outcome every time, the LLM should not be in the user's path at all.
What I'd Tell a Founder Considering This Today
Agentic architecture buys you flexibility at the cost of determinism. If your business needs the flexibility — if your users compose workflows in ways no developer can pre-anticipate — the cost is worth it. If your business doesn't, you're buying complexity you don't need, and you'll spend a year solving problems you didn't have to take on.
The shape that worked for us: the LLM is the language; the runtime is deterministic. I think that pattern will hold up for most production systems, regardless of which framework or model is in fashion next year.
We didn't set out to build an agent. We set out to ship a product. The agentic shape emerged from the actual constraints of the problem. That's the version of "agentic" I trust — the one nobody started by trying to build.
If you're scoping a system that has to handle composable, multi-step user workflows — or you're stuck trying to make an LLM both plan and execute and watching it break in production — this is the kind of problem we work on at BlueSoft. Happy to compare notes.




