Back to Blog
AI & Automation7 min read

Drop an AI Agent Into Any App: A Tech-Agnostic Architecture with ACP and MCP

Alex Ozhima
|April 28, 2026

Most teams that try to add a "chat with the app" experience end up writing a custom Anthropic- or OpenAI-SDK client, hand-rolling tool dispatch, baking in retries, building their own permission UI, and then hard-coding the whole thing to one model family.

A year later they're rewriting it because GPT-5 ships, or Claude 4.7 ships, or skills ship, or sub-agents ship, and their custom client has fallen behind every off-the-shelf coding agent.

There's a better pattern. Run an existing agent as a subprocess. Expose your app logic as MCP tools. Wrap it in a thin WebSocket bridge. The result is tech-agnostic — your frontend, backend, and even the agent itself become swappable parts.

I've been running this in production on an operator-facing product. Here's the architecture, why each boundary is where it is, and how to make it work with whatever stack you're already on.

The Problem With Button-Per-Verb UIs

Every internal tool eventually faces the same wall.

You start with a clean dashboard. Users want to "add a phone number to a contact" — you build a button. They want to "log a research note" — another button. They want to "rewrite a draft in a different tone" — modal, form, preview, save. Six months later, the right rail of your detail page is a Christmas tree of action buttons and your design team is asking for a redesign.

A chat panel collapses all of this. The operator types "the direct line is 555-0142, log it and re-enrich" — the agent calls three tools in sequence and the work is done. No new buttons. No new modals. No new training docs.

But the chat panel only works if the agent can actually do things. Which is the architecture problem.

The Architecture in One Diagram

Two client surfaces, one agent core. The web client talks through a WebSocket bridge; the TUI cuts out the middleman and pumps stdio directly. Both end up at the same MCP boundary, hitting the same API. The interesting part is what's at each boundary, because that's what makes it tech-agnostic.

The TUI deserves a special mention. It's not just a development tool — it's a real client. Internal users who live in the terminal (engineers, ops, power operators) often prefer a pnpm chat --candidate <id> invocation over opening a browser tab. Because the TUI speaks stream-json directly to the agent subprocess, you can ship the TUI without ever building the WebSocket bridge — it's the smallest viable deployment of the whole pattern.

Boundary 1: The Frontend Doesn't Know There's an Agent

The browser opens a WebSocket and sends JSON frames:

{"type": "user", "text": "log 555-0142 as a direct phone"}

It receives back:

{"type": "delta",    "text": "Logging 555-0142..."}
{"type": "tool_use", "name": "add_candidate_contacts", "input": {...}}
{"type": "result",   "cost_usd": 0.0123}

That's it. The frontend doesn't know which agent is running. It doesn't know which model. It doesn't know what tools exist. It renders streaming text and a thin "tool was called" badge so the operator sees what's happening.

This works in Next.js, SvelteKit, Remix, Astro, plain Vite + React, even a plain HTML page with a WebSocket. The contract is six message types. Anyone who can hold a WebSocket open can host the chat panel.

Boundary 2: The Bridge Speaks ACP On One Side, WebSocket On the Other

The bridge is the only piece that has to know about ACP. Its job is small:

  1. Accept a WebSocket from the browser
  2. Spawn the agent (an ACP server) as a subprocess
  3. Translate user turns from the WebSocket frame format into ACP messages on the agent's stdin
  4. Translate ACP events from the agent's stdout into the browser's frame format
  5. Tear everything down on disconnect

In our case the bridge is FastAPI (~150 lines), but the bridge could equally be Express, Fastify, Rails Action Cable, Phoenix, or a Cloudflare Worker. There's no business logic here. The bridge is an ACP-to-WebSocket translator, not a brain.

Boundary 3: The Agent Is an ACP Server

This is where most teams overcomplicate things. You don't need to build an agent. You just need one that speaks ACP — the Agent Client Protocol — over stdio. ACP is the open spec (originally from Zed) that standardizes how a host application drives a coding agent: how user turns are sent in, how partial assistant deltas, tool calls, and tool results stream out, how permission requests are negotiated.

Your bridge (or TUI) opens an ACP session against the agent subprocess and pumps frames in both directions. That's the only contract you need to honor. Everything else — which model, which provider, how reasoning happens, how tools are executed — is the agent's problem.

The mature ACP-speaking agents you can drop in today:

Claude Code (claude --print --input-format stream-json --output-format stream-json). Anthropic's CLI agent. Stream-json is its ACP-compatible wire format. You get skills, MCP, sub-agents, web search, and model upgrades for free. This is what we run in production.

OpenAI Codex CLI (codex). Open-source coding agent from OpenAI; ACP-compatible, MCP-aware.

Gemini CLI (Google). Open-source, supports ACP and MCP natively.

Goose (Block's open-source agent). Native MCP, multiple model providers, runs as a CLI or as a server, ACP-compatible.

OpenCode / Aider / Continue / Cline if you want fully open-weight or fully OSS. All of them support MCP for tool calling and can be invoked as subprocesses; ACP support varies but the wire formats are close enough that a thin adapter is feasible.

The architectural payoff: when the next agent ships with better reasoning, smaller context, or cheaper tokens, you change one config line. ACP is the dependency-injection seam for the LLM era.

Boundary 4: MCP Is Where You Define the Agent's Power

This is the most important boundary in the whole design.

The agent doesn't talk to your database. It doesn't talk to your queue. It doesn't talk to your auth system. It talks to one MCP server that you wrote, and that MCP server is the only thing that talks to your app's API.

@server.tool()
async def add_candidate_contacts(candidate_id: str, emails: list[str], phones: list[str]):
    """Append verified contact info to a candidate's lead record."""
    return await http.post(f"{API}/candidates/{candidate_id}/contacts",
                           json={"emails": emails, "phones": phones})

Every tool is a thin HTTP wrapper. The MCP server has no business logic. Validation, dedup, audit logging, queue side-effects — all of it lives in the API layer where it already lived before the agent existed.

Why this matters: scope is a list, not a flag

The agent can do exactly what you list as tools. Nothing more.

Want a read-only chat for a customer-facing feature? Register get_* and list_* tools, omit every mutator. The agent literally cannot write — it has no verb for it.

Want a destructive tool gated behind confirmation? Register it, then add a one-line skill or system prompt rule that says "always ask for explicit confirmation before calling send_email". The agent will. (Belt-and-braces: also enforce it in the API route.)

Want different surfaces for different user roles? Spawn the MCP server with a role-scoped env var and conditionally register tools. An admin's chat sees delete_*; an analyst's chat doesn't.

This is the part teams who roll their own agent client get wrong. They put auth logic, scope logic, and rate limits inside the LLM-calling code. Then they have two truth-checkers for the same invariant — one in the agent layer, one in the API. MCP forces you to pick one place: the API layer, where it belongs.

Per-conversation isolation falls out for free

Each WebSocket spawns its own agent subprocess and its own MCP subprocess. Two operators in two tabs share zero state. When one disconnects, both subprocesses die. There's no in-memory session store to corrupt, no cross-conversation token leak, no "did I clear the history?" footgun. The OS does the isolation work.

Boundary 5: The API Doesn't Know the Agent Exists

The mutation endpoints the agent hits are the same ones a human-driven UI would hit. Append a contact. Log a fact. Edit a draft. Approve a draft.

If you already have a REST or RPC layer for your product — and you do, because your normal frontend uses it — you don't write a single new endpoint for the agent. You expose the existing ones as MCP tools.

That's the secret to "tech-agnostic" working. Your FastAPI / Express / Rails / Django app doesn't change. It doesn't import an SDK. It doesn't know any of this is happening. The agent is just another HTTP client with its own user-agent string.

Skills: Behavior That Travels With Your App

There's one extra piece worth knowing about. Mature agents like Claude Code support skills — Markdown files with frontmatter that the agent loads when their description matches the user's intent.

.claude/skills/
├── outreach-playbook.md      # Voice and structure for emails
├── data-mutation-rules.md    # "Always confirm before send_email"
└── domain-glossary.md        # Project-specific vocabulary

Skills check into your repo. They're how you encode operator playbooks that the agent follows automatically — the things that aren't tools but are also not generic LLM knowledge. Tone rules. Confirmation gates. House style. Domain shortcuts.

They're not a replacement for API-level safety. They're a soft layer that makes the agent behave like a senior teammate instead of a literal tool dispatcher.

What This Stack Actually Costs to Build

In our codebase:

  • MCP server: ~200 lines. Twelve tools, each a 5-line HTTP wrapper.
  • Agent bridge: ~150 lines. One WebSocket route, one subprocess pump.
  • Web chat panel: ~250 lines. WebSocket client, streaming text, tool-use badges.
  • API mutation routes: existed already. Three new endpoints to round out coverage.

Less than a thousand lines of glue, and we got a fully working in-app agent that streams, calls tools, runs skills, and isolates per conversation. Compare that to writing your own SDK client with retry logic, tool dispatch, and permission handling — easily 3-5x more code, and you'd still be behind on features the next time the agent vendor ships an update.

The Drawback: Generic Beats Specific, Until It Doesn't

Be honest about what an agent loop costs.

A chat UI is generic. The same panel handles "log this phone number," "rewrite the draft in a friendlier tone," and "find the actual decision-maker at this company." For non-obvious tasks — anything where the operator doesn't know the exact sequence of steps in advance — that flexibility is the whole point. The agent thinks, calls tools, reacts to results, and converges on an answer the UI designer never anticipated.

But for well-defined workflows, an agent is the wrong tool.

If your operator's job is "open the candidate, paste a phone number into a field, click Save," a button-and-form takes ~200ms and zero LLM tokens. The agent equivalent is: parse the user's message, decide which tool to call, hit the API, stream back a confirmation. That's 2-5 seconds and a few cents per turn. For a workflow that runs 500 times a day, you're paying for latency and inference on a path that doesn't need either.

The pragmatic split:

  • High-frequency, well-defined verbs → keep the button. Forms are faster, cheaper, and don't hallucinate.
  • Low-frequency, ambiguous, or composite tasks → put them behind chat. Anything that would otherwise require a wizard, a multi-step modal, or a "contact support" link is a candidate.
  • Read-mostly investigation ("why is this candidate stalled?", "summarize what we know") → almost always belongs in chat. Static UIs can't compose novel queries on demand.

The chat panel and the buttons coexist. They're not competing UIs — they're handling different shapes of work.

The Three Rules

If you take nothing else from this:

  1. The agent runs as a subprocess. Don't build an SDK client. Drive an existing agent over stdio. You inherit every feature it ships from now on.
  2. MCP is the only thing the agent can do. Your tool list is your security boundary. Read-only? Don't register mutators. Mutator-with-confirmation? Use the existing API route's validation, plus a skill rule.
  3. Your app API doesn't change. If the agent needs a new verb, add it to the API like any normal feature, then expose it as one MCP tool. Don't put business logic in the agent layer.

Get those three boundaries right and the rest of the stack — Next.js or Svelte, FastAPI or Rails, Claude or Codex or Goose — becomes implementation detail.


Thinking about adding an agent to your product? Don't hesitate to give us a call — we've shipped this pattern in production and we're happy to talk through the trade-offs for your stack. Book a 30-minute call.

Alex Ozhima

Alex Ozhima

Founder & CEO at Katlextech

Ready to Ship Your Product?

Let's discuss how we can implement these strategies for your business