the problem with giving AI agents real tools
so at Upwork we've been building this thing called Fusion Studio — it's basically a frontend factory where AI agents build UI components from design specs. the catch is that these agents need access to everything. GraphQL schemas, our design system docs, browser automation, staging environments, issue tracking. and each of those systems has its own API, its own auth, its own quirks.
at first we tried the obvious thing: just shove context into the prompt. here's the schema, here's the component docs, here's how to deploy. but that falls apart fast. you're burning tokens on static context, the agent can't query for what it needs, and you end up with these massive system prompts that are 80% irrelevant to the current task.
that's when we found MCP.
what MCP actually gives you
Model Context Protocol is basically a typed tool interface for AI agents. instead of cramming everything into context, you expose capabilities as tools that the agent can call when it needs them. the agent sees a list of available tools with descriptions and typed parameters, and it decides what to call and when.
here's what two of our tool definitions look like:
    const tools = [
      {
        name: "query-graphql-schema",
        description: "Introspect the GraphQL schema for a domain",
        parameters: {
          domain: { type: "string", enum: ["identity", "catalog", "messaging"] },
        },
      },
      {
        name: "get-component-docs",
        description: "Get Fluid Design System component documentation",
        parameters: {
          component: { type: "string" },
        },
      },
    ]

the key insight is that the agent doesn't need to know how these tools work. it just knows what they do and what parameters they take. the MCP server handles all the messy stuff — auth, caching, rate limiting, error handling. the agent just calls get-component-docs with { component: 'FluidButton' } and gets back structured documentation.
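to make that concrete, here's a rough sketch of the dispatch layer a server like this sits on top of. this is a simplified stand-in, not the actual MCP SDK — the `callTool` function, the handler map, and the tiny docs store are all invented for illustration:

```typescript
// simplified stand-in for an MCP server's tool dispatch — not the real SDK.
// each tool name maps to a handler; the agent only ever sees the tool's
// name, description, and parameter schema, never the handler body.
type ToolHandler = (params: Record<string, string>) => unknown

const handlers: Record<string, ToolHandler> = {
  "get-component-docs": ({ component }) => {
    // hypothetical in-memory docs store standing in for the real index
    const docs: Record<string, { props: string[]; usage: string }> = {
      FluidButton: { props: ["variant", "size", "onClick"], usage: "<FluidButton variant='primary' />" },
    }
    return docs[component] ?? { error: `no docs for ${component}` }
  },
}

function callTool(name: string, params: Record<string, string>): unknown {
  const handler = handlers[name]
  if (!handler) throw new Error(`unknown tool: ${name}`)
  return handler(params)
}

// the agent's side of the conversation is just this:
const result = callTool("get-component-docs", { component: "FluidButton" })
```

all the auth, caching, and rate limiting lives behind the handler; the agent's mental model stays at "name + parameters in, structured data out."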
the seven servers
we ended up building seven MCP servers, each one handling a different capability:
graphql schema introspection — this one connects to our federated graph and lets agents explore types, queries, and mutations by domain. when an agent needs to build a component that fetches data, it introspects the schema first instead of guessing at field names.
fluid design system docs — our design system has over 6,000 components. this server indexes all of them and serves component docs, prop types, usage examples, and accessibility notes on demand. without this, the agent would need the entire design system in context, which just isn't feasible.
browser automation — a Playwright-backed server that lets agents navigate staging, take screenshots, fill forms, and verify their own work. this is the one that made the biggest difference honestly. agents can now look at what they built and iterate.
workflow state management — tracks where each component is in the build pipeline. agents can check status, update stages, and coordinate handoffs between different phases of the build process.
code generation templates — serves project scaffolding, boilerplate patterns, and framework-specific templates. instead of the agent reinventing file structure every time, it pulls the canonical template and customizes from there.
staging environment management — handles deploying preview builds, managing environment variables, and routing to the right staging instance. agents can deploy their work and get a URL back to verify against.
linear issue tracking — reads and updates Linear issues so agents can check requirements, update status, and leave notes about implementation decisions. keeps the human team in the loop without manual status updates.
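as one concrete example from the list above, here's roughly the shape of the schema introspection tool. the endpoint map and the trimmed query are invented for illustration — the real server runs GraphQL's standard introspection query against our federated graph:

```typescript
// hypothetical domain → subgraph endpoint map; real routing lives server-side
const endpoints: Record<string, string> = {
  identity: "https://graph.internal/identity",
  catalog: "https://graph.internal/catalog",
  messaging: "https://graph.internal/messaging",
}

// GraphQL's introspection query, trimmed to what agents usually need:
// type names, kinds, and field names — enough to stop guessing at fields
const introspectionQuery = `
  query {
    __schema {
      types { name kind fields { name type { name } } }
    }
  }
`

// sketch: POST the introspection query to the domain's subgraph
async function queryGraphqlSchema(domain: string): Promise<unknown> {
  const endpoint = endpoints[domain]
  if (!endpoint) throw new Error(`unknown domain: ${domain}`)
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ query: introspectionQuery }),
  })
  return (await res.json()).data.__schema
}
```

scoping by domain matters here: a federated graph's full introspection result is huge, and the enum on the tool's `domain` parameter keeps agents from pulling the whole thing.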
why this abstraction works
the thing I keep coming back to is that MCP gives you composability. each server is independent. you can add a new capability without touching the others. an agent working on a component might call the design system server, then the GraphQL server, then the browser server to verify — and each call is just a typed function invocation.
the alternative was building custom integrations for each agent-to-system connection. that's n×m complexity where n is agents and m is systems. MCP makes it n+m. every agent speaks the same protocol, every system exposes the same interface.
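the arithmetic is worth spelling out. with a hypothetical five agents against our seven systems:

```typescript
// custom integrations: every agent needs its own adapter for every system
const agents = 5
const systems = 7
const customIntegrations = agents * systems // 35 adapters to build and maintain

// MCP: each agent speaks the protocol once, each system exposes it once
const mcpIntegrations = agents + systems // 12
```

and the gap widens every time you add an agent or a system — each new agent costs one integration instead of seven.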
    // agent workflow: build a component from a spec
    const schema = await mcp.call("query-graphql-schema", { domain: "catalog" })
    const docs = await mcp.call("get-component-docs", { component: "FluidCard" })
    // ... agent generates code using schema + docs ...
    const preview = await mcp.call("deploy-staging", {
      branch: "feat/catalog-card",
    })
    const screenshot = await mcp.call("browser-screenshot", { url: preview.url })

each line is a different MCP server. the agent doesn't care. it's all just tools.
what I'd do differently
honestly the hardest part wasn't building the servers — it was getting the tool descriptions right. agents are surprisingly sensitive to how you describe a tool. vague descriptions lead to wrong tool selection. overly specific descriptions lead to agents not finding the tool when they need it. we went through probably a dozen iterations on descriptions alone before things felt reliable.
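to show the kind of iteration we mean — these three strings are illustrative reconstructions, not our actual descriptions:

```typescript
// too vague: agents reached for this on any styling or UI question
const v1 = "Get component info"

// too specific: agents missed it unless the task literally said "documentation"
const v2 = "Retrieve the canonical documentation page for a named Fluid Design System component"

// better: says what comes back AND when to use it
const v3 =
  "Get docs for a Fluid Design System component: props, usage examples, " +
  "accessibility notes. Call this before writing code that renders the component."
```

the pattern that seemed to work: describe the tool's output and the situation that should trigger it, not its internal mechanics.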
also: caching matters more than you'd think. the design system server was getting hammered with the same component lookups over and over. adding a simple LRU cache cut our response times in half and saved a ton of unnecessary API calls.
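the cache itself is nothing fancy. a Map-based LRU like this sketch covers it (the size and key shape here are illustrative — Map preserves insertion order, which does all the work):

```typescript
// minimal LRU cache on top of Map, which iterates in insertion order.
// on every hit we delete + re-set the key so it moves to the "newest" end;
// when we overflow, we evict the first key in iteration order (the oldest).
class LruCache<V> {
  private map = new Map<string, V>()
  constructor(private maxSize = 500) {}

  get(key: string): V | undefined {
    const value = this.map.get(key)
    if (value === undefined) return undefined
    this.map.delete(key)
    this.map.set(key, value) // refresh recency
    return value
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key)
    this.map.set(key, value)
    if (this.map.size > this.maxSize) {
      // evict the least-recently-used entry
      const oldest = this.map.keys().next().value as string
      this.map.delete(oldest)
    }
  }
}

// usage: front the design-system lookup with the cache
const docsCache = new LruCache<string>(2)
docsCache.set("FluidButton", "button docs")
docsCache.set("FluidCard", "card docs")
docsCache.get("FluidButton") // touch → FluidButton is now most recent
docsCache.set("FluidModal", "modal docs") // evicts FluidCard, the oldest
```

component docs change rarely and get requested constantly, which is exactly the access pattern an LRU rewards.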
seven servers sounds like a lot but each one is pretty focused — maybe 200-400 lines of actual logic. the MCP SDK handles the protocol layer so you're really just writing the business logic for each tool. if you're building AI agents that need access to real systems, I'd seriously look at MCP before building custom integrations. it's the right level of abstraction.