Designing Great MCP Tool Schemas: Best Practices and Anti-Patterns

The single biggest factor that separates a great Model Context Protocol (MCP) server from a frustrating one is tool schema design. The same underlying capability — say, "search the database" — can be a delight or a disaster depending on how you shape its inputs, outputs, and description.

This article distils the patterns that consistently make tools easy for an LLM to use well, and the anti-patterns that make tools quietly useless. Every rule below comes with a concrete example.

The principle to remember

An LLM picks a tool based on its description, calls it with arguments matching the schema, and reads the text response. Schema design is about making each of those three steps unambiguous.

Get it right and the LLM uses your tool fluidly. Get it wrong and the LLM either ignores the tool, calls it with wrong arguments, or fails to parse the response.

Rule 1: Write descriptions for the LLM, not the developer

Developers think descriptions should explain what the tool does. The LLM needs to know when to use it.

// ❌ Developer-facing description
server.tool('weather', 'Weather utility', {...}, ...);

// ❌ Slightly better, still vague
server.tool('weather', 'Returns weather data', {...}, ...);

// ✅ Tells the LLM exactly when to invoke
server.tool(
  'get_weather',
  'Get current weather conditions (temperature, conditions, humidity) for a specific city. Use whenever the user asks about temperature, rain, snow, or general weather in any location.',
  {...},
  ...
);

The rule of thumb: imagine the user asks a question. If the LLM could read only the tool description, would it know this is the right tool? If yes, the description is good.

A harder test: a description should help the LLM not call the tool when it is the wrong fit. "Get current weather conditions" implicitly tells the LLM this is for current weather, not historical or forecast — useful negative information.

Rule 2: Name tools as intents, not endpoints

Wrap your REST API as create_issue, not post_issues. The LLM reasons in intent — verbs from the user's perspective — not HTTP semantics.

// ❌ Mirrors the REST endpoint
server.tool('post_orders', '...', {...}, ...);
server.tool('get_orders_by_status', '...', {...}, ...);
server.tool('patch_order', '...', {...}, ...);

// ✅ Reads like natural intent
server.tool('create_order', '...', {...}, ...);
server.tool('find_orders_with_status', '...', {...}, ...);
server.tool('update_order', '...', {...}, ...);

Snake-case verb-noun names map naturally to user intent. The LLM consistently picks them better than HTTP-shaped names.

Rule 3: Use specific names for specific actions

Resist the urge to ship a do_anything super-tool. A small army of single-purpose tools beats one omnibus tool every time.

// ❌ Too generic — the LLM struggles to know what "resource" or "action" means
server.tool('execute', 'Perform an action on a resource', {
  resource: z.string(),
  action: z.string(),
  params: z.any(),
}, ...);

// ✅ Specific tools the LLM picks confidently
server.tool('create_user', '...', { email: z.string(), name: z.string() }, ...);
server.tool('disable_user', '...', { userId: z.string() }, ...);
server.tool('reset_user_password', '...', { userId: z.string() }, ...);

The specific tools also let you write better descriptions and tighter schemas. You can have 30 specific tools that all work well — but you cannot have one generic tool that works well.

Rule 4: Make required arguments truly required

If an argument is mandatory, mark it required. If it has a sensible default, make it optional with a default. Halfway states ("required but with a fallback") confuse the LLM.

// ❌ Half-defined — what happens if limit is missing?
server.tool('search', '...', {
  query: z.string(),
  limit: z.number(),  // required? optional with what default?
}, async ({ query, limit }) => {
  const max = limit || 20;  // implicit default, schema does not say
  ...
});

// ✅ Schema reflects the actual contract
server.tool('search', '...', {
  query: z.string(),
  limit: z.number().int().min(1).max(100).default(20),
}, async ({ query, limit }) => {
  // limit is always present and within bounds
  ...
});

The .default(20) clause means Zod fills in 20 automatically if the LLM omits the argument. Your handler can then trust the value.

Rule 5: Use .describe() on every non-obvious argument

Field-level descriptions are gold for an LLM. They are what the LLM sees when it has to decide what value to pass.

// ❌ Schema looks fine, but the LLM has no idea what "q" expects
{ q: z.string(), n: z.number().int() }

// ✅ Self-documenting
{
  q: z.string().describe('Search query. Supports GitHub-style operators like "is:open label:bug".'),
  n: z.number().int().min(1).max(50).default(10).describe('Max results to return (1-50). Defaults to 10.'),
}

For argument names, also prefer clarity over brevity. query beats q, limit beats n, customer_id beats cid. The few extra characters cost nothing and clarify intent.

Rule 6: Constrain with enums when there are fixed choices

When an argument has a small fixed set of valid values, encode it in the schema. The LLM picks from the enum instead of guessing strings.

// ❌ Free-form string — LLM might pass "shipped", "Shipped", "SHIPPED", "in transit"
{ status: z.string() }

// ✅ Constrained — LLM can only pass one of these
{ status: z.enum(['pending', 'shipped', 'delivered', 'cancelled', 'refunded']) }

The enum also self-documents the API. The LLM sees the valid values directly in the schema and never has to guess.

Rule 7: Return text shaped for the LLM, not JSON shaped for code

Developers reflexively return JSON. LLMs read text more efficiently than nested JSON they have to parse. Return preformatted strings unless the caller specifically needs structured data.

// ❌ The LLM has to parse this before answering
return {
  content: [{
    type: 'text',
    text: JSON.stringify({
      orders: [
        { id: 4242, status: 'shipped', total: 89.99, currency: 'USD' },
        { id: 4243, status: 'pending', total: 145.50, currency: 'USD' },
      ],
    }, null, 2),
  }],
};

// ✅ Already formatted for human/LLM reading
return {
  content: [{
    type: 'text',
    text: [
      'Found 2 orders:',
      '• Order #4242 — shipped, $89.99 USD',
      '• Order #4243 — pending, $145.50 USD',
    ].join('\n'),
  }],
};

Fewer tokens, easier for the LLM to read, easier for the user to glance at when the tool result is displayed.

Rule 8: Cap response sizes

Never return unbounded data. A tool that can return 10,000 rows will eventually return 10,000 rows and burn 200,000 tokens of context for one call.

server.tool('list_orders', '...', {
  limit: z.number().int().min(1).max(100).default(20),
  status: z.enum(['pending','shipped','delivered','cancelled']).optional(),
}, async ({ limit, status }) => {
  let rows = await fetchOrders({ status });
  const total = rows.length;
  if (rows.length > limit) rows = rows.slice(0, limit);
  return {
    content: [{
      type: 'text',
      text: rows.length === total
        ? formatOrders(rows)
        : `${formatOrders(rows)}\n\n(Showing first ${limit} of ${total}. Pass a higher limit or filter to see more.)`,
    }],
  };
});

Telling the LLM there are more results — and how to ask for them — preserves the option to dig deeper without forcing the cost upfront.

Rule 9: Use isError: true for recoverable failures

MCP responses can flag themselves as errors. Use this for expected failure modes (not found, validation errors, rate limits) so the LLM can read what went wrong and react gracefully.

server.tool('get_order', '...', { id: z.number() }, async ({ id }) => {
  const order = await db.findOrder(id);
  if (!order) {
    return {
      content: [{ type: 'text', text: `Order #${id} not found.` }],
      isError: true,
    };
  }
  return { content: [{ type: 'text', text: formatOrder(order) }] };
});

Throwing an exception, by contrast, surfaces as a protocol-level error rather than a readable result, which gives the LLM far less to work with. Reserve throws for truly unexpected failures (database connection lost, etc.) where something is genuinely broken.

Rule 10: Output schemas? Only for structured data

MCP supports an optional outputSchema for tools that return structured JSON. Most tools should not bother — return text. Use outputSchema only when a downstream tool needs to consume the output programmatically (rare in chat UX, common in agent orchestration).

Anti-pattern checklist

A quick gut-check before shipping a tool:

Anti-pattern | Sign | Fix
One-word descriptions | 'Fetch data' | Write 1-2 sentences with intent + example
HTTP-verb names | post_users, get_orders | Rename to intents: create_user, find_orders
Free-string arguments where enums fit | { status: z.string() } | z.enum(['pending','shipped',...])
Required-but-optional arguments | Fallback via || in handler | Use .default(value) in the schema
Raw JSON output | JSON.stringify(data) | Preformat to text the LLM can read
No size cap on listings | Returns everything | Add a limit with a sensible max
Throws on expected failures | if (!found) throw | Return isError: true
Cryptic argument names | q, n, id (without context) | query, limit, customer_id
One mega-tool | execute({ action, params }) | Split into specific tools
Missing field-level descriptions | Plain z.string() | Add .describe('...')

If you can answer "no" to all ten, your schemas are in great shape.

A before-and-after example

For a final concrete illustration, here is a poorly designed tool and its improved version.

// ❌ Poor design
server.tool('search', 'Search the database', {
  q: z.string(),
  l: z.number().optional(),
  t: z.string().optional(),
}, async ({ q, l, t }) => {
  const limit = l || 10;
  const results = await db.search(q, { limit, type: t });
  return { content: [{ type: 'text', text: JSON.stringify(results) }] };
});

Problems: vague description, cryptic argument names, no size cap, no enum constraint on t, raw JSON output.

// ✅ Improved
server.tool(
  'search_customers',
  'Search customer records by name, email, or company. Returns up to 25 matches with summary info; use a more specific query if too many results.',
  {
    query: z.string().describe('Search term — matches against name, email, and company.'),
    limit: z.number().int().min(1).max(25).default(10).describe('Maximum results to return.'),
    status: z.enum(['active', 'churned', 'trial']).optional().describe('Optional filter by customer status.'),
  },
  async ({ query, limit, status }) => {
    const all = await db.searchCustomers(query, { status });
    const shown = all.slice(0, limit);
    if (!shown.length) {
      return { content: [{ type: 'text', text: `No customers match "${query}".` }] };
    }
    const lines = shown.map(c => `• ${c.name} (${c.email}) — ${c.company}, status: ${c.status}`);
    const footer = all.length > limit ? `\n\n(${all.length - limit} more matches not shown.)` : '';
    return { content: [{ type: 'text', text: `${all.length} matches:\n${lines.join('\n')}${footer}` }] };
  }
);

The LLM understands when to use it, knows exactly what to pass, gets a clean response, and learns there are more results without forcing them into context.

Conclusion

Good schema design is the difference between a tool the LLM uses naturally and one it ignores. The ten rules above are not style preferences — each one corresponds to a real behaviour difference in how an LLM picks and uses tools.

The single highest-impact change you can make to an MCP server: rewrite every tool description to include both what it does and when to use it. Most other improvements compound from there.

Write for the LLM. Constrain inputs. Cap outputs. Preformat responses. The rest is detail.

Try it yourself

The same conversation against a well-designed vs. poorly-designed schema produces dramatically different LLM behaviour:

You: Find me the top 5 trial-stage customers whose name contains “Acme”.

Claude (calls search_customers with query: "Acme", limit: 5, status: "trial"):

3 trial customers match “Acme”:

• Acme Robotics (avi@acmerobotics.io) — trial day 6 of 14
• Acme Health Co. (m.lee@acmehealth.co) — trial day 11 of 14
• Acme Edu (priya@acme-edu.org) — trial day 2 of 14

Acme Health Co. is closest to trial expiry — worth a check-in. Want me to draft an email?

The LLM constructed valid arguments on the first try because the enum and the limit.max(25) bound told it exactly what was allowed. With a poorly-designed schema, the same prompt typically produces a tool call with wrong argument names or an unhelpful enum-less status string the server has to defensively normalise.
