
Build an Anthropic agent that survives streaming, tools, and production

The four-layer architecture that stops your TypeScript agent from hanging on streams, dropping tool inputs, or silently losing beta features between staging and prod.


What you’ll build

A resilient Claude agent that streams responses, invokes tools, and ships with production-grade observability. Use this blueprint to bootstrap internal copilots or automate operational workflows with Anthropic’s TypeScript SDK.

Highlights

  • Streaming UX
  • Tool orchestration

Safety nets

  • Typed error paths
  • Usage telemetry

Launch-ready

  • Deployment checklist
  • Troubleshooting

You wired up an Anthropic agent. The demo replied. You added a tool, switched to streaming, and now the loop hangs on the third turn, or your tool inputs arrive as malformed JSON, or your beta features work locally and fail in prod.

The model is not the problem. The plumbing is.

Treat the agent as four layers and the failure modes get small: SDK client, prompt and goal manager, tool registry, observability. Get those right and the same code that prints a stream locally runs unattended in production.
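One way to make the four layers concrete is to give each a narrow interface. The sketch below is hypothetical: none of these names come from the SDK, and only two layers, the tool registry and the observability sink, get toy in-memory implementations. The client and the prompt and goal manager are left out.

```typescript
// Hypothetical layer boundaries; none of these names come from @anthropic-ai/sdk.
type ToolHandler = (input: unknown) => Promise<string>;

interface ToolRegistry {
  register(name: string, handler: ToolHandler): void;
  run(name: string, input: unknown): Promise<string>;
}

interface Observability {
  record(event: string, fields: Record<string, unknown>): void;
}

// Toy observability sink that keeps events in memory (swap for your logger or tracer).
class MemoryObservability implements Observability {
  readonly events: Array<{ event: string; fields: Record<string, unknown> }> = [];

  record(event: string, fields: Record<string, unknown>): void {
    this.events.push({ event, fields });
  }
}

// Toy registry: every call is recorded, and unknown tool names become an error
// string the model can read instead of an exception that kills the loop.
class InMemoryToolRegistry implements ToolRegistry {
  private readonly handlers = new Map<string, ToolHandler>();
  private readonly obs: Observability;

  constructor(obs: Observability) {
    this.obs = obs;
  }

  register(name: string, handler: ToolHandler): void {
    this.handlers.set(name, handler);
  }

  async run(name: string, input: unknown): Promise<string> {
    const handler = this.handlers.get(name);
    this.obs.record("tool_call", { name, known: handler !== undefined });
    if (!handler) return `Unknown tool: ${name}`;
    return handler(input);
  }
}
```

Keeping the registry ignorant of the SDK means the same tools can back either the manual streaming loop or a helper-driven one.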

By the end of this post, you will know how to:

  • Configure one shared client so retries, timeouts, and headers stop drifting across workers
  • Stream text and accumulate tool inputs without dropping fragments
  • Replace the manual streaming loop with toolRunner() when your tools are stable
  • Predict the three failure modes that will hit you in the first week of production

Who this is for

You are building a TypeScript agent on @anthropic-ai/sdk. You have three problems hiding inside one slow or broken turn:

  1. Your client config is duplicated across files, so retries and timeouts disagree.
  2. Your streaming loop drops tool input fragments, or hangs because you broke early without aborting.
  3. Your beta flags work locally and silently fail in another environment.

This is the layering that fixes those before you ship.

Start with one client, not five

Every part of your agent should respect the same timeouts, retries, and headers. The cheapest way to enforce that is a singleton.

// src/anthropicClient.ts
import { Anthropic, type ClientOptions } from "@anthropic-ai/sdk";

const defaults: ClientOptions = {
  apiKey: process.env.ANTHROPIC_API_KEY,
  timeout: 600_000,  // 10 minutes in ms, matching the SDK's documented default
  maxRetries: 3,     // SDK default is 2; retries honour Retry-After and retry-after-ms
  defaultHeaders: {
    "User-Agent": "demo-agent/1.0",
  },
};

let singleton: Anthropic | null = null;

export function getAnthropicClient(
  overrides: Partial<ClientOptions> = {},
): Anthropic {
  // Callers with overrides (tests, alternate base URL) get their own client built
  // from the same defaults instead of silently receiving the cached one.
  if (Object.keys(overrides).length > 0) {
    return new Anthropic({ ...defaults, ...overrides });
  }
  singleton ??= new Anthropic(defaults);
  return singleton;
}

Connection pooling stays hot. Retry policy stays consistent. When you need a different fetch, proxy, or base URL for tests or multi-tenant routing, pass overrides instead of forking the config, so the variant client still starts from the shared defaults.
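The subtle part of a singleton with overrides is deciding what happens when a second caller passes different options. Here is a minimal sketch of one policy, written against a generic constructor so it runs without the SDK: the default instance is cached, and any call with overrides gets a fresh instance built from the merged defaults. `createClientFactory` is a hypothetical helper, not an SDK export.

```typescript
// Hypothetical helper: caches the default instance, builds fresh ones for overrides.
// `make` stands in for `(opts) => new Anthropic(opts)`.
function createClientFactory<Opts extends object, Client>(
  defaults: Opts,
  make: (opts: Opts) => Client,
): (overrides?: Partial<Opts>) => Client {
  let shared: Client | undefined;
  return (overrides?: Partial<Opts>) => {
    if (overrides && Object.keys(overrides).length > 0) {
      // Overrides never replace the shared instance, and are never silently ignored.
      return make({ ...defaults, ...overrides });
    }
    if (shared === undefined) shared = make(defaults);
    return shared;
  };
}
```

The design choice is that variant clients are cheap and uncached; if you route many tenants through distinct base URLs, you would cache per override key instead.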

The SDK retries connection errors and 408/409/429/5xx responses with exponential backoff and honours Retry-After. Tune maxRetries per environment instead of writing your own retry wrapper.
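Tuning per environment can be as small as a lookup that feeds `maxRetries` and `timeout` into the shared client's defaults. The environment names and numbers below are illustrative assumptions, not SDK recommendations; the only fixed point is that both values are ordinary client options.

```typescript
// Illustrative per-environment retry and timeout settings (values are assumptions).
type Env = "test" | "staging" | "production";

interface RetrySettings {
  maxRetries: number; // passed to the client as `maxRetries`
  timeout: number;    // milliseconds, passed to the client as `timeout`
}

const retrySettings: Record<Env, RetrySettings> = {
  test: { maxRetries: 0, timeout: 10_000 }, // fail fast so flaky mocks surface
  staging: { maxRetries: 2, timeout: 120_000 },
  production: { maxRetries: 3, timeout: 600_000 },
};

function settingsFor(envName: string | undefined): RetrySettings {
  if (envName === "test" || envName === "staging" || envName === "production") {
    return retrySettings[envName];
  }
  // Unknown or missing names fall back to the production profile.
  return retrySettings.production;
}
```

You would spread the result into the client defaults, for example `{ ...defaults, ...settingsFor(process.env.APP_ENV) }`, where `APP_ENV` is a hypothetical variable name.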

Stream text and accumulate tool inputs in the same loop

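The first thing that breaks when you switch from messages.create() to messages.stream() is tool calls. Text arrives as text_delta events. If you consume the raw events, tool inputs arrive as input_json_delta fragments that you must accumulate yourself, keyed by the block index those events carry, until the matching content_block_stop, then JSON.parse once. (await stream.finalMessage() assembles them for you, but only after the stream ends.)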

If you forget to accumulate, you get truncated JSON. If you break out of the iterator early without aborting, the stream never resolves and your request hangs forever.

// Assumes `stream` comes from client.messages.stream({...}) and handlers.onText
// forwards text to your UI. Delta and stop events carry only the block `index`
// (no content_block), so pending tool calls are keyed by index.
const pendingTools = new Map<
  number,
  { id: string; name: string; buffer: string[] }
>();

const toolCalls: Array<{ id: string; name: string; input: unknown }> = [];

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    handlers.onText(event.delta.text);
  }

  if (event.type === "content_block_start" && event.content_block.type === "tool_use") {
    pendingTools.set(event.index, {
      id: event.content_block.id,
      name: event.content_block.name,
      buffer: [],
    });
  }

  if (event.type === "content_block_delta" && event.delta.type === "input_json_delta") {
    pendingTools.get(event.index)?.buffer.push(event.delta.partial_json);
  }

  if (event.type === "content_block_stop") {
    const entry = pendingTools.get(event.index);
    if (!entry) continue; // a text block ended, not a tool_use block
    pendingTools.delete(event.index);
    const raw = entry.buffer.join("");
    toolCalls.push({
      id: entry.id,
      name: entry.name,
      // A tool called with no arguments may stream no fragments at all.
      input: raw ? JSON.parse(raw) : {},
    });
  }
}

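Feed toolCalls into your execution layer. Then append two messages to history, the assistant turn that contains the tool_use blocks and a user turn with one tool_result per tool_use id, and re-enter the loop.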
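The re-entry step has one ordering rule that is easy to get wrong: the assistant turn with the tool_use blocks goes into history first, then a user turn whose tool_result blocks point back at each tool_use id. A minimal sketch with simplified local types that mirror the Messages API field names (`tool_use`, `tool_result`, `tool_use_id`, `is_error`); they are not the SDK's own types, and `runTool` is a hypothetical stand-in for your execution layer.

```typescript
// Simplified local mirrors of Messages API content blocks (not the SDK's types).
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};
type Message =
  | { role: "user"; content: string | ToolResultBlock[] }
  | { role: "assistant"; content: Array<TextBlock | ToolUseBlock> };

// Appends the assistant turn, runs each requested tool, then appends one user turn of results.
async function appendToolTurn(
  history: Message[],
  assistantContent: Array<TextBlock | ToolUseBlock>,
  runTool: (name: string, input: unknown) => Promise<string>,
): Promise<Message[]> {
  const results: ToolResultBlock[] = [];
  for (const block of assistantContent) {
    if (block.type !== "tool_use") continue;
    try {
      const content = await runTool(block.name, block.input);
      results.push({ type: "tool_result", tool_use_id: block.id, content });
    } catch (err) {
      // Report the failure to the model instead of breaking out of the loop.
      results.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: String(err),
        is_error: true,
      });
    }
  }
  return [
    ...history,
    { role: "assistant", content: assistantContent },
    { role: "user", content: results },
  ];
}
```

Turning a thrown tool error into an `is_error` result keeps the conversation valid and avoids the early-exit path that leaves streams half-drained.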

The rule: drain the iterator or call stream.controller.abort(). Half-drained streams are the most common hang in production.
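Here is the early-exit rule as a pattern, sketched without the SDK so it runs on its own: the consumer owns an AbortController, and every exit path, including the error branch, goes through a `finally` that aborts. The `events()` generator is a fake standing in for a network stream that only stops when its signal fires; with the SDK, the `finally` would call `stream.controller.abort()` instead.

```typescript
// Fake event source: yields until the signal aborts (stands in for a network stream).
async function* events(signal: AbortSignal): AsyncGenerator<string> {
  let n = 0;
  while (!signal.aborted) {
    yield n % 3 === 2 ? "tool_error" : `delta_${n}`;
    n += 1;
    await new Promise((resolve) => setTimeout(resolve, 1));
  }
}

// Reads until the first tool error, and aborts the source on every exit path.
async function readUntilToolError(controller: AbortController): Promise<string[]> {
  const seen: string[] = [];
  try {
    for await (const event of events(controller.signal)) {
      if (event === "tool_error") break; // early exit
      seen.push(event);
    }
  } finally {
    controller.abort(); // with the SDK: stream.controller.abort()
  }
  return seen;
}
```

Putting the abort in `finally` rather than next to each `break` means a later refactor that adds another exit, or a thrown exception, still closes the stream.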

Stop hand-rolling the tool loop when your tools are stable

The manual loop above is useful when you need to instrument every event. Once your tool schemas settle, the SDK's helpers remove the entire accumulator.

import { betaZodTool } from "@anthropic-ai/sdk/helpers/zod";
import { z } from "zod";
import { getAnthropicClient } from "./anthropicClient";

const client = getAnthropicClient();

// `db` is your own data layer (a Prisma-style client is assumed here).
const lookupCustomer = betaZodTool({
  name: "lookupCustomer",
  description: "Fetch a customer profile by ID.",
  inputSchema: z.object({ id: z.string().uuid("Customer IDs are UUIDs.") }),
  run: async ({ id }) => {
    const record = await db.customers.findUnique({
      where: { id },
      include: { tickets: true }, // needed for record.tickets below
    });
    // Return strings: tool_result content is text (or content blocks), not an arbitrary object.
    if (!record) {
      return JSON.stringify({ error: `No customer found for ${id}` });
    }
    return JSON.stringify({
      name: record.name,
      tier: record.tier,
      openTickets: record.tickets.length,
    });
  },
});

const finalMessage = await client.beta.messages.toolRunner({
  model: "claude-sonnet-4-5",
  max_tokens: 2048,
  system: "You are a pragmatic assistant...",
  messages: [{ role: "user", content: "Look up customer 8d047c1e-..." }],
  tools: [lookupCustomer],
});

betaZodTool validates the model's input against your Zod schema before your handler runs. toolRunner() sends the request, runs the tools the model asks for, feeds the tool_result blocks back into the conversation, and repeats until the model stops requesting tools.
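To see what validating before the handler buys you, here is a hand-rolled analogue with neither Zod nor the SDK. It is not how betaZodTool is implemented; `defineTool`, the type guard, and the fake handler are hypothetical. The point is the ordering: invalid input becomes an error string for the model, and the handler never sees it.

```typescript
// Hypothetical hand-rolled analogue of "validate, then run"; not the SDK's betaZodTool.
interface ToolDef<In> {
  name: string;
  isValid: (input: unknown) => input is In;
  run: (input: In) => Promise<string>;
}

function defineTool<In>(def: ToolDef<In>) {
  return {
    name: def.name,
    async call(input: unknown): Promise<string> {
      if (!def.isValid(input)) {
        // The handler never runs on bad input; the model gets a readable error instead.
        return JSON.stringify({ error: `Invalid input for ${def.name}` });
      }
      return def.run(input);
    },
  };
}

const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

const lookupCustomerSketch = defineTool<{ id: string }>({
  name: "lookupCustomer",
  isValid: (input): input is { id: string } =>
    typeof input === "object" &&
    input !== null &&
    typeof (input as { id?: unknown }).id === "string" &&
    UUID.test((input as { id: string }).id),
  run: async ({ id }) => JSON.stringify({ id, tier: "demo" }), // fake data
});
```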

The trade-off: you lose per-event hooks. If you need to log every text_delta to a UI socket, keep the manual loop. If you are building a backend agent that returns a final message, use the helper.

Treat beta features as a deployment concern

Beta features land via the betas array on client.beta.messages.create() and .stream():

betas: ["code-execution-2025-08-25", "mcp-client-2025-04-04"]

The Files API needs files-api-2025-04-14. MCP needs mcp-client-2025-04-04. Without the string, the API rejects the request instead of running the tool, and if you only watch for tool output, that rejection looks like a tool call that silently never happened.
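One way to keep beta strings from drifting is to derive the betas array from the features a request actually uses, instead of hand-maintaining it at each call site. The beta strings are the ones quoted in this post; the feature keys and `betasFor` are hypothetical.

```typescript
// Beta strings as quoted above; feature keys are hypothetical names for this sketch.
const BETA_FOR_FEATURE = {
  codeExecution: "code-execution-2025-08-25",
  filesApi: "files-api-2025-04-14",
  mcpClient: "mcp-client-2025-04-04",
} as const;

type Feature = keyof typeof BETA_FOR_FEATURE;

// Returns a de-duplicated, sorted betas array so identical feature sets log identically.
function betasFor(features: Feature[]): string[] {
  return [...new Set(features.map((f) => BETA_FOR_FEATURE[f]))].sort();
}
```

The result would go into the `betas` field on client.beta.messages.create() or .stream(), and the same array is what you log.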

The failure mode worth naming: the beta string works in your dev workspace, your prod workspace has not been granted that beta yet, and the request returns a generic API error. Log the betas array you sent on every request, and log the workspace ID. When prod breaks and staging works, that is the first place to look.
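Once both environments log the betas and workspace ID, the "prod breaks, staging works" check becomes a set difference. A minimal sketch, assuming you have pulled one logged request from each side; the `RequestLog` shape and `diffBetas` are hypothetical.

```typescript
// Hypothetical shape of what you log per request.
interface RequestLog {
  env: string;
  workspaceId: string;
  betas: string[];
}

// Reports betas the working environment sent that the failing one did not, and vice versa.
function diffBetas(working: RequestLog, failing: RequestLog) {
  const sent = new Set(failing.betas);
  const expected = new Set(working.betas);
  return {
    sameWorkspace: working.workspaceId === failing.workspaceId,
    missingInFailing: working.betas.filter((b) => !sent.has(b)),
    extraInFailing: failing.betas.filter((b) => !expected.has(b)),
  };
}
```

If `missingInFailing` is empty but `sameWorkspace` is false, the next suspect is whether the failing workspace was granted the beta.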

Branch on typed errors, not on status codes

The SDK exposes typed errors. Branch on them instead of parsing status codes by hand.

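import { Anthropic } from "@anthropic-ai/sdk";

// runStreamingAgent, goal, handlers, and logger come from your own code.
try {
  await runStreamingAgent(goal, handlers);
} catch (error) {
  // RateLimitError extends APIError, so test the more specific class first.
  if (error instanceof Anthropic.RateLimitError) {
    // Recent SDK versions expose error.headers as a fetch Headers object.
    logger.warn("Rate limit hit", { retryAfter: error.headers?.get("retry-after") });
  } else if (error instanceof Anthropic.APIError) {
    logger.error("Anthropic API error", {
      status: error.status,
      detail: error.error,
    });
  } else {
    throw error;
  }
}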

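RateLimitError, APIConnectionError, and APIConnectionTimeoutError name the failure for you instead of leaving you to decode it. APIConnectionError in particular means no response arrived at all, so there is no status code to branch on; APIConnectionTimeoutError is its subclass for requests that outlived the client timeout. If a response shape surprises you right after enabling a beta feature, suspect a mismatch between your SDK version and that beta before you suspect the model.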

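Instrument await stream.finalMessage() for usage (input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens), capture the request-id response header on every call (the SDK also surfaces it as _request_id on response objects), and tag spans with the retry count the SDK sends in its x-stainless-retry-count request header. Use .withResponse() when you need the raw response object for OpenTelemetry.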
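Usage arrives per response, so an agent that loops over tool turns has to sum it itself. A minimal sketch, assuming the usage field names listed above; the cache counters can be absent or null, so they default to zero. `addUsage` and the `UsageTotals` shape are hypothetical.

```typescript
// Usage fields as reported on a Message; cache counters may be missing or null.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

interface UsageTotals {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

const emptyTotals: UsageTotals = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 };

// Folds one response's usage into a running total for the whole agent run.
function addUsage(total: UsageTotals, usage: Usage): UsageTotals {
  return {
    input: total.input + usage.input_tokens,
    output: total.output + usage.output_tokens,
    cacheRead: total.cacheRead + (usage.cache_read_input_tokens ?? 0),
    cacheWrite: total.cacheWrite + (usage.cache_creation_input_tokens ?? 0),
  };
}
```

Fold in the `usage` from each `await stream.finalMessage()` and emit the total once per agent run, tagged with the request IDs you captured.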

What you will hit next

Three predictions for the team that ships this:

  1. Your tool inputs will arrive truncated. Someone will refactor the streaming loop, forget that input_json_delta is fragmented, and JSON.parse a single delta. The fix is the accumulator above, but you will only notice once a long tool argument starts failing on real users.
  2. Your stream will hang because you broke early without aborting. The first time a tool call returns an error and the developer adds break inside the iterator, the request never resolves and the connection sits open until your platform timeout fires. Wire stream.controller.abort() into every early exit.
  3. Your beta flags will silently get stripped between staging and prod. A config diff, a missing env var, a workspace that was never granted the beta. The request returns an APIError with a generic message, the tool quietly never fires, and you spend an afternoon staring at the prompt. Log the betas array on every request.
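Prediction 1 is easy to reproduce without a network: parse one fragment and it fails, join first and it works. The fragments below are made up for illustration.

```typescript
// Made-up input_json_delta fragments for one tool call, in arrival order.
const fragments = ['{"id": "cus', 'tomer-1", "note', '": "long text"}'];

function parseOne(fragment: string): unknown {
  try {
    return JSON.parse(fragment);
  } catch {
    return undefined; // a lone fragment is rarely valid JSON on its own
  }
}

const naive = parseOne(fragments[0]); // the refactor bug: parse a single delta
const accumulated = JSON.parse(fragments.join("")) as { id: string; note: string };
```

A regression test built on recorded fragments like these catches the bug before real users with long tool arguments do.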

If you are already feeling the pull of any of these, you are far enough along to need the layering above.

Troubleshooting cheat sheet

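  • 401 or 403: print the last four characters of the key (process.env.ANTHROPIC_API_KEY?.slice(-4)) in dev to see which key actually loaded, since every key starts with the same sk-ant- prefix, and confirm the workspace has the right permissions.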
  • APIConnectionTimeoutError: confirm outbound HTTPS, raise timeout, retry with smaller max_tokens. Persistent failures usually trace to a corporate proxy.
  • Streams never finish: drain the iterator or call stream.controller.abort() on early exit.
  • Tool rejections: validate each tool's input_schema as JSON Schema. A malformed schema fails the whole request with an invalid_request_error, while inputs that miss required fields are yours to catch before the handler runs.
  • Beta feature denied: the error body names the missing beta string. Add it to betas or request workspace access.
  • Browser-environment error: the SDK refuses to construct a client in a browser unless you set dangerouslyAllowBrowser: true. Only set it in trusted contexts, and rotate keys if one shipped client-side.
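For the 401 or 403 check, a fingerprint that shows only the tail of the key is enough to tell which key a process loaded without leaking it into logs. `keyFingerprint` is a hypothetical helper, and both the four-character tail and the short-key threshold are arbitrary choices.

```typescript
// Shows only the last four characters, or a marker when the key is unset or implausibly short.
function keyFingerprint(key: string | undefined): string {
  if (!key) return "(unset)";
  // Assumed threshold: anything this short is probably a placeholder, not a real key.
  if (key.length <= 8) return "(suspiciously short)";
  return `...${key.slice(-4)}`;
}
```

In dev you would log `keyFingerprint(process.env.ANTHROPIC_API_KEY)` at startup and compare it with the key listed in the console for the workspace you expect.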

The real lesson

Three sentences.

The Anthropic SDK is not the source of your agent bugs; your streaming loop, your client config, and your beta flags are. Centralise the client, accumulate tool inputs correctly, and log every beta string you send. The smallest investment that makes the rest of the project fast is one singleton and one accumulator.


If you are building a TypeScript agent right now, send me one tool definition from your loop and I will tell you whether toolRunner() will simplify it or hide a contract bug.