Cost & Auto-Resume

ZenSearch enforces dollar ceilings on every agent run — pre-flight and live — and pauses runs that hit a budget cap so they can be resumed instead of failing outright.

This page covers the user-visible behavior. Self-hosters can tune the underlying limits via environment variables; see the deployment configuration reference.

What's enforced per run

Every agent run carries five budgets:

| Budget | What it bounds |
| --- | --- |
| Cost | Estimated + actual model spend in USD |
| Iterations | Number of model-invocation cycles |
| Tool calls | Total external tool invocations |
| Tokens | Cumulative LLM tokens across the run |
| Wall-clock | Maximum total run time |

When any budget is exceeded, the run synthesizes a partial answer with the budget reason attached and saves a resumable checkpoint. This is the soft-limit pause — and it's recoverable, not a hard failure.
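As a rough illustration of the soft-limit check, the sketch below compares usage against the five budgets and returns the first one that has been exceeded. The `Budgets` class, its field names, and every reason string other than `"cost"` are assumptions made for this example, not ZenSearch's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budgets:
    """Hypothetical container for the five per-run budgets."""
    cost_usd: float
    iterations: int
    tool_calls: int
    tokens: int
    wall_clock_s: float


# (field name, reason label) pairs; the labels are illustrative.
_CHECKS = [
    ("cost_usd", "cost"),
    ("iterations", "iterations"),
    ("tool_calls", "tool_calls"),
    ("tokens", "tokens"),
    ("wall_clock_s", "wall_clock"),
]


def exceeded_budget(used: Budgets, limit: Budgets) -> Optional[str]:
    """Return the reason for the first exceeded budget, or None if all are within limits."""
    for field, reason in _CHECKS:
        if getattr(used, field) > getattr(limit, field):
            return reason
    return None
```

A run loop would call a check like this after each iteration and, on a non-`None` result, synthesize the partial answer and save the checkpoint.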

Budget tiers

The agent is given a budget tier appropriate to the request:

| Tier | Use case |
| --- | --- |
| Factoid | Quick lookups, single-fact answers |
| Procedural | Step-by-step how-to walkthroughs |
| Exploratory | Research-style questions across many sources |
| Comparative | Side-by-side comparisons of options or sources |
| Automation | Scheduled / event-triggered runs (the deepest tier: non-interactive, the answer is the deliverable) |

The tier is picked automatically from the request. Self-hosters can also set a generous floor in deployment config — tier assignment only ever bumps budgets up from that floor, never down.
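The floor rule can be pictured as a per-budget maximum. The function below is a minimal sketch under that reading; the budget names and all numbers in the usage note are made up for illustration and are not real ZenSearch defaults.

```python
def effective_budgets(tier_budgets: dict, floor: dict) -> dict:
    """Combine a tier's budgets with a deployment floor.

    Each budget ends up at whichever is larger, so a tier can raise
    a budget above the floor but can never push it below the floor.
    """
    names = set(tier_budgets) | set(floor)
    return {
        name: max(tier_budgets.get(name, 0), floor.get(name, 0))
        for name in sorted(names)
    }
```

For example, with a hypothetical floor of `{"iterations": 8, "tokens": 20000}` and a tier that asks for `{"iterations": 5, "tokens": 50000}`, the run would get 8 iterations (the floor wins) and 50000 tokens (the tier wins).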

Pre-flight cost gate

Before any tokens are spent, the platform estimates the run's worst-case cost. If the estimate exceeds your per-run cap, the run is rejected before the first model call. If a per-team daily cap is set, the team's spend so far that day is added to the estimate before it is compared against that cap.

This catches obvious cost mistakes (the wrong model, an oversized synthesis) without burning model budget to discover them.
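The gate logic described above might look roughly like the following. This is a sketch, not the platform's implementation: the function name, parameter names, and return labels are hypothetical, and the worst-case estimate is taken as an input because this page does not say how it is computed.

```python
from typing import Optional, Tuple


def preflight_check(
    estimated_worst_case_usd: float,
    per_run_cap_usd: float,
    team_spend_today_usd: float = 0.0,
    team_daily_cap_usd: Optional[float] = None,
) -> Tuple[bool, str]:
    """Decide whether a run may start, before any model call is made."""
    # Reject if the estimate alone is over the per-run cap.
    if estimated_worst_case_usd > per_run_cap_usd:
        return False, "per_run_cap"
    # If a daily team cap exists, count today's spend plus this estimate.
    if team_daily_cap_usd is not None:
        if team_spend_today_usd + estimated_worst_case_usd > team_daily_cap_usd:
            return False, "team_daily_cap"
    return True, "ok"
```

Note that a run can pass the per-run check and still be rejected because the team has already used most of its daily allowance.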

Live cost gate

If actual spend crosses the cap mid-run, the agent finalizes a partial answer with truncation_reason="cost" and saves the resumable checkpoint. The chat UI cost meter updates live every few iterations during the run so you can see where the budget is going.
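To make the mid-run behavior concrete, here is a toy loop that accumulates per-iteration spend and pauses once the cap is crossed. Only the `truncation_reason="cost"` value comes from this page; the result dictionary, the checkpoint contents, and the function name are assumptions for the example.

```python
from typing import Iterable


def run_with_cost_gate(step_costs_usd: Iterable[float], cap_usd: float) -> dict:
    """Simulate a run whose iterations each incur a known cost."""
    spent = 0.0
    completed = 0
    for cost in step_costs_usd:
        spent += cost
        completed += 1
        if spent > cap_usd:
            # Cap crossed: stop, report why, and keep enough state to resume.
            return {
                "status": "paused",
                "truncation_reason": "cost",
                "checkpoint": {"completed_steps": completed, "spent_usd": spent},
            }
    return {"status": "completed", "truncation_reason": None, "checkpoint": None}
```

Because spend is only known after a step finishes, a gate of this shape can overshoot the cap by up to one iteration's cost.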

Resuming a paused run

Each paused run gets a Continue button in the chat UI. Clicking it:

  1. Rehydrates the run state from the checkpoint
  2. Extends every budget (cost, iterations, tool calls, tokens, wall-clock) by a fresh allowance
  3. Re-enters the agent graph from where it stopped, with all prior research preserved

Resumes are bounded — there's a configurable cap on how many times a single run can be resumed before the platform abandons it (to prevent unbounded cost accumulation across many resume clicks).

For scheduled automations, pause-and-resume is automatic: the run resumes on the next scheduler tick with no button click required. These automatic resumes are also bounded, by a per-automation max-resume-attempts setting.
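The resume steps and the resume cap can be sketched together as below. The `Checkpoint` shape, the state strings, and the `resume` function are illustrative assumptions; the sketch also ignores that a real resume runs as a child of the paused run.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical saved state for a paused run."""
    budgets: dict        # current limit per budget name
    resume_count: int    # how many times this run has been resumed
    state: str           # e.g. "paused", "running", "abandoned"


def resume(checkpoint: Checkpoint, allowance: dict, max_resumes: int) -> Checkpoint:
    """Extend each budget by a fresh allowance, unless the resume cap is reached."""
    if checkpoint.resume_count >= max_resumes:
        return replace(checkpoint, state="abandoned")
    extended = {
        name: limit + allowance.get(name, 0)
        for name, limit in checkpoint.budgets.items()
    }
    return Checkpoint(
        budgets=extended,
        resume_count=checkpoint.resume_count + 1,
        state="running",
    )
```

With `max_resumes=3`, the fourth attempt to resume the same run returns an abandoned checkpoint instead of extending the budgets again, which is what keeps total spend bounded.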

Pause states you might see

| State | Meaning |
| --- | --- |
| Paused — cost | Hit per-run or per-team-day cost cap |
| Paused — iterations | Hit max model-invocation cycles |
| Paused — tool calls | Hit max external tool invocations |
| Paused — tokens | Hit cumulative-token cap |
| Paused — wall-clock | Hit max total run time |
| Paused — awaiting approval | Waiting on a human approval decision |
| Superseded | A child resume took over; see the child run for the real outcome |
| Abandoned | Resume cap exceeded; no further resumes will be attempted |

All of these states are visible in the run trace so you can see the full pause/resume chain.
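When reading a chain programmatically, the useful rule is that a Superseded run points at a child that holds the real outcome. The sketch below follows such a chain over an in-memory dictionary; the record layout and the `child_run_id` field are hypothetical, since this page does not describe the trace format.

```python
from typing import Dict, Tuple


def final_outcome(runs: Dict[str, dict], run_id: str) -> Tuple[str, str]:
    """Follow Superseded links until reaching the run that holds the real outcome."""
    seen = set()
    while runs[run_id]["state"] == "Superseded":
        if run_id in seen:
            raise ValueError("cycle in resume chain")
        seen.add(run_id)
        run_id = runs[run_id]["child_run_id"]
    return run_id, runs[run_id]["state"]
```

Given a run that was resumed twice, the function returns the ID and state of the last child rather than the state of the run the user originally started.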

Per-tool timeouts

Beyond the run-level wall-clock budget, individual tool calls also carry a wall-clock cap so one slow connector can't starve fast ones. Parallel tool batches share the remaining run time fairly. A tool that times out is automatically retried once before the failure is surfaced to the agent.
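The timeout-then-retry-once behavior can be approximated with `asyncio.wait_for`. This is only a sketch of the pattern: `call_tool_with_timeout` and its parameters are hypothetical names, and it does not model how parallel batches divide the remaining run time.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def call_tool_with_timeout(
    tool: Callable[[], Awaitable[Any]],
    timeout_s: float,
    retries: int = 1,
) -> Any:
    """Run a tool call under a wall-clock cap, retrying on timeout.

    With the default retries=1, a timed-out call is attempted once more;
    if the second attempt also times out, the error reaches the caller.
    """
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(tool(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == retries:
                raise
```

Here `tool` is a zero-argument callable that returns a fresh awaitable on each call, so the retry starts a new attempt rather than awaiting the one that was cancelled.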

Best practices

  1. Watch the cost meter on long agent runs — if it's pacing toward the cap, consider rephrasing the question more narrowly rather than letting the budget exhaust.
  2. Set a per-team daily cap in addition to the per-run cap if you're rolling out agents to a wide audience for the first time.
  3. For automations, give each automation a generous max-resume-attempts (3 is the typical default) so transient pauses don't kill scheduled work.
  4. If you hit cost-pause routinely on a particular workflow, consider pinning the agent to a cheaper model alias or breaking the work into smaller steps.

Related

  • Agents: overall agent framework
  • Human Approval: pause-and-resume cycle for human decisions (a different kind of pause)
  • AI Models: which model alias the agent uses