Cost & Auto-Resume
ZenSearch enforces dollar ceilings on every agent run — pre-flight and live — and pauses runs that hit a budget cap so they can be resumed instead of failing outright.
This page covers the user-visible behavior. Self-hosters can tune the underlying limits via environment variables; see the deployment configuration reference.
What's enforced per run
Every agent run carries five budgets:
| Budget | What it bounds |
|---|---|
| Cost | Estimated + actual model spend in USD |
| Iterations | Number of model-invocation cycles |
| Tool calls | Total external tool invocations |
| Tokens | Cumulative LLM tokens across the run |
| Wall-clock | Maximum total run time |
When any budget is exceeded, the run synthesizes a partial answer with the budget reason attached and saves a resumable checkpoint. This is a soft-limit pause: the run is recoverable, not failed outright.
Budget tiers
The agent is given a budget tier appropriate to the request:
| Tier | Use case |
|---|---|
| Factoid | Quick lookups, single-fact answers |
| Procedural | Step-by-step how-to walkthroughs |
| Exploratory | Research-style questions across many sources |
| Comparative | Side-by-side comparisons of options or sources |
| Automation | Scheduled / event-triggered runs (the deepest tier — non-interactive, the answer is the deliverable) |
The tier is picked automatically from the request. Self-hosters can also set a budget floor in deployment config — tier assignment only ever bumps budgets up from that floor, never down.
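The floor-bump rule amounts to taking the larger of the tier's budget and the configured floor. A minimal sketch, with entirely made-up tier names as keys and made-up dollar figures:

```python
# Illustrative tier ladder; these dollar amounts are invented for the
# example and are NOT ZenSearch's real tier budgets.
TIER_COST_USD = {
    "factoid": 0.05,
    "procedural": 0.15,
    "exploratory": 0.50,
    "comparative": 0.75,
    "automation": 2.00,
}

def effective_cost_cap(tier: str, floor_usd: float = 0.0) -> float:
    """Tier assignment only ever raises the budget above the configured
    floor, never lowers it below."""
    return max(TIER_COST_USD[tier], floor_usd)
```

So a floor above a cheap tier's budget wins, while a deep tier keeps its larger allowance untouched.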
Pre-flight cost gate
Before any tokens are spent, the platform estimates the run's worst-case cost. If the estimate exceeds your per-run cap, the run is rejected before the first model call. If a per-team daily cap is set, the day's existing spend is included in the comparison.
This catches obvious cost mistakes (the wrong model, an oversized synthesis) without burning model budget to discover them.
Live cost gate
If actual spend crosses the cap mid-run, the agent finalizes a partial answer with truncation_reason="cost" and saves the resumable checkpoint. The cost meter in the chat UI updates every few iterations, so you can watch where the budget is going as the run progresses.
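Conceptually, the live gate sits inside the agent loop: after each model-invocation cycle the accumulated spend is compared against the cap, and a breach triggers the checkpoint-and-finalize path instead of an error. A hedged sketch (the `step` and `checkpoint_save` callables are placeholders, not real ZenSearch interfaces):

```python
def run_agent(step, cap_usd, checkpoint_save, max_iterations=50):
    """Toy agent loop showing where the live cost gate fires.

    `step` runs one model-invocation cycle and returns (new_state, cost);
    `checkpoint_save` persists a resumable checkpoint.
    """
    spend = 0.0
    state: dict = {}
    for _ in range(max_iterations):
        state, step_cost = step(state)
        spend += step_cost
        if spend >= cap_usd:               # live cost gate
            checkpoint_save(state)         # resumable checkpoint
            # Partial answer is finalized with the truncation reason.
            return {"partial": True, "truncation_reason": "cost"}
    return {"partial": False}
```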
Resuming a paused run
Each paused run gets a Continue button in the chat UI. Clicking it:
- Rehydrates the run state from the checkpoint
- Extends every budget (cost, iterations, tokens, wall-clock) by a fresh allowance
- Re-enters the agent graph from where it stopped, with all prior research preserved
Resumes are bounded — there's a configurable cap on how many times a single run can be resumed before the platform abandons it (to prevent unbounded cost accumulation across many resume clicks).
For scheduled automations, pause-and-resume happens automatically on the next scheduler tick — no human click required. These automatic resumes are likewise bounded, by a per-automation max-resume-attempts setting.
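The resume path — rehydrate, extend every budget by a fresh allowance, give up past the resume cap — can be sketched as below. All names and the dict-based budget shape are illustrative assumptions:

```python
def resume_run(checkpoint, resume_count, max_resumes, budget, extension):
    """Toy version of what Continue (or a scheduler tick) does.

    - Past the resume cap, the run is abandoned rather than resumed,
      preventing unbounded cost accumulation.
    - Otherwise every budget is extended by a fresh allowance and the
      run re-enters the agent graph from the checkpointed state.
    """
    if resume_count >= max_resumes:
        return {"state": "abandoned"}
    extended = {k: budget[k] + extension[k] for k in budget}
    return {"state": "resumed", "budget": extended, "run_state": checkpoint}
```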
Pause states you might see
| State | Meaning |
|---|---|
| Paused — cost | Hit per-run or per-team-day cost cap |
| Paused — iterations | Hit max model-invocation cycles |
| Paused — tool calls | Hit max external tool invocations |
| Paused — tokens | Hit cumulative-token cap |
| Paused — wall-clock | Hit max total run time |
| Paused — awaiting approval | Waiting on a human approval decision |
| Superseded | A child resume took over — see the child run for the real outcome |
| Abandoned | Resume cap exceeded; no further resumes will be attempted |
All of these states are visible in the run trace so you can see the full pause/resume chain.
Per-tool timeouts
Beyond the run-level wall-clock budget, individual tool calls also carry a wall-clock cap so one slow connector can't starve fast ones. Parallel tool batches share the remaining run time fairly. A tool that times out is automatically retried once before the failure is surfaced to the agent.
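One plausible reading of "share the remaining run time fairly" is an even split across the batch, still clipped by the per-tool cap, with a single retry on timeout. A sketch under those assumptions (nothing here is a documented ZenSearch formula):

```python
def tool_deadline(remaining_run_s: float, batch_size: int,
                  per_tool_cap_s: float) -> float:
    """Each call in a parallel batch gets an even share of the remaining
    run time, never exceeding the per-tool wall-clock cap.

    ASSUMPTION: an even split is one fair-share policy; the real
    scheduler may weight calls differently.
    """
    return min(per_tool_cap_s, remaining_run_s / max(batch_size, 1))

def call_with_retry(tool, deadline_s: float, attempts: int = 2):
    """A tool that times out is retried once before the failure is
    surfaced to the agent."""
    for attempt in range(attempts):
        try:
            return tool(deadline_s)
        except TimeoutError:
            if attempt == attempts - 1:
                raise
```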
Best practices
- Watch the cost meter on long agent runs — if it's pacing toward the cap, consider rephrasing the question more narrowly rather than letting the budget run out.
- Set a per-team daily cap in addition to the per-run cap if you're rolling out agents to a wide audience for the first time.
- For automations, give each automation a generous max-resume-attempts (3 is the typical default) so transient pauses don't kill scheduled work.
- If you hit cost-pause routinely on a particular workflow, consider pinning the agent to a cheaper model alias or breaking the work into smaller steps.
Related
- Agents — overall agent framework
- Human Approval — pause-and-resume cycle for human decisions (a different kind of pause)
- AI Models — which model alias the agent uses