Cost & Auto-Resume
ZenSearch enforces dollar ceilings on every agent run — pre-flight and live — and pauses runs that hit a budget cap so they can be resumed instead of failing outright.
This page covers the user-visible behavior. Self-hosters can tune the underlying limits via environment variables; see the deployment configuration reference.
What's enforced per run
Every agent run carries five budgets:
| Budget | What it bounds |
|---|---|
| Cost | Estimated + actual model spend in USD |
| Iterations | Number of model-invocation cycles |
| Tool calls | Total external tool invocations |
| Tokens | Cumulative LLM tokens across the run |
| Wall-clock | Maximum total run time |
When any budget is exceeded, the run synthesizes a partial answer with the budget reason attached and saves a resumable checkpoint. This is a soft-limit pause: the run is recoverable, not failed outright.
Budget tiers
The agent is given a budget tier appropriate to the request:
| Tier | Use case |
|---|---|
| Factoid | Quick lookups, single-fact answers |
| Procedural | Step-by-step how-to walkthroughs |
| Exploratory | Research-style questions across many sources |
| Comparative | Side-by-side comparisons of options or sources |
| Automation | Scheduled / event-triggered runs (the deepest tier — non-interactive, the answer is the deliverable) |
The tier is picked automatically from the request. Self-hosters can also set a budget floor in deployment config — tier assignment only ever bumps budgets up from that floor, never down.
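The floor-bump rule amounts to taking the larger of the tier's budget and the configured floor. A minimal sketch, with entirely made-up tier names as keys and made-up dollar figures:

```python
# Illustrative tier ladder; these dollar amounts are invented for the
# example and are NOT ZenSearch's real tier budgets.
TIER_COST_USD = {
    "factoid": 0.05,
    "procedural": 0.15,
    "exploratory": 0.50,
    "comparative": 0.75,
    "automation": 2.00,
}

def effective_cost_cap(tier: str, floor_usd: float = 0.0) -> float:
    """Tier assignment only ever raises the budget above the configured
    floor, never lowers it below."""
    return max(TIER_COST_USD[tier], floor_usd)
```

So a floor above a cheap tier's budget wins, while a deep tier keeps its larger allowance untouched.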
Pre-flight cost gate
Before any tokens are spent, the platform estimates the run's worst-case cost. If the estimate exceeds your per-run cap, the run is rejected before the first model call. If a per-team daily cap is set, the day's existing spend is included in the comparison.
This catches obvious cost mistakes (the wrong model, an oversized synthesis) without burning model budget to discover them.
Live cost gate
If actual spend crosses the cap mid-run, the agent finalizes a partial answer with truncation_reason="cost" and saves the resumable checkpoint. The cost meter in the chat UI updates every few iterations, so you can watch where the budget is going as the run progresses.
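Conceptually, the live gate sits inside the agent loop: after each model-invocation cycle the accumulated spend is compared against the cap, and a breach triggers the checkpoint-and-finalize path instead of an error. A hedged sketch (the `step` and `checkpoint_save` callables are placeholders, not real ZenSearch interfaces):

```python
def run_agent(step, cap_usd, checkpoint_save, max_iterations=50):
    """Toy agent loop showing where the live cost gate fires.

    `step` runs one model-invocation cycle and returns (new_state, cost);
    `checkpoint_save` persists a resumable checkpoint.
    """
    spend = 0.0
    state: dict = {}
    for _ in range(max_iterations):
        state, step_cost = step(state)
        spend += step_cost
        if spend >= cap_usd:               # live cost gate
            checkpoint_save(state)         # resumable checkpoint
            # Partial answer is finalized with the truncation reason.
            return {"partial": True, "truncation_reason": "cost"}
    return {"partial": False}
```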
Resuming a paused run
Each paused run gets a Continue button in the chat UI. Clicking it:
- Rehydrates the run state from the checkpoint
- Extends every budget (cost, iterations, tokens, wall-clock) by a fresh allowance
- Re-enters the agent graph from where it stopped, with all prior research preserved
Resumes are bounded — there's a configurable cap on how many times a single run can be resumed before the platform abandons it (to prevent unbounded cost accumulation across many resume clicks).
For scheduled automations, pause-and-resume happens automatically on the next scheduler tick — no human click required. These automatic resumes are likewise bounded, by a per-automation max-resume-attempts setting.
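The resume path — rehydrate, extend every budget by a fresh allowance, give up past the resume cap — can be sketched as below. All names and the dict-based budget shape are illustrative assumptions:

```python
def resume_run(checkpoint, resume_count, max_resumes, budget, extension):
    """Toy version of what Continue (or a scheduler tick) does.

    - Past the resume cap, the run is abandoned rather than resumed,
      preventing unbounded cost accumulation.
    - Otherwise every budget is extended by a fresh allowance and the
      run re-enters the agent graph from the checkpointed state.
    """
    if resume_count >= max_resumes:
        return {"state": "abandoned"}
    extended = {k: budget[k] + extension[k] for k in budget}
    return {"state": "resumed", "budget": extended, "run_state": checkpoint}
```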
Pause states you might see
| State | Meaning |
|---|---|
| Paused — cost | Hit per-run or per-team-day cost cap |
| Paused — iterations | Hit max model-invocation cycles |
| Paused — tool calls | Hit max external tool invocations |
| Paused — tokens | Hit cumulative-token cap |
| Paused — wall-clock | Hit max total run time |
| Paused — awaiting approval | Waiting on a human approval decision |
| Superseded | A child resume took over — see the child run for the real outcome |
| Abandoned | Resume cap exceeded; no further resumes will be attempted |
All of these states are visible in the run trace so you can see the full pause/resume chain.
Per-tool timeouts
Beyond the run-level wall-clock budget, individual tool calls also carry a wall-clock cap so one slow connector can't starve fast ones. Parallel tool batches share the remaining run time fairly. A tool that times out is automatically retried once before the failure is surfaced to the agent.
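One plausible reading of "share the remaining run time fairly" is an even split across the batch, still clipped by the per-tool cap, with a single retry on timeout. A sketch under those assumptions (nothing here is a documented ZenSearch formula):

```python
def tool_deadline(remaining_run_s: float, batch_size: int,
                  per_tool_cap_s: float) -> float:
    """Each call in a parallel batch gets an even share of the remaining
    run time, never exceeding the per-tool wall-clock cap.

    ASSUMPTION: an even split is one fair-share policy; the real
    scheduler may weight calls differently.
    """
    return min(per_tool_cap_s, remaining_run_s / max(batch_size, 1))

def call_with_retry(tool, deadline_s: float, attempts: int = 2):
    """A tool that times out is retried once before the failure is
    surfaced to the agent."""
    for attempt in range(attempts):
        try:
            return tool(deadline_s)
        except TimeoutError:
            if attempt == attempts - 1:
                raise
```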
Best practices
- Watch the cost meter on long agent runs — if it's pacing toward the cap, consider rephrasing the question more narrowly rather than letting the budget run out.
- Set a per-team daily cap in addition to the per-run cap if you're rolling out agents to a wide audience for the first time.
- For automations, give each automation a generous max-resume-attempts (3 is the typical default) so transient pauses don't kill scheduled work.
- If you hit cost-pause routinely on a particular workflow, consider pinning the agent to a cheaper model alias or breaking the work into smaller steps.
Related
- Agents — overall agent framework
- Human Approval — pause-and-resume cycle for human decisions (a different kind of pause)
- AI Models — which model alias the agent uses