● July 4, 2026 Product · 13 min

Inside mcp.llm4agents.com: 67 tools, one endpoint, two ways to pay

One Streamable HTTP endpoint. Sixty-seven documented tools across twelve families. Two ways to pay for every call: a Bearer key against a prepaid balance, or an x402 signed payment from an agent that has no account at all. This is the guided tour of mcp.llm4agents.com that most of our API users never got.

The pattern we see in support conversations is consistent. An operator connects the MCP server for one thing — usually fetch_html — and months later still has no idea that the same endpoint would have solved four other problems they ended up wiring through external SaaS: a search API subscription, an embeddings pipeline, a transactional email account, a CAPTCHA-solving service. The API docs carry the full schemas and current prices; this post is the map that tells you what exists and what each family is for.

One endpoint, two payment modes

Everything lives at a single URL, speaking the MCP Streamable HTTP transport in JSON response mode: you POST a JSON-RPC request, you get a JSON object back. No SSE stream to manage, no session dance for simple calls.

MCP Endpoint: https://mcp.llm4agents.com/mcp
Method:       POST
Auth:         Authorization: Bearer YOUR_API_KEY   (or X-PAYMENT for x402 walk-up)
Protocol:     MCP Streamable HTTP (JSON response mode)

A tool call is the standard MCP envelope:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "ai_classify",
    "arguments": { "text": "Refund my annual plan", "labels": ["billing", "support", "sales"] }
  }
}
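The transport is plain JSON over POST, so a tool call needs no MCP SDK. Below is a minimal TypeScript sketch. It assumes the post's description is accurate that simple calls need no session handshake. `buildToolCall`, `callTool` and the injected `FetchLike` type are our own illustrative names. Reading the `X-Cost-Usd-Cents` header follows the billing notes in this post, not a published client API.

```typescript
// Minimal sketch of one tools/call over MCP Streamable HTTP in JSON response mode.
// Assumptions: fetch is injected (pass globalThis.fetch on Node 18+), the cost header
// name is taken from this post, and error handling is deliberately thin.

type HttpResponse = {
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
};

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

const MCP_ENDPOINT = "https://mcp.llm4agents.com/mcp";

// Build the standard JSON-RPC envelope for a tool call.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): string {
  return JSON.stringify({ jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } });
}

async function callTool(
  fetchImpl: FetchLike,
  apiKey: string,
  name: string,
  args: Record<string, unknown>,
  id = 1,
): Promise<{ result: unknown; costCents: number | null }> {
  const res = await fetchImpl(MCP_ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients list both types; in JSON response mode the reply is plain JSON
      Accept: "application/json, text/event-stream",
      Authorization: `Bearer ${apiKey}`,
    },
    body: buildToolCall(id, name, args),
  });
  if (res.status === 402) throw new Error("402 Payment Required: missing or invalid key or payment");
  const body = (await res.json()) as { result?: unknown; error?: { message: string } };
  if (body.error) throw new Error(`JSON-RPC error: ${body.error.message}`);
  const cost = res.headers.get("X-Cost-Usd-Cents"); // settled cost, per the billing notes
  return { result: body.result, costCents: cost === null ? null : Number(cost) };
}
```

On Node 18 or later you would pass `globalThis.fetch` as the first argument, for example `callTool(globalThis.fetch, apiKey, "ai_classify", { text, labels })`. Injecting fetch keeps the sketch testable offline.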

Billing follows the same reserve-settle pattern as chat completions on the gateway: worst case is reserved up front, the actual cost is settled after the call, and the X-Cost-Usd-Cents response header tells you exactly what you paid. The authentication is the same API key you already use for /v1/chat/completions — one key, one balance, tokens and tools on the same meter.
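Reserve-settle is easiest to see as a ledger. The sketch below is a local model of the behaviour described above, not the gateway's code. Amounts are integers in hundredths of a cent; we chose that unit to keep sub-cent prices exact. Capping the charge at the hold is also an assumption.

```typescript
// Local model of reserve-settle billing: hold the worst case, charge the actual
// cost after the call, release the remainder. Units: hundredths of a cent.

class Ledger {
  private available: number; // spendable balance
  private held = new Map<string, number>(); // callId -> reserved amount

  constructor(balance: number) {
    this.available = balance;
  }

  // Before the call: refuse if the worst case does not fit, otherwise hold it.
  reserve(callId: string, worstCase: number): void {
    if (worstCase > this.available) throw new Error("insufficient balance for worst case");
    this.available -= worstCase;
    this.held.set(callId, worstCase);
  }

  // After the call: charge the actual cost (never above the hold) and release the rest.
  settle(callId: string, actual: number): number {
    const hold = this.held.get(callId);
    if (hold === undefined) throw new Error(`no reservation for ${callId}`);
    const charged = Math.min(actual, hold);
    this.available += hold - charged;
    this.held.delete(callId);
    return charged;
  }

  get balance(): number {
    return this.available;
  }
}
```

Here is a worked example using the session anchors from later in this post. Start with 10,000 units. Reserve the maxed-out $0.099 session (990 units), then settle the 1.74¢ example (174 units). That leaves 9,826 units available.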

The catalog at a glance

Counting what the documentation describes today, there are twelve families and 67 callable tools. The table below breaks scraper sessions out into their own row:

Family        Tools  Anchor prices (Bearer)
Scraper           6  fetch_html $0.0007 · screenshot $0.0010 · extract $0.0012
Sessions          4  reserve worst-case, settle actual (see below)
Search            4  google_search / news / maps $0.0012 flat · batch up to 100 queries/call
Image             3  generate $0.01–0.02 · edit $0.02 · analyze $0.006
Workspace        10  upload $0.0001/MB · storage ~$0.03/GB-month
AI                8  embed 0.1¢ · summarize 0.5¢ · classify 1¢ · STT 1.5¢/MB
Notify            6  telegram/discord/slack/webhook 1¢ · email 2¢ · sms 3¢
Data             11  dns/ip free · unfurl/rss/whois/crypto/fx/qr 1¢
Vector            3  upsert 0.5¢/100 items · query 1¢ · delete free
Web crawl         1  0.5¢/page (min 2¢), balance-only
Memory            4  set 1¢ (≤64 KB JSON, TTL) · get/list/delete free
Web3              4  balance/tx/nft/ens 1¢ · eth, polygon, base, solana
Document          3  pdf_parse 0.5¢/page · doc_extract 0.5¢/unit · article 1¢

The free operations are dns_lookup, ip_geolocate, memory_get, vector_delete, the workspace listing calls and CAPTCHA result polling. They are rate-limited at 60 requests per minute per agent. Everything else bills per call against the same balance your inference runs on, at prices from fractions of a cent up to a few cents. Prices are the current documented defaults, and the platform reserves the right to change them; treat the numbers here as anchors, not contracts.
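An agent that leans on the free calls can throttle itself instead of colliding with the limit. Below is a sliding-window sketch. It assumes the limit counts calls over any rolling 60-second window. The server's actual accounting window is not documented here, so treat this as a politeness guard. `SlidingWindowLimiter` is an illustrative name.

```typescript
// Client-side sliding-window limiter for free calls (60 per rolling minute by default).

class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private stamps: number[] = []; // start times of calls still inside the window

  constructor(limit = 60, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns 0 and records the call if it may go now; otherwise the ms to wait.
  tryAcquire(now: number): number {
    // drop timestamps that have left the window
    while (this.stamps.length > 0 && now - this.stamps[0] >= this.windowMs) this.stamps.shift();
    if (this.stamps.length < this.limit) {
      this.stamps.push(now);
      return 0;
    }
    return this.windowMs - (now - this.stamps[0]);
  }
}
```

Pass `Date.now()` as `now` in real use. A non-zero return is the number of milliseconds to sleep before retrying the free call.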

Scraping: one-shots, proxy tiers, sessions

The scraper family is the oldest and the most used. Six one-shot tools — fetch_html, markdown, links, screenshot, pdf, extract — each open a headless browser, do one thing, and close it. Every one takes a proxy_tier parameter: none, datacenter, or residential, priced accordingly ($0.0007 to $0.0042 per call depending on tool and tier). With auto_fallback: true, a failed fetch escalates through the chain none → datacenter → residential, and you are billed at the tier that actually returned the page — not the one you asked for.
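The auto_fallback rule reads well as a loop. This is a client-side model of the documented behaviour, not the server's implementation. `attempt` stands in for one fetch at one tier and returns `null` on failure. Per-tier prices are injected because the post gives only the overall $0.0007 to $0.0042 range.

```typescript
// Model of auto_fallback: escalate none -> datacenter -> residential, stop at the
// first tier that returns the page, and bill only that tier.

type ProxyTier = "none" | "datacenter" | "residential";
const CHAIN: ProxyTier[] = ["none", "datacenter", "residential"];

interface FallbackResult {
  html: string;
  tier: ProxyTier;
  billedUsd: number;
}

async function fetchWithFallback(
  url: string,
  start: ProxyTier,
  price: Record<ProxyTier, number>,
  attempt: (url: string, tier: ProxyTier) => Promise<string | null>, // null = failed fetch
): Promise<FallbackResult> {
  // begin at the requested tier and only ever move up the chain
  for (const tier of CHAIN.slice(CHAIN.indexOf(start))) {
    const html = await attempt(url, tier);
    if (html !== null) return { html, tier, billedUsd: price[tier] };
  }
  throw new Error(`all proxy tiers failed for ${url}`);
}
```

Take hypothetical prices `{ none: 0.0007, datacenter: 0.0014, residential: 0.0042 }` and a site that only answers through a datacenter proxy. The result reports `tier: "datacenter"` and bills that tier's price once.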

Sometimes one shot is not enough: login walls, multi-step forms, anything where page two depends on what you did on page one. For those you upgrade to a session. Four tools manage the lifecycle: session_create, session_exec, session_status, session_close. A session keeps the browser alive for up to five minutes and fifty actions, and an agent can hold at most two sessions at once. Cost is reserved at worst case when you create it and settled to actual usage on close:

// session_close returns the settled reality
{ "duration_ms": 45000, "actions_count": 12, "cost_cents": 1.74 }

The documented anchors give you a feel for the range: a 30-second session with 3 actions settles around $0.009 with no proxy, while a maxed-out session — five minutes, 50 actions, residential proxy — tops out near $0.099. The other limits worth memorizing: 30-second timeout per tool call inside a session, 5 MB maximum payload, and the two-session concurrency cap, which doubles as a blast-radius control if an agent goes off script.

The rule of thumb: three or more page interactions against the same origin, use a session. Below that, one-shots are cheaper and simpler.
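The session limits and the rule of thumb can be mirrored locally, so an agent fails fast instead of discovering a limit mid-flow. This sketch uses the documented numbers: 5 minutes, 50 actions and 2 concurrent sessions. `SessionBudget` and `shouldUseSession` are illustrative names, and the server remains the authority on the real limits.

```typescript
// Local mirror of the documented session limits, for failing fast before a call.
const MAX_SESSION_MS = 5 * 60_000;
const MAX_ACTIONS = 50;
const MAX_CONCURRENT = 2;

// The rule of thumb: three or more interactions on the same origin -> session.
function shouldUseSession(interactionsOnSameOrigin: number): boolean {
  return interactionsOnSameOrigin >= 3;
}

class SessionBudget {
  private sessions = new Map<string, { startedAt: number; actions: number }>();

  // Call before session_create.
  start(id: string, now: number): void {
    if (this.sessions.size >= MAX_CONCURRENT) throw new Error("concurrency cap: close a session first");
    this.sessions.set(id, { startedAt: now, actions: 0 });
  }

  // Call before each session_exec; throws if the action would exceed a limit.
  recordAction(id: string, now: number): void {
    const s = this.sessions.get(id);
    if (!s) throw new Error(`unknown session ${id}`);
    if (now - s.startedAt > MAX_SESSION_MS) throw new Error("session past its 5-minute lifetime");
    if (s.actions >= MAX_ACTIONS) throw new Error("session at its 50-action limit");
    s.actions += 1;
  }

  // Call after session_close.
  close(id: string): void {
    this.sessions.delete(id);
  }
}
```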

Search, images, and the Workers AI primitives

Three search tools — google_search, google_news, google_maps — return structured results at a flat $0.0012 per call, no browser involved. The sleeper hit is google_batch_search: up to 100 queries in a single call at $0.0012 each, one HTTP round trip instead of a hundred. A research agent that fans out twenty query variations pays 2.4 cents and gets everything back in one response.

All three share the same parameter surface: gl and hl for country and language, location for a geographic hint, pagination, and tbs date filters — qdr:h, qdr:d, qdr:w for past hour, day, week. That last one is what turns google_news into a monitoring primitive: a nightly cron that batches fifty qdr:d queries costs six cents and covers an entire competitive landscape.
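Batching and date filters combine naturally. The sketch below splits queries into google_batch_search calls and prices them at the flat $0.0012 per query. The `queries`/`tbs` argument shape is hypothetical, so check the schema in the API docs. The qdr values are the ones listed above.

```typescript
const BATCH_MAX = 100; // google_batch_search accepts up to 100 queries per call
const SEARCH_PRICE_MICRO_USD = 1_200; // $0.0012 per query (Bearer)

type Recency = "qdr:h" | "qdr:d" | "qdr:w"; // past hour, day, week

interface BatchArgs {
  queries: string[];
  tbs?: Recency; // hypothetical argument shape
}

function planBatches(queries: string[], tbs?: Recency): { batches: BatchArgs[]; costCents: number } {
  const batches: BatchArgs[] = [];
  for (let i = 0; i < queries.length; i += BATCH_MAX) {
    const chunk = queries.slice(i, i + BATCH_MAX);
    batches.push(tbs ? { queries: chunk, tbs } : { queries: chunk });
  }
  // integer micro-dollars avoid float drift; 10,000 micro-USD make one cent
  return { batches, costCents: (queries.length * SEARCH_PRICE_MICRO_USD) / 10_000 };
}
```

Fifty queries with `"qdr:d"` yield one call costing 6 cents, which matches the nightly-cron arithmetic above. 250 queries split into three calls.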

The image family covers the generative loop: generate_image ($0.01 up to 1.5 megapixels, $0.02 above), edit_image ($0.02 flat), and analyze_image ($0.006), the last one being a vision-model Q&A over any URL or base64 image.
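The generate_image price depends only on output size. This sketch applies that rule. Two parts are our own assumptions: that a megapixel means 1,000,000 pixels, and that exactly 1.5 MP still falls in the cheaper band.

```typescript
// generate_image Bearer price by output size: $0.01 up to 1.5 MP, $0.02 above.
// Assumes 1 MP = 1,000,000 pixels and an inclusive 1.5 MP boundary.
function generateImagePriceUsd(width: number, height: number): number {
  const megapixels = (width * height) / 1_000_000;
  return megapixels <= 1.5 ? 0.01 : 0.02;
}
```

A 1024×1024 image (about 1.05 MP) lands in the $0.01 band. A 1536×1024 image (about 1.57 MP) costs $0.02.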

The AI family is eight inference primitives backed by Cloudflare Workers AI: ai_summarize and ai_translate at 0.5¢, ai_embed at 0.1¢ for 768-dimensional vectors, ai_classify, ai_moderate, and ai_rerank at 1¢, image_to_text at 2¢, and speech_to_text — Whisper, metered at 1.5¢ per MB of audio. The point of these is economic, and it is the same point we made in the fleet economics post: an agent should not spend frontier-model tokens on mechanical transforms. Classifying a support ticket with ai_classify costs a fixed cent; doing it inside a Claude context costs whatever the surrounding conversation costs, every time.

The batch limits are generous enough to matter: ai_embed accepts up to 100 strings per call, ai_rerank scores up to 100 candidate documents against a query, and ai_classify takes up to 20 candidate labels. A retrieval pipeline that embeds a hundred chunks pays 0.1¢ — once — for the lot.
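Because ai_embed is priced per call rather than per string, it pays to fill every batch. The sketch below groups texts into calls of at most 100. It also checks the ai_classify and ai_rerank caps locally before paying for a call the server would likely reject. The function names are illustrative.

```typescript
const AI_LIMITS = { embedTexts: 100, rerankDocs: 100, classifyLabels: 20 } as const;

// Group texts into ai_embed calls of at most 100 strings; each call costs 0.1 cent.
function planEmbedCalls(texts: string[]): { calls: string[][]; costCents: number } {
  const calls: string[][] = [];
  for (let i = 0; i < texts.length; i += AI_LIMITS.embedTexts) {
    calls.push(texts.slice(i, i + AI_LIMITS.embedTexts));
  }
  return { calls, costCents: calls.length / 10 }; // n calls x 0.1 cent
}

// Local checks before paying for a call that exceeds a documented cap.
function checkClassifyLabels(labels: string[]): void {
  if (labels.length === 0 || labels.length > AI_LIMITS.classifyLabels) {
    throw new Error(`ai_classify takes 1 to ${AI_LIMITS.classifyLabels} labels, got ${labels.length}`);
  }
}

function checkRerankDocs(docs: string[]): void {
  if (docs.length > AI_LIMITS.rerankDocs) {
    throw new Error(`ai_rerank scores at most ${AI_LIMITS.rerankDocs} documents, got ${docs.length}`);
  }
}
```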

The RAG spine: workspace, documents, vector, memory

Four families compose into a complete retrieval pipeline with no external vendor.

The workspace is a private per-agent file store backed by Cloudflare R2: ten tools covering upload (inline up to 10 MB, or an init/finalize flow with a single-use presigned PUT for anything larger), download, copy, extend, and lifecycle. Storage runs about $0.03 per GB-month. One design decision worth knowing before you build on it: direct R2 URLs are never exposed. Downloads route through the platform worker as single-use tokens — the second hit on the same URL returns 410. That makes a workspace URL safe to hand to an end user as a citation link without creating a permanent public artifact.
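The single-use download behaviour is simple to model. This sketch follows the semantics described above: the first redemption serves the object, and any repeat gets 410. The token format, the 404 for unknown tokens and the in-memory store are our assumptions. This is not the platform worker's implementation.

```typescript
import { randomBytes } from "node:crypto";

// Model of single-use download links: first redemption serves, later ones get 410 Gone.
class SingleUseLinks {
  private pending = new Map<string, string>(); // token -> object key
  private spent = new Set<string>(); // tokens already redeemed

  issue(objectKey: string): string {
    const token = randomBytes(16).toString("hex"); // unguessable 128-bit token
    this.pending.set(token, objectKey);
    return token;
  }

  redeem(token: string): { status: 200; objectKey: string } | { status: 404 | 410 } {
    const key = this.pending.get(token);
    if (key !== undefined) {
      this.pending.delete(token);
      this.spent.add(token);
      return { status: 200, objectKey: key };
    }
    // a token we issued but already served is Gone; anything else is Not Found
    return { status: this.spent.has(token) ? 410 : 404 };
  }
}
```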

The docs publish worked examples that are worth internalizing, because the per-MB rates are small enough to mislead: uploading 1 MB for a day hits the 1¢ minimum that applies to all paid workspace operations; 100 MB stored for 30 days runs about $0.03; a full GB for a month is roughly $0.14, and downloading that GB back out costs about $0.05. For agent artifacts — scrape outputs, generated reports, screenshots — the workspace is effectively free; the minimum is the price you actually pay.

The document tools turn files into text: pdf_parse (0.5¢ per page, reading from a workspace file or a URL), doc_extract for docx/xlsx/csv, and article_extract for reader-mode markdown from any web article. The vector store is Cloudflare Vectorize underneath, with bge 768-dim embeddings: vector_upsert auto-embeds text at 0.5¢ per 100 items, vector_query runs a similarity search with metadata filters at 1¢. And memory is a key-value store for durable agent state: memory_set holds up to 64 KB of JSON with an optional TTL at 1¢ per write; reads are free.

The metadata filters on vector_query are the multi-tenant story: one corpus, one agent, and each query scoped to { tenant: customer_id } at retrieval time. On platforms that weld storage and retrieval together you build this isolation yourself; here it is a filter argument.
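The filter is only as strong as the discipline that applies it, because a query without it sees the whole corpus. One way to make that hard to forget is to route every upsert and query through tenant-aware helpers, as in this sketch. The `metadata` and `filter` argument names are assumptions. Prefixing ids with the tenant is our own choice, to avoid cross-tenant id collisions.

```typescript
interface VectorItem {
  id: string;
  text: string;
  metadata?: Record<string, string>;
}

interface VectorQueryArgs {
  query: string;
  top_k: number;
  filter: Record<string, string>; // hypothetical argument name
}

// Ingest side: tenant in metadata, tenant prefix on the id.
function tagForTenant(tenantId: string, items: VectorItem[]): VectorItem[] {
  return items.map((it) => ({ ...it, id: `${tenantId}:${it.id}`, metadata: { ...it.metadata, tenant: tenantId } }));
}

// Query side: the only way this codebase builds vector_query arguments, so the filter is never optional.
function scopedVectorQuery(tenantId: string, query: string, topK = 5): VectorQueryArgs {
  if (!tenantId) throw new Error("refusing to build an unscoped vector_query");
  return { query, top_k: topK, filter: { tenant: tenantId } };
}
```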

The whole ingest path, end to end:

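import { readFile } from 'node:fs/promises';
// `mcp` is any client that maps method names to tools/call; chunk() is a naive fixed-size splitter
const chunk = (s, size = 2000) => Array.from({ length: Math.ceil(s.length / size) }, (_, i) => s.slice(i * size, (i + 1) * size));
const content_base64 = (await readFile('handbook.pdf')).toString('base64');

await mcp.workspace_upload({ filename: 'handbook.pdf', content_base64, days_to_store: 90 });
const parsed = await mcp.pdf_parse({ workspace_file: 'handbook.pdf' });
await mcp.vector_upsert({ items: chunk(parsed.text).map((text, i) => ({ id: `hb-${i}`, text })) });

// at answer time, the model calls this itself if it's in the allowlist
await mcp.vector_query({ query: 'refund policy on annual plans', top_k: 5 });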

There is also web_crawl, a breadth-first site crawler with page and depth limits. It can render JavaScript and save its output (concatenated markdown or a link map) straight into the workspace, at 0.5¢ per page crawled. Getting an entire documentation site into the workspace, ready for the ingest path above, is one call.
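The crawl parameters map onto a textbook breadth-first search. The sketch below shows that traversal with page and depth limits, plus the documented price of 0.5¢ per page with a 2¢ minimum. `links` is a hypothetical stand-in for fetching a page and extracting its same-site links, which the real tool does server-side.

```typescript
// Breadth-first traversal with page and depth limits, plus the documented web_crawl price.
function bfsCrawl(
  start: string,
  links: (url: string) => string[],
  maxPages: number,
  maxDepth: number,
): string[] {
  const visited = new Set<string>([start]); // enqueue each URL at most once
  const crawled: string[] = [];
  const queue: Array<[string, number]> = [[start, 0]];
  while (queue.length > 0 && crawled.length < maxPages) {
    const [url, depth] = queue.shift()!;
    crawled.push(url);
    if (depth >= maxDepth) continue; // crawl this page but do not follow its links
    for (const next of links(url)) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push([next, depth + 1]);
      }
    }
  }
  return crawled;
}

// 0.5 cent per page crawled, with a 2 cent minimum per crawl.
function crawlCostCents(pagesCrawled: number): number {
  return Math.max(pagesCrawled * 0.5, 2);
}
```

A 40-page crawl costs 20¢, while a 3-page crawl still pays the 2¢ minimum.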

Notify, data utilities, and web3 reads

The notify family closes the loop with humans: send_telegram, send_discord, send_slack, and webhook_post at 1¢ (you supply the bot token or webhook URL), send_email at 2¢ and send_sms at 3¢ through the platform's own providers. A detail that matters for anyone running untrusted prompts: webhook_post is SSRF-guarded — requests to private, loopback, and link-local address ranges are rejected before any outbound call is made. A prompt-injected agent cannot use it to probe your internal network.
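If you build a similar tool of your own, the core of an SSRF guard is an address check. The sketch below covers the ranges named above, plus 0.0.0.0/8 and dotted IPv4-mapped IPv6. It is not the platform's implementation. A production guard must also resolve the hostname, check every resolved address, and re-check after redirects to resist DNS rebinding.

```typescript
// Address check of the kind an SSRF guard runs before any outbound request.

function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) return null;
    n = n * 256 + Number(p);
  }
  return n; // unsigned 32-bit value
}

const BLOCKED_V4: Array<[string, number]> = [
  ["10.0.0.0", 8], ["172.16.0.0", 12], ["192.168.0.0", 16], // private (RFC 1918)
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local, includes cloud metadata at 169.254.169.254
  ["0.0.0.0", 8], // "this network", which can reach localhost on some systems
];

function inCidr(addr: number, base: string, bits: number): boolean {
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((addr & mask) >>> 0) === ((ipv4ToInt(base)! & mask) >>> 0);
}

function isBlockedAddress(ip: string): boolean {
  const v4 = ipv4ToInt(ip);
  if (v4 !== null) return BLOCKED_V4.some(([base, bits]) => inCidr(v4, base, bits));
  const v6 = ip.toLowerCase();
  if (v6.startsWith("::ffff:")) return isBlockedAddress(v6.slice(7)); // dotted IPv4-mapped IPv6
  if (v6 === "::1") return true; // loopback
  if (/^fe[89ab][0-9a-f]:/.test(v6)) return true; // fe80::/10 link-local
  if (/^f[cd][0-9a-f]{2}:/.test(v6)) return true; // fc00::/7 unique-local
  return false;
}
```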

The data family is eleven small utilities that agents otherwise reimplement badly: free dns_lookup and ip_geolocate, plus url_unfurl, rss_parse, youtube_transcript, whois, crypto_price, fx_convert, and qr_generate at 1¢ each, and an async CAPTCHA pair — captcha_solve_create at 2¢ submits a task for reCAPTCHA v2, hCaptcha, or Turnstile, and captcha_solve_result polls it for free.
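The CAPTCHA pair follows the usual submit-then-poll shape. In this sketch the two tool calls are injected as functions. The kind labels, task id and result shape are hypothetical. The default 3-second poll interval keeps one solver at about 20 free calls per minute, well under the 60-per-minute limit.

```typescript
type CaptchaKind = "recaptcha_v2" | "hcaptcha" | "turnstile"; // hypothetical labels
type PollResult =
  | { status: "pending" }
  | { status: "ready"; token: string }
  | { status: "failed"; reason: string };

async function solveCaptcha(
  create: (kind: CaptchaKind, siteKey: string, pageUrl: string) => Promise<string>, // returns a task id
  poll: (taskId: string) => Promise<PollResult>,
  kind: CaptchaKind,
  siteKey: string,
  pageUrl: string,
  { intervalMs = 3000, timeoutMs = 120_000, sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms)) } = {},
): Promise<string> {
  const taskId = await create(kind, siteKey, pageUrl); // the paid call (captcha_solve_create)
  for (let waited = 0; waited <= timeoutMs; waited += intervalMs) {
    const r = await poll(taskId); // free but rate-limited (captcha_solve_result)
    if (r.status === "ready") return r.token;
    if (r.status === "failed") throw new Error(`captcha failed: ${r.reason}`);
    await sleep(intervalMs);
  }
  throw new Error("captcha solve timed out");
}
```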

The web3 family is deliberately boring, and that is its feature: token_balance, tx_status, nft_metadata, and ens_resolve are read-only lookups across Ethereum, Polygon, Base, and Solana at 1¢ each. These tools never sign or send transactions. An agent that needs to verify an on-chain payment landed can check tx_status without anyone having to hand it a key.

Bearer vs x402: the ten percent, and the exceptions

Almost every paid tool accepts two payment modes. Bearer mode debits your prepaid balance — the classic developer experience. x402 walk-up mode lets an agent with no account pay per call with a signed stablecoin authorization, and every walk-up price is 10 percent below the Bearer price: 1¢ becomes 0.9¢, ai_embed's 0.1¢ becomes 0.09¢, Whisper's 1.5¢/MB becomes 1.35¢/MB. Three tools are balance-only and take no x402 at all: web_crawl, pdf_parse, and doc_extract.
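The walk-up discount is simple arithmetic, best done in integer USDC base units. USDC has 6 decimals, so 1¢ is 10,000 units. In this sketch the price map holds only a few Bearer prices quoted in this post, and `walkUpUnits` is an illustrative name.

```typescript
// x402 walk-up price = Bearer price minus 10 percent, in USDC base units
// (6 decimals: 1 cent = 10,000 units). Integer arithmetic keeps sub-cent prices exact.

const BALANCE_ONLY = new Set(["web_crawl", "pdf_parse", "doc_extract"]);

// A few Bearer prices quoted in this post, converted to base units (illustrative subset).
const BEARER_UNITS: Record<string, number> = {
  ai_classify: 10_000, // 1 cent
  ai_embed: 1_000, // 0.1 cent
  send_email: 20_000, // 2 cents
};

function walkUpUnits(tool: string): number {
  if (BALANCE_ONLY.has(tool)) throw new Error(`${tool} is balance-only: use a Bearer key`);
  const bearer = BEARER_UNITS[tool];
  if (bearer === undefined) throw new Error(`no Bearer price on file for ${tool}`);
  return (bearer * 9) / 10; // every price above is a multiple of 10, so this stays an integer
}
```

`walkUpUnits("ai_classify")` returns 9000 (0.9¢) and `walkUpUnits("ai_embed")` returns 900 (0.09¢). `walkUpUnits("web_crawl")` throws, because that tool is balance-only.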

The walk-up flow is easy to see live, because the server quotes it to anyone. Send an unauthenticated request to the endpoint and you get HTTP 402 with a payment-required header; base64-decode it and this is what came back when we did exactly that while writing this post:

$ curl -i -X POST https://mcp.llm4agents.com/mcp -d '{"jsonrpc":"2.0", ...}'
HTTP/2 402

// payment-required header, decoded:
{
  "x402Version": 2,
  "accepts": [{
    "scheme": "exact",
    "network": "eip155:8453",                                  // Base mainnet
    "amount": "10000",                                          // $0.01 in 6-decimal USDC units
    "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",      // USDC on Base
    "payTo": "0x0D741Ab0968906f8338C60f79b81B49c23258C12",
    "maxTimeoutSeconds": 300,
    "extra": { "name": "USD Coin", "version": "2" }         // EIP-712 domain
  }]
}

Every field is doing work. The asset is the canonical USDC contract on Base. The extra block supplies the name and version of the EIP-712 signing domain. The chain ID comes from network, and the asset address serves as verifying contract. Together they are everything the client needs to build a valid ERC-3009 transferWithAuthorization. That is a signed message with a random 32-byte nonce and a validAfter/validBefore window, which is why the quote carries a 300-second timeout. The agent signs, retries the request with the authorization in the X-PAYMENT header, and the call executes: no account, no balance, no relationship. The mechanics are the x402 standard, now a Linux Foundation project, and the full walkthrough is in our x402 deep dive.
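Everything the client signs can be derived from the quote. The sketch below turns one `accepts` entry into EIP-712 typed data for an ERC-3009 `TransferWithAuthorization`, following the ERC's field order. Signing is left to a wallet library, and the x402 spec defines the exact X-PAYMENT envelope, so neither is shown. `buildTransferAuthorization` is an illustrative name.

```typescript
import { randomBytes } from "node:crypto";

// Shape of one entry in the 402 quote's `accepts` array.
interface Accept {
  scheme: string;
  network: string;
  amount: string;
  asset: string;
  payTo: string;
  maxTimeoutSeconds: number;
  extra: { name: string; version: string };
}

function buildTransferAuthorization(accept: Accept, from: string, nowSec = Math.floor(Date.now() / 1000)) {
  const [namespace, chainRef] = accept.network.split(":"); // CAIP-2, e.g. "eip155:8453"
  if (namespace !== "eip155") throw new Error(`not an EVM network: ${accept.network}`);
  return {
    domain: {
      name: accept.extra.name, // "USD Coin"
      version: accept.extra.version, // "2"
      chainId: Number(chainRef), // 8453 = Base mainnet
      verifyingContract: accept.asset, // the USDC contract itself
    },
    types: {
      TransferWithAuthorization: [
        { name: "from", type: "address" },
        { name: "to", type: "address" },
        { name: "value", type: "uint256" },
        { name: "validAfter", type: "uint256" },
        { name: "validBefore", type: "uint256" },
        { name: "nonce", type: "bytes32" },
      ],
    },
    primaryType: "TransferWithAuthorization" as const,
    message: {
      from,
      to: accept.payTo,
      value: accept.amount, // base units as a decimal string
      validAfter: String(nowSec),
      validBefore: String(nowSec + accept.maxTimeoutSeconds), // the quote's 300-second window
      nonce: "0x" + randomBytes(32).toString("hex"), // random 32-byte nonce, never reused
    },
  };
}
```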

Which mode is yours? Choose Bearer if you operate the agent and want one balance, one invoice, and access to the free calls (rate-limited at 60 requests per minute). Choose x402 walk-up if your agent serves third parties who should pay their own way, or if it roams infrastructure it has no account on. The 10 percent discount rewards the mode where settlement is immediate and final.

Sixty-seven tools is an attack surface

A model that can see 67 tools can be talked into calling 67 tools. The threat is not theoretical — indirect prompt injection through scraped pages is exactly the input this server processes all day. The discipline, unchanged from our threat model post, is scope minimization: pass an allowlist so each agent sees only what its job requires.

const conv = client.chat.conversation({
  model: 'anthropic/claude-sonnet-4.6',
  system: '...',
  tools: {
    mcp: {
      url: 'https://mcp.llm4agents.com/mcp',
      // research agent: read the web, build the corpus — nothing else
      allow: ['google_search', 'google_batch_search', 'markdown',
              'article_extract', 'vector_upsert', 'vector_query'],
    },
  },
});

A triage agent gets memory_set, memory_get, ai_classify, send_email and nothing more. No notify tools on anything that reads untrusted input unless the notification target is pinned. The platform-side limits help — SSRF guards on webhook_post, read-only web3, two concurrent sessions, rate-limited free calls — but the allowlist is the control you own.
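The `allow` list in the configuration above is applied on the client side. Stripped of any SDK, such a guard does two things: it hides tools from the tools/list result, and it refuses out-of-list calls before they leave the process. This sketch uses an illustrative class name. Like any client-side control, it does nothing about a leaked key.

```typescript
type ToolDef = { name: string; description?: string };

class AllowlistGuard {
  private readonly allowed: ReadonlySet<string>;

  constructor(allow: string[]) {
    this.allowed = new Set(allow);
  }

  // Apply to the tools/list result before it reaches the model.
  visibleTools(all: ToolDef[]): ToolDef[] {
    return all.filter((t) => this.allowed.has(t.name));
  }

  // Call before every tools/call; throws instead of sending.
  checkCall(name: string): void {
    if (!this.allowed.has(name)) throw new Error(`tool "${name}" is not in this agent's allowlist`);
  }
}

// The triage agent from the paragraph above.
const triage = new AllowlistGuard(["memory_set", "memory_get", "ai_classify", "send_email"]);
```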

What is on the roadmap

Three tools are planned but not live: code_exec (sandboxed code execution), schedule_task (deferred and recurring runs), and semantic_memory (retrieval-based memory beyond the key-value store). Alongside them, the Agent Builder visual flow and Agent Cron remain in development, as noted in the migration runbook. Until they ship, the working substitutes are the ones operators already use: external schedulers for cron, memory_set plus vector_query for memory. Do not build against roadmap items; build against the 67 that answer today.

What it means for LLM4Agents

The MCP server is the second leg of the platform, and structurally it is the more interesting one. The gateway sells tokens; the MCP server sells actions. Both run on the same key, the same balance, the same reserve-settle billing, the same cost headers. That composition is what separates an agent platform from a model reseller: the agent that reasons on the gateway and acts through the MCP server never leaves the billing perimeter, which is why per-task cost accounting — the number every operator actually manages — stays coherent.

The x402 lane is the strategic bet. Apart from the three balance-only tools, every paid tool on this server is individually machine-payable by any agent on the internet holding a wallet: no signup, no API key issuance, no billing relationship. When we argue that machine-to-machine payments are the substrate of the agentic economy, this catalog is the argument deployed: dozens of priced endpoints where the entire commercial relationship is an HTTP 402 and a signature.

Staying on the frontier

What this catalog needs next, in order:

Ship the three roadmap tools. code_exec, schedule_task, and semantic_memory are the gaps operators currently leave the platform to fill, and each departure drags part of the workload to infrastructure we do not meter.
Make the catalog discoverable without payment. Today an unauthenticated tools/list returns 402. A free, machine-readable catalog with prices — and an llms.txt on the MCP domain — would let walk-up agents budget a task before spending a cent, which is exactly the behavior we should reward.
Move allowlists server-side. Scoping tools in client config is discipline; scoping them at API-key level is enforcement. Keys that can only ever call their allowlist shrink the blast radius of both prompt injection and leaked credentials.
Adopt the MCP Tasks extension when it lands. The 2026-07-28 release candidate formalizes long-running operations — a natural fit for web_crawl, CAPTCHA solves, and future code_exec runs that outlive a request cycle.
Per-tool spend caps and anomaly alerts. Reserve-settle makes every call individually accountable; the missing layer is the operator-facing budget that stops a looping agent at the 50th generate_image, not the 5,000th.
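Until such a layer exists on the platform, an operator can approximate it client-side. The sketch below counts calls and settled spend per tool and refuses further calls once either cap is reached. The caps are hypothetical, and amounts are in hundredths of a cent. The settled amount would come from the X-Cost-Usd-Cents header, multiplied by 100.

```typescript
type Cap = { maxCalls: number; maxSpend: number }; // maxSpend in hundredths of a cent

class SpendCap {
  private readonly caps: Record<string, Cap>;
  private calls = new Map<string, number>();
  private spent = new Map<string, number>();

  constructor(caps: Record<string, Cap>) {
    this.caps = caps;
  }

  // Call before each tool call; throws once the tool has hit either cap.
  admit(tool: string): void {
    const cap = this.caps[tool];
    if (!cap) return; // tools without a cap pass through
    const calls = this.calls.get(tool) ?? 0;
    if (calls >= cap.maxCalls) throw new Error(`${tool}: call cap of ${cap.maxCalls} reached`);
    if ((this.spent.get(tool) ?? 0) >= cap.maxSpend) throw new Error(`${tool}: spend cap reached`);
    this.calls.set(tool, calls + 1);
  }

  // Call after settlement with the actual cost.
  record(tool: string, actual: number): void {
    this.spent.set(tool, (this.spent.get(tool) ?? 0) + actual);
  }
}
```

With `maxCalls: 50` on generate_image, a looping agent gets 50 images and then an exception, long before the 5,000th call.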
