API Reference

Complete REST API documentation for WebPeel. All endpoints return JSON and use standard HTTP status codes.

Base URL
Production: https://api.webpeel.dev
Self-hosted: http://localhost:3000 (configurable via PORT env var)

Authentication

Pass your API key via the Authorization header or the X-API-Key header:

# Option 1: Bearer token (recommended)
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
  -H "Authorization: Bearer wp_live_abc123..."

# Option 2: X-API-Key header
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
  -H "X-API-Key: wp_live_abc123..."

Anonymous Access
No API key is required for basic usage. Anonymous requests are subject to the free-tier rate limits (25 requests/hour). Sign up for an API key to get a weekly quota and access the dashboard.

GET /v1/fetch

Fetch and extract clean, AI-ready content from any URL. WebPeel automatically escalates from fast HTTP to browser rendering to stealth mode as needed.

GET /v1/fetch
Auth: optional · 1 credit

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | URL to fetch (max 2048 characters). Must be http:// or https://. |
| format | string | markdown | Output format: markdown, text, or html |
| render | boolean | false | Force headless browser rendering (for JS-heavy sites, SPAs) |
| wait | number | 0 | Milliseconds to wait after page load (0–60000). Only applies when render=true. |
| includeTags | string | | Comma-separated HTML tags/selectors to include (e.g. article,main) |
| excludeTags | string | | Comma-separated HTML tags/selectors to remove (e.g. nav,footer,.sidebar) |
| onlyMainContent | boolean | false | Shortcut: sets includeTags to main,article,.content,#content |
| images | boolean | false | Extract image URLs from the page |
| location | string | | ISO 3166-1 alpha-2 country code for geo-targeting (e.g. US, DE) |
| languages | string | | Comma-separated language preferences (e.g. en,de) |
| actions | string | | JSON-encoded array of page actions to execute before extraction. Auto-enables render. |
| maxAge | number | 172800000 | Cache freshness threshold in milliseconds (default 2 days). Set to 0 to bypass the cache. |
| storeInCache | boolean | true | Set to false to skip caching the response |
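
Because actions must be passed as a JSON-encoded string in the query, it's easiest to build the URL with a proper encoder. A minimal Python sketch using only the documented url and actions parameters:

import json
import urllib.parse
import urllib.request

# actions is a JSON-encoded array in the query string; urlencode handles escaping
params = urllib.parse.urlencode({
    "url": "https://example.com/search",
    "actions": json.dumps([
        {"type": "click", "selector": "#load-more"},
        {"type": "wait", "ms": 1000},
    ]),
})

with urllib.request.urlopen(f"https://api.webpeel.dev/v1/fetch?{params}") as resp:
    data = json.loads(resp.read())
print(data["method"])  # actions auto-enable rendering, so expect "browser" or "stealth"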

Response

{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "metadata": {
    "description": "Example Domain",
    "author": null,
    "published": null,
    "image": null,
    "canonical": "https://example.com"
  },
  "links": ["https://www.iana.org/domains/example"],
  "tokens": 125,
  "method": "simple",
  "elapsed": 118,
  "contentType": "html",
  "quality": 0.95,
  "fingerprint": "a1b2c3d4e5f6g7h8"
}
Response fields
| Field | Type | Description |
|---|---|---|
| url | string | Final URL after redirects |
| title | string | Page title |
| content | string | Extracted content in the requested format |
| metadata | object | Structured metadata: description, author, published, image, canonical |
| links | string[] | All links found on the page (absolute, deduplicated) |
| tokens | number | Estimated token count (content.length / 4) |
| method | string | Fetch method used: simple, browser, or stealth |
| elapsed | number | Processing time in milliseconds |
| contentType | string | Detected type: html, json, xml, text, document |
| quality | number | Extraction quality score (0–1) |
| fingerprint | string | SHA-256 hash prefix for change detection |
| images | object[] | Image info (only when images=true): src, alt, title, width, height |
| screenshot | string | Base64 PNG screenshot (only when screenshot=true via POST) |
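
The fingerprint field enables cheap change detection: fetch once, store the hash, and compare on later fetches instead of diffing content. A minimal sketch (the helper is illustrative, not part of the API):

import json
import urllib.parse
import urllib.request

def fetch_fingerprint(page_url):
    # maxAge=0 bypasses the cache so repeated checks see fresh content
    qs = urllib.parse.urlencode({"url": page_url, "maxAge": 0})
    with urllib.request.urlopen(f"https://api.webpeel.dev/v1/fetch?{qs}") as resp:
        return json.loads(resp.read())["fingerprint"]

before = fetch_fingerprint("https://example.com")
# ... some time later ...
after = fetch_fingerprint("https://example.com")
if before != after:
    print("Page content changed")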

Examples

# Basic fetch
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com"

# With browser rendering
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com&render=true"

# Extract only main content, exclude nav/footer
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com&onlyMainContent=true&excludeTags=nav,footer"

# With authentication
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
  -H "Authorization: Bearer YOUR_API_KEY"

// JavaScript
const response = await fetch(
  'https://api.webpeel.dev/v1/fetch?url=https://example.com&format=markdown',
  {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  }
);
const data = await response.json();

console.log(data.title);    // "Example Domain"
console.log(data.content);  // Markdown content
console.log(data.method);   // "simple" | "browser" | "stealth"

# Python
import urllib.request, json

url = "https://api.webpeel.dev/v1/fetch?url=https://example.com"
req = urllib.request.Request(url)
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
    print(data["title"])    # "Example Domain"
    print(data["content"])  # Markdown content

Error Responses

| Status | Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid url, format, wait, or actions parameter |
| 400 | invalid_url | URL too long (>2048 chars) or malformed |
| 400 | forbidden_url | Attempt to fetch localhost, private networks, or non-HTTP URLs (SSRF protection) |
| 429 | rate_limited | Hourly burst limit exceeded |
| 500 | TIMEOUT | Request timed out |
| 500 | BLOCKED | Site blocked the request; try render=true or stealth mode |
| 500 | internal_error | Unexpected server error |

POST /v1/fetch

Same as GET /v1/fetch but accepts a JSON body. Adds support for page actions, inline LLM extraction (BYOK), and the Firecrawl-compatible formats array.

POST /v1/fetch
Auth: optional · 1 credit

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | URL to fetch (max 2048 characters) |
| render | boolean | false | Force browser rendering |
| wait | number | 0 | Wait time in ms after page load (0–60000) |
| format | string | markdown | markdown, text, or html |
| includeTags | string[] | | Array of HTML tags/selectors to include |
| excludeTags | string[] | | Array of HTML tags/selectors to remove |
| onlyMainContent | boolean | false | Include only main content tags |
| images | boolean | false | Extract image URLs |
| location | string | | Country code for geo-targeting |
| languages | string[] | | Language preferences |
| actions | object[] | | Page actions to execute before extraction. Auto-enables render. |
| storeInCache | boolean | true | Store result in server-side cache |

Inline LLM Extraction (BYOK)

| Field | Type | Default | Description |
|---|---|---|---|
| extract | object | | Object with schema (JSON Schema) and/or prompt (string) for LLM extraction |
| llmProvider | string | | Required when extract is set. One of: openai, anthropic, google |
| llmApiKey | string | | Required when extract is set. Your LLM API key (BYOK) |
| llmModel | string | Provider default | LLM model name (optional) |
| formats | array | | Firecrawl-compatible formats array. Supports {type: "json", schema: {...}} for extraction. |

Response

Same as GET /v1/fetch, plus these additional fields when inline extraction is used:

{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "# Example Domain\n...",
  "json": {
    "company": "Example Corp",
    "founded": 2006
  },
  "extractTokensUsed": { "input": 1234, "output": 56 },
  ...
}

Examples

# Basic POST fetch
curl -X POST https://api.webpeel.dev/v1/fetch \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com",
    "format": "markdown",
    "onlyMainContent": true
  }'

# With inline LLM extraction
curl -X POST https://api.webpeel.dev/v1/fetch \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com/about",
    "extract": {
      "schema": {
        "type": "object",
        "properties": {
          "company": { "type": "string" },
          "founded": { "type": "number" }
        }
      }
    },
    "llmProvider": "openai",
    "llmApiKey": "sk-..."
  }'

// JavaScript
const response = await fetch('https://api.webpeel.dev/v1/fetch', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    url: 'https://example.com/about',
    extract: {
      schema: {
        type: 'object',
        properties: {
          company: { type: 'string' },
          founded: { type: 'number' }
        }
      }
    },
    llmProvider: 'openai',
    llmApiKey: 'sk-...'
  })
});

const data = await response.json();
console.log(data.json);  // { company: "Example Corp", founded: 2006 }

# Python
import urllib.request, json

body = json.dumps({
    "url": "https://example.com/about",
    "extract": {
        "schema": {
            "type": "object",
            "properties": {
                "company": {"type": "string"},
                "founded": {"type": "number"}
            }
        }
    },
    "llmProvider": "openai",
    "llmApiKey": "sk-..."
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/fetch",
    data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
    print(data["json"])  # {"company": "Example Corp", "founded": 2006}

Page Actions

Page actions let you interact with a page (click, type, scroll, wait) before content is extracted. Actions auto-enable browser rendering. Compatible with Firecrawl's action format.

| Action Type | Fields | Description |
|---|---|---|
| click | selector | Click an element |
| type | selector, value/text | Type text into an input (keystroke by keystroke) |
| fill | selector, value/text | Fill an input (set value directly) |
| select | selector, value | Select a dropdown option |
| scroll | direction, amount | Scroll the page. Direction: up, down, left, right |
| wait | ms/milliseconds | Wait a fixed duration |
| waitForSelector | selector, timeout | Wait for an element to appear |
| press | key | Press a keyboard key (e.g. Enter) |
| hover | selector | Hover over an element |
| screenshot | | Take a screenshot at this point |
{
  "url": "https://example.com/search",
  "actions": [
    { "type": "fill", "selector": "#search-input", "value": "AI agents" },
    { "type": "click", "selector": "#search-btn" },
    { "type": "wait", "ms": 2000 },
    { "type": "scroll", "direction": "down", "amount": 500 }
  ]
}
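
For instance, sending that body through POST /v1/fetch from Python (same style as the earlier examples; all fields are documented above):

import json
import urllib.request

body = json.dumps({
    "url": "https://example.com/search",
    "actions": [
        {"type": "fill", "selector": "#search-input", "value": "AI agents"},
        {"type": "click", "selector": "#search-btn"},
        {"type": "wait", "ms": 2000},
        {"type": "scroll", "direction": "down", "amount": 500},
    ],
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/fetch", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
# actions auto-enable rendering; content reflects the page after the actions ran
print(data["content"][:200])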

GET /v1/search

Search the web using DuckDuckGo (free, default) or Brave Search (BYOK for higher-quality results). Optionally scrape each result URL for full content.

GET /v1/search
Auth: optional · 1 credit

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| q (required) | string | | Search query (max 500 characters) |
| provider | string | duckduckgo | Search provider: duckduckgo or brave |
| searchApiKey | string | | Brave Search API key. Alternative: x-search-api-key header. |
| count | number | 5 | Number of results (1–10) |
| scrapeResults | boolean | false | When true, fetches each result URL and adds a content field |
| sources | string | web | Comma-separated: web, news, images |
| categories | string | | Filter by category: github, pdf, docs, blog, news, video, social |
| tbs | string | | Time filter: qdr:d (day), qdr:w (week), qdr:m (month), qdr:y (year) |
| country | string | | Country/region code (e.g. us-en, de-de) |
| location | string | | Location string for localized results |

Response

{
  "success": true,
  "data": {
    "web": [
      {
        "title": "Example Domain",
        "url": "https://example.com",
        "snippet": "This domain is for use in illustrative examples...",
        "content": "# Example Domain\n..."  // Only if scrapeResults=true
      }
    ],
    "news": [ ... ],   // Only if sources includes "news"
    "images": [ ... ]  // Only if sources includes "images"
  }
}

Examples

# Basic search
curl "https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5"

# With Brave Search (BYOK)
curl "https://api.webpeel.dev/v1/search?q=latest+AI+news&provider=brave" \
  -H "x-search-api-key: YOUR_BRAVE_KEY"

# Search + scrape results
curl "https://api.webpeel.dev/v1/search?q=MCP+protocol&scrapeResults=true&count=3"

// JavaScript
const resp = await fetch(
  'https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5',
  { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
);
const { data } = await resp.json();

for (const result of data.web) {
  console.log(`${result.title}: ${result.url}`);
}

# Python
import urllib.request, json

url = "https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5"
req = urllib.request.Request(url)
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
    for r in data["data"]["web"]:
        print(f"{r['title']}: {r['url']}")

POST /v1/screenshot

Take a screenshot of any URL and return a base64-encoded image. Supports full-page capture, custom viewport sizes, page actions, and stealth mode.

POST /v1/screenshot
Auth: optional · 1 credit (stealth)

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | URL to screenshot (max 2048 characters) |
| fullPage | boolean | false | Capture the entire scrollable page |
| width | number | 1280 | Viewport width in pixels (100–5000) |
| height | number | 720 | Viewport height in pixels (100–5000) |
| format | string | png | Image format: png, jpeg, or jpg |
| quality | number | | Image quality for JPEG (1–100) |
| waitFor | number | 0 | Milliseconds to wait before capture (0–60000) |
| timeout | number | 30000 | Total timeout in milliseconds |
| stealth | boolean | false | Use stealth mode to bypass bot detection |
| actions | object[] | | Page actions to execute before screenshot |
| headers | object | | Custom HTTP headers |
| cookies | string[] | | Cookies to set (key=value pairs) |

Response

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "screenshot": "data:image/png;base64,iVBORw0KGgo...",
    "metadata": {
      "sourceURL": "https://example.com",
      "format": "png",
      "width": 1280,
      "height": 720,
      "fullPage": false
    }
  }
}

Examples

curl -X POST https://api.webpeel.dev/v1/screenshot \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com",
    "fullPage": true,
    "width": 1440,
    "format": "png"
  }'

// JavaScript
const resp = await fetch('https://api.webpeel.dev/v1/screenshot', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    url: 'https://example.com',
    fullPage: true,
    width: 1440
  })
});

const { data } = await resp.json();
// data.screenshot is a data: URL — use directly in <img src="...">

# Python
import urllib.request, json, base64

body = json.dumps({
    "url": "https://example.com",
    "fullPage": True,
    "width": 1440
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/screenshot",
    data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
    # Save screenshot to file
    img_data = data["data"]["screenshot"].split(",")[1]
    with open("screenshot.png", "wb") as f:
        f.write(base64.b64decode(img_data))

POST /v1/answer

Ask a question, search the web, fetch top sources, and generate an LLM answer with inline citations ([1], [2]). BYOK — you provide your own LLM API key. Supports SSE streaming.

POST /v1/answer
Auth: optional · 1 credit + LLM cost (BYOK)

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| question (required) | string | | The question to answer (max 2000 characters) |
| llmProvider (required) | string | | LLM provider: openai, anthropic, or google |
| llmApiKey (required) | string | | Your LLM API key (BYOK) |
| llmModel | string | Provider default | Model name. Defaults: gpt-4o-mini (OpenAI), claude-3-5-sonnet-latest (Anthropic), gemini-1.5-flash (Google) |
| searchProvider | string | duckduckgo | Search provider: duckduckgo or brave |
| searchApiKey | string | | Brave Search API key (also accepted via x-search-api-key header) |
| maxSources | number | 5 | Maximum sources to fetch and cite (1–10) |
| stream | boolean | false | When true, stream response via SSE (text/event-stream) |

Response (non-streaming)

{
  "answer": "MCP (Model Context Protocol) is an open standard... [1] ... [2] ...",
  "citations": [
    { "title": "Model Context Protocol", "url": "https://...", "snippet": "..." },
    { "title": "MCP Documentation", "url": "https://...", "snippet": "..." }
  ],
  "searchProvider": "duckduckgo",
  "llmProvider": "openai",
  "llmModel": "gpt-4o-mini",
  "tokensUsed": { "input": 3456, "output": 789 }
}

Response (streaming)

When stream=true, the endpoint returns text/event-stream with JSON-encoded events:

data: {"type":"chunk","text":"MCP (Model Context Protocol) is"}
data: {"type":"chunk","text":" an open standard for AI tool..."}
data: {"type":"done","citations":[...],"searchProvider":"duckduckgo","llmProvider":"openai","llmModel":"gpt-4o-mini","tokensUsed":{"input":3456,"output":789}}

Examples

curl -X POST https://api.webpeel.dev/v1/answer \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is MCP and why does it matter for AI agents?",
    "llmProvider": "openai",
    "llmApiKey": "sk-...",
    "maxSources": 5,
    "stream": false
  }'

// JavaScript
const resp = await fetch('https://api.webpeel.dev/v1/answer', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    question: 'What is MCP and why does it matter for AI agents?',
    llmProvider: 'openai',
    llmApiKey: 'sk-...',
    maxSources: 5
  })
});

const data = await resp.json();
console.log(data.answer);
console.log(data.citations);

# Python
import urllib.request, json

body = json.dumps({
    "question": "What is MCP and why does it matter for AI agents?",
    "llmProvider": "openai",
    "llmApiKey": "sk-...",
    "maxSources": 5
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/answer",
    data=body, method="POST"
)
req.add_header("Content-Type", "application/json")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
    print(data["answer"])

POST /v1/agent

Run an autonomous AI research agent. It plans search queries, fetches pages, and synthesizes a structured answer using your LLM (BYOK). Supports basic and thorough research depth, topic tuning, JSON Schema output, and SSE streaming.

POST /v1/agent
Auth: optional · 1 credit per page fetched + LLM cost (BYOK)

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| prompt (required) | string | | Natural language research task |
| llmApiKey (required) | string | | OpenAI-compatible API key (BYOK) |
| llmModel | string | gpt-4o-mini | LLM model to use |
| llmApiBase | string | https://api.openai.com/v1 | Custom LLM API base URL (for OpenRouter, local models, etc.) |
| urls | string[] | | Starting URLs. If not provided, the agent searches the web. |
| depth | string | basic | Research depth: basic (1 query, ~3 sources) or thorough (multi-query, ~10 sources, gap analysis) |
| topic | string | general | Topic hint: general, news, technical, academic. Adjusts search queries and source prioritization. |
| maxSources | number | 5 | Max sources to fetch and analyze (1–20) |
| maxCredits | number | | Spending cap (stops fetching when reached) |
| outputSchema | object | | JSON Schema for structured output. Agent validates and may retry if output doesn't match. |
| schema | object | | Legacy alias for outputSchema |
| stream | boolean | false | When true, stream progress and answer via SSE |

Response (synchronous)

{
  "success": true,
  "data": {
    "answer": "## AI Coding Assistants Comparison\n...",
    "keyFindings": ["Finding 1", "Finding 2"]
  },
  "answer": "## AI Coding Assistants Comparison\n...",
  "sources": ["https://...", "https://..."],
  "sourcesDetailed": [
    { "url": "https://...", "title": "Source Title" }
  ],
  "pagesVisited": 5,
  "creditsUsed": 6,
  "tokensUsed": { "input": 12000, "output": 2500 }
}

SSE Streaming Events

When stream=true, the endpoint returns text/event-stream:

data: {"type":"step","action":"searching","query":"AI coding assistants 2026"}
data: {"type":"step","action":"fetching","url":"https://..."}
data: {"type":"step","action":"analyzing","summary":"Synthesizing from 5 sources..."}
data: {"type":"chunk","text":"## AI Coding Assistants\n"}
data: {"type":"chunk","text":"Here are the top 5..."}
data: {"type":"done","answer":"...","sources":[...],"tokensUsed":{...}}

Async Mode

For long-running research, use the async endpoint:

POST /v1/agent/async

Same request body as above, plus optional webhook (WebhookConfig object). Returns a job ID:

{ "success": true, "id": "job-uuid", "url": "/v1/agent/job-uuid" }
GET /v1/agent/:id

Poll job status and results.

DELETE /v1/agent/:id

Cancel a running agent job.
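
A sketch of the async flow in Python. The exact shape of the polled job object isn't shown above, so the status and answer fields are assumed to mirror the synchronous response and the crawl job format:

import json
import time
import urllib.request

body = json.dumps({
    "prompt": "Compare the top 5 AI coding assistants",
    "llmApiKey": "sk-...",
    "depth": "thorough",
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/agent/async", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    job_id = json.loads(resp.read())["id"]

# Poll until the job leaves the queue (field names assumed, per crawl jobs)
while True:
    time.sleep(5)
    poll = urllib.request.Request(f"https://api.webpeel.dev/v1/agent/{job_id}")
    poll.add_header("Authorization", "Bearer YOUR_API_KEY")
    with urllib.request.urlopen(poll) as resp:
        job = json.loads(resp.read())
    if job.get("status") not in ("queued", "processing"):
        break

print(job.get("answer"))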

Example

curl -X POST https://api.webpeel.dev/v1/agent \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "prompt": "Compare the top 5 AI coding assistants by features and pricing",
    "llmApiKey": "sk-...",
    "depth": "thorough",
    "topic": "technical",
    "maxSources": 10,
    "outputSchema": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "features": { "type": "array", "items": { "type": "string" } },
          "pricing": { "type": "string" }
        },
        "required": ["name", "features", "pricing"]
      }
    }
  }'

POST /v1/crawl

Start an async crawl job. Follows links from a starting URL and extracts content from each page. Returns a job ID for polling. Supports webhooks and SSE for real-time progress.

POST /v1/crawl
Auth: optional · 1 credit per page crawled

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | Starting URL to crawl from |
| limit | number | 100 | Maximum pages to crawl |
| maxDepth | number | 3 | Maximum link-following depth |
| scrapeOptions | object | | Passed through to each page fetch (e.g. format, render) |
| location | string | | Country code for geo-targeting |
| languages | string or string[] | | Language preferences |
| webhook | object | | Webhook config for progress notifications |

Response (202 Accepted)

{
  "success": true,
  "id": "crawl_abc123",
  "url": "/v1/crawl/crawl_abc123"
}

GET /v1/crawl/:id — Poll Results

GET /v1/crawl/:id

Returns job status and results. Supports SSE by sending Accept: text/event-stream for real-time progress updates.

{
  "success": true,
  "status": "completed",
  "progress": 100,
  "total": 50,
  "completed": 50,
  "creditsUsed": 50,
  "expiresAt": "2026-02-16T00:00:00.000Z",
  "data": [
    {
      "url": "https://example.com",
      "title": "Home",
      "markdown": "# Home\n...",
      "links": ["https://example.com/about"]
    }
  ]
}

DELETE /v1/crawl/:id — Cancel Job

DELETE /v1/crawl/:id

Cancel a queued or in-progress crawl job.

Example

# Start crawl
curl -X POST https://api.webpeel.dev/v1/crawl \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{ "url": "https://docs.example.com", "limit": 50, "maxDepth": 3 }'

# Poll results
curl "https://api.webpeel.dev/v1/crawl/crawl_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Cancel
curl -X DELETE "https://api.webpeel.dev/v1/crawl/crawl_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

// JavaScript
// Start crawl
const { id } = await (await fetch('https://api.webpeel.dev/v1/crawl', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({ url: 'https://docs.example.com', limit: 50 })
})).json();

// Poll until complete
let job;
do {
  await new Promise(r => setTimeout(r, 2000));
  job = await (await fetch(`https://api.webpeel.dev/v1/crawl/${id}`, {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  })).json();
  console.log(`Progress: ${job.completed}/${job.total}`);
} while (job.status === 'processing' || job.status === 'queued');

console.log(`Done! ${job.data.length} pages crawled.`);

# Python
import urllib.request, json, time

# Start crawl
body = json.dumps({"url": "https://docs.example.com", "limit": 50}).encode()
req = urllib.request.Request(
    "https://api.webpeel.dev/v1/crawl", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    job_id = json.loads(resp.read())["id"]

# Poll until complete
while True:
    time.sleep(2)
    req = urllib.request.Request(f"https://api.webpeel.dev/v1/crawl/{job_id}")
    req.add_header("Authorization", "Bearer YOUR_API_KEY")
    with urllib.request.urlopen(req) as resp:
        job = json.loads(resp.read())
    if job["status"] in ("completed", "failed"):
        break
    print(f"Progress: {job['completed']}/{job['total']}")

print(f"Done! {len(job['data'])} pages.")

POST /v1/batch/scrape

Batch scrape multiple URLs concurrently. Creates an async job with up to 5 parallel fetches. Poll for results or use webhooks.

POST /v1/batch/scrape
Auth: optional · 1 credit per URL

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| urls (required) | string[] | | Array of URLs to scrape (max 100) |
| formats | string[] | ["markdown"] | Output formats |
| maxTokens | number | | Max tokens per result |
| extract | object | | Extraction options (passed through to each fetch) |
| webhook | string | | Webhook URL for progress notifications |

Response (202 Accepted)

{
  "success": true,
  "id": "batch_xyz789",
  "url": "/v1/batch/scrape/batch_xyz789"
}

GET /v1/batch/scrape/:id — Poll Results

Same response format as crawl jobs:

{
  "success": true,
  "status": "completed",
  "progress": 100,
  "total": 10,
  "completed": 10,
  "creditsUsed": 10,
  "data": [ { "url": "...", "content": "...", ... }, ... ]
}

DELETE /v1/batch/scrape/:id — Cancel Job

Cancel a running batch job.

Example

curl -X POST https://api.webpeel.dev/v1/batch/scrape \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "formats": ["markdown"]
  }'

POST /v1/map

Discover all URLs on a domain using sitemap.xml and link crawling. Returns a flat list of URLs without fetching page content. Much faster and cheaper than a full crawl.

POST /v1/map
Auth: optional · 1 credit

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | Starting URL or domain |
| limit | number | 5000 | Maximum URLs to return |
| search | string | | Filter URLs containing this substring |

Response

{
  "success": true,
  "links": [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/blog",
    "https://example.com/blog/post-1",
    ...
  ]
}

Example

curl -X POST https://api.webpeel.dev/v1/map \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com", "limit": 100, "search": "/blog" }'

GET /v1/jobs

List all async jobs (crawl, batch, agent). Useful for dashboards and monitoring.

GET /v1/jobs
Auth: optional · Free

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| type | string | | Filter by type: crawl, batch, extract |
| status | string | | Filter by status: queued, processing, completed, failed, cancelled |
| limit | number | 50 | Maximum results to return |

Response

{
  "success": true,
  "count": 3,
  "jobs": [
    {
      "id": "crawl_abc123",
      "type": "crawl",
      "status": "completed",
      "progress": 100,
      "total": 50,
      "completed": 50,
      "creditsUsed": 50,
      "createdAt": "2026-02-15T10:00:00Z",
      "expiresAt": "2026-02-16T10:00:00Z"
    }
  ]
}
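
For example, listing recently completed crawl jobs with the filters documented above:

import json
import urllib.parse
import urllib.request

qs = urllib.parse.urlencode({"type": "crawl", "status": "completed", "limit": 10})
req = urllib.request.Request(f"https://api.webpeel.dev/v1/jobs?{qs}")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    for job in json.loads(resp.read())["jobs"]:
        print(job["id"], job["status"], f"{job['completed']}/{job['total']}")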

POST /v1/scrape (Firecrawl-compatible)

Drop-in replacement for Firecrawl's /v1/scrape endpoint. Change your base URL from api.firecrawl.dev to api.webpeel.dev and your existing code works — zero changes needed.

POST /v1/scrape
Auth: optional · 1 credit

Request Body (Firecrawl format)

| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | URL to scrape |
| formats | string[] | ["markdown"] | Output formats: markdown, html, rawHtml, links, screenshot, images, json, branding, summary |
| onlyMainContent | boolean | true | Smart content extraction (Firecrawl defaults to true) |
| includeTags | string[] | | Tags to include |
| excludeTags | string[] | | Tags to exclude |
| waitFor | number | | Wait time in ms (auto-enables browser rendering) |
| timeout | number | 30000 | Request timeout in ms |
| actions | object[] | | Page actions (Firecrawl format supported) |
| headers | object | | Custom headers |
| location | object | | Object with country and languages |

Inline LLM Extraction (BYOK)

| Field | Type | Default | Description |
|---|---|---|---|
| extract | object | | { schema: {...}, prompt: "..." } for LLM extraction |
| llmProvider | string | | openai, anthropic, or google |
| llmApiKey | string | | Your LLM API key (BYOK) |
| llmModel | string | | Model override |

Response

{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent...",
    "metadata": {
      "title": "Page Title",
      "description": "...",
      "sourceURL": "https://example.com",
      "statusCode": 200
    },
    "links": ["https://..."],
    "screenshot": "data:image/png;base64,...",
    "json": { ... }
  }
}
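
A sketch of the drop-in swap from Python: the request body is Firecrawl's format; only the host changes:

import json
import urllib.request

body = json.dumps({
    "url": "https://example.com",
    "formats": ["markdown", "links"],
    "onlyMainContent": True,
}).encode()

# The only migration step: api.firecrawl.dev becomes api.webpeel.dev
req = urllib.request.Request(
    "https://api.webpeel.dev/v1/scrape", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())["data"]
print(data["metadata"]["title"])
print(len(data["links"]), "links found")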

POST /v2/scrape (Firecrawl v2)

Firecrawl v2 compatible endpoint with first-class screenshot support. When formats is ["screenshot"] only, it uses a dedicated screenshot pipeline for faster, higher-quality captures.

POST /v2/scrape
Auth: optional · 1 credit

Accepts the same body as POST /v1/scrape plus these additional fields:

| Field | Type | Default | Description |
|---|---|---|---|
| fullPage | boolean | false | Full-page screenshot |
| width | number | | Viewport width (pixels) |
| height | number | | Viewport height (pixels) |
| screenshotFormat | string | png | Image format: png or jpeg |
| quality | number | | JPEG quality (1–100) |

Also available as POST /v1/fetch
POST /v2/scrape is an alias for POST /v1/fetch. They share the same handler. Use whichever you prefer.
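
A sketch of the screenshot-only fast path, combining the formats array with the extra fields above. The response is assumed to carry the screenshot as a data: URL, as in the /v1/scrape response:

import base64
import json
import urllib.request

body = json.dumps({
    "url": "https://example.com",
    "formats": ["screenshot"],  # screenshot-only selects the dedicated pipeline
    "fullPage": True,
    "screenshotFormat": "png",
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v2/scrape", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")

with urllib.request.urlopen(req) as resp:
    shot = json.loads(resp.read())["data"]["screenshot"]

# Strip the data:image/png;base64, prefix and save to disk
with open("page.png", "wb") as f:
    f.write(base64.b64decode(shot.split(",")[1]))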

POST /mcp

Hosted MCP endpoint using Streamable HTTP transport. Connect any MCP client (Claude, Cursor, VS Code) with a single URL — no local server required.

POST /mcp
Auth: required · 1 credit per tool call

Connection

{
  "url": "https://api.webpeel.dev/mcp",
  "headers": {
    "Authorization": "Bearer YOUR_API_KEY"
  }
}

Available Tools

| Tool Name | Description |
|---|---|
| webpeel_fetch | Fetch a URL and return clean markdown. Supports render, stealth, actions, inline extraction. |
| webpeel_search | Search the web (DuckDuckGo). Returns titles, URLs, snippets. |
| webpeel_crawl | Crawl a website following links. Max 100 pages, depth 1–5. |
| webpeel_map | Discover all URLs on a domain via sitemap + crawling. |
| webpeel_extract | Extract structured data using CSS selectors or AI. |
| webpeel_batch | Fetch multiple URLs in batch (max 50, configurable concurrency). |
| webpeel_agent | Autonomous web research: search, fetch, synthesize (BYOK). |

Stateless: No session management required. Each request is authenticated and handled independently.

GET /mcp returns 405. DELETE /mcp returns 200 (no-op for stateless mode).
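
Normally an MCP client handles the wire protocol, but because the endpoint is stateless you can sketch a raw tool call directly. This assumes standard MCP JSON-RPC framing over Streamable HTTP; treat it as illustrative, not a supported interface:

import json
import urllib.request

body = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "webpeel_fetch",
        "arguments": {"url": "https://example.com"},
    },
}).encode()

req = urllib.request.Request("https://api.webpeel.dev/mcp", data=body, method="POST")
req.add_header("Content-Type", "application/json")
# Streamable HTTP servers may answer with plain JSON or an SSE stream
req.add_header("Accept", "application/json, text/event-stream")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    print(resp.read().decode()[:500])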

GET /health

Health check endpoint. Returns server status, version, and uptime. Not rate-limited — safe for monitoring services to poll frequently.

GET /health
No auth · Free

Response

{
  "status": "healthy",
  "version": "0.8.1",
  "uptime": 86400,
  "timestamp": "2026-02-15T21:00:00.000Z"
}

GET /v1/stats

Get usage statistics for the authenticated API key's account. Shows total requests, success rate, and average response time.

GET /v1/stats
Auth: required · Free

Response

{
  "totalRequests": 1523,
  "successRate": 98.2,
  "avgResponseTime": 342
}

PostgreSQL Required
This endpoint requires a PostgreSQL backend. Self-hosted instances using in-memory storage will return 501 Not Implemented.

GET /v1/cli/usage

Get detailed usage information for the CLI's webpeel usage command. Shows plan info, weekly quota usage, burst limits, and reset times.

GET /v1/cli/usage
Auth: required · Free

Response

{
  "plan": {
    "tier": "pro",
    "weeklyLimit": 1250,
    "burstLimit": 100
  },
  "weekly": {
    "used": 342,
    "limit": 1250,
    "remaining": 908,
    "resetsAt": "2026-02-17T00:00:00.000Z",
    "percentUsed": 27
  },
  "burst": {
    "used": 12,
    "limit": 100,
    "resetsIn": "47m"
  },
  "canFetch": true,
  "upgradeUrl": "https://webpeel.dev/#pricing"
}

Webhooks

Async jobs (crawl, batch, agent) support webhook notifications. Webhooks are signed with HMAC-SHA256 if you provide a secret, and retried up to 3 times with exponential backoff.

Webhook Config Object

| Field | Type | Description |
|---|---|---|
| url | string | HTTPS URL to receive webhooks |
| events | string[] | Events to subscribe to: started, page, completed, failed |
| secret | string | HMAC-SHA256 signing secret. Signature sent in X-WebPeel-Signature header. |
| metadata | object | Custom metadata included in every webhook payload |

Webhook Payload

{
  "event": "completed",
  "timestamp": "2026-02-15T21:00:00.000Z",
  "data": {
    "jobId": "crawl_abc123",
    "total": 50,
    "completed": 50
  }
}
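
A verification sketch in Python, assuming the X-WebPeel-Signature header carries the hex-encoded HMAC-SHA256 of the raw request body (the exact encoding isn't specified above, so adjust if your deployment differs):

import hashlib
import hmac

def verify_webhook(raw_body: bytes, signature_header: str, secret: str) -> bool:
    # Recompute the HMAC over the raw body and compare in constant time
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

# In your webhook handler, verify before trusting the payload:
# if not verify_webhook(body, headers["X-WebPeel-Signature"], SECRET): reject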

Rate Limits

WebPeel uses a two-tier rate limiting system: weekly quotas (soft limit) and hourly burst limits (hard limit).

| Tier | Weekly Quota | Burst Limit | Price |
|---|---|---|---|
| Free | 125 / week | 25 / hour | $0 |
| Pro | 1,250 / week | 100 / hour | $9/mo |
| Max | 6,250 / week | 500 / hour | $29/mo |

Soft vs Hard Limits

The weekly quota is a soft limit: once exceeded, requests still succeed but may run in degraded mode (for example, render=true downgraded to HTTP-only), signaled via the X-Soft-Limited and X-Degraded response headers. The hourly burst limit is a hard limit: once exceeded, requests are rejected with 429 rate_limited until the window resets (see the Retry-After header).

Response Headers

WebPeel includes useful metadata in response headers:

| Header | Description |
|---|---|
| X-Credits-Used | Number of credits consumed by this request |
| X-Processing-Time | Server-side processing time in milliseconds |
| X-Fetch-Type | Fetch method used: basic, stealth, search, screenshot |
| X-Cache | HIT or MISS |
| X-Cache-Age | Cache entry age in seconds (on HIT) |
| X-WebPeel-Plan | Your plan tier: free, pro, max |
| X-Weekly-Limit | Your weekly quota |
| X-Weekly-Used | Credits used this week |
| X-Weekly-Remaining | Credits remaining this week |
| X-Weekly-Resets-At | ISO timestamp of next Monday 00:00 UTC |
| X-Burst-Limit | Hourly burst limit |
| X-Burst-Used | Burst requests this hour |
| X-Burst-Remaining | Burst requests remaining |
| X-Soft-Limited | Present when over weekly quota (degraded mode) |
| X-Degraded | Explanation of degradation (e.g. "render=true downgraded to HTTP-only") |
| X-Extra-Usage-Charged | Cost charged to extra usage balance |
| X-Extra-Usage-New-Balance | Remaining extra usage balance |
| X-RateLimit-Limit | Burst limit for the sliding window |
| X-RateLimit-Remaining | Remaining requests in the window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
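
For example, reading the quota headers alongside a normal fetch (header names as listed above):

import urllib.request

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/fetch?url=https://example.com"
)
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    print("Credits used:    ", resp.headers.get("X-Credits-Used"))
    print("Weekly remaining:", resp.headers.get("X-Weekly-Remaining"))
    print("Burst remaining: ", resp.headers.get("X-Burst-Remaining"))
    print("Cache:           ", resp.headers.get("X-Cache"))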

Error Responses

All errors return a consistent JSON format:

{
  "error": "error_code",
  "message": "Human-readable error message"
}
| Status | Error Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid parameters |
| 400 | invalid_url | Malformed URL or URL too long (>2048 chars) |
| 400 | forbidden_url | SSRF protection: cannot fetch localhost, private networks, or non-HTTP URLs |
| 400 | invalid_json | Malformed JSON in request body |
| 401 | invalid_key | Invalid or revoked API key |
| 401 | unauthorized | Valid API key required (for protected endpoints) |
| 404 | not_found | Route not found or job ID not found |
| 429 | rate_limited | Hourly burst limit exceeded. Check Retry-After header. |
| 429 | burst_limit_exceeded | Hourly burst limit exceeded (per API key). Check Retry-After. |
| 500 | TIMEOUT | Request timed out while fetching |
| 500 | BLOCKED | Target site blocked the request; try render=true or stealth=true |
| 500 | NETWORK | Network error (DNS failure, connection refused, etc.) |
| 500 | internal_error | Unexpected server error |
| 500 | search_failed | Search request failed |
| 500 | answer_failed | Answer generation failed |
| 501 | not_implemented | Feature requires PostgreSQL backend (self-hosted only) |
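
Since 429 responses include a Retry-After header, a small backoff loop usually suffices. A sketch (tune the retry budget for your workload):

import time
import urllib.error
import urllib.request

def fetch_with_retry(url, retries=3):
    # Retry on 429, honoring Retry-After when the server provides it
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code != 429 or attempt == retries:
                raise
            # Fall back to exponential backoff if Retry-After is absent
            time.sleep(int(e.headers.get("Retry-After") or 2 ** attempt))

data = fetch_with_retry("https://api.webpeel.dev/v1/fetch?url=https://example.com")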