API Reference

Complete REST API documentation for WebPeel. All endpoints return JSON and use standard HTTP status codes.

Base URL
Production: https://api.webpeel.dev
Self-hosted: http://localhost:3000 (configurable via PORT env var)

Authentication

Pass your API key via the Authorization header or the X-API-Key header:

# Option 1: Bearer token (recommended)
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
  -H "Authorization: Bearer wp_live_abc123..."

# Option 2: X-API-Key header
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
  -H "X-API-Key: wp_live_abc123..."

Anonymous Access
No API key is required for basic usage. Anonymous requests are subject to the free-tier rate limits (25 requests/hour). Sign up for an API key to get a weekly quota and access the dashboard.

GET /v1/fetch

Fetch and extract clean, AI-ready content from any URL. WebPeel automatically escalates from fast HTTP to browser rendering to stealth mode as needed.

GET /v1/fetch
Auth: optional · 1 credit

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | URL to fetch (max 2048 characters). Must be http:// or https://. |
| format | string | markdown | Output format: markdown, text, or html |
| render | boolean | false | Force headless browser rendering (for JS-heavy sites, SPAs) |
| wait | number | 0 | Milliseconds to wait after page load (0–60000). Only applies when render=true. |
| includeTags | string | | Comma-separated HTML tags/selectors to include (e.g. article,main) |
| excludeTags | string | | Comma-separated HTML tags/selectors to remove (e.g. nav,footer,.sidebar) |
| onlyMainContent | boolean | false | Shortcut: sets includeTags to main,article,.content,#content |
| images | boolean | false | Extract image URLs from the page |
| location | string | | ISO 3166-1 alpha-2 country code for geo-targeting (e.g. US, DE) |
| languages | string | | Comma-separated language preferences (e.g. en,de) |
| actions | string | | JSON-encoded array of page actions to execute before extraction. Auto-enables render. |
| maxAge | number | 172800000 | Cache freshness threshold in milliseconds (default 2 days). Set to 0 to bypass the cache. |
| storeInCache | boolean | true | Set to false to skip caching the response |
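
Because actions must be passed as a JSON-encoded string in the query, it's easiest to build the URL with a proper encoder. A minimal Python sketch using only the documented url and actions parameters:

import json
import urllib.parse
import urllib.request

# actions is a JSON-encoded array in the query string; urlencode handles escaping
params = urllib.parse.urlencode({
    "url": "https://example.com/search",
    "actions": json.dumps([
        {"type": "click", "selector": "#load-more"},
        {"type": "wait", "ms": 1000},
    ]),
})

with urllib.request.urlopen(f"https://api.webpeel.dev/v1/fetch?{params}") as resp:
    data = json.loads(resp.read())
print(data["method"])  # actions auto-enable rendering, so expect "browser" or "stealth"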

Response

{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "metadata": {
    "description": "Example Domain",
    "author": null,
    "published": null,
    "image": null,
    "canonical": "https://example.com"
  },
  "links": ["https://www.iana.org/domains/example"],
  "tokens": 125,
  "method": "simple",
  "elapsed": 118,
  "contentType": "html",
  "quality": 0.95,
  "fingerprint": "a1b2c3d4e5f6g7h8"
}
Response fields
| Field | Type | Description |
|---|---|---|
| url | string | Final URL after redirects |
| title | string | Page title |
| content | string | Extracted content in the requested format |
| metadata | object | Structured metadata: description, author, published, image, canonical |
| links | string[] | All links found on the page (absolute, deduplicated) |
| tokens | number | Estimated token count (content.length / 4) |
| method | string | Fetch method used: simple, browser, or stealth |
| elapsed | number | Processing time in milliseconds |
| contentType | string | Detected type: html, json, xml, text, document |
| quality | number | Extraction quality score (0–1) |
| fingerprint | string | SHA-256 hash prefix for change detection |
| images | object[] | Image info (only when images=true): src, alt, title, width, height |
| screenshot | string | Base64 PNG screenshot (only when screenshot=true via POST) |
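
The fingerprint field enables cheap change detection: fetch once, store the hash, and compare on later fetches instead of diffing content. A minimal sketch (the helper is illustrative, not part of the API):

import json
import urllib.parse
import urllib.request

def fetch_fingerprint(page_url):
    # maxAge=0 bypasses the cache so repeated checks see fresh content
    qs = urllib.parse.urlencode({"url": page_url, "maxAge": 0})
    with urllib.request.urlopen(f"https://api.webpeel.dev/v1/fetch?{qs}") as resp:
        return json.loads(resp.read())["fingerprint"]

before = fetch_fingerprint("https://example.com")
# ... some time later ...
after = fetch_fingerprint("https://example.com")
if before != after:
    print("Page content changed")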

Examples

# Basic fetch
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com"

# With browser rendering
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com&render=true"

# Extract only main content, exclude nav/footer
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com&onlyMainContent=true&excludeTags=nav,footer"

# With authentication
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
  -H "Authorization: Bearer YOUR_API_KEY"

// JavaScript
const response = await fetch(
  'https://api.webpeel.dev/v1/fetch?url=https://example.com&format=markdown',
  {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  }
);
const data = await response.json();

console.log(data.title);    // "Example Domain"
console.log(data.content);  // Markdown content
console.log(data.method);   // "simple" | "browser" | "stealth"

# Python
import urllib.request, json

url = "https://api.webpeel.dev/v1/fetch?url=https://example.com"
req = urllib.request.Request(url)
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
    print(data["title"])    # "Example Domain"
    print(data["content"])  # Markdown content

Error Responses

| Status | Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid url, format, wait, or actions parameter |
| 400 | invalid_url | URL too long (>2048 chars) or malformed |
| 400 | forbidden_url | Attempt to fetch localhost, private networks, or non-HTTP URLs (SSRF protection) |
| 429 | rate_limited | Hourly burst limit exceeded |
| 500 | TIMEOUT | Request timed out |
| 500 | BLOCKED | Site blocked the request; try render=true or stealth mode |
| 500 | internal_error | Unexpected server error |

POST /v1/fetch

Same as GET /v1/fetch but accepts a JSON body. Adds support for page actions, inline LLM extraction (BYOK), and the Firecrawl-compatible formats array.

POST /v1/fetch
Auth: optional · 1 credit

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | URL to fetch (max 2048 characters) |
| render | boolean | false | Force browser rendering |
| wait | number | 0 | Wait time in ms after page load (0–60000) |
| format | string | markdown | markdown, text, or html |
| includeTags | string[] | | Array of HTML tags/selectors to include |
| excludeTags | string[] | | Array of HTML tags/selectors to remove |
| onlyMainContent | boolean | false | Include only main content tags |
| images | boolean | false | Extract image URLs |
| location | string | | Country code for geo-targeting |
| languages | string[] | | Language preferences |
| actions | object[] | | Page actions to execute before extraction. Auto-enables render. |
| storeInCache | boolean | true | Store result in server-side cache |

Inline LLM Extraction (BYOK)

| Field | Type | Default | Description |
|---|---|---|---|
| extract | object | | Object with schema (JSON Schema) and/or prompt (string) for LLM extraction |
| llmProvider | string | | Required when extract is set. One of: openai, anthropic, google |
| llmApiKey | string | | Required when extract is set. Your LLM API key (BYOK) |
| llmModel | string | Provider default | LLM model name (optional) |
| formats | array | | Firecrawl-compatible formats array. Supports {type: "json", schema: {...}} for extraction. |

Response

Same as GET /v1/fetch, plus these additional fields when inline extraction is used:

{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "# Example Domain\n...",
  "json": {
    "company": "Example Corp",
    "founded": 2006
  },
  "extractTokensUsed": { "input": 1234, "output": 56 },
  ...
}

Examples

# Basic POST fetch
curl -X POST https://api.webpeel.dev/v1/fetch \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com",
    "format": "markdown",
    "onlyMainContent": true
  }'

# With inline LLM extraction
curl -X POST https://api.webpeel.dev/v1/fetch \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com/about",
    "extract": {
      "schema": {
        "type": "object",
        "properties": {
          "company": { "type": "string" },
          "founded": { "type": "number" }
        }
      }
    },
    "llmProvider": "openai",
    "llmApiKey": "sk-..."
  }'

// JavaScript
const response = await fetch('https://api.webpeel.dev/v1/fetch', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    url: 'https://example.com/about',
    extract: {
      schema: {
        type: 'object',
        properties: {
          company: { type: 'string' },
          founded: { type: 'number' }
        }
      }
    },
    llmProvider: 'openai',
    llmApiKey: 'sk-...'
  })
});

const data = await response.json();
console.log(data.json);  // { company: "Example Corp", founded: 2006 }

# Python
import urllib.request, json

body = json.dumps({
    "url": "https://example.com/about",
    "extract": {
        "schema": {
            "type": "object",
            "properties": {
                "company": {"type": "string"},
                "founded": {"type": "number"}
            }
        }
    },
    "llmProvider": "openai",
    "llmApiKey": "sk-..."
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/fetch",
    data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
    print(data["json"])  # {"company": "Example Corp", "founded": 2006}

Page Actions

Page actions let you interact with a page (click, type, scroll, wait) before content is extracted. Actions auto-enable browser rendering. Compatible with Firecrawl's action format.

| Action Type | Fields | Description |
|---|---|---|
| click | selector | Click an element |
| type | selector, value/text | Type text into an input (keystroke by keystroke) |
| fill | selector, value/text | Fill an input (set value directly) |
| select | selector, value | Select a dropdown option |
| scroll | direction, amount | Scroll the page. Direction: up, down, left, right |
| wait | ms/milliseconds | Wait a fixed duration |
| waitForSelector | selector, timeout | Wait for an element to appear |
| press | key | Press a keyboard key (e.g. Enter) |
| hover | selector | Hover over an element |
| screenshot | | Take a screenshot at this point |
{
  "url": "https://example.com/search",
  "actions": [
    { "type": "fill", "selector": "#search-input", "value": "AI agents" },
    { "type": "click", "selector": "#search-btn" },
    { "type": "wait", "ms": 2000 },
    { "type": "scroll", "direction": "down", "amount": 500 }
  ]
}
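
For instance, sending that body through POST /v1/fetch from Python (same style as the earlier examples; all fields are documented above):

import json
import urllib.request

body = json.dumps({
    "url": "https://example.com/search",
    "actions": [
        {"type": "fill", "selector": "#search-input", "value": "AI agents"},
        {"type": "click", "selector": "#search-btn"},
        {"type": "wait", "ms": 2000},
        {"type": "scroll", "direction": "down", "amount": 500},
    ],
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/fetch", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
# actions auto-enable rendering; content reflects the page after the actions ran
print(data["content"][:200])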

GET /v1/search

Search the web using DuckDuckGo (free, default) or Brave Search (BYOK for higher-quality results). Optionally scrape each result URL for full content.

GET /v1/search
Auth: optional · 1 credit

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| q (required) | string | | Search query (max 500 characters) |
| provider | string | duckduckgo | Search provider: duckduckgo or brave |
| searchApiKey | string | | Brave Search API key. Alternative: x-search-api-key header. |
| count | number | 5 | Number of results (1–10) |
| scrapeResults | boolean | false | When true, fetches each result URL and adds a content field |
| sources | string | web | Comma-separated: web, news, images |
| categories | string | | Filter by category: github, pdf, docs, blog, news, video, social |
| tbs | string | | Time filter: qdr:d (day), qdr:w (week), qdr:m (month), qdr:y (year) |
| country | string | | Country/region code (e.g. us-en, de-de) |
| location | string | | Location string for localized results |

Response

{
  "success": true,
  "data": {
    "web": [
      {
        "title": "Example Domain",
        "url": "https://example.com",
        "snippet": "This domain is for use in illustrative examples...",
        "content": "# Example Domain\n..."  // Only if scrapeResults=true
      }
    ],
    "news": [ ... ],   // Only if sources includes "news"
    "images": [ ... ]  // Only if sources includes "images"
  }
}

Examples

# Basic search
curl "https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5"

# With Brave Search (BYOK)
curl "https://api.webpeel.dev/v1/search?q=latest+AI+news&provider=brave" \
  -H "x-search-api-key: YOUR_BRAVE_KEY"

# Search + scrape results
curl "https://api.webpeel.dev/v1/search?q=MCP+protocol&scrapeResults=true&count=3"

// JavaScript
const resp = await fetch(
  'https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5',
  { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
);
const { data } = await resp.json();

for (const result of data.web) {
  console.log(`${result.title}: ${result.url}`);
}

# Python
import urllib.request, json

url = "https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5"
req = urllib.request.Request(url)
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
    for r in data["data"]["web"]:
        print(f"{r['title']}: {r['url']}")

POST /v1/screenshot

Take a screenshot of any URL and return a base64-encoded image. Supports full-page capture, custom viewport sizes, page actions, and stealth mode.

POST /v1/screenshot
Auth: optional · 1 credit (stealth)

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | URL to screenshot (max 2048 characters) |
| fullPage | boolean | false | Capture the entire scrollable page |
| width | number | 1280 | Viewport width in pixels (100–5000) |
| height | number | 720 | Viewport height in pixels (100–5000) |
| format | string | png | Image format: png, jpeg, or jpg |
| quality | number | | Image quality for JPEG (1–100) |
| waitFor | number | 0 | Milliseconds to wait before capture (0–60000) |
| timeout | number | 30000 | Total timeout in milliseconds |
| stealth | boolean | false | Use stealth mode to bypass bot detection |
| actions | object[] | | Page actions to execute before screenshot |
| headers | object | | Custom HTTP headers |
| cookies | string[] | | Cookies to set (key=value pairs) |

Response

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "screenshot": "data:image/png;base64,iVBORw0KGgo...",
    "metadata": {
      "sourceURL": "https://example.com",
      "format": "png",
      "width": 1280,
      "height": 720,
      "fullPage": false
    }
  }
}

Examples

curl -X POST https://api.webpeel.dev/v1/screenshot \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com",
    "fullPage": true,
    "width": 1440,
    "format": "png"
  }'

// JavaScript
const resp = await fetch('https://api.webpeel.dev/v1/screenshot', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    url: 'https://example.com',
    fullPage: true,
    width: 1440
  })
});

const { data } = await resp.json();
// data.screenshot is a data: URL — use directly in <img src="...">

# Python
import urllib.request, json, base64

body = json.dumps({
    "url": "https://example.com",
    "fullPage": True,
    "width": 1440
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/screenshot",
    data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
    # Save screenshot to file
    img_data = data["data"]["screenshot"].split(",")[1]
    with open("screenshot.png", "wb") as f:
        f.write(base64.b64decode(img_data))

POST /v1/answer

Ask a question, search the web, fetch top sources, and generate an LLM answer with inline citations ([1], [2]). BYOK — you provide your own LLM API key. Supports SSE streaming.

POST /v1/answer
Auth: optional · 1 credit + LLM cost (BYOK)

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| question (required) | string | | The question to answer (max 2000 characters) |
| llmProvider (required) | string | | LLM provider: openai, anthropic, or google |
| llmApiKey (required) | string | | Your LLM API key (BYOK) |
| llmModel | string | Provider default | Model name. Defaults: gpt-4o-mini (OpenAI), claude-3-5-sonnet-latest (Anthropic), gemini-1.5-flash (Google) |
| searchProvider | string | duckduckgo | Search provider: duckduckgo or brave |
| searchApiKey | string | | Brave Search API key (also accepted via x-search-api-key header) |
| maxSources | number | 5 | Maximum sources to fetch and cite (1–10) |
| stream | boolean | false | When true, stream response via SSE (text/event-stream) |

Response (non-streaming)

{
  "answer": "MCP (Model Context Protocol) is an open standard... [1] ... [2] ...",
  "citations": [
    { "title": "Model Context Protocol", "url": "https://...", "snippet": "..." },
    { "title": "MCP Documentation", "url": "https://...", "snippet": "..." }
  ],
  "searchProvider": "duckduckgo",
  "llmProvider": "openai",
  "llmModel": "gpt-4o-mini",
  "tokensUsed": { "input": 3456, "output": 789 }
}

Response (streaming)

When stream=true, the endpoint returns text/event-stream with JSON-encoded events:

data: {"type":"chunk","text":"MCP (Model Context Protocol) is"}
data: {"type":"chunk","text":" an open standard for AI tool..."}
data: {"type":"done","citations":[...],"searchProvider":"duckduckgo","llmProvider":"openai","llmModel":"gpt-4o-mini","tokensUsed":{"input":3456,"output":789}}

Examples

curl -X POST https://api.webpeel.dev/v1/answer \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is MCP and why does it matter for AI agents?",
    "llmProvider": "openai",
    "llmApiKey": "sk-...",
    "maxSources": 5,
    "stream": false
  }'

// JavaScript
const resp = await fetch('https://api.webpeel.dev/v1/answer', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    question: 'What is MCP and why does it matter for AI agents?',
    llmProvider: 'openai',
    llmApiKey: 'sk-...',
    maxSources: 5
  })
});

const data = await resp.json();
console.log(data.answer);
console.log(data.citations);

# Python
import urllib.request, json

body = json.dumps({
    "question": "What is MCP and why does it matter for AI agents?",
    "llmProvider": "openai",
    "llmApiKey": "sk-...",
    "maxSources": 5
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/answer",
    data=body, method="POST"
)
req.add_header("Content-Type", "application/json")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
    print(data["answer"])

POST /v1/agent

Run an autonomous AI research agent. It plans search queries, fetches pages, and synthesizes a structured answer using your LLM (BYOK). Supports basic and thorough research depth, topic tuning, JSON Schema output, and SSE streaming.

POST /v1/agent
Auth: optional · 1 credit per page fetched + LLM cost (BYOK)

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| prompt (required) | string | | Natural language research task |
| llmApiKey (required) | string | | OpenAI-compatible API key (BYOK) |
| llmModel | string | gpt-4o-mini | LLM model to use |
| llmApiBase | string | https://api.openai.com/v1 | Custom LLM API base URL (for OpenRouter, local models, etc.) |
| urls | string[] | | Starting URLs. If not provided, the agent searches the web. |
| depth | string | basic | Research depth: basic (1 query, ~3 sources) or thorough (multi-query, ~10 sources, gap analysis) |
| topic | string | general | Topic hint: general, news, technical, academic. Adjusts search queries and source prioritization. |
| maxSources | number | 5 | Max sources to fetch and analyze (1–20) |
| maxCredits | number | | Spending cap (stops fetching when reached) |
| outputSchema | object | | JSON Schema for structured output. Agent validates and may retry if output doesn't match. |
| schema | object | | Legacy alias for outputSchema |
| stream | boolean | false | When true, stream progress and answer via SSE |

Response (synchronous)

{
  "success": true,
  "data": {
    "answer": "## AI Coding Assistants Comparison\n...",
    "keyFindings": ["Finding 1", "Finding 2"]
  },
  "answer": "## AI Coding Assistants Comparison\n...",
  "sources": ["https://...", "https://..."],
  "sourcesDetailed": [
    { "url": "https://...", "title": "Source Title" }
  ],
  "pagesVisited": 5,
  "creditsUsed": 6,
  "tokensUsed": { "input": 12000, "output": 2500 }
}

SSE Streaming Events

When stream=true, the endpoint returns text/event-stream:

data: {"type":"step","action":"searching","query":"AI coding assistants 2026"}
data: {"type":"step","action":"fetching","url":"https://..."}
data: {"type":"step","action":"analyzing","summary":"Synthesizing from 5 sources..."}
data: {"type":"chunk","text":"## AI Coding Assistants\n"}
data: {"type":"chunk","text":"Here are the top 5..."}
data: {"type":"done","answer":"...","sources":[...],"tokensUsed":{...}}

Async Mode

For long-running research, use the async endpoint:

POST /v1/agent/async

Same request body as above, plus optional webhook (WebhookConfig object). Returns a job ID:

{ "success": true, "id": "job-uuid", "url": "/v1/agent/job-uuid" }
GET /v1/agent/:id

Poll job status and results.

DELETE /v1/agent/:id

Cancel a running agent job.
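
A sketch of the async flow in Python. The exact shape of the polled job object isn't shown above, so the status and answer fields are assumed to mirror the synchronous response and the crawl job format:

import json
import time
import urllib.request

body = json.dumps({
    "prompt": "Compare the top 5 AI coding assistants",
    "llmApiKey": "sk-...",
    "depth": "thorough",
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/agent/async", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    job_id = json.loads(resp.read())["id"]

# Poll until the job leaves the queue (field names assumed, per crawl jobs)
while True:
    time.sleep(5)
    poll = urllib.request.Request(f"https://api.webpeel.dev/v1/agent/{job_id}")
    poll.add_header("Authorization", "Bearer YOUR_API_KEY")
    with urllib.request.urlopen(poll) as resp:
        job = json.loads(resp.read())
    if job.get("status") not in ("queued", "processing"):
        break

print(job.get("answer"))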

Example

curl -X POST https://api.webpeel.dev/v1/agent \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "prompt": "Compare the top 5 AI coding assistants by features and pricing",
    "llmApiKey": "sk-...",
    "depth": "thorough",
    "topic": "technical",
    "maxSources": 10,
    "outputSchema": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "features": { "type": "array", "items": { "type": "string" } },
          "pricing": { "type": "string" }
        },
        "required": ["name", "features", "pricing"]
      }
    }
  }'

POST /v1/crawl

Start an async crawl job. Follows links from a starting URL and extracts content from each page. Returns a job ID for polling. Supports webhooks and SSE for real-time progress.

POST /v1/crawl
Auth: optional · 1 credit per page crawled

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | Starting URL to crawl from |
| limit | number | 100 | Maximum pages to crawl |
| maxDepth | number | 3 | Maximum link-following depth |
| scrapeOptions | object | | Passed through to each page fetch (e.g. format, render) |
| location | string | | Country code for geo-targeting |
| languages | string or string[] | | Language preferences |
| webhook | object | | Webhook config for progress notifications |

Response (202 Accepted)

{
  "success": true,
  "id": "crawl_abc123",
  "url": "/v1/crawl/crawl_abc123"
}

GET /v1/crawl/:id — Poll Results

GET /v1/crawl/:id

Returns job status and results. Supports SSE by sending Accept: text/event-stream for real-time progress updates.

{
  "success": true,
  "status": "completed",
  "progress": 100,
  "total": 50,
  "completed": 50,
  "creditsUsed": 50,
  "expiresAt": "2026-02-16T00:00:00.000Z",
  "data": [
    {
      "url": "https://example.com",
      "title": "Home",
      "markdown": "# Home\n...",
      "links": ["https://example.com/about"]
    }
  ]
}

DELETE /v1/crawl/:id — Cancel Job

DELETE /v1/crawl/:id

Cancel a queued or in-progress crawl job.

Example

# Start crawl
curl -X POST https://api.webpeel.dev/v1/crawl \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{ "url": "https://docs.example.com", "limit": 50, "maxDepth": 3 }'

# Poll results
curl "https://api.webpeel.dev/v1/crawl/crawl_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Cancel
curl -X DELETE "https://api.webpeel.dev/v1/crawl/crawl_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

// JavaScript
// Start crawl
const { id } = await (await fetch('https://api.webpeel.dev/v1/crawl', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({ url: 'https://docs.example.com', limit: 50 })
})).json();

// Poll until complete
let job;
do {
  await new Promise(r => setTimeout(r, 2000));
  job = await (await fetch(`https://api.webpeel.dev/v1/crawl/${id}`, {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  })).json();
  console.log(`Progress: ${job.completed}/${job.total}`);
} while (job.status === 'processing' || job.status === 'queued');

console.log(`Done! ${job.data.length} pages crawled.`);

# Python
import urllib.request, json, time

# Start crawl
body = json.dumps({"url": "https://docs.example.com", "limit": 50}).encode()
req = urllib.request.Request(
    "https://api.webpeel.dev/v1/crawl", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    job_id = json.loads(resp.read())["id"]

# Poll until complete
while True:
    time.sleep(2)
    req = urllib.request.Request(f"https://api.webpeel.dev/v1/crawl/{job_id}")
    req.add_header("Authorization", "Bearer YOUR_API_KEY")
    with urllib.request.urlopen(req) as resp:
        job = json.loads(resp.read())
    if job["status"] in ("completed", "failed"):
        break
    print(f"Progress: {job['completed']}/{job['total']}")

print(f"Done! {len(job['data'])} pages.")

POST /v1/batch/scrape

Batch scrape multiple URLs concurrently. Creates an async job with up to 5 parallel fetches. Poll for results or use webhooks.

POST /v1/batch/scrape
Auth: optional · 1 credit per URL

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| urls (required) | string[] | | Array of URLs to scrape (max 100) |
| formats | string[] | ["markdown"] | Output formats |
| maxTokens | number | | Max tokens per result |
| extract | object | | Extraction options (passed through to each fetch) |
| webhook | string | | Webhook URL for progress notifications |

Response (202 Accepted)

{
  "success": true,
  "id": "batch_xyz789",
  "url": "/v1/batch/scrape/batch_xyz789"
}

GET /v1/batch/scrape/:id — Poll Results

Same response format as crawl jobs:

{
  "success": true,
  "status": "completed",
  "progress": 100,
  "total": 10,
  "completed": 10,
  "creditsUsed": 10,
  "data": [ { "url": "...", "content": "...", ... }, ... ]
}

DELETE /v1/batch/scrape/:id — Cancel Job

Cancel a running batch job.

Example

curl -X POST https://api.webpeel.dev/v1/batch/scrape \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "formats": ["markdown"]
  }'

POST /v1/map

Discover all URLs on a domain using sitemap.xml and link crawling. Returns a flat list of URLs without fetching page content. Much faster and cheaper than a full crawl.

POST /v1/map
Auth: optional · 1 credit

Request Body (JSON)

| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | Starting URL or domain |
| limit | number | 5000 | Maximum URLs to return |
| search | string | | Filter URLs containing this substring |

Response

{
  "success": true,
  "links": [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/blog",
    "https://example.com/blog/post-1",
    ...
  ]
}

Example

curl -X POST https://api.webpeel.dev/v1/map \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com", "limit": 100, "search": "/blog" }'

GET /v1/jobs

List all async jobs (crawl, batch, agent). Useful for dashboards and monitoring.

GET /v1/jobs
Auth: optional · Free

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| type | string | | Filter by type: crawl, batch, extract |
| status | string | | Filter by status: queued, processing, completed, failed, cancelled |
| limit | number | 50 | Maximum results to return |

Response

{
  "success": true,
  "count": 3,
  "jobs": [
    {
      "id": "crawl_abc123",
      "type": "crawl",
      "status": "completed",
      "progress": 100,
      "total": 50,
      "completed": 50,
      "creditsUsed": 50,
      "createdAt": "2026-02-15T10:00:00Z",
      "expiresAt": "2026-02-16T10:00:00Z"
    }
  ]
}
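
For example, listing recently completed crawl jobs with the filters documented above:

import json
import urllib.parse
import urllib.request

qs = urllib.parse.urlencode({"type": "crawl", "status": "completed", "limit": 10})
req = urllib.request.Request(f"https://api.webpeel.dev/v1/jobs?{qs}")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    for job in json.loads(resp.read())["jobs"]:
        print(job["id"], job["status"], f"{job['completed']}/{job['total']}")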

POST /v1/scrape (Firecrawl-compatible)

Drop-in replacement for Firecrawl's /v1/scrape endpoint. Change your base URL from api.firecrawl.dev to api.webpeel.dev and your existing code works — zero changes needed.

POST /v1/scrape
Auth: optional · 1 credit

Request Body (Firecrawl format)

| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | | URL to scrape |
| formats | string[] | ["markdown"] | Output formats: markdown, html, rawHtml, links, screenshot, images, json, branding, summary |
| onlyMainContent | boolean | true | Smart content extraction (Firecrawl defaults to true) |
| includeTags | string[] | | Tags to include |
| excludeTags | string[] | | Tags to exclude |
| waitFor | number | | Wait time in ms (auto-enables browser rendering) |
| timeout | number | 30000 | Request timeout in ms |
| actions | object[] | | Page actions (Firecrawl format supported) |
| headers | object | | Custom headers |
| location | object | | Object with country and languages |

Inline LLM Extraction (BYOK)

| Field | Type | Default | Description |
|---|---|---|---|
| extract | object | | { schema: {...}, prompt: "..." } for LLM extraction |
| llmProvider | string | | openai, anthropic, or google |
| llmApiKey | string | | Your LLM API key (BYOK) |
| llmModel | string | | Model override |

Response

{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent...",
    "metadata": {
      "title": "Page Title",
      "description": "...",
      "sourceURL": "https://example.com",
      "statusCode": 200
    },
    "links": ["https://..."],
    "screenshot": "data:image/png;base64,...",
    "json": { ... }
  }
}
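
A sketch of the drop-in swap from Python: the request body is Firecrawl's format; only the host changes:

import json
import urllib.request

body = json.dumps({
    "url": "https://example.com",
    "formats": ["markdown", "links"],
    "onlyMainContent": True,
}).encode()

# The only migration step: api.firecrawl.dev becomes api.webpeel.dev
req = urllib.request.Request(
    "https://api.webpeel.dev/v1/scrape", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())["data"]
print(data["metadata"]["title"])
print(len(data["links"]), "links found")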

POST /v2/scrape (Firecrawl v2)

Firecrawl v2 compatible endpoint with first-class screenshot support. When formats is ["screenshot"] only, it uses a dedicated screenshot pipeline for faster, higher-quality captures.

POST /v2/scrape
Auth: optional · 1 credit

Accepts the same body as POST /v1/scrape plus these additional fields:

| Field | Type | Default | Description |
|---|---|---|---|
| fullPage | boolean | false | Full-page screenshot |
| width | number | | Viewport width (pixels) |
| height | number | | Viewport height (pixels) |
| screenshotFormat | string | png | Image format: png or jpeg |
| quality | number | | JPEG quality (1–100) |

Also available as POST /v1/fetch
POST /v2/scrape is an alias for POST /v1/fetch. They share the same handler. Use whichever you prefer.
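
A sketch of the screenshot-only fast path, combining the formats array with the extra fields above. The response is assumed to carry the screenshot as a data: URL, as in the /v1/scrape response:

import base64
import json
import urllib.request

body = json.dumps({
    "url": "https://example.com",
    "formats": ["screenshot"],  # screenshot-only selects the dedicated pipeline
    "fullPage": True,
    "screenshotFormat": "png",
}).encode()

req = urllib.request.Request(
    "https://api.webpeel.dev/v2/scrape", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")

with urllib.request.urlopen(req) as resp:
    shot = json.loads(resp.read())["data"]["screenshot"]

# Strip the data:image/png;base64, prefix and save to disk
with open("page.png", "wb") as f:
    f.write(base64.b64decode(shot.split(",")[1]))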

POST /mcp

Hosted MCP endpoint using Streamable HTTP transport. Connect any MCP client (Claude, Cursor, VS Code) with a single URL — no local server required.

POST /mcp
Auth: required · 1 credit per tool call

Connection

{
  "url": "https://api.webpeel.dev/mcp",
  "headers": {
    "Authorization": "Bearer YOUR_API_KEY"
  }
}

Available Tools

| Tool Name | Description |
|---|---|
| webpeel_fetch | Fetch a URL and return clean markdown. Supports render, stealth, actions, inline extraction. |
| webpeel_search | Search the web (DuckDuckGo). Returns titles, URLs, snippets. |
| webpeel_crawl | Crawl a website following links. Max 100 pages, depth 1–5. |
| webpeel_map | Discover all URLs on a domain via sitemap + crawling. |
| webpeel_extract | Extract structured data using CSS selectors or AI. |
| webpeel_batch | Fetch multiple URLs in batch (max 50, configurable concurrency). |
| webpeel_agent | Autonomous web research: search, fetch, synthesize (BYOK). |

Stateless: No session management required. Each request is authenticated and handled independently.

GET /mcp returns 405. DELETE /mcp returns 200 (no-op for stateless mode).
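
Normally an MCP client handles the wire protocol, but because the endpoint is stateless you can sketch a raw tool call directly. This assumes standard MCP JSON-RPC framing over Streamable HTTP; treat it as illustrative, not a supported interface:

import json
import urllib.request

body = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "webpeel_fetch",
        "arguments": {"url": "https://example.com"},
    },
}).encode()

req = urllib.request.Request("https://api.webpeel.dev/mcp", data=body, method="POST")
req.add_header("Content-Type", "application/json")
# Streamable HTTP servers may answer with plain JSON or an SSE stream
req.add_header("Accept", "application/json, text/event-stream")
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    print(resp.read().decode()[:500])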

GET /health

Health check endpoint. Returns server status, version, and uptime. Not rate-limited — safe for monitoring services to poll frequently.

GET /health
No auth · Free

Response

{
  "status": "healthy",
  "version": "0.8.1",
  "uptime": 86400,
  "timestamp": "2026-02-15T21:00:00.000Z"
}

GET /v1/stats

Get usage statistics for the authenticated API key's account. Shows total requests, success rate, and average response time.

GET /v1/stats
Auth: required · Free

Response

{
  "totalRequests": 1523,
  "successRate": 98.2,
  "avgResponseTime": 342
}

PostgreSQL Required
This endpoint requires a PostgreSQL backend. Self-hosted instances using in-memory storage will return 501 Not Implemented.

GET /v1/cli/usage

Get detailed usage information for the CLI's webpeel usage command. Shows plan info, weekly quota usage, burst limits, and reset times.

GET /v1/cli/usage
Auth: required · Free

Response

{
  "plan": {
    "tier": "pro",
    "weeklyLimit": 1250,
    "burstLimit": 100
  },
  "weekly": {
    "used": 342,
    "limit": 1250,
    "remaining": 908,
    "resetsAt": "2026-02-17T00:00:00.000Z",
    "percentUsed": 27
  },
  "burst": {
    "used": 12,
    "limit": 100,
    "resetsIn": "47m"
  },
  "canFetch": true,
  "upgradeUrl": "https://webpeel.dev/#pricing"
}

Webhooks

Async jobs (crawl, batch, agent) support webhook notifications. Webhooks are signed with HMAC-SHA256 if you provide a secret, and retried up to 3 times with exponential backoff.

Webhook Config Object

| Field | Type | Description |
|---|---|---|
| url | string | HTTPS URL to receive webhooks |
| events | string[] | Events to subscribe to: started, page, completed, failed |
| secret | string | HMAC-SHA256 signing secret. Signature sent in X-WebPeel-Signature header. |
| metadata | object | Custom metadata included in every webhook payload |

Webhook Payload

{
  "event": "completed",
  "timestamp": "2026-02-15T21:00:00.000Z",
  "data": {
    "jobId": "crawl_abc123",
    "total": 50,
    "completed": 50
  }
}
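
A verification sketch in Python, assuming the X-WebPeel-Signature header carries the hex-encoded HMAC-SHA256 of the raw request body (the exact encoding isn't specified above, so adjust if your deployment differs):

import hashlib
import hmac

def verify_webhook(raw_body: bytes, signature_header: str, secret: str) -> bool:
    # Recompute the HMAC over the raw body and compare in constant time
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

# In your webhook handler, verify before trusting the payload:
# if not verify_webhook(body, headers["X-WebPeel-Signature"], SECRET): reject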

Rate Limits

WebPeel uses a two-tier rate limiting system: weekly quotas (soft limit) and hourly burst limits (hard limit).

| Tier | Weekly Quota | Burst Limit | Price |
|---|---|---|---|
| Free | 125 / week | 25 / hour | $0 |
| Pro | 1,250 / week | 100 / hour | $9/mo |
| Max | 6,250 / week | 500 / hour | $29/mo |

Soft vs Hard Limits

The weekly quota is a soft limit: once exceeded, requests still succeed but may run in degraded mode (for example, render=true downgraded to HTTP-only), signaled via the X-Soft-Limited and X-Degraded response headers. The hourly burst limit is a hard limit: once exceeded, requests are rejected with 429 rate_limited until the window resets (see the Retry-After header).

Response Headers

WebPeel includes useful metadata in response headers:

| Header | Description |
|---|---|
| X-Credits-Used | Number of credits consumed by this request |
| X-Processing-Time | Server-side processing time in milliseconds |
| X-Fetch-Type | Fetch method used: basic, stealth, search, screenshot |
| X-Cache | HIT or MISS |
| X-Cache-Age | Cache entry age in seconds (on HIT) |
| X-WebPeel-Plan | Your plan tier: free, pro, max |
| X-Weekly-Limit | Your weekly quota |
| X-Weekly-Used | Credits used this week |
| X-Weekly-Remaining | Credits remaining this week |
| X-Weekly-Resets-At | ISO timestamp of next Monday 00:00 UTC |
| X-Burst-Limit | Hourly burst limit |
| X-Burst-Used | Burst requests this hour |
| X-Burst-Remaining | Burst requests remaining |
| X-Soft-Limited | Present when over weekly quota (degraded mode) |
| X-Degraded | Explanation of degradation (e.g. "render=true downgraded to HTTP-only") |
| X-Extra-Usage-Charged | Cost charged to extra usage balance |
| X-Extra-Usage-New-Balance | Remaining extra usage balance |
| X-RateLimit-Limit | Burst limit for the sliding window |
| X-RateLimit-Remaining | Remaining requests in the window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
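
For example, reading the quota headers alongside a normal fetch (header names as listed above):

import urllib.request

req = urllib.request.Request(
    "https://api.webpeel.dev/v1/fetch?url=https://example.com"
)
req.add_header("Authorization", "Bearer YOUR_API_KEY")

with urllib.request.urlopen(req) as resp:
    print("Credits used:    ", resp.headers.get("X-Credits-Used"))
    print("Weekly remaining:", resp.headers.get("X-Weekly-Remaining"))
    print("Burst remaining: ", resp.headers.get("X-Burst-Remaining"))
    print("Cache:           ", resp.headers.get("X-Cache"))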

Error Responses

All errors return a consistent JSON format:

{
  "error": "error_code",
  "message": "Human-readable error message"
}
| Status | Error Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid parameters |
| 400 | invalid_url | Malformed URL or URL too long (>2048 chars) |
| 400 | forbidden_url | SSRF protection: cannot fetch localhost, private networks, or non-HTTP URLs |
| 400 | invalid_json | Malformed JSON in request body |
| 401 | invalid_key | Invalid or revoked API key |
| 401 | unauthorized | Valid API key required (for protected endpoints) |
| 404 | not_found | Route not found or job ID not found |
| 429 | rate_limited | Hourly burst limit exceeded. Check Retry-After header. |
| 429 | burst_limit_exceeded | Hourly burst limit exceeded (per API key). Check Retry-After. |
| 500 | TIMEOUT | Request timed out while fetching |
| 500 | BLOCKED | Target site blocked the request; try render=true or stealth=true |
| 500 | NETWORK | Network error (DNS failure, connection refused, etc.) |
| 500 | internal_error | Unexpected server error |
| 500 | search_failed | Search request failed |
| 500 | answer_failed | Answer generation failed |
| 501 | not_implemented | Feature requires PostgreSQL backend (self-hosted only) |
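
Since 429 responses include a Retry-After header, a small backoff loop usually suffices. A sketch (tune the retry budget for your workload):

import time
import urllib.error
import urllib.request

def fetch_with_retry(url, retries=3):
    # Retry on 429, honoring Retry-After when the server provides it
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code != 429 or attempt == retries:
                raise
            # Fall back to exponential backoff if Retry-After is absent
            time.sleep(int(e.headers.get("Retry-After") or 2 ** attempt))

data = fetch_with_retry("https://api.webpeel.dev/v1/fetch?url=https://example.com")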