API Reference
Complete REST API documentation for WebPeel. All endpoints return JSON and use standard HTTP status codes.
Hosted: https://api.webpeel.dev
Self-hosted: http://localhost:3000 (configurable via the PORT env var)
Authentication
Pass your API key via the Authorization header or the X-API-Key header:
# Option 1: Bearer token (recommended)
curl https://api.webpeel.dev/v1/fetch?url=https://example.com \
-H "Authorization: Bearer wp_live_abc123..."
# Option 2: X-API-Key header
curl https://api.webpeel.dev/v1/fetch?url=https://example.com \
-H "X-API-Key: wp_live_abc123..."
GET /v1/fetch
Fetch and extract clean, AI-ready content from any URL. WebPeel automatically escalates from fast HTTP to browser rendering to stealth mode as needed.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url (required) | string | — | URL to fetch (max 2048 characters). Must be http:// or https://. |
| format | string | markdown | Output format: markdown, text, or html |
| render | boolean | false | Force headless browser rendering (for JS-heavy sites, SPAs) |
| wait | number | 0 | Milliseconds to wait after page load (0–60000). Only applies when render=true. |
| includeTags | string | — | Comma-separated HTML tags/selectors to include (e.g. article,main) |
| excludeTags | string | — | Comma-separated HTML tags/selectors to remove (e.g. nav,footer,.sidebar) |
| onlyMainContent | boolean | false | Shortcut: sets includeTags to main,article,.content,#content |
| images | boolean | false | Extract image URLs from the page |
| location | string | — | ISO 3166-1 alpha-2 country code for geo-targeting (e.g. US, DE) |
| languages | string | — | Comma-separated language preferences (e.g. en,de) |
| actions | string | — | JSON-encoded array of page actions to execute before extraction. Auto-enables render. |
| maxAge | number | 172800000 | Cache freshness threshold in milliseconds (default 2 days). Set to 0 to bypass cache. |
| storeInCache | boolean | true | Set to false to skip caching the response |
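Since the url parameter usually carries its own query string, percent-encode it when building the request, and JSON-encode the actions array as the table above notes. A minimal sketch using only the Python standard library:
import json, urllib.parse, urllib.request

params = {
    # The target URL is percent-encoded by urlencode so its own query string survives.
    "url": "https://example.com/search?q=hello&page=2",
    "format": "markdown",
    # actions is a JSON-encoded array on GET requests (auto-enables rendering).
    "actions": json.dumps([{"type": "wait", "ms": 1000}]),
}
req = urllib.request.Request(
    "https://api.webpeel.dev/v1/fetch?" + urllib.parse.urlencode(params)
)
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
print(data["method"])  # expect "browser" or "stealth", since actions force rendering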
Response
{
"url": "https://example.com",
"title": "Example Domain",
"content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
"metadata": {
"description": "Example Domain",
"author": null,
"published": null,
"image": null,
"canonical": "https://example.com"
},
"links": ["https://www.iana.org/domains/example"],
"tokens": 125,
"method": "simple",
"elapsed": 118,
"contentType": "html",
"quality": 0.95,
"fingerprint": "a1b2c3d4e5f6g7h8"
}
Response fields
| Field | Type | Description |
|---|---|---|
| url | string | Final URL after redirects |
| title | string | Page title |
| content | string | Extracted content in the requested format |
| metadata | object | Structured metadata: description, author, published, image, canonical |
| links | string[] | All links found on the page (absolute, deduplicated) |
| tokens | number | Estimated token count (content.length / 4) |
| method | string | Fetch method used: simple, browser, or stealth |
| elapsed | number | Processing time in milliseconds |
| contentType | string | Detected type: html, json, xml, text, document |
| quality | number | Extraction quality score (0–1) |
| fingerprint | string | SHA-256 hash prefix for change detection |
| images | object[] | Image info (only when images=true): src, alt, title, width, height |
| screenshot | string | Base64 PNG screenshot (only when screenshot=true via POST) |
Examples
# Basic fetch
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com"
# With browser rendering
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com&render=true"
# Extract only main content, exclude nav/footer
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com&onlyMainContent=true&excludeTags=nav,footer"
# With authentication
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
-H "Authorization: Bearer YOUR_API_KEY"
const response = await fetch(
'https://api.webpeel.dev/v1/fetch?url=https://example.com&format=markdown',
{
headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
}
);
const data = await response.json();
console.log(data.title); // "Example Domain"
console.log(data.content); // Markdown content
console.log(data.method); // "simple" | "browser" | "stealth"
import urllib.request, json
url = "https://api.webpeel.dev/v1/fetch?url=https://example.com"
req = urllib.request.Request(url)
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
data = json.loads(resp.read())
print(data["title"]) # "Example Domain"
print(data["content"]) # Markdown content
Error Responses
| Status | Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid url, format, wait, or actions parameter |
| 400 | invalid_url | URL too long (>2048 chars) or malformed |
| 400 | forbidden_url | Attempt to fetch localhost, private networks, or non-HTTP URLs (SSRF protection) |
| 429 | rate_limited | Hourly burst limit exceeded |
| 500 | TIMEOUT | Request timed out |
| 500 | BLOCKED | Site blocked the request — try render=true or stealth mode |
| 500 | internal_error | Unexpected server error |
POST /v1/fetch
Same as GET /v1/fetch but accepts a JSON body. Adds support for page actions, inline LLM extraction (BYOK), and the Firecrawl-compatible formats array.
Request Body (JSON)
| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | — | URL to fetch (max 2048 characters) |
| render | boolean | false | Force browser rendering |
| wait | number | 0 | Wait time in ms after page load (0–60000) |
| format | string | markdown | markdown, text, or html |
| includeTags | string[] | — | Array of HTML tags/selectors to include |
| excludeTags | string[] | — | Array of HTML tags/selectors to remove |
| onlyMainContent | boolean | false | Include only main content tags |
| images | boolean | false | Extract image URLs |
| location | string | — | Country code for geo-targeting |
| languages | string[] | — | Language preferences |
| actions | object[] | — | Page actions to execute before extraction. Auto-enables render. |
| storeInCache | boolean | true | Store result in server-side cache |
| Inline LLM Extraction (BYOK) | | | |
| extract | object | — | Object with schema (JSON Schema) and/or prompt (string) for LLM extraction |
| llmProvider | string | — | Required when extract is set. One of: openai, anthropic, google |
| llmApiKey | string | — | Required when extract is set. Your LLM API key (BYOK) |
| llmModel | string | Provider default | LLM model name (optional) |
| formats | array | — | Firecrawl-compatible formats array. Supports {type: "json", schema: {...}} for extraction. |
Response
Same as GET /v1/fetch, plus these additional fields when inline extraction is used:
{
"url": "https://example.com",
"title": "Example Domain",
"content": "# Example Domain\n...",
"json": {
"company": "Example Corp",
"founded": 2006
},
"extractTokensUsed": { "input": 1234, "output": 56 },
...
}
Examples
# Basic POST fetch
curl -X POST https://api.webpeel.dev/v1/fetch \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"url": "https://example.com",
"format": "markdown",
"onlyMainContent": true
}'
# With inline LLM extraction
curl -X POST https://api.webpeel.dev/v1/fetch \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"url": "https://example.com/about",
"extract": {
"schema": {
"type": "object",
"properties": {
"company": { "type": "string" },
"founded": { "type": "number" }
}
}
},
"llmProvider": "openai",
"llmApiKey": "sk-..."
}'
const response = await fetch('https://api.webpeel.dev/v1/fetch', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer YOUR_API_KEY'
},
body: JSON.stringify({
url: 'https://example.com/about',
extract: {
schema: {
type: 'object',
properties: {
company: { type: 'string' },
founded: { type: 'number' }
}
}
},
llmProvider: 'openai',
llmApiKey: 'sk-...'
})
});
const data = await response.json();
console.log(data.json); // { company: "Example Corp", founded: 2006 }
import urllib.request, json
body = json.dumps({
"url": "https://example.com/about",
"extract": {
"schema": {
"type": "object",
"properties": {
"company": {"type": "string"},
"founded": {"type": "number"}
}
}
},
"llmProvider": "openai",
"llmApiKey": "sk-..."
}).encode()
req = urllib.request.Request(
"https://api.webpeel.dev/v1/fetch",
data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
data = json.loads(resp.read())
print(data["json"]) # {"company": "Example Corp", "founded": 2006}
Page Actions
Page actions let you interact with a page (click, type, scroll, wait) before content is extracted. Actions auto-enable browser rendering. Compatible with Firecrawl's action format.
| Action Type | Fields | Description |
|---|---|---|
| click | selector | Click an element |
| type | selector, value/text | Type text into an input (keystroke by keystroke) |
| fill | selector, value/text | Fill an input (set value directly) |
| select | selector, value | Select a dropdown option |
| scroll | direction, amount | Scroll the page. Direction: up, down, left, right |
| wait | ms/milliseconds | Wait a fixed duration |
| waitForSelector | selector, timeout | Wait for an element to appear |
| press | key | Press a keyboard key (e.g. Enter) |
| hover | selector | Hover over an element |
| screenshot | — | Take a screenshot at this point |
{
"url": "https://example.com/search",
"actions": [
{ "type": "fill", "selector": "#search-input", "value": "AI agents" },
{ "type": "click", "selector": "#search-btn" },
{ "type": "wait", "ms": 2000 },
{ "type": "scroll", "direction": "down", "amount": 500 }
]
}
GET /v1/search
Search the web using DuckDuckGo (free, default) or Brave Search (BYOK for higher-quality results). Optionally scrape each result URL for full content.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| q (required) | string | — | Search query (max 500 characters) |
| provider | string | duckduckgo | Search provider: duckduckgo or brave |
| searchApiKey | string | — | Brave Search API key. Alternative: x-search-api-key header. |
| count | number | 5 | Number of results (1–10) |
| scrapeResults | boolean | false | When true, fetches each result URL and adds content field |
| sources | string | web | Comma-separated: web, news, images |
| categories | string | — | Filter by category: github, pdf, docs, blog, news, video, social |
| tbs | string | — | Time filter: qdr:d (day), qdr:w (week), qdr:m (month), qdr:y (year) |
| country | string | — | Country/region code (e.g. us-en, de-de) |
| location | string | — | Location string for localized results |
Response
{
"success": true,
"data": {
"web": [
{
"title": "Example Domain",
"url": "https://example.com",
"snippet": "This domain is for use in illustrative examples...",
"content": "# Example Domain\n..." // Only if scrapeResults=true
}
],
"news": [ ... ], // Only if sources includes "news"
"images": [ ... ] // Only if sources includes "images"
}
}
Examples
# Basic search
curl "https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5"
# With Brave Search (BYOK)
curl "https://api.webpeel.dev/v1/search?q=latest+AI+news&provider=brave" \
-H "x-search-api-key: YOUR_BRAVE_KEY"
# Search + scrape results
curl "https://api.webpeel.dev/v1/search?q=MCP+protocol&scrapeResults=true&count=3"
const resp = await fetch(
'https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5',
{ headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
);
const { data } = await resp.json();
for (const result of data.web) {
console.log(`${result.title}: ${result.url}`);
}
import urllib.request, json
url = "https://api.webpeel.dev/v1/search?q=latest+AI+news&count=5"
req = urllib.request.Request(url)
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
data = json.loads(resp.read())
for r in data["data"]["web"]:
print(f"{r['title']}: {r['url']}")
POST /v1/screenshot
Take a screenshot of any URL and return a base64-encoded image. Supports full-page capture, custom viewport sizes, page actions, and stealth mode.
Request Body (JSON)
| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | — | URL to screenshot (max 2048 characters) |
| fullPage | boolean | false | Capture the entire scrollable page |
| width | number | 1280 | Viewport width in pixels (100–5000) |
| height | number | 720 | Viewport height in pixels (100–5000) |
| format | string | png | Image format: png, jpeg, or jpg |
| quality | number | — | Image quality for JPEG (1–100) |
| waitFor | number | 0 | Milliseconds to wait before capture (0–60000) |
| timeout | number | 30000 | Total timeout in milliseconds |
| stealth | boolean | false | Use stealth mode to bypass bot detection |
| actions | object[] | — | Page actions to execute before screenshot |
| headers | object | — | Custom HTTP headers |
| cookies | string[] | — | Cookies to set (key=value pairs) |
Response
{
"success": true,
"data": {
"url": "https://example.com",
"screenshot": "data:image/png;base64,iVBORw0KGgo...",
"metadata": {
"sourceURL": "https://example.com",
"format": "png",
"width": 1280,
"height": 720,
"fullPage": false
}
}
}
Examples
curl -X POST https://api.webpeel.dev/v1/screenshot \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"url": "https://example.com",
"fullPage": true,
"width": 1440,
"format": "png"
}'
const resp = await fetch('https://api.webpeel.dev/v1/screenshot', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer YOUR_API_KEY'
},
body: JSON.stringify({
url: 'https://example.com',
fullPage: true,
width: 1440
})
});
const { data } = await resp.json();
// data.screenshot is a data: URL — use directly in <img src="...">
import urllib.request, json, base64
body = json.dumps({
"url": "https://example.com",
"fullPage": True,
"width": 1440
}).encode()
req = urllib.request.Request(
"https://api.webpeel.dev/v1/screenshot",
data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
data = json.loads(resp.read())
# Save screenshot to file
img_data = data["data"]["screenshot"].split(",")[1]
with open("screenshot.png", "wb") as f:
f.write(base64.b64decode(img_data))
POST /v1/answer
Ask a question, search the web, fetch top sources, and generate an LLM answer with inline citations ([1], [2]). BYOK — you provide your own LLM API key. Supports SSE streaming.
Request Body (JSON)
| Field | Type | Default | Description |
|---|---|---|---|
| question (required) | string | — | The question to answer (max 2000 characters) |
| llmProvider (required) | string | — | LLM provider: openai, anthropic, or google |
| llmApiKey (required) | string | — | Your LLM API key (BYOK) |
| llmModel | string | Provider default | Model name. Defaults: gpt-4o-mini (OpenAI), claude-3-5-sonnet-latest (Anthropic), gemini-1.5-flash (Google) |
| searchProvider | string | duckduckgo | Search provider: duckduckgo or brave |
| searchApiKey | string | — | Brave Search API key (also accepted via x-search-api-key header) |
| maxSources | number | 5 | Maximum sources to fetch and cite (1–10) |
| stream | boolean | false | When true, stream response via SSE (text/event-stream) |
Response (non-streaming)
{
"answer": "MCP (Model Context Protocol) is an open standard... [1] ... [2] ...",
"citations": [
{ "title": "Model Context Protocol", "url": "https://...", "snippet": "..." },
{ "title": "MCP Documentation", "url": "https://...", "snippet": "..." }
],
"searchProvider": "duckduckgo",
"llmProvider": "openai",
"llmModel": "gpt-4o-mini",
"tokensUsed": { "input": 3456, "output": 789 }
}
Response (streaming)
When stream=true, the endpoint returns text/event-stream with JSON-encoded events:
data: {"type":"chunk","text":"MCP (Model Context Protocol) is"}
data: {"type":"chunk","text":" an open standard for AI tool..."}
data: {"type":"done","citations":[...],"searchProvider":"duckduckgo","llmProvider":"openai","llmModel":"gpt-4o-mini","tokensUsed":{"input":3456,"output":789}}
Examples
curl -X POST https://api.webpeel.dev/v1/answer \
-H "Content-Type: application/json" \
-d '{
"question": "What is MCP and why does it matter for AI agents?",
"llmProvider": "openai",
"llmApiKey": "sk-...",
"maxSources": 5,
"stream": false
}'
const resp = await fetch('https://api.webpeel.dev/v1/answer', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
question: 'What is MCP and why does it matter for AI agents?',
llmProvider: 'openai',
llmApiKey: 'sk-...',
maxSources: 5
})
});
const data = await resp.json();
console.log(data.answer);
console.log(data.citations);
import urllib.request, json
body = json.dumps({
"question": "What is MCP and why does it matter for AI agents?",
"llmProvider": "openai",
"llmApiKey": "sk-...",
"maxSources": 5
}).encode()
req = urllib.request.Request(
"https://api.webpeel.dev/v1/answer",
data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
with urllib.request.urlopen(req) as resp:
data = json.loads(resp.read())
print(data["answer"])
POST /v1/agent
Run an autonomous AI research agent. It plans search queries, fetches pages, and synthesizes a structured answer using your LLM (BYOK). Supports basic and thorough research depth, topic tuning, JSON Schema output, and SSE streaming.
Request Body (JSON)
| Field | Type | Default | Description |
|---|---|---|---|
| prompt (required) | string | — | Natural language research task |
| llmApiKey (required) | string | — | OpenAI-compatible API key (BYOK) |
| llmModel | string | gpt-4o-mini | LLM model to use |
| llmApiBase | string | https://api.openai.com/v1 | Custom LLM API base URL (for OpenRouter, local models, etc.) |
| urls | string[] | — | Starting URLs. If not provided, the agent searches the web. |
| depth | string | basic | Research depth: basic (1 query, ~3 sources) or thorough (multi-query, ~10 sources, gap analysis) |
| topic | string | general | Topic hint: general, news, technical, academic. Adjusts search queries and source prioritization. |
| maxSources | number | 5 | Max sources to fetch and analyze (1–20) |
| maxCredits | number | — | Spending cap (stops fetching when reached) |
| outputSchema | object | — | JSON Schema for structured output. Agent validates and may retry if output doesn't match. |
| schema | object | — | Legacy alias for outputSchema |
| stream | boolean | false | When true, stream progress and answer via SSE |
Response (synchronous)
{
"success": true,
"data": {
"answer": "## AI Coding Assistants Comparison\n...",
"keyFindings": ["Finding 1", "Finding 2"]
},
"answer": "## AI Coding Assistants Comparison\n...",
"sources": ["https://...", "https://..."],
"sourcesDetailed": [
{ "url": "https://...", "title": "Source Title" }
],
"pagesVisited": 5,
"creditsUsed": 6,
"tokensUsed": { "input": 12000, "output": 2500 }
}
SSE Streaming Events
When stream=true, the endpoint returns text/event-stream:
data: {"type":"step","action":"searching","query":"AI coding assistants 2026"}
data: {"type":"step","action":"fetching","url":"https://..."}
data: {"type":"step","action":"analyzing","summary":"Synthesizing from 5 sources..."}
data: {"type":"chunk","text":"## AI Coding Assistants\n"}
data: {"type":"chunk","text":"Here are the top 5..."}
data: {"type":"done","answer":"...","sources":[...],"tokensUsed":{...}}
Async Mode
For long-running research, use the async variant of this endpoint. It accepts the same request body as above, plus an optional webhook (WebhookConfig object), and returns a job ID:
{ "success": true, "id": "job-uuid", "url": "/v1/agent/job-uuid" }
GET /v1/agent/:id — Poll job status and results.
DELETE /v1/agent/:id — Cancel a running agent job.
Example
curl -X POST https://api.webpeel.dev/v1/agent \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"prompt": "Compare the top 5 AI coding assistants by features and pricing",
"llmApiKey": "sk-...",
"depth": "thorough",
"topic": "technical",
"maxSources": 10,
"outputSchema": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"features": { "type": "array", "items": { "type": "string" } },
"pricing": { "type": "string" }
},
"required": ["name", "features", "pricing"]
}
}
}'
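The same request from Python, reading the documented fields of the synchronous response:
import urllib.request, json

body = json.dumps({
    "prompt": "Compare the top 5 AI coding assistants by features and pricing",
    "llmApiKey": "sk-...",
    "depth": "thorough",
    "topic": "technical",
    "maxSources": 10,
}).encode()
req = urllib.request.Request("https://api.webpeel.dev/v1/agent", data=body, method="POST")
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
print(result["answer"])
print(f"{result['pagesVisited']} pages visited, {result['creditsUsed']} credits used")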
POST /v1/crawl
Start an async crawl job. Follows links from a starting URL and extracts content from each page. Returns a job ID for polling. Supports webhooks and SSE for real-time progress.
Request Body (JSON)
| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | — | Starting URL to crawl from |
| limit | number | 100 | Maximum pages to crawl |
| maxDepth | number | 3 | Maximum link-following depth |
| scrapeOptions | object | — | Passed through to each page fetch (e.g. format, render) |
| location | string | — | Country code for geo-targeting |
| languages | string or string[] | — | Language preferences |
| webhook | object | — | Webhook config for progress notifications |
Response (202 Accepted)
{
"success": true,
"id": "crawl_abc123",
"url": "/v1/crawl/crawl_abc123"
}
GET /v1/crawl/:id — Poll Results
Returns job status and results. Supports SSE by sending Accept: text/event-stream for real-time progress updates.
{
"success": true,
"status": "completed",
"progress": 100,
"total": 50,
"completed": 50,
"creditsUsed": 50,
"expiresAt": "2026-02-16T00:00:00.000Z",
"data": [
{
"url": "https://example.com",
"title": "Home",
"markdown": "# Home\n...",
"links": ["https://example.com/about"]
}
]
}
DELETE /v1/crawl/:id — Cancel Job
Cancel a queued or in-progress crawl job.
Example
# Start crawl
curl -X POST https://api.webpeel.dev/v1/crawl \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{ "url": "https://docs.example.com", "limit": 50, "maxDepth": 3 }'
# Poll results
curl "https://api.webpeel.dev/v1/crawl/crawl_abc123" \
-H "Authorization: Bearer YOUR_API_KEY"
# Cancel
curl -X DELETE "https://api.webpeel.dev/v1/crawl/crawl_abc123" \
-H "Authorization: Bearer YOUR_API_KEY"
// Start crawl
const { id } = await (await fetch('https://api.webpeel.dev/v1/crawl', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer YOUR_API_KEY'
},
body: JSON.stringify({ url: 'https://docs.example.com', limit: 50 })
})).json();
// Poll until complete
let job;
do {
await new Promise(r => setTimeout(r, 2000));
job = await (await fetch(`https://api.webpeel.dev/v1/crawl/${id}`, {
headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
})).json();
console.log(`Progress: ${job.completed}/${job.total}`);
} while (job.status === 'processing' || job.status === 'queued');
console.log(`Done! ${job.data.length} pages crawled.`);
import urllib.request, json, time
# Start crawl
body = json.dumps({"url": "https://docs.example.com", "limit": 50}).encode()
req = urllib.request.Request(
"https://api.webpeel.dev/v1/crawl", data=body, method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
job_id = json.loads(resp.read())["id"]
# Poll until complete
while True:
time.sleep(2)
req = urllib.request.Request(f"https://api.webpeel.dev/v1/crawl/{job_id}")
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
job = json.loads(resp.read())
if job["status"] in ("completed", "failed"):
break
print(f"Progress: {job['completed']}/{job['total']}")
print(f"Done! {len(job['data'])} pages.")
POST /v1/batch/scrape
Batch scrape multiple URLs concurrently. Creates an async job with up to 5 parallel fetches. Poll for results or use webhooks.
Request Body (JSON)
| Field | Type | Default | Description |
|---|---|---|---|
| urls (required) | string[] | — | Array of URLs to scrape (max 100) |
| formats | string[] | ["markdown"] | Output formats |
| maxTokens | number | — | Max tokens per result |
| extract | object | — | Extraction options (passed through to each fetch) |
| webhook | string | — | Webhook URL for progress notifications |
Response (202 Accepted)
{
"success": true,
"id": "batch_xyz789",
"url": "/v1/batch/scrape/batch_xyz789"
}
GET /v1/batch/scrape/:id — Poll Results
Same response format as crawl jobs:
{
"success": true,
"status": "completed",
"progress": 100,
"total": 10,
"completed": 10,
"creditsUsed": 10,
"data": [ { "url": "...", "content": "...", ... }, ... ]
}
DELETE /v1/batch/scrape/:id — Cancel Job
Cancel a running batch job.
Example
curl -X POST https://api.webpeel.dev/v1/batch/scrape \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3"
],
"formats": ["markdown"]
}'
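Starting and polling a batch job follows the same pattern as crawl jobs. A sketch in Python (the 2-second poll interval is an arbitrary choice):
import urllib.request, json, time

# Start the batch job
body = json.dumps({
    "urls": ["https://example.com/page1", "https://example.com/page2"],
    "formats": ["markdown"],
}).encode()
req = urllib.request.Request("https://api.webpeel.dev/v1/batch/scrape", data=body, method="POST")
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
    job_id = json.loads(resp.read())["id"]

# Poll until the job reaches a terminal status
while True:
    time.sleep(2)
    poll = urllib.request.Request(f"https://api.webpeel.dev/v1/batch/scrape/{job_id}")
    poll.add_header("Authorization", "Bearer YOUR_API_KEY")
    with urllib.request.urlopen(poll) as resp:
        job = json.loads(resp.read())
    if job["status"] in ("completed", "failed", "cancelled"):
        break
print(job["status"], len(job.get("data", [])), "results")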
POST /v1/map
Discover all URLs on a domain using sitemap.xml and link crawling. Returns a flat list of URLs without fetching page content. Much faster and cheaper than a full crawl.
Request Body (JSON)
| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | — | Starting URL or domain |
| limit | number | 5000 | Maximum URLs to return |
| search | string | — | Filter URLs containing this substring |
Response
{
"success": true,
"links": [
"https://example.com/",
"https://example.com/about",
"https://example.com/blog",
"https://example.com/blog/post-1",
...
]
}
Example
curl -X POST https://api.webpeel.dev/v1/map \
-H "Content-Type: application/json" \
-d '{ "url": "https://example.com", "limit": 100, "search": "/blog" }'
GET /v1/jobs
List all async jobs (crawl, batch, agent). Useful for dashboards and monitoring.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| type | string | — | Filter by type: crawl, batch, extract |
| status | string | — | Filter by status: queued, processing, completed, failed, cancelled |
| limit | number | 50 | Maximum results to return |
Response
{
"success": true,
"count": 3,
"jobs": [
{
"id": "crawl_abc123",
"type": "crawl",
"status": "completed",
"progress": 100,
"total": 50,
"completed": 50,
"creditsUsed": 50,
"createdAt": "2026-02-15T10:00:00Z",
"expiresAt": "2026-02-16T10:00:00Z"
}
]
}
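For example, listing recent failed crawl jobs with the query parameters documented above:
import urllib.request, urllib.parse, json

query = urllib.parse.urlencode({"type": "crawl", "status": "failed", "limit": 10})
req = urllib.request.Request(f"https://api.webpeel.dev/v1/jobs?{query}")
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
    for job in json.loads(resp.read())["jobs"]:
        print(job["id"], job["status"], job["creditsUsed"])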
POST /v1/scrape (Firecrawl-compatible)
Drop-in replacement for Firecrawl's /v1/scrape endpoint. Point your base URL at api.webpeel.dev instead of api.firecrawl.dev and your existing code keeps working with no other changes.
Request Body (Firecrawl format)
| Field | Type | Default | Description |
|---|---|---|---|
| url (required) | string | — | URL to scrape |
| formats | string[] | ["markdown"] | Output formats: markdown, html, rawHtml, links, screenshot, images, json, branding, summary |
| onlyMainContent | boolean | true | Smart content extraction (Firecrawl defaults to true) |
| includeTags | string[] | — | Tags to include |
| excludeTags | string[] | — | Tags to exclude |
| waitFor | number | — | Wait time in ms (auto-enables browser rendering) |
| timeout | number | 30000 | Request timeout in ms |
| actions | object[] | — | Page actions (Firecrawl format supported) |
| headers | object | — | Custom headers |
| location | object | — | Object with country and languages |
| Inline LLM Extraction (BYOK) | | | |
| extract | object | — | { schema: {...}, prompt: "..." } for LLM extraction |
| llmProvider | string | — | openai, anthropic, or google |
| llmApiKey | string | — | Your LLM API key (BYOK) |
| llmModel | string | — | Model override |
Response
{
"success": true,
"data": {
"markdown": "# Page Title\n\nContent...",
"metadata": {
"title": "Page Title",
"description": "...",
"sourceURL": "https://example.com",
"statusCode": 200
},
"links": ["https://..."],
"screenshot": "data:image/png;base64,...",
"json": { ... }
}
}
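A minimal migration sketch in Python: only the host changes relative to a Firecrawl call, and the documented response fields (data.markdown, data.metadata) read the same way:
import urllib.request, json

BASE = "https://api.webpeel.dev"  # was: https://api.firecrawl.dev

body = json.dumps({"url": "https://example.com", "formats": ["markdown", "links"]}).encode()
req = urllib.request.Request(f"{BASE}/v1/scrape", data=body, method="POST")
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer YOUR_API_KEY")
with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())["data"]
print(data["metadata"]["title"])
print(data["markdown"][:200])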
POST /v2/scrape (Firecrawl v2)
Firecrawl v2 compatible endpoint with first-class screenshot support. When formats contains only "screenshot", WebPeel uses a dedicated screenshot pipeline for faster, higher-quality captures.
Accepts the same body as POST /v1/scrape plus these additional fields:
| Field | Type | Default | Description |
|---|---|---|---|
| fullPage | boolean | false | Full-page screenshot |
| width | number | — | Viewport width (pixels) |
| height | number | — | Viewport height (pixels) |
| screenshotFormat | string | png | Image format: png or jpeg |
| quality | number | — | JPEG quality (1–100) |
POST /v2/scrape is an alias for POST /v1/fetch. They share the same handler. Use whichever you prefer.
POST /mcp
Hosted MCP endpoint using Streamable HTTP transport. Connect any MCP client (Claude, Cursor, VS Code) with a single URL — no local server required.
Connection
{
"url": "https://api.webpeel.dev/mcp",
"headers": {
"Authorization": "Bearer YOUR_API_KEY"
}
}
Available Tools
| Tool Name | Description |
|---|---|
| webpeel_fetch | Fetch a URL and return clean markdown. Supports render, stealth, actions, inline extraction. |
| webpeel_search | Search the web (DuckDuckGo). Returns titles, URLs, snippets. |
| webpeel_crawl | Crawl a website following links. Max 100 pages, depth 1–5. |
| webpeel_map | Discover all URLs on a domain via sitemap + crawling. |
| webpeel_extract | Extract structured data using CSS selectors or AI. |
| webpeel_batch | Fetch multiple URLs in batch (max 50, configurable concurrency). |
| webpeel_agent | Autonomous web research: search, fetch, synthesize (BYOK). |
Stateless: No session management required. Each request is authenticated and handled independently.
GET /mcp returns 405. DELETE /mcp returns 200 (no-op for stateless mode).
GET /health
Health check endpoint. Returns server status, version, and uptime. Not rate-limited — safe for monitoring services to poll frequently.
Response
{
"status": "healthy",
"version": "0.8.1",
"uptime": 86400,
"timestamp": "2026-02-15T21:00:00.000Z"
}
GET /v1/stats
Get usage statistics for the authenticated API key's account. Shows total requests, success rate, and average response time.
Response
{
"totalRequests": 1523,
"successRate": 98.2,
"avgResponseTime": 342
}
On deployments without the required backend, this endpoint returns 501 Not Implemented (see not_implemented under Error Responses).
GET /v1/cli/usage
Get detailed usage information for the CLI's webpeel usage command. Shows plan info, weekly quota usage, burst limits, and reset times.
Response
{
"plan": {
"tier": "pro",
"weeklyLimit": 1250,
"burstLimit": 100
},
"weekly": {
"used": 342,
"limit": 1250,
"remaining": 908,
"resetsAt": "2026-02-17T00:00:00.000Z",
"percentUsed": 27
},
"burst": {
"used": 12,
"limit": 100,
"resetsIn": "47m"
},
"canFetch": true,
"upgradeUrl": "https://webpeel.dev/#pricing"
}
Webhooks
Async jobs (crawl, batch, agent) support webhook notifications. Webhooks are signed with HMAC-SHA256 if you provide a secret, and retried up to 3 times with exponential backoff.
Webhook Config Object
| Field | Type | Description |
|---|---|---|
| url | string | HTTPS URL to receive webhooks |
| events | string[] | Events to subscribe to: started, page, completed, failed |
| secret | string | HMAC-SHA256 signing secret. Signature sent in X-WebPeel-Signature header. |
| metadata | object | Custom metadata included in every webhook payload |
Webhook Payload
{
"event": "completed",
"timestamp": "2026-02-15T21:00:00.000Z",
"data": {
"jobId": "crawl_abc123",
"total": 50,
"completed": 50
}
}
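If you set a secret, verify each delivery by recomputing the HMAC-SHA256 of the raw request body and comparing it against X-WebPeel-Signature. A sketch, assuming the header carries the hex digest (confirm the exact encoding against your deployment):
import hashlib, hmac

def verify_webhook(raw_body: bytes, signature_header: str, secret: str) -> bool:
    # Recompute HMAC-SHA256 over the raw, unparsed request body.
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Tolerate an optional "sha256=" prefix (assumption; not confirmed by the docs).
    received = signature_header.removeprefix("sha256=")
    # Constant-time comparison guards against timing attacks.
    return hmac.compare_digest(expected, received)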
Rate Limits
WebPeel uses a two-tier rate limiting system: weekly quotas (soft limit) and hourly burst limits (hard limit).
| Tier | Weekly Quota | Burst Limit | Price |
|---|---|---|---|
| Free | 125 / week | 25 / hour | $0 |
| Pro | 1,250 / week | 100 / hour | $9/mo |
| Max | 6,250 / week | 500 / hour | $29/mo |
Soft vs Hard Limits
- Weekly quota exceeded (soft limit): Requests are not blocked. Instead, they are degraded to HTTP-only mode (no browser rendering or stealth) and the X-Soft-Limited header is set.
- Burst limit exceeded (hard limit): Returns 429 Too Many Requests with a Retry-After header.
- Extra usage: If enabled on your account, you can continue using full features beyond your weekly quota at per-request pricing.
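A sketch of client-side handling for both cases, reading the headers documented in the next section:
import json, time, urllib.error, urllib.parse, urllib.request

def fetch_with_limits(target_url: str) -> dict:
    query = urllib.parse.urlencode({"url": target_url})
    req = urllib.request.Request(f"https://api.webpeel.dev/v1/fetch?{query}")
    req.add_header("Authorization", "Bearer YOUR_API_KEY")
    try:
        with urllib.request.urlopen(req) as resp:
            if resp.headers.get("X-Soft-Limited"):
                # Soft limit: the request succeeded but was degraded to HTTP-only.
                print("Degraded:", resp.headers.get("X-Degraded"))
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 429:
            # Hard limit: back off for the server-suggested interval, then retry.
            time.sleep(int(err.headers.get("Retry-After", "60")))
            return fetch_with_limits(target_url)
        raise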
Response Headers
WebPeel includes useful metadata in response headers:
| Header | Description |
|---|---|
| X-Credits-Used | Number of credits consumed by this request |
| X-Processing-Time | Server-side processing time in milliseconds |
| X-Fetch-Type | Fetch method used: basic, stealth, search, screenshot |
| X-Cache | HIT or MISS |
| X-Cache-Age | Cache entry age in seconds (on HIT) |
| X-WebPeel-Plan | Your plan tier: free, pro, max |
| X-Weekly-Limit | Your weekly quota |
| X-Weekly-Used | Credits used this week |
| X-Weekly-Remaining | Credits remaining this week |
| X-Weekly-Resets-At | ISO timestamp of next Monday 00:00 UTC |
| X-Burst-Limit | Hourly burst limit |
| X-Burst-Used | Burst requests this hour |
| X-Burst-Remaining | Burst requests remaining |
| X-Soft-Limited | Present when over weekly quota (degraded mode) |
| X-Degraded | Explanation of degradation (e.g. "render=true downgraded to HTTP-only") |
| X-Extra-Usage-Charged | Cost charged to extra usage balance |
| X-Extra-Usage-New-Balance | Remaining extra usage balance |
| X-RateLimit-Limit | Burst limit for the sliding window |
| X-RateLimit-Remaining | Remaining requests in the window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
Error Responses
All errors return a consistent JSON format:
{
"error": "error_code",
"message": "Human-readable error message"
}
| Status | Error Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid parameters |
| 400 | invalid_url | Malformed URL or URL too long (>2048 chars) |
| 400 | forbidden_url | SSRF protection — cannot fetch localhost, private networks, or non-HTTP URLs |
| 400 | invalid_json | Malformed JSON in request body |
| 401 | invalid_key | Invalid or revoked API key |
| 401 | unauthorized | Valid API key required (for protected endpoints) |
| 404 | not_found | Route not found or job ID not found |
| 429 | rate_limited | Hourly burst limit exceeded. Check Retry-After header. |
| 429 | burst_limit_exceeded | Hourly burst limit exceeded (per API key). Check Retry-After. |
| 500 | TIMEOUT | Request timed out while fetching |
| 500 | BLOCKED | Target site blocked the request — try render=true or stealth=true |
| 500 | NETWORK | Network error (DNS failure, connection refused, etc.) |
| 500 | internal_error | Unexpected server error |
| 500 | search_failed | Search request failed |
| 500 | answer_failed | Answer generation failed |
| 501 | not_implemented | Feature requires PostgreSQL backend (self-hosted only) |