# Crawl & Map
Recursively crawl a website to scrape all pages, or map a domain's URL structure. Fully Firecrawl-compatible — change only the base URL to migrate.
This page covers /v1/crawl and /v1/map. See Migrating from Firecrawl for full details.
## Endpoints

- POST /v1/crawl — Start an async crawl job. Scrapes all pages on a domain, staying within the starting origin. Returns a job ID for polling.
- GET /v1/crawl/:id — Poll crawl job status and retrieve completed page data.
- POST /v1/map — Discover and return all URLs on a domain without scraping their content. Fast sitemap generation.
## POST /v1/crawl
### Request Body
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | — | Required. Starting URL. The crawler stays within this domain. |
| limit | number | 100 | Maximum number of pages to crawl. |
| maxDepth | number | 3 | Maximum link depth from the starting URL. |
| formats | string[] | ["markdown"] | Output format for each page: markdown, html, links. |
| excludePaths | string[] | — | URL path patterns to exclude from crawling (e.g. ["/admin", "/login"]). |
| includePaths | string[] | — | Only crawl paths matching these patterns. |
| webhook | string | — | Webhook URL for job events (started, page, completed, failed). |
| scrapeOptions | object | — | Options for each page fetch (see API Reference for full option list). |
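When a webhook URL is set, each job event arrives as an HTTP POST. A minimal dispatcher for those events might look like the following sketch — only the event names (started, page, completed, failed) come from the table above; the payload shape ({ type, id, data }) is an assumption for illustration, not a documented contract.

```javascript
// Minimal crawl-webhook dispatcher. Event names match the webhook
// parameter above; the payload shape ({ type, id, data }) is an
// assumed example, not a documented contract.
function handleCrawlEvent(event) {
  switch (event.type) {
    case 'started':
      return `crawl ${event.id} started`;
    case 'page':
      // Each scraped page is assumed to carry its scrape result in event.data
      return `page scraped: ${event.data?.metadata?.sourceURL ?? 'unknown'}`;
    case 'completed':
      return `crawl ${event.id} completed`;
    case 'failed':
      return `crawl ${event.id} failed`;
    default:
      return `ignoring unknown event: ${event.type}`;
  }
}
```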
### Response — 202 Accepted
```json
{
  "success": true,
  "id": "crawl_01J8XKZP4T...",
  "url": "/v1/crawl/crawl_01J8XKZP4T..."
}
```
## GET /v1/crawl/:id
### Response
```json
{
  "success": true,
  "status": "completed",
  "total": 48,
  "completed": 48,
  "creditsUsed": 48,
  "expiresAt": "2024-03-05T12:00:00.000Z",
  "data": [
    {
      "markdown": "# Home Page\n\nWelcome to...",
      "metadata": {
        "title": "Home — Example.com",
        "description": "...",
        "sourceURL": "https://example.com",
        "statusCode": 200
      }
    },
    {
      "markdown": "# About Us\n\n...",
      "metadata": {
        "title": "About — Example.com",
        "sourceURL": "https://example.com/about",
        "statusCode": 200
      }
    }
  ]
}
```
### Job Status Values

- pending — queued, not yet started
- scraping — actively crawling pages (maps to processing in Firecrawl format)
- completed — crawl finished
- failed — crawl failed
- cancelled — cancelled by user
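Clients written against Firecrawl only need to normalize the one status name that differs. A small sketch:

```javascript
// "scraping" is WebPeel's name for Firecrawl's "processing";
// all other status values are identical in both formats.
function toFirecrawlStatus(status) {
  return status === 'scraping' ? 'processing' : status;
}

// A job is finished once it reaches any terminal status.
function isTerminal(status) {
  return ['completed', 'failed', 'cancelled'].includes(status);
}
```

A polling loop can stop as soon as isTerminal(result.status) returns true.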
## POST /v1/map
Quickly enumerate all URLs on a domain by parsing sitemaps and following links — without scraping page content. Much faster than a full crawl.
### Request Body
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | — | Required. Domain root URL to map. |
| limit | number | 5000 | Maximum number of URLs to return. |
| search | string | — | Filter URLs containing this substring. |
### Response
```json
{
  "success": true,
  "links": [
    "https://example.com",
    "https://example.com/about",
    "https://example.com/pricing",
    "https://example.com/blog",
    "https://example.com/blog/post-1",
    "https://example.com/docs",
    "..."
  ]
}
```
## Firecrawl-Compatible Scrape Endpoint
In addition to crawl and map, WebPeel also exposes Firecrawl-compatible scrape endpoints:
- Firecrawl v1 scrape — same request/response format as Firecrawl's POST /v1/scrape. Accepts formats, actions, extract, and all Firecrawl options.
- Firecrawl v2 scrape — first-class screenshot support. When formats: ["screenshot"] is the only format, returns image data directly.
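Because only the base URL changes, migration can be as small as a request builder pointed at WebPeel instead of Firecrawl. A sketch (buildScrapeRequest is an illustrative helper, not part of either API):

```javascript
// Build a Firecrawl-shaped scrape request; swapping baseUrl between
// Firecrawl and WebPeel is the only change needed to migrate.
function buildScrapeRequest(baseUrl, apiKey, body) {
  return {
    url: `${baseUrl}/v1/scrape`,
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}
```

Usage: const { url, init } = buildScrapeRequest('https://api.webpeel.dev', key, { url: 'https://example.com', formats: ['markdown'] }); then await fetch(url, init).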
## Examples
```bash
# Start a crawl job
curl -X POST https://api.webpeel.dev/v1/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "maxDepth": 3,
    "formats": ["markdown"],
    "excludePaths": ["/admin", "/login", "/cart"]
  }'

# Poll the crawl job
curl https://api.webpeel.dev/v1/crawl/crawl_01J8XKZP4T... \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```bash
# Map all URLs on a domain
curl -X POST https://api.webpeel.dev/v1/map \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "limit": 1000
  }'

# Filter to only blog URLs
curl -X POST https://api.webpeel.dev/v1/map \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "search": "/blog/",
    "limit": 500
  }'
```

```bash
# Firecrawl-compatible scrape (drop-in replacement)
curl -X POST https://api.webpeel.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "links"],
    "onlyMainContent": true
  }'
```
```javascript
const API_KEY = process.env.WEBPEEL_API_KEY;
const BASE = 'https://api.webpeel.dev';

// Start crawl
const startRes = await fetch(`${BASE}/v1/crawl`, {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${API_KEY}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url: 'https://docs.example.com',
    limit: 100,
    formats: ['markdown'],
  }),
});
const { id } = await startRes.json();

// Poll until the job reaches a terminal status
let result;
do {
  await new Promise((r) => setTimeout(r, 3000));
  const pollRes = await fetch(`${BASE}/v1/crawl/${id}`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` },
  });
  result = await pollRes.json();
  console.log(`${result.completed}/${result.total} pages crawled (${result.status})`);
} while (['pending', 'scraping'].includes(result.status));

console.log(`Crawl complete. ${result.data.length} pages scraped.`);
```
## Limits
- Max pages per crawl: configurable via the limit parameter
- Max map URLs: 5000 per request
- Credits: 1 credit per page scraped during crawl
- Map credits: 1 credit per map request (regardless of URL count)
- Job TTL: Results expire after 24 hours
- Scope: Crawl stays within the origin domain (no cross-domain following)