# Crawl & Map
Recursively crawl a website to scrape all pages, or map a domain's URL structure. Fully Firecrawl-compatible — change only the base URL to migrate.
This page covers /v1/crawl and /v1/map. See Migrating from Firecrawl for full details.
## Endpoints

- POST /v1/crawl — Start an async crawl job. Scrapes all pages on a domain, staying within the starting origin. Returns a job ID for polling.
- GET /v1/crawl/:id — Poll crawl job status and retrieve completed page data.
- POST /v1/map — Discover and return all URLs on a domain without scraping their content. Fast sitemap generation.
## POST /v1/crawl
### Request Body
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | — | Required. Starting URL. The crawler stays within this domain. |
| limit | number | 100 | Maximum number of pages to crawl. |
| maxDepth | number | 3 | Maximum link depth from the starting URL. |
| formats | string[] | ["markdown"] | Output format for each page: markdown, html, links. |
| excludePaths | string[] | — | URL path patterns to exclude from crawling (e.g. ["/admin", "/login"]). |
| includePaths | string[] | — | Only crawl paths matching these patterns. |
| webhook | string | — | Webhook URL for job events (started, page, completed, failed). |
| scrapeOptions | object | — | Options for each page fetch (see API Reference for full option list). |
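When a webhook URL is set, each job event arrives as an HTTP POST. A minimal dispatcher for those events might look like the following sketch — only the event names (started, page, completed, failed) come from the table above; the payload shape ({ type, id, data }) is an assumption for illustration, not a documented contract.

```javascript
// Minimal crawl-webhook dispatcher. Event names match the webhook
// parameter above; the payload shape ({ type, id, data }) is an
// assumed example, not a documented contract.
function handleCrawlEvent(event) {
  switch (event.type) {
    case 'started':
      return `crawl ${event.id} started`;
    case 'page':
      // Each scraped page is assumed to carry its scrape result in event.data
      return `page scraped: ${event.data?.metadata?.sourceURL ?? 'unknown'}`;
    case 'completed':
      return `crawl ${event.id} completed`;
    case 'failed':
      return `crawl ${event.id} failed`;
    default:
      return `ignoring unknown event: ${event.type}`;
  }
}
```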
### Response — 202 Accepted
```json
{
  "success": true,
  "id": "crawl_01J8XKZP4T...",
  "url": "/v1/crawl/crawl_01J8XKZP4T..."
}
```
## GET /v1/crawl/:id
### Response
```json
{
  "success": true,
  "status": "completed",
  "total": 48,
  "completed": 48,
  "creditsUsed": 48,
  "expiresAt": "2024-03-05T12:00:00.000Z",
  "data": [
    {
      "markdown": "# Home Page\n\nWelcome to...",
      "metadata": {
        "title": "Home — Example.com",
        "description": "...",
        "sourceURL": "https://example.com",
        "statusCode": 200
      }
    },
    {
      "markdown": "# About Us\n\n...",
      "metadata": {
        "title": "About — Example.com",
        "sourceURL": "https://example.com/about",
        "statusCode": 200
      }
    }
  ]
}
```
### Job Status Values

- pending — queued, not yet started
- scraping — actively crawling pages (maps to processing in Firecrawl format)
- completed — crawl finished
- failed — crawl failed
- cancelled — cancelled by user
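Clients written against Firecrawl only need to normalize the one status name that differs. A small sketch:

```javascript
// "scraping" is WebPeel's name for Firecrawl's "processing";
// all other status values are identical in both formats.
function toFirecrawlStatus(status) {
  return status === 'scraping' ? 'processing' : status;
}

// A job is finished once it reaches any terminal status.
function isTerminal(status) {
  return ['completed', 'failed', 'cancelled'].includes(status);
}
```

A polling loop can stop as soon as isTerminal(result.status) returns true.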
## POST /v1/map
Quickly enumerate all URLs on a domain by parsing sitemaps and following links — without scraping page content. Much faster than a full crawl.
### Request Body
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | — | Required. Domain root URL to map. |
| limit | number | 5000 | Maximum number of URLs to return. |
| search | string | — | Filter URLs containing this substring. |
### Response
```json
{
  "success": true,
  "links": [
    "https://example.com",
    "https://example.com/about",
    "https://example.com/pricing",
    "https://example.com/blog",
    "https://example.com/blog/post-1",
    "https://example.com/docs",
    "..."
  ]
}
```
## Firecrawl-Compatible Scrape Endpoint
In addition to crawl and map, WebPeel also exposes Firecrawl-compatible scrape endpoints:
- Firecrawl v1 scrape — same request/response format as Firecrawl's POST /v1/scrape. Accepts formats, actions, extract, and all Firecrawl options.
- Firecrawl v2 scrape — first-class screenshot support. When formats: ["screenshot"] is the only format, returns image data directly.
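Because only the base URL changes, migration can be as small as a request builder pointed at WebPeel instead of Firecrawl. A sketch (buildScrapeRequest is an illustrative helper, not part of either API):

```javascript
// Build a Firecrawl-shaped scrape request; swapping baseUrl between
// Firecrawl and WebPeel is the only change needed to migrate.
function buildScrapeRequest(baseUrl, apiKey, body) {
  return {
    url: `${baseUrl}/v1/scrape`,
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}
```

Usage: const { url, init } = buildScrapeRequest('https://api.webpeel.dev', key, { url: 'https://example.com', formats: ['markdown'] }); then await fetch(url, init).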
## Examples
```bash
# Start a crawl job
curl -X POST https://api.webpeel.dev/v1/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "maxDepth": 3,
    "formats": ["markdown"],
    "excludePaths": ["/admin", "/login", "/cart"]
  }'

# Poll the crawl job
curl https://api.webpeel.dev/v1/crawl/crawl_01J8XKZP4T... \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```bash
# Map all URLs on a domain
curl -X POST https://api.webpeel.dev/v1/map \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "limit": 1000
  }'

# Filter to only blog URLs
curl -X POST https://api.webpeel.dev/v1/map \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "search": "/blog/",
    "limit": 500
  }'
```

```bash
# Firecrawl-compatible scrape (drop-in replacement)
curl -X POST https://api.webpeel.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "links"],
    "onlyMainContent": true
  }'
```
```javascript
const API_KEY = process.env.WEBPEEL_API_KEY;
const BASE = 'https://api.webpeel.dev';

// Start crawl
const startRes = await fetch(`${BASE}/v1/crawl`, {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${API_KEY}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url: 'https://docs.example.com',
    limit: 100,
    formats: ['markdown'],
  }),
});
const { id } = await startRes.json();

// Poll until the job reaches a terminal status
let result;
do {
  await new Promise((r) => setTimeout(r, 3000));
  const pollRes = await fetch(`${BASE}/v1/crawl/${id}`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` },
  });
  result = await pollRes.json();
  console.log(`${result.completed}/${result.total} pages crawled (${result.status})`);
} while (['pending', 'scraping'].includes(result.status));

console.log(`Crawl complete. ${result.data.length} pages scraped.`);
```
## Limits
- Max pages per crawl: configurable via the limit parameter
- Max map URLs: 5000 per request
- Credits: 1 credit per page scraped during crawl
- Map credits: 1 credit per map request (regardless of URL count)
- Job TTL: Results expire after 24 hours
- Scope: Crawl stays within the origin domain (no cross-domain following)