Batch Scraping

Scrape up to 100 URLs concurrently in a single API call. Supports real-time SSE streaming for live results, async job polling, and webhook callbacks.

Two delivery modes: Send Accept: text/event-stream for real-time SSE streaming as each URL completes, or use the default async mode to poll GET /v1/batch/scrape/:id for the full batch result.

Endpoints

POST /v1/batch/scrape (Auth Required)

Submit a batch of URLs for concurrent scraping. Returns a job ID immediately; results are delivered via SSE or polling.

GET /v1/batch/scrape/:id (Auth Required)

Poll a batch job for status and results. Returns progress, completed count, and the full data array when done.

DELETE /v1/batch/scrape/:id (Auth Required)

Cancel a running batch job. Only works on jobs that are pending or processing.

POST /v1/batch/scrape

Request Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| urls | string[] | Required | Array of URLs to scrape. Maximum 100 URLs per batch. |
| formats | string[] | Optional | Output formats. Default: ["markdown"]. Options: markdown, html, text. |
| concurrency | number | Optional | Max concurrent fetches. Default: 5. Range: 1–10. |
| extract | object | Optional | Structured extraction options. See Structured Extraction. |
| maxTokens | number | Optional | Truncate content to this many tokens per URL. |
| webhook | string | Optional | URL to receive webhook events: started, page, completed, failed. |
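Because urls is capped at 100 per batch, a larger list has to be split client-side before submission. A minimal helper for that (a local sketch, not part of the WebPeel API) could look like:

```javascript
// Split a URL list into batches that respect the 100-URL-per-batch cap.
// `chunkUrls` is a client-side helper, not a WebPeel API function.
function chunkUrls(urls, maxPerBatch = 100) {
  const batches = [];
  for (let i = 0; i < urls.length; i += maxPerBatch) {
    batches.push(urls.slice(i, i + maxPerBatch));
  }
  return batches;
}

// 250 URLs split into batches of 100, 100, and 50
const allUrls = Array.from({ length: 250 }, (_, i) => `https://example.com/page/${i}`);
console.log(chunkUrls(allUrls).map(b => b.length)); // [ 100, 100, 50 ]
```

Each resulting batch can then be submitted as a separate POST /v1/batch/scrape call.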

Response (Async Mode — default)

Returns 202 Accepted immediately with a job ID:

{
  "success": true,
  "id": "batch_01J8XKZP4T...",
  "url": "/v1/batch/scrape/batch_01J8XKZP4T..."
}

Response (SSE Mode)

Send Accept: text/event-stream to receive results as each URL completes:

event: started
data: {"batchId":"batch_01J8XKZP4T...","totalUrls":3}

event: result
data: {"url":"https://example.com","content":"# Example\n...","index":0}

event: error
data: {"url":"https://broken.url","error":"FETCH_ERROR","message":"...","index":1}

event: done
data: {"batchId":"batch_01J8XKZP4T...","completed":2,"failed":1,"duration":4521}
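The SSE frames above are plain text: each frame is an event line plus a data line, and frames are separated by blank lines. A minimal parser for a chunk of that stream (a sketch that assumes complete frames, independent of any SSE client library) might look like:

```javascript
// Parse a chunk of a text/event-stream response into { event, data } objects.
// Assumes complete frames separated by blank lines, as in the example above.
function parseSseChunk(chunk) {
  return chunk
    .split('\n\n')
    .filter(frame => frame.trim().length > 0)
    .map(frame => {
      const out = { event: 'message', data: null };
      for (const line of frame.split('\n')) {
        if (line.startsWith('event: ')) out.event = line.slice(7);
        else if (line.startsWith('data: ')) out.data = JSON.parse(line.slice(6));
      }
      return out;
    });
}

const sample =
  'event: started\ndata: {"batchId":"b1","totalUrls":3}\n\n' +
  'event: done\ndata: {"batchId":"b1","completed":2,"failed":1}\n\n';
const events = parseSseChunk(sample);
console.log(events[0].event, events[0].data.totalUrls); // started 3
```

In production, a streaming SSE client should also buffer partial frames across network chunks; this sketch handles only whole frames.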

GET /v1/batch/scrape/:id

Response

{
  "success": true,
  "status": "completed",
  "total": 3,
  "completed": 3,
  "creditsUsed": 3,
  "data": [
    {
      "url": "https://example.com",
      "content": "# Example Domain\nThis domain is for illustrative examples...",
      "metadata": { "title": "Example Domain", "description": "..." }
    },
    {
      "url": "https://broken.url",
      "error": "fetch failed"
    }
  ],
  "expiresAt": "2024-03-05T12:00:00.000Z"
}
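Note that the data array mixes successes and failures: failed entries carry an error field instead of content. Splitting the two client-side is straightforward:

```javascript
// Separate scraped pages from failures in a batch job's data array.
// Failed entries carry an `error` field, per the response shape above.
function splitResults(job) {
  return {
    pages: job.data.filter(item => !item.error),
    failures: job.data.filter(item => item.error),
  };
}

const job = {
  status: 'completed',
  data: [
    { url: 'https://example.com', content: '# Example Domain' },
    { url: 'https://broken.url', error: 'fetch failed' },
  ],
};
const { pages, failures } = splitResults(job);
console.log(pages.length, failures.length); // 1 1
```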

Job Status Values

| Status | Meaning |
| --- | --- |
| pending | Job accepted, not yet started. |
| processing | URLs are being scraped. |
| completed | All URLs finished; the full data array is available. |
| failed | The batch could not be completed. |
| cancelled | Job was cancelled via DELETE. |

Examples

# Submit batch
curl -X POST https://api.webpeel.dev/v1/batch/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com",
      "https://httpbin.org/get",
      "https://news.ycombinator.com"
    ],
    "formats": ["markdown"],
    "concurrency": 3
  }'

# Poll for results
curl https://api.webpeel.dev/v1/batch/scrape/batch_01J8XKZP4T... \
  -H "Authorization: Bearer YOUR_API_KEY"

# Stream results as each URL completes
curl -X POST https://api.webpeel.dev/v1/batch/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "urls": ["https://example.com", "https://news.ycombinator.com"],
    "concurrency": 2
  }'

// Submit and poll
const res = await fetch('https://api.webpeel.dev/v1/batch/scrape', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.WEBPEEL_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    urls: ['https://example.com', 'https://httpbin.org/get'],
    formats: ['markdown'],
    concurrency: 5,
  }),
});
const { id } = await res.json();

// Poll until complete
let job;
do {
  await new Promise(r => setTimeout(r, 2000));
  const poll = await fetch(`https://api.webpeel.dev/v1/batch/scrape/${id}`, {
    headers: { 'Authorization': `Bearer ${process.env.WEBPEEL_API_KEY}` },
  });
  job = await poll.json();
} while (job.status === 'pending' || job.status === 'processing');

console.log(`Done: ${job.completed} pages fetched`);
console.log(job.data);
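Cancelling follows the same pattern via the DELETE endpoint. A small wrapper (a sketch; the error handling shown is an assumption, not a documented response shape) could look like:

```javascript
// Cancel a running batch job via DELETE /v1/batch/scrape/:id.
// Per the docs, this only succeeds while the job is pending or processing.
async function cancelBatch(id, apiKey, baseUrl = 'https://api.webpeel.dev') {
  const res = await fetch(`${baseUrl}/v1/batch/scrape/${id}`, {
    method: 'DELETE',
    headers: { 'Authorization': `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Cancel failed with status ${res.status}`);
  return res.json();
}

// Usage: await cancelBatch('batch_01J8XKZP4T...', process.env.WEBPEEL_API_KEY);
```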

Webhook Events

If you provide a webhook URL, WebPeel POSTs JSON events as the batch progresses:

| Event | Payload |
| --- | --- |
| started | { jobId, total } |
| page | { jobId, url, completed, total } |
| completed | { jobId, total, completed } |
| failed | { jobId, error } |
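A webhook receiver can dispatch on the event type. This sketch assumes the POSTed JSON carries an event field naming the event type (the payload field names follow the table above; the field carrying the event name itself is an assumption):

```javascript
// Dispatch an incoming webhook payload to a per-event handler.
// Assumes the POSTed JSON includes an `event` field naming the event type.
function handleWebhookEvent(payload, handlers) {
  const handler = handlers[payload.event];
  if (!handler) return false; // unknown event: ignore, return unhandled
  handler(payload);
  return true;
}

const seen = [];
const handlers = {
  page: ({ url, completed, total }) => seen.push(`${completed}/${total} ${url}`),
  completed: ({ jobId }) => seen.push(`done ${jobId}`),
};

handleWebhookEvent({ event: 'page', jobId: 'b1', url: 'https://example.com', completed: 1, total: 2 }, handlers);
handleWebhookEvent({ event: 'completed', jobId: 'b1', total: 2, completed: 2 }, handlers);
console.log(seen); // [ '1/2 https://example.com', 'done b1' ]
```

Returning false for unknown events lets the receiver acknowledge deliveries it does not care about without erroring.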

Limits