Batch Scraping
Scrape up to 100 URLs concurrently in a single API call. Supports real-time SSE streaming for live results, async job polling, and webhook callbacks.
Two delivery modes: send `Accept: text/event-stream` for real-time SSE streaming as each URL completes, or use the default async mode and poll `GET /v1/batch/scrape/:id` for the full batch result.
Endpoints
POST /v1/batch/scrape (Auth Required)
Submit a batch of URLs for concurrent scraping. Returns a job ID immediately; results are delivered via SSE or polling.
GET /v1/batch/scrape/:id (Auth Required)
Poll a batch job for status and results. Returns progress, completed count, and the full data array when done.
DELETE /v1/batch/scrape/:id (Auth Required)
Cancel a running batch job. Only works on jobs that are pending or processing.
POST /v1/batch/scrape
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | string[] | Required | Array of URLs to scrape. Maximum 100 URLs per batch. |
| formats | string[] | Optional | Output formats. Default: ["markdown"]. Options: markdown, html, text. |
| concurrency | number | Optional | Max concurrent fetches. Default: 5. Range: 1–10. |
| extract | object | Optional | Structured extraction options. See Structured Extraction. |
| maxTokens | number | Optional | Truncate content to this many tokens per URL. |
| webhook | string | Optional | URL to receive webhook events: started, page, completed, failed. |
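Putting these parameters together, a full request body might look like the following (the webhook URL is a placeholder, not a real endpoint):

```json
{
  "urls": ["https://example.com", "https://httpbin.org/get"],
  "formats": ["markdown", "html"],
  "concurrency": 5,
  "maxTokens": 4000,
  "webhook": "https://yourapp.example/webhooks/webpeel"
}
```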
Response (Async Mode — default)
Returns 202 Accepted immediately with a job ID:
```json
{
  "success": true,
  "id": "batch_01J8XKZP4T...",
  "url": "/v1/batch/scrape/batch_01J8XKZP4T..."
}
```
Response (SSE Mode)
Send `Accept: text/event-stream` to receive results as each URL completes:
```
event: started
data: {"batchId":"batch_01J8XKZP4T...","totalUrls":3}

event: result
data: {"url":"https://example.com","content":"# Example\n...","index":0}

event: error
data: {"url":"https://broken.url","error":"FETCH_ERROR","message":"...","index":1}

event: done
data: {"batchId":"batch_01J8XKZP4T...","completed":2,"failed":1,"duration":4521}
```
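The stream above can be consumed with the Fetch API and a `TextDecoder`. This is a sketch assuming Node 18+ (or a browser) with global `fetch`; `parseSSEBlock` only handles the `event:`/`data:` fields shown above, not the full SSE grammar:

```javascript
// Turn one "event: ...\ndata: ..." block into { event, data }.
function parseSSEBlock(block) {
  const msg = {};
  for (const line of block.split('\n')) {
    if (line.startsWith('event: ')) msg.event = line.slice(7).trim();
    else if (line.startsWith('data: ')) msg.data = JSON.parse(line.slice(6));
  }
  return msg;
}

async function streamBatch(apiKey, urls) {
  const res = await fetch('https://api.webpeel.dev/v1/batch/scrape', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
      'Accept': 'text/event-stream',
    },
    body: JSON.stringify({ urls }),
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE messages are separated by a blank line.
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop(); // keep any partial block for the next chunk
    for (const block of blocks) {
      const { event, data } = parseSSEBlock(block);
      if (event === 'result') console.log(`done: ${data.url}`);
      if (event === 'error') console.error(`failed: ${data.url}`);
    }
  }
}
```

Buffering on the blank-line separator matters because a network chunk can end mid-message.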
GET /v1/batch/scrape/:id
Response
```json
{
  "success": true,
  "status": "completed",
  "total": 3,
  "completed": 3,
  "creditsUsed": 3,
  "data": [
    {
      "url": "https://example.com",
      "content": "# Example Domain\nThis domain is for illustrative examples...",
      "metadata": { "title": "Example Domain", "description": "..." }
    },
    {
      "url": "https://broken.url",
      "error": "fetch failed"
    }
  ],
  "expiresAt": "2024-03-05T12:00:00.000Z"
}
```
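Because a completed job can mix successes and failures in the same `data` array, a small client-side helper (a sketch, not part of the API) can separate them; failed entries carry an `error` field, as in the response above:

```javascript
// Split a finished job's data array into successes and failures.
function splitResults(job) {
  const ok = job.data.filter((item) => !item.error);
  const failed = job.data.filter((item) => item.error);
  return { ok, failed };
}
```

The `failed` URLs can then be retried in a follow-up batch.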
Job Status Values
- `pending`: job created, not yet started
- `processing`: URLs are being fetched
- `completed`: all URLs processed (some may have errored)
- `failed`: the job itself failed (infrastructure error)
- `cancelled`: cancelled via DELETE
Examples
```shell
# Submit batch
curl -X POST https://api.webpeel.dev/v1/batch/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com",
      "https://httpbin.org/get",
      "https://news.ycombinator.com"
    ],
    "formats": ["markdown"],
    "concurrency": 3
  }'
```

```shell
# Poll for results
curl https://api.webpeel.dev/v1/batch/scrape/batch_01J8XKZP4T... \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```shell
# Stream results as each URL completes
curl -X POST https://api.webpeel.dev/v1/batch/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "urls": ["https://example.com", "https://news.ycombinator.com"],
    "concurrency": 2
  }'
```
```javascript
// Submit and poll
const res = await fetch('https://api.webpeel.dev/v1/batch/scrape', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.WEBPEEL_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    urls: ['https://example.com', 'https://httpbin.org/get'],
    formats: ['markdown'],
    concurrency: 5,
  }),
});
const { id } = await res.json();

// Poll until complete
let job;
do {
  await new Promise(r => setTimeout(r, 2000));
  const poll = await fetch(`https://api.webpeel.dev/v1/batch/scrape/${id}`, {
    headers: { 'Authorization': `Bearer ${process.env.WEBPEEL_API_KEY}` },
  });
  job = await poll.json();
} while (job.status === 'pending' || job.status === 'processing');

console.log(`Done: ${job.completed} pages fetched`);
console.log(job.data);
```
Webhook Events
If you provide a webhook URL, WebPeel POSTs JSON events as the batch progresses:
| Event | Payload |
|---|---|
| started | `{ jobId, total }` |
| page | `{ jobId, url, completed, total }` |
| completed | `{ jobId, total, completed }` |
| failed | `{ jobId, error }` |
Limits
- Max URLs per batch: 100
- Max concurrency: 10 (default: 5)
- Credits: 1 credit per URL successfully fetched
- Job TTL: Results expire after 24 hours
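Given the 100-URL cap, a list larger than one batch has to be split client-side before submission; a minimal sketch:

```javascript
// Split a URL list into batches that respect the 100-URL limit.
function toBatches(urls, size = 100) {
  const batches = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(urls.slice(i, i + size));
  }
  return batches;
}
```

Each resulting batch then gets its own `POST /v1/batch/scrape` call; note that per-batch `concurrency` applies independently to each job.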