This page documents the exact methodology behind the performance numbers shown on webpeel.dev — test setup, domain selection, metric definitions, and comparison approach.
Last updated: February 2026 · Tested across 512 URLs · Re-run quarterly
All tests were run from a single AWS t3.medium instance in us-east-1. Each extraction attempt used the WebPeel API endpoint at api.webpeel.dev — the same infrastructure all users access. No special provisioning or cache warming was performed before testing.
Each URL was fetched 5 times in sequence with a 10-second pause between attempts, and the median result across the 5 runs was recorded. An individual attempt was counted as a failure on timeout (> 30s), HTTP 5xx, or an empty response body. A single failed run out of 5 did not constitute a domain failure; majority (≥3/5) success was required for the domain to be marked successful.
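The per-run failure criteria and the ≥3/5 majority rule can be sketched as follows. This is an illustrative sketch, not the actual benchmark script; `classify_run` and `domain_result` are hypothetical helper names:

```python
import statistics

def classify_run(status: int, elapsed_s: float, body: str) -> bool:
    """A single run fails on timeout (> 30 s), HTTP 5xx, or an empty response body."""
    return elapsed_s <= 30 and status < 500 and bool(body.strip())

def domain_result(runs: list[dict]) -> dict:
    """Apply the >=3/5 majority rule and record the median latency across the 5 runs."""
    ok = [classify_run(r["status"], r["elapsed_s"], r["body"]) for r in runs]
    return {
        "success": sum(ok) >= 3,  # a single failed run does not fail the domain
        "median_latency_s": statistics.median(r["elapsed_s"] for r in runs),
    }
```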
Testing was conducted over a 72-hour window in February 2026. Domains were retested at three different times of day (08:00, 14:00, 22:00 UTC) to account for traffic-based anti-bot variations.
Content completeness measures what percentage of the main article or product body text is returned in the extraction output. It is evaluated by comparing the extracted token count against a manually verified "ground truth" extraction for each test URL.
We define completeness as: (extracted_tokens / expected_tokens) × 100, capped at 100%. Navigation menus, cookie banners, footers, and advertisement blocks are excluded from expected content. Scores below 60% were counted as "incomplete." The 98% figure reflects the average completeness score across all successfully fetched domains.
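The formula above, as a minimal sketch (function names are illustrative, not the scoring code itself):

```python
def completeness(extracted_tokens: int, expected_tokens: int) -> float:
    """(extracted_tokens / expected_tokens) * 100, capped at 100%."""
    if expected_tokens == 0:
        return 0.0
    return min(100.0, extracted_tokens / expected_tokens * 100.0)

def is_incomplete(score: float) -> bool:
    """Scores below 60% are counted as incomplete."""
    return score < 60.0
```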
Protected-site success rate is the percentage of fetch attempts that returned usable, non-blocked content on sites with active bot protection. A "protected" site is defined as any domain using Cloudflare Bot Management, Akamai, PerimeterX, Datadome, or a similar CAPTCHA/JS-challenge system, verified by checking response headers and challenge-page fingerprints.
82 of the 512 test URLs were classified as protected. Of these, WebPeel successfully extracted content from 80 (97.6%). The 2 failures were sites using aggressive fingerprinting that required browser-resident sessions not replicated by our headless environment.
Latency is measured server-side, from API request received to the first byte of the structured response; it excludes the network round-trip from client to server. The p50 (median) across all successful extractions was 396ms. The 650ms figure is our p75 — i.e., 75% of extractions complete in under 650ms. The p99 is approximately 4.2 seconds (browser-rendered pages with heavy JS).
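For reference, a nearest-rank percentile over latency samples looks like this. This is one common convention; the actual aggregation used for the published numbers may interpolate differently:

```python
import math

def percentile(sorted_ms: list[float], p: float) -> float:
    """Nearest-rank percentile of pre-sorted samples, for 0 < p <= 100."""
    k = max(0, math.ceil(p / 100 * len(sorted_ms)) - 1)
    return sorted_ms[k]
```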
Uptime is measured over a 90-day rolling window using external synthetic monitoring (UptimeRobot). The API endpoint https://api.webpeel.dev/health is checked every 60 seconds from 3 geographic regions, and downtime is counted as any period where ≥2 of the 3 regions report failure. The 99.9% SLA represents our target and historical average; exact current uptime is shown on our status page.
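The ≥2-of-3 downtime rule can be expressed as a short sketch (illustrative names only, not the monitoring code):

```python
def is_down(region_ok: list[bool]) -> bool:
    """One monitoring interval counts as downtime when >=2 of the 3 regions fail."""
    return sum(not ok for ok in region_ok) >= 2

def uptime_pct(intervals: list[list[bool]]) -> float:
    """Share of monitoring intervals that were not downtime, as a percentage."""
    down = sum(is_down(interval) for interval in intervals)
    return 100.0 * (1.0 - down / len(intervals))
```

Note that a single-region failure (e.g. one probe's network blip) does not count against uptime under this rule.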
The test set comprises 512 URLs across 32 domain categories, selected to represent a realistic distribution of content types encountered by AI agents in production: news, e-commerce, documentation, paywalled content, and protected enterprise sites.
Disclosure: We built WebPeel. These benchmarks were run by us, not an independent third party. We have documented our methodology in full so you can reproduce the results. The test scripts are available in the benchmarks/ directory of our open-source repository.
Firecrawl was tested using their hosted API (api.firecrawl.dev) on a paid plan. The same 512 URLs were submitted to POST /v1/scrape with formats: ["markdown"] — their recommended extraction endpoint. API calls were kept within their documented rate limits, and all tests were run with a valid Firecrawl API key during February 2026.
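The per-URL request used in the Firecrawl runs looks roughly like this. Only the endpoint path and the `url`/`formats` body fields come from the text above; the bearer-token auth header and the helper name are assumptions, so check Firecrawl's API reference before reusing this:

```python
def firecrawl_request(url: str, api_key: str) -> dict:
    """Request spec for POST /v1/scrape with markdown output (auth format assumed)."""
    return {
        "method": "POST",
        "endpoint": "https://api.firecrawl.dev/v1/scrape",
        "headers": {
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        "json": {"url": url, "formats": ["markdown"]},
    }
```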
| Tool | Content Completeness | Protected Site Success | Latency (p50) |
|---|---|---|---|
| WebPeel | 98% | 97.6% | 396ms |
| Firecrawl | 82% | 65% | 1,240ms |
| Raw HTTP fetch | 56% | 35% | 180ms |
"Raw HTTP fetch" refers to a simple fetch(url) call with a standard browser User-Agent and no special headers or rendering: the baseline of what you'd get from curl or a naive Node.js/Python request, without bot mitigation, JavaScript rendering, or readability processing. It is included as a baseline, not as a competitive comparison.
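A minimal version of that baseline fetch, using only Python's standard library. The User-Agent string here is a representative example, not the exact one used in testing:

```python
import urllib.request

# Representative browser User-Agent (example value, not the one used in the benchmark).
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0 Safari/537.36")

def raw_fetch(url: str, timeout: float = 30.0) -> tuple[int, str]:
    """Single GET with a browser User-Agent; no JS rendering, retries, or readability step."""
    req = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```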
For each URL, a "ground truth" extraction was created by manually identifying the main body content (article text, product description, documentation body) and recording its token count. Each tool's extraction was then scored by comparing its output token count against ground truth — navigation menus, ads, footers, and sidebars were excluded from scoring. Scores were normalized 0–100%.
A domain was classified as "protected" if its HTTP response headers included evidence of Cloudflare Bot Management, Akamai Bot Manager, PerimeterX, Datadome, or similar systems — or if it returned a CAPTCHA/JS challenge page on first request with a plain User-Agent.
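A sketch of that classification step. The header names and challenge-page markers below are illustrative examples only, not the actual fingerprint list used in testing:

```python
# Illustrative vendor fingerprints (examples only; the real test used a larger list).
HEADER_HINTS = {"cf-ray": "Cloudflare", "x-datadome": "Datadome"}
CHALLENGE_MARKERS = ("captcha", "challenge-platform")

def classify_protected(headers: dict[str, str], body: str) -> bool:
    """Protected if known anti-bot headers are present, or the body is a challenge page."""
    hdrs = {k.lower() for k in headers}
    if any(hint in hdrs for hint in HEADER_HINTS):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```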
These results represent a point-in-time measurement. Anti-bot systems update continuously, and success rates on specific domains may have changed since this test was run. We re-run benchmarks quarterly and update this page accordingly.
Firecrawl's performance may vary based on plan tier, concurrency, and their ongoing infrastructure improvements. Our results reflect a single test period and are not an authoritative or permanent characterization of their product.
Content completeness is inherently subjective — different use cases value different types of content. Our methodology prioritizes main body text as an approximation for AI agent use cases.
Questions about this methodology? Contact us or open an issue on GitHub.