Benchmark Methodology

How we measure extraction performance

This page documents the exact methodology behind the performance numbers shown on webpeel.dev — test setup, domain selection, metric definitions, and comparison approach.

Last updated: February 2026 · Tested across 512 URLs · Re-run quarterly

Results Summary

99.9% Uptime SLA
97.6% Protected Site Success
98% Content Completeness
<650ms Latency (p75)

Test Setup

Infrastructure

All tests were run from a single AWS t3.medium instance in us-east-1. Each extraction attempt used the WebPeel API endpoint at api.webpeel.dev — the same infrastructure all users access. No special provisioning or cache warming was performed before testing.

Test Execution

Each URL was fetched 5 times in sequence with a 10-second pause between attempts. The median result across 5 runs was recorded. Failed attempts (timeout > 30s, HTTP 5xx, or empty response body) were counted as failures. A single failed run in 5 did not constitute a domain failure — majority (≥3/5) success was required for the domain to be marked successful.
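The per-domain loop above can be sketched as follows. This is a minimal illustration of the majority-success rule, not the actual benchmark harness; the `fetch` callable, `Trial` record, and function names are hypothetical, and the 10-second inter-attempt pause is omitted.

```python
from __future__ import annotations
import statistics
from typing import Callable, NamedTuple, Optional

class Trial(NamedTuple):
    ok: bool            # False on timeout > 30s, HTTP 5xx, or empty body
    latency_ms: float

def run_domain(fetch: Callable[[str], Trial], url: str,
               attempts: int = 5) -> tuple[bool, Optional[float]]:
    """Fetch `url` `attempts` times in sequence. The domain passes only
    if a majority (>= 3/5) of runs succeed; the recorded latency is the
    median of the successful runs."""
    trials = [fetch(url) for _ in range(attempts)]
    successes = [t for t in trials if t.ok]
    passed = len(successes) >= (attempts // 2 + 1)
    median = statistics.median(t.latency_ms for t in successes) if passed else None
    return passed, median
```

With this shape, a single failed run out of 5 still yields a passing domain, matching the ≥3/5 rule described above.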

Time Period

Testing was conducted over a 72-hour window in February 2026. Domains were retested at three different times of day (08:00, 14:00, 22:00 UTC) to account for traffic-based anti-bot variations.


Metric Definitions

Content Completeness (98%)

Measures what percentage of the main article/product body text is returned in the extraction output, relative to the full page source. Evaluated by comparing extracted token count against a manually verified "ground truth" extraction for each test URL.

We define completeness as: (extracted_tokens / expected_tokens) × 100, capped at 100%. Navigation menus, cookie banners, footers, and advertisement blocks are excluded from expected content. Scores below 60% were counted as "incomplete." The 98% figure reflects the average completeness score across all successfully fetched domains.
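The formula above is straightforward to express directly. A minimal sketch (function names are illustrative):

```python
def completeness(extracted_tokens: int, expected_tokens: int) -> float:
    """(extracted_tokens / expected_tokens) * 100, capped at 100%.
    Expected counts exclude navigation, cookie banners, footers, and ads."""
    return min(extracted_tokens / expected_tokens * 100.0, 100.0)

def is_incomplete(score: float) -> bool:
    # Scores below 60% are counted as "incomplete"
    return score < 60.0
```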

Protected Site Success Rate (97.6%)

The percentage of fetch attempts that returned usable, non-blocked content on sites with active bot protection. A "protected" site is defined as any domain using Cloudflare Bot Management, Akamai, PerimeterX, Datadome, or similar CAPTCHA/JS challenge systems — verified by checking response headers and challenge page fingerprints.

82 of the 512 test URLs were classified as protected. Of these, WebPeel successfully extracted content from 80 (97.6%). The 2 failures were sites using aggressive fingerprinting that required browser-resident sessions not replicated by our headless environment.

Latency (<650ms p75)

Time from when the API request is received to the first byte of the structured response, measured server-side; the network round-trip from client to server is excluded. The p50 (median) across all successful extractions was 396ms. The 650ms figure is our p75 — i.e., 75% of extractions complete in under 650ms. p99 is approximately 4.2 seconds (browser-rendered pages with heavy JS).
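The percentile figures can be recovered from the raw per-request latencies with a nearest-rank computation; a minimal sketch (not the exact statistics code used for the benchmark):

```python
import math

def percentile(latencies_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample value such that at
    least p% of all samples are <= it."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p / 100.0 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]
```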

Uptime SLA (99.9%)

Measured over a 90-day rolling window using external synthetic monitoring (UptimeRobot, 1-minute intervals). The API endpoint https://api.webpeel.dev/health is checked every 60 seconds from 3 geographic regions. Downtime is counted as any period where ≥2 of 3 regions report failure. The 99.9% SLA represents our target and historical average — exact current uptime is shown on our status page.
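The 2-of-3 region quorum can be expressed as a simple per-minute check. A hypothetical sketch, assuming each entry is one synthetic-monitoring sample per region:

```python
def uptime_percent(minutes: list[tuple[bool, bool, bool]]) -> float:
    """Each entry is one 1-minute check from the 3 regions (True = up).
    A minute counts as downtime when >= 2 of 3 regions report failure."""
    down = sum(1 for checks in minutes
               if sum(not up for up in checks) >= 2)
    return (1 - down / len(minutes)) * 100.0
```

A single region reporting failure (e.g. a local network blip) does not count against uptime under this rule; only a two-region quorum does.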


Test Domain List

512 URLs across 8 domain categories. Domains were selected to represent a realistic distribution of content types encountered by AI agents in production — news, e-commerce, documentation, paywalled content, and protected enterprise sites.

News & Media (64 URLs)
nytimes.com theverge.com wired.com techcrunch.com bbc.com reuters.com apnews.com washingtonpost.com ft.com bloomberg.com arstechnica.com engadget.com venturebeat.com thenextweb.com
Developer Docs & Technical (80 URLs)
docs.python.org developer.mozilla.org docs.anthropic.com platform.openai.com docs.github.com stackoverflow.com npmjs.com pypi.org readthedocs.io github.com/* huggingface.co arxiv.org kubernetes.io vercel.com/docs
E-commerce (80 URLs)
amazon.com shopify.com etsy.com ebay.com bestbuy.com target.com walmart.com apple.com/shop wayfair.com newegg.com
Cloudflare / Bot-Protected (82 URLs)
stripe.com cloudflare.com discord.com notion.so figma.com linear.app vercel.com openai.com anthropic.com x.com linkedin.com glassdoor.com
Knowledge / Encyclopedia (48 URLs)
en.wikipedia.org britannica.com wikihow.com investopedia.com healthline.com webmd.com
SaaS / Enterprise (80 URLs)
salesforce.com hubspot.com zendesk.com atlassian.com datadog.com segment.com twilio.com sendgrid.com intercom.com mixpanel.com
Social / Community (48 URLs)
news.ycombinator.com reddit.com dev.to medium.com substack.com hashnode.com
Video / Rich Media (30 URLs)
youtube.com vimeo.com twitch.tv

Comparison Methodology

Disclosure: We built WebPeel. These benchmarks were run by us, not an independent third party. We have documented our methodology in full so you can reproduce the results. The test scripts are available in the benchmarks/ directory of our open-source repository.

Firecrawl

Firecrawl was tested using their hosted API (api.firecrawl.dev) on a paid plan. The same 512 URLs were submitted to POST /v1/scrape with formats: ["markdown"] — their recommended extraction endpoint. API calls were throttled to stay within their documented rate limits. Tests were run with a valid Firecrawl API key during February 2026.
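The request shape used for the Firecrawl runs can be sketched as below. This is an illustration only: the endpoint and `formats: ["markdown"]` body come from the description above, while the Bearer auth header is our assumption based on their documentation — consult Firecrawl's API reference for the authoritative format.

```python
import json
import urllib.request

def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """POST /v1/scrape with formats: ["markdown"] (Bearer auth assumed)."""
    payload = json.dumps({"url": url, "formats": ["markdown"]}).encode()
    return urllib.request.Request(
        "https://api.firecrawl.dev/v1/scrape",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```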

Tool           | Content Completeness | Protected Site Success | Avg Latency (p50)
WebPeel        | 98%                  | 97.6%                  | 396ms
Firecrawl      | 82%                  | 65%                    | 1,240ms
Raw HTTP fetch | 56%                  | 35%                    | 180ms

Raw HTTP Fetch

"Raw HTTP fetch" refers to a simple fetch(url) call with a standard browser User-Agent and no special headers or rendering. This represents the baseline of what you'd get using curl or a naive Node.js/Python request, without any bot mitigation, JavaScript rendering, or readability processing. It is included as a baseline, not as a competitive comparison.

Content Completeness Scoring

For each URL, a "ground truth" extraction was created by manually identifying the main body content (article text, product description, documentation body) and recording its token count. Each tool's extraction was then scored by comparing its output token count against ground truth — navigation menus, ads, footers, and sidebars were excluded from scoring. Scores were normalized 0–100%.

Protected Site Classification

A domain was classified as "protected" if its HTTP response headers included evidence of Cloudflare Bot Management, Akamai mPulse, PerimeterX, Datadome, or similar systems — or if it returned a CAPTCHA/JS challenge page on first request with a plain User-Agent.
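The classification rule can be sketched as a heuristic over headers and body text. The specific signatures below (e.g. the `cf-ray` header for Cloudflare, `x-datadome`-prefixed headers for Datadome) are illustrative examples, not the exhaustive fingerprint list used in the benchmark:

```python
def is_protected(headers: dict[str, str], body: str) -> bool:
    """Heuristic: flag a response as bot-protected if it carries known
    protection-vendor headers or a CAPTCHA/JS challenge page body."""
    h = {k.lower(): v.lower() for k, v in headers.items()}
    header_signals = (
        "cf-ray" in h,                           # Cloudflare
        "cloudflare" in h.get("server", ""),
        any(k.startswith("x-datadome") for k in h),  # Datadome (example)
        any(k.startswith("x-px") for k in h),        # PerimeterX (example)
    )
    body_signals = (
        "checking your browser" in body.lower(),     # JS challenge page
        "captcha" in body.lower(),
    )
    return any(header_signals) or any(body_signals)
```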


Limitations & Caveats

These results represent a point-in-time measurement. Anti-bot systems update continuously, and success rates on specific domains may have changed since this test was run. We re-run benchmarks quarterly and update this page accordingly.

Firecrawl's performance may vary based on plan tier, concurrency, and their ongoing infrastructure improvements. Our results reflect a single test period and are not an authoritative or permanent characterization of their product.

Content completeness is inherently subjective — different use cases value different types of content. Our methodology prioritizes main body text as an approximation for AI agent use cases.


Questions about this methodology? Contact us or open an issue on GitHub.