Changelog — WebPeel

v0.21.39

Latest

March 16, 2026

Self-hosted AI research: WebPeel AI on Hetzner (Ollama + Qwen3 0.6B) — free-tier users get AI-powered research synthesis with zero configuration
SearXNG search integration: Self-hosted meta-search engine as Stage 0 provider — aggregates Google, Bing, Brave, Startpage, and more via residential IP
New /v1/research endpoint: Lightweight research that chains search → fetch → compile with self-hosted LLM
Search enrichment: New ?enrich=N parameter fetches top N search result pages server-side with 4s timeout
Domain hints for extraction: Structured extraction now overlays domain-specific hints for better field matching
Enterprise tiers: Admin burst bypass, enterprise tier (50K weekly / 2K burst / 500 rate limits)
Search reliability: Proxy-routed DDG search, provider lockout fix with decay stats, reverse search order (direct-first)
Research LLM optimization: Compact prompts, think:false mode — synthesis from 26s down to 8.5s
OOM prevention: Guards against out-of-memory in search enrichment and browser escalation

v0.21.16

March 15, 2026

Structured extraction improvements: Version, downloads, and population patterns; better field:value separator matching; NPM extractor now reliably extracts version, weekly_downloads, and license
GitHub token refresh: Refreshed expired GITHUB_TOKEN on Render — resolves "Bad credentials" errors

v0.21.15

March 15, 2026

Token savings fix: tokenSavingsPercent now shows realistic values for domain extractors (was always 0); estimates based on 7× content ratio
HN comment items: HackerNews comment pages now return proper titles ("Comment on: {story}")
Pipeline improvements: Reliability improvements across the extraction pipeline

v0.21.14

March 15, 2026

GitHub API auth: GitHub extractor now uses GITHUB_TOKEN env var — raises rate limit from 60/hr to 5,000/hr; fixes low token counts on Render's shared IPs

v0.21.13

March 15, 2026

PyPI title fix: PyPI extractor now returns a proper title field (name version format)
Env var restore: Restored all required Render env vars that were accidentally wiped (fixes deploys since v0.21.5)

v0.21.12

March 15, 2026

Structured extraction polish: Emoji stripping from extractor output, new director field for movies, more accurate confidence scoring (0–1)
Content quality improvements: Richer domain extractors with smarter fallbacks; actionable error messages with docs links

v0.21.11

March 15, 2026

PeelTLS H2 fix: Fixed double-decompression bug on HTTP/2 responses; added forceHttp1 option
Search relevance scoring: relevanceScore (0–1) now returned on search results — BM25 score against query
CLI progress indicators: Better feedback for long-running fetches

v0.21.9–10

March 15, 2026

Complete anti-bot stealth upgrade: 9 independent patches to TLS fingerprinting, header ordering, and timing randomization
Full test verification pass: All tests run and fixed after stealth upgrade

v0.21.8

March 15, 2026

Proxy routing for search and browser: --proxy now applies to search providers and headless browser rendering
Shared proxy-config module: Centralized proxy parsing across all transport layers

v0.21.7

March 14, 2026

Smart heuristic extraction: Page-type-specific heuristics for product, article, recipe, job listing, and more
Improved error classification: Errors now categorized as blocked, timeout, parse_error, or network_error with actionable hints

v0.21.6

March 14, 2026

Dashboard activity log: Color-coded latency (green <3s, yellow 3–8s, red >8s) with "Normal for web scraping" hint
Docs visual upgrade: Improved code blocks, better mobile layout, fixed sidebar states

v0.21.5

March 14, 2026

Structured JSON extraction (POST /v1/extract): LLM-powered extraction with JSON Schema or natural language prompts. BYOK — bring your own OpenAI-compatible key
Heuristic auto-extraction (GET /v1/extract/auto): Zero-LLM extraction with page-type detection. Returns confidence score (0–1)
Documentation upgrade: Extract API fully documented

v0.21.0

March 13, 2026

Transcript export (GET /v1/transcript/export): Download YouTube transcripts as SRT, TXT, Markdown, or JSON
API key scoping: Create scoped API keys with per-endpoint permissions
Shareable playground links: Share a playground request via URL

v0.18.5

March 6, 2026

Usage counter fix: Dashboard now correctly counts playground/session fetches (was showing 0)
Product Hunt extractor: RSS-based extraction returning 2,000+ tokens (was 101 via search snippet)
Substack handler: Helpful guidance for root URL, individual posts extract cleanly
Plan badge fix: Correctly capitalized everywhere (Free Plan, Pro Plan, Max Plan)
CLI webpeel auth command: Set and verify API key in one step
CLI webpeel status command: Machine-readable auth + usage check for AI agents
apiKey settable via config: webpeel config set apiKey <key> now works
E2E test suite expanded: 17 tests including 8 trending sites

v0.18.4

March 6, 2026

Session recovery after API deploy restarts (email-based fallback)
Stripe Customer Portal: POST /v1/billing/portal endpoint
Email alerts: Nodemailer (free SMTP) replacing Resend

v0.18.3

March 6, 2026

Stripe billing portal backend endpoint
Replaced Resend with Nodemailer for free email alerts
Dashboard billing page now calls real Stripe portal API

v0.18.2

March 6, 2026

WebPeel class: new WebPeel({ apiKey }) OOP wrapper with fetch, search, crawl, map, extract methods
MCP parity fix: Hosted and local MCP servers now match (6 tools each)
Extractive summarization: TF-IDF sentence scoring instead of naive word truncation
4 new doc pages: Getting Started, Authentication, Fetch, Screenshot
Changelog page created: v0.18.0 and v0.18.1 entries
Dashboard dark mode audit fixes: Billing page, onboarding banner, Getting Started tracking
API fixes: Watch interval aliases, usage API key auth, CORS credentials conditional, generic 404, health no version
Landing page: Dead Discord link → support email, upgrade buttons → billing page, pricing card accuracy

v0.18.1

March 2026

NEW: Real-time email alerts when API usage crosses configurable threshold
NEW: Token count now stored and displayed in Activity log for every request
NEW: Usage alert threshold setting in Dashboard Settings
NEW: Alert preferences API endpoints (GET/PUT /v1/user/alert-preferences)
FIX: Playground stats row now correctly shows tokens and savings from API response

v0.18.0

March 2026

NEW: MCP server consolidated from 20 tools to 7 clean public tools
NEW: webpeel, webpeel_read, webpeel_see, webpeel_find, webpeel_extract, webpeel_monitor, webpeel_act
NEW: Smart NL intent parser — describe your task in plain English, MCP routes automatically
NEW: question= param on /v1/fetch — ask a question, get an AI answer about the page
NEW: summary= param on /v1/fetch — get a concise extractive summary
NEW: POST /v1/screenshot dispatch — mode param selects basic, browser, or design-analysis
DEPRECATED: POST /v1/agent → now returns 410 Gone (use /v1/fetch instead)
All 20 legacy MCP tool names still routed for backward compatibility

v0.17.16

February 28, 2026

31 domain extractors — Added 20 new extractors: Amazon, Medium, Substack, AllRecipes, IMDB, LinkedIn, PyPI, Dev.to, Craigslist, NYTimes, BBC, CNN, Spotify, TikTok, Pinterest, and more
Interactive demo — Landing page now includes pre-baked demo results for HN, Wikipedia, YouTube, and example.com
Loading skeletons — Dashboard pages show pulse animation skeletons while data loads
Google OAuth documentation — Added AUTH_TRUST_HOST=true fix and redirect URI setup guide
CI fix — Screenshot test updated for test user credentials

v0.17.15

February 28, 2026

Honest landing page stats — Replaced inflated claims with real benchmarks: 97.6% success rate, 200ms average response time, 41 URLs tested
21 domain extractors — Expanded from 11 to 21 with structured data for more sites
Site logo updated — Diamond symbol replaced with paper/peel SVG icon matching the dashboard
Screenshot fixes — Improved full-page screenshot reliability

v0.17.14

February 28, 2026

Search fallback chain fixed — DuckDuckGo blocks Render's cloud IP at TCP level (ConnectTimeoutError). Added ddgConnectionBlocked detection to skip DDG and go straight to Bing+Ecosia via StealthSearchProvider
Bing URL decoding — All bing.com/ck/a redirect URLs now properly base64-decoded to real URLs instead of deduplicating to 1 result
Warm browser pool — Module-level cached browser via getStealthBrowser() + browser.newContext() per request (~50ms vs 1-3s cold start). Search dropped from 27s to 4.4s
YouTube deep understanding — getYouTubeTranscript() returns chapters, key points, summary, word count, description, and publish date. Chapter parsing from description timestamps
?detail=brief mode — Truncates to 500 words, extracts TL;DR, adds X-Detail-Mode and X-Token-Estimate headers

v0.17.13

February 28, 2026

Firefox search fallback — Added Firefox-based search fallback for environments where DuckDuckGo blocks cloud IPs

v0.17.12

February 28, 2026

Search diagnostics — Added search fallback logging to diagnose cloud IP blocking issues

v0.17.11

February 28, 2026

Search results parsing fix — API returns data.web but dashboard expected results. Fixed the response format mismatch

v0.17.10

February 28, 2026

JWT auth on ALL endpoints — 6 endpoints fixed: search, MCP, quick-answer, deep-fetch, answer, YouTube. Previously only fetch/screenshot required auth
Playground tabs — Added Search and Screenshot tabs alongside existing Fetch tab (986 lines of new UI)
Markdown pruning — content-pruner.ts strips "Load More" buttons, bare image URLs, and consecutive horizontal rules from extracted content

v0.17.9

February 28, 2026

Auth required for all endpoints — /v1/fetch, /v1/screenshot, /v1/crawl now require an API key or session. Free tier still gets 500/week but must register.
Usage tracking fixed — All authenticated requests now appear in dashboard stats. Previously JWT session requests were silently untracked.
Playground expanded — Search and Screenshot modes added alongside Fetch.

v0.17.8

February 28, 2026

YouTube transcripts — Full caption extraction wired in. Was returning 18 words (just title), now returns 270–500+ words of actual transcript content.
Dashboard stats live — Usage logs now capture both JWT and API key requests. Activity and stats pages show real data.
Markdown cleanup — Strips "Load More", bare image URLs, and nav elements from extracted content.
Favicon — Replaced Vercel triangle with WebPeel blurple logo.

v0.17.7

February 28, 2026

API key expiration — Anthropic-style: Never, 7d, 30d, 90d, 1y options. Expired keys return 401 with key_expired error code.
Dashboard UX overhaul — 15 improvements: billing crash fix, dismissable banners, inline key rename, key prefix copy, zero-state formatting, playground auto-fill, toggle labels.
PATCH /v1/keys/:id — Rename API keys without recreating them.
JWT extended to 7 days — Dashboard sessions no longer break after 1 hour.

v0.17.6

February 27, 2026

Login 500 fix — Credentials login was crashing on refresh token UUID/TEXT mismatch.
Usage history endpoint — GET /v1/usage/history?days=7 returns daily usage for the past week.
Rate limit headers — All API responses include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
OpenAPI /openapi.json — 301 redirect to /openapi.yaml for compatibility.
status.webpeel.dev removed — All links updated to webpeel.dev/status.

v0.17.5

February 27, 2026

OpenAPI spec: Fixed /openapi.yaml and /openapi.json (301 redirect) — now correctly served from Docker deployments.
Rate limit headers: API responses now include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset.
Status links: Replaced dead status.webpeel.dev subdomain with webpeel.dev/status.
Build fix: npm run build now copies openapi.yaml to dist/server/ so it's included in the npm package.

v0.17.4

February 27, 2026

CI build fixed: Resolved TypeScript Cannot find module errors for stealth/search/cloak imports — CI now passes.
Changelog: Added missing v0.17.2 and v0.17.3 release notes.
Navigation: Added Pricing link to site navigation.
Stealth label: Updated pricing table label from "Anti-bot bypass" to "Handles JavaScript-heavy sites".

v0.17.3

February 27, 2026

Auth fix: Fixed email/password login (was returning 500 due to password hash comparison error).
OAuth fix: Fixed OAuth dashboard connection flow.
Usage history: Added /v1/usage/history endpoint for dashboard charts.
SWR retries: Limited SWR retries to prevent thundering herd on API errors.
Refresh token: Graceful refresh token fallback when tokens expire.

v0.17.2

February 27, 2026

Version bump: Version bump for Docker deployment.
CLI bundle: CLI bundle updated with latest fixes.

v0.17.1

February 26, 2026

Package fix: Added dist/integrations to npm package files — fixes server startup crash when loading integrations.
Dockerfile: Pinned API Dockerfile to webpeel@0.17.0 (was incorrectly pinned to 0.15.2).

v0.17.0

February 26, 2026

CF Worker proxy: Cloudflare Worker integration for edge-proxied fetches with reduced latency.
Google Cache fallback: Automatically falls back to Google Cache for blocked or rate-limited pages.
Domain extractors: Added Best Buy and Walmart-specific structured data extractors for e-commerce use cases.

v0.16.1

February 25, 2026

Agent endpoint: New /v1/agent route for LLM-driven multi-step scraping tasks.
Google Search: /v1/search now supports Google Search results alongside existing providers.
Admin CLI fix: Resolved issue with admin CLI commands not applying tier overrides correctly.

v0.16.0

February 25, 2026

SSE streaming: Server-Sent Events support on /v1/fetch for streaming partial results as pages load.
TypeScript SDK: First-class TypeScript SDK published to npm with full type definitions and JSDoc.
API polish: Consistent error envelopes, improved rate-limit headers (X-RateLimit-*), and OpenAPI 3.1 spec updates.

v0.15.2

February 25, 2026

QA noise filter: Suppresses low-quality quick-answer results below confidence threshold instead of returning them.
Graceful browser errors: Browser pool failures now return HTTP errors instead of crashing the request.
Lite mode: New lite: true option skips browser rendering for a faster, cheaper extraction path.

v0.15.1

February 24, 2026

JSON-LD extraction: Structured data from <script type="application/ld+json"> blocks is now parsed and surfaced in the response.
Zero-token safety net: Prevents over-extraction from triggering token budget errors — returns partial content gracefully.
UI chrome removal: Improved heuristics to strip nav, footers, and cookie banners before content scoring.

v0.14.6

February 24, 2026

46% speed improvement: Budget mode now short-circuits extraction on large pages once enough high-quality content is found, cutting median latency from 1.9s to 1.0s.

v0.14.5

February 24, 2026

npm package: Bundled all quick-answer and confidence scoring fixes from v0.14.4 into the published npm package (prior release was missing compiled output).

v0.14.4

February 24, 2026

Architecture refactor: peel() is now a 19-line pipeline orchestrator backed by 8 discrete testable stages in pipeline.ts.
Fetcher split: 1,857-line fetcher.ts split into http-fetch.ts, browser-pool.ts, and browser-fetch.ts — each with a single responsibility.
Quick answer: pattern extraction: Direct infobox extraction before BM25 — factual queries (who/when/what) now return precise answers. 6/6 Wikipedia factual tests pass.
Quick answer: fixed confidence: Confidence score now reflects real certainty (0.7–0.95) instead of always returning 1.0.
REST API parity: /v1/fetch now supports all peel() options: budget, question, readable, stealth, screenshot, maxTokens, selector, fullPage, raw.
Better error messages: HTTP errors now include meaningful status text (HTTP/2 strips reason phrases — we add them back).
Debug logging: 41 previously silent catch {} blocks now emit DEBUG=1 diagnostics.
OpenAPI spec: Added budget, question, readable, fullPage, raw, wait, timeout parameters.

v0.14.0

February 24, 2026

YouTube transcript extraction: No API key needed — supports all URL formats (youtu.be, /watch, /shorts).
Domain extractors: Twitter/X, Reddit, GitHub, Hacker News — native APIs, no scraping required.
LLM-free quick answer: BM25 scoring engine, --question flag — fast answers without LLM inference cost.
Readability engine: Reader Mode for AI agents via --readable flag — strips chrome, keeps content.
Auto-extract structured data: Pricing tables, products, contacts, articles, API docs — auto-detected by page type.
Deep fetch intelligence: Relevance scoring, deduplication, comparison mode for multi-URL fetches.
URL watch / monitoring: webpeel watch <url> with webhook delivery on content change.
18 MCP tools (was 13) — adds webpeel_youtube, webpeel_quick_answer, webpeel_deep_fetch, webpeel_auto_extract, webpeel_watch.
927 tests (was 693) — expanded coverage for all new features.
CLI: grouped help, error suggestions, clean JSON output.
MCP: crisp tool descriptions, smart defaults.

v0.13.3

February 20, 2026

Deep Research Agent: webpeel research "<query>" — autonomous 5-phase pipeline: Search → Fetch → BM25 Extract → Follow Links → Synthesize. MCP tool included.
Content Pruner v2: Two-pass architecture (semantic removal + density scoring). Real-world token savings: 15-33% on most sites. Enabled by default.
BM25 Relevance Scoring: computeRelevanceScore() for document-level relevance. Used by research agent and --focus filtering.
Python SDK: pip install webpeel — full client with sync + async support. scrape, search, batch, crawl, map, extract, screenshot, research.
Smart Infinite Scroll: --scroll-extract auto-detects content loading via height stabilization. Stops when page is fully loaded.
Token Efficiency Pipeline: BM25 query filtering (--focus), smart chunking (--chunk), content pruning. Combined: up to 77% savings.
654 Node.js + 39 Python = 693 tests. 13 MCP tools. 7 CSS schemas.

v0.11.0

February 19, 2026

JSON Schema Extraction: --extract-schema for structured LLM extraction. Accepts example objects (auto-converted) or full JSON Schema. Firecrawl-compatible POST /v1/extract.
Remote Hosted MCP: https://api.webpeel.dev/v2/mcp — no npm install needed. One-click Cursor deep-link, Claude Desktop & Windsurf configs.
MCP HTTP Transport: Streamable HTTP transport (MCP_HTTP_MODE=true), in addition to default stdio.
Proxy Support: --proxy http://host:port or --proxy socks5://user:pass@host:port. Works with HTTP and browser rendering.
538 tests, 0 type errors.

v0.10.0

February 19, 2026

Browser profiles: webpeel profile create|list|show|delete — persist cookies & auth across sessions. Use --profile <name> on fetch/search.
CSS schema extraction: 6 bundled schemas (Booking.com, Amazon, eBay, Yelp, Walmart, HN) — auto-detected by domain. Use --schema <name> or --list-schemas.
LLM extraction: --llm-extract [instruction] — structured data from any page using OpenAI-compatible BYOK. Includes per-request cost tracking.
Hotel search: webpeel hotels "destination" --checkin DATE --sort price — multi-source parallel search. Expedia now works via Stealth v2.
Stealth v2: Fixed viewport fingerprint, __pwInitScripts injection artifacts, and Chrome branding leaks. PerimeterX bypass confirmed reliable.
Challenge detection end-to-end: detectChallenge() now runs after every fetch. 17 new signals added. PerimeterX/DataDome errors include vendor + confidence fields.
PostgreSQL TLS secure-by-default: DB_SSL_REJECT_UNAUTHORIZED defaults to true. Timing oracle and OAuth per-IP rate limiter also fixed.
Title cleaning: strips Google Travel price suffixes, "Opens in new window", and ad prefix artifacts.
Multi-cluster extraction: improved listings on diverse pages (HN threads, mixed-media).

v0.9.0

February 18, 2026

Stealth 10/10: Dynamic challenge detection (Cloudflare, PerimeterX, Akamai, DataDome, Incapsula) with confidence scoring and auto-escalation
Site-aware search: webpeel search --site ebay "query" — 27 sites across 7 categories, no URL knowledge needed
Agent mode: --agent flag sets JSON output, token budget, extraction, and silent mode in one shot
Listing extraction: --extract-all auto-detects product cards, search results, and repeated patterns
Full escalation cascade: simple → browser → stealth → stealth+wait → return with warning
28 stealth domains (Amazon, eBay, Walmart, Nike, LinkedIn, and more) with auto-detection
Updated user agents (Chrome 132-136) with dynamic Sec-CH-UA header generation
webpeel sites command to list all supported site templates
agentMode option in library API + MCP default 4,000 token budget
368 tests passing

v0.8.1

February 17, 2026

npm publish fix — corrected package entry points and exports
All MIT references updated to AGPL-3.0 across site, README, and docs
Homepage version badge updated to v0.8.1
API docs OpenAPI version updated

v0.8.0

February 17, 2026

License changed from MIT to AGPL-3.0 — protects the project while staying open source. Commercial licensing available.
Premium server architecture — SWR cache, domain intelligence, and parallel race strategy now server-only (not shipped in npm package)
Hook-based strategy pattern for extensible fetch pipeline
Stale-while-revalidate caching with 30s revalidation timeout guard
Domain intelligence learns which sites need browser/stealth from traffic history
Chrome 131 impersonation headers for better anti-detection
GFM table conversion, enhanced content cleaning (paywall gates, chat widgets)
Benchmark v7: 100% success rate, 373ms median, 92.3% quality
Dashboard security headers hardened (HSTS, X-XSS-Protection, Permissions-Policy)

v0.7.0

February 14, 2026

Interactive Playground with code generation
Firecrawl-compatible /v2/scrape, /v2/crawl, /v2/map endpoints
Full documentation site with API reference, CLI, MCP, SDKs docs
AI Agent endpoint (/v1/agent) for autonomous web research
Answer endpoint (/v1/answer) — search + LLM synthesis with citations
Brave Search BYOK support
Hosted MCP endpoint (single URL setup)

v0.6.0

February 14, 2026

Branding extraction endpoint
Change tracking / content monitoring
SSE streaming for agent and crawl
Batch scraping with webhooks

v0.5.0

February 14, 2026

Search endpoint with DuckDuckGo
Crawl endpoint with depth control
Map endpoint for URL discovery

v0.4.0

February 13, 2026

MCP server with 13 tools
Python SDK (zero deps)
LangChain & LlamaIndex integrations

v0.3.0

February 13, 2026

Stealth mode with anti-detection
Smart escalation (HTTP → browser → stealth)
CLI tool

v0.2.0

February 12, 2026

Browser rendering with Playwright
Multiple output formats (markdown, HTML, text, JSON)

v0.1.0

February 12, 2026

Initial release — HTTP fetch with content extraction