Changelog
Release notes and product updates for WebPeel.
v0.21.39
Latest
- Self-hosted AI research: WebPeel AI on Hetzner (Ollama + Qwen3 0.6B) — free-tier users get AI-powered research synthesis with zero configuration
- SearXNG search integration: Self-hosted meta-search engine as Stage 0 provider — aggregates Google, Bing, Brave, Startpage, and more via residential IP
- New /v1/research endpoint: Lightweight research that chains search → fetch → compile with self-hosted LLM
- Search enrichment: New ?enrich=N parameter fetches top N search result pages server-side with 4s timeout
- Domain hints for extraction: Structured extraction now overlays domain-specific hints for better field matching
- Enterprise tiers: Admin burst bypass, enterprise tier (50K weekly / 2K burst / 500 rate limits)
- Search reliability: Proxy-routed DDG search, provider lockout fix with decay stats, reverse search order (direct-first)
- Research LLM optimization: Compact prompts, think:false mode — synthesis from 26s down to 8.5s
- OOM prevention: Guards against out-of-memory in search enrichment and browser escalation
v0.21.16
- Structured extraction improvements: Version, downloads, and population patterns; better field:value separator matching; NPM extractor now reliably extracts version, weekly_downloads, and license
- GitHub token refresh: Refreshed expired GITHUB_TOKEN on Render — resolves "Bad credentials" errors
v0.21.15
- Token savings fix: tokenSavingsPercent now shows realistic values for domain extractors (was always 0); estimates based on 7× content ratio
- HN comment items: HackerNews comment pages now return proper titles ("Comment on: {story}")
- Pipeline improvements: Reliability improvements across the extraction pipeline
v0.21.14
- GitHub API auth: GitHub extractor now uses GITHUB_TOKEN env var — raises rate limit from 60/hr to 5,000/hr; fixes low token counts on Render's shared IPs
v0.21.13
- PyPI title fix: PyPI extractor now returns a proper title field (name version format)
- Env var restore: Restored all required Render env vars that were accidentally wiped (fixes deploys since v0.21.5)
v0.21.12
- Structured extraction polish: Emoji stripping from extractor output, new director field for movies, more accurate confidence scoring (0–1)
- Content quality improvements: Richer domain extractors with smarter fallbacks; actionable error messages with docs links
v0.21.11
- PeelTLS H2 fix: Fixed double-decompression bug on HTTP/2 responses; added forceHttp1 option
- Search relevance scoring: relevanceScore (0–1) now returned on search results — BM25 score against query
- CLI progress indicators: Better feedback for long-running fetches
v0.21.9–10
- Complete anti-bot stealth upgrade: 9 independent patches to TLS fingerprinting, header ordering, and timing randomization
- Full test verification pass: All tests run and fixed after stealth upgrade
v0.21.8
- Proxy routing for search and browser: --proxy now applies to search providers and headless browser rendering
- Shared proxy-config module: Centralized proxy parsing across all transport layers
v0.21.7
- Smart heuristic extraction: Page-type-specific heuristics for product, article, recipe, job listing, and more
- Improved error classification: Errors now categorized as blocked, timeout, parse_error, or network_error with actionable hints
v0.21.6
- Dashboard activity log: Color-coded latency (green <3s, yellow 3–8s, red >8s) with "Normal for web scraping" hint
- Docs visual upgrade: Improved code blocks, better mobile layout, fixed sidebar states
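For illustration, the latency thresholds from the v0.21.6 activity-log entry can be expressed as a small function. This is only a sketch: the function name, the millisecond unit, and the treatment of the exact 3s and 8s boundaries as yellow are assumptions, not taken from the dashboard source.

```typescript
type LatencyColor = "green" | "yellow" | "red";

// Thresholds from the changelog entry: green <3s, yellow 3-8s, red >8s.
// Hypothetical helper; boundary values (exactly 3s or 8s) are assumed to be yellow.
function latencyColor(latencyMs: number): LatencyColor {
  if (latencyMs < 3000) return "green";
  if (latencyMs <= 8000) return "yellow";
  return "red";
}
```

For example, a 4.4s search would be shown in yellow and a 26s synthesis in red.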
v0.21.5
- Structured JSON extraction (POST /v1/extract): LLM-powered extraction with JSON Schema or natural language prompts. BYOK — bring your own OpenAI-compatible key
- Heuristic auto-extraction (GET /v1/extract/auto): Zero-LLM extraction with page-type detection. Returns confidence score (0–1)
- Documentation upgrade: Extract API fully documented
v0.21.0
- Transcript export (GET /v1/transcript/export): Download YouTube transcripts as SRT, TXT, Markdown, or JSON
- API key scoping: Create scoped API keys with per-endpoint permissions
- Shareable playground links: Share a playground request via URL
v0.18.5
- Usage counter fix: Dashboard now correctly counts playground/session fetches (was showing 0)
- Product Hunt extractor: RSS-based extraction returning 2,000+ tokens (was 101 via search snippet)
- Substack handler: Helpful guidance for root URL, individual posts extract cleanly
- Plan badge fix: Correctly capitalized everywhere (Free Plan, Pro Plan, Max Plan)
- CLI webpeel auth command: Set and verify API key in one step
- CLI webpeel status command: Machine-readable auth + usage check for AI agents
- apiKey settable via config: webpeel config set apiKey <key> now works
- E2E test suite expanded: 17 tests including 8 trending sites
v0.18.4
- Session recovery after API deploy restarts (email-based fallback)
- Stripe Customer Portal: POST /v1/billing/portal endpoint
- Email alerts: Nodemailer (free SMTP) replacing Resend
v0.18.3
- Stripe billing portal backend endpoint
- Replaced Resend with Nodemailer for free email alerts
- Dashboard billing page now calls real Stripe portal API
v0.18.2
- WebPeel class: new WebPeel({ apiKey }) OOP wrapper with fetch, search, crawl, map, extract methods
- MCP parity fix: Hosted and local MCP servers now match (6 tools each)
- Extractive summarization: TF-IDF sentence scoring instead of naive word truncation
- 4 new doc pages: Getting Started, Authentication, Fetch, Screenshot
- Changelog page created: v0.18.0 and v0.18.1 entries
- Dashboard dark mode audit fixes: Billing page, onboarding banner, Getting Started tracking
- API fixes: Watch interval aliases, usage API key auth, CORS credentials conditional, generic 404, health no version
- Landing page: Dead Discord link → support email, upgrade buttons → billing page, pricing card accuracy
v0.18.1
- NEW: Real-time email alerts when API usage crosses configurable threshold
- NEW: Token count now stored and displayed in Activity log for every request
- NEW: Usage alert threshold setting in Dashboard Settings
- NEW: Alert preferences API endpoints (GET/PUT /v1/user/alert-preferences)
- FIX: Playground stats row now correctly shows tokens and savings from API response
v0.18.0
- NEW: MCP server consolidated from 20 tools to 7 clean public tools
- NEW: webpeel, webpeel_read, webpeel_see, webpeel_find, webpeel_extract, webpeel_monitor, webpeel_act
- NEW: Smart NL intent parser — describe your task in plain English, MCP routes automatically
- NEW: question= param on /v1/fetch — ask a question, get an AI answer about the page
- NEW: summary= param on /v1/fetch — get a concise extractive summary
- NEW: POST /v1/screenshot dispatch — mode param selects basic, browser, or design-analysis
- DEPRECATED: POST /v1/agent → now returns 410 Gone (use /v1/fetch instead)
- All 20 legacy MCP tool names still routed for backward compatibility
v0.17.16
- 31 domain extractors — Added 20 new extractors: Amazon, Medium, Substack, AllRecipes, IMDB, LinkedIn, PyPI, Dev.to, Craigslist, NYTimes, BBC, CNN, Spotify, TikTok, Pinterest, and more
- Interactive demo — Landing page now includes pre-baked demo results for HN, Wikipedia, YouTube, and example.com
- Loading skeletons — Dashboard pages show pulse animation skeletons while data loads
- Google OAuth documentation — Added AUTH_TRUST_HOST=true fix and redirect URI setup guide
- CI fix — Screenshot test updated for test user credentials
v0.17.15
- Honest landing page stats — Replaced inflated claims with real benchmarks: 97.6% success rate, 200ms average response time, 41 URLs tested
- 21 domain extractors — Expanded from 11 to 21 with structured data for more sites
- Site logo updated — Diamond symbol replaced with paper/peel SVG icon matching the dashboard
- Screenshot fixes — Improved full-page screenshot reliability
v0.17.14
- Search fallback chain fixed — DuckDuckGo blocks Render's cloud IP at TCP level (ConnectTimeoutError). Added ddgConnectionBlocked detection to skip DDG and go straight to Bing+Ecosia via StealthSearchProvider
- Bing URL decoding — All bing.com/ck/a redirect URLs now properly base64-decoded to real URLs instead of deduplicating to 1 result
- Warm browser pool — Module-level cached browser via getStealthBrowser() + browser.newContext() per request (~50ms vs 1-3s cold start). Search dropped from 27s to 4.4s
- YouTube deep understanding — getYouTubeTranscript() returns chapters, key points, summary, word count, description, and publish date. Chapter parsing from description timestamps
- ?detail=brief mode — Truncates to 500 words, extracts TL;DR, adds X-Detail-Mode and X-Token-Estimate headers
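The 500-word truncation described for ?detail=brief can be approximated as follows. This is a rough sketch under stated assumptions: truncateWords is a hypothetical helper, words are split on whitespace, and the TL;DR extraction and response headers from the entry are not modeled. The actual server logic may differ.

```typescript
interface TruncateResult {
  text: string;
  truncated: boolean;
}

// Hypothetical helper: keep at most maxWords whitespace-separated words.
// Note that joining with single spaces collapses line breaks in the kept text.
function truncateWords(text: string, maxWords: number = 500): TruncateResult {
  const trimmed = text.trim();
  const words = trimmed.split(/\s+/).filter((w) => w.length > 0);
  if (words.length <= maxWords) {
    return { text: trimmed, truncated: false };
  }
  return { text: words.slice(0, maxWords).join(" "), truncated: true };
}
```

A production version would likely cut on sentence or block boundaries so that Markdown structure survives; this sketch only shows the word budget.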
v0.17.13
- Firefox search fallback — Added Firefox-based search fallback for environments where DuckDuckGo blocks cloud IPs
v0.17.12
- Search diagnostics — Added search fallback logging to diagnose cloud IP blocking issues
v0.17.11
- Search results parsing fix — API returns data.web but dashboard expected results. Fixed the response format mismatch
v0.17.10
- JWT auth on ALL endpoints — 6 endpoints fixed: search, MCP, quick-answer, deep-fetch, answer, YouTube. Previously only fetch/screenshot required auth
- Playground tabs — Added Search and Screenshot tabs alongside existing Fetch tab (986 lines of new UI)
- Markdown pruning — content-pruner.ts strips "Load More" buttons, bare image URLs, and consecutive horizontal rules from extracted content
v0.17.9
- Auth required for all endpoints — /v1/fetch, /v1/screenshot, /v1/crawl now require an API key or session. Free tier still gets 500/week but must register.
- Usage tracking fixed — All authenticated requests now appear in dashboard stats. Previously JWT session requests were silently untracked.
- Playground expanded — Search and Screenshot modes added alongside Fetch.
v0.17.8
- YouTube transcripts — Full caption extraction wired in. Was returning 18 words (just title), now returns 270–500+ words of actual transcript content.
- Dashboard stats live — Usage logs now capture both JWT and API key requests. Activity and stats pages show real data.
- Markdown cleanup — Strips "Load More", bare image URLs, and nav elements from extracted content.
- Favicon — Replaced Vercel triangle with WebPeel blurple logo.
v0.17.7
- API key expiration — Anthropic-style: Never, 7d, 30d, 90d, 1y options. Expired keys return 401 with key_expired error code.
- Dashboard UX overhaul — 15 improvements: billing crash fix, dismissable banners, inline key rename, key prefix copy, zero-state formatting, playground auto-fill, toggle labels.
- PATCH /v1/keys/:id — Rename API keys without recreating them.
- JWT extended to 7 days — Dashboard sessions no longer break after 1 hour.
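As a rough illustration of the expiration options listed for v0.17.7, the sketch below maps each option to an expiry timestamp. The option strings, the function names, and the choice of 365 days for "1y" are assumptions made for the example; only the option list and the key_expired error code come from the entry.

```typescript
type KeyExpiry = "never" | "7d" | "30d" | "90d" | "1y";

// Assumed day counts per option; "1y" is treated as 365 days here.
const EXPIRY_DAYS: Record<Exclude<KeyExpiry, "never">, number> = {
  "7d": 7,
  "30d": 30,
  "90d": 90,
  "1y": 365,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the expiry time, or null for keys that never expire.
function expiresAt(option: KeyExpiry, createdAt: Date): Date | null {
  if (option === "never") return null;
  return new Date(createdAt.getTime() + EXPIRY_DAYS[option] * MS_PER_DAY);
}

// A server could use a check like this before answering 401 with key_expired.
function isExpired(expires: Date | null, now: Date): boolean {
  return expires !== null && now.getTime() >= expires.getTime();
}
```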
v0.17.6
- Login 500 fix — Credentials login was crashing on refresh token UUID/TEXT mismatch.
- Usage history endpoint — GET /v1/usage/history?days=7 returns daily usage for the past week.
- Rate limit headers — All API responses include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
- OpenAPI /openapi.json — 301 redirect to /openapi.yaml for compatibility.
- status.webpeel.dev removed — All links updated to webpeel.dev/status.
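A client could read the three rate limit headers named in the v0.17.6 entry roughly as shown below. This is a sketch, not the SDK's API: readRateLimit and the HeaderReader interface are hypothetical, and because the changelog does not state the unit of X-RateLimit-Reset, the value is returned as a raw number.

```typescript
// Minimal shape shared by the Fetch API's Headers object and simple test doubles.
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitInfo {
  limit: number;
  remaining: number;
  reset: number; // unit not specified in the changelog; kept as-is
}

// Parse one numeric header, returning null when it is missing or not a number.
function parseNumericHeader(headers: HeaderReader, name: string): number | null {
  const raw = headers.get(name);
  if (raw === null || raw.trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

// Hypothetical helper: returns null unless all three headers are present and numeric.
function readRateLimit(headers: HeaderReader): RateLimitInfo | null {
  const limit = parseNumericHeader(headers, "X-RateLimit-Limit");
  const remaining = parseNumericHeader(headers, "X-RateLimit-Remaining");
  const reset = parseNumericHeader(headers, "X-RateLimit-Reset");
  if (limit === null || remaining === null || reset === null) return null;
  return { limit, remaining, reset };
}
```

With the Fetch API, response.headers can be passed directly, since Headers.get has the same signature.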
v0.17.5
- OpenAPI spec: Fixed /openapi.yaml and /openapi.json (301 redirect) — now correctly served from Docker deployments.
- Rate limit headers: API responses now include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset.
- Status links: Replaced dead status.webpeel.dev subdomain with webpeel.dev/status.
- Build fix: npm run build now copies openapi.yaml to dist/server/ so it's included in the npm package.
v0.17.4
- CI build fixed: Resolved TypeScript Cannot find module errors for stealth/search/cloak imports — CI now passes.
- Changelog: Added missing v0.17.2 and v0.17.3 release notes.
- Navigation: Added Pricing link to site navigation.
- Stealth label: Updated pricing table label from "Anti-bot bypass" to "Handles JavaScript-heavy sites".
v0.17.3
- Auth fix: Fixed email/password login (was returning 500 due to password hash comparison error).
- OAuth fix: Fixed OAuth dashboard connection flow.
- Usage history: Added /v1/usage/history endpoint for dashboard charts.
- SWR retries: Limited SWR retries to prevent thundering herd on API errors.
- Refresh token: Graceful refresh token fallback when tokens expire.
v0.17.2
- Version bump: Version bump for Docker deployment.
- CLI bundle: CLI bundle updated with latest fixes.
v0.17.1
- Package fix: Added dist/integrations to npm package files — fixes server startup crash when loading integrations.
- Dockerfile: Pinned API Dockerfile to webpeel@0.17.0 (was incorrectly pinned to 0.15.2).
v0.17.0
- CF Worker proxy: Cloudflare Worker integration for edge-proxied fetches with reduced latency.
- Google Cache fallback: Automatically falls back to Google Cache for blocked or rate-limited pages.
- Domain extractors: Added Best Buy and Walmart-specific structured data extractors for e-commerce use cases.
v0.16.1
- Agent endpoint: New /v1/agent route for LLM-driven multi-step scraping tasks.
- Google Search: /v1/search now supports Google Search results alongside existing providers.
- Admin CLI fix: Resolved issue with admin CLI commands not applying tier overrides correctly.
v0.16.0
- SSE streaming: Server-Sent Events support on /v1/fetch for streaming partial results as pages load.
- TypeScript SDK: First-class TypeScript SDK published to npm with full type definitions and JSDoc.
- API polish: Consistent error envelopes, improved rate-limit headers (X-RateLimit-*), and OpenAPI 3.1 spec updates.
v0.15.2
- QA noise filter: Suppresses low-quality quick-answer results below confidence threshold instead of returning them.
- Graceful browser errors: Browser pool failures now return HTTP errors instead of crashing the request.
- Lite mode: New lite: true option skips browser rendering for a faster, cheaper extraction path.
v0.15.1
- JSON-LD extraction: Structured data from <script type="application/ld+json"> blocks is now parsed and surfaced in the response.
- Zero-token safety net: Prevents over-extraction from triggering token budget errors — returns partial content gracefully.
- UI chrome removal: Improved heuristics to strip nav, footers, and cookie banners before content scoring.
v0.14.6
- 46% speed improvement: Budget mode now short-circuits extraction on large pages once enough high-quality content is found, cutting median latency from 1.9s to 1.0s.
v0.14.5
- npm package: Bundled all quick-answer and confidence scoring fixes from v0.14.4 into the published npm package (prior release was missing compiled output).
v0.14.4
- Architecture refactor: peel() is now a 19-line pipeline orchestrator backed by 8 discrete testable stages in pipeline.ts.
- Fetcher split: 1,857-line fetcher.ts split into http-fetch.ts, browser-pool.ts, and browser-fetch.ts — each with a single responsibility.
- Quick answer: pattern extraction: Direct infobox extraction before BM25 — factual queries (who/when/what) now return precise answers. 6/6 Wikipedia factual tests pass.
- Quick answer: fixed confidence: Confidence score now reflects real certainty (0.7–0.95) instead of always returning 1.0.
- REST API parity: /v1/fetch now supports all peel() options: budget, question, readable, stealth, screenshot, maxTokens, selector, fullPage, raw.
- Better error messages: HTTP errors now include meaningful status text (HTTP/2 strips reason phrases — we add them back).
- Debug logging: 41 previously silent catch {} blocks now emit DEBUG=1 diagnostics.
- OpenAPI spec: Added budget, question, readable, fullPage, raw, wait, timeout parameters.
v0.14.0
- YouTube transcript extraction: No API key needed — supports all URL formats (youtu.be, /watch, /shorts).
- Domain extractors: Twitter/X, Reddit, GitHub, Hacker News — native APIs, no scraping required.
- LLM-free quick answer: BM25 scoring engine, --question flag — fast answers without LLM inference cost.
- Readability engine: Reader Mode for AI agents via --readable flag — strips chrome, keeps content.
- Auto-extract structured data: Pricing tables, products, contacts, articles, API docs — auto-detected by page type.
- Deep fetch intelligence: Relevance scoring, deduplication, comparison mode for multi-URL fetches.
- URL watch / monitoring: webpeel watch <url> with webhook delivery on content change.
- 18 MCP tools (was 13) — adds webpeel_youtube, webpeel_quick_answer, webpeel_deep_fetch, webpeel_auto_extract, webpeel_watch.
- 927 tests (was 693) — expanded coverage for all new features.
- CLI: grouped help, error suggestions, clean JSON output.
- MCP: crisp tool descriptions, smart defaults.
v0.13.3
- Deep Research Agent: webpeel research "<query>" — autonomous 5-phase pipeline: Search → Fetch → BM25 Extract → Follow Links → Synthesize. MCP tool included.
- Content Pruner v2: Two-pass architecture (semantic removal + density scoring). Real-world token savings: 15-33% on most sites. Enabled by default.
- BM25 Relevance Scoring: computeRelevanceScore() for document-level relevance. Used by research agent and --focus filtering.
- Python SDK: pip install webpeel — full client with sync + async support. scrape, search, batch, crawl, map, extract, screenshot, research.
- Smart Infinite Scroll: --scroll-extract auto-detects content loading via height stabilization. Stops when page is fully loaded.
- Token Efficiency Pipeline: BM25 query filtering (--focus), smart chunking (--chunk), content pruning. Combined: up to 77% savings.
- 654 Node.js + 39 Python = 693 tests. 13 MCP tools. 7 CSS schemas.
v0.11.0
- JSON Schema Extraction: --extract-schema for structured LLM extraction. Accepts example objects (auto-converted) or full JSON Schema. Firecrawl-compatible POST /v1/extract.
- Remote Hosted MCP: https://api.webpeel.dev/v2/mcp — no npm install needed. One-click Cursor deep-link, Claude Desktop & Windsurf configs.
- MCP HTTP Transport: Streamable HTTP transport (MCP_HTTP_MODE=true), in addition to default stdio.
- Proxy Support: --proxy http://host:port or --proxy socks5://user:pass@host:port. Works with HTTP and browser rendering.
- 538 tests, 0 type errors.
v0.10.0
- Browser profiles: webpeel profile create|list|show|delete — persist cookies & auth across sessions. Use --profile <name> on fetch/search.
- CSS schema extraction: 6 bundled schemas (Booking.com, Amazon, eBay, Yelp, Walmart, HN) — auto-detected by domain. Use --schema <name> or --list-schemas.
- LLM extraction: --llm-extract [instruction] — structured data from any page using OpenAI-compatible BYOK. Includes per-request cost tracking.
- Hotel search: webpeel hotels "destination" --checkin DATE --sort price — multi-source parallel search. Expedia now works via Stealth v2.
- Stealth v2: Fixed viewport fingerprint, __pwInitScripts injection artifacts, and Chrome branding leaks. PerimeterX bypass confirmed reliable.
- Challenge detection end-to-end: detectChallenge() now runs after every fetch. 17 new signals added. PerimeterX/DataDome errors include vendor + confidence fields.
- PostgreSQL TLS secure-by-default: DB_SSL_REJECT_UNAUTHORIZED defaults to true. Timing oracle and OAuth per-IP rate limiter also fixed.
DB_SSL_REJECT_UNAUTHORIZEDdefaults totrue. Timing oracle and OAuth per-IP rate limiter also fixed. - Title cleaning: strips Google Travel price suffixes, "Opens in new window", and ad prefix artifacts.
- Multi-cluster extraction: improved listings on diverse pages (HN threads, mixed-media).
v0.9.0
- Stealth 10/10: Dynamic challenge detection (Cloudflare, PerimeterX, Akamai, DataDome, Incapsula) with confidence scoring and auto-escalation
- Site-aware search: webpeel search --site ebay "query" — 27 sites across 7 categories, no URL knowledge needed
- Agent mode: --agent flag sets JSON output, token budget, extraction, and silent mode in one shot
- Listing extraction: --extract-all auto-detects product cards, search results, and repeated patterns
- Full escalation cascade: simple → browser → stealth → stealth+wait → return with warning
- 28 stealth domains (Amazon, eBay, Walmart, Nike, LinkedIn, and more) with auto-detection
- Updated user agents (Chrome 132-136) with dynamic Sec-CH-UA header generation
- webpeel sites command to list all supported site templates
- agentMode option in library API + MCP default 4,000 token budget
- 368 tests passing
v0.8.1
- npm publish fix — corrected package entry points and exports
- All MIT references updated to AGPL-3.0 across site, README, and docs
- Homepage version badge updated to v0.8.1
- API docs OpenAPI version updated
v0.8.0
- License changed from MIT to AGPL-3.0 — protects the project while staying open source. Commercial licensing available.
- Premium server architecture — SWR cache, domain intelligence, and parallel race strategy now server-only (not shipped in npm package)
- Hook-based strategy pattern for extensible fetch pipeline
- Stale-while-revalidate caching with 30s revalidation timeout guard
- Domain intelligence learns which sites need browser/stealth from traffic history
- Chrome 131 impersonation headers for better anti-detection
- GFM table conversion, enhanced content cleaning (paywall gates, chat widgets)
- Benchmark v7: 100% success rate, 373ms median, 92.3% quality
- Dashboard security headers hardened (HSTS, X-XSS-Protection, Permissions-Policy)
v0.7.0
- Interactive Playground with code generation
- Firecrawl-compatible /v2/scrape, /v2/crawl, /v2/map endpoints
- Full documentation site with API reference, CLI, MCP, SDKs docs
- AI Agent endpoint (/v1/agent) for autonomous web research
- Answer endpoint (/v1/answer) — search + LLM synthesis with citations
- Brave Search BYOK support
- Hosted MCP endpoint (single URL setup)
v0.6.0
- Branding extraction endpoint
- Change tracking / content monitoring
- SSE streaming for agent and crawl
- Batch scraping with webhooks
v0.5.0
- Search endpoint with DuckDuckGo
- Crawl endpoint with depth control
- Map endpoint for URL discovery
v0.4.0
- MCP server with 13 tools
- Python SDK (zero deps)
- LangChain & LlamaIndex integrations
v0.3.0
- Stealth mode with anti-detection
- Smart escalation (HTTP → browser → stealth)
- CLI tool
v0.2.0
- Browser rendering with Playwright
- Multiple output formats (markdown, HTML, text, JSON)
v0.1.0
- Initial release — HTTP fetch with content extraction