Changelog

Release notes and product updates for WebPeel.

v0.21.39
Latest
March 16, 2026
  • Self-hosted AI research: WebPeel AI on Hetzner (Ollama + Qwen3 0.6B) — free-tier users get AI-powered research synthesis with zero configuration
  • SearXNG search integration: Self-hosted meta-search engine as Stage 0 provider — aggregates Google, Bing, Brave, Startpage, and more via residential IP
  • New /v1/research endpoint: Lightweight research that chains search → fetch → compile with self-hosted LLM
  • Search enrichment: New ?enrich=N parameter fetches top N search result pages server-side with 4s timeout
  • Domain hints for extraction: Structured extraction now overlays domain-specific hints for better field matching
  • Enterprise tiers: Admin burst bypass, enterprise tier (50K weekly / 2K burst / 500 rate limits)
  • Search reliability: Proxy-routed DDG search, provider lockout fix with decay stats, reversed search order (direct-first)
  • Research LLM optimization: Compact prompts, think:false mode — synthesis from 26s down to 8.5s
  • OOM prevention: Guards against out-of-memory in search enrichment and browser escalation
v0.21.16
March 15, 2026
  • Structured extraction improvements: Version, downloads, and population patterns; better field:value separator matching; NPM extractor now reliably extracts version, weekly_downloads, and license
  • GitHub token refresh: Refreshed expired GITHUB_TOKEN on Render — resolves "Bad credentials" errors
v0.21.15
March 15, 2026
  • Token savings fix: tokenSavingsPercent now shows realistic values for domain extractors (was always 0); estimates based on 7× content ratio
  • HN comment items: HackerNews comment pages now return proper titles ("Comment on: {story}")
  • Pipeline improvements: Reliability improvements across the extraction pipeline
v0.21.14
March 15, 2026
  • GitHub API auth: GitHub extractor now uses GITHUB_TOKEN env var — raises rate limit from 60/hr to 5,000/hr; fixes low token counts on Render's shared IPs
v0.21.13
March 15, 2026
  • PyPI title fix: PyPI extractor now returns a proper title field (name version format)
  • Env var restore: Restored all required Render env vars that were accidentally wiped (fixes deploys since v0.21.5)
v0.21.12
March 15, 2026
  • Structured extraction polish: Emoji stripping from extractor output, new director field for movies, more accurate confidence scoring (0–1)
  • Content quality improvements: Richer domain extractors with smarter fallbacks; actionable error messages with docs links
v0.21.11
March 15, 2026
  • PeelTLS H2 fix: Fixed double-decompression bug on HTTP/2 responses; added forceHttp1 option
  • Search relevance scoring: relevanceScore (0–1) now returned on search results — BM25 score against query
  • CLI progress indicators: Better feedback for long-running fetches
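
The relevanceScore entry above could be approximated as follows. This is a minimal sketch, not WebPeel's implementation: the function names, the k1 and b defaults, and the choice to scale by the top score to land in the 0–1 range are all assumptions made for illustration.

```typescript
// Hypothetical helper names; only the idea (BM25 of each result against the query) comes from the changelog.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Returns one score per document, scaled so the best match is 1 (assumed normalization).
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const n = docs.length;
  if (n === 0) return [];
  const docTokens = docs.map(tokenize);
  const avgLen = docTokens.reduce((sum, t) => sum + t.length, 0) / n || 1;
  // Deduplicate query terms so repeated words are not counted twice
  const terms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i);

  const raw = docTokens.map((tokens) => {
    let score = 0;
    for (const term of terms) {
      const tf = tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      // Document frequency: how many documents contain the term at all
      const df = docTokens.filter((d) => d.indexOf(term) !== -1).length;
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      const lengthNorm = 1 - b + (b * tokens.length) / avgLen;
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });

  const max = Math.max(...raw);
  return raw.map((s) => (max > 0 ? s / max : 0));
}
```

Calling bm25Scores("web scraping", snippets) returns one value per snippet; snippets sharing no term with the query score 0.
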
v0.21.9–10
March 15, 2026
  • Complete anti-bot stealth upgrade: 9 independent patches to TLS fingerprinting, header ordering, and timing randomization
  • Full test verification pass: All tests run and fixed after stealth upgrade
v0.21.8
March 15, 2026
  • Proxy routing for search and browser: --proxy now applies to search providers and headless browser rendering
  • Shared proxy-config module: Centralized proxy parsing across all transport layers
v0.21.7
March 14, 2026
  • Smart heuristic extraction: Page-type-specific heuristics for product, article, recipe, job listing, and more
  • Improved error classification: Errors now categorized as blocked, timeout, parse_error, or network_error with actionable hints
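
A rough sketch of how the four error categories above might be assigned. The category names come from this entry; the FetchFailure shape, the matching rules, and the hint wording are hypothetical and only illustrate the idea.

```typescript
type ErrorCategory = "blocked" | "timeout" | "parse_error" | "network_error";

// Hypothetical input shape; the real pipeline's error objects are not described in the changelog.
interface FetchFailure {
  status?: number; // HTTP status, if a response was received
  code?: string;   // low-level error code such as "ETIMEDOUT"
  name?: string;   // error class name such as "SyntaxError"
}

interface ClassifiedError {
  category: ErrorCategory;
  hint: string; // illustrative wording, not WebPeel's actual messages
}

function classifyError(failure: FetchFailure): ClassifiedError {
  // Assumed rule: 403 and 429 usually mean the site refused or throttled the request
  if (failure.status === 403 || failure.status === 429) {
    return { category: "blocked", hint: "Retry with browser rendering or stealth mode." };
  }
  if (failure.code === "ETIMEDOUT" || failure.name === "TimeoutError") {
    return { category: "timeout", hint: "Increase the timeout or retry later." };
  }
  if (failure.name === "SyntaxError") {
    return { category: "parse_error", hint: "The response body could not be parsed." };
  }
  // Anything else is treated as a transport-level problem
  return { category: "network_error", hint: "Check DNS, proxy, and connectivity." };
}
```
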
v0.21.6
March 14, 2026
  • Dashboard activity log: Color-coded latency (green <3s, yellow 3–8s, red >8s) with "Normal for web scraping" hint
  • Docs visual upgrade: Improved code blocks, better mobile layout, fixed sidebar states
v0.21.5
March 14, 2026
  • Structured JSON extraction (POST /v1/extract): LLM-powered extraction with JSON Schema or natural language prompts. BYOK — bring your own OpenAI-compatible key
  • Heuristic auto-extraction (GET /v1/extract/auto): Zero-LLM extraction with page-type detection. Returns confidence score (0–1)
  • Documentation upgrade: Extract API fully documented
v0.21.0
March 13, 2026
  • Transcript export (GET /v1/transcript/export): Download YouTube transcripts as SRT, TXT, Markdown, or JSON
  • API key scoping: Create scoped API keys with per-endpoint permissions
  • Shareable playground links: Share a playground request via URL
v0.18.5
March 6, 2026
  • Usage counter fix: Dashboard now correctly counts playground/session fetches (was showing 0)
  • Product Hunt extractor: RSS-based extraction returning 2,000+ tokens (was 101 via search snippet)
  • Substack handler: Helpful guidance for root URLs; individual posts extract cleanly
  • Plan badge fix: Correctly capitalized everywhere (Free Plan, Pro Plan, Max Plan)
  • CLI webpeel auth command: Set and verify API key in one step
  • CLI webpeel status command: Machine-readable auth + usage check for AI agents
  • apiKey settable via config: webpeel config set apiKey <key> now works
  • E2E test suite expanded: 17 tests including 8 trending sites
v0.18.4
March 6, 2026
  • Session recovery after API deploy restarts (email-based fallback)
  • Stripe Customer Portal: POST /v1/billing/portal endpoint
  • Email alerts: Nodemailer (free SMTP) replacing Resend
v0.18.3
March 6, 2026
  • Stripe billing portal backend endpoint
  • Replaced Resend with Nodemailer for free email alerts
  • Dashboard billing page now calls real Stripe portal API
v0.18.2
March 6, 2026
  • WebPeel class: new WebPeel({ apiKey }) OOP wrapper with fetch, search, crawl, map, extract methods
  • MCP parity fix: Hosted and local MCP servers now match (6 tools each)
  • Extractive summarization: TF-IDF sentence scoring instead of naive word truncation
  • 4 new doc pages: Getting Started, Authentication, Fetch, Screenshot
  • Changelog page created: v0.18.0 and v0.18.1 entries
  • Dashboard dark mode audit fixes: Billing page, onboarding banner, Getting Started tracking
  • API fixes: Watch interval aliases, usage API key auth, conditional CORS credentials, generic 404, health endpoint no longer returns version
  • Landing page: Dead Discord link → support email, upgrade buttons → billing page, pricing card accuracy
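
The extractive summarization entry above can be illustrated with a small TF-IDF sentence scorer. This is a sketch under assumptions: each sentence is treated as its own document for the IDF term, sentence splitting is a naive punctuation regex, and the function names are hypothetical rather than taken from the WebPeel codebase.

```typescript
function splitSentences(text: string): string[] {
  // Naive split on ., !, ? (assumption; real text needs a sturdier splitter)
  return (text.match(/[^.!?]+[.!?]*/g) || [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function sentenceWords(sentence: string): string[] {
  return sentence.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Keeps the highest-scoring sentences and returns them in their original order.
function summarize(text: string, maxSentences = 2): string {
  const sentences = splitSentences(text);
  const tokenized = sentences.map(sentenceWords);
  const n = sentences.length;

  const scored = tokenized.map((tokens, index) => {
    let score = 0;
    const unique = tokens.filter((t, i) => tokens.indexOf(t) === i);
    for (const term of unique) {
      const tf = tokens.filter((t) => t === term).length / tokens.length;
      const df = tokenized.filter((other) => other.indexOf(term) !== -1).length;
      score += tf * Math.log(n / df); // terms found in every sentence contribute 0
    }
    return { index, score };
  });

  const top = scored
    .slice()
    .sort((a, b) => b.score - a.score)
    .slice(0, maxSentences)
    .sort((a, b) => a.index - b.index);
  return top.map((s) => sentences[s.index]).join(" ");
}
```

Unlike word truncation, this keeps whole sentences and prefers those carrying terms that are rare elsewhere in the text.
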
v0.18.1
March 2026
  • NEW: Real-time email alerts when API usage crosses configurable threshold
  • NEW: Token count now stored and displayed in Activity log for every request
  • NEW: Usage alert threshold setting in Dashboard Settings
  • NEW: Alert preferences API endpoints (GET/PUT /v1/user/alert-preferences)
  • FIX: Playground stats row now correctly shows tokens and savings from API response
v0.18.0
March 2026
  • NEW: MCP server consolidated from 20 tools to 7 clean public tools
  • NEW: webpeel, webpeel_read, webpeel_see, webpeel_find, webpeel_extract, webpeel_monitor, webpeel_act
  • NEW: Smart NL intent parser — describe your task in plain English, MCP routes automatically
  • NEW: question= param on /v1/fetch — ask a question, get an AI answer about the page
  • NEW: summary= param on /v1/fetch — get a concise extractive summary
  • NEW: POST /v1/screenshot dispatch — mode param selects basic, browser, or design-analysis
  • DEPRECATED: POST /v1/agent → now returns 410 Gone (use /v1/fetch instead)
  • All 20 legacy MCP tool names still routed for backward compatibility
v0.17.16
February 28, 2026
  • 31 domain extractors — Added 20 new extractors: Amazon, Medium, Substack, AllRecipes, IMDB, LinkedIn, PyPI, Dev.to, Craigslist, NYTimes, BBC, CNN, Spotify, TikTok, Pinterest, and more
  • Interactive demo — Landing page now includes pre-baked demo results for HN, Wikipedia, YouTube, and example.com
  • Loading skeletons — Dashboard pages show pulse animation skeletons while data loads
  • Google OAuth documentation — Added AUTH_TRUST_HOST=true fix and redirect URI setup guide
  • CI fix — Screenshot test updated for test user credentials
v0.17.15
February 28, 2026
  • Honest landing page stats — Replaced inflated claims with real benchmarks: 97.6% success rate, 200ms average response time, 41 URLs tested
  • 21 domain extractors — Expanded from 11 to 21 with structured data for more sites
  • Site logo updated — Diamond symbol replaced with paper/peel SVG icon matching the dashboard
  • Screenshot fixes — Improved full-page screenshot reliability
v0.17.14
February 28, 2026
  • Search fallback chain fixed — DuckDuckGo blocks Render's cloud IP at TCP level (ConnectTimeoutError). Added ddgConnectionBlocked detection to skip DDG and go straight to Bing+Ecosia via StealthSearchProvider
  • Bing URL decoding — All bing.com/ck/a redirect URLs now properly base64-decoded to real URLs instead of deduplicating to 1 result
  • Warm browser pool — Module-level cached browser via getStealthBrowser() + browser.newContext() per request (~50ms vs 1-3s cold start). Search dropped from 27s to 4.4s
  • YouTube deep understanding: getYouTubeTranscript() returns chapters, key points, summary, word count, description, and publish date. Chapter parsing from description timestamps
  • ?detail=brief mode — Truncates to 500 words, extracts TL;DR, adds X-Detail-Mode and X-Token-Estimate headers
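
Chapter parsing from description timestamps, mentioned in the YouTube entry above, might look roughly like this. The Chapter shape, the function name, and the accepted timestamp formats are assumptions; the actual parser inside getYouTubeTranscript() is not shown in this changelog.

```typescript
interface Chapter {
  startSeconds: number;
  title: string;
}

// Hypothetical parser: one chapter per description line that starts with a timestamp.
function parseChapters(description: string): Chapter[] {
  const chapters: Chapter[] = [];
  for (const line of description.split("\n")) {
    // Accepts "0:00 Intro", "12:34 - Topic", and "1:02:03 Deep dive"
    const m = line.trim().match(/^(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s*[-:]?\s*(.+)$/);
    if (!m) continue;
    const hours = m[1] ? parseInt(m[1], 10) : 0;
    const minutes = parseInt(m[2], 10);
    const seconds = parseInt(m[3], 10);
    chapters.push({
      startSeconds: hours * 3600 + minutes * 60 + seconds,
      title: m[4].trim(),
    });
  }
  return chapters;
}
```
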
v0.17.13
February 28, 2026
  • Firefox search fallback — Added Firefox-based search fallback for environments where DuckDuckGo blocks cloud IPs
v0.17.12
February 28, 2026
  • Search diagnostics — Added search fallback logging to diagnose cloud IP blocking issues
v0.17.11
February 28, 2026
  • Search results parsing fix — API returns data.web but dashboard expected results. Fixed the response format mismatch
v0.17.10
February 28, 2026
  • JWT auth on ALL endpoints — 6 endpoints fixed: search, MCP, quick-answer, deep-fetch, answer, YouTube. Previously only fetch/screenshot required auth
  • Playground tabs — Added Search and Screenshot tabs alongside existing Fetch tab (986 lines of new UI)
  • Markdown pruning: content-pruner.ts strips "Load More" buttons, bare image URLs, and consecutive horizontal rules from extracted content
v0.17.9
February 28, 2026
  • Auth required for all endpoints — /v1/fetch, /v1/screenshot, /v1/crawl now require an API key or session. Free tier still gets 500/week but must register.
  • Usage tracking fixed — All authenticated requests now appear in dashboard stats. Previously JWT session requests were silently untracked.
  • Playground expanded — Search and Screenshot modes added alongside Fetch.
v0.17.8
February 28, 2026
  • YouTube transcripts — Full caption extraction wired in. Was returning 18 words (just title), now returns 270–500+ words of actual transcript content.
  • Dashboard stats live — Usage logs now capture both JWT and API key requests. Activity and stats pages show real data.
  • Markdown cleanup — Strips "Load More", bare image URLs, and nav elements from extracted content.
  • Favicon — Replaced Vercel triangle with WebPeel blurple logo.
v0.17.7
February 28, 2026
  • API key expiration — Anthropic-style: Never, 7d, 30d, 90d, 1y options. Expired keys return 401 with key_expired error code.
  • Dashboard UX overhaul — 15 improvements: billing crash fix, dismissable banners, inline key rename, key prefix copy, zero-state formatting, playground auto-fill, toggle labels.
  • PATCH /v1/keys/:id — Rename API keys without recreating them.
  • JWT extended to 7 days — Dashboard sessions no longer break after 1 hour.
v0.17.6
February 27, 2026
  • Login 500 fix — Credentials login was crashing on refresh token UUID/TEXT mismatch.
  • Usage history endpoint: GET /v1/usage/history?days=7 returns daily usage for the past week.
  • Rate limit headers — All API responses include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
  • OpenAPI /openapi.json — 301 redirect to /openapi.yaml for compatibility.
  • status.webpeel.dev removed — All links updated to webpeel.dev/status.
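
A client could read the rate limit headers listed above along these lines. The header names come from this entry; the helper name and return shape are hypothetical. The changelog does not say whether X-RateLimit-Reset is an epoch timestamp or a number of seconds, so the sketch returns it uninterpreted.

```typescript
interface RateLimitInfo {
  limit: number;
  remaining: number;
  reset: number; // returned as-is; its unit is not specified in the changelog
}

// Hypothetical helper: returns null when any of the three headers is missing or non-numeric.
function parseRateLimit(headers: Record<string, string>): RateLimitInfo | null {
  // HTTP header names are case-insensitive, so normalize before lookup
  const lower: Record<string, string> = {};
  for (const key of Object.keys(headers)) {
    lower[key.toLowerCase()] = headers[key];
  }
  const limit = Number(lower["x-ratelimit-limit"]);
  const remaining = Number(lower["x-ratelimit-remaining"]);
  const reset = Number(lower["x-ratelimit-reset"]);
  if (isNaN(limit) || isNaN(remaining) || isNaN(reset)) return null;
  return { limit, remaining, reset };
}
```
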
v0.17.5
February 27, 2026
  • OpenAPI spec: Fixed /openapi.yaml and /openapi.json (301 redirect) — now correctly served from Docker deployments.
  • Rate limit headers: API responses now include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset.
  • Status links: Replaced dead status.webpeel.dev subdomain with webpeel.dev/status.
  • Build fix: npm run build now copies openapi.yaml to dist/server/ so it's included in the npm package.
v0.17.4
February 27, 2026
  • CI build fixed: Resolved TypeScript Cannot find module errors for stealth/search/cloak imports — CI now passes.
  • Changelog: Added missing v0.17.2 and v0.17.3 release notes.
  • Navigation: Added Pricing link to site navigation.
  • Stealth label: Updated pricing table label from "Anti-bot bypass" to "Handles JavaScript-heavy sites".
v0.17.3
February 27, 2026
  • Auth fix: Fixed email/password login (was returning 500 due to password hash comparison error).
  • OAuth fix: Fixed OAuth dashboard connection flow.
  • Usage history: Added /v1/usage/history endpoint for dashboard charts.
  • SWR retries: Limited SWR retries to prevent thundering herd on API errors.
  • Refresh token: Graceful refresh token fallback when tokens expire.
v0.17.2
February 27, 2026
  • Version bump: Version bump for Docker deployment.
  • CLI bundle: CLI bundle updated with latest fixes.
v0.17.1
February 26, 2026
  • Package fix: Added dist/integrations to npm package files — fixes server startup crash when loading integrations.
  • Dockerfile: Pinned API Dockerfile to webpeel@0.17.0 (was incorrectly pinned to 0.15.2).
v0.17.0
February 26, 2026
  • CF Worker proxy: Cloudflare Worker integration for edge-proxied fetches with reduced latency.
  • Google Cache fallback: Automatically falls back to Google Cache for blocked or rate-limited pages.
  • Domain extractors: Added Best Buy and Walmart-specific structured data extractors for e-commerce use cases.
v0.16.1
February 25, 2026
  • Agent endpoint: New /v1/agent route for LLM-driven multi-step scraping tasks.
  • Google Search: /v1/search now supports Google Search results alongside existing providers.
  • Admin CLI fix: Resolved issue with admin CLI commands not applying tier overrides correctly.
v0.16.0
February 25, 2026
  • SSE streaming: Server-Sent Events support on /v1/fetch for streaming partial results as pages load.
  • TypeScript SDK: First-class TypeScript SDK published to npm with full type definitions and JSDoc.
  • API polish: Consistent error envelopes, improved rate-limit headers (X-RateLimit-*), and OpenAPI 3.1 spec updates.
v0.15.2
February 25, 2026
  • QA noise filter: Suppresses low-quality quick-answer results below confidence threshold instead of returning them.
  • Graceful browser errors: Browser pool failures now return HTTP errors instead of crashing the request.
  • Lite mode: New lite: true option skips browser rendering for a faster, cheaper extraction path.
v0.15.1
February 24, 2026
  • JSON-LD extraction: Structured data from <script type="application/ld+json"> blocks is now parsed and surfaced in the response.
  • Zero-token safety net: Prevents over-extraction from triggering token budget errors — returns partial content gracefully.
  • UI chrome removal: Improved heuristics to strip nav, footers, and cookie banners before content scoring.
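
The JSON-LD entry above can be illustrated with a short extractor. This is a regex-based sketch with a hypothetical function name; a production implementation would more likely walk a parsed DOM, and nothing here is taken from WebPeel's code.

```typescript
// Collects every parseable <script type="application/ld+json"> payload from an HTML string.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      results.push(JSON.parse(match[1]));
    } catch {
      // Malformed blocks are skipped rather than failing the whole extraction
    }
  }
  return results;
}
```
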
v0.14.6
February 24, 2026
  • 46% speed improvement: Budget mode now short-circuits extraction on large pages once enough high-quality content is found, cutting median latency from 1.9s to 1.0s.
v0.14.5
February 24, 2026
  • npm package: Bundled all quick-answer and confidence scoring fixes from v0.14.4 into the published npm package (prior release was missing compiled output).
v0.14.4
February 24, 2026
  • Architecture refactor: peel() is now a 19-line pipeline orchestrator backed by 8 discrete testable stages in pipeline.ts.
  • Fetcher split: 1,857-line fetcher.ts split into http-fetch.ts, browser-pool.ts, and browser-fetch.ts — each with a single responsibility.
  • Quick answer: pattern extraction: Direct infobox extraction before BM25 — factual queries (who/when/what) now return precise answers. 6/6 Wikipedia factual tests pass.
  • Quick answer: fixed confidence: Confidence score now reflects real certainty (0.7–0.95) instead of always returning 1.0.
  • REST API parity: /v1/fetch now supports all peel() options: budget, question, readable, stealth, screenshot, maxTokens, selector, fullPage, raw.
  • Better error messages: HTTP errors now include meaningful status text (HTTP/2 strips reason phrases — we add them back).
  • Debug logging: 41 previously silent catch {} blocks now emit DEBUG=1 diagnostics.
  • OpenAPI spec: Added budget, question, readable, fullPage, raw, wait, timeout parameters.
v0.14.0
February 24, 2026
  • YouTube transcript extraction: No API key needed — supports all URL formats (youtu.be, /watch, /shorts).
  • Domain extractors: Twitter/X, Reddit, GitHub, Hacker News — native APIs, no scraping required.
  • LLM-free quick answer: BM25 scoring engine, --question flag — fast answers without LLM inference cost.
  • Readability engine: Reader Mode for AI agents via --readable flag — strips chrome, keeps content.
  • Auto-extract structured data: Pricing tables, products, contacts, articles, API docs — auto-detected by page type.
  • Deep fetch intelligence: Relevance scoring, deduplication, comparison mode for multi-URL fetches.
  • URL watch / monitoring: webpeel watch <url> with webhook delivery on content change.
  • 18 MCP tools (was 13) — adds webpeel_youtube, webpeel_quick_answer, webpeel_deep_fetch, webpeel_auto_extract, webpeel_watch.
  • 927 tests (was 693) — expanded coverage for all new features.
  • CLI: grouped help, error suggestions, clean JSON output.
  • MCP: crisp tool descriptions, smart defaults.
v0.13.3
February 20, 2026
  • Deep Research Agent: webpeel research "<query>" — autonomous 5-phase pipeline: Search → Fetch → BM25 Extract → Follow Links → Synthesize. MCP tool included.
  • Content Pruner v2: Two-pass architecture (semantic removal + density scoring). Real-world token savings: 15-33% on most sites. Enabled by default.
  • BM25 Relevance Scoring: computeRelevanceScore() for document-level relevance. Used by research agent and --focus filtering.
  • Python SDK: pip install webpeel — full client with sync + async support. scrape, search, batch, crawl, map, extract, screenshot, research.
  • Smart Infinite Scroll: --scroll-extract auto-detects content loading via height stabilization. Stops when page is fully loaded.
  • Token Efficiency Pipeline: BM25 query filtering (--focus), smart chunking (--chunk), content pruning. Combined: up to 77% savings.
  • 654 Node.js + 39 Python = 693 tests. 13 MCP tools. 7 CSS schemas.
v0.11.0
February 19, 2026
  • JSON Schema Extraction: --extract-schema for structured LLM extraction. Accepts example objects (auto-converted) or full JSON Schema. Firecrawl-compatible POST /v1/extract.
  • Remote Hosted MCP: https://api.webpeel.dev/v2/mcp — no npm install needed. One-click Cursor deep-link, Claude Desktop & Windsurf configs.
  • MCP HTTP Transport: Streamable HTTP transport (MCP_HTTP_MODE=true), in addition to default stdio.
  • Proxy Support: --proxy http://host:port or --proxy socks5://user:pass@host:port. Works with HTTP and browser rendering.
  • 538 tests, 0 type errors.
v0.10.0
February 19, 2026
  • Browser profiles: webpeel profile create|list|show|delete — persist cookies & auth across sessions. Use --profile <name> on fetch/search.
  • CSS schema extraction: 6 bundled schemas (Booking.com, Amazon, eBay, Yelp, Walmart, HN) — auto-detected by domain. Use --schema <name> or --list-schemas.
  • LLM extraction: --llm-extract [instruction] — structured data from any page using OpenAI-compatible BYOK. Includes per-request cost tracking.
  • Hotel search: webpeel hotels "destination" --checkin DATE --sort price — multi-source parallel search. Expedia now works via Stealth v2.
  • Stealth v2: Fixed viewport fingerprint, __pwInitScripts injection artifacts, and Chrome branding leaks. PerimeterX bypass confirmed reliable.
  • Challenge detection end-to-end: detectChallenge() now runs after every fetch. 17 new signals added. PerimeterX/DataDome errors include vendor + confidence fields.
  • PostgreSQL TLS secure-by-default: DB_SSL_REJECT_UNAUTHORIZED defaults to true. Timing oracle and OAuth per-IP rate limiter also fixed.
  • Title cleaning: strips Google Travel price suffixes, "Opens in new window", and ad prefix artifacts.
  • Multi-cluster extraction: improved listings on diverse pages (HN threads, mixed-media).
v0.9.0
February 18, 2026
  • Stealth 10/10: Dynamic challenge detection (Cloudflare, PerimeterX, Akamai, DataDome, Incapsula) with confidence scoring and auto-escalation
  • Site-aware search: webpeel search --site ebay "query" — 27 sites across 7 categories, no URL knowledge needed
  • Agent mode: --agent flag sets JSON output, token budget, extraction, and silent mode in one shot
  • Listing extraction: --extract-all auto-detects product cards, search results, and repeated patterns
  • Full escalation cascade: simple → browser → stealth → stealth+wait → return with warning
  • 28 stealth domains (Amazon, eBay, Walmart, Nike, LinkedIn, and more) with auto-detection
  • Updated user agents (Chrome 132-136) with dynamic Sec-CH-UA header generation
  • webpeel sites command to list all supported site templates
  • agentMode option in library API + MCP default 4,000 token budget
  • 368 tests passing
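
The escalation cascade above (simple → browser → stealth → stealth+wait → return with warning) can be sketched as a loop over tiers. The tier names follow this entry; the types, the attempt callback, and the warning text are hypothetical. The sketch is synchronous for brevity, whereas real fetches would be asynchronous.

```typescript
type Tier = "simple" | "browser" | "stealth" | "stealth+wait";

interface Attempt {
  ok: boolean;
  content: string;
}

interface CascadeResult {
  content: string;
  tier: Tier;
  warning?: string;
}

const TIERS: Tier[] = ["simple", "browser", "stealth", "stealth+wait"];

// Tries each tier in order and stops at the first success.
// `attempt` is a caller-supplied function (hypothetical) that performs one fetch at the given tier.
function escalate(url: string, attempt: (url: string, tier: Tier) => Attempt): CascadeResult {
  let last: Attempt = { ok: false, content: "" };
  for (const tier of TIERS) {
    last = attempt(url, tier);
    if (last.ok) return { content: last.content, tier };
  }
  // Every tier failed: return whatever the last tier produced, flagged with a warning
  return {
    content: last.content,
    tier: "stealth+wait",
    warning: "All fetch strategies failed; content may be incomplete.",
  };
}
```
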
v0.8.1
February 17, 2026
  • npm publish fix — corrected package entry points and exports
  • All MIT references updated to AGPL-3.0 across site, README, and docs
  • Homepage version badge updated to v0.8.1
  • API docs OpenAPI version updated
v0.8.0
February 17, 2026
  • License changed from MIT to AGPL-3.0 — protects the project while staying open source. Commercial licensing available.
  • Premium server architecture — SWR cache, domain intelligence, and parallel race strategy now server-only (not shipped in npm package)
  • Hook-based strategy pattern for extensible fetch pipeline
  • Stale-while-revalidate caching with 30s revalidation timeout guard
  • Domain intelligence learns which sites need browser/stealth from traffic history
  • Chrome 131 impersonation headers for better anti-detection
  • GFM table conversion, enhanced content cleaning (paywall gates, chat widgets)
  • Benchmark v7: 100% success rate, 373ms median, 92.3% quality
  • Dashboard security headers hardened (HSTS, X-XSS-Protection, Permissions-Policy)
v0.7.0
February 14, 2026
  • Interactive Playground with code generation
  • Firecrawl-compatible /v2/scrape, /v2/crawl, /v2/map endpoints
  • Full documentation site with API reference, CLI, MCP, SDKs docs
  • AI Agent endpoint (/v1/agent) for autonomous web research
  • Answer endpoint (/v1/answer) — search + LLM synthesis with citations
  • Brave Search BYOK support
  • Hosted MCP endpoint (single URL setup)
v0.6.0
February 14, 2026
  • Branding extraction endpoint
  • Change tracking / content monitoring
  • SSE streaming for agent and crawl
  • Batch scraping with webhooks
v0.5.0
February 14, 2026
  • Search endpoint with DuckDuckGo
  • Crawl endpoint with depth control
  • Map endpoint for URL discovery
v0.4.0
February 13, 2026
  • MCP server with 13 tools
  • Python SDK (zero deps)
  • LangChain & LlamaIndex integrations
v0.3.0
February 13, 2026
  • Stealth mode with anti-detection
  • Smart escalation (HTTP → browser → stealth)
  • CLI tool
v0.2.0
February 12, 2026
  • Browser rendering with Playwright
  • Multiple output formats (markdown, HTML, text, JSON)
v0.1.0
February 12, 2026
  • Initial release — HTTP fetch with content extraction