Reader Mode v0.15

Strip every piece of page noise — navigation, ads, cookie banners, sidebars, share buttons, related-articles widgets — and get back the pure content as clean Markdown, with title, author, date, and reading time.

What It Does

Reader Mode applies a multi-signal noise removal pipeline to any webpage. Think of it as the "Reader View" button in Firefox or Safari, but for your code. The result is clean, LLM-ready Markdown with full article metadata attached.

CLI

# Enable reader mode with --readable
npx webpeel "https://techcrunch.com/2026/02/24/some-article" --readable

# JSON output — includes metadata fields
npx webpeel "https://techcrunch.com/2026/02/24/some-article" --readable --json

# Also works with browser rendering for JS-rendered pages
npx webpeel "https://medium.com/@user/article-slug" --readable --render --json

API

# Basic readable fetch
GET /v1/fetch?url=https://techcrunch.com/2026/02/24/some-article&readable=true

# With curl
curl "https://api.webpeel.dev/v1/fetch?url=https://techcrunch.com/2026/02/24/some-article&readable=true" \
  -H "Authorization: Bearer YOUR_API_KEY"

Query Parameters

Parameter   Type                Description
url         string (required)   Page URL to fetch in reader mode
readable    boolean             Set to true to enable reader mode
render      boolean             Use headless browser for JS-rendered pages
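
When the target page URL carries its own query string, it has to be percent-encoded so that its `&` and `?` are not read as API parameters. The sketch below builds (but does not send) such a request with the Python standard library. The endpoint and the `Authorization: Bearer` header are taken from the curl example above; the helper name `build_fetch_request` is hypothetical, and passing `render=true` in the same string form as `readable=true` is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as shown in the curl example above.
API_BASE = "https://api.webpeel.dev/v1/fetch"


def build_fetch_request(page_url: str, api_key: str, render: bool = False) -> Request:
    """Build, without sending, a reader-mode fetch request (hypothetical helper)."""
    params = {"url": page_url, "readable": "true"}
    if render:
        # Assumption: booleans are passed as the literal string "true".
        params["render"] = "true"
    # urlencode percent-encodes the page URL so its own '?' and '&'
    # cannot be mistaken for separators in the API query string.
    query = urlencode(params)
    return Request(
        f"{API_BASE}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = build_fetch_request(
    "https://example.com/post?id=1&ref=home", "YOUR_API_KEY", render=True
)
print(req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` with a real API key.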

MCP

Pass readable: true to webpeel_fetch:

{
  "tool": "webpeel_fetch",
  "arguments": {
    "url": "https://techcrunch.com/2026/02/24/some-article",
    "readable": true
  }
}

How It Works

Reader Mode runs a three-stage pipeline entirely in-process — no external calls:

Stage 1 — Noise Removal (25+ patterns)

The following element types are stripped before any content scoring:

Category           What gets removed
Navigation         <nav>, [role=navigation], header nav bars, breadcrumb trails
Advertising        .ad, .ads, .advertisement, [data-ad], iframe ads
Cookie / GDPR      .cookie-banner, #consent, GDPR overlays, "Accept cookies" dialogs
Sidebars           aside, [role=complementary], .sidebar, .widget-area
Share buttons      .share, .social-share, .addthis, floating share bars
Related articles   .related, .recommended, .you-may-also-like
Comments           #comments, .comment-section, Disqus embeds
Footers            <footer>, [role=contentinfo], site footers
Pop-ups / modals   .modal, .popup, newsletter sign-up overlays
Sticky bars        Fixed-position headers, scroll-triggered notification bars
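
As a rough illustration of this kind of selector-based stripping, here is a minimal sketch using only Python's standard `html.parser`. It is not WebPeel's implementation: it covers just a handful of the patterns from the table, keeps text rather than HTML, and assumes reasonably well-formed markup where every non-void tag is closed. The names `NoiseStripper` and `strip_noise` are hypothetical.

```python
from html.parser import HTMLParser

# A small subset of the patterns in the table above.
NOISE_TAGS = {"nav", "aside", "footer"}
NOISE_ROLES = {"navigation", "complementary", "contentinfo"}
NOISE_CLASSES = {"ad", "ads", "advertisement", "cookie-banner", "sidebar",
                 "share", "social-share", "related", "recommended",
                 "modal", "popup"}
NOISE_IDS = {"consent", "comments"}
# Void elements never get a closing tag, so they must not affect depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoiseStripper(HTMLParser):
    """Collects text that is not inside a noise subtree (hypothetical sketch)."""

    def __init__(self):
        super().__init__()
        self.kept = []
        self.skip_depth = 0  # > 0 while inside a noise element

    def _is_noise(self, tag, attrs):
        a = dict(attrs)
        classes = set((a.get("class") or "").split())
        return (tag in NOISE_TAGS
                or a.get("role") in NOISE_ROLES
                or a.get("id") in NOISE_IDS
                or bool(classes & NOISE_CLASSES))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside a noise subtree, every nested tag deepens the skip.
        if self.skip_depth or self._is_noise(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.kept.append(data.strip())


def strip_noise(html: str) -> list[str]:
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return parser.kept


page = """
<nav><a href="/">Home</a></nav>
<article><h1>Title</h1><p>Body text.<br>More.</p></article>
<div class="social-share big"><span>Share</span></div>
<footer>Copyright</footer>
"""
print(strip_noise(page))  # ['Title', 'Body text.', 'More.']
```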

Stage 2 — Candidate Scoring

After stripping noise, remaining block-level elements (<article>, <main>, <div>, <section>) are scored by content density — the ratio of text to HTML. The highest-scoring block is selected as the article body.
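
One simple way to read "ratio of text to HTML" is visible characters divided by total characters in a candidate block. The sketch below scores candidates that way; the exact formula Reader Mode uses is not specified here, so treat this as an assumption. The helper names and the sample fragments are hypothetical, and the regex-based tag stripping is only adequate for illustration.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")


def text_density(fragment: str) -> float:
    """Visible text characters divided by total characters in the fragment."""
    if not fragment:
        return 0.0
    text = TAG_RE.sub("", fragment)
    text = " ".join(text.split())  # collapse whitespace left behind by tags
    return len(text) / len(fragment)


def pick_article_body(candidates: dict[str, str]) -> str:
    """Return the name of the candidate block with the highest density."""
    return max(candidates, key=lambda name: text_density(candidates[name]))


candidates = {
    "ul.links": '<ul><li><a href="/a">Home</a></li>'
                '<li><a href="/b">About</a></li></ul>',
    "article": "<article><p>Reader Mode keeps the prose and drops "
               "the chrome around it.</p></article>",
}
print(pick_article_body(candidates))  # article
```

Pure density favors any short, markup-free block, so a production scorer would likely also weigh text length or tag type.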

Stage 3 — Metadata Extraction

Structured metadata is extracted from <meta> tags, JSON-LD, and OpenGraph properties, then verified against visible on-page signals.
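
The extraction half of this stage can be sketched with the standard library alone. The code below reads `<meta>` tags (including OpenGraph `property` attributes) and JSON-LD script blocks, then merges them. The class and function names are hypothetical, the precedence between sources is an assumption, and the verification against visible on-page signals is not implemented.

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects <meta> values and JSON-LD blocks (hypothetical sketch)."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.json_ld = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            # OpenGraph uses property="og:..."; classic tags use name="...".
            key = a.get("property") or a.get("name")
            if key and a.get("content"):
                self.meta[key] = a["content"]
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed JSON-LD rather than fail the page


def extract_metadata(html: str) -> dict:
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    article = next((d for d in parser.json_ld if isinstance(d, dict)), {})
    author = article.get("author")
    if isinstance(author, dict):
        author = author.get("name")
    # Assumed precedence: OpenGraph title first, JSON-LD for author and date.
    return {
        "title": parser.meta.get("og:title") or article.get("headline"),
        "author": author or parser.meta.get("author"),
        "publishedAt": article.get("datePublished")
        or parser.meta.get("article:published_time"),
    }


sample = """
<head>
<meta property="og:title" content="The Rise of LLM-Free Web Agents">
<meta name="author" content="Jane Smith">
<script type="application/ld+json">
{"@type": "NewsArticle", "datePublished": "2026-02-24T08:00:00Z"}
</script>
</head>
"""
print(extract_metadata(sample))
```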

Example Output

{
  "url": "https://techcrunch.com/2026/02/24/some-article",
  "readable": true,
  "title": "The Rise of LLM-Free Web Agents",
  "author": "Jane Smith",
  "publishedAt": "2026-02-24T08:00:00Z",
  "readingTime": 4,
  "wordCount": 980,
  "content": "# The Rise of LLM-Free Web Agents\n\nFor years, building a reliable web agent meant stitching together an LLM, a browser, and a prompt. But a new wave of tools is changing that...\n\n## Why BM25 Is Enough for Most Tasks\n\nLarge language models are powerful, but they're also slow and expensive...",
  "excerpt": "For years, building a reliable web agent meant stitching together an LLM, a browser, and a prompt."
}
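
The relationship between `wordCount` and `readingTime` is not documented here. One common convention, shown below purely as an assumption, is to divide by a fixed words-per-minute rate and round up; at 250 words per minute this happens to reproduce the values in the example (980 words, 4 minutes). The function name and the rate are hypothetical.

```python
import math

# Assumption: a fixed reading speed; the actual rate is not documented.
ASSUMED_WORDS_PER_MINUTE = 250


def reading_time_minutes(word_count: int, wpm: int = ASSUMED_WORDS_PER_MINUTE) -> int:
    """Estimate reading time in whole minutes, rounded up, minimum 1."""
    return max(1, math.ceil(word_count / wpm))


print(reading_time_minutes(980))  # 4
```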

SDK Usage

import { peel } from 'webpeel';

const result = await peel('https://techcrunch.com/2026/02/24/some-article', {
  readable: true
});

console.log(result.title);       // "The Rise of LLM-Free Web Agents"
console.log(result.author);      // "Jane Smith"
console.log(result.publishedAt); // "2026-02-24T08:00:00Z"
console.log(result.readingTime); // 4 (minutes)
console.log(result.wordCount);   // 980
console.log(result.content);     // Clean Markdown article body

from webpeel import WebPeel

client = WebPeel()
result = client.scrape(
    "https://techcrunch.com/2026/02/24/some-article",
    readable=True
)

print(result.title)        # "The Rise of LLM-Free Web Agents"
print(result.author)       # "Jane Smith"
print(result.reading_time) # 4
print(result.word_count)   # 980
print(result.content)      # Clean Markdown

When to Use Reader Mode

Use case                    Use Reader Mode?
News articles, blog posts   ✅ Yes: lots of surrounding noise
Documentation pages         ✅ Yes: strips nav and ads
Long-form essays            ✅ Yes: ideal use case
E-commerce product pages    ⚠️ Partial: use --schema instead for structured data
Search result pages         ❌ No: the content IS the listing grid
Single-page apps (SPAs)     ✅ Yes, but add the --render flag
Twitter / GitHub / Reddit   ❌ No: use Domain Extractors instead

💡 Combine with --focus for LLM efficiency

Stack Reader Mode with BM25 query filtering for maximum token efficiency: --readable --focus "climate impact". Reader Mode strips noise first, then BM25 keeps only the most query-relevant paragraphs. Typical savings: 40–75% fewer tokens vs. raw HTML.
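
To make the BM25 step concrete, here is a minimal, self-contained sketch of ranking paragraphs against a query and keeping the best matches in their original order. It uses the standard Okapi BM25 formula with the common defaults `k1=1.5` and `b=0.75`. How `--focus` tokenizes, which parameters it uses, and how many paragraphs it keeps are not documented here, so the function names and the `keep` cutoff are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, paragraphs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each paragraph against the query."""
    docs = [tokenize(p) for p in paragraphs]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: number of paragraphs containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def focus(query: str, paragraphs: list[str], keep: int = 2) -> list[str]:
    """Keep the top-scoring paragraphs, preserving document order."""
    scores = bm25_scores(query, paragraphs)
    ranked = sorted(range(len(paragraphs)), key=lambda i: scores[i], reverse=True)
    top = sorted(i for i in ranked[:keep] if scores[i] > 0)
    return [paragraphs[i] for i in top]


paragraphs = [
    "Rising sea levels are one climate impact already visible on coastlines.",
    "Subscribe to our newsletter for weekly updates.",
    "The climate impact on agriculture includes shorter growing seasons.",
]
for p in focus("climate impact", paragraphs):
    print(p)
```

Here the newsletter paragraph shares no terms with the query, scores zero, and is dropped.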