Domain Extractors v0.15

Pass a Twitter/X, Reddit, GitHub, or Hacker News URL to WebPeel and get back clean, structured data — tweets, comments, repo stats, stories. No special flags. Fully automatic.

What It Does

Domain Extractors are purpose-built parsers that activate automatically when WebPeel detects a supported platform URL. Instead of scraping HTML (slow, brittle, easily blocked), each extractor calls the platform's own data API for reliable, structured output.

All extractors are zero-config: just fetch the URL as you normally would.

CLI

# GitHub repo — structured stats + README
npx webpeel "https://github.com/webpeel/webpeel" --json

# Reddit post + comments
npx webpeel "https://reddit.com/r/programming/comments/xyz/title" --json

# Hacker News story + comments
npx webpeel "https://news.ycombinator.com/item?id=12345" --json

# Twitter/X tweet
npx webpeel "https://x.com/webpeel_dev/status/12345" --json

# Plain text output (no --json)
npx webpeel "https://github.com/webpeel/webpeel"

The structured data appears in the domainData field of the JSON output alongside the standard content and title fields.
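As a sketch, a consumer can read that field like this. The PeelResult interface below is an assumption based on the fields named above (title, content, domainData), not the SDK's published types:

```typescript
// Minimal sketch of reading domainData from WebPeel's --json output.
// The PeelResult shape is assumed from the fields this doc names;
// check the actual SDK types before relying on it.
interface PeelResult {
  title: string;
  content: string;
  // Present only when a supported platform was detected
  domainData?: { platform: string } & Record<string, unknown>;
}

function getPlatform(result: PeelResult): string | null {
  return result.domainData?.platform ?? null;
}

const sample: PeelResult = {
  title: "webpeel/webpeel",
  content: "# WebPeel ...",
  domainData: { platform: "github" },
};

console.log(getPlatform(sample)); // "github"
console.log(getPlatform({ title: "t", content: "c" })); // null
```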

MCP

Domain Extractors are automatic via webpeel_fetch. No special tool name required:

{
  "tool": "webpeel_fetch",
  "arguments": {
    "url": "https://github.com/webpeel/webpeel"
  }
}

The response will include a domainData field with the structured platform data when a supported domain is detected.

Supported Platforms

Twitter / X

Extracts tweet content and engagement using the tweet embed API — no Twitter developer account needed.

Supported URL patterns

twitter.com/{user}/status/{id}
x.com/{user}/status/{id}

Example output

{
  "platform": "twitter",
  "tweet": {
    "id": "1234567890",
    "text": "Introducing WebPeel v0.15 — YouTube transcripts, BM25 Q&A, and more 🚀",
    "author": {
      "username": "webpeel_dev",
      "name": "WebPeel",
      "verified": false
    },
    "metrics": {
      "likes": 142,
      "retweets": 38,
      "replies": 17,
      "views": 4821
    },
    "createdAt": "2026-02-24T10:00:00Z",
    "quotedTweet": null,
    "thread": []
  }
}
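The metrics object lends itself to simple downstream calculations. The helper below is illustrative only (not part of WebPeel's API) and assumes the metric names shown in the example above:

```typescript
// Illustrative helper (not part of WebPeel): compute an engagement rate
// from the metrics object shown in the example output above.
interface TweetMetrics {
  likes: number;
  retweets: number;
  replies: number;
  views: number;
}

function engagementRate(m: TweetMetrics): number {
  // Guard against division by zero for tweets with no view data
  if (m.views === 0) return 0;
  return (m.likes + m.retweets + m.replies) / m.views;
}

const metrics: TweetMetrics = { likes: 142, retweets: 38, replies: 17, views: 4821 };
console.log((engagementRate(metrics) * 100).toFixed(1) + "%"); // "4.1%"
```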

Reddit

Uses Reddit's public .json API endpoint — no OAuth required, no scraping.

Supported URL patterns

reddit.com/r/{subreddit}/comments/{id}/{slug}

Example output

{
  "platform": "reddit",
  "post": {
    "title": "WebPeel vs Firecrawl — an honest comparison",
    "author": "u/dev_tools_fan",
    "subreddit": "r/programming",
    "score": 312,
    "upvoteRatio": 0.94,
    "numComments": 47,
    "url": "https://reddit.com/r/programming/comments/xyz/...",
    "selftext": "I spent a weekend benchmarking both tools...",
    "createdAt": "2026-02-20T18:45:00Z"
  },
  "comments": [
    {
      "author": "u/alice",
      "score": 88,
      "text": "WebPeel's BM25 quick answer is genuinely useful for agent pipelines.",
      "replies": [
        {
          "author": "u/bob",
          "score": 31,
          "text": "Agreed — especially the no-LLM-key angle."
        }
      ]
    }
  ]
}
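Comments arrive as a nested tree. A common first step is flattening it for display or indexing; the helper below is illustrative (not part of WebPeel) and assumes the comment shape from the example above:

```typescript
// Illustrative helper (not part of WebPeel): flatten the threaded
// comments array from the example output into a depth-annotated list.
interface RedditComment {
  author: string;
  score: number;
  text: string;
  replies?: RedditComment[];
}

function flattenComments(
  comments: RedditComment[],
  depth = 0
): { author: string; score: number; depth: number }[] {
  return comments.flatMap((c) => [
    { author: c.author, score: c.score, depth },
    ...flattenComments(c.replies ?? [], depth + 1),
  ]);
}

const thread: RedditComment[] = [
  {
    author: "u/alice",
    score: 88,
    text: "...",
    replies: [{ author: "u/bob", score: 31, text: "..." }],
  },
];

console.log(flattenComments(thread));
// [ { author: "u/alice", score: 88, depth: 0 },
//   { author: "u/bob", score: 31, depth: 1 } ]
```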

GitHub

Calls the GitHub REST API to return repo metadata, README, open issues, and recent pull requests. No GitHub token needed for public repos (60 req/hour unauthenticated; set GITHUB_TOKEN for 5000 req/hour).

Supported URL patterns

github.com/{owner}/{repo}

Example output

{
  "platform": "github",
  "repo": {
    "name": "webpeel",
    "fullName": "webpeel/webpeel",
    "description": "Fast web fetching for AI agents",
    "stars": 1842,
    "forks": 94,
    "openIssues": 12,
    "language": "TypeScript",
    "license": "AGPL-3.0",
    "topics": ["web-scraping", "ai", "mcp", "typescript"],
    "homepage": "https://webpeel.dev",
    "pushedAt": "2026-02-24T09:30:00Z"
  },
  "readme": "# WebPeel\n\nFast web fetching for AI agents...",
  "recentIssues": [
    { "number": 201, "title": "Support for Playwright tracing", "state": "open", "createdAt": "2026-02-22T..." }
  ],
  "recentPRs": [
    { "number": 198, "title": "feat: YouTube transcript extractor", "state": "merged", "mergedAt": "2026-02-20T..." }
  ]
}
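For a quick sanity check in a pipeline, the repo object can be reduced to a one-line summary. This helper is illustrative (not part of WebPeel) and assumes the field names shown above:

```typescript
// Illustrative helper (not part of WebPeel): build a one-line summary
// from the repo object shown in the example output above.
interface RepoData {
  fullName: string;
  stars: number;
  forks: number;
  language: string;
  pushedAt: string;
}

function summarizeRepo(r: RepoData): string {
  return `${r.fullName} (${r.language}): ${r.stars} stars, ${r.forks} forks, pushed ${r.pushedAt}`;
}

const repo: RepoData = {
  fullName: "webpeel/webpeel",
  stars: 1842,
  forks: 94,
  language: "TypeScript",
  pushedAt: "2026-02-24T09:30:00Z",
};

console.log(summarizeRepo(repo));
// "webpeel/webpeel (TypeScript): 1842 stars, 94 forks, pushed 2026-02-24T09:30:00Z"
```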

Hacker News

Uses the official HN Firebase API — same data powering news.ycombinator.com itself.

Supported URL patterns

news.ycombinator.com/item?id={id}

Example output

{
  "platform": "hackernews",
  "story": {
    "id": 42123456,
    "title": "WebPeel v0.15 – YouTube transcripts and BM25 Q&A",
    "url": "https://webpeel.dev/changelog",
    "score": 284,
    "author": "jliu",
    "numComments": 63,
    "createdAt": "2026-02-24T08:00:00Z"
  },
  "comments": [
    {
      "id": 42123500,
      "author": "pg",
      "text": "Nice. The BM25 quick answer is clever.",
      "score": 41,
      "replies": []
    }
  ]
}
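Because replies nest recursively, the visible comments array can be smaller than numComments. A recursive count over the returned tree looks like this; the helper is illustrative (not part of WebPeel) and assumes the comment shape above:

```typescript
// Illustrative helper (not part of WebPeel): count all comments in the
// nested reply tree returned for a Hacker News story.
interface HnComment {
  id: number;
  author: string;
  text: string;
  score: number;
  replies: HnComment[];
}

function countComments(comments: HnComment[]): number {
  // Each node counts as 1, plus everything beneath it
  return comments.reduce((n, c) => n + 1 + countComments(c.replies), 0);
}

const comments: HnComment[] = [
  { id: 42123500, author: "pg", text: "...", score: 41, replies: [] },
];

console.log(countComments(comments)); // 1
```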

SDK Usage

TypeScript

import { peel } from 'webpeel';

// GitHub repo
const repo = await peel('https://github.com/webpeel/webpeel');
console.log(repo.domainData.repo.stars);   // 1842
console.log(repo.domainData.readme);        // README markdown

// Reddit post
const post = await peel('https://reddit.com/r/programming/comments/xyz/title');
console.log(post.domainData.post.score);   // 312
console.log(post.domainData.comments);     // threaded comments

// Hacker News story
const hn = await peel('https://news.ycombinator.com/item?id=42123456');
console.log(hn.domainData.story.title);
console.log(hn.domainData.comments);
Python

from webpeel import WebPeel

client = WebPeel()

# GitHub repo
repo = client.scrape("https://github.com/webpeel/webpeel")
print(repo.domain_data["repo"]["stars"])  # 1842
print(repo.domain_data["readme"])         # README markdown

# Reddit post
post = client.scrape("https://reddit.com/r/programming/comments/xyz/title")
print(post.domain_data["post"]["score"])
print(post.domain_data["comments"])
💡 GitHub rate limits
The GitHub extractor works without authentication for public repos (60 requests/hour). Set a GITHUB_TOKEN environment variable to raise the limit to 5,000 requests/hour. Generate a token at github.com/settings/tokens — no special scopes needed for public repos.
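For example, in a shell (the token below is a placeholder; substitute your own):

```shell
# Raise the GitHub extractor's rate limit by exporting a personal access token.
# Replace the placeholder with your own token from github.com/settings/tokens.
export GITHUB_TOKEN="ghp_xxxxxxxxxxxx"
npx webpeel "https://github.com/webpeel/webpeel" --json
```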