# Domain Extractors v0.15
Pass a Twitter/X, Reddit, GitHub, or Hacker News URL to WebPeel and get back clean, structured data — tweets, comments, repo stats, stories. No special flags. Fully automatic.
## What It Does
Domain Extractors are purpose-built parsers that activate automatically when WebPeel detects a supported platform URL. Instead of scraping HTML (slow, brittle, blocked), each extractor calls the platform's own data API for reliable, structured output:
- 🐦 Twitter/X — tweet text, engagement metrics, threads, quoted tweets
- 🟠 Reddit — post + comments via Reddit's JSON API
- 🐙 GitHub — repo stats, README, open issues, recent PRs via GitHub API
- 🟧 Hacker News — stories + threaded comments via Firebase API
All extractors are zero-config: just fetch the URL as you normally would.
## CLI

```bash
# GitHub repo — structured stats + README
npx webpeel "https://github.com/webpeel/webpeel" --json

# Reddit post + comments
npx webpeel "https://reddit.com/r/programming/comments/xyz/title" --json

# Hacker News story + comments
npx webpeel "https://news.ycombinator.com/item?id=12345" --json

# Twitter/X tweet
npx webpeel "https://x.com/webpeel_dev/status/12345" --json

# Plain text output (no --json)
npx webpeel "https://github.com/webpeel/webpeel"
```
The structured data appears in the `domainData` field of the JSON output, alongside the standard `content` and `title` fields.
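Consuming that output is plain JSON handling. The payload below is an abridged, illustrative sample (mirroring the GitHub example later on this page), not a live response:

```python
import json

# Abridged sample of `npx webpeel <url> --json` output (illustrative values).
raw = """
{
  "title": "webpeel/webpeel",
  "content": "# WebPeel ...",
  "domainData": {
    "platform": "github",
    "repo": {"fullName": "webpeel/webpeel", "stars": 1842}
  }
}
"""

result = json.loads(raw)
if "domainData" in result:  # present only when a supported platform was detected
    repo = result["domainData"]["repo"]
    print(f'{repo["fullName"]}: {repo["stars"]} stars')
```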
## MCP

Domain Extractors run automatically via `webpeel_fetch`. No special tool name is required:

```json
{
  "tool": "webpeel_fetch",
  "arguments": {
    "url": "https://github.com/webpeel/webpeel"
  }
}
```

When a supported domain is detected, the response includes a `domainData` field with the structured platform data.
## Supported Platforms
### Twitter / X

Extracts tweet content and engagement using the tweet embed API — no Twitter developer account needed.

#### Supported URL patterns

- `https://x.com/:user/status/:id`
- `https://twitter.com/:user/status/:id`
#### Example output

```json
{
  "platform": "twitter",
  "tweet": {
    "id": "1234567890",
    "text": "Introducing WebPeel v0.15 — YouTube transcripts, BM25 Q&A, and more 🚀",
    "author": {
      "username": "webpeel_dev",
      "name": "WebPeel",
      "verified": false
    },
    "metrics": {
      "likes": 142,
      "retweets": 38,
      "replies": 17,
      "views": 4821
    },
    "createdAt": "2026-02-24T10:00:00Z",
    "quotedTweet": null,
    "thread": []
  }
}
```
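Twitter's public oEmbed endpoint (`publish.twitter.com/oembed`) is the standard way to read a tweet without a developer account; whether WebPeel calls exactly this endpoint is an assumption on our part, but the request shape looks like this:

```python
from urllib.parse import urlencode

def oembed_url(tweet_url: str) -> str:
    """Build a request URL for Twitter's public oEmbed endpoint."""
    query = urlencode({"url": tweet_url, "omit_script": "true"})
    return f"https://publish.twitter.com/oembed?{query}"

print(oembed_url("https://x.com/webpeel_dev/status/12345"))
```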
### Reddit

Uses Reddit's public `.json` API endpoint — no OAuth required, no scraping.

#### Supported URL patterns

- `https://reddit.com/r/:sub/comments/:id/:slug`
- `https://www.reddit.com/r/:sub/comments/:id/:slug`
- `https://old.reddit.com/r/:sub/comments/:id/:slug`
#### Example output

```json
{
  "platform": "reddit",
  "post": {
    "title": "WebPeel vs Firecrawl — an honest comparison",
    "author": "u/dev_tools_fan",
    "subreddit": "r/programming",
    "score": 312,
    "upvoteRatio": 0.94,
    "numComments": 47,
    "url": "https://reddit.com/r/programming/comments/xyz/...",
    "selftext": "I spent a weekend benchmarking both tools...",
    "createdAt": "2026-02-20T18:45:00Z"
  },
  "comments": [
    {
      "author": "u/alice",
      "score": 88,
      "text": "WebPeel's BM25 quick answer is genuinely useful for agent pipelines.",
      "replies": [
        {
          "author": "u/bob",
          "score": 31,
          "text": "Agreed — especially the no-LLM-key angle."
        }
      ]
    }
  ]
}
```
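Reddit's `.json` endpoint is just the post URL with a `.json` suffix appended, so the mapping is a one-liner (the helper name is ours, for illustration; URLs without query strings or fragments are assumed):

```python
def reddit_json_url(post_url: str) -> str:
    """Append Reddit's `.json` suffix to a post URL."""
    return post_url.rstrip("/") + ".json"

print(reddit_json_url("https://www.reddit.com/r/programming/comments/xyz/title/"))
# https://www.reddit.com/r/programming/comments/xyz/title.json
```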
### GitHub

Calls the GitHub REST API to return repo metadata, the README, open issues, and recent pull requests. No GitHub token is needed for public repos (60 requests/hour unauthenticated; set `GITHUB_TOKEN` for 5,000 requests/hour).

#### Supported URL patterns

- `https://github.com/:owner/:repo`
- `https://github.com/:owner/:repo/issues/:n`
- `https://github.com/:owner/:repo/pull/:n`
#### Example output

```json
{
  "platform": "github",
  "repo": {
    "name": "webpeel",
    "fullName": "webpeel/webpeel",
    "description": "Fast web fetching for AI agents",
    "stars": 1842,
    "forks": 94,
    "openIssues": 12,
    "language": "TypeScript",
    "license": "AGPL-3.0",
    "topics": ["web-scraping", "ai", "mcp", "typescript"],
    "homepage": "https://webpeel.dev",
    "pushedAt": "2026-02-24T09:30:00Z"
  },
  "readme": "# WebPeel\n\nFast web fetching for AI agents...",
  "recentIssues": [
    { "number": 201, "title": "Support for Playwright tracing", "state": "open", "createdAt": "2026-02-22T..." }
  ],
  "recentPRs": [
    { "number": 198, "title": "feat: YouTube transcript extractor", "state": "merged", "mergedAt": "2026-02-20T..." }
  ]
}
```
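A GitHub repo URL maps directly onto the REST endpoint `https://api.github.com/repos/:owner/:repo`. The helper below sketches that mapping plus the optional `GITHUB_TOKEN` header; the function name is ours, not WebPeel's:

```python
import os
from urllib.parse import urlparse

def github_api_request(repo_url: str) -> tuple[str, dict]:
    """Map a github.com repo URL to its REST API endpoint plus request headers."""
    owner, repo = urlparse(repo_url).path.strip("/").split("/")[:2]
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:  # authenticated: 5,000 req/hour instead of 60
        headers["Authorization"] = f"Bearer {token}"
    return f"https://api.github.com/repos/{owner}/{repo}", headers
```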
### Hacker News

Uses the official HN Firebase API — the same data powering news.ycombinator.com itself.

#### Supported URL patterns

- `https://news.ycombinator.com/item?id=:id`
- `https://news.ycombinator.com/` (front page)
#### Example output

```json
{
  "platform": "hackernews",
  "story": {
    "id": 42123456,
    "title": "WebPeel v0.15 – YouTube transcripts and BM25 Q&A",
    "url": "https://webpeel.dev/changelog",
    "score": 284,
    "author": "jliu",
    "numComments": 63,
    "createdAt": "2026-02-24T08:00:00Z"
  },
  "comments": [
    {
      "id": 42123500,
      "author": "pg",
      "text": "Nice. The BM25 quick answer is clever.",
      "score": 41,
      "replies": []
    }
  ]
}
```
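The Firebase endpoint for an item is derived from the `id` query parameter in the HN URL; the official API serves each item at `https://hacker-news.firebaseio.com/v0/item/:id.json`. A minimal sketch of that mapping (the helper name is ours):

```python
from urllib.parse import urlparse, parse_qs

def hn_item_endpoint(item_url: str) -> str:
    """Map a news.ycombinator.com item URL to the official Firebase API endpoint."""
    item_id = parse_qs(urlparse(item_url).query)["id"][0]
    return f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"

print(hn_item_endpoint("https://news.ycombinator.com/item?id=42123456"))
# https://hacker-news.firebaseio.com/v0/item/42123456.json
```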
## SDK Usage

### TypeScript

```typescript
import { peel } from 'webpeel';

// GitHub repo
const repo = await peel('https://github.com/webpeel/webpeel');
console.log(repo.domainData.repo.stars); // 1842
console.log(repo.domainData.readme);     // README markdown

// Reddit post
const post = await peel('https://reddit.com/r/programming/comments/xyz/title');
console.log(post.domainData.post.score); // 312
console.log(post.domainData.comments);   // threaded comments

// Hacker News story
const hn = await peel('https://news.ycombinator.com/item?id=42123456');
console.log(hn.domainData.story.title);
console.log(hn.domainData.comments);
```
### Python

```python
from webpeel import WebPeel

client = WebPeel()

# GitHub repo
repo = client.scrape("https://github.com/webpeel/webpeel")
print(repo.domain_data["repo"]["stars"])  # 1842
print(repo.domain_data["readme"])         # README markdown

# Reddit post
post = client.scrape("https://reddit.com/r/programming/comments/xyz/title")
print(post.domain_data["post"]["score"])
print(post.domain_data["comments"])
```
The GitHub extractor works without authentication for public repos (60 requests/hour). Set a `GITHUB_TOKEN` environment variable to raise the limit to 5,000 requests/hour. Generate a token at github.com/settings/tokens — no special scopes are needed for public repos.