MCP Server
Connect WebPeel to Claude Desktop, Cursor, Windsurf, OpenClaw, and other MCP-compatible tools. Give your AI assistant powerful web scraping capabilities.
What is MCP?
The Model Context Protocol (MCP) is an open standard developed by Anthropic that lets AI assistants access external tools and data sources. With WebPeel's MCP server, your AI can:
- Fetch and extract content from any URL
- Search the web for current information
- Crawl entire websites to build knowledge bases
- Extract structured data with CSS selectors or AI prompts
- Monitor pages for changes
- Batch process multiple URLs
Quick Start
Connect to WebPeel via MCP in two ways:
- Hosted MCP endpoint — one URL, Streamable HTTP transport (recommended)
- Local MCP server — run via npx and connect over stdio
Hosted MCP endpoint (recommended)
Use the hosted endpoint at https://api.webpeel.dev/mcp. Any MCP client that supports Streamable HTTP can connect with just:
{
  "url": "https://api.webpeel.dev/mcp",
  "headers": {
    "Authorization": "Bearer <key>"
  }
}
Stateless: no session management required.
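Under the hood, a Streamable HTTP client POSTs standard MCP JSON-RPC 2.0 messages to that URL. The sketch below shows plausible handshake and tool-call payloads; the client name, protocol version string, and example arguments are illustrative assumptions, not values WebPeel requires.

```python
import json

# Sketch of the JSON-RPC 2.0 messages an MCP client POSTs to the hosted
# endpoint. Header and URL come from the config above; the message framing
# follows the standard MCP handshake. Values marked below are illustrative.
ENDPOINT = "https://api.webpeel.dev/mcp"
HEADERS = {
    "Authorization": "Bearer <key>",
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

# First request: initialize (protocolVersion may differ by client/SDK)
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # illustrative
    },
}

# Later request: invoke a WebPeel tool by name
call_fetch = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "webpeel_fetch",
        "arguments": {"url": "https://example.com", "format": "markdown"},
    },
}

body = json.dumps(call_fetch)
```

Because the endpoint is stateless, each request carries its own Authorization header and no session ID needs to be tracked between calls.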
Local MCP server
Run WebPeel's MCP server locally via npx — no installation required.
Start the Server
npx webpeel mcp
The local server runs on stdio and is automatically managed by your MCP client.
Configuration
Add WebPeel to your MCP client's configuration file:
Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%/Claude/claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["webpeel", "mcp"]
    }
  }
}
Restart Claude Desktop to load the server.
Cursor
Edit ~/.cursor/mcp.json:
{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["webpeel", "mcp"]
    }
  }
}
Restart Cursor to apply changes.
Cline
Open VS Code Settings (JSON) and add to your cline.mcpServers config, or edit ~/.vscode/cline_mcp_settings.json:
{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["webpeel", "mcp"]
    }
  }
}
Restart Cline or reload the VS Code window to load the server.
Windsurf
Edit ~/.windsurf/mcp_server_config.json:
{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["webpeel", "mcp"]
    }
  }
}
Restart Windsurf to load the server.
OpenClaw
Edit ~/.openclaw/mcp.json:
{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["webpeel", "mcp"]
    }
  }
}
Restart OpenClaw or reload the MCP configuration.
Claude Code (CLI)
For Claude Code, use the one-liner:
claude mcp add webpeel -- npx -y webpeel mcp
Using an API Key
To use WebPeel's hosted API instead of local scraping, set the WEBPEEL_API_KEY environment variable:
{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["webpeel", "mcp"],
      "env": {
        "WEBPEEL_API_KEY": "your-api-key-here"
      }
    }
  }
}
Smithery Installation
Install via Smithery registry (one-command setup):
npx @smithery/cli install @webpeel/mcp
Available Tools
WebPeel provides 11 core MCP tools (supported by the hosted MCP endpoint): webpeel_fetch, webpeel_search, webpeel_crawl, webpeel_map, webpeel_extract, webpeel_batch, webpeel_agent, webpeel_screenshot, webpeel_brand, webpeel_summarize, webpeel_answer.
The local MCP server may expose additional tools depending on your WebPeel version.
webpeel_fetch
Fetch a URL and return clean, AI-ready markdown content.
Parameters
- url (string, required) — The URL to fetch
- format (string) — Output format: markdown, text, or html (default: markdown)
- render (boolean) — Force browser rendering for JavaScript-heavy sites (default: false)
- stealth (boolean) — Use stealth mode to bypass bot detection (default: false)
- wait (number) — Milliseconds to wait for dynamic content (only with render=true)
- selector (string) — CSS selector to extract specific content (e.g., "article")
- exclude (array) — CSS selectors to exclude (e.g., [".sidebar", ".ads"])
- includeTags (array) — Only include these HTML tags/classes
- excludeTags (array) — Remove these HTML tags/classes
- images (boolean) — Extract image URLs (default: false)
- maxTokens (number) — Maximum token count (truncates if exceeded)
- screenshot (boolean) — Capture page screenshot (returns base64 PNG)
- screenshotFullPage (boolean) — Full-page screenshot (default: viewport only)
- location (string) — ISO country code for geo-targeting (e.g., "US", "DE")
- headers (object) — Custom HTTP headers
- actions (array) — Page actions to execute before extraction
- extract (object) — Structured data extraction options
Example
// Ask your AI assistant:
"Fetch https://example.com and extract the main article"
// With stealth mode:
"Fetch https://protected-site.com with stealth mode enabled"
// Extract specific content:
"Fetch https://news.com and only show me the article content, excluding nav and ads"
webpeel_search
Search the web (DuckDuckGo by default, or Brave Search with a BYOK API key) and return results with titles, URLs, and snippets.
Parameters
- query (string, required) — Search query
- count (number) — Number of results (1-10, default: 5)
- provider (string) — duckduckgo (default) or brave
- searchApiKey (string) — Brave Search API key (required when provider="brave")
Example
// Ask your AI assistant:
"Search for 'docker containers best practices'"
// Then fetch the top result:
"Fetch the first result and summarize it"
webpeel_crawl
Crawl a website starting from a URL, following links and extracting content. Perfect for documentation sites or building knowledge bases.
Parameters
- url (string, required) — Starting URL
- maxPages (number) — Max pages to crawl (1-100, default: 10)
- maxDepth (number) — Max crawl depth (1-5, default: 2)
- allowedDomains (array) — Only crawl these domains
- excludePatterns (array) — Exclude URLs matching these regex patterns
- respectRobotsTxt (boolean) — Respect robots.txt (default: true)
- rateLimitMs (number) — Rate limit between requests (default: 1000ms)
- sitemapFirst (boolean) — Discover URLs via sitemap first (default: false)
- render (boolean) — Use browser rendering for all pages
- stealth (boolean) — Use stealth mode for all pages
Example
// Ask your AI assistant:
"Crawl https://docs.example.com up to 50 pages and build a summary"
webpeel_map
Discover all URLs on a domain using sitemap.xml and link crawling. Returns URLs without fetching content.
Parameters
- url (string, required) — Starting URL or domain
- maxUrls (number) — Max URLs to discover (1-10000, default: 5000)
- includePatterns (array) — Only include URLs matching these patterns (regex)
- excludePatterns (array) — Exclude URLs matching these patterns (regex)
Example
// Ask your AI assistant:
"Map all URLs on https://example.com/docs/"
webpeel_extract
Extract structured data from a webpage using CSS selectors, JSON schema, or AI-powered extraction.
Parameters
- url (string, required) — URL to extract from
- selectors (object) — Map field names to CSS selectors (e.g., {"title": "h1", "price": ".price"})
- schema (object) — JSON schema describing expected output
- prompt (string) — Natural language prompt for AI extraction (requires llmApiKey)
- llmApiKey (string) — API key for LLM-powered extraction
- llmModel (string) — LLM model (default: gpt-4o-mini)
- llmBaseUrl (string) — LLM API base URL (default: OpenAI)
- render (boolean) — Use browser rendering
Example
// Ask your AI assistant:
"Extract the product title, price, and rating from https://shop.com/product/123"
// With AI extraction:
"Extract all the features mentioned on https://example.com/product as a bullet list"
webpeel_batch
Fetch multiple URLs in batch with concurrency control.
Parameters
- urls (array, required) — Array of URLs to fetch
- concurrency (number) — Max concurrent fetches (1-10, default: 3)
- render (boolean) — Use browser rendering for all URLs
- format (string) — Output format for all URLs
- selector (string) — CSS selector for content extraction
Example
// Ask your AI assistant:
"Fetch these 5 URLs and compare their main content: [url1, url2, url3, url4, url5]"
webpeel_agent
Run WebPeel’s research agent: search → fetch → analyze → answer. Supports streaming and structured output via JSON Schema.
Parameters
- prompt (string, required) — Research task prompt
- llmApiKey (string, required) — LLM API key (BYOK)
- llmModel (string) — Model name (optional)
- depth (string) — basic or thorough
- topic (string) — general, news, technical, academic
- maxSources (number) — Max sources to fetch (1-20)
- outputSchema (object) — JSON Schema for structured output
- stream (boolean) — Stream progress + chunks via SSE
Example
// Ask your AI assistant:
"Use webpeel_agent to research the current best practices for SSE in Node.js.\nDepth: thorough. Topic: technical. Return JSON with fields: summary, pitfalls, and examples."
Additional tools (local MCP server only, if enabled):
webpeel_brand
Extract branding and design system from a URL. Returns colors, fonts, and visual identity.
Parameters
- url (string, required) — URL to extract branding from
- render (boolean) — Use browser rendering
Example
// Ask your AI assistant:
"What colors and fonts does https://stripe.com use?"
webpeel_change_track
Track changes on a URL by generating a content fingerprint.
Parameters
- url (string, required) — URL to track
- render (boolean) — Use browser rendering
Example
// Ask your AI assistant:
"Generate a fingerprint for https://example.com so I can detect changes later"
webpeel_summarize
Generate an AI-powered summary of a webpage using an LLM.
Parameters
- url (string, required) — URL to summarize
- llmApiKey (string, required) — API key for LLM
- prompt (string) — Custom summary prompt (default: "Summarize this webpage in 2-3 sentences.")
- llmModel (string) — LLM model (default: gpt-4o-mini)
- llmBaseUrl (string) — LLM API base URL
- render (boolean) — Use browser rendering
Example
// Ask your AI assistant:
"Summarize https://longform-article.com in a few sentences"
Usage Tips
When to Use Render Mode
- JavaScript-heavy sites (SPAs, React apps)
- Content loaded dynamically after page load
- Sites with lazy-loading images or infinite scroll
When to Use Stealth Mode
- Sites protected by Cloudflare or similar bot detection
- Pages that block headless browsers
- Sites with aggressive rate limiting
Best Practices
- Start simple — Use basic fetch first, escalate to render/stealth only if needed
- Use selectors — Extract only what you need with selector and exclude
- Set maxTokens — Prevent large pages from overwhelming your context window
- Batch similar requests — Use webpeel_batch for multiple URLs from the same site
- Respect rate limits — Add delays when crawling large sites
Troubleshooting
Server not starting
Check your MCP client's logs. Common issues:
- npx not in PATH → Install Node.js v18+
- WebPeel package not found → Run npx webpeel mcp manually first
- JSON syntax error → Validate your config file
Tools not appearing
- Restart your MCP client
- Check the config file syntax
- Verify the server starts: run npx webpeel mcp manually
Fetch timing out
- Use render=false for simple HTML pages (faster)
- Increase wait time for slow-loading dynamic content
- Check if the site blocks automated access (try stealth=true)