If you're feeding web pages to GPT-4, Claude, or any LLM, you're probably burning money on unnecessary tokens.
A typical web page contains 50,000+ tokens of HTML. At GPT-4 Turbo pricing ($0.01/1K tokens), that's $0.50 per page. Process 100 pages a day, and you're spending $50. Scale that to 10,000 pages, and you're at $5,000.
Here's the good news: With smart extraction, you can reduce that by 96%.
The Token Cost Problem
Real benchmark: a single TechCrunch article dropped from 52,340 raw HTML tokens to 1,890 after smart extraction, a 96.4% reduction.
Why does raw HTML use so many tokens?
- Navigation menus — 500-2000 tokens of links
- Footers — Another 500-1000 tokens
- Ads and tracking scripts — 3000-5000 tokens
- CSS classes and IDs — `<div class="flex items-center justify-between px-4 py-2 bg-gray-100">` = 30 tokens
- Boilerplate — Sidebars, related posts, comments
The actual article? Only 1,500-3,000 tokens. Everything else is noise.
5 Techniques to Slash Token Costs
1. Article Detection
The first step is finding the main content. WebPeel uses Mozilla's Readability algorithm to isolate the article:
```js
import { peel } from 'webpeel';

const result = await peel('https://techcrunch.com/article', {
  format: 'markdown'
});

console.log(result.tokens); // 1,890 (vs 52,340 raw)
```
This single step removes 90%+ of the bloat.
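If you want to see what the Readability step does on its own, you can run Mozilla's standalone @mozilla/readability package against HTML you fetch yourself. This is a rough sketch of the underlying algorithm rather than WebPeel's code, and it assumes you have jsdom installed:

```js
import { JSDOM } from 'jsdom';
import { Readability } from '@mozilla/readability';

// Fetch the raw HTML yourself, then let Readability isolate the article.
const url = 'https://techcrunch.com/article';
const html = await fetch(url).then((r) => r.text());

const dom = new JSDOM(html, { url });
const article = new Readability(dom.window.document).parse();

console.log(article?.title);
console.log(article?.textContent.length); // a small fraction of html.length
```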
2. Markdown Conversion
Markdown is way more token-efficient than HTML:
| Content | HTML Tokens | Markdown Tokens |
|---|---|---|
| Link | ~45 | ~15 |
| Heading | ~30 | ~8 |
| Code block | ~80 | ~25 |
| Image | ~60 | ~12 |
Why? HTML has verbose tags, attributes, and class names. Markdown is clean syntax.
HTML:

```html
<a href="https://example.com" class="text-blue-600 hover:underline font-semibold">Read more</a>
```

Markdown:

```markdown
[Read more](https://example.com)
```
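You can verify the gap with any HTML-to-Markdown converter. Here's a quick check using the general-purpose turndown package (not part of WebPeel, just an illustration):

```js
import TurndownService from 'turndown';

const turndown = new TurndownService();
const html =
  '<a href="https://example.com" class="text-blue-600 hover:underline font-semibold">Read more</a>';

console.log(turndown.turndown(html));
// [Read more](https://example.com)
// The class-name noise disappears entirely.
```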
3. Token Budget Control
Sometimes even clean markdown is too long. WebPeel lets you cap the output:
```js
const result = await peel(url, {
  maxTokens: 2000
});

// Guarantees result.content is ≤2000 tokens
// Perfect for fitting in LLM context windows
```
This uses smart truncation — it cuts from the end, preserving the introduction and key points.
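The post doesn't show the truncation internals, but the idea is easy to sketch. Assuming a rough 4-characters-per-token estimate (an assumption, not WebPeel's tokenizer), a paragraph-aware version might look like this:

```js
// Illustrative truncation sketch, not WebPeel's implementation.
// Keeps whole paragraphs from the top until the budget is spent.
function truncateToBudget(markdown, maxTokens) {
  const estimateTokens = (text) => Math.ceil(text.length / 4); // rough estimate
  const kept = [];
  let used = 0;

  for (const paragraph of markdown.split('\n\n')) {
    const cost = estimateTokens(paragraph);
    if (used + cost > maxTokens) break; // everything after this is dropped
    kept.push(paragraph);
    used += cost;
  }

  return kept.join('\n\n');
}
```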
4. Quality Scoring & Retry
Not all extractions are good. WebPeel scores extraction quality and retries with browser rendering if needed:
```js
const result = await peel(url);
console.log(result.quality); // quality score (higher is better)

// Quality factors:
// - Text length (too short = bad extraction)
// - Text/HTML ratio (too low = lots of boilerplate)
// - Presence of article indicators (byline, date, paragraphs)

if (result.quality < 50) {
  // WebPeel auto-escalates to browser mode
}
```
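The exact scoring formula isn't documented in this post, but a heuristic built from those same factors might look roughly like this. The weights and thresholds below are illustrative assumptions, not WebPeel's values:

```js
// Illustrative quality heuristic, not WebPeel's actual implementation.
// Scores an extraction 0-100 from the factors listed above.
function scoreExtraction({ text, rawHtml }) {
  let score = 0;

  // 1. Text length: very short output usually means a failed extraction.
  if (text.length > 2000) score += 40;
  else if (text.length > 500) score += 20;

  // 2. Text-to-HTML ratio: a low ratio suggests boilerplate survived.
  const ratio = text.length / Math.max(rawHtml.length, 1);
  if (ratio > 0.1) score += 30;
  else if (ratio > 0.03) score += 15;

  // 3. Article indicators: multiple paragraphs, a byline, a year.
  if ((text.match(/\n\n/g) || []).length >= 3) score += 15;
  if (/\bby\s+[A-Z][a-z]+/.test(text)) score += 10;
  if (/\b\d{4}\b/.test(text)) score += 5;

  return score;
}
```

With a rule of thumb like this, anything scoring below ~50 is worth re-extracting with a headless browser, which is roughly what the auto-escalation above does for you.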
5. Content Fingerprinting (Caching)
Web pages don't change every second. WebPeel generates a content fingerprint (hash of the HTML) so you can avoid re-processing unchanged pages:
```js
const result = await peel(url);
console.log(result.fingerprint); // "abc123def456"

// On next fetch:
// - If fingerprint matches, return cached extraction
// - If different, re-extract
// - Saves both API calls and token costs
```
This is huge for monitoring use cases (checking docs, blogs, pricing pages).
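The cache layer itself is your code rather than WebPeel's. A minimal in-memory sketch, with `sendToLLM` standing in for whatever downstream processing you pay tokens for, might look like this:

```js
import { peel } from 'webpeel';

// Minimal in-memory cache: remember the last fingerprint we processed per URL.
const lastProcessed = new Map();

async function processIfChanged(url, sendToLLM) {
  const result = await peel(url);

  if (lastProcessed.get(url) === result.fingerprint) {
    return null; // content unchanged: skip the expensive LLM call
  }

  lastProcessed.set(url, result.fingerprint);
  return sendToLLM(result.content); // only spend tokens when the page changed
}
```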
Real Benchmarks
We tested WebPeel on 50 popular websites. Here's a sample of the results:
| Website | Raw HTML Tokens | Smart Extraction Tokens | Reduction |
|---|---|---|---|
| TechCrunch article | 52,340 | 1,890 | 96.4% |
| GitHub README | 8,420 | 3,120 | 62.9% |
| Medium blog post | 41,200 | 2,340 | 94.3% |
| Next.js docs page | 18,900 | 4,200 | 77.8% |
| Product page (e-commerce) | 62,100 | 1,120 | 98.2% |
| Wikipedia article | 71,400 | 5,600 | 92.2% |
Average reduction: 87.3%
💰 ROI Calculation for Businesses
Scenario: AI research tool processing 10,000 pages/month
Without smart extraction:
- Average 50,000 tokens/page
- 500M tokens/month
- At $0.01/1K tokens = $5,000/month
With smart extraction:
- Average 2,500 tokens/page (95% reduction)
- 25M tokens/month
- At $0.01/1K tokens = $250/month
Savings: $4,750/month ($57,000/year)
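To plug in your own volumes, the same arithmetic fits in a few lines (assuming a flat $0.01 per 1K input tokens):

```js
// Monthly LLM input cost, assuming $0.01 per 1K input tokens.
function monthlyCost(pagesPerMonth, tokensPerPage, pricePer1K = 0.01) {
  return (pagesPerMonth * tokensPerPage / 1000) * pricePer1K;
}

console.log(monthlyCost(10_000, 50_000)); // 5000 -> $5,000/month on raw HTML
console.log(monthlyCost(10_000, 2_500));  // 250  -> $250/month with smart extraction
```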
Implementation Guide
Step 1: Install WebPeel
```bash
npm install webpeel
```
Step 2: Use Smart Extraction by Default
```js
import { peel } from 'webpeel';

async function fetchForLLM(url) {
  const result = await peel(url, {
    format: 'markdown', // Clean output
    maxTokens: 3000     // Cap size
  });

  // Use result.fingerprint for change detection
  return {
    content: result.content,
    tokens: result.tokens,
    fingerprint: result.fingerprint
  };
}
```
Step 3: Monitor Token Usage
```js
const result = await fetchForLLM(url);

console.log(`Fetched: ${url}`);
console.log(`Tokens: ${result.tokens}`);
console.log(`Cost: $${(result.tokens / 1000 * 0.01).toFixed(4)}`);

// Log to analytics for cost tracking
```
Step 4: Batch Processing with Delays
When processing many pages, add delays to avoid rate limits:
```js
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const urls = [/* ... */]; // 1000 URLs

for (const url of urls) {
  const result = await fetchForLLM(url);

  // Store or process result
  await saveToDatabase(result);

  // Rate limit: 1 request/second
  await sleep(1000);
}
```
Advanced: Selective Extraction
For highly structured pages, you can extract only what you need:
```js
// Extract just the pricing from a page
const result = await peel(url, {
  extract: {
    title: '.plan-name',
    price: '.plan-price',
    features: '.feature-list li'
  }
});

console.log(result.extracted);
// Result: Clean JSON, ~100 tokens
// vs. full-page markdown: ~3,000 tokens
```
Best Practices
- Always use markdown format — 3-5x more efficient than HTML
- Set a token budget — `maxTokens` prevents bloat
- Enable caching — Avoid re-processing unchanged content
- Monitor quality scores — Low scores = bad extraction = wasted tokens
- Use structured extraction when possible — JSON is more efficient than full-page markdown
- Track costs per page — Identify high-cost pages and optimize them
Common Mistakes
❌ Mistake 1: Feeding Raw HTML to LLMs
```js
// BAD: 50K+ tokens
const html = await fetch(url).then(r => r.text());
await llm.chat(html);
```
✅ Fix: Use Smart Extraction
```js
// GOOD: 2K tokens
const result = await peel(url);
await llm.chat(result.content);
```
❌ Mistake 2: No Token Budget
```js
// BAD: Some pages return 20K tokens
const result = await peel(url);
```
✅ Fix: Set a Limit
```js
// GOOD: Guaranteed ≤3K tokens
const result = await peel(url, { maxTokens: 3000 });
```
❌ Mistake 3: Re-processing Unchanged Pages
```js
// BAD: Fetches the same page every hour
setInterval(() => fetchPage(url), 3600000);
```
✅ Fix: Use Content Fingerprinting
```js
// GOOD: Only re-processes content that has actually changed
const result = await peel(url);
if (result.fingerprint !== lastFingerprint) {
  processNewContent(result);
}
```
The Bottom Line
If you're building AI agents that fetch web content, smart extraction isn't optional — it's essential.
The difference between raw HTML and smart extraction is:
- 96% token reduction
- 25x cost savings
- Faster LLM responses (fewer tokens = faster generation)
- Better results (less noise = higher quality answers)
WebPeel handles all of this automatically. Install it, use `format: 'markdown'`, and watch your token costs drop.
Ready to cut your LLM costs?
Get started with WebPeel →