fetcher-mcp
by: jae-jae
MCP server for fetching web page content using the Playwright headless browser.
📌Overview
Purpose: Fetcher MCP retrieves web page content through a headless browser, so it also works with dynamic, JavaScript-driven web applications.
Overview: Fetcher MCP uses Playwright to execute JavaScript before reading a page, so dynamically rendered content is included in the result. It extracts the main content of each page and can return it as HTML or Markdown.
Key Features:
- Intelligent Content Extraction: Automatically removes non-essential elements, focusing on the main content of web pages, thereby simplifying downstream data processing.
- Parallel Processing: Allows for concurrent fetching of multiple URLs, significantly boosting efficiency for large-scale data retrieval operations.
- Configurable Parameters: Offers extensive customization options for timeouts, output formats, and content extraction techniques to adapt to various web scraping scenarios.
Fetcher MCP
MCP server for fetching web page content using the Playwright headless browser.
Advantages
- JavaScript Support: Executes JavaScript to handle dynamic web content and modern web applications.
- Intelligent Content Extraction: Built-in Readability algorithm extracts the main content, removing ads, navigation, and other non-essential elements.
- Flexible Output Format: Supports both HTML and Markdown output formats.
- Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs for improved efficiency.
- Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage.
- Robust Error Handling: Comprehensive error handling and logging for reliable operation.
- Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting.
Quick Start
Run directly with npx:
npx -y fetcher-mcp
For first-time setup, install the required browser:
npx playwright install chromium
Debug Mode
Run with the --debug option to show the browser window for debugging:
npx -y fetcher-mcp --debug
MCP Configuration
Configure this MCP server in Claude Desktop:
On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
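If claude_desktop_config.json already lists other servers, the fetcher entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge. The helper name withFetcherServer, its showBrowser flag, and the sample "other" server are hypothetical and exist only for illustration; passing --debug through "args" is an assumption that mirrors the command-line usage above.

```typescript
// Shape of the part of claude_desktop_config.json that matters here.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [otherKey: string]: unknown; // any unrelated settings are carried over untouched
}

// Hypothetical helper: returns a copy of `config` with a "fetcher" entry added.
// The input object is not modified.
function withFetcherServer(
  config: ClaudeDesktopConfig,
  showBrowser = false,
): ClaudeDesktopConfig {
  const args = ["-y", "fetcher-mcp"];
  if (showBrowser) {
    args.push("--debug"); // assumption: same flag as `npx -y fetcher-mcp --debug`
  }
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      fetcher: { command: "npx", args },
    },
  };
}

// Example: an existing config with one unrelated (made-up) server.
const existingConfig: ClaudeDesktopConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};

const mergedConfig = withFetcherServer(existingConfig);
console.log(JSON.stringify(mergedConfig, null, 2));
```

The printed JSON can be written back to the config file; existing server entries are kept alongside the new "fetcher" entry.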
Features
fetch_url
Retrieve web page content from a specified URL.
- Uses the Playwright headless browser to execute JavaScript.
- Supports intelligent extraction of main content and conversion to Markdown.
Parameters:
- url (required): The URL to fetch.
- timeout: Page loading timeout in milliseconds, default 30000 (30s).
- waitUntil: When navigation is considered complete; options: 'load', 'domcontentloaded', 'networkidle', 'commit'; default 'load'.
- extractContent: Whether to extract the main content, default true.
- maxLength: Maximum content length in characters, no limit by default.
- returnHtml: Return HTML instead of Markdown, default false.
- waitForNavigation: Wait for additional navigation after load, useful for anti-bot verification, default false.
- navigationTimeout: Maximum wait for additional navigation in ms, default 10000.
- disableMedia: Disable media resources to save bandwidth, default true.
- debug: Enable debug mode for this request, overrides server flag.
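The parameters above can be written down as a type, which makes it easier to see what a fetch_url call looks like on the wire. This is a sketch, not the server's own code: FetchUrlArgs and buildFetchUrlCall are hypothetical names, the URL is a placeholder, and the envelope is the generic MCP "tools/call" JSON-RPC request that an MCP client normally builds for you.

```typescript
type WaitUntil = "load" | "domcontentloaded" | "networkidle" | "commit";

// Arguments of fetch_url as listed above; defaults are noted in comments
// and are applied by the server when a field is omitted.
interface FetchUrlArgs {
  url: string;                 // required
  timeout?: number;            // ms, default 30000
  waitUntil?: WaitUntil;       // default "load"
  extractContent?: boolean;    // default true
  maxLength?: number;          // characters, no limit by default
  returnHtml?: boolean;        // default false (Markdown output)
  waitForNavigation?: boolean; // default false
  navigationTimeout?: number;  // ms, default 10000
  disableMedia?: boolean;      // default true
  debug?: boolean;             // overrides the server's --debug flag
}

// Generic MCP tool-call envelope (JSON-RPC 2.0).
interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

// Hypothetical helper: wraps the arguments in a tools/call request.
function buildFetchUrlCall(id: number, args: FetchUrlArgs): ToolCallRequest<FetchUrlArgs> {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch_url", arguments: args },
  };
}

// Example: a slow page, waiting until the network is idle.
const fetchUrlCall = buildFetchUrlCall(1, {
  url: "https://example.com",
  timeout: 60000,
  waitUntil: "networkidle",
});
console.log(JSON.stringify(fetchUrlCall, null, 2));
```

Only url is mandatory; every omitted field falls back to the default listed above.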
fetch_urls
Batch retrieval of web page content from multiple URLs in parallel.
- Uses multi-tab parallel fetching.
- Returns combined results separated clearly.
Parameters:
- urls (required): Array of URLs.
- Other parameters are the same as for fetch_url.
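A batch call differs from a single fetch only in taking urls instead of url. The sketch below builds such a request under the same assumptions as any MCP "tools/call" message. SharedFetchOptions and buildFetchUrlsCall are hypothetical names, the URLs are placeholders, and dropping duplicate URLs and rejecting an empty list are conveniences of this sketch, not documented server behaviour.

```typescript
// Options shared with fetch_url (every field is optional).
interface SharedFetchOptions {
  timeout?: number;
  waitUntil?: "load" | "domcontentloaded" | "networkidle" | "commit";
  extractContent?: boolean;
  maxLength?: number;
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number;
  disableMedia?: boolean;
  debug?: boolean;
}

interface FetchUrlsArgs extends SharedFetchOptions {
  urls: string[]; // required
}

// Hypothetical helper: one tools/call request covering several URLs.
function buildFetchUrlsCall(
  id: number,
  urls: string[],
  options: SharedFetchOptions = {},
) {
  // Sketch-only convenience: avoid fetching the same page twice.
  const uniqueUrls = Array.from(new Set(urls));
  if (uniqueUrls.length === 0) {
    throw new Error("fetch_urls needs at least one URL");
  }
  const args: FetchUrlsArgs = { ...options, urls: uniqueUrls };
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch_urls", arguments: args },
  };
}

// Example: two pages fetched with the same options.
const fetchUrlsCall = buildFetchUrlsCall(
  2,
  ["https://example.com/a", "https://example.com/b", "https://example.com/a"],
  { maxLength: 5000 },
);
console.log(JSON.stringify(fetchUrlsCall, null, 2));
```

The options apply to every URL in the batch, matching the note that the remaining parameters are the same as for fetch_url.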
Tips
Handling Special Website Scenarios
- Anti-Crawler Mechanisms: Use waitForNavigation: true to wait for full page loading (useful for CAPTCHA, redirects).
- Increase Timeout: Set longer timeouts (e.g., 60000 ms) for slow-loading pages.
- Preserve Original HTML: If extraction fails, set extractContent: false and returnHtml: true.
- Fetch Complete Content: To get all page content, set extractContent: false.
- Return HTML Format: Set returnHtml: true to get HTML instead of Markdown.
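The tips above are all combinations of a few fetch parameters, so they can be kept as named presets and merged as needed. This is only an illustration: the preset names and the combineTips helper are hypothetical, while the parameter names and values come from the tips themselves.

```typescript
// Subset of fetch parameters that the tips above touch.
interface TipOptions {
  timeout?: number;
  extractContent?: boolean;
  returnHtml?: boolean;
  waitForNavigation?: boolean;
}

type TipName = "antiCrawler" | "slowPage" | "originalHtml" | "completeContent" | "htmlOutput";

// One preset per tip (names are made up for this sketch).
const tipPresets: Record<TipName, TipOptions> = {
  antiCrawler: { waitForNavigation: true },                    // CAPTCHA, redirects
  slowPage: { timeout: 60000 },                                // slow-loading pages
  originalHtml: { extractContent: false, returnHtml: true },   // when extraction fails
  completeContent: { extractContent: false },                  // keep all page content
  htmlOutput: { returnHtml: true },                            // HTML instead of Markdown
};

// Hypothetical helper: merges presets left to right; later presets win on conflict.
function combineTips(...names: TipName[]): TipOptions {
  return names.reduce<TipOptions>(
    (merged, tipName) => ({ ...merged, ...tipPresets[tipName] }),
    {},
  );
}

// Example: a slow page behind an anti-bot check.
const slowProtectedPage = combineTips("antiCrawler", "slowPage");
console.log(slowProtectedPage);
```

The resulting object can be spread into the arguments of a fetch_url or fetch_urls call next to url or urls.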
Debugging and Authentication
- Enable Debug Mode Per Request: Set debug: true in fetch parameters to show the browser window.
- Manual Login: Run with --debug or debug: true to manually log in to a website before fetching.
- Debug Browser Interaction: When debug is enabled, the browser window stays open for manual login before fetching content.
Development
Install Dependencies
npm install
Install Playwright Browser
npm run install-browser
Build the Server
npm run build
Debugging
Use MCP Inspector:
npm run inspector
Run server with visible browser:
node build/index.js --debug
Related Projects
- g-search-mcp: A powerful MCP server for Google search enabling parallel multi-keyword batch operations.
License
Licensed under the MIT License