fetcher-mcp

by: jae-jae

MCP server for fetching web page content using the Playwright headless browser.

589 · created 19/03/2025
Playwright
web-scraping

📌Overview

Purpose: Fetcher MCP retrieves web page content through a headless browser and is designed for dynamic, JavaScript-driven web applications.

Overview: Fetcher MCP uses Playwright to execute JavaScript, so it can read content that only appears after a page has rendered. It extracts the primary content of the page and returns it as either HTML or Markdown.

Key Features:

  • Intelligent Content Extraction: Automatically removes non-essential elements, focusing on the main content of web pages, thereby simplifying downstream data processing.

  • Parallel Processing: Fetches multiple URLs concurrently, which speeds up batch retrieval.

  • Configurable Parameters: Offers extensive customization options for timeouts, output formats, and content extraction techniques to adapt to various web scraping scenarios.


Fetcher MCP

MCP server for fetching web page content using the Playwright headless browser.

Advantages

  • JavaScript Support: Executes JavaScript to handle dynamic web content and modern web applications.
  • Intelligent Content Extraction: Built-in Readability algorithm extracts the main content, removing ads, navigation, and other non-essential elements.
  • Flexible Output Format: Supports both HTML and Markdown output formats.
  • Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs for improved efficiency.
  • Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage.
  • Robust Error Handling: Comprehensive error handling and logging for reliable operation.
  • Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting.

Quick Start

Run directly with npx:

npx -y fetcher-mcp

For first-time setup, install the required browser:

npx playwright install chromium

Debug Mode

Run with the --debug option to show the browser window for debugging:

npx -y fetcher-mcp --debug

MCP Configuration

Configure this MCP server in Claude Desktop:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
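
If claude_desktop_config.json already lists other servers, the fetcher entry needs to be merged into the file rather than pasted over it. The following is a minimal Python sketch of that merge, assuming the file layout shown above. The helper names add_fetcher_server and update_config_file are hypothetical and are not part of fetcher-mcp.

```python
import json
from pathlib import Path


def add_fetcher_server(config: dict, debug: bool = False) -> dict:
    """Return a copy of a Claude Desktop config with the fetcher entry added.

    Hypothetical helper for illustration; other server entries are kept as-is.
    """
    args = ["-y", "fetcher-mcp"]
    if debug:
        # --debug shows the browser window (see Debug Mode above).
        args.append("--debug")
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["fetcher"] = {"command": "npx", "args": args}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, debug: bool = False) -> None:
    """Read the config at `path` if it exists, merge the entry, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_fetcher_server(config, debug)
    path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
```

On macOS this could be called as update_config_file(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"). Back up the file first, since the sketch rewrites it in place.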

Features

fetch_url

Retrieve web page content from a specified URL.

  • Uses the Playwright headless browser to execute JavaScript.
  • Supports intelligent extraction of main content and conversion to Markdown.

Parameters:

  • url (required): The URL to fetch.
  • timeout: Page loading timeout in milliseconds, default 30000 (30s).
  • waitUntil: When navigation is considered complete; options: 'load', 'domcontentloaded', 'networkidle', 'commit'; default 'load'.
  • extractContent: Whether to extract the main content, default true.
  • maxLength: Maximum content length in characters, no limit by default.
  • returnHtml: Return HTML instead of Markdown, default false.
  • waitForNavigation: Wait for additional navigation after load, useful for anti-bot verification, default false.
  • navigationTimeout: Maximum wait for additional navigation in ms, default 10000.
  • disableMedia: Disable media resources to save bandwidth, default true.
  • debug: Enable debug mode for this request; overrides the server's --debug flag.
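
MCP clients invoke a tool by sending a JSON-RPC tools/call request. The Python sketch below builds such a request for fetch_url. The parameter names and defaults are taken from the list above; the helper name, the request id, and the example URL are illustrative assumptions. In practice an MCP client library would normally handle the connection and handshake.

```python
import json

# Defaults as documented in the parameter list above (maxLength and debug
# have no documented default value, so they are tracked separately).
FETCH_URL_DEFAULTS = {
    "timeout": 30000,
    "waitUntil": "load",
    "extractContent": True,
    "returnHtml": False,
    "waitForNavigation": False,
    "navigationTimeout": 10000,
    "disableMedia": True,
}
OPTIONAL_WITHOUT_DEFAULT = {"maxLength", "debug"}
WAIT_UNTIL_OPTIONS = {"load", "domcontentloaded", "networkidle", "commit"}


def build_fetch_url_request(url: str, request_id: int = 1, **overrides) -> dict:
    """Build a tools/call request for fetch_url (hypothetical helper)."""
    unknown = set(overrides) - set(FETCH_URL_DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if overrides.get("waitUntil", "load") not in WAIT_UNTIL_OPTIONS:
        raise ValueError("waitUntil must be one of: " + ", ".join(sorted(WAIT_UNTIL_OPTIONS)))
    # Send only the URL plus explicit overrides; the server applies its defaults.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "fetch_url", "arguments": {"url": url, **overrides}},
    }


request = build_fetch_url_request(
    "https://example.com", waitUntil="networkidle", maxLength=5000
)
print(json.dumps(request, indent=2))
```

Rejecting unknown keys locally is a design choice of this sketch: it catches misspelled parameter names before a browser page is ever opened.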

fetch_urls

Batch retrieval of web page content from multiple URLs in parallel.

  • Uses multi-tab parallel fetching.
  • Returns the combined results, with the content for each URL clearly separated.

Parameters:

  • urls (required): Array of URLs.
  • Other parameters same as fetch_url.
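
A batch call has the same shape as a single fetch, except that it takes a urls array. The Python sketch below builds such a request. It assumes that the shared options apply to every URL in the batch; the helper name and the example URLs are illustrative.

```python
def build_fetch_urls_request(urls: list[str], request_id: int = 1, **options) -> dict:
    """Build a tools/call request for fetch_urls (hypothetical helper)."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    # Drop repeated URLs while keeping their order, so that no page is
    # fetched twice in the same batch (a choice made by this sketch).
    unique_urls = list(dict.fromkeys(urls))
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "fetch_urls",
            "arguments": {"urls": unique_urls, **options},
        },
    }


batch = build_fetch_urls_request(
    ["https://example.com/a", "https://example.com/b", "https://example.com/a"],
    returnHtml=True,
)
```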

Tips

Handling Special Website Scenarios

  • Anti-Crawler Mechanisms
    Set waitForNavigation: true to wait for additional navigation after the initial load (useful for CAPTCHA checks and redirects).
  • Increase Timeout
    Set longer timeouts (e.g., 60000 ms) for slow-loading pages.
  • Preserve Original HTML
    If extraction fails, set extractContent: false and returnHtml: true.
  • Fetch Complete Content
    To get all page content, set extractContent: false.
  • Return HTML Format
    Set returnHtml: true to get HTML instead of Markdown.
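
These tips can be combined into an escalating retry plan: start with the defaults, then wait for extra navigation with a longer timeout, and finally fall back to the raw HTML. The Python sketch below only generates the fetch_url argument sets in that order. The ordering and the helper name are illustrative assumptions, not behaviour built into fetcher-mcp.

```python
def escalating_fetch_arguments(url: str, slow_timeout_ms: int = 60000) -> list[dict]:
    """Return fetch_url argument sets to try in order (hypothetical helper)."""
    patient = {
        "url": url,
        "waitForNavigation": True,   # anti-crawler checks and redirects
        "timeout": slow_timeout_ms,  # slow-loading pages
    }
    return [
        {"url": url},  # 1. server defaults: extracted content as Markdown
        patient,       # 2. same output, but more patient
        # 3. skip extraction and keep the original HTML
        {**patient, "extractContent": False, "returnHtml": True},
    ]


attempts = escalating_fetch_arguments("https://example.com")
```

A caller would send each argument set in turn and stop at the first response that contains usable content.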

Debugging and Authentication

  • Enable Debug Mode Per Request
    Set debug: true in fetch parameters to show the browser window.
  • Manual Login
    Run with --debug or debug: true to manually log in to a website before fetching.
  • Debug Browser Interaction
    When debug is enabled, the browser window stays open for manual login before fetching content.

Development

Install Dependencies

npm install

Install Playwright Browser

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector:

npm run inspector

Run server with visible browser:

node build/index.js --debug

Related Projects

  • g-search-mcp: A powerful MCP server for Google search enabling parallel multi-keyword batch operations.

License

Licensed under the MIT License