MCP Hub

fetcher-mcp

by: jae-jae

MCP server for fetching web page content using the Playwright headless browser.

589 · created 19/03/2025
Tags: Playwright, web-scraping

📌Overview

Purpose: Fetcher MCP is an MCP server that retrieves web page content with a headless browser, aimed at dynamic, JavaScript-driven web applications.

Overview: Fetcher MCP uses Playwright to execute JavaScript, so it can load dynamic content before reading the page. It then extracts the primary content and returns it in a configurable output format.

Key Features:

  • Intelligent Content Extraction: Automatically removes non-essential elements, focusing on the main content of web pages, thereby simplifying downstream data processing.

  • Parallel Processing: Fetches multiple URLs concurrently, which shortens large batch retrievals.

  • Configurable Parameters: Offers extensive customization options for timeouts, output formats, and content extraction techniques to adapt to various web scraping scenarios.


Fetcher MCP

MCP server for fetching web page content using the Playwright headless browser.

Advantages

  • JavaScript Support: Handles dynamic web content and modern web applications.
  • Intelligent Content Extraction: Built-in Readability algorithm extracts main content while removing ads and non-essential elements.
  • Flexible Output Format: Supports both HTML and Markdown output formats.
  • Parallel Processing: Concurrently fetches multiple URLs for improved efficiency.
  • Resource Optimization: Blocks unnecessary resources to reduce bandwidth usage.
  • Robust Error Handling: Ensures reliable operation with comprehensive error handling and logging.
  • Configurable Parameters: Provides control over timeouts, content extraction, and output formatting.
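
As an illustration of the Resource Optimization point, the sketch below shows one way a Playwright route handler could drop unneeded requests. The `RouteLike` types, the `blockUnneededResources` name, and the set of blocked resource types are assumptions made for this example, not the server's actual code.

```typescript
// Minimal structural types so the sketch compiles without Playwright installed.
// Playwright's Route and Request objects expose methods with these names.
interface RequestLike {
  resourceType(): string;
}
interface RouteLike {
  request(): RequestLike;
  abort(): Promise<void>;
  continue(): Promise<void>;
}

// Assumption: which resource types count as "unnecessary" is a guess,
// not the list fetcher-mcp actually uses.
const BLOCKED_TYPES = new Set(["image", "media", "font", "stylesheet"]);

// Abort requests for blocked resource types and let everything else through.
function blockUnneededResources(route: RouteLike): Promise<void> {
  const type = route.request().resourceType();
  return BLOCKED_TYPES.has(type) ? route.abort() : route.continue();
}
```

With Playwright available, a handler like this could be registered through `await page.route("**/*", blockUnneededResources)` before navigating.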

Quick Start

Run directly with npx:

npx -y fetcher-mcp

Debug Mode

Enable debugging with the --debug option:

npx -y fetcher-mcp --debug

MCP Configuration

Configure this MCP server in Claude Desktop:

  • On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
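
If the configuration file is generated by a script, the entry above could be built as in the sketch below. The `fetcherEntry` helper is hypothetical, and passing `--debug` through `args` is an assumption based on the command-line option shown under Debug Mode.

```typescript
// Shape of one server entry in claude_desktop_config.json, as shown above.
interface McpServerEntry {
  command: string;
  args: string[];
}

// Build the "fetcher" entry; `debug` appends the --debug option (assumed to
// work the same way here as on the command line).
function fetcherEntry(debug: boolean): McpServerEntry {
  const args = ["-y", "fetcher-mcp"];
  if (debug) args.push("--debug");
  return { command: "npx", args };
}

const config = { mcpServers: { fetcher: fetcherEntry(true) } };
console.log(JSON.stringify(config, null, 2));
```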

Features

fetch_url

Retrieve web page content from a specified URL using Playwright.

Parameters:

  • url: The URL of the web page (required).
  • timeout: Page loading timeout in milliseconds (default: 30000).
  • waitUntil: Event at which navigation is considered complete (options: 'load', 'domcontentloaded', 'networkidle', 'commit'; default: 'load').
  • extractContent: Extract main content (default: true).
  • maxLength: Maximum length of returned content (default: no limit).
  • returnHtml: Return HTML content instead of Markdown (default: false).
  • waitForNavigation: Wait for additional navigation (default: false).
  • navigationTimeout: Maximum time to wait for additional navigation, in milliseconds (default: 10000).
  • disableMedia: Disable media resources (default: true).
  • debug: Enable debug mode (overrides command line flag).
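
To make the parameter list concrete, here is a sketch of a typed argument object wrapped in an MCP `tools/call` request. The `FetchUrlArgs` interface and `fetchUrlRequest` helper are illustrative names; the field names and allowed `waitUntil` values come from the list above, and the example URL and values are placeholders.

```typescript
// Parameters of fetch_url as listed above; only `url` is required.
interface FetchUrlArgs {
  url: string;
  timeout?: number;
  waitUntil?: "load" | "domcontentloaded" | "networkidle" | "commit";
  extractContent?: boolean;
  maxLength?: number;
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number;
  disableMedia?: boolean;
  debug?: boolean;
}

// Wrap the arguments in a JSON-RPC 2.0 "tools/call" request, the message
// an MCP client sends to invoke a tool.
function fetchUrlRequest(id: number, args: FetchUrlArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch_url", arguments: args },
  };
}

const request = fetchUrlRequest(1, {
  url: "https://example.com",
  waitUntil: "networkidle",
  timeout: 60000,
  maxLength: 5000,
});
console.log(JSON.stringify(request, null, 2));
```

Omitted fields fall back to the server-side defaults listed above, so a request only needs to name the values it changes.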

fetch_urls

Batch retrieve web page content from multiple URLs in parallel.

Parameters:

  • urls: Array of URLs to fetch (required).
  • Other parameters are the same as fetch_url.
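
The batch behaviour can be pictured with the sketch below, which starts every fetch at once and records failures per URL instead of aborting the batch. It is not the server's implementation: `fetchAll`, `FetchResult`, and the injected `fetchOne` function are hypothetical names standing in for whatever drives Playwright internally.

```typescript
// Result for one URL: either the fetched content or an error message.
type FetchResult =
  | { url: string; ok: true; content: string }
  | { url: string; ok: false; error: string };

// Hypothetical single-page fetcher supplied by the caller.
type FetchOne = (url: string) => Promise<string>;

// Start all fetches concurrently; a failure on one URL is captured in its
// own result and does not cancel the others. Results keep the input order.
async function fetchAll(urls: string[], fetchOne: FetchOne): Promise<FetchResult[]> {
  const tasks = urls.map(async (url): Promise<FetchResult> => {
    try {
      return { url, ok: true, content: await fetchOne(url) };
    } catch (err) {
      return { url, ok: false, error: String(err) };
    }
  });
  return Promise.all(tasks);
}
```

Catching errors inside each task is a design choice: `Promise.all` alone would reject on the first failure and discard the pages that loaded successfully.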

Tips

Handling Special Website Scenarios

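  • Anti-Crawler Mechanisms: Tune the waiting parameters (timeout, waitUntil, waitForNavigation, navigationTimeout) so that slow or redirecting pages finish loading before content is extracted.
  • Preserve Original HTML Structure: If content extraction fails or drops needed content, request the original HTML instead of the extracted content.
  • Return Content as HTML: Set returnHtml when HTML is needed instead of Markdown.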
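
The scenarios above might translate into `fetch_url` arguments roughly as sketched here. The mapping is an assumption inferred from the parameter names, and the preset keys, the `argsFor` helper, and the 20000 ms value are illustrative only.

```typescript
// Argument presets for the scenarios above, using fetch_url parameter names.
// Which parameters fit each scenario is an assumption, not documented behaviour.
const scenarioArgs = {
  // Pages that redirect or run a verification step before showing content.
  antiCrawler: { waitForNavigation: true, navigationTimeout: 20000 },
  // Keep the original page when main-content extraction fails.
  preserveStructure: { extractContent: false, returnHtml: true },
  // Ask for HTML instead of the default Markdown.
  htmlOutput: { returnHtml: true },
};

// Merge a preset with the target URL to form fetch_url arguments.
function argsFor(url: string, scenario: keyof typeof scenarioArgs) {
  return { url, ...scenarioArgs[scenario] };
}

console.log(argsFor("https://example.com", "antiCrawler"));
```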

Debugging and Authentication

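  • Enable Debug Mode: Set the debug parameter on an individual fetch operation to turn on debug mode for that call only.
  • Custom Cookies for Authentication: Debug mode shows the browser window, which allows logging in manually on sites that require authentication.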

Development

Install Dependencies

npm install

Install Playwright Browser

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

Enable visible browser mode for debugging:

node build/index.js --debug

Related Projects

  • g-search-mcp: MCP server for Google search enabling batch operations.

License

Licensed under the MIT License