mcp-DEEPwebresearch

by: qpd-v

Enhanced MCP server for deep web research

Created: 13/01/2025
Tags: web, research

📌Overview

Purpose: The MCP Deep Web Research Server supports advanced web research by combining a managed search queue with relevance-scored content extraction.

Overview: This is a Model Context Protocol (MCP) server for real-time web research, focused on efficient content extraction and analysis. It is a fork of mcp-webresearch, enhanced for deep web research.

Key Features:

  • Intelligent Search Queue System: Manages batch search operations with built-in rate limiting, progress tracking, and automatic retries to ensure efficient and organized web research.

  • Enhanced Content Extraction: Employs TF-IDF scoring, keyword proximity analysis, and readability scoring, enabling thorough analysis and clean formatting of extracted web content.
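
To make the TF-IDF scoring mentioned above concrete, here is a minimal sketch of how query terms might be weighted against a set of extracted pages. It is not taken from the server's source: the function names (tokenize, tfidfScores) and the smoothed IDF formula are assumptions chosen for illustration.

```typescript
// Hypothetical helpers for illustration; not the server's actual implementation.

// Lowercase the text and split it into alphanumeric tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Score each document against the query by summing the TF-IDF weights
// of the distinct query terms.
function tfidfScores(query: string, docs: string[]): number[] {
  const tokenized = docs.map(tokenize);
  const terms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i);

  // Document frequency: the number of documents containing each query term.
  const df: Record<string, number> = {};
  for (const term of terms) {
    df[term] = tokenized.filter((tokens) => tokens.indexOf(term) !== -1).length;
  }

  return tokenized.map((tokens) => {
    if (tokens.length === 0) return 0;
    let score = 0;
    for (const term of terms) {
      const count = tokens.filter((t) => t === term).length;
      const tf = count / tokens.length;
      // Smoothed IDF (an assumption): stays positive even for very common terms.
      const idf = Math.log((1 + docs.length) / (1 + df[term])) + 1;
      score += tf * idf;
    }
    return score;
  });
}

const pages = ["rate limiting for search queues", "cooking pasta at home", ""];
const scores = tfidfScores("search rate", pages);
console.log(scores);
```

Pages that share no terms with the query score zero, so sorting by score pushes unrelated content to the bottom.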


MCP Deep Web Research Server (v0.3.0)

A Model Context Protocol (MCP) server for advanced web research.

Latest Changes

  • Added visit_page tool for direct webpage content extraction
  • Optimized performance within MCP timeout limits
    • Reduced default maxDepth and maxBranching parameters
    • Improved page loading efficiency
    • Enhanced error handling for timeouts

This project is a fork of mcp-webresearch, enhanced for deep web research.

Features

  • Intelligent Search Queue System

    • Batch search operations with rate limiting
    • Queue management with progress tracking
    • Error recovery and automatic retries
  • Enhanced Content Extraction

    • TF-IDF based relevance scoring
    • Keyword proximity analysis
    • Readability scoring
    • Improved HTML structure parsing
  • Core Features

    • Google search integration
    • Research session tracking
    • Markdown conversion
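
The queue behaviour listed under Intelligent Search Queue System (batching, rate limiting, progress tracking, retries) could be sketched roughly as follows. Every name here (QueueOptions, runWithRetry, runQueue) is hypothetical, and the fixed delay between batches is an assumed strategy; the server's real scheduler may differ.

```typescript
// Illustrative sketch only; these names are not the server's API.
interface QueueOptions {
  maxParallel: number; // compare MAX_PARALLEL_SEARCHES
  delayMs: number;     // compare SEARCH_DELAY_MS
  maxRetries: number;  // compare MAX_RETRIES
}

interface QueueResult<T> {
  query: string;
  ok: boolean;
  value?: T;
  error?: string;
  attempts: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run one search, retrying after a failure up to maxRetries extra times.
async function runWithRetry<T>(
  query: string,
  search: (q: string) => Promise<T>,
  opts: QueueOptions
): Promise<QueueResult<T>> {
  let lastError = "";
  for (let attempt = 1; attempt <= opts.maxRetries + 1; attempt++) {
    try {
      const value = await search(query);
      return { query, ok: true, value, attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // Wait before the next attempt, but not after the final one.
      if (attempt <= opts.maxRetries) await sleep(opts.delayMs);
    }
  }
  return { query, ok: false, error: lastError, attempts: opts.maxRetries + 1 };
}

// Process queries in batches of maxParallel, pausing delayMs between batches
// and reporting progress after each batch completes.
async function runQueue<T>(
  queries: string[],
  search: (q: string) => Promise<T>,
  opts: QueueOptions,
  onProgress: (done: number, total: number) => void = () => {}
): Promise<QueueResult<T>[]> {
  const size = Math.max(1, opts.maxParallel);
  const results: QueueResult<T>[] = [];
  for (let i = 0; i < queries.length; i += size) {
    const batch = queries.slice(i, i + size);
    const settled = await Promise.all(
      batch.map((q) => runWithRetry(q, search, opts))
    );
    results.push(...settled);
    onProgress(results.length, queries.length);
    if (i + size < queries.length) await sleep(opts.delayMs);
  }
  return results;
}
```

A failed query is recorded with ok set to false rather than thrown, so one bad search does not abort the rest of the batch.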

Prerequisites

  • Node.js >= 18
  • Playwright

Installation

Global Installation (Recommended)

npm install -g mcp-deepwebresearch

Local Project Installation

npm install mcp-deepwebresearch

Claude Desktop Integration

Add the following entry to your claude_desktop_config.json:

Windows

{
  "mcpServers": {
    "deepwebresearch": {
      "command": "mcp-deepwebresearch",
      "args": []
    }
  }
}

macOS

{
  "mcpServers": {
    "deepwebresearch": {
      "command": "mcp-deepwebresearch",
      "args": []
    }
  }
}

First-time Setup

After installation, run:

npx playwright install chromium

Usage

Start a chat with Claude and send a prompt for web research. Use the agentic-research prompt for deeper web research.

Tools

  1. deep_research

    • Performs comprehensive research.
    • Returns findings, progress, and timing.
  2. parallel_search

    • Runs multiple Google searches in parallel.
  3. visit_page

    • Extracts content from a webpage.
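
For readers who want to see what calling one of these tools looks like on the wire, the sketch below builds an MCP tools/call request as a JSON-RPC 2.0 message. The envelope follows the MCP specification, but the argument names (url, queries) are assumptions made for illustration and buildToolCall is a hypothetical helper; consult the server's tool schemas (for example via tools/list) for the actual parameters.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper: wrap a tool name and its arguments in a request object.
function buildToolCall(
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The argument names "url" and "queries" are assumed, not confirmed by this README.
const visitRequest = buildToolCall("visit_page", { url: "https://example.com" });
const searchRequest = buildToolCall("parallel_search", {
  queries: ["model context protocol", "playwright chromium"],
});

console.log(JSON.stringify(visitRequest, null, 2));
console.log(JSON.stringify(searchRequest, null, 2));
```

In normal use Claude Desktop constructs these messages itself; the sketch is only meant to show the shape of a request.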

Prompts

agentic-research

Guided prompt for thorough web research with instructions on topic exploration and citation.

Configuration Options

The server can be configured through environment variables:

  • MAX_PARALLEL_SEARCHES: Default 5
  • SEARCH_DELAY_MS: Default 200
  • MAX_RETRIES: Default 3
  • TIMEOUT_MS: Default 55000
  • LOG_LEVEL: Default 'info'
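
As a rough illustration of how these variables and defaults fit together, the sketch below reads them from an environment-like object and falls back to the documented default when a value is missing or invalid. The variable names and defaults come from the list above; the loader itself (ServerConfig, intFromEnv, loadConfig) is hypothetical and not the server's own code.

```typescript
// Hypothetical config loader; only the variable names and defaults are from the README.
interface ServerConfig {
  maxParallelSearches: number;
  searchDelayMs: number;
  maxRetries: number;
  timeoutMs: number;
  logLevel: string;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer, using the fallback when the value is unset or invalid.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  const valid = isFinite(parsed) && Math.floor(parsed) === parsed && parsed >= 0;
  return valid ? parsed : fallback;
}

function loadConfig(env: Env): ServerConfig {
  return {
    maxParallelSearches: intFromEnv(env, "MAX_PARALLEL_SEARCHES", 5),
    searchDelayMs: intFromEnv(env, "SEARCH_DELAY_MS", 200),
    maxRetries: intFromEnv(env, "MAX_RETRIES", 3),
    timeoutMs: intFromEnv(env, "TIMEOUT_MS", 55000),
    logLevel: env["LOG_LEVEL"] || "info",
  };
}

// Example: slow the queue down while keeping every other default.
const exampleConfig = loadConfig({ SEARCH_DELAY_MS: "500", MAX_RETRIES: "abc" });
console.log(exampleConfig);
```

In a Node.js process the same function could be called as loadConfig(process.env).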

Error Handling

Common Issues

  1. Rate Limiting

    • Solution: Increase SEARCH_DELAY_MS or lower MAX_PARALLEL_SEARCHES.
  2. Network Timeouts

    • Solution: Keep each request within TIMEOUT_MS (default 55000), for example by lowering maxDepth or maxBranching.
  3. Browser Issues

    • Solution: Confirm that the Playwright browser is installed (npx playwright install chromium).

Development

Setup

pnpm install
pnpm build

Testing

pnpm test

Code Quality

pnpm lint

Contributing

  1. Fork the repository.
  2. Create a feature branch.
  3. Commit changes.
  4. Push to the branch.
  5. Open a Pull Request.

Coding Standards

  • Follow TypeScript best practices.
  • Document new features and APIs.

Requirements

  • Node.js >= 18
  • Playwright

Verified Platforms

  • macOS
  • Windows

License

MIT

Author

qpd-v