mcp-DEEPwebresearch
by: qpd-v
Enhanced MCP server for deep web research
📌Overview
Purpose: The MCP Deep Web Research Server aims to facilitate advanced web research by integrating intelligent search and content extraction capabilities.
Overview: This server is a specialized Model Context Protocol (MCP) server designed for real-time web research, focusing on efficient content extraction and insightful data analysis. It is a fork of an existing project, enhanced to better navigate the complexities of deep web information retrieval.
Key Features:
-
Intelligent Search Queue System: Manages batch search operations with built-in rate limiting, progress tracking, and automatic retries to ensure efficient and organized web research.
-
Enhanced Content Extraction: Employs TF-IDF scoring, keyword proximity analysis, and readability scoring, enabling thorough analysis and clean formatting of extracted web content.
MCP Deep Web Research Server (v0.3.0)
A Model Context Protocol (MCP) server for advanced web research.
Latest Changes
- Added visit_page tool for direct webpage content extraction
- Optimized performance to work within MCP timeout limits
- Reduced default maxDepth and maxBranching parameters
- Improved page loading efficiency
- Added timeout checks throughout the process
- Enhanced error handling for timeouts
This project is a fork of mcp-webresearch by mzxrai, enhanced with additional features for deep web research capabilities. We're grateful to the original creators for their foundational work.
Bring real-time info into Claude with intelligent search queuing, enhanced content extraction, and deep research capabilities.
Features
-
Intelligent Search Queue System
- Batch search operations with rate limiting
- Queue management with progress tracking
- Error recovery and automatic retries
- Search result deduplication
-
Enhanced Content Extraction
- TF-IDF based relevance scoring
- Keyword proximity analysis
- Content section weighting
- Readability scoring
- Improved HTML structure parsing
- Structured data extraction
- Better content cleaning and formatting
-
Core Features
- Google search integration
- Webpage content extraction
- Research session tracking
- Markdown conversion with improved formatting
Prerequisites
- Node.js >= 18 (includes
npm
andnpx
) - Claude Desktop app
Installation
Global Installation (Recommended)
# Install globally using npm
npm install -g mcp-deepwebresearch
# Or using yarn
yarn global add mcp-deepwebresearch
# Or using pnpm
pnpm add -g mcp-deepwebresearch
Local Project Installation
# Using npm
npm install mcp-deepwebresearch
# Using yarn
yarn add mcp-deepwebresearch
# Using pnpm
pnpm add mcp-deepwebresearch
Claude Desktop Integration
After installing the package, add this entry to your claude_desktop_config.json
:
Windows
{
"mcpServers": {
"deepwebresearch": {
"command": "mcp-deepwebresearch",
"args": []
}
}
}
Location: %APPDATA%\Claude\claude_desktop_config.json
macOS
{
"mcpServers": {
"deepwebresearch": {
"command": "mcp-deepwebresearch",
"args": []
}
}
}
Location: ~/Library/Application Support/Claude/claude_desktop_config.json
This config allows Claude Desktop to automatically start the web research MCP server when needed.
First-time Setup
After installation, run this command to install required browser dependencies:
npx playwright install chromium
Usage
Start a chat with Claude and send a prompt that would benefit from web research. For a prebuilt prompt customized for deeper web research, use the agentic-research
prompt available in Claude Desktop via the Paperclip icon → Choose an integration → deepwebresearch → agentic-research.
Tools
-
deep_research
Performs comprehensive research with content analysis.
Arguments:
{ topic: string; maxDepth?: number; // default: 2 maxBranching?: number; // default: 3 timeout?: number; // default: 55000 (55 seconds) minRelevanceScore?: number; // default: 0.7 }
Returns:
{ findings: { mainTopics: Array<{name: string, importance: number}>; keyInsights: Array<{text: string, confidence: number}>; sources: Array<{url: string, credibilityScore: number}>; }; progress: { completedSteps: number; totalSteps: number; processedUrls: number; }; timing: { started: string; completed?: string; duration?: number; operations?: { parallelSearch?: number; deduplication?: number; topResultsProcessing?: number; remainingResultsProcessing?: number; total?: number; }; }; }
-
parallel_search
Performs multiple Google searches in parallel with intelligent queuing.
Arguments:
{ queries: string[], maxParallel?: number }
Note: maxParallel is limited to 5 to ensure reliable performance.
-
visit_page
Visit a webpage and extract its content.
Arguments:
{ url: string }
Returns:
{ url: string; title: string; content: string; // Markdown formatted content }
Prompts
agentic-research
A guided research prompt that helps Claude conduct thorough web research. The prompt instructs Claude to:
- Start with broad searches to understand the topic landscape
- Prioritize high-quality, authoritative sources
- Iteratively refine the research direction based on findings
- Keep you informed and let you guide the research interactively
- Always cite sources with URLs
Configuration Options
The server can be configured through environment variables:
MAX_PARALLEL_SEARCHES
: Maximum number of concurrent searches (default: 5)SEARCH_DELAY_MS
: Delay between searches in milliseconds (default: 200)MAX_RETRIES
: Number of retry attempts for failed requests (default: 3)TIMEOUT_MS
: Request timeout in milliseconds (default: 55000)LOG_LEVEL
: Logging level (default: 'info')
Error Handling
Common Issues
-
Rate Limiting
Symptom: "Too many requests" error
Solution: IncreaseSEARCH_DELAY_MS
or decreaseMAX_PARALLEL_SEARCHES
-
Network Timeouts
Symptom: "Request timed out" error
Solution: Ensure requests complete within the 60-second MCP timeout -
Browser Issues
Symptom: "Browser failed to launch" error
Solution: Ensure Playwright is properly installed (npx playwright install
)
Debugging
This is beta software. If you run into issues:
-
Check Claude Desktop's MCP logs:
# On macOS tail -n 20 -f ~/Library/Logs/Claude/mcp*.log # On Windows (PowerShell) Get-Content -Path "$env:APPDATA\Claude\logs\mcp*.log" -Tail 20 -Wait
-
Enable debug logging:
export LOG_LEVEL=debug
Development
Setup
pnpm install
pnpm build
pnpm watch
pnpm dev
Testing
pnpm test
pnpm test:watch
pnpm test:coverage
Code Quality
pnpm lint
pnpm lint:fix
pnpm type-check
Contributing
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature
) - Commit your changes (
git commit -m 'Add some amazing feature'
) - Push to the branch (
git push origin feature/amazing-feature
) - Open a Pull Request
Coding Standards
- Follow TypeScript best practices
- Maintain test coverage above 80%
- Document new features and APIs
- Update CHANGELOG.md for significant changes
- Follow semantic versioning
Performance Considerations
- Use batch operations where possible
- Implement proper error handling and retries
- Consider memory usage with large datasets
- Cache results when appropriate
- Use streaming for large content
Requirements
- Node.js >= 18
- Playwright (automatically installed as a dependency)
Verified Platforms
- macOS
- Windows
License
MIT
Credits
This project builds upon the excellent work of mcp-webresearch by mzxrai. The original codebase provided the foundation for our enhanced features and capabilities.
Author
qpd-v