📌Overview

Purpose: The Crawl4AI MCP Server gives AI assistant systems search capabilities and web content understanding optimized for large language models (LLMs).

Overview: This server is built on the Model Context Protocol (MCP) and provides intelligent information retrieval. It combines multi-engine search with content extraction to fetch and interpret web data, converting page content into formats suited to LLM processing.

Key Features:

Multi-Engine Search: Supports DuckDuckGo and Google. DuckDuckGo is the default and needs no API key; Google requires an API key.
LLM-Optimized Content Extraction: Smartly filters out non-essential information, retaining critical content for enhanced understanding and application in AI systems.
Various Output Formats: Provides multiple output options including Markdown with citations for easy referencing and information traceability.
High-Performance Asynchronous Design: Built on FastMCP for efficient, scalable performance in information retrieval tasks.

Crawl4AI MCP Server

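This is an intelligent information retrieval server based on MCP (Model Context Protocol). It provides AI assistant systems with powerful search capabilities and web content understanding optimized for LLMs (large language models).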

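Features

Powerful multi-engine search, supporting DuckDuckGo and Google
LLM-optimized web content extraction that intelligently filters out non-core content
Automatically identifies and keeps key content, concentrating the valuable information
Multiple output formats, with citation tracing
High-performance asynchronous design based on FastMCP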

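Installation

Installation steps

Make sure your system meets the following requirements:
- Python >= 3.9
- A dedicated virtual environment is recommended

Clone the repository and enter the project directory: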

git clone https://github.com/yourusername/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server

Create and activate a virtual environment:

python -m venv crawl4ai_env
source crawl4ai_env/bin/activate  # Linux/Mac
# or
.\crawl4ai_env\Scripts\activate  # Windows

Install the dependencies:
```
pip install -r requirements.txt
```
Install the Playwright browsers:
```
playwright install
```

Installing via Smithery

You can use Smithery to automatically install the Crawl4AI MCP server and configure it for your local Claude desktop client:

npx -y @smithery/cli install @weidwonder/crawl4ai-mcp-server --client claude

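Usage

Web search tool

Supports multiple search engines:

DuckDuckGo search (default, no API key required)
Google search (requires an API key to be configured)

Input parameters:

query: the search query string
num_results: number of results to return (default 10)
engine: search engine selection ("duckduckgo", "google", or "all")

Examples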

# DuckDuckGo search (default)
{
    "query": "python programming",
    "num_results": 5
}

# Using Google search
{
    "query": "python programming",
    "num_results": 5,
    "engine": "google"
}
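
The request shapes above can be assembled and checked before they are handed to the search tool. The following is a minimal sketch only: `build_search_args` is a hypothetical helper that is not part of the server, and the explicit default of "duckduckgo" is an assumption drawn from the parameter notes above.

```python
import json

# Engine names taken from the parameter list above.
VALID_ENGINES = {"duckduckgo", "google", "all"}


def build_search_args(query, num_results=10, engine="duckduckgo"):
    """Build the argument dict for the web search tool (hypothetical helper)."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not isinstance(num_results, int) or num_results < 1:
        raise ValueError("num_results must be a positive integer")
    if engine not in VALID_ENGINES:
        raise ValueError(f"engine must be one of {sorted(VALID_ENGINES)}")
    return {"query": query.strip(), "num_results": num_results, "engine": engine}


if __name__ == "__main__":
    # Prints the same shape as the Google example above.
    print(json.dumps(build_search_args("python programming", 5, "google"), indent=4))
```

Validating on the client side like this surfaces a misspelled engine name before a request is made; how the server itself handles unknown values is not documented here.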

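Web content understanding tool

Provides intelligent content extraction and format conversion, with the following output formats:

markdown_with_citations (default, includes inline citations)
fit_markdown (condensed content optimized for LLMs)
raw_markdown (basic HTML-to-Markdown conversion)

Example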

{
    "url": "https://example.com",
    "format": "markdown_with_citations"
}
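
As with the search tool, the arguments for the content tool can be built and checked in a few lines. This is a sketch under stated assumptions: `build_read_args` is a hypothetical helper, and restricting URLs to http and https is an assumption of this example, not a documented rule of the server.

```python
from urllib.parse import urlparse

# Output format names taken from the list above.
VALID_FORMATS = ("markdown_with_citations", "fit_markdown", "raw_markdown")


def build_read_args(url, output_format="markdown_with_citations"):
    """Build the argument dict for the content tool (hypothetical helper)."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are meaningful to fetch.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if output_format not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}")
    return {"url": url, "format": output_format}


if __name__ == "__main__":
    print(build_read_args("https://example.com"))
    print(build_read_args("https://example.com", "fit_markdown"))
```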

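LLM content optimization strategy

Automatically identifies and keeps the main article body and key informational paragraphs
Noise filtering: removes irrelevant content
Preserves URL references so that information can be traced to its source
Length optimization: filters out invalid fragments
Default output is markdown_with_citations, a format that is easy for LLMs to interpret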
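
To make the idea of preserving URL references concrete, the sketch below turns inline Markdown links into numbered markers followed by a reference list. It is a toy illustration only and not the implementation used by this server or by Crawl4ai; the `[n]` marker style, the "References:" heading, and the `to_citations` name are all assumptions made for the example.

```python
import re

# Matches inline Markdown links such as [text](https://example.com), but not images.
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\((https?://[^)\s]+)\)")


def to_citations(markdown):
    """Replace inline links with numbered markers and append a reference list."""
    urls = []  # unique URLs, in order of first appearance

    def replace(match):
        text, url = match.group(1), match.group(2)
        if url not in urls:
            urls.append(url)
        # Repeated URLs reuse the number they were first given.
        return f"{text} [{urls.index(url) + 1}]"

    body = LINK_RE.sub(replace, markdown)
    if not urls:
        return body
    refs = "\n".join(f"[{i}] {url}" for i, url in enumerate(urls, start=1))
    return f"{body}\n\nReferences:\n{refs}"


if __name__ == "__main__":
    print(to_citations("See [the docs](https://example.com/docs) for details."))
```

Keeping the URLs in a trailing list shortens the running text while still letting a reader, or a model, trace each claim back to its source.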

Configuration

Copy the example configuration file:
```
cp config_demo.json config.json
```

Configure the Google search API key in config.json:

{
    "google": {
        "api_key": "your-google-api-key",
        "cse_id": "your-google-cse-id"
    }
}
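
A quick way to confirm that the file above is usable is to load it and check both fields. The following is a minimal sketch: `load_google_credentials` is a hypothetical helper, and treating values that start with "your-" as unconfigured placeholders is an assumption based on the sample shown above, not documented server behavior.

```python
import json
from pathlib import Path


def load_google_credentials(path="config.json"):
    """Return (api_key, cse_id) from the config file, or None if unusable."""
    config_path = Path(path)
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    google = config.get("google", {})
    api_key = google.get("api_key", "")
    cse_id = google.get("cse_id", "")
    # Assumption: values still starting with "your-" are untouched placeholders.
    if not api_key or not cse_id:
        return None
    if api_key.startswith("your-") or cse_id.startswith("your-"):
        return None
    return api_key, cse_id


if __name__ == "__main__":
    creds = load_google_credentials()
    print("Google search configured" if creds else "Google search not configured")
```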

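Changelog

2025.02.08: Added search functionality, with support for DuckDuckGo and Google search
2025.02.07: Restructured the project and improved dependency management
2025.02.07: Improved the content filtering configuration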

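License

MIT License

Contributing

Issues and Pull Requests are welcome!

Author

Owner: weidwonder
Coder: Claude Sonnet 3.5

Acknowledgements

Thanks to all the developers who have contributed to the project, and special thanks to the Crawl4ai project for its excellent web content extraction technology.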