Crawl4AI-MCP
by: Vistiqx
Crawl4AI-MCP: A powerful web crawling and content analysis server that combines targeted web scraping with Claude AI processing. Extract insights from specific websites with customizable depth, content selection, and AI analysis capabilities.
📌 Overview
Purpose: Crawl4AI is designed for intelligent web crawling and AI-powered content analysis, providing in-depth insight into specific web content.
Overview: The Crawl4AI MCP Server offers a robust framework for targeted web crawling, enabling users to extract, analyze, and process content using Claude AI models. Tailored for researchers, content creators, and business analysts, it distinguishes itself from general search engines by delivering precise insights into selected websites.
Key Features:
- Customizable Web Crawling: Allows users to define crawling depth and content selectors, ensuring focused data extraction specific to user needs.
- AI-Powered Content Analysis: Utilizes Claude AI models to perform tasks such as summarization, fact extraction, and deeper content analysis, enhancing data insights.
Crawl4AI MCP Server
This is not a functional MCP; it was a learning tool for future development.
Crawl4AI MCP Server is a Model Context Protocol (MCP) server designed for intelligent web crawling and AI-powered content analysis. It provides a simple API for crawling websites and processing the content using Claude AI models.
Who Benefits from Crawl4AI?
Crawl4AI is designed for individuals and organizations needing targeted, in-depth analysis of specific web content. Unlike general search engines or AI assistants that provide broad coverage, Crawl4AI offers deeper insights into content you specifically want to analyze.
Ideal for:
- Researchers extracting structured information from specific websites or academic resources
- Content creators analyzing competitor content or industry trends within specific domains
- Data analysts processing web data for business intelligence
- Developers building applications requiring web content analysis capabilities
- Digital marketers analyzing industry websites, blogs, or competitor content
- Business analysts gathering industry-specific information from multiple sources
- Knowledge workers synthesizing information from specific web domains
User Benefits
- Targeted depth over breadth: Comprehensive analysis of specific websites that matter to you
- Customizable crawling: Control crawl depth, content extraction, and processing parameters
- Programmatic integration: Easily incorporate analysis into applications and workflows
- Flexible AI processing: Summarize, extract facts, analyze deeply, or generate questions
- Privacy and control: Run the server locally to keep searches internal
- Cost efficiency: Use your own Claude API key with control over token usage
- Automation potential: Schedule regular crawls to track content changes over time
- Customized AI prompting: Tailor AI analysis to your specific needs
- Content transformation: Turn unstructured web data into structured, actionable insights
Crawl4AI bridges simple web scraping and sophisticated AI analysis for meaningful extraction of web insights.
Features
- Web crawling with customizable depth and content selectors
- Respects robots.txt directives
- Content extraction and AI-powered processing with Claude models
- Simple REST API interface
- Configurable via command line or environment variables
- Detailed logging
Installation
1. Clone this repository:

```bash
git clone https://github.com/yourusername/crawl4ai-mcp.git
cd crawl4ai-mcp
```

2. Install dependencies:

```bash
npm install
```

3. Create a `.env` file and add your Anthropic API key:

```
ANTHROPIC_API_KEY=your_api_key_here
```
Usage
Starting the Server
Start the server with default settings:

```bash
npm start
```

Or with options:

```bash
npm start -- --port 4000 --debug
```

Available options:
- `--port <number>`: Server port (default: 3000)
- `--debug`: Enable debug logging
API Endpoints
Crawl a Website
```
POST /api/crawl
```

Request body example:

```json
{
  "url": "https://example.com",
  "depth": 2,
  "selector": "main",
  "aiProcessing": {
    "task": "summarize",
    "model": "claude-3-sonnet-20240229"
  }
}
```
Parameters:
- `url` (required): URL to start crawling from
- `depth` (optional): How many levels deep to crawl (default: 1)
- `selector` (optional): CSS selector for content extraction (default: "body")
- `aiProcessing` (optional): AI processing configuration
  - `task`: Processing type: summarize, extract, analyze, questions
  - `model`: Claude model to use (default: "claude-3-sonnet-20240229")
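For programmatic integration (one of the benefits noted above), the endpoint can be called from any HTTP client. Below is a minimal TypeScript sketch using the built-in fetch of Node 18+; it assumes the server is running locally on the default port, and since the response shape isn't documented here, it simply prints the raw JSON.

```typescript
// Minimal sketch: call the crawl endpoint from Node 18+ (built-in fetch).
// Assumes the server is running locally on the default port 3000.
// The response shape is not documented, so the raw JSON is printed as-is.

async function crawl(): Promise<void> {
  const res = await fetch("http://localhost:3000/api/crawl", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      url: "https://example.com",
      depth: 1,
      selector: "main",
      aiProcessing: { task: "summarize" },
    }),
  });

  if (!res.ok) {
    throw new Error(`Crawl request failed: ${res.status} ${res.statusText}`);
  }

  console.log(JSON.stringify(await res.json(), null, 2));
}

crawl().catch(console.error);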
Health Check
```
GET /api/healthcheck
```
Returns server status and version.
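With the server running locally on the default port, a quick check from the command line looks like:

```bash
curl http://localhost:3000/api/healthcheck
```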
AI Processing Tasks
- `summarize`: Generate a summary of the crawled content
- `extract`: Extract factual information
- `analyze`: Deep analysis of content, arguments, and quality
- `questions`: Generate questions and answers based on content
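Any of these values goes in the request's `aiProcessing.task` field. For instance, a minimal body requesting factual extraction from the same example URL would be:

```json
{
  "url": "https://example.com",
  "aiProcessing": {
    "task": "extract"
  }
}
```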
Configuration
Configure the server using environment variables:
- `PORT`: Server port (default: 3000)
- `ANTHROPIC_API_KEY`: Your Anthropic API key
- `DEBUG`: Set to "true" to enable debug logging
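These variables can be exported in your shell or kept in the `.env` file from the installation steps. A one-off override from a POSIX shell (assuming `ANTHROPIC_API_KEY` is already set in `.env`) looks like:

```bash
PORT=4000 DEBUG=true npm start
```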
Example
Crawl a website and summarize its content with curl:
```bash
curl -X POST http://localhost:3000/api/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "depth": 1,
    "aiProcessing": {
      "task": "summarize"
    }
  }'
```
License
MIT License
Acknowledgements
This project uses the following libraries:
- Express
- Puppeteer
- Cheerio
- Winston
- @anthropic-ai/sdk