MCP HubMCP Hub
web-agent-master

google-search

by: web-agent-master

A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches. Local alternative to SERP APIs with MCP server integration.

210created 04/03/2025
Visit
Playwright
Node.js

πŸ“ŒOverview

Purpose: To provide a Playwright-based Node.js tool that allows users to execute Google searches directly and bypass anti-scraping mechanisms seamlessly.

Overview: This tool enables local execution of Google searches, functioning either as a command-line interface or as a Model Context Protocol (MCP) server. It offers real-time search capabilities that can be integrated into AI assistants, ensuring that results are retrieved efficiently and without restrictions imposed by external APIs.

Key Features:

  • Local SERP API Alternative: Eliminates the need for paid API services by executing searches directly on the user's machine.

  • Advanced Anti-Bot Detection Bypass Techniques: Employs intelligent methods like browser fingerprint management and state preservation to mimic real user behavior, minimizing the risk of being blocked by search engines.

  • MCP Server Integration: Integrates smoothly with AI assistants such as Claude, providing them with immediate access to search functionalities without requiring additional API keys.

  • Completely Open Source and Free: Fully customizable and extensible, allowing users to modify the code per their needs without any usage restrictions.


Google Search Tool

A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches and extract results. It can be used directly as a command-line tool or as a Model Context Protocol (MCP) server to provide real-time search capabilities to AI assistants like Claude.

Key Features

  • Local SERP API alternative: no need for paid API services; searches executed locally.
  • Advanced anti-bot detection bypass:
    • Intelligent browser fingerprint management simulating real user behavior.
    • Automatic saving and restoration of browser state to reduce verification frequency.
    • Automatic headless/headed mode switching.
    • Randomization of device and locale settings.
  • Raw HTML retrieval of search result pages for analysis and debugging.
  • Automatic full-page screenshot capture when saving HTML content.
  • MCP server integration to support AI assistants like Claude without additional API keys.
  • Completely open source and free, with no usage restrictions.

Technical Features

  • Developed with TypeScript for type safety.
  • Browser automation based on Playwright supporting multiple browsers.
  • Command-line support for search keywords.
  • MCP server support for AI integration.
  • Returns search results with title, link, and snippet.
  • Option to retrieve raw HTML of result pages.
  • JSON format output.
  • Supports both headless and headed modes.
  • Detailed logging and robust error handling.
  • Browser state saving and restoration to avoid anti-bot detection.

Installation

git clone https://github.com/web-agent-master/google-search.git
cd google-search

# Install dependencies
npm install
# or yarn
yarn
# or pnpm
pnpm install

# Compile TypeScript
npm run build
# or yarn
yarn build
# or pnpm
pnpm build

# Link package globally (required for MCP functionality)
npm link
# or yarn
yarn link
# or pnpm
pnpm link

Windows Environment Notes

  • .cmd files included for Windows Command Prompt and PowerShell compatibility.
  • Log files stored in system temp directory (instead of Unix /tmp).
  • Windows-specific process signal handling ensures proper server shutdown.
  • Cross-platform file path handling supports Windows separators.

Usage

Command Line Tool

# Basic search
google-search "search keywords"

# With options
google-search --limit 5 --timeout 60000 --no-headless "search keywords"

# Using npx
npx google-search-cli "search keywords"

# Development mode
pnpm dev "search keywords"

# Debug mode (shows browser)
pnpm debug "search keywords"

# Get raw HTML of search results
google-search "search keywords" --get-html

# Get and save HTML
google-search "search keywords" --get-html --save-html

# Get and save HTML to specific file
google-search "search keywords" --get-html --save-html --html-output "./output.html"

Command Line Options

  • -l, --limit <number>: Limit number of results (default: 10)
  • -t, --timeout <number>: Timeout in ms (default: 60000)
  • --no-headless: Show browser UI (for debugging)
  • --remote-debugging-port <number>: Remote debugging port (default: 9222)
  • --state-file <path>: Browser state file (default: ./browser-state.json)
  • --no-save-state: Do not save browser state
  • --get-html: Get raw HTML instead of parsed results
  • --save-html: Save HTML to file (with --get-html)
  • --html-output <path>: Specify HTML output file (with --get-html and --save-html)
  • -V, --version: Show version
  • -h, --help: Show help

Output Example

{
  "query": "deepseek",
  "results": [
    {
      "title": "DeepSeek",
      "link": "https://www.deepseek.com/",
      "snippet": "DeepSeek-R1 is now live and open source, rivaling OpenAI's Model o1..."
    },
    {
      "title": "deepseek-ai/DeepSeek-V3",
      "link": "https://github.com/deepseek-ai/DeepSeek-V3",
      "snippet": "We present DeepSeek-V3, a strong Mixture-of-Experts language model."
    }
  ]
}

HTML Output Example

With --get-html:

{
  "query": "playwright automation",
  "url": "https://www.google.com/",
  "originalHtmlLength": 1291733,
  "cleanedHtmlLength": 456789,
  "htmlPreview": "<!DOCTYPE html><html lang=\"zh-CN\">..."
}

With --get-html --save-html:

{
  "query": "playwright automation",
  "url": "https://www.google.com/",
  "originalHtmlLength": 1292241,
  "cleanedHtmlLength": 458976,
  "savedPath": "./google-search-html/playwright_automation-2025-04-06T03-30-06-852Z.html",
  "screenshotPath": "./google-search-html/playwright_automation-2025-04-06T03-30-06-852Z.png",
  "htmlPreview": "<!DOCTYPE html><html lang=\"zh-CN\">..."
}

MCP Server

Enables AI assistants like Claude to use Google search in real time.

pnpm build

Integration with Claude Desktop

  1. Edit Claude Desktop config file:
  • Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json (e.g., C:\Users\username\AppData\Roaming\Claude\claude_desktop_config.json)
  1. Add MCP server configuration and restart Claude.

Example config:

{
  "mcpServers": {
    "google-search": {
      "command": "npx",
      "args": ["google-search-mcp"]
    }
  }
}

Windows alternatives:

Using cmd.exe with npx:

{
  "mcpServers": {
    "google-search": {
      "command": "cmd.exe",
      "args": ["/c", "npx", "google-search-mcp"]
    }
  }
}

Using node with full script path:

{
  "mcpServers": {
    "google-search": {
      "command": "node",
      "args": ["C:/path/to/your/google-search/dist/src/mcp-server.js"]
    }
  }
}

(Replace path with actual install location.)

After setup, you can request searches in Claude, e.g., "search for the latest AI research".

Project Structure

google-search/
β”œβ”€β”€ package.json         # Configuration and dependencies
β”œβ”€β”€ tsconfig.json        # TypeScript config
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ index.ts         # Entry file
β”‚   β”œβ”€β”€ search.ts        # Search implementation with Playwright
β”‚   β”œβ”€β”€ mcp-server.ts    # MCP server
β”‚   └── types.ts         # Type definitions
β”œβ”€β”€ dist/                # Compiled JS files
β”œβ”€β”€ bin/
β”‚   └── google-search    # CLI entry script
β”œβ”€β”€ README.md            # Documentation
└── .gitignore           # Git ignore file

Technology Stack

  • TypeScript for development
  • Node.js runtime
  • Playwright for browser automation
  • Commander for CLI argument parsing
  • Model Context Protocol (MCP) for AI integration
  • MCP SDK for server implementation
  • Zod for schema validation
  • pnpm for package management

Development Guide

Run in project root:

pnpm install              # Install dependencies
pnpm run postinstall      # Install Playwright browsers
pnpm build                # Compile TypeScript
pnpm clean                # Clean compiled output

CLI Development

pnpm dev "search keywords"    # Development mode
pnpm debug "search keywords"  # Debug mode with UI
pnpm start "search keywords"  # Run compiled
pnpm test                     # Run tests

MCP Server Development

pnpm mcp           # Run MCP server in dev mode
pnpm mcp:build     # Run compiled MCP server

Error Handling

  • Friendly messages on browser startup failure
  • Automatic error return for network issues
  • Detailed logs on parsing failures
  • Graceful exit with info on timeout

Notes

General

  • For learning and research only.
  • Comply with Google's terms and policies.
  • Avoid frequent requests to prevent blocking.
  • Some regions may require proxy to access Google.
  • Playwright browsers auto-download on first use.

State Files

  • Contain cookies and storage data; keep secure.
  • Improve search success and reduce bot verification.

MCP Server

  • Requires Node.js v16 or higher.
  • Use absolute paths when configuring Claude Desktop MCP server.
  • Ensure Claude Desktop is updated.

Windows-Specific

  • May require admin rights to install Playwright browsers.
  • Run terminal as administrator if permission issues arise.
  • Allow Windows Firewall access for browsers.
  • Browser state saved in user home directory as .google-search-browser-state.json.
  • Logs stored in system temp directory under google-search-logs.

Comparison with Commercial SERP APIs

Advantages over paid APIs (e.g., SerpAPI):

  • Completely free, no API fees.
  • Local execution with no third-party dependencies.
  • Protects privacy β€” no query logging.
  • Fully open source, customizable and extensible.
  • No usage limits or frequency restrictions.
  • Native MCP integration for AI assistants like Claude.