google-search
by: web-agent-master
A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches. Local alternative to SERP APIs with MCP server integration.
Overview
Purpose: To provide a Playwright-based Node.js tool that allows users to execute Google searches directly and bypass anti-scraping mechanisms seamlessly.
Overview: This tool enables local execution of Google searches, functioning either as a command-line interface or as a Model Context Protocol (MCP) server. It offers real-time search capabilities that can be integrated into AI assistants, ensuring that results are retrieved efficiently and without restrictions imposed by external APIs.
Key Features:
- Local SERP API Alternative: Eliminates the need for paid API services by executing searches directly on the user's machine.
- Advanced Anti-Bot Detection Bypass Techniques: Employs intelligent methods like browser fingerprint management and state preservation to mimic real user behavior, minimizing the risk of being blocked by search engines.
- MCP Server Integration: Integrates smoothly with AI assistants such as Claude, providing them with immediate access to search functionalities without requiring additional API keys.
- Completely Open Source and Free: Fully customizable and extensible, allowing users to modify the code per their needs without any usage restrictions.
Google Search Tool
A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches and extract results. It can be used directly as a command-line tool or as a Model Context Protocol (MCP) server to provide real-time search capabilities to AI assistants like Claude.
Key Features
- Local SERP API alternative: no need for paid API services; searches executed locally.
- Advanced anti-bot detection bypass:
  - Intelligent browser fingerprint management simulating real user behavior.
  - Automatic saving and restoration of browser state to reduce verification frequency.
  - Automatic headless/headed mode switching.
  - Randomization of device and locale settings.
- Raw HTML retrieval of search result pages for analysis and debugging.
- Automatic full-page screenshot capture when saving HTML content.
- MCP server integration to support AI assistants like Claude without additional API keys.
- Completely open source and free, with no usage restrictions.
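The device and locale randomization listed above could be approached roughly as follows. This is a minimal sketch, not the tool's actual code: the pools, the `FingerprintProfile` type, and the `pickOne` and `pickProfile` helpers are hypothetical. The device names are standard keys in Playwright's `devices` registry, and `locale` and `timezoneId` are standard Playwright browser context options.

```typescript
// Hypothetical sketch: choose a device and a matching locale/timezone per session.
// The pools below are illustrative, not the lists the tool actually ships.
interface FingerprintProfile {
  deviceName: string; // key into Playwright's `devices` registry
  locale: string;     // Playwright context option `locale`
  timezoneId: string; // Playwright context option `timezoneId`
}

const DEVICE_NAMES = ["Desktop Chrome", "Desktop Edge", "Desktop Firefox", "Desktop Safari"];

// Locale and timezone are paired so the combination stays plausible.
const LOCALES = [
  { locale: "en-US", timezoneId: "America/New_York" },
  { locale: "en-GB", timezoneId: "Europe/London" },
  { locale: "de-DE", timezoneId: "Europe/Berlin" },
];

// `rng` must return a number in [0, 1); it is injectable so choices are reproducible.
function pickOne<T>(items: readonly T[], rng: () => number): T {
  return items[Math.floor(rng() * items.length)];
}

function pickProfile(rng: () => number = Math.random): FingerprintProfile {
  const deviceName = pickOne(DEVICE_NAMES, rng);
  const { locale, timezoneId } = pickOne(LOCALES, rng);
  return { deviceName, locale, timezoneId };
}

console.log(pickProfile());
```

With Playwright, a profile like this would typically be applied through `browser.newContext({ ...devices[profile.deviceName], locale: profile.locale, timezoneId: profile.timezoneId })`.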
Technical Features
- Developed with TypeScript for type safety.
- Browser automation based on Playwright supporting multiple browsers.
- Command-line support for search keywords.
- MCP server support for AI integration.
- Returns search results with title, link, and snippet.
- Option to retrieve raw HTML of result pages.
- JSON format output.
- Supports both headless and headed modes.
- Detailed logging and robust error handling.
- Browser state saving and restoration to avoid anti-bot detection.
Installation
git clone https://github.com/web-agent-master/google-search.git
cd google-search
# Install dependencies
npm install
# or yarn
yarn
# or pnpm
pnpm install
# Compile TypeScript
npm run build
# or yarn
yarn build
# or pnpm
pnpm build
# Link package globally (required for MCP functionality)
npm link
# or yarn
yarn link
# or pnpm
pnpm link
Windows Environment Notes
- .cmd files are included for Windows Command Prompt and PowerShell compatibility.
- Log files are stored in the system temp directory (instead of Unix /tmp).
- Windows-specific process signal handling ensures proper server shutdown.
- Cross-platform file path handling supports Windows separators.
Usage
Command Line Tool
# Basic search
google-search "search keywords"
# With options
google-search --limit 5 --timeout 60000 --no-headless "search keywords"
# Using npx
npx google-search-cli "search keywords"
# Development mode
pnpm dev "search keywords"
# Debug mode (shows browser)
pnpm debug "search keywords"
# Get raw HTML of search results
google-search "search keywords" --get-html
# Get and save HTML
google-search "search keywords" --get-html --save-html
# Get and save HTML to specific file
google-search "search keywords" --get-html --save-html --html-output "./output.html"
Command Line Options
- -l, --limit <number>: Limit number of results (default: 10)
- -t, --timeout <number>: Timeout in ms (default: 60000)
- --no-headless: Show browser UI (for debugging)
- --remote-debugging-port <number>: Remote debugging port (default: 9222)
- --state-file <path>: Browser state file (default: ./browser-state.json)
- --no-save-state: Do not save browser state
- --get-html: Get raw HTML instead of parsed results
- --save-html: Save HTML to file (with --get-html)
- --html-output <path>: Specify HTML output file (with --get-html and --save-html)
- -V, --version: Show version
- -h, --help: Show help
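When the CLI is driven from another Node.js program, it can help to build the argument list from a typed object instead of concatenating strings. The following is a sketch under that assumption: `SearchCliOptions` and `buildSearchArgs` are hypothetical names, and only flags documented above are emitted.

```typescript
// Typed view of the documented options; the defaults noted are the CLI's own.
interface SearchCliOptions {
  limit?: number;      // --limit, CLI default 10
  timeout?: number;    // --timeout in ms, CLI default 60000
  headless?: boolean;  // false adds --no-headless
  stateFile?: string;  // --state-file
  saveState?: boolean; // false adds --no-save-state
  getHtml?: boolean;   // --get-html
  saveHtml?: boolean;  // --save-html (only meaningful with --get-html)
  htmlOutput?: string; // --html-output (only with --get-html and --save-html)
}

// Hypothetical helper: build the argv array for spawning `google-search`.
function buildSearchArgs(query: string, opts: SearchCliOptions = {}): string[] {
  const args: string[] = [];
  if (opts.limit !== undefined) args.push("--limit", String(opts.limit));
  if (opts.timeout !== undefined) args.push("--timeout", String(opts.timeout));
  if (opts.headless === false) args.push("--no-headless");
  if (opts.stateFile !== undefined) args.push("--state-file", opts.stateFile);
  if (opts.saveState === false) args.push("--no-save-state");
  if (opts.getHtml) {
    args.push("--get-html");
    // The dependent flags are nested so an invalid combination is never produced.
    if (opts.saveHtml) {
      args.push("--save-html");
      if (opts.htmlOutput !== undefined) args.push("--html-output", opts.htmlOutput);
    }
  }
  args.push(query);
  return args;
}

console.log(buildSearchArgs("search keywords", { limit: 5, timeout: 60000, headless: false }));
```

The resulting array can be passed to `child_process.spawn("google-search", args)`, which avoids shell quoting problems with queries that contain spaces.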
Output Example
{
"query": "deepseek",
"results": [
{
"title": "DeepSeek",
"link": "https://www.deepseek.com/",
"snippet": "DeepSeek-R1 is now live and open source, rivaling OpenAI's Model o1..."
},
{
"title": "deepseek-ai/DeepSeek-V3",
"link": "https://github.com/deepseek-ai/DeepSeek-V3",
"snippet": "We present DeepSeek-V3, a strong Mixture-of-Experts language model."
}
]
}
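A consumer of this JSON may want to validate it before use. The sketch below assumes the output always has the shape shown in the example (`query` plus a `results` array of `title`, `link`, `snippet`); the type names and `parseSearchResponse` are hypothetical and not taken from the project's `types.ts`.

```typescript
interface SearchResult {
  title: string;
  link: string;
  snippet: string;
}

interface SearchResponse {
  query: string;
  results: SearchResult[];
}

// Runtime check for one result entry.
function isSearchResult(value: unknown): value is SearchResult {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return typeof r.title === "string" && typeof r.link === "string" && typeof r.snippet === "string";
}

// Parse CLI stdout and fail loudly if the shape differs from the documented example.
function parseSearchResponse(stdout: string): SearchResponse {
  const data: unknown = JSON.parse(stdout);
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const { query, results } = data as { query?: unknown; results?: unknown };
  if (typeof query !== "string" || !Array.isArray(results) || !results.every(isSearchResult)) {
    throw new Error("Unexpected search response shape");
  }
  return { query, results: results as SearchResult[] };
}

const sample = `{"query":"deepseek","results":[{"title":"DeepSeek","link":"https://www.deepseek.com/","snippet":"..."}]}`;
console.log(parseSearchResponse(sample).results[0].link);
```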
HTML Output Example
With --get-html:
{
"query": "playwright automation",
"url": "https://www.google.com/",
"originalHtmlLength": 1291733,
"cleanedHtmlLength": 456789,
"htmlPreview": "<!DOCTYPE html><html lang=\"zh-CN\">..."
}
With --get-html --save-html:
{
"query": "playwright automation",
"url": "https://www.google.com/",
"originalHtmlLength": 1292241,
"cleanedHtmlLength": 458976,
"savedPath": "./google-search-html/playwright_automation-2025-04-06T03-30-06-852Z.html",
"screenshotPath": "./google-search-html/playwright_automation-2025-04-06T03-30-06-852Z.png",
"htmlPreview": "<!DOCTYPE html><html lang=\"zh-CN\">..."
}
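The `savedPath` and `screenshotPath` values in the example suggest a naming pattern: the query with spaces replaced by underscores, followed by an ISO timestamp in which `:` and `.` are replaced by `-`. The sketch below reproduces that pattern. It is inferred from the example output only, so the function name `htmlOutputPaths`, the sanitizing rule, and the default directory are assumptions rather than the tool's documented behavior.

```typescript
// Hypothetical reconstruction of the file naming seen in the example output.
// Assumption: any run of non-alphanumeric characters in the query becomes "_".
function htmlOutputPaths(
  query: string,
  when: Date,
  dir: string = "./google-search-html",
): { savedPath: string; screenshotPath: string } {
  const safeQuery = query.trim().replace(/[^A-Za-z0-9]+/g, "_");
  // toISOString() yields e.g. 2025-04-06T03:30:06.852Z; ":" and "." are unsafe in file names.
  const stamp = when.toISOString().replace(/[:.]/g, "-");
  const base = `${dir}/${safeQuery}-${stamp}`;
  return { savedPath: `${base}.html`, screenshotPath: `${base}.png` };
}

const paths = htmlOutputPaths("playwright automation", new Date("2025-04-06T03:30:06.852Z"));
console.log(paths.savedPath);
// ./google-search-html/playwright_automation-2025-04-06T03-30-06-852Z.html
```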
MCP Server
Enables AI assistants like Claude to use Google search in real time. Build the project before using the server:
pnpm build
Integration with Claude Desktop
- Edit the Claude Desktop config file:
  - Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
  - Windows: %APPDATA%\Claude\claude_desktop_config.json (e.g., C:\Users\username\AppData\Roaming\Claude\claude_desktop_config.json)
- Add the MCP server configuration and restart Claude.
Example config:
{
"mcpServers": {
"google-search": {
"command": "npx",
"args": ["google-search-mcp"]
}
}
}
Windows alternatives:
Using cmd.exe with npx:
{
"mcpServers": {
"google-search": {
"command": "cmd.exe",
"args": ["/c", "npx", "google-search-mcp"]
}
}
}
Using node with full script path:
{
"mcpServers": {
"google-search": {
"command": "node",
"args": ["C:/path/to/your/google-search/dist/src/mcp-server.js"]
}
}
}
(Replace path with actual install location.)
After setup, you can request searches in Claude, e.g., "search for the latest AI research".
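If the config file already lists other MCP servers, the `google-search` entry has to be merged in rather than replacing the file. A minimal sketch of that merge, assuming the config has the `mcpServers` shape shown in the examples above; `withMcpServer` and the interface names are hypothetical, and the `filesystem` entry is only a placeholder for a server that is already configured.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // other settings are passed through untouched
}

// Returns a new config with the entry added; the input object is not mutated.
function withMcpServer(
  config: ClaudeDesktopConfig,
  name: string,
  entry: McpServerEntry,
): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}

const existing: ClaudeDesktopConfig = {
  mcpServers: { filesystem: { command: "npx", args: ["some-other-server"] } }, // placeholder
};

const updated = withMcpServer(existing, "google-search", {
  command: "npx",
  args: ["google-search-mcp"],
});

console.log(JSON.stringify(updated, null, 2));
```

Writing the result back with `JSON.stringify(updated, null, 2)` keeps the file readable for later manual edits.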
Project Structure
google-search/
├── package.json          # Configuration and dependencies
├── tsconfig.json         # TypeScript config
├── src/
│   ├── index.ts          # Entry file
│   ├── search.ts         # Search implementation with Playwright
│   ├── mcp-server.ts     # MCP server
│   └── types.ts          # Type definitions
├── dist/                 # Compiled JS files
├── bin/
│   └── google-search     # CLI entry script
├── README.md             # Documentation
└── .gitignore            # Git ignore file
Technology Stack
- TypeScript for development
- Node.js runtime
- Playwright for browser automation
- Commander for CLI argument parsing
- Model Context Protocol (MCP) for AI integration
- MCP SDK for server implementation
- Zod for schema validation
- pnpm for package management
Development Guide
Run in project root:
pnpm install # Install dependencies
pnpm run postinstall # Install Playwright browsers
pnpm build # Compile TypeScript
pnpm clean # Clean compiled output
CLI Development
pnpm dev "search keywords" # Development mode
pnpm debug "search keywords" # Debug mode with UI
pnpm start "search keywords" # Run compiled
pnpm test # Run tests
MCP Server Development
pnpm mcp # Run MCP server in dev mode
pnpm mcp:build # Run compiled MCP server
Error Handling
- Friendly messages on browser startup failure
- Automatic error return for network issues
- Detailed logs on parsing failures
- Graceful exit with info on timeout
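The timeout behavior in the last item can be illustrated with a generic wrapper that races the work against a deadline. This is a sketch of the general technique, not the project's implementation; `TimeoutError` and `withTimeout` are hypothetical names.

```typescript
// Error type carrying the limit that was exceeded, so callers can report it.
class TimeoutError extends Error {
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`Search timed out after ${timeoutMs} ms`);
    this.name = "TimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

// Resolves with the work's result, or rejects with TimeoutError once the deadline passes.
// The underlying work is not cancelled; only its result is ignored.
async function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(timeoutMs)), timeoutMs);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer); // avoid keeping the process alive after the work settles
  }
}

// Usage: a promise that never settles stands in for a hung search.
const neverFinishes = new Promise<string>(() => {});
withTimeout(neverFinishes, 50).catch((err: unknown) => {
  console.error(err instanceof Error ? err.message : String(err));
});
```

Because the wrapper does not cancel the underlying work, a real caller would also close the browser in its own error path.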
Notes
General
- For learning and research only.
- Comply with Google's terms and policies.
- Avoid frequent requests to prevent blocking.
- Some regions may require proxy to access Google.
- Playwright browsers auto-download on first use.
State Files
- Contain cookies and storage data; keep secure.
- Improve search success and reduce bot verification.
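Since a state file is reused across runs, a damaged or truncated file should lead to a fresh session rather than a crash. The sketch below checks the two top-level arrays that a Playwright storage state file contains (`cookies` and `origins`). `StorageStateLike` and `parseStorageState` are hypothetical names, and how the tool itself handles a bad state file is not documented here.

```typescript
// Top-level shape of a Playwright storage state file.
interface StorageStateLike {
  cookies: unknown[];
  origins: unknown[];
}

// Returns the parsed state, or undefined when the content is missing or malformed.
// `raw` is the file content as text, or null if the file does not exist.
function parseStorageState(raw: string | null): StorageStateLike | undefined {
  if (raw === null) return undefined;
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return undefined;
    const { cookies, origins } = data as { cookies?: unknown; origins?: unknown };
    if (!Array.isArray(cookies) || !Array.isArray(origins)) return undefined;
    return { cookies, origins };
  } catch {
    return undefined; // invalid JSON
  }
}

console.log(parseStorageState('{"cookies":[],"origins":[]}')); // { cookies: [], origins: [] }
console.log(parseStorageState("not json")); // undefined
```

A file that passes this check can be handed to Playwright through the `storageState` option of `browser.newContext`.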
MCP Server
- Requires Node.js v16 or higher.
- Use absolute paths when configuring Claude Desktop MCP server.
- Ensure Claude Desktop is updated.
Windows-Specific
- May require admin rights to install Playwright browsers.
- Run terminal as administrator if permission issues arise.
- Allow Windows Firewall access for browsers.
- Browser state saved in user home directory as .google-search-browser-state.json.
- Logs stored in system temp directory under google-search-logs.
Comparison with Commercial SERP APIs
Advantages over paid APIs (e.g., SerpAPI):
- Completely free, no API fees.
- Local execution with no dependency on third-party services.
- Protects privacy: no query logging.
- Fully open source, customizable and extensible.
- No API-imposed usage limits or quotas (Google may still block overly frequent requests).
- Native MCP integration for AI assistants like Claude.