omniparser-autogui-mcp

by: NON906

Automatic operation of on-screen GUI.

28 · Created 26/02/2025
Tags: GUI, automation

📌Overview

Purpose: Analyze the screen with OmniParser and automate GUI operations through an MCP server. Operation has been confirmed on Windows.

Overview: omniparser-autogui-mcp bridges screen analysis and GUI interaction. It uses OmniParser to recognize on-screen content, so an MCP client can automate tasks by acting on what is currently displayed.

Key Features:

  • Screen Analysis: Uses OmniParser to analyze the current screen content, so that actions can be chosen based on what is displayed.

  • GUI Automation: Operates graphical user interfaces automatically, reducing manual input for repetitive tasks.


omniparser-autogui-mcp

(A Japanese version is also available.)

This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Confirmed on Windows.

License notes

This project is released under the MIT license, excluding submodules and sub-packages.
The OmniParser repository is licensed under CC-BY-4.0.
Each OmniParser model has a different license (see the OmniParser repository for details).

Installation

  1. Run the following commands:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
  • (On non-Windows systems, use export instead of set.)
  • (To enable langchain_example.py, run uv sync --extra langchain instead.)
  2. Add the following to your claude_desktop_config.json:
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}
  • Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the path to your cloned directory. Backslashes in the path must be doubled, because the file is JSON.
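
Hand-escaping Windows paths in JSON is easy to get wrong, so the entry above can also be generated. The following is a minimal sketch, not part of the project: build_server_entry is a hypothetical helper name, and the clone path is the README's placeholder. It reproduces the same keys and arguments as the configuration shown above and lets json.dumps handle the backslash escaping.

```python
import json


def build_server_entry(clone_dir: str, ocr_lang: str = "en") -> dict:
    """Return the mcpServers entry from the README for a given clone directory.

    build_server_entry is a hypothetical helper, not part of the project.
    """
    return {
        "mcpServers": {
            "omniparser_autogui_mcp": {
                "command": "uv",
                "args": ["--directory", clone_dir, "run", "omniparser-autogui-mcp"],
                "env": {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang},
            }
        }
    }


if __name__ == "__main__":
    # Placeholder path from the README; replace it with your own clone location.
    entry = build_server_entry(r"D:\CLONED_PATH\omniparser-autogui-mcp")
    # json.dumps writes each backslash as \\, which is what JSON requires.
    print(json.dumps(entry, indent=2))
```

Paste the printed output into claude_desktop_config.json, merging it with any mcpServers entries that are already there.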

Environment variables for additional configuration

  • OMNI_PARSER_BACKEND_LOAD
    Set to 1 if the server does not work with other clients (such as LibreChat).

  • TARGET_WINDOW_NAME
    Set this to a window name to restrict operation to that window. If it is not set, the server operates on the entire screen.

  • OMNI_PARSER_SERVER
    Specify the address and port (e.g., 127.0.0.1:8000) if OmniParser processing is done on another device.
    The server can be started with uv run omniparserserver.

  • SSE_HOST, SSE_PORT
    If specified, communication uses SSE instead of stdio.

  • SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
    OmniParser-specific settings. These usually do not need to be set.
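
These variables go into the "env" object of the client configuration. The sketch below shows one way to assemble that object in Python. It is an illustration only: build_env and its parameters are hypothetical names, the address:port check is an added assumption rather than the server's own validation, and the example values (Notepad, port 8080) are made up.

```python
from typing import Optional


def build_env(
    ocr_lang: str = "en",
    target_window_name: Optional[str] = None,
    omni_parser_server: Optional[str] = None,
    sse_host: Optional[str] = None,
    sse_port: Optional[int] = None,
    backend_load: bool = False,
) -> dict:
    """Build the "env" mapping for the MCP client config (hypothetical helper)."""
    # These two entries match the base configuration in the installation step.
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang}
    if backend_load:
        env["OMNI_PARSER_BACKEND_LOAD"] = "1"
    if target_window_name:
        env["TARGET_WINDOW_NAME"] = target_window_name
    if omni_parser_server:
        # Assumption: reject values that are not of the form address:port.
        host, sep, port = omni_parser_server.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError("expected address:port, e.g. 127.0.0.1:8000")
        env["OMNI_PARSER_SERVER"] = omni_parser_server
    if sse_host is not None:
        env["SSE_HOST"] = sse_host
    if sse_port is not None:
        # Environment variable values are always strings.
        env["SSE_PORT"] = str(sse_port)
    return env


if __name__ == "__main__":
    # Made-up example: operate only on a window named "Notepad" and use a
    # remote OmniParser server at the address from the README's example.
    print(build_env(target_window_name="Notepad",
                    omni_parser_server="127.0.0.1:8000"))
```

Variables that are left unset are omitted from the mapping, so the server falls back to its own behavior for them (for example, operating on the entire screen when TARGET_WINDOW_NAME is absent).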

Usage Examples

  • Search for "MCP server" in the on-screen browser.