omniparser-autogui-mcp
by: NON906
Automatic operation of on-screen GUI.
📌Overview
Purpose: To analyze the screen using OmniParser and automate GUI operations through an MCP server. Operation has been confirmed on Windows.
Overview: omniparser-autogui-mcp is an MCP server that connects screen analysis to GUI automation. It uses OmniParser to recognize on-screen content so that an MCP client can operate the GUI based on what is currently displayed.
Key Features:
- Screen Analysis: Leverages OmniParser to analyze content on the screen, enabling automated responses and actions based on what is currently displayed.
- GUI Automation: Automatically operates graphical user interfaces, allowing users to streamline repetitive tasks without manual input.
omniparser-autogui-mcp
(A Japanese version is available here)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Confirmed on Windows.
License notes
This project is released under the MIT license, excluding submodules and sub-packages.
OmniParser's repository is licensed under CC-BY-4.0.
Each OmniParser model has a different license (see the OmniParser repository for details).
Installation
- Run the following commands:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
- (On non-Windows systems, use export instead of set.)
- (To enable langchain_example.py, run uv sync --extra langchain instead.)
- Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}
- Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with your cloned directory path.
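If you would rather not edit the JSON by hand, the entry above can be generated with a short script. The following is a minimal sketch, not part of this repository: the function names build_server_entry, merge_into_config, and update_config_file are hypothetical, and the location of claude_desktop_config.json is left for you to supply because it depends on your installation.

```python
import json
from pathlib import Path


def build_server_entry(clone_dir: str, ocr_lang: str = "en") -> dict:
    """Return the server entry from the README for a given clone directory."""
    return {
        "command": "uv",
        "args": ["--directory", clone_dir, "run", "omniparser-autogui-mcp"],
        "env": {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang},
    }


def merge_into_config(config: dict, clone_dir: str, ocr_lang: str = "en") -> dict:
    """Return a copy of config with the entry added; other servers are kept."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["omniparser_autogui_mcp"] = build_server_entry(clone_dir, ocr_lang)
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_path: Path, clone_dir: str) -> None:
    """Read the config file if it exists, merge the entry, and write it back."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    merged = merge_into_config(config, clone_dir)
    config_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
```

Because json.dumps escapes backslashes, a Windows path passed as a normal Python string is written to the file in the doubled form shown above.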
Environment variables for additional configuration
- OMNI_PARSER_BACKEND_LOAD: Set to 1 if it does not work with other clients (such as LibreChat).
- TARGET_WINDOW_NAME: Specify the window name to operate on a specific window. If not specified, operates on the entire screen.
- OMNI_PARSER_SERVER: Specify the address and port (e.g., 127.0.0.1:8000) if OmniParser processing is done on another device. The server can be started with uv run omniparserserver.
- SSE_HOST, SSE_PORT: If specified, communication uses SSE instead of stdio.
- SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD: OmniParser-specific configurations; usually not necessary.
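To make the effect of these variables concrete, the sketch below shows one way such settings could be interpreted. It is an illustration only and is not taken from the server's source: parse_host_port and describe_settings are hypothetical helpers, and the rule that setting either SSE_HOST or SSE_PORT selects SSE is an assumption.

```python
from typing import Mapping, Optional, Tuple


def parse_host_port(value: str) -> Tuple[str, int]:
    """Split an 'address:port' string such as '127.0.0.1:8000'."""
    host, _, port = value.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected address:port, got {value!r}")
    return host, int(port)


def describe_settings(env: Mapping[str, str]) -> dict:
    """Summarize how the documented variables would be interpreted."""
    # Assumption: either SSE variable being set switches from stdio to SSE.
    use_sse = bool(env.get("SSE_HOST") or env.get("SSE_PORT"))

    # Remote OmniParser server, if processing is done on another device.
    remote: Optional[Tuple[str, int]] = None
    if env.get("OMNI_PARSER_SERVER"):
        remote = parse_host_port(env["OMNI_PARSER_SERVER"])

    return {
        "transport": "sse" if use_sse else "stdio",
        # None means the entire screen is the target.
        "target_window": env.get("TARGET_WINDOW_NAME") or None,
        "remote_parser": remote,
        "backend_load": env.get("OMNI_PARSER_BACKEND_LOAD") == "1",
    }
```

Calling describe_settings(os.environ) after importing os would report the configuration for the current shell; with no variables set, it describes the default of stdio transport, whole-screen operation, and local OmniParser processing.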
Usage Examples
- Search for "MCP server" in the on-screen browser.