mcp-image-recognition
by: mario-andreschak
An MCP server that provides image recognition 👀 capabilities using Anthropic and OpenAI vision APIs
📌Overview
Purpose: Provide an MCP server that analyzes and describes images using the vision APIs from Anthropic and OpenAI.
Overview: The MCP Image Recognition Server takes an image and returns a detailed description generated by a vision model. It accepts JPEG, PNG, GIF, and WebP input and can be configured with a primary and a fallback AI provider.
Key Features:
- Image Description: Utilizes Anthropic Claude Vision or OpenAI GPT-4 Vision to generate detailed descriptions of provided images.
- Multiple Image Formats: Supports JPEG, PNG, GIF, and WebP formats, allowing flexibility in image processing.
- Configurable Providers: Users can define primary and fallback AI providers, enhancing reliability in image recognition tasks.
- Multiple Input Options: Accepts both Base64-encoded images and file-based image inputs for convenience.
- Optional OCR Support: Integrates Tesseract OCR for optional text extraction from images, offering additional functionality for users needing text data.
MCP Image Recognition Server
An MCP server providing image recognition capabilities using Anthropic and OpenAI vision APIs.
Features
- Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
- Support for multiple image formats (JPEG, PNG, GIF, WebP)
- Configurable primary and fallback providers
- Base64 and file-based image input support
- Optional text extraction using Tesseract OCR
Requirements
- Python 3.8 or higher
- Tesseract OCR (optional, required for text extraction)
Installation
- Clone the repository:
  git clone https://github.com/mario-andreschak/mcp-image-recognition.git
  cd mcp-image-recognition
- Create and configure your environment file:
  cp .env.example .env
- Build the project:
  build.bat
Usage
Running the Server
Start the server using Python:
python -m image_recognition_server.server
Or using a batch file:
run.bat server
Available Tools
- describe_image
  - Input: Base64-encoded image data and MIME type
  - Output: Detailed description of the image
- describe_image_from_file
  - Input: Path to an image file
  - Output: Detailed description of the image
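Since describe_image expects Base64-encoded data together with a MIME type, a client has to prepare both before calling the tool. The following is a minimal sketch of that preparation step, not code from this repository. The function name build_describe_image_args and the argument keys "image" and "mime_type" are hypothetical; check the server's tool schema for the real parameter names.

```python
import base64
from pathlib import Path

# File extensions for the formats the README lists as supported.
EXTENSION_TO_MIME = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def build_describe_image_args(path: str) -> dict:
    """Return Base64 image data plus MIME type for an image file.

    The keys "image" and "mime_type" are assumptions made for this
    sketch; the server's actual parameter names may differ.
    """
    suffix = Path(path).suffix.lower()
    mime_type = EXTENSION_TO_MIME.get(suffix)
    if mime_type is None:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    data = Path(path).read_bytes()
    return {
        "image": base64.b64encode(data).decode("ascii"),
        "mime_type": mime_type,
    }
```

The MIME type is derived from the file extension here to keep the sketch dependency-free; inspecting the file's leading bytes would be more robust against misnamed files.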
Environment Configuration
- ANTHROPIC_API_KEY: Your Anthropic API key.
- OPENAI_API_KEY: Your OpenAI API key.
- VISION_PROVIDER: Primary vision provider (anthropic or openai).
- FALLBACK_PROVIDER: Optional fallback provider.
- LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR).
- ENABLE_OCR: Enable Tesseract OCR text extraction (true or false).
- OPENAI_MODEL: OpenAI model (default: gpt-4o-mini).
- OPENAI_BASE_URL: Optional custom base URL for the OpenAI API.
- OPENAI_TIMEOUT: Optional custom timeout (in seconds) for the OpenAI API.
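To illustrate how these variables fit together, here is a rough sketch of reading and validating them in Python. It is not the server's actual configuration code: the Settings class and load_settings function are hypothetical, and the defaults for VISION_PROVIDER, LOG_LEVEL, and ENABLE_OCR are assumptions, since only the OPENAI_MODEL default is documented above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = {"anthropic", "openai"}


@dataclass
class Settings:
    vision_provider: str
    fallback_provider: Optional[str]
    log_level: str
    enable_ocr: bool
    openai_model: str
    openai_base_url: Optional[str]
    openai_timeout: Optional[float]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment-style key/value pairs."""
    if env is None:
        env = os.environ

    # Assumed default: the README does not say which provider is the default.
    provider = env.get("VISION_PROVIDER", "anthropic").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"VISION_PROVIDER must be one of {sorted(VALID_PROVIDERS)}")

    # An empty or missing FALLBACK_PROVIDER means no fallback.
    fallback = env.get("FALLBACK_PROVIDER", "").strip().lower() or None
    if fallback is not None and fallback not in VALID_PROVIDERS:
        raise ValueError(f"FALLBACK_PROVIDER must be one of {sorted(VALID_PROVIDERS)}")

    timeout = env.get("OPENAI_TIMEOUT", "").strip()
    return Settings(
        vision_provider=provider,
        fallback_provider=fallback,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),  # assumed default
        enable_ocr=env.get("ENABLE_OCR", "false").strip().lower() == "true",  # assumed default
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),  # documented default
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
        openai_timeout=float(timeout) if timeout else None,
    )
```

Passing a plain dictionary instead of os.environ makes the parsing easy to exercise in tests without touching the real environment.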
Default Models
- Anthropic: claude-3.5-sonnet-beta
- OpenAI: gpt-4o-mini
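With one default model per provider, the primary/fallback behaviour selected through VISION_PROVIDER and FALLBACK_PROVIDER could look roughly like the sketch below. This is an illustration of the general pattern, not the server's implementation: describe_with_fallback and the Describer callable type are hypothetical, and the real API calls are represented by caller-supplied functions.

```python
from typing import Callable, Dict, Optional

# Default models as listed in this README.
DEFAULT_MODELS = {
    "anthropic": "claude-3.5-sonnet-beta",
    "openai": "gpt-4o-mini",
}

# Hypothetical shape: (image_b64, mime_type, model) -> description text.
Describer = Callable[[str, str, str], str]


def describe_with_fallback(
    image_b64: str,
    mime_type: str,
    describers: Dict[str, Describer],
    primary: str,
    fallback: Optional[str] = None,
) -> str:
    """Try the primary provider, then the fallback if the primary fails."""
    order = [primary]
    if fallback and fallback != primary:
        order.append(fallback)

    last_error: Optional[Exception] = None
    for name in order:
        try:
            return describers[name](image_b64, mime_type, DEFAULT_MODELS[name])
        except Exception as exc:  # any provider failure triggers the next one
            last_error = exc
    raise RuntimeError(f"All providers failed: {order}") from last_error
```

Keeping the provider calls behind plain callables means the fallback order can be tested with stub functions, without API keys or network access.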
Development
Running Tests
Run all tests:
run.bat test
Run a specific test suite:
run.bat test server
run.bat test anthropic
run.bat test openai
Docker Support
Build the Docker image:
docker build -t mcp-image-recognition .
Run the container:
docker run -it --env-file .env mcp-image-recognition
License
MIT License. See the LICENSE file for details.
Release History
- 0.1.2: Improved OCR error handling and added comprehensive test coverage for OCR functionality
- 0.1.1: Added Tesseract OCR support for text extraction from images
- 0.1.0: Initial release with Anthropic and OpenAI vision support