mcp-image-recognition

by: mario-andreschak

An MCP server that provides image recognition 👀 capabilities using Anthropic and OpenAI vision APIs

Created: 20/02/2025
Tags: vision, AI

📌Overview

Purpose: Provide an MCP server that analyzes and describes images using the Anthropic and OpenAI vision APIs.

Overview: The MCP Image Recognition Server accepts an image and returns a detailed description generated by a vision model. It supports several image formats and can be configured with a primary provider and an optional fallback provider.

Key Features:

  • Image Description: Generates detailed descriptions of images using Anthropic Claude Vision or OpenAI GPT-4 Vision.

  • Multiple Image Formats: Supports JPEG, PNG, GIF, and WebP.

  • Configurable Providers: Users can define a primary provider and an optional fallback provider.

  • Multiple Input Options: Accepts either Base64-encoded image data or a path to an image file.

  • Optional OCR Support: Can extract text from images with Tesseract OCR when enabled.


MCP Image Recognition Server

An MCP server providing image recognition capabilities using Anthropic and OpenAI vision APIs.

Features

  • Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
  • Support for multiple image formats (JPEG, PNG, GIF, WebP)
  • Configurable primary and fallback providers
  • Base64 and file-based image input support
  • Optional text extraction using Tesseract OCR
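
The primary/fallback behaviour listed above can be pictured roughly as follows. This is a minimal sketch, not the server's actual code: the `Provider` type and the `describe_with_fallback` function are hypothetical names introduced for illustration, and it assumes the fallback is tried whenever the primary raises any exception.

```python
from typing import Callable, Optional

# Hypothetical provider type: takes raw image bytes, returns a description.
Provider = Callable[[bytes], str]


def describe_with_fallback(
    image: bytes,
    primary: Provider,
    fallback: Optional[Provider] = None,
) -> str:
    """Try the primary provider; if it raises, try the fallback when one is set."""
    try:
        return primary(image)
    except Exception:
        if fallback is None:
            raise  # no fallback configured, so surface the original error
        return fallback(image)
```

The real server may be more selective about which errors trigger the fallback; the sketch only shows the overall shape.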

Requirements

  • Python 3.8 or higher
  • Tesseract OCR (optional, required for text extraction)

Installation

  1. Clone the repository:

    git clone https://github.com/mario-andreschak/mcp-image-recognition.git
    cd mcp-image-recognition
    
  2. Create and configure your environment file:

    cp .env.example .env
    
  3. Build the project (build.bat is a Windows batch script):

    build.bat
    

Usage

Running the Server

Start the server using Python:

python -m image_recognition_server.server

Or, on Windows, using the batch file:

run.bat server

Available Tools

  1. describe_image

    • Input: Base64-encoded image data and MIME type
    • Output: Detailed description of the image
  2. describe_image_from_file

    • Input: Path to an image file
    • Output: Detailed description of the image
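
As an illustration of the input that describe_image expects, the sketch below reads a local file and produces Base64 data plus a MIME type. The helper name `build_describe_image_args` and the argument keys `image` and `mime_type` are assumptions made for this example; check the server's tool schema for the real parameter names.

```python
import base64
from pathlib import Path

# File suffixes for the formats the README lists as supported.
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def build_describe_image_args(path: str) -> dict:
    """Return Base64-encoded image data and a MIME type for a local image file."""
    suffix = Path(path).suffix.lower()
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"Unsupported image type: {suffix or '(no extension)'}")
    data = Path(path).read_bytes()
    return {
        # Key names are assumptions, not the server's confirmed schema.
        "image": base64.b64encode(data).decode("ascii"),
        "mime_type": MIME_BY_SUFFIX[suffix],
    }
```

For describe_image_from_file none of this is needed on the client side, since only the file path is passed.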

Environment Configuration

  • ANTHROPIC_API_KEY: Your Anthropic API key.
  • OPENAI_API_KEY: Your OpenAI API key.
  • VISION_PROVIDER: Primary vision provider (anthropic or openai).
  • FALLBACK_PROVIDER: Optional fallback provider.
  • LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR).
  • ENABLE_OCR: Enable Tesseract OCR text extraction (true or false).
  • OPENAI_MODEL: OpenAI model to use (default: gpt-4o-mini).
  • OPENAI_BASE_URL: Optional custom base URL for OpenAI API.
  • OPENAI_TIMEOUT: Optional custom timeout (in seconds) for OpenAI API.
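
To show how these variables fit together, here is a minimal sketch of reading and validating them in Python. It is not the server's own loader: the `Settings` class and `load_settings` function are hypothetical, and every default except the documented gpt-4o-mini model is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = {"anthropic", "openai"}


@dataclass
class Settings:
    vision_provider: str
    fallback_provider: Optional[str]
    enable_ocr: bool
    log_level: str
    openai_model: str


def _check_provider(name: str, variable: str) -> str:
    """Normalize a provider name and reject anything other than the two documented values."""
    value = name.strip().lower()
    if value not in VALID_PROVIDERS:
        raise ValueError(f"{variable} must be 'anthropic' or 'openai', got {name!r}")
    return value


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build a Settings object from environment-style key/value pairs."""
    # Defaulting to "anthropic" is an assumption made for this sketch.
    primary = _check_provider(env.get("VISION_PROVIDER", "anthropic"), "VISION_PROVIDER")
    raw_fallback = env.get("FALLBACK_PROVIDER", "").strip()
    fallback = _check_provider(raw_fallback, "FALLBACK_PROVIDER") if raw_fallback else None
    return Settings(
        vision_provider=primary,
        fallback_provider=fallback,
        # Treating a missing ENABLE_OCR as false is an assumption.
        enable_ocr=env.get("ENABLE_OCR", "false").strip().lower() == "true",
        # The INFO default is an assumption.
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # gpt-4o-mini is the documented default model.
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment, for example `load_settings({"VISION_PROVIDER": "openai", "ENABLE_OCR": "true"})`.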

Default Models

  • Anthropic: claude-3.5-sonnet-beta
  • OpenAI: gpt-4o-mini

Development

Running Tests

Run all tests:

run.bat test

Run specific test suite:

run.bat test server
run.bat test anthropic
run.bat test openai

Docker Support

Build the Docker image:

docker build -t mcp-image-recognition .

Run the container:

docker run -it --env-file .env mcp-image-recognition

License

MIT License - see LICENSE file for details.

Release History

  • 0.1.2: Improved OCR error handling and added comprehensive test coverage for OCR functionality
  • 0.1.1: Added Tesseract OCR support for text extraction from images
  • 0.1.0: Initial release with Anthropic and OpenAI vision support