
kb-mcp-server

by: Geeksfino

Build a knowledge base into a tar.gz and give it to this MCP server, and it is ready to serve.


📌 Overview

Purpose: To provide an efficient Model Context Protocol (MCP) server implementation, powered by txtai, that enables semantic search and AI-driven text processing through a standardized interface.

Overview: The Embedding MCP Server harnesses the power of txtai, an all-in-one embeddings database designed to facilitate semantic search, knowledge graph construction, and language model workflows. It combines multiple databases into a unified solution, allowing users to manage their knowledge in an accessible and efficient manner.

Key Features:

  • Unified Vector Database: Integrates vector indexes, graph networks, and relational databases, creating a comprehensive platform for data management and retrieval.

  • Semantic Search: Enables users to search for information based on meaning rather than keywords, enhancing the relevance of search results.

  • Knowledge Graph Integration: Automatically builds and queries knowledge graphs from data, allowing for improved navigation and understanding of relationships between concepts.

  • Portable Knowledge Bases: Users can save and share entire knowledge bases as compressed archives, making data management more flexible.

  • Extensible Pipeline System: Supports processing of various data types, including text, documents, audio, images, and video through a standardized API.

  • Local-first Architecture: Allows full functionality to be executed locally without reliance on external services, enhancing data privacy and control.


Embedding MCP Server

A Model Context Protocol (MCP) server implementation powered by txtai, providing semantic search, knowledge graph capabilities, and AI-driven text processing through a standardized interface.

The Power of txtai: All-in-one Embeddings Database

This project is built on txtai, an all-in-one embeddings database for semantic search, knowledge graph construction, and language model workflows, including retrieval-augmented generation (RAG). Key advantages include:

  • Unified Vector Database: Combines vector indexes, graph networks, and relational databases in a single platform.
  • Semantic Search: Find information based on meaning, not just keywords.
  • Knowledge Graph Integration: Automatically build and query knowledge graphs from your data.
  • Portable Knowledge Bases: Save entire knowledge bases as compressed archives (.tar.gz) for easy sharing and loading.
  • Extensible Pipeline System: Process text, documents, audio, images, and video through a unified API.
  • Local-first Architecture: Run all processes locally without sending data to external services.

How It Works

The project includes a knowledge base builder tool and an MCP server.

  • The knowledge base builder is a command-line interface for creating and managing knowledge bases.
  • The MCP server provides a standardized interface to access the knowledge base.

A knowledge base can be built either with the builder tool or directly through txtai's Python API. It can be a plain folder or a compressed .tar.gz archive; the MCP server loads either format transparently.
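For illustration, here is a minimal sketch of building and exporting a knowledge base directly with txtai's Python API. The model name, sample documents, and file names are assumptions for the example, not project defaults:

from txtai.embeddings import Embeddings

# Enable content storage so documents can be retrieved, not just scored.
# The model path is an illustrative choice (it matches the sample YAML
# config later in this document), not a mandated default.
embeddings = Embeddings(
    path="sentence-transformers/nli-mpnet-base-v2",
    content=True,
)

# Index (id, text, tags) tuples.
embeddings.index([
    (0, "Machine learning is a subfield of artificial intelligence.", None),
    (1, "Semantic search retrieves results by meaning, not keywords.", None),
])

# txtai infers the archive format from the extension, producing the
# portable .tar.gz that the MCP server can load directly.
embeddings.save("my_knowledge_base.tar.gz")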

1. Build a Knowledge Base with kb_builder

The kb_builder command-line tool lets you:

  • Process documents from various input sources (files, directories, JSON).
  • Extract text and create embeddings.
  • Automatically build knowledge graphs.
  • Export portable knowledge bases.

Note: kb_builder is provided for convenience and may have limited functionality.

2. Start the MCP Server

The MCP server exposes the following (a minimal client sketch follows this list):

  • Semantic search capabilities.
  • Knowledge graph querying and visualization.
  • Text processing pipelines (e.g., summarization, extraction).
  • Full compliance with the Model Context Protocol.
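As a quick orientation, here is a minimal client sketch using the official mcp Python SDK over stdio. The tool name passed to call_tool is hypothetical; list_tools() reports the names the server actually exposes:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the server as a stdio subprocess (paths are placeholders).
    params = StdioServerParameters(
        command="kb-mcp-server",
        args=["--embeddings", "/path/to/knowledge_base.tar.gz"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools the server actually exposes.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # "semantic_search" is a hypothetical name; substitute one
            # reported by list_tools() above.
            result = await session.call_tool(
                "semantic_search", {"query": "What is machine learning?"}
            )
            print(result)

asyncio.run(main())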

Installation

Recommended: Using uv with Python 3.10+

We recommend using uv with Python 3.10+ for consistent dependency management:

pip install -U uv
uv venv --python=3.10  # Or other supported 3.x versions
source .venv/bin/activate

# Install package
uv pip install kb-mcp-server

Note: The transformers package is pinned to version 4.49.0 to avoid deprecation warnings in newer versions.

Using conda

conda create -n embedding-mcp python=3.10
conda activate embedding-mcp
pip install kb-mcp-server

From Source

conda create -n embedding-mcp python=3.10
conda activate embedding-mcp

git clone https://github.com/Geeksfino/kb-mcp-server.git
cd kb-mcp-server

pip install -e .

Using uv (Faster Alternative)

pip install uv
uv venv
source .venv/bin/activate

# Install from PyPI
uv pip install kb-mcp-server

# Or install from source for development
uv pip install -e .

Using uvx (No Installation Required)

uvx enables running packages directly without installation:

uvx --from kb-mcp-server@0.3.0 kb-mcp-server --embeddings /path/to/knowledge_base

uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/documents --config config.yml

uvx --from kb-mcp-server@0.3.0 kb-search /path/to/knowledge_base "Your search query"

Command Line Usage

Building a Knowledge Base

Using PyPI Installed Commands

kb-build --input /path/to/documents --config config.yml
kb-build --input /path/to/new_documents --update
kb-build --input /path/to/documents --export my_knowledge_base.tar.gz

kb-search /path/to/knowledge_base "What is machine learning?"
kb-search /path/to/knowledge_base "What is machine learning?" --graph --limit 10

Using uvx (No Installation Required)

uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/documents --config config.yml
uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/new_documents --update
uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/documents --export my_knowledge_base.tar.gz

uvx --from kb-mcp-server@0.3.0 kb-search /path/to/knowledge_base "What is machine learning?"
uvx --from kb-mcp-server@0.3.0 kb-search /path/to/knowledge_base "What is machine learning?" --graph --limit 10

Using the Python Module

python -m kb_builder build --input /path/to/documents --config config.yml
python -m kb_builder build --input /path/to/new_documents --update
python -m kb_builder build --input /path/to/documents --export my_knowledge_base.tar.gz

Using Convenience Scripts

./scripts/kb_build.sh /path/to/documents technical_docs
./scripts/kb_build.sh /path/to/documents /path/to/my_config.yml
./scripts/kb_build.sh /path/to/documents technical_docs --update

./scripts/kb_search.sh /path/to/knowledge_base "What is machine learning?"
./scripts/kb_search.sh /path/to/knowledge_base "What is machine learning?" --graph

Run ./scripts/kb_build.sh --help or ./scripts/kb_search.sh --help for options.

Starting the MCP Server

Using PyPI Installed Command

kb-mcp-server --embeddings /path/to/knowledge_base_folder
kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz

Using uvx

uvx kb-mcp-server@0.2.6 --embeddings /path/to/knowledge_base_folder
uvx kb-mcp-server@0.2.6 --embeddings /path/to/knowledge_base.tar.gz

Using the Python Module

python -m txtai_mcp_server --embeddings /path/to/knowledge_base_folder
python -m txtai_mcp_server --embeddings /path/to/knowledge_base.tar.gz

MCP Server Configuration

The MCP server is configured via environment variables or command-line arguments (no YAML files):

kb-mcp-server --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000

# Or using uvx
uvx kb-mcp-server@0.2.6 --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000

# Or Python module
python -m txtai_mcp_server --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000

# Using environment variables
export TXTAI_EMBEDDINGS=/path/to/knowledge_base
export MCP_SSE_HOST=0.0.0.0
export MCP_SSE_PORT=8000
python -m txtai_mcp_server

Common options (combined in an example after this list):

  • --embeddings: Path to the knowledge base (required)
  • --host: Host address (default: localhost)
  • --port: Port number (default: 8000)
  • --transport: Transport type (sse or stdio, default: stdio)
  • --enable-causal-boost: Enable causal boost for enhanced relevance
  • --causal-config: Path to custom causal boost config YAML
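For example, the flags above can be combined to serve a knowledge base over SSE on all interfaces (the path is a placeholder):

# Serve over SSE instead of the default stdio transport.
kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz --transport sse --host 0.0.0.0 --port 8000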

Configuring LLM Clients to Use the MCP Server

Create an MCP configuration file (e.g., mcp_config.json) to connect your LLM client.

Example: Using the server directly with virtual environment

{
  "mcpServers": {
    "kb-server": {
      "command": "/your/home/project/.venv/bin/kb-mcp-server",
      "args": [
        "--embeddings",
        "/path/to/knowledge_base.tar.gz"
      ],
      "cwd": "/path/to/working/directory"
    }
  }
}

Example: Using system default Python

{
  "mcpServers": {
    "rag-server": {
      "command": "python3",
      "args": [
        "-m",
        "txtai_mcp_server",
        "--embeddings",
        "/path/to/knowledge_base.tar.gz",
        "--enable-causal-boost"
      ],
      "cwd": "/path/to/working/directory"
    }
  }
}

Example: Using uvx (requires uvx installed and in PATH)

{
  "mcpServers": {
    "kb-server": {
      "command": "uvx",
      "args": [
        "kb-mcp-server@0.2.6",
        "--embeddings", "/path/to/knowledge_base",
        "--host", "localhost",
        "--port", "8000"
      ],
      "cwd": "/path/to/working/directory"
    }
  }
}

For macOS GUI apps (e.g., Claude Desktop), you may need to add uvx to the system PATH via a launchd environment plist and restart the system.
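One way to do this, as a sketch rather than the only option, is launchctl config, which sets the PATH that launchd hands to GUI apps and takes effect after a reboot. The directory below assumes a Homebrew install of uv; adjust it to wherever uvx actually lives:

# Make uvx visible to GUI apps launched by launchd (takes effect after reboot).
# /opt/homebrew/bin assumes a Homebrew install; adjust to your uvx location.
sudo launchctl config user path "/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"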

Advanced Knowledge Base Configuration

Knowledge bases are built from a YAML configuration file that controls txtai's embedding and pipeline behavior. This configuration applies only at build time; the MCP server does not read it at runtime.

Example config snippet:

path: ~/.txtai/embeddings
writable: true

content:
  path: sqlite:///~/.txtai/content.db

embeddings:
  path: sentence-transformers/nli-mpnet-base-v2
  backend: faiss
  gpu: true
  batch: 32
  normalize: true

  scoring: hybrid
  hybridalpha: 0.75

pipeline:
  workers: 2
  queue: 100
  timeout: 300

extractor:
  path: distilbert-base-cased-distilled-squad
  maxlength: 512
  minscore: 0.3

graph:
  backend: sqlite
  path: ~/.txtai/graph.db
  similarity: 0.75
  limit: 10

Configuration Templates

The repository contains templates for various use cases:

  • Storage and backend:

    • memory.yml: In-memory vectors (for development).
    • sqlite-faiss.yml: SQLite + FAISS.
    • postgres-pgvector.yml: PostgreSQL + pgvector (production).
  • Domain-specific:

    • base.yml: Base template.
    • code_repositories.yml
    • data_science.yml
    • general_knowledge.yml
    • research_papers.yml
    • technical_docs.yml

Example usage:

python -m kb_builder build --input /path/to/documents --config src/kb_builder/configs/technical_docs.yml

Advanced Features

Knowledge Graph Capabilities

  • Automatic graph construction from documents.
  • Graph traversal to explore related concepts.
  • Path finding to discover connections.
  • Community detection for clustering related data.

Causal Boosting Mechanism

Enhances search relevance by detecting and prioritizing causal relationships:

  • Recognizes causal patterns in queries and documents.
  • Supports multiple languages with automatic pattern selection.
  • Configurable boost multipliers for different causal match types.
  • Improves responses to "why" and "how" queries by surfacing explanatory content.

Causal boost behavior is customizable via YAML configuration.
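Purely as an illustration of the shape such a file might take (the key names below are hypothetical, not the project's actual schema; consult the repository's default config for the real one):

# Hypothetical sketch only: keys are illustrative, not the project's schema.
causal_boost:
  enabled: true
  patterns:
    en: ["because", "therefore", "leads to", "results in"]
  multipliers:
    direct_cause: 1.5
    conditional: 1.2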

License

MIT License – see LICENSE file for details.