
kb-mcp-server

by: Geeksfino

Build a knowledge base into a tar.gz and give it to this MCP server, and it is ready to serve.


📌 Overview

Purpose: To provide an efficient Model Context Protocol (MCP) server implementation, powered by txtai, that enables semantic search and AI-driven text processing through a standardized interface.

Overview: The Embedding MCP Server harnesses the power of txtai, an all-in-one embeddings database designed to facilitate semantic search, knowledge graph construction, and language model workflows. It combines multiple databases into a unified solution, allowing users to manage their knowledge in an accessible and efficient manner.

Key Features:

  • Unified Vector Database: Integrates vector indexes, graph networks, and relational databases, creating a comprehensive platform for data management and retrieval.

  • Semantic Search: Enables users to search for information based on meaning rather than keywords, enhancing the relevance of search results.

  • Knowledge Graph Integration: Automatically builds and queries knowledge graphs from data, allowing for improved navigation and understanding of relationships between concepts.

  • Portable Knowledge Bases: Users can save and share entire knowledge bases as compressed archives, making data management more flexible.

  • Extensible Pipeline System: Supports processing of various data types, including text, documents, audio, images, and video through a standardized API.

  • Local-first Architecture: Allows full functionality to be executed locally without reliance on external services, enhancing data privacy and control.


Embedding MCP Server

A Model Context Protocol (MCP) server implementation powered by txtai, providing semantic search, knowledge graph capabilities, and AI-driven text processing through a standardized interface.

The Power of txtai: All-in-one Embeddings Database

This project is built on txtai, an all-in-one embeddings database for semantic search, knowledge graph construction, and language model workflows, including retrieval-augmented generation (RAG). Key advantages include:

  • Unified Vector Database: Combines vector indexes, graph networks, and relational databases in a single platform.
  • Semantic Search: Find information based on meaning, not just keywords.
  • Knowledge Graph Integration: Automatically build and query knowledge graphs from your data.
  • Portable Knowledge Bases: Save entire knowledge bases as compressed archives (.tar.gz) for easy sharing and loading.
  • Extensible Pipeline System: Process text, documents, audio, images, and video through a unified API.
  • Local-first Architecture: Run all processes locally without sending data to external services.

How It Works

The project includes a knowledge base builder tool and an MCP server.

  • The knowledge base builder is a command-line interface for creating and managing knowledge bases.
  • The MCP server provides a standardized interface to access the knowledge base.

A knowledge base can be built either with the builder tool or directly through txtai's Python API. It can be a plain folder or a compressed .tar.gz archive; the MCP server loads either format transparently.
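For illustration, here is a minimal sketch of building and exporting a knowledge base directly with txtai's Python API. The model name, sample documents, and file names are assumptions for the example, not project defaults:

from txtai.embeddings import Embeddings

# Enable content storage so documents can be retrieved, not just scored.
# The model path is an illustrative choice (it matches the sample YAML
# config later in this document), not a mandated default.
embeddings = Embeddings(
    path="sentence-transformers/nli-mpnet-base-v2",
    content=True,
)

# Index (id, text, tags) tuples.
embeddings.index([
    (0, "Machine learning is a subfield of artificial intelligence.", None),
    (1, "Semantic search retrieves results by meaning, not keywords.", None),
])

# txtai infers the archive format from the extension, producing the
# portable .tar.gz that the MCP server can load directly.
embeddings.save("my_knowledge_base.tar.gz")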

1. Build a Knowledge Base with kb_builder

The kb_builder command-line tool lets you:

  • Process documents from various input sources (files, directories, JSON).
  • Extract text and create embeddings.
  • Automatically build knowledge graphs.
  • Export portable knowledge bases.

Note: kb_builder is provided for convenience and may have limited functionality.

2. Start the MCP Server

The MCP server exposes the following (a minimal client sketch follows this list):

  • Semantic search capabilities.
  • Knowledge graph querying and visualization.
  • Text processing pipelines (e.g., summarization, extraction).
  • Full compliance with the Model Context Protocol.
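As a quick orientation, here is a minimal client sketch using the official mcp Python SDK over stdio. The tool name passed to call_tool is hypothetical; list_tools() reports the names the server actually exposes:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the server as a stdio subprocess (paths are placeholders).
    params = StdioServerParameters(
        command="kb-mcp-server",
        args=["--embeddings", "/path/to/knowledge_base.tar.gz"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools the server actually exposes.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # "semantic_search" is a hypothetical name; substitute one
            # reported by list_tools() above.
            result = await session.call_tool(
                "semantic_search", {"query": "What is machine learning?"}
            )
            print(result)

asyncio.run(main())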

Installation

Recommended: Using uv with Python 3.10+

We recommend using uv with Python 3.10+ for consistent dependency management:

pip install -U uv
uv venv --python=3.10  # Or other supported 3.x versions
source .venv/bin/activate

# Install package
uv pip install kb-mcp-server

Note: The transformers package is pinned to version 4.49.0 to avoid deprecation warnings in newer versions.

Using conda

conda create -n embedding-mcp python=3.10
conda activate embedding-mcp
pip install kb-mcp-server

From Source

conda create -n embedding-mcp python=3.10
conda activate embedding-mcp

git clone https://github.com/Geeksfino/kb-mcp-server.git
cd kb-mcp-server

pip install -e .

Using uv (Faster Alternative)

pip install uv
uv venv
source .venv/bin/activate

# Install from PyPI
uv pip install kb-mcp-server

# Or install from source for development
uv pip install -e .

Using uvx (No Installation Required)

uvx enables running packages directly without installation:

uvx --from kb-mcp-server@0.3.0 kb-mcp-server --embeddings /path/to/knowledge_base

uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/documents --config config.yml

uvx --from kb-mcp-server@0.3.0 kb-search /path/to/knowledge_base "Your search query"

Command Line Usage

Building a Knowledge Base

Using PyPI Installed Commands

kb-build --input /path/to/documents --config config.yml
kb-build --input /path/to/new_documents --update
kb-build --input /path/to/documents --export my_knowledge_base.tar.gz

kb-search /path/to/knowledge_base "What is machine learning?"
kb-search /path/to/knowledge_base "What is machine learning?" --graph --limit 10

Using uvx (No Installation Required)

uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/documents --config config.yml
uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/new_documents --update
uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/documents --export my_knowledge_base.tar.gz

uvx --from kb-mcp-server@0.3.0 kb-search /path/to/knowledge_base "What is machine learning?"
uvx --from kb-mcp-server@0.3.0 kb-search /path/to/knowledge_base "What is machine learning?" --graph --limit 10

Using the Python Module

python -m kb_builder build --input /path/to/documents --config config.yml
python -m kb_builder build --input /path/to/new_documents --update
python -m kb_builder build --input /path/to/documents --export my_knowledge_base.tar.gz

Using Convenience Scripts

./scripts/kb_build.sh /path/to/documents technical_docs
./scripts/kb_build.sh /path/to/documents /path/to/my_config.yml
./scripts/kb_build.sh /path/to/documents technical_docs --update

./scripts/kb_search.sh /path/to/knowledge_base "What is machine learning?"
./scripts/kb_search.sh /path/to/knowledge_base "What is machine learning?" --graph

Run ./scripts/kb_build.sh --help or ./scripts/kb_search.sh --help for options.

Starting the MCP Server

Using PyPI Installed Command

kb-mcp-server --embeddings /path/to/knowledge_base_folder
kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz

Using uvx

uvx kb-mcp-server@0.2.6 --embeddings /path/to/knowledge_base_folder
uvx kb-mcp-server@0.2.6 --embeddings /path/to/knowledge_base.tar.gz

Using the Python Module

python -m txtai_mcp_server --embeddings /path/to/knowledge_base_folder
python -m txtai_mcp_server --embeddings /path/to/knowledge_base.tar.gz

MCP Server Configuration

The MCP server is configured via environment variables or command-line arguments (no YAML files):

kb-mcp-server --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000

# Or using uvx
uvx kb-mcp-server@0.2.6 --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000

# Or Python module
python -m txtai_mcp_server --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000

# Using environment variables
export TXTAI_EMBEDDINGS=/path/to/knowledge_base
export MCP_SSE_HOST=0.0.0.0
export MCP_SSE_PORT=8000
python -m txtai_mcp_server

Common options (combined in an example after this list):

  • --embeddings: Path to the knowledge base (required)
  • --host: Host address (default: localhost)
  • --port: Port number (default: 8000)
  • --transport: Transport type (sse or stdio, default: stdio)
  • --enable-causal-boost: Enable causal boost for enhanced relevance
  • --causal-config: Path to custom causal boost config YAML
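For example, the flags above can be combined to serve a knowledge base over SSE on all interfaces (the path is a placeholder):

# Serve over SSE instead of the default stdio transport.
kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz --transport sse --host 0.0.0.0 --port 8000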

Configuring LLM Clients to Use the MCP Server

Create an MCP configuration file (e.g., mcp_config.json) to connect your LLM client.

Example: Using the server directly with virtual environment

{
  "mcpServers": {
    "kb-server": {
      "command": "/your/home/project/.venv/bin/kb-mcp-server",
      "args": [
        "--embeddings",
        "/path/to/knowledge_base.tar.gz"
      ],
      "cwd": "/path/to/working/directory"
    }
  }
}

Example: Using system default Python

{
  "mcpServers": {
    "rag-server": {
      "command": "python3",
      "args": [
        "-m",
        "txtai_mcp_server",
        "--embeddings",
        "/path/to/knowledge_base.tar.gz",
        "--enable-causal-boost"
      ],
      "cwd": "/path/to/working/directory"
    }
  }
}

Example: Using uvx (requires uvx installed and in PATH)

{
  "mcpServers": {
    "kb-server": {
      "command": "uvx",
      "args": [
        "kb-mcp-server@0.2.6",
        "--embeddings", "/path/to/knowledge_base",
        "--host", "localhost",
        "--port", "8000"
      ],
      "cwd": "/path/to/working/directory"
    }
  }
}

For macOS GUI apps (e.g., Claude Desktop), you may need to add uvx to the system PATH via a launchd environment plist and restart the system.
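One way to do this, as a sketch rather than the only option, is launchctl config, which sets the PATH that launchd hands to GUI apps and takes effect after a reboot. The directory below assumes a Homebrew install of uv; adjust it to wherever uvx actually lives:

# Make uvx visible to GUI apps launched by launchd (takes effect after reboot).
# /opt/homebrew/bin assumes a Homebrew install; adjust to your uvx location.
sudo launchctl config user path "/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"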

Advanced Knowledge Base Configuration

Knowledge bases are built from a YAML configuration file that controls txtai's embedding and pipeline behavior. This configuration applies only at build time; the MCP server does not read it at runtime.

Example config snippet:

path: ~/.txtai/embeddings
writable: true

content:
  path: sqlite:///~/.txtai/content.db

embeddings:
  path: sentence-transformers/nli-mpnet-base-v2
  backend: faiss
  gpu: true
  batch: 32
  normalize: true

  scoring: hybrid
  hybridalpha: 0.75

pipeline:
  workers: 2
  queue: 100
  timeout: 300

extractor:
  path: distilbert-base-cased-distilled-squad
  maxlength: 512
  minscore: 0.3

graph:
  backend: sqlite
  path: ~/.txtai/graph.db
  similarity: 0.75
  limit: 10

Configuration Templates

The repository contains templates for various use cases:

  • Storage and backend:

    • memory.yml: In-memory vectors (for development).
    • sqlite-faiss.yml: SQLite + FAISS.
    • postgres-pgvector.yml: PostgreSQL + pgvector (production).
  • Domain-specific:

    • base.yml: Base template.
    • code_repositories.yml
    • data_science.yml
    • general_knowledge.yml
    • research_papers.yml
    • technical_docs.yml

Example usage:

python -m kb_builder build --input /path/to/documents --config src/kb_builder/configs/technical_docs.yml

Advanced Features

Knowledge Graph Capabilities

  • Automatic graph construction from documents.
  • Graph traversal to explore related concepts.
  • Path finding to discover connections.
  • Community detection for clustering related data.

Causal Boosting Mechanism

Enhances search relevance by detecting and prioritizing causal relationships:

  • Recognizes causal patterns in queries and documents.
  • Supports multiple languages with automatic pattern selection.
  • Configurable boost multipliers for different causal match types.
  • Improves responses to "why" and "how" queries by surfacing explanatory content.

Causal boost behavior is customizable via YAML configuration.
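Purely as an illustration of the shape such a file might take (the key names below are hypothetical, not the project's actual schema; consult the repository's default config for the real one):

# Hypothetical sketch only: keys are illustrative, not the project's schema.
causal_boost:
  enabled: true
  patterns:
    en: ["because", "therefore", "leads to", "results in"]
  multipliers:
    direct_cause: 1.5
    conditional: 1.2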

License

MIT License – see LICENSE file for details.