kb-mcp-server
by: Geeksfino
Build a knowledge base into a tar.gz and give it to this MCP server, and it is ready to serve.
📌Overview
Purpose: To provide an efficient implementation of a Model Context Protocol (MCP) server using txtai that enables semantic search and AI-driven text processing via a standardized interface.
Overview: The Embedding MCP Server harnesses the power of txtai, an all-in-one embeddings database designed to facilitate semantic search, knowledge graph construction, and language model workflows. It combines multiple databases into a unified solution, allowing users to manage their knowledge in an accessible and efficient manner.
Key Features:
- Unified Vector Database: Integrates vector indexes, graph networks, and relational databases, creating a comprehensive platform for data management and retrieval.
- Semantic Search: Enables users to search for information based on meaning rather than keywords, enhancing the relevance of search results.
- Knowledge Graph Integration: Automatically builds and queries knowledge graphs from data, improving navigation and understanding of relationships between concepts.
- Portable Knowledge Bases: Entire knowledge bases can be saved and shared as compressed archives, making data management more flexible.
- Extensible Pipeline System: Supports processing of various data types, including text, documents, audio, images, and video, through a standardized API.
- Local-first Architecture: Allows full functionality to be executed locally without reliance on external services, enhancing data privacy and control.
Embedding MCP Server
A Model Context Protocol (MCP) server implementation powered by txtai, providing semantic search, knowledge graph capabilities, and AI-driven text processing through a standardized interface.
The Power of txtai: All-in-one Embeddings Database
This project is built on txtai, an all-in-one embeddings database for retrieval-augmented generation (RAG) that combines semantic search, knowledge graph construction, and language model workflows. Key advantages include:
- Unified Vector Database: Combines vector indexes, graph networks, and relational databases in a single platform.
- Semantic Search: Find information based on meaning, not just keywords.
- Knowledge Graph Integration: Automatically build and query knowledge graphs from your data.
- Portable Knowledge Bases: Save entire knowledge bases as compressed archives (.tar.gz) for easy sharing and loading.
- Extensible Pipeline System: Process text, documents, audio, images, and video through a unified API.
- Local-first Architecture: Run all processes locally without sending data to external services.
How It Works
The project includes a knowledge base builder tool and an MCP server.
- The knowledge base builder is a command-line interface for creating and managing knowledge bases.
- The MCP server provides a standardized interface to access the knowledge base.
A knowledge base can be built either using the builder tool or directly via txtai's Python programming interface. The knowledge base can be a folder or a compressed .tar.gz archive; the MCP server loads either transparently.
1. Build a Knowledge Base with kb_builder
The kb_builder command-line tool lets you:
- Process documents from various input sources (files, directories, JSON).
- Extract text and create embeddings.
- Automatically build knowledge graphs.
- Export portable knowledge bases.
Note: kb_builder is provided for convenience and may have limited functionality.
2. Start the MCP Server
The MCP server exposes:
- Semantic search capabilities.
- Knowledge graph querying and visualization.
- Text processing pipelines (e.g., summarization, extraction).
- Full compliance with the Model Context Protocol.
Installation
Recommended: Using uv with Python 3.10+
We recommend using uv with Python 3.10+ for consistent dependency management:
pip install -U uv
uv venv --python=3.10 # Or other supported 3.x versions
source .venv/bin/activate
# Install package
uv pip install kb-mcp-server
Note: The transformers package is pinned to version 4.49.0 to avoid deprecation warnings in newer versions.
Using conda
conda create -n embedding-mcp python=3.10
conda activate embedding-mcp
pip install kb-mcp-server
From Source
conda create -n embedding-mcp python=3.10
conda activate embedding-mcp
git clone https://github.com/Geeksfino/kb-mcp-server.git
cd kb-mcp-server
pip install -e .
Using uv (Faster Alternative)
pip install uv
uv venv
source .venv/bin/activate
# Install from PyPI
uv pip install kb-mcp-server
# Or install from source for development
uv pip install -e .
Using uvx (No Installation Required)
uvx enables running packages directly without installation:
uvx --from kb-mcp-server@0.3.0 kb-mcp-server --embeddings /path/to/knowledge_base
uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/documents --config config.yml
uvx --from kb-mcp-server@0.3.0 kb-search /path/to/knowledge_base "Your search query"
Command Line Usage
Building a Knowledge Base
Using PyPI Installed Commands
kb-build --input /path/to/documents --config config.yml
kb-build --input /path/to/new_documents --update
kb-build --input /path/to/documents --export my_knowledge_base.tar.gz
kb-search /path/to/knowledge_base "What is machine learning?"
kb-search /path/to/knowledge_base "What is machine learning?" --graph --limit 10
Using uvx (No Installation Required)
uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/documents --config config.yml
uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/new_documents --update
uvx --from kb-mcp-server@0.3.0 kb-build --input /path/to/documents --export my_knowledge_base.tar.gz
uvx --from kb-mcp-server@0.3.0 kb-search /path/to/knowledge_base "What is machine learning?"
uvx --from kb-mcp-server@0.3.0 kb-search /path/to/knowledge_base "What is machine learning?" --graph --limit 10
Using the Python Module
python -m kb_builder build --input /path/to/documents --config config.yml
python -m kb_builder build --input /path/to/new_documents --update
python -m kb_builder build --input /path/to/documents --export my_knowledge_base.tar.gz
Using Convenience Scripts
./scripts/kb_build.sh /path/to/documents technical_docs
./scripts/kb_build.sh /path/to/documents /path/to/my_config.yml
./scripts/kb_build.sh /path/to/documents technical_docs --update
./scripts/kb_search.sh /path/to/knowledge_base "What is machine learning?"
./scripts/kb_search.sh /path/to/knowledge_base "What is machine learning?" --graph
Run ./scripts/kb_build.sh --help or ./scripts/kb_search.sh --help for the full list of options.
Starting the MCP Server
Using PyPI Installed Command
kb-mcp-server --embeddings /path/to/knowledge_base_folder
kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz
Using uvx
uvx kb-mcp-server@0.2.6 --embeddings /path/to/knowledge_base_folder
uvx kb-mcp-server@0.2.6 --embeddings /path/to/knowledge_base.tar.gz
Using the Python Module
python -m txtai_mcp_server --embeddings /path/to/knowledge_base_folder
python -m txtai_mcp_server --embeddings /path/to/knowledge_base.tar.gz
MCP Server Configuration
The MCP server is configured via environment variables or command-line arguments (no YAML files):
kb-mcp-server --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000
# Or using uvx
uvx kb-mcp-server@0.2.6 --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000
# Or Python module
python -m txtai_mcp_server --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000
# Using environment variables
export TXTAI_EMBEDDINGS=/path/to/knowledge_base
export MCP_SSE_HOST=0.0.0.0
export MCP_SSE_PORT=8000
python -m txtai_mcp_server
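To illustrate how the two mechanisms fit together, here is a small stdlib sketch of environment-variable-with-default resolution. The precedence shown (CLI argument over environment variable over built-in default) is the conventional behavior; it is an assumption about the server's internals, not its actual code:

```python
import os

def resolve(cli_value, env_name, default):
    """Return the CLI value if given, else the env var, else the default."""
    if cli_value is not None:
        return cli_value
    return os.environ.get(env_name, default)

# Simulate starting the server with only environment variables set.
os.environ["TXTAI_EMBEDDINGS"] = "/path/to/knowledge_base"
os.environ["MCP_SSE_PORT"] = "8000"

embeddings = resolve(None, "TXTAI_EMBEDDINGS", None)   # from env
host = resolve(None, "MCP_SSE_HOST", "localhost")      # falls back to default
port = int(resolve(None, "MCP_SSE_PORT", "8000"))      # from env
```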
Common options:
- --embeddings: Path to the knowledge base (required).
- --host: Host address (default: localhost).
- --port: Port number (default: 8000).
- --transport: Transport type (sse or stdio; default: stdio).
- --enable-causal-boost: Enable causal boost for enhanced relevance.
- --causal-config: Path to a custom causal boost configuration YAML file.
Configuring LLM Clients to Use the MCP Server
Create an MCP configuration file (e.g., mcp_config.json) to connect your LLM client.
Example: Using the server directly with virtual environment
{
  "mcpServers": {
    "kb-server": {
      "command": "/your/home/project/.venv/bin/kb-mcp-server",
      "args": [
        "--embeddings",
        "/path/to/knowledge_base.tar.gz"
      ],
      "cwd": "/path/to/working/directory"
    }
  }
}
Example: Using system default Python
{
  "mcpServers": {
    "rag-server": {
      "command": "python3",
      "args": [
        "-m",
        "txtai_mcp_server",
        "--embeddings",
        "/path/to/knowledge_base.tar.gz",
        "--enable-causal-boost"
      ],
      "cwd": "/path/to/working/directory"
    }
  }
}
Example: Using uvx (requires uvx installed and in PATH)
{
  "mcpServers": {
    "kb-server": {
      "command": "uvx",
      "args": [
        "kb-mcp-server@0.2.6",
        "--embeddings", "/path/to/knowledge_base",
        "--host", "localhost",
        "--port", "8000"
      ],
      "cwd": "/path/to/working/directory"
    }
  }
}
For macOS GUI apps (e.g., Claude Desktop), you may need to add uvx to the system PATH via a launchd environment plist and restart the system.
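If you expose several knowledge bases, the configuration file can be generated rather than hand-edited. A stdlib sketch (the server names and archive paths below are placeholders):

```python
import json

# Hypothetical knowledge bases to expose, one MCP server entry each.
kbs = {
    "kb-server": "/path/to/knowledge_base.tar.gz",
    "docs-server": "/path/to/docs_kb.tar.gz",
}

config = {
    "mcpServers": {
        name: {
            "command": "kb-mcp-server",
            "args": ["--embeddings", path],
        }
        for name, path in kbs.items()
    }
}

with open("mcp_config.json", "w") as f:
    json.dump(config, f, indent=2)
```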
Advanced Knowledge Base Configuration
Knowledge bases built by txtai require a YAML config that controls embedding and pipeline behavior. This config is used only when building knowledge bases, not for MCP server runtime.
Example config snippet:
path: ~/.txtai/embeddings
writable: true

content:
  path: sqlite:///~/.txtai/content.db

embeddings:
  path: sentence-transformers/nli-mpnet-base-v2
  backend: faiss
  gpu: true
  batch: 32
  normalize: true

scoring: hybrid
hybridalpha: 0.75

pipeline:
  workers: 2
  queue: 100
  timeout: 300

extractor:
  path: distilbert-base-cased-distilled-squad
  maxlength: 512
  minscore: 0.3

graph:
  backend: sqlite
  path: ~/.txtai/graph.db
  similarity: 0.75
  limit: 10
Configuration Templates
The repository contains templates for various use cases:
- Storage and backend:
  - memory.yml: In-memory vectors (for development).
  - sqlite-faiss.yml: SQLite + FAISS.
  - postgres-pgvector.yml: PostgreSQL + pgvector (production).
- Domain-specific:
  - base.yml: Base template.
  - code_repositories.yml
  - data_science.yml
  - general_knowledge.yml
  - research_papers.yml
  - technical_docs.yml
Example usage:
python -m kb_builder build --input /path/to/documents --config src/kb_builder/configs/technical_docs.yml
Advanced Features
Knowledge Graph Capabilities
- Automatic graph construction from documents.
- Graph traversal to explore related concepts.
- Path finding to discover connections.
- Community detection for clustering related data.
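The graph features themselves come from txtai's graph component; purely as a conceptual illustration, path finding reduces to shortest-path search over concepts linked by semantic similarity. The toy graph below is hand-made, not txtai output:

```python
from collections import deque

# Toy concept graph: edges connect semantically related concepts.
graph = {
    "machine learning": ["neural networks", "statistics"],
    "neural networks": ["deep learning"],
    "deep learning": ["transformers"],
    "statistics": ["probability"],
    "transformers": [],
    "probability": [],
}

def find_path(start, goal):
    """Breadth-first search returning the shortest concept path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```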
Causal Boosting Mechanism
Enhances search relevance by detecting and prioritizing causal relationships:
- Recognizes causal patterns in queries and documents.
- Supports multiple languages with automatic pattern selection.
- Configurable boost multipliers for different causal match types.
- Improves responses to "why" and "how" queries by surfacing explanatory content.
Causal boost behavior is customizable via YAML configuration.
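As a rough illustration of the idea (not the server's actual implementation), a regex-based sketch that boosts a result's score when both the query and the document contain causal markers; the pattern list and boost multiplier are invented for the example:

```python
import re

# Simple English causal markers; real configurations would be per-language YAML.
CAUSAL_PATTERN = re.compile(
    r"\b(why|how|because|causes?|leads? to|results? in)\b", re.IGNORECASE
)

def boosted_score(query, text, score, boost=1.5):
    """Multiply the base score when query and text both look causal."""
    if CAUSAL_PATTERN.search(query) and CAUSAL_PATTERN.search(text):
        return score * boost
    return score
```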
License
MIT License – see LICENSE file for details.