# Ollama Provider

Run CortexDB's embedding and entity-extraction pipelines entirely on your own hardware using locally hosted Ollama models. No API keys, no data leaving your network.
## Overview
Ollama is a local inference engine that runs open-source models on consumer hardware. This integration configures CortexDB to use Ollama for:
- Embedding generation — convert text to vectors for semantic search
- Entity extraction — extract entities and relationships from ingested episodes using local chat models (Llama 3, Mistral, etc.)
## Installation

```bash
pip install cortexdb-ollama
```
Make sure Ollama is running locally:

```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull an embedding model
ollama pull nomic-embed-text

# Ollama serves on http://localhost:11434 by default
```
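Before wiring up CortexDB, it can help to confirm the server is actually reachable. A minimal sketch, assuming only that Ollama's `/api/tags` endpoint lists locally available models (the helper name is ours, not part of cortexdb-ollama):

```python
import json
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            # A healthy server returns a JSON object with a "models" list.
            return "models" in json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False
```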
## Configuration
| Environment Variable | Default | Description |
|---|---|---|
| CORTEX_OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| CORTEX_OLLAMA_EMBED_MODEL | nomic-embed-text | Embedding model name |
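If you want to resolve these settings in code rather than rely on the environment alone, a sketch that applies the documented defaults (the helper is illustrative, not part of the package):

```python
import os


def ollama_settings() -> dict:
    """Read the CortexDB Ollama settings, falling back to the documented defaults."""
    return {
        "base_url": os.environ.get("CORTEX_OLLAMA_URL", "http://localhost:11434"),
        "embed_model": os.environ.get("CORTEX_OLLAMA_EMBED_MODEL", "nomic-embed-text"),
    }
```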
## Embedding Models
| Model | Dimensions | Description |
|---|---|---|
| nomic-embed-text | 768 | High-quality, general-purpose embeddings |
| mxbai-embed-large | 1024 | Strong performance on retrieval benchmarks |
| all-minilm | 384 | Lightweight and fast |
| snowflake-arctic-embed | 1024 | Optimized for retrieval tasks |
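The dimension column matters in practice: vectors stored in a search index must all come from a model with the same dimension, so changing models generally means re-embedding existing content. A small illustrative check built from the table above (the helper is ours):

```python
# Dimensions from the table above.
EMBED_DIMS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
    "snowflake-arctic-embed": 1024,
}


def check_model(model: str, index_dim: int) -> None:
    """Raise if `model` would produce vectors incompatible with an existing index."""
    dim = EMBED_DIMS.get(model)
    if dim is None:
        raise ValueError(f"Unknown embedding model: {model!r}")
    if dim != index_dim:
        raise ValueError(
            f"{model} produces {dim}-dim vectors but the index stores "
            f"{index_dim}-dim vectors; re-embed existing content before switching."
        )
```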
## Usage Example

```python
import asyncio

from cortexdb_ollama import OllamaEmbeddingProvider


async def main():
    # Use defaults (localhost:11434, nomic-embed-text)
    async with OllamaEmbeddingProvider() as provider:
        embedding = await provider.embed_query("What is event sourcing?")
        print(f"Dimension: {provider.dimension}")  # 768

        # Batch embedding
        embeddings = await provider.embed([
            "Event sourcing stores all changes as events.",
            "CQRS separates reads from writes.",
        ])


asyncio.run(main())
```
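The returned embeddings are plain float vectors, so downstream semantic search is just vector math. CortexDB handles the ranking internally; purely for illustration, cosine similarity over two such vectors looks like this:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```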
### Custom Configuration

```python
from cortexdb_ollama import OllamaConfig, OllamaEmbeddingProvider

config = OllamaConfig(
    base_url="http://gpu-server:11434",
    embed_model="mxbai-embed-large",
)
provider = OllamaEmbeddingProvider(config)
```
## Self-Hosted Setup

Ollama is self-hosted by design. For production deployments:

```bash
# Run on a dedicated GPU server
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Pre-pull models at deploy time
ollama pull nomic-embed-text
ollama pull llama3
```
Point CortexDB at your Ollama instance:

```bash
export CORTEX_OLLAMA_URL=http://gpu-server:11434
export CORTEX_OLLAMA_EMBED_MODEL=nomic-embed-text
```
## Switching Providers

To switch CortexDB from the default OpenAI embeddings to Ollama, update your configuration:

```python
from cortexdb_ollama import OllamaEmbeddingProvider

# Replace your existing embedding provider
provider = OllamaEmbeddingProvider()
```

All CortexDB embedding providers implement the same interface (`embed`, `embed_query`, `dimension`, `model_name`), so switching is a one-line change.
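The interface definition itself is not shown on this page; as a sketch only, it could be modeled as a `typing.Protocol` with the four names listed above (the signatures here are assumptions, not the actual CortexDB definitions):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class EmbeddingProvider(Protocol):
    """Assumed shape of a CortexDB embedding provider; signatures are illustrative."""

    @property
    def dimension(self) -> int: ...

    @property
    def model_name(self) -> str: ...

    async def embed(self, texts: list[str]) -> list[list[float]]: ...

    async def embed_query(self, text: str) -> list[float]: ...
```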