Use locally-hosted Ollama models for CortexDB embedding generation and entity extraction.
# Ollama Provider
Run CortexDB's embedding and entity-extraction pipelines entirely on your own hardware using Ollama. No API keys, no data leaving your network.
## Overview
Ollama is a local inference engine that runs open-source models on consumer hardware. This integration configures CortexDB to use Ollama for:
- Embedding generation — convert text to vectors for semantic search
- Entity extraction — extract entities and relationships from ingested episodes using local chat models (Llama 3, Mistral, etc.)
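Under the hood, semantic search works by comparing embedding vectors, usually by cosine similarity. As a self-contained illustration in plain Python (not part of the SDK):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real embeddings have e.g. 768 dimensions.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
```

At query time, the query embedding is compared against stored episode embeddings and the closest matches are returned.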
## Installation
```bash
pip install "cortexdbai[ollama]"
```
Make sure Ollama is running locally:
```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull an embedding model
ollama pull nomic-embed-text

# Ollama serves on http://localhost:11434 by default
```
## Configuration
| Environment Variable | Default | Description |
|---|---|---|
| CORTEX_OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| CORTEX_OLLAMA_EMBED_MODEL | nomic-embed-text | Embedding model name |
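These variables follow the usual precedence of environment over built-in default. A minimal sketch of that resolution logic in plain Python (the provider's actual internals may differ):

```python
import os

# Environment variable wins; otherwise fall back to the documented default.
base_url = os.environ.get("CORTEX_OLLAMA_URL", "http://localhost:11434")
embed_model = os.environ.get("CORTEX_OLLAMA_EMBED_MODEL", "nomic-embed-text")

print(base_url, embed_model)
```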
## Embedding Models
| Model | Dimensions | Description |
|---|---|---|
| nomic-embed-text | 768 | High-quality, general-purpose embeddings |
| mxbai-embed-large | 1024 | Strong performance on retrieval benchmarks |
| all-minilm | 384 | Lightweight and fast |
| snowflake-arctic-embed | 1024 | Optimized for retrieval tasks |
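One practical consequence of the table: each model emits vectors of a fixed dimension, and vectors from different models are not comparable, so switching models after data has been ingested means re-embedding the index. The table as data (values copied from above):

```python
# Embedding dimensions from the table above. Mixing models within one
# index would produce incomparable vectors, so pick one before ingesting.
EMBED_DIMENSIONS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
    "snowflake-arctic-embed": 1024,
}

print(EMBED_DIMENSIONS["nomic-embed-text"])  # 768
```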
## Usage Example
```python
from cortexdb_ollama import OllamaEmbeddingProvider, OllamaConfig

# Use defaults (localhost:11434, nomic-embed-text)
async with OllamaEmbeddingProvider() as provider:
    embedding = await provider.embed_query("What is event sourcing?")
    print(f"Dimension: {provider.dimension}")  # 768

    # Batch embedding
    embeddings = await provider.embed([
        "Event sourcing stores all changes as events.",
        "CQRS separates reads from writes.",
    ])
```
### Custom Configuration
```python
config = OllamaConfig(
    base_url="http://gpu-server:11434",
    embed_model="mxbai-embed-large",
)
provider = OllamaEmbeddingProvider(config)
```
## Self-Hosted Setup
Ollama is self-hosted by design. For production deployments:
```bash
# Run on a dedicated GPU server
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Pre-pull models at deploy time
ollama pull nomic-embed-text
ollama pull llama3
```
Point CortexDB at your Ollama instance:
```bash
export CORTEX_OLLAMA_URL=http://gpu-server:11434
export CORTEX_OLLAMA_EMBED_MODEL=nomic-embed-text
```
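Before cutting traffic over, it is worth confirming the remote instance is reachable and has the expected models pulled. Ollama's `GET /api/tags` endpoint lists installed models (the hostname here is illustrative):

```shell
# List models installed on the remote Ollama instance
curl http://gpu-server:11434/api/tags
# Expect nomic-embed-text and llama3 to appear in the returned "models" array
```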
## Switching Providers
To switch CortexDB from the default OpenAI embeddings to Ollama, update your configuration:
```python
from cortexdb_ollama import OllamaEmbeddingProvider

# Replace your existing embedding provider
provider = OllamaEmbeddingProvider()
```
All CortexDB embedding providers implement the same interface (`embed`, `embed_query`, `dimension`, `model_name`), so switching is a one-line change.
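That shared interface can be pictured as a structural protocol. The sketch below is illustrative only (the package presumably defines its own base class; the member names are taken from the list above):

```python
from typing import Protocol, Sequence, runtime_checkable

@runtime_checkable
class EmbeddingProvider(Protocol):
    """Structural sketch of the interface shared by CortexDB embedding providers."""

    @property
    def dimension(self) -> int: ...

    @property
    def model_name(self) -> str: ...

    async def embed_query(self, text: str) -> list[float]: ...

    async def embed(self, texts: Sequence[str]) -> list[list[float]]: ...
```

Any object exposing these four members satisfies the protocol, which is what makes providers interchangeable without touching calling code.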
## Under the Hood
When using the Ollama provider, the SDK translates your calls into REST API requests against the CortexDB and Ollama endpoints.
### Storing a memory (`remember`)
```bash
# SDK: cortex.remember("Event sourcing stores all changes as events.")
curl -X POST https://api.cortexdb.ai/v1/remember \
  -H "Authorization: Bearer $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Event sourcing stores all changes as events.",
    "tenant_id": "my-app"
  }'

# Returns: { "event_id": "evt_abc123" }
```
### Retrieving context (`recall`)
```bash
# SDK: result = cortex.recall("What is event sourcing?")
# result.context, result.confidence, result.latency_ms
curl -X POST https://api.cortexdb.ai/v1/recall \
  -H "Authorization: Bearer $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is event sourcing?",
    "tenant_id": "my-app"
  }'

# Returns: { "context": "...", "confidence": 0.92, "latency_ms": 45 }
```
### Generating embeddings (Ollama)
```bash
# The provider calls Ollama's embedding endpoint
curl -X POST http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "prompt": "What is event sourcing?"
  }'

# Returns: { "embedding": [0.123, -0.456, ...] }
```
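The wire format is plain JSON, so nothing beyond the standard library is needed to build a request body or read a response. A toy round-trip (the three-element vector is illustrative; a real model returns its full dimension):

```python
import json

# Request body matching the curl example above
payload = json.dumps({"model": "nomic-embed-text", "prompt": "What is event sourcing?"})

# Toy response body in the shape Ollama returns
response = '{"embedding": [0.123, -0.456, 0.789]}'
vector = json.loads(response)["embedding"]
print(len(vector))  # 3
```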