Use locally hosted Ollama models for CortexDB embedding generation and entity extraction.

Ollama Provider

Run CortexDB's embedding and entity-extraction pipelines entirely on your own hardware using Ollama. No API keys, no data leaving your network.

Overview

Ollama is a local inference engine that runs open-source models on consumer hardware. This integration configures CortexDB to use Ollama for:

  • Embedding generation — convert text to vectors for semantic search
  • Entity extraction — extract entities and relationships from ingested episodes using local chat models (Llama 3, Mistral, etc.)
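Entity extraction works by prompting a local chat model to emit structured JSON, which then has to be parsed defensively, since local models often wrap JSON in prose or markdown fences. A minimal sketch of that parsing step (the function name and reply shape are illustrative, not CortexDB's internal API):

```python
import json
import re

def parse_entities(model_output: str) -> list[dict]:
    """Pull the first JSON array out of a chat model's reply.

    Local models often surround JSON with prose or markdown fences,
    so we search for the array rather than calling json.loads() directly.
    """
    match = re.search(r"\[.*\]", model_output, re.DOTALL)
    if match is None:
        return []
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return []

# Example reply from a model asked to list entities as JSON:
reply = 'Here are the entities:\n[{"name": "CortexDB", "type": "Database"}]'
entities = parse_entities(reply)
```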

Installation

pip install cortexdb-ollama

Make sure Ollama is running locally:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull an embedding model
ollama pull nomic-embed-text

# Ollama serves on http://localhost:11434 by default
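Under the hood, embedding requests go to Ollama's HTTP API (`POST /api/embeddings`, taking a `model` and a `prompt`). A minimal sketch of that request, useful for verifying the server independently of CortexDB; the helper names here are illustrative:

```python
import json
import urllib.request

def build_embed_request(prompt: str, model: str = "nomic-embed-text",
                        base_url: str = "http://localhost:11434"):
    """Build the request Ollama's /api/embeddings endpoint expects."""
    payload = {"model": model, "prompt": prompt}
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def embed(prompt: str, **kwargs) -> list[float]:
    """Send the request and return the embedding vector."""
    with urllib.request.urlopen(build_embed_request(prompt, **kwargs)) as resp:
        return json.load(resp)["embedding"]
```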

Configuration

| Environment Variable | Default | Description |
|---|---|---|
| CORTEX_OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| CORTEX_OLLAMA_EMBED_MODEL | nomic-embed-text | Embedding model name |

Embedding Models

| Model | Dimensions | Description |
|---|---|---|
| nomic-embed-text | 768 | High-quality, general-purpose embeddings |
| mxbai-embed-large | 1024 | Strong performance on retrieval benchmarks |
| all-minilm | 384 | Lightweight and fast |
| snowflake-arctic-embed | 1024 | Optimized for retrieval tasks |
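The embedding dimension must match your vector index configuration, so it helps to resolve it up front. A small lookup (the values come from the table above; the helper itself is a convenience sketch, not part of cortexdb-ollama):

```python
# Output dimensions per model, taken from the table above.
EMBED_DIMENSIONS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
    "snowflake-arctic-embed": 1024,
}

def dimension_for(model: str) -> int:
    """Look up a model's output dimension, failing loudly on unknown names."""
    try:
        return EMBED_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"Unknown embedding model: {model!r}") from None
```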

Usage Example

from cortexdb_ollama import OllamaEmbeddingProvider, OllamaConfig

# Use defaults (localhost:11434, nomic-embed-text)
async with OllamaEmbeddingProvider() as provider:
    embedding = await provider.embed_query("What is event sourcing?")
    print(f"Dimension: {provider.dimension}")  # 768

    # Batch embedding
    embeddings = await provider.embed([
        "Event sourcing stores all changes as events.",
        "CQRS separates reads from writes.",
    ])

Custom Configuration

config = OllamaConfig(
    base_url="http://gpu-server:11434",
    embed_model="mxbai-embed-large",
)
provider = OllamaEmbeddingProvider(config)

Self-Hosted Setup

Ollama is self-hosted by design. For production deployments:

# Run on a dedicated GPU server
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Pre-pull models at deploy time
ollama pull nomic-embed-text
ollama pull llama3

Point CortexDB at your Ollama instance:

export CORTEX_OLLAMA_URL=http://gpu-server:11434
export CORTEX_OLLAMA_EMBED_MODEL=nomic-embed-text
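In application code, the same settings can be read back from the environment and passed to the config object. A sketch assuming `OllamaConfig` takes the `base_url` and `embed_model` keywords shown under Custom Configuration (the glue function itself is hypothetical):

```python
import os

# Hypothetical glue: mirror the env vars from the configuration table
# onto the OllamaConfig constructor keywords shown earlier.
def config_from_env() -> dict:
    return {
        "base_url": os.environ.get(
            "CORTEX_OLLAMA_URL", "http://localhost:11434"),
        "embed_model": os.environ.get(
            "CORTEX_OLLAMA_EMBED_MODEL", "nomic-embed-text"),
    }

# config = OllamaConfig(**config_from_env())
```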

Switching Providers

To switch CortexDB from the default OpenAI embeddings to Ollama, update your configuration:

from cortexdb_ollama import OllamaEmbeddingProvider

# Replace your existing embedding provider
provider = OllamaEmbeddingProvider()

All CortexDB embedding providers implement the same interface (embed, embed_query, dimension, model_name), so switching is a one-line change.
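That shared interface can be expressed as a structural type. A sketch using typing.Protocol (the method and attribute names come from the list above; the Protocol class and the consumer function are illustrative, not shipped by CortexDB):

```python
from typing import Protocol

class EmbeddingProvider(Protocol):
    """Structural type matched by any CortexDB-style embedding provider."""
    dimension: int
    model_name: str

    async def embed_query(self, text: str) -> list[float]: ...
    async def embed(self, texts: list[str]) -> list[list[float]]: ...

# Code written against the Protocol works unchanged whichever
# concrete provider (OpenAI-backed, Ollama-backed, ...) is passed in.
async def index_documents(provider: EmbeddingProvider, docs: list[str]):
    vectors = await provider.embed(docs)
    return list(zip(docs, vectors))
```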