Use Groq's ultra-fast LPU inference for CortexDB entity extraction.

Groq Provider

Leverage Groq's LPU hardware for fast entity extraction in CortexDB. Groq's low-latency, high-throughput inference makes it well suited to ingestion pipelines that process large volumes of text.

Overview

Groq builds custom inference hardware, the Language Processing Unit (LPU), designed for high token throughput. This integration configures CortexDB to use Groq for:

Entity extraction — extract entities and relationships using Llama 3.3, Mixtral, and other fast chat models

Note

Groq does not currently offer a native embedding endpoint. This provider is forward-compatible and will support embeddings once Groq announces the feature. For embeddings, pair Groq with another provider such as Together AI or Ollama.

Installation

pip install cortexdbai[groq]

Configuration

| Environment Variable | Default | Description |
|---|---|---|
| CORTEX_GROQ_API_KEY | Required | Groq API key |
| CORTEX_GROQ_MODEL | llama-3.3-70b-versatile | Chat model for entity extraction |
| CORTEX_GROQ_EMBED_MODEL | (none) | Embedding model (when available) |
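
The variables above could be read into a settings object along the following lines. This is a minimal sketch using only the standard library; `GroqSettings` and `load_groq_settings` are hypothetical names for illustration and are not part of the `cortexdb_groq` package, which may resolve these variables differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_CHAT_MODEL = "llama-3.3-70b-versatile"  # default listed in the table


@dataclass(frozen=True)
class GroqSettings:
    """Hypothetical container mirroring the three environment variables."""
    api_key: str
    chat_model: str
    embed_model: Optional[str]


def load_groq_settings(env: Mapping[str, str] = os.environ) -> GroqSettings:
    """Read the CORTEX_GROQ_* variables, applying the documented defaults."""
    api_key = env.get("CORTEX_GROQ_API_KEY", "").strip()
    if not api_key:
        # The table marks the API key as required, so fail early.
        raise ValueError("CORTEX_GROQ_API_KEY is required")
    return GroqSettings(
        api_key=api_key,
        chat_model=env.get("CORTEX_GROQ_MODEL") or DEFAULT_CHAT_MODEL,
        embed_model=env.get("CORTEX_GROQ_EMBED_MODEL") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.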

Chat Models for Entity Extraction

| Model | Context Window | Description |
|---|---|---|
| llama-3.3-70b-versatile | 128K | Best quality, very fast on Groq LPU |
| mixtral-8x7b-32768 | 32K | Excellent MoE model, strong reasoning |
| llama-3.1-8b-instant | 128K | Fastest, good for simple extraction |
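
When choosing a model, it can help to check that a document fits the model's context window before sending it. The sketch below hard-codes the window sizes from the table, assuming "128K" means 131,072 tokens and "32K" means 32,768. It estimates tokens with a rough four-characters-per-token heuristic rather than a real tokenizer. `estimate_tokens` and `fits_context` are hypothetical helpers, not part of the provider.

```python
# Context windows in tokens, taken from the table above
# (assumption: "128K" = 131,072 and "32K" = 32,768).
CONTEXT_WINDOWS = {
    "llama-3.3-70b-versatile": 131_072,
    "mixtral-8x7b-32768": 32_768,
    "llama-3.1-8b-instant": 131_072,
}


def estimate_tokens(text: str) -> int:
    """Very rough heuristic: about four characters per token."""
    return max(1, len(text) // 4)


def fits_context(model: str, text: str, reserved_for_output: int = 1024) -> bool:
    """Return True if the estimated prompt size leaves room for the reply."""
    window = CONTEXT_WINDOWS[model]
    return estimate_tokens(text) + reserved_for_output <= window
```

Because the estimate is approximate, a conservative `reserved_for_output` value gives some margin; a real tokenizer would be needed for exact limits.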

Usage Example

from cortexdb_groq import GroqConfig

# Configure for entity extraction
config = GroqConfig(
    api_key="your-groq-api-key",
    chat_model="llama-3.3-70b-versatile",
)

Forward-Compatible Embedding Usage

Once Groq adds embedding support, the provider is designed to work without code changes beyond setting an embedding model:

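import asyncio

from cortexdb_groq import GroqConfig, GroqEmbeddingProvider

async def main() -> None:
    config = GroqConfig(
        api_key="your-groq-api-key",
        embed_model="future-embedding-model",  # placeholder until Groq ships one
    )
    provider = GroqEmbeddingProvider(config=config, dimension=768)
    # embed_query is awaited, so it has to run inside an async function
    embedding = await provider.embed_query("test")

asyncio.run(main())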

Switching Providers

Groq uses an OpenAI-compatible API. To switch CortexDB's entity extraction to Groq:

from cortexdb_groq import GroqConfig

config = GroqConfig(api_key="your-groq-api-key")
# Pass config.chat_model to your CortexDB entity extraction pipeline

For embeddings, combine Groq (entity extraction) with another provider (embedding generation) for a best-of-both setup.
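
The split described above, one provider for extraction and another for embeddings, can be sketched as a small wrapper. Everything here is illustrative: `HybridPipeline`, the `EntityExtractor` protocol and its `extract` method, and the two stub classes are hypothetical and stand in for real providers. Only the `embed_query` method name is taken from the embedding example in this document.

```python
import asyncio
from typing import List, Protocol


class EntityExtractor(Protocol):
    async def extract(self, text: str) -> List[str]: ...


class Embedder(Protocol):
    async def embed_query(self, text: str) -> List[float]: ...


class HybridPipeline:
    """Hypothetical wrapper: one provider extracts entities, another embeds."""

    def __init__(self, extractor: EntityExtractor, embedder: Embedder) -> None:
        self.extractor = extractor
        self.embedder = embedder

    async def ingest(self, text: str) -> dict:
        # The two calls are independent, so run them concurrently.
        entities, embedding = await asyncio.gather(
            self.extractor.extract(text),
            self.embedder.embed_query(text),
        )
        return {"text": text, "entities": entities, "embedding": embedding}


class StubExtractor:
    """Stand-in for a Groq-backed extractor: returns capitalized words."""

    async def extract(self, text: str) -> List[str]:
        return [w for w in text.split() if w[0].isupper()]


class StubEmbedder:
    """Stand-in for an embedding provider: returns a fixed-size dummy vector."""

    async def embed_query(self, text: str) -> List[float]:
        return [float(len(text)), 0.0, 0.0]
```

With the stubs, `asyncio.run(HybridPipeline(StubExtractor(), StubEmbedder()).ingest("Alice fixed the auth bug"))` exercises the flow end to end without any network access.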

Under the Hood

When using the Groq provider, the SDK translates your calls into REST API requests against the CortexDB and Groq endpoints.

Storing a memory (remember)

# SDK: cortex.remember("Alice fixed the auth bug in PR #234.")
curl -X POST https://api.cortexdb.ai/v1/remember \
  -H "Authorization: Bearer $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Alice fixed the auth bug in PR #234.",
    "tenant_id": "my-app"
  }'
# Returns: { "event_id": "evt_abc123" }

Retrieving context (recall)

# SDK: result = cortex.recall("Who fixed the auth bug?")
# result.context, result.confidence, result.latency_ms
curl -X POST https://api.cortexdb.ai/v1/recall \
  -H "Authorization: Bearer $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Who fixed the auth bug?",
    "tenant_id": "my-app"
  }'
# Returns: { "context": "...", "confidence": 0.95, "latency_ms": 38 }
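
On the client side, the recall response can be parsed into a typed object and gated on its confidence score. This is a minimal sketch that assumes the response contains exactly the three fields shown in the example above; `RecallResult`, `parse_recall`, `usable_context`, and the 0.5 threshold are hypothetical and not part of the SDK.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RecallResult:
    context: str
    confidence: float
    latency_ms: int


def parse_recall(body: str) -> RecallResult:
    """Parse the JSON body returned by the recall endpoint."""
    data = json.loads(body)
    return RecallResult(
        context=data["context"],
        confidence=float(data["confidence"]),
        latency_ms=int(data["latency_ms"]),
    )


def usable_context(result: RecallResult, min_confidence: float = 0.5) -> str:
    """Return the context only when confidence clears the chosen threshold."""
    return result.context if result.confidence >= min_confidence else ""
```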

Entity extraction (Groq)

# The provider calls Groq's chat completion endpoint for entity extraction
curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $CORTEX_GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "system", "content": "Extract entities and relationships..."},
      {"role": "user", "content": "Alice fixed the auth bug in PR #234."}
    ]
  }'
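
The same request can be assembled in Python with the standard library. The sketch below only builds the request object and does not send it. `build_extraction_request` is a hypothetical helper, and the system prompt is the same elided placeholder as in the curl example, not the prompt the provider actually uses.

```python
import json
import urllib.request

GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"


def build_extraction_request(
    api_key: str,
    text: str,
    model: str = "llama-3.3-70b-versatile",
) -> urllib.request.Request:
    """Build (but do not send) the chat completion request shown above."""
    payload = {
        "model": model,
        "messages": [
            # Placeholder prompt; the real system prompt is elided in the source.
            {"role": "system", "content": "Extract entities and relationships..."},
            {"role": "user", "content": text},
        ],
    }
    return urllib.request.Request(
        GROQ_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen`. Because the endpoint is OpenAI-compatible, the generated text should be found under `choices[0].message.content` in the JSON response.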