# DeepInfra Provider

Use DeepInfra's serverless inference for CortexDB embeddings and entity extraction, with access to a wide catalog of open-source models and pay-per-token pricing.
## Overview

DeepInfra provides serverless and dedicated inference for open-source models. This integration configures CortexDB to use DeepInfra for:

- **Embedding generation** — high-quality open-source embedding models
- **Entity extraction** — Llama, Mistral, and other chat models for relationship extraction
## Installation

```bash
pip install "cortexdbai[deepinfra]"
```
## Configuration

| Environment Variable | Default | Description |
|---|---|---|
| `CORTEX_DEEPINFRA_API_KEY` | Required | DeepInfra API key |
| `CORTEX_DEEPINFRA_EMBED_MODEL` | `BAAI/bge-large-en-v1.5` | Embedding model |
| `CORTEX_DEEPINFRA_CHAT_MODEL` | `meta-llama/Meta-Llama-3.1-70B-Instruct` | Chat model |
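For quick local experiments you can also set these in-process before constructing a provider; a minimal sketch (in production, set them in your shell or deployment environment instead):

```python
import os

# Values from the table above; CORTEX_DEEPINFRA_API_KEY has no default.
os.environ["CORTEX_DEEPINFRA_API_KEY"] = "your-deepinfra-api-key"
os.environ["CORTEX_DEEPINFRA_EMBED_MODEL"] = "BAAI/bge-large-en-v1.5"
os.environ["CORTEX_DEEPINFRA_CHAT_MODEL"] = "meta-llama/Meta-Llama-3.1-70B-Instruct"
```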
## Embedding Models

| Model | Dimensions | Description |
|---|---|---|
| `BAAI/bge-large-en-v1.5` | 1024 | High-quality general-purpose embeddings by BAAI |
| `sentence-transformers/all-MiniLM-L6-v2` | 384 | Lightweight, fast sentence embeddings |
| `intfloat/e5-large-v2` | 1024 | Strong general-purpose English embedding model |
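A vector index must be provisioned with the same dimension as the model that populates it, so it can help to keep the mapping from the table above next to your index setup; a small reference sketch:

```python
# Dimensions from the table above. An index built for one dimension cannot
# store vectors produced by a model with a different one.
EMBED_DIMENSIONS: dict[str, int] = {
    "BAAI/bge-large-en-v1.5": 1024,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "intfloat/e5-large-v2": 1024,
}
```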
## Usage Example

```python
import asyncio

from cortexdb_deepinfra import DeepInfraEmbeddingProvider, DeepInfraConfig

async def main() -> None:
    config = DeepInfraConfig(api_key="your-deepinfra-api-key")
    async with DeepInfraEmbeddingProvider(config=config) as provider:
        # Embed a single query string
        embedding = await provider.embed_query("What is event sourcing?")
        print(f"Dimension: {provider.dimension}")  # 1024

        # Embed a batch of documents in one call
        embeddings = await provider.embed([
            "Event sourcing stores all changes as events.",
            "CQRS separates reads from writes.",
        ])

asyncio.run(main())
```
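Assuming the vectors come back as plain float lists (consistent with the example above), you can rank the batch against the query with ordinary cosine similarity:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity for two equal-length float lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Inside main() above: rank both documents against the query embedding.
# scores = [cosine_similarity(embedding, e) for e in embeddings]
```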
### Using a Different Model

```python
config = DeepInfraConfig(
    api_key="your-deepinfra-api-key",
    embed_model="sentence-transformers/all-MiniLM-L6-v2",
)
provider = DeepInfraEmbeddingProvider(config=config)
# provider.dimension == 384
```

Note that models with different dimensions produce incompatible vectors: switching the embedding model means re-embedding any content you have already stored.
## Switching Providers

To switch CortexDB from the default OpenAI embeddings to DeepInfra:

```python
from cortexdb_deepinfra import DeepInfraEmbeddingProvider

provider = DeepInfraEmbeddingProvider()  # reads the CORTEX_DEEPINFRA_* env vars
```

All CortexDB embedding providers implement the same interface (`embed`, `embed_query`, `dimension`, `model_name`), so switching is a one-line change.
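For illustration, that shared interface could be written down as a `typing.Protocol`. The method and property names come from the sentence above; the exact signatures are assumptions based on the usage examples:

```python
from typing import Protocol, Sequence

class EmbeddingProviderLike(Protocol):
    """Assumed shape of the shared provider interface; signatures are illustrative."""

    dimension: int
    model_name: str

    async def embed_query(self, text: str) -> list[float]: ...
    async def embed(self, texts: Sequence[str]) -> list[list[float]]: ...

async def index_all(provider: EmbeddingProviderLike, docs: list[str]) -> None:
    # Accepts any provider that satisfies the interface, OpenAI or DeepInfra.
    vectors = await provider.embed(docs)
    print(f"{len(vectors)} vectors at dimension {provider.dimension}")
```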
## Under the Hood

When using the DeepInfra provider, the SDK translates your calls into REST API requests against the CortexDB and DeepInfra endpoints.
### Storing a memory (remember)

```bash
# SDK: cortex.remember("Rate limits are enforced at the API gateway.")
curl -X POST https://api.cortexdb.ai/v1/remember \
  -H "Authorization: Bearer $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Rate limits are enforced at the API gateway.",
    "tenant_id": "my-app"
  }'
# Returns: { "event_id": "evt_abc123" }
```
### Retrieving context (recall)

```bash
# SDK: result = cortex.recall("Where are rate limits enforced?")
# result.context, result.confidence, result.latency_ms
curl -X POST https://api.cortexdb.ai/v1/recall \
  -H "Authorization: Bearer $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Where are rate limits enforced?",
    "tenant_id": "my-app"
  }'
# Returns: { "context": "...", "confidence": 0.93, "latency_ms": 47 }
```
### Generating embeddings (DeepInfra)

```bash
# The provider calls DeepInfra's inference endpoint for the configured model
curl -X POST https://api.deepinfra.com/v1/inference/BAAI/bge-large-en-v1.5 \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": ["Where are rate limits enforced?"]
  }'
# Returns: { "embeddings": [[0.123, -0.456, ...]] }
```