Route /v1/answer through DeepInfra's serverless open-model inference.

DeepInfra Integration

DeepInfra hosts open-source models behind an OpenAI-compatible API with usage-based pricing. Point CortexDB's LLM router at it.

Deployment configuration

CORTEX_LLM_URL=https://api.deepinfra.com/v1/openai
CORTEX_LLM_API_KEY=di_...
CORTEX_LLM_MODEL=meta-llama/Llama-3.3-70B-Instruct
CORTEX_EMBEDDING_PROVIDER=deepinfra
CORTEX_EMBEDDING_MODEL=BAAI/bge-base-en-v1.5

Per-request override

client.answer(
    scope="org:acme/user:alice",
    question="What did we decide?",
    answer_model="deepinfra/meta-llama/Llama-3.3-70B-Instruct",
)

See also