Route /v1/answer through DeepInfra's serverless open-model inference.
DeepInfra Integration
DeepInfra hosts open-source models behind an OpenAI-compatible API with usage-based pricing. Point CortexDB's LLM router at it.
Deployment configuration
CORTEX_LLM_URL=https://api.deepinfra.com/v1/openai
CORTEX_LLM_API_KEY=di_...
CORTEX_LLM_MODEL=meta-llama/Llama-3.3-70B-Instruct
CORTEX_EMBEDDING_PROVIDER=deepinfra
CORTEX_EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
Per-request override
client.answer(
scope="org:acme/user:alice",
question="What did we decide?",
answer_model="deepinfra/meta-llama/Llama-3.3-70B-Instruct",
)