Route /v1/answer through Fireworks AI's optimized open-model inference.
Fireworks AI Integration
Fireworks hosts optimized open-source models with sub-100ms TTFT. Point CortexDB's LLM router at the Fireworks endpoint.
Deployment configuration
CORTEX_LLM_URL=https://api.fireworks.ai/inference/v1
CORTEX_LLM_API_KEY=fw_...
CORTEX_LLM_MODEL=accounts/fireworks/models/llama-v3p3-70b-instruct
Per-request override
client.answer(
scope="org:acme/user:alice",
question="What did we decide?",
answer_model="fireworks/accounts/fireworks/models/qwen2p5-72b-instruct",
)