Route /v1/answer through Fireworks AI's optimized open-model inference.

Fireworks AI Integration

Fireworks hosts optimized open-source models with sub-100ms TTFT. Point CortexDB's LLM router at the Fireworks endpoint.

Deployment configuration

CORTEX_LLM_URL=https://api.fireworks.ai/inference/v1
CORTEX_LLM_API_KEY=fw_...
CORTEX_LLM_MODEL=accounts/fireworks/models/llama-v3p3-70b-instruct

Per-request override

client.answer(
    scope="org:acme/user:alice",
    question="What did we decide?",
    answer_model="fireworks/accounts/fireworks/models/qwen2p5-72b-instruct",
)

See also