Route /v1/answer through Groq's LPU-accelerated models for sub-second responses.

Groq Integration

CortexDB routes /v1/answer through the configured LLM router. Point it at Groq for low-latency inference.

Deployment configuration

CORTEX_LLM_URL=https://api.groq.com/openai/v1
CORTEX_LLM_API_KEY=gsk_...
CORTEX_LLM_MODEL=llama-3.3-70b-versatile

Restart the cortex service after changing config.

Per-request override

client.answer(
    scope="org:acme/user:alice",
    question="What did we decide?",
    answer_model="groq/llama-3.3-70b-versatile",
)

Groq's TPS lets you keep latency under 1 s end-to-end for short answers; for multi-session reasoning you'll still benefit from Claude Opus or GPT-4o on /v1/answer's primary path and Groq as a cost-saver alt.

See also