Route /v1/answer through Groq's LPU-accelerated models for sub-second responses.
Groq Integration
CortexDB routes /v1/answer through the configured LLM router. Point it at Groq for low-latency inference.
Deployment configuration
CORTEX_LLM_URL=https://api.groq.com/openai/v1
CORTEX_LLM_API_KEY=gsk_...
CORTEX_LLM_MODEL=llama-3.3-70b-versatile
Restart the cortex service after changing config.
Per-request override
client.answer(
scope="org:acme/user:alice",
question="What did we decide?",
answer_model="groq/llama-3.3-70b-versatile",
)
Groq's TPS lets you keep latency under 1 s end-to-end for short answers; for multi-session reasoning you'll still benefit from Claude Opus or GPT-4o on /v1/answer's primary path and Groq as a cost-saver alt.