High-level guidance for running CortexDB across multiple nodes for scale and availability.

Cluster Deployment

Use a clustered deployment when you need CortexDB to serve larger workloads, tolerate node or zone failures, or meet stricter availability and maintenance requirements in a self-hosted environment.

This page is intentionally focused on deployment planning rather than internal coordination or storage mechanics.

When to move beyond a single node

A single-node deployment is often enough for development, pilots, and smaller internal rollouts.

Move to a cluster when you need one or more of the following:

  • higher availability for production traffic
  • more capacity for memory and retrieval workloads
  • separation across failure domains or zones
  • operational flexibility for growth and maintenance

What to plan for

Before deploying a cluster, decide how you want to handle:

  • node placement and failure-domain separation
  • private networking between nodes
  • storage and backup strategy for each node
  • secrets and provider credential management
  • monitoring, alerting, and operational ownership
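One planning item above, failure-domain separation, is easy to check mechanically before you deploy. The sketch below is illustrative only and is not part of CortexDB itself: it assumes a simple quorum-style goal where losing any single zone must still leave a majority of nodes reachable. The node and zone names are placeholders.

```python
from collections import Counter

def zone_tolerates_failure(node_zones: dict[str, str]) -> bool:
    """Return True if losing any one zone still leaves a majority of
    nodes available (a common quorum-style placement goal)."""
    total = len(node_zones)
    # Size of the largest zone determines the worst single-zone loss.
    largest = max(Counter(node_zones.values()).values())
    return (total - largest) > total // 2

# Three nodes spread across three zones survive any single-zone loss.
balanced = {"node-a": "zone-1", "node-b": "zone-2", "node-c": "zone-3"}
# Two of three nodes in one zone means losing that zone loses the majority.
skewed = {"node-a": "zone-1", "node-b": "zone-1", "node-c": "zone-2"}
```

Running the check on a proposed node-to-zone map like this during planning is cheaper than discovering a skewed placement after an outage.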

Recommended rollout pattern

For most teams, the safest approach is:

  1. validate your application integration on a single-node deployment
  2. confirm access control, networking, and storage policies
  3. introduce a small production cluster
  4. expand only after you understand workload shape and operational needs
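The rollout pattern above is strictly ordered, so it can help to encode it as an explicit gate in your deployment tooling. This is a hypothetical checklist sketch, not a CortexDB feature; the step names are placeholders for whatever validation your team performs.

```python
# Ordered rollout gates; expansion is allowed only when all are complete.
ROLLOUT_STEPS = [
    "single_node_validated",   # step 1: application integration verified
    "policies_confirmed",      # step 2: access control, networking, storage
    "small_cluster_running",   # step 3: initial production cluster in place
]

def ready_to_expand(completed: set[str]) -> bool:
    """Allow expansion only once every earlier rollout step is done."""
    return all(step in completed for step in ROLLOUT_STEPS)
```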

Operational guidance

Clustered deployments should be treated like production infrastructure.

That means you should:

  • run nodes on durable storage
  • keep internal node-to-node communication on private networks
  • use health checks and monitoring from day one
  • test failure handling, backup, and restore procedures before launch
  • document upgrade and maintenance workflows for operators

Scaling expectations

Cluster growth should be driven by workload, governance, and reliability needs, not by guesswork.

Common signals that it is time to expand include:

  • sustained traffic growth
  • larger memory volumes per tenant or workspace
  • increased concurrency from multiple applications or agents
  • tighter availability targets for business-critical workflows
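To turn signals like sustained traffic growth into a concrete sizing decision, teams often estimate node count from measured peak load plus a headroom margin. The arithmetic below is a generic illustration; `per_node_qps` must come from your own benchmarks of your workload on your hardware, since it is not a published CortexDB figure.

```python
import math

def nodes_needed(peak_qps: float, per_node_qps: float, headroom: float = 0.3) -> int:
    """Estimate node count for a peak load, reserving headroom so the
    cluster absorbs growth and single-node maintenance without saturating."""
    usable_per_node = per_node_qps * (1.0 - headroom)
    return max(1, math.ceil(peak_qps / usable_per_node))
```

For example, a measured peak of 900 queries per second against nodes benchmarked at 500 each yields three nodes with 30% headroom, rather than the two that a naive division would suggest.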

What this page intentionally does not cover

This public guide does not document internal coordination algorithms, partition logic, or low-level node tuning. The goal here is to help operators plan the deployment model, not expose implementation-specific internals.

Next Steps