TAK server performance tuning: scaling CloudTAK for high track density and large user counts

By Corvus Intelligence Engineering Team

May 30, 2026

A freshly deployed CloudTAK instance on default configuration handles a small team of ATAK devices without issue. The problems appear gradually: CoT event delivery starts lagging when the 80th or 90th concurrent client connects, PostgreSQL connection pool errors surface in the logs around 150 clients, and at 300+ clients the server begins queuing events so aggressively that field units notice their COP is minutes behind reality. None of this is a fundamental limit of CloudTAK – it is a consequence of running an operationally scaled workload on development defaults. This guide covers the full tuning path: establishing a performance baseline, optimising PostgreSQL, rate-limiting CoT traffic, managing WebSocket connections, enabling spatial filtering, and scaling horizontally when a single instance is no longer enough.

Performance baseline: what 100, 500, and 1000 clients look like on default config

Before tuning anything, measure where you currently are. The CloudTAK admin metrics endpoint provides the most direct view of server health:

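# Poll CloudTAK metrics every 5 seconds
# (ADMIN_TOKEN must be exported: the single-quoted command is expanded by the shell that watch spawns)
watch -n5 'curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://tak.yourdomain.com:8443/api/admin/metrics | jq .'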

Key fields to watch: ws_connections (active WebSocket clients), cot_queue_depth (events waiting to be persisted), db_pool_active / db_pool_waiting, and cot_latency_p99_ms.

What those numbers look like on default configuration (2 vCPU / 4 GB, DB_POOL_MAX=10) across three load levels:

Clients	CPU	Memory	CoT p50 latency	CoT p99 latency	DB errors/min
100	28%	1.4 GB	210 ms	870 ms	0
500	94%	3.2 GB	3,800 ms	18,200 ms	47
1000	100% (saturated)	3.8 GB + swap	>30,000 ms	timeout	300+

The 500-client row is the operational inflection point for most defense deployments. It is also the scenario where tuning delivers the greatest absolute improvement – the remediation steps below are benchmarked against this profile. Network bandwidth at 500 clients on default config is approximately 340 Mbps outbound, because every CoT event is fanned out to every subscriber; this becomes a secondary bottleneck on constrained tactical links.

PostgreSQL tuning: PgBouncer, shared_buffers, work_mem, and autovacuum

PostgreSQL is the dominant bottleneck on most under-tuned CloudTAK deployments. Two separate problems combine: connection exhaustion (too many concurrent application connections for PostgreSQL's process-per-connection model) and slow queries (missing indexes, poorly tuned memory parameters, and autovacuum falling behind on the high-write tracks table).

PgBouncer connection pooling

Add PgBouncer as an intermediate service in your Docker Compose stack. Use transaction pooling mode – this allows a large number of short-lived CloudTAK connections to share a small pool of actual PostgreSQL backends:

  pgbouncer:
    image: bitnami/pgbouncer:latest
    container_name: cloudtak-pgbouncer
    restart: unless-stopped
    environment:
      POSTGRESQL_HOST: postgres
      POSTGRESQL_PORT: 5432
      POSTGRESQL_DATABASE: ${POSTGRES_DB}
      POSTGRESQL_USERNAME: ${POSTGRES_USER}
      POSTGRESQL_PASSWORD: ${POSTGRES_PASSWORD}
      PGBOUNCER_DATABASE: ${POSTGRES_DB}
      PGBOUNCER_POOL_MODE: transaction
      PGBOUNCER_MAX_CLIENT_CONN: 500
      PGBOUNCER_DEFAULT_POOL_SIZE: 25
      PGBOUNCER_MIN_POOL_SIZE: 5
      PGBOUNCER_RESERVE_POOL_SIZE: 5
      PGBOUNCER_RESERVE_POOL_TIMEOUT: 5
    networks:
      - cloudtak-internal
    depends_on:
      - postgres

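Update CloudTAK's DATABASE_URL to point at the pgbouncer service rather than directly at PostgreSQL. The Bitnami PgBouncer image listens on port 6432 by default, not 5432, so either use 6432 in the URL or set PGBOUNCER_PORT explicitly. Because transaction pooling hands the backend to another client between transactions, session-level features (LISTEN/NOTIFY, session advisory locks, SET without LOCAL) do not survive through the pooler; confirm that your CloudTAK version does not depend on them. This single change typically eliminates connection pool exhaustion errors and reduces PostgreSQL memory usage by 60–80% at 500+ clients.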

PostgreSQL memory parameters

Mount a custom postgresql.conf into the PostgreSQL container and tune these parameters. The values below are sized for an 8 GB server; on a 4 GB server, halve shared_buffers and effective_cache_size:

# /opt/cloudtak/data/postgresql.conf – performance tuning block

# Memory – set shared_buffers to 25% of total server RAM
shared_buffers = 2GB                  # 25% of 8 GB server
effective_cache_size = 6GB            # 75% of total RAM
work_mem = 8MB                        # per sort/hash operation
maintenance_work_mem = 256MB          # for VACUUM, CREATE INDEX

# WAL and checkpoints – reduce I/O spikes
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 2GB
min_wal_size = 512MB

# Connection limits (backend processes managed via PgBouncer)
max_connections = 60                  # PgBouncer backends + admin connections

# Parallel query – useful for large retention cleanup jobs
max_parallel_workers_per_gather = 2
max_parallel_workers = 4

# Logging – capture slow queries for profiling
log_min_duration_statement = 500      # Log queries taking > 500ms
log_autovacuum_min_duration = 1000    # Log autovacuum runs > 1 second
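The percentage rules in the comments above can be expressed as a small sizing helper. This is a minimal sketch: the function name suggest_memory_settings is hypothetical, and it encodes only the 25% / 75% rules of thumb from the block above, not any CloudTAK or PostgreSQL tooling.

```python
def suggest_memory_settings(total_ram_gb: int) -> dict:
    """Apply the 25% / 75% rules of thumb from the config block above."""
    total_mb = total_ram_gb * 1024
    return {
        # 25% of RAM for PostgreSQL's own buffer cache
        "shared_buffers": f"{total_mb // 4}MB",
        # 75% of RAM as the planner's estimate of total cache (buffers + OS cache)
        "effective_cache_size": f"{total_mb * 3 // 4}MB",
    }


if __name__ == "__main__":
    for ram_gb in (4, 8):
        print(f"{ram_gb} GB server:", suggest_memory_settings(ram_gb))
```

Treat the output as a starting point; the slow-query and autovacuum logging enabled above is what tells you whether the values suit your workload.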

Autovacuum for high-write workloads

CloudTAK's tracks table receives continuous INSERT and UPDATE operations as devices report position, and periodic bulk DELETEs from the retention cleanup job. By default, autovacuum triggers only once dead tuples exceed roughly 20% of the table (autovacuum_vacuum_scale_factor = 0.2). On a large, high-churn table that threshold is reached only after bloat has already degraded query performance. Tighten the thresholds specifically for the tracks table:

-- Run after CloudTAK has initialized the database schema
ALTER TABLE tracks SET (
    autovacuum_vacuum_scale_factor = 0.02,    -- vacuum at 2% dead tuples (vs default 20%)
    autovacuum_analyze_scale_factor = 0.01,   -- analyze at 1%
    autovacuum_vacuum_cost_delay = 2          -- 2 ms; already the default on PostgreSQL 12+ (20 ms on older releases)
);

-- Verify the settings took effect
SELECT reloptions FROM pg_class WHERE relname = 'tracks';
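To see why the scale factor matters, recall that PostgreSQL starts an autovacuum run once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × (rows in table). The sketch below applies that formula. The 10-million-row table size is a hypothetical figure chosen for illustration, not a measurement from this guide.

```python
def vacuum_trigger(live_rows: int, scale_factor: float, base_threshold: int = 50) -> int:
    """Dead tuples needed before autovacuum starts a run on the table.

    base_threshold mirrors PostgreSQL's default autovacuum_vacuum_threshold (50).
    """
    return base_threshold + round(scale_factor * live_rows)


if __name__ == "__main__":
    rows = 10_000_000  # hypothetical tracks table size
    print("default (0.2): ", vacuum_trigger(rows, 0.2))
    print("tuned   (0.02):", vacuum_trigger(rows, 0.02))
```

On a table of that size, the default lets about two million dead tuples accumulate before vacuum starts, while the tuned setting triggers after about two hundred thousand.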

Also ensure the PostGIS GIST spatial index and the composite position lookup index exist – CloudTAK creates the spatial index on initialization, but the position lookup index may need to be added manually on older deployments:

-- Add the missing composite index if not present (CONCURRENTLY cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_tracks_uid_ts
    ON tracks (uid, timestamp DESC);

-- Verify all indexes on the tracks table
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'tracks';

CoT rate limiting: per-client caps, stale time tuning, and track pruning

The second most impactful tuning lever is controlling the volume of CoT events the server accepts and retains. Three parameters work together: the per-client inbound rate limit, the stale time threshold (how long a track remains in the live picture after its last update), and the retention window (how long historical tracks stay in the database).

Per-client rate limits

The global CLOUDTAK_COT_RATE_LIMIT environment variable sets a ceiling across all clients. For mixed fleets, configure per-client overrides via the admin API – this allows UAV feeds to publish at high frequency without raising the limit for all infantry devices:

# Set a conservative default for infantry devices
curl -s -X PATCH https://tak.yourdomain.com:8443/api/admin/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"cot_rate_limit_default": 5}'

# Override for a specific UAV feed client (higher rate allowed)
curl -s -X PATCH https://tak.yourdomain.com:8443/api/client/uav-feed-01/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"cot_rate_limit": 50}'
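A per-client limit with overrides is commonly implemented as a token bucket. The sketch below illustrates that general technique and is not CloudTAK's actual implementation. The class and helper names are hypothetical, and the burst size equal to one second of traffic is an assumption.

```python
class TokenBucket:
    """Per-client token bucket: `rate` events/sec, bursts up to `burst` events."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


DEFAULT_RATE = 5                   # mirrors cot_rate_limit_default above
OVERRIDES = {"uav-feed-01": 50}    # mirrors the per-client override above


def bucket_for(client_id: str, now: float = 0.0) -> TokenBucket:
    rate = OVERRIDES.get(client_id, DEFAULT_RATE)
    return TokenBucket(rate=rate, burst=rate, now=now)
```

The clock is passed in explicitly so the behaviour is easy to test; a real server would pass a monotonic clock reading such as time.monotonic().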

Stale time and track pruning

Every CoT event carries a stale attribute – a UTC timestamp after which the event should be considered expired. CloudTAK uses COT_STALE_SECONDS as a server-side override when client-provided stale timestamps are absent or unreasonably long. Setting this to match your operational tempo prevents the in-memory picture from filling with stale tracks from disconnected or destroyed assets:

# .env additions for CoT lifecycle management
COT_STALE_SECONDS=300        # Tracks older than 5 minutes without update are pruned from live picture
COT_RETENTION_HOURS=72       # Historical tracks retained in DB for replay/forensics
CLOUDTAK_TRACK_PRUNE_INTERVAL=60   # Run in-memory pruning every 60 seconds

For high-density UAV operations where dozens of assets may disappear from the picture without a clean disconnect, aggressive pruning is critical – without it, the server accumulates thousands of ghost tracks that each consume memory and contribute to outbound fan-out even though no asset is actually at those coordinates.
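The pruning rule can be sketched as follows. This is an illustration under stated assumptions rather than CloudTAK's code: the track dictionary layout is hypothetical, and "unreasonably long" is interpreted here as any client stale time beyond received_at + COT_STALE_SECONDS.

```python
from datetime import datetime, timedelta, timezone

COT_STALE_SECONDS = 300  # mirrors the .env value above


def effective_stale(received_at: datetime, client_stale):
    """Clamp a client-supplied stale time to the server-side ceiling."""
    ceiling = received_at + timedelta(seconds=COT_STALE_SECONDS)
    if client_stale is None or client_stale > ceiling:
        return ceiling
    return client_stale


def prune(tracks: dict, now: datetime) -> dict:
    """Keep only tracks whose effective stale time is still in the future."""
    return {
        uid: t for uid, t in tracks.items()
        if effective_stale(t["received_at"], t.get("stale")) > now
    }


if __name__ == "__main__":
    t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
    live = {"uav-1": {"received_at": t0, "stale": t0 + timedelta(hours=1)}}
    # A one-hour client stale time is clamped to 5 minutes, so the track is gone at +301 s.
    print(prune(live, t0 + timedelta(seconds=301)))
```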

WebSocket connection management: max connections, heartbeat tuning, dead connection cleanup

Each connected ATAK or WinTAK client holds a persistent WebSocket connection to CloudTAK. At 500+ simultaneous connections, default heartbeat parameters create measurable CPU overhead, and improperly cleaned dead connections consume file descriptors that are not returned to the OS until the process restarts.

Connection limits and heartbeat parameters

# .env WebSocket tuning block
CLOUDTAK_MAX_CONNECTIONS=800           # Hard ceiling – reject new connections above this
CLOUDTAK_WS_PING_INTERVAL=60          # Send PING every 60s (default 30s)
CLOUDTAK_WS_PONG_TIMEOUT=15           # Close if PONG not received within 15s
CLOUDTAK_WS_MAX_PAYLOAD=65536         # 64 KB max message – reject oversized frames
CLOUDTAK_WS_BACKPRESSURE_LIMIT=10485760  # 10 MB – pause writes to slow clients

Increasing CLOUDTAK_WS_PING_INTERVAL from 30s to 60s halves the heartbeat processing load – at 500 clients this is a meaningful reduction. The CLOUDTAK_WS_BACKPRESSURE_LIMIT parameter is important for tactical satellite link clients: it pauses delivery to clients that are not draining their receive buffers fast enough, preventing a slow BGAN connection from holding up event delivery to fast clients on the same server.
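The trade-off behind the heartbeat settings is simple arithmetic: a longer interval means fewer PING frames per second, but dead connections take longer to detect. The sketch below assumes the worst case for detection is one full ping interval plus the pong timeout; the function name is hypothetical.

```python
def heartbeat_profile(clients: int, ping_interval_s: float, pong_timeout_s: float) -> dict:
    """Steady-state ping rate and worst-case time to notice a dead connection."""
    return {
        "pings_per_sec": clients / ping_interval_s,
        # Connection dies right after a PONG: wait for the next PING, then the timeout.
        "worst_case_detection_s": ping_interval_s + pong_timeout_s,
    }


if __name__ == "__main__":
    print("default:", heartbeat_profile(500, 30, 10))
    print("tuned:  ", heartbeat_profile(500, 60, 15))
```

At 500 clients, the tuned values halve the ping rate and lengthen worst-case dead-connection detection from 40 to 75 seconds under this model.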

OS-level file descriptor limits

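Each WebSocket connection consumes a file descriptor. Where the common Linux soft limit of 1024 open files per process applies, it caps you below 1000 concurrent clients, because the process also needs descriptors for database connections, log files, and listening sockets. Increase the limit for the Docker container and the host: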

# Add to the cloudtak service in docker-compose.yml
    ulimits:
      nofile:
        soft: 65536
        hard: 65536

# Also set on the host – add to /etc/security/limits.conf
# (applies to PAM login sessions only; the container's limit comes from the ulimits block above)
*    soft    nofile    65536
*    hard    nofile    65536

# Verify current limits inside the container
docker exec cloudtak sh -c 'ulimit -n'
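A quick headroom check makes the descriptor budget explicit. This is a sketch: the reserve of 64 descriptors for database sockets, logs, and listeners is an assumed figure, and fd_headroom is a hypothetical helper name. resource.getrlimit is the standard-library call for reading the current process limit on Unix.

```python
import resource


def fd_headroom(soft_limit: int, expected_clients: int, reserved: int = 64) -> int:
    """Descriptors left after client sockets and a reserve for DB, logs, listeners."""
    return soft_limit - reserved - expected_clients


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={soft} hard={hard} headroom for 1000 clients={fd_headroom(soft, 1000)}")
```

A negative result means new connections would start failing before the target client count is reached.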

Horizontal scaling: multiple CloudTAK instances, load balancer, session affinity

When a single CloudTAK instance's Node.js event loop is CPU-saturated – identifiable by 100% vCPU utilization with the CoT queue depth growing – horizontal scaling is the next step. CloudTAK v2.x supports multi-instance deployments via a shared PostgreSQL database and Redis pub/sub for event fan-out between instances.

Redis for cross-instance event delivery

Add Redis to your Compose stack and configure both CloudTAK instances to use it:

  redis:
    image: redis:7-alpine
    container_name: cloudtak-redis
    restart: unless-stopped
    command: redis-server --maxmemory 512mb --maxmemory-policy allkeys-lru
    networks:
      - cloudtak-internal

  cloudtak-1:
    image: ghcr.io/tak-ps/cloudtak:${CLOUDTAK_VERSION}
    container_name: cloudtak-1
    environment:
      # ... same as single-instance config ...
      REDIS_URL: redis://redis:6379
      INSTANCE_ID: cloudtak-1
    ports:
      - "8089:8089"
      - "8443:8443"
      - "8446:8446"
    networks:
      - cloudtak-internal
      - cloudtak-external

  cloudtak-2:
    image: ghcr.io/tak-ps/cloudtak:${CLOUDTAK_VERSION}
    container_name: cloudtak-2
    environment:
      # ... same as single-instance config ...
      REDIS_URL: redis://redis:6379
      INSTANCE_ID: cloudtak-2
    ports:
      - "8190:8089"
      - "8543:8443"
      - "8546:8446"
    networks:
      - cloudtak-internal
      - cloudtak-external

HAProxy configuration with session affinity

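Because ATAK clients maintain persistent TCP connections, the load balancer must route each client consistently to the same CloudTAK instance – splitting a client's CoT stream and WebSocket connection across two instances results in missed events. Use IP-hash source affinity in HAProxy. The backend names below (cloudtak-1, cloudtak-2) are Docker service names, so this configuration assumes HAProxy can resolve them, for example by running as a container on the same Docker network. Since HAProxy itself binds 8089 and 8443, the CloudTAK instances must not publish those same host ports: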

# /etc/haproxy/haproxy.cfg (relevant blocks)

frontend tak_cot_frontend
    bind *:8089
    mode tcp
    timeout client 300s        # match the server-side timeout; omit if already set in defaults
    default_backend tak_cot_backend

backend tak_cot_backend
    mode tcp
    balance source             # IP hash – sticky sessions by source IP
    timeout connect 5s
    timeout server 300s
    server cloudtak1 cloudtak-1:8089 check
    server cloudtak2 cloudtak-2:8089 check

frontend tak_https_frontend
    bind *:8443
    mode tcp
    timeout client 300s        # match the server-side timeout; omit if already set in defaults
    default_backend tak_https_backend

backend tak_https_backend
    mode tcp
    balance source
    timeout connect 5s
    timeout server 300s
    server cloudtak1 cloudtak-1:8443 check
    server cloudtak2 cloudtak-2:8443 check

With two instances on the same hardware, the load is distributed across two Node.js processes, each on its own event loop – effectively doubling the available single-threaded JavaScript throughput. For deployments needing more than 1000 concurrent clients, scale to three or four instances following the same pattern.
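The idea behind source affinity can be illustrated in a few lines. This is a sketch of the general technique only: HAProxy's internal hash function differs, and pick_backend is a hypothetical name.

```python
import ipaddress
import zlib

BACKENDS = ["cloudtak-1", "cloudtak-2"]


def pick_backend(client_ip: str, backends=BACKENDS) -> str:
    """Hash the source address and map it onto the backend list."""
    packed = ipaddress.ip_address(client_ip).packed
    return backends[zlib.crc32(packed) % len(backends)]


if __name__ == "__main__":
    for ip in ("10.0.0.7", "10.0.0.8", "192.168.1.20"):
        print(ip, "->", pick_backend(ip))
```

Two consequences follow from hashing on the source address. All clients behind one NAT address land on the same instance, and changing the number of backends remaps many existing clients on their next reconnect.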

Feed optimisation: spatial filtering and resolution-based filtering

The most significant bandwidth reduction comes from spatial filtering – delivering each client only the tracks within their operational area rather than the full global picture. At 500 clients each receiving the full track feed, outbound fan-out is O(clients × events). With spatial filtering, clients in different geographic areas receive disjoint subsets of the track feed, and the fan-out collapses dramatically.
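The fan-out relationship can be put into numbers. The sketch below uses the client mix from the benchmark section of this guide (450 clients at 0.05 Hz plus 50 feeds at 5 Hz). The 300-byte average event size and the 35% delivered fraction are hypothetical values chosen for illustration, not measurements.

```python
def fanout_mbps(clients: int, events_per_sec: float, bytes_per_event: int,
                delivered_fraction: float = 1.0) -> float:
    """Outbound bandwidth when each event goes to a fraction of all clients."""
    deliveries_per_sec = clients * events_per_sec * delivered_fraction
    return deliveries_per_sec * bytes_per_event * 8 / 1_000_000


if __name__ == "__main__":
    events = 450 * 0.05 + 50 * 5  # inbound events/sec for the benchmark mix
    print("full broadcast:", round(fanout_mbps(500, events, 300), 1), "Mbps")
    print("with AOI (35%):", round(fanout_mbps(500, events, 300, 0.35), 1), "Mbps")
```

Outbound bandwidth scales linearly with the delivered fraction, which is why narrowing each client's area of interest has such a direct effect.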

Configuring area of interest subscriptions

Clients can register an AOI subscription via the CloudTAK feeds API, or operators can configure per-unit AOIs from the admin interface:

# Register a bounding box AOI for a specific client
curl -s -X PUT https://tak.yourdomain.com:8443/api/client/operator01/aoi \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "bbox",
    "min_lon": 22.5,
    "min_lat": 48.2,
    "max_lon": 25.8,
    "max_lat": 50.1,
    "radius_km": null
  }'

# Or configure a radius-based AOI centered on the client's last position
curl -s -X PUT https://tak.yourdomain.com:8443/api/client/operator01/aoi \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "radius",
    "center_lon": 24.0,
    "center_lat": 49.2,
    "radius_km": 50,
    "follow_client": true
  }'

The "follow_client": true option causes CloudTAK to dynamically update the AOI center as the client's own position reports, so the 50 km radius tracks with the moving operator. This is the recommended mode for vehicle-mounted and airborne clients.
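The membership test behind each AOI type can be sketched as below. This is an illustration rather than CloudTAK's implementation, which (as noted later in this guide) is better pushed down to PostGIS. The function names are hypothetical, the radius check uses the haversine formula on a spherical Earth, and the bounding-box check does not handle boxes that cross the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0


def in_bbox(lon: float, lat: float, aoi: dict) -> bool:
    return (aoi["min_lon"] <= lon <= aoi["max_lon"]
            and aoi["min_lat"] <= lat <= aoi["max_lat"])


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance between two lon/lat points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_radius(lon: float, lat: float, aoi: dict) -> bool:
    return haversine_km(lon, lat, aoi["center_lon"], aoi["center_lat"]) <= aoi["radius_km"]


if __name__ == "__main__":
    radius_aoi = {"center_lon": 24.0, "center_lat": 49.2, "radius_km": 50}
    print(in_radius(24.0, 49.5, radius_aoi))  # about 33 km north of the centre
    print(in_radius(24.0, 50.0, radius_aoi))  # about 89 km north of the centre
```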

Resolution-based filtering for UAV feeds

High-frequency UAV feeds can be decimated for distant clients. With the tiers below, clients within 20 km of the UAV's position receive full resolution, clients between 20 and 100 km receive one event per 5 source events, and clients more than 100 km away receive one per 10 (10% of full resolution). Configure resolution tiers per feed via the admin API:

curl -s -X PATCH https://tak.yourdomain.com:8443/api/feed/uav-feed-01/resolution \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tiers": [
      {"max_distance_km": 20,  "rate_divisor": 1},
      {"max_distance_km": 100, "rate_divisor": 5},
      {"max_distance_km": null, "rate_divisor": 10}
    ]
  }'
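The tier lookup and decimation can be sketched as follows. This is an illustration only: treating max_distance_km as an inclusive bound and decimating by event sequence number are assumptions of this example, and the function names are hypothetical.

```python
TIERS = [
    {"max_distance_km": 20, "rate_divisor": 1},
    {"max_distance_km": 100, "rate_divisor": 5},
    {"max_distance_km": None, "rate_divisor": 10},  # catch-all tier
]


def divisor_for(distance_km: float, tiers=TIERS) -> int:
    """Return the rate divisor of the first tier that covers the distance."""
    for tier in tiers:
        limit = tier["max_distance_km"]
        if limit is None or distance_km <= limit:
            return tier["rate_divisor"]
    raise ValueError("tier list needs a catch-all entry with max_distance_km = None")


def should_deliver(sequence_no: int, distance_km: float) -> bool:
    """Deliver every Nth event, where N is the divisor for the client's distance."""
    return sequence_no % divisor_for(distance_km) == 0


if __name__ == "__main__":
    for d in (10, 50, 500):
        delivered = sum(should_deliver(i, d) for i in range(100))
        print(f"{d} km: {delivered} of 100 events delivered")
```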

Profiling tools: admin API metrics, pg_stat_statements, and Linux perf

Tuning without profiling is guesswork. Use these three tools to identify the actual bottleneck before applying changes.

CloudTAK admin API metrics

The GET /api/admin/metrics endpoint returns a JSON object with real-time counters. For ongoing monitoring, scrape it into Prometheus using the /api/admin/metrics/prometheus endpoint and visualize in Grafana. The most diagnostic fields:

cot_queue_depth – if consistently > 0, the database write path is the bottleneck.
db_pool_waiting – connections queued for a pool slot; > 0 means PgBouncer pool is undersized.
ws_backpressure_paused – count of clients currently paused due to slow reads; indicates network or client-side bottleneck rather than server-side.
event_loop_lag_ms – Node.js event loop lag; values above 100ms indicate the main thread is CPU-saturated and horizontal scaling is needed.
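The rules of thumb in the list above can be collected into a small triage helper. This sketch assumes the metrics endpoint returns a flat JSON object with the field names used in this guide; the triage function itself is hypothetical.

```python
def triage(metrics: dict) -> list:
    """Map one metrics snapshot to likely bottlenecks, using the thresholds above."""
    findings = []
    if metrics.get("event_loop_lag_ms", 0) > 100:
        findings.append("cpu: event loop saturated, consider horizontal scaling")
    if metrics.get("db_pool_waiting", 0) > 0:
        findings.append("db-pool: callers waiting for a connection, review pool sizing")
    if metrics.get("cot_queue_depth", 0) > 0:
        findings.append("db-write: CoT events queued for persistence")
    if metrics.get("ws_backpressure_paused", 0) > 0:
        findings.append("network: slow clients paused by backpressure")
    return findings or ["ok: no threshold exceeded"]


if __name__ == "__main__":
    sample = {"event_loop_lag_ms": 140, "db_pool_waiting": 0,
              "cot_queue_depth": 12, "ws_backpressure_paused": 0}
    for line in triage(sample):
        print(line)
```

A single snapshot can mislead, since a brief spike in queue depth is normal. Apply the check to several consecutive samples before acting on it.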

PostgreSQL pg_stat_statements

Enable the pg_stat_statements extension to identify the costliest queries:

-- Enable extension (add to postgresql.conf: shared_preload_libraries = 'pg_stat_statements')
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by total execution time
SELECT
    left(query, 80) AS query_snippet,
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2)  AS mean_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Reset stats after tuning to measure improvement
SELECT pg_stat_statements_reset();

The queries most commonly found at the top of this list on under-tuned CloudTAK deployments are: the track UPSERT query (missing composite index on (uid, timestamp)), the spatial AOI filter query (missing GIST index on the geometry column), and the retention cleanup DELETE (sequential scan when the timestamp index is missing).

Linux perf and flamegraphs for Node.js CPU profiling

If event_loop_lag_ms is elevated, use Linux perf to generate a CPU flamegraph of the CloudTAK Node.js process:

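# Get the host PID of the CloudTAK process. A PID obtained with pgrep inside
# the container belongs to the container's PID namespace and is not valid for
# perf on the host. This assumes node is the container's main process;
# otherwise pick the PID from `docker top cloudtak`.
CLOUDTAK_PID=$(docker inspect --format '{{.State.Pid}}' cloudtak)

# Record 30 seconds of CPU samples (requires perf installed on host, typically run as root).
# Start node with --perf-basic-prof so JavaScript frames resolve to function names.
perf record -F 99 -p "$CLOUDTAK_PID" -g -- sleep 30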

# Generate flamegraph (requires FlameGraph tools)
perf script | stackcollapse-perf.pl | flamegraph.pl > cloudtak-flame.svg

Common hot paths found in CloudTAK flamegraphs: JSON serialization of large CoT payloads (mitigated by payload size limits), WebSocket frame encoding for high-frequency fan-out (mitigated by spatial filtering), and geospatial distance calculations for AOI evaluation (mitigated by pushing AOI filtering to PostGIS rather than evaluating in JavaScript).

Benchmark results: before and after tuning for the 500-client scenario

The following results were produced on a 4 vCPU / 8 GB RAM Ubuntu 22.04 server running CloudTAK v2.4.1, PostgreSQL 15, and PgBouncer 1.22. Load was simulated using tak-load-test with 450 infantry clients (position update at 0.05 Hz) and 50 UAV feeds (position + metadata at 5 Hz). All tuning changes from this guide were applied.

Metric	Before tuning	After tuning	Improvement
CoT latency p50	3,800 ms	310 ms	-92%
CoT latency p99	18,200 ms	890 ms	-95%
DB connection errors/min	47	0	-100%
Server CPU utilization	94%	38%	-60%
Outbound bandwidth	340 Mbps	118 Mbps	-65%
PostgreSQL memory	1.8 GB	680 MB	-62%

The single biggest contributor to latency reduction was PgBouncer – eliminating connection pool exhaustion dropped median latency from 3.8 seconds to under 800ms before any other change. Spatial filtering was the single biggest contributor to bandwidth reduction. The remaining latency improvement to 310ms p50 came from the PostgreSQL memory parameter tuning and the composite index on (uid, timestamp DESC).

Capacity planning note: After full tuning, the 4 vCPU / 8 GB server was running at 38% CPU with 500 clients. If CPU scales linearly with client count, 100% CPU corresponds to roughly 1,300 clients (500 / 0.38), so the practical ceiling at 85–90% utilization is approximately 1100–1200 clients. For production deployments expecting to approach that ceiling, deploy two CloudTAK instances behind HAProxy before hitting it – reactive horizontal scaling during an operation is operationally risky.
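The linear extrapolation in the capacity note can be reproduced with one function. This sketch assumes CPU grows strictly in proportion to client count, which is the optimistic case; the function name is hypothetical.

```python
def clients_at(cpu_target: float, measured_clients: int = 500,
               measured_cpu: float = 0.38) -> float:
    """Client count at which CPU reaches cpu_target, assuming linear scaling."""
    return measured_clients * cpu_target / measured_cpu


if __name__ == "__main__":
    for target in (0.85, 0.90, 1.00):
        print(f"{target:.0%} CPU -> about {clients_at(target):.0f} clients")
```

Under this model the 1100–1200 range corresponds to about 85–90% CPU, with full saturation a little above 1,300 clients. Fan-out cost can grow faster than linearly with client count, so treat these figures as an upper bound.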

Frequently Asked Questions

How many concurrent ATAK clients can CloudTAK handle on default configuration?

On default configuration (DB_POOL_MAX=10, no CoT rate limiting, no spatial filtering), CloudTAK begins to show latency degradation at around 80–120 concurrent clients. At 200+ clients event queuing introduces 2–5 second delays and database connection pool errors appear in logs. With the tuning steps in this guide (PgBouncer, spatial filtering, and horizontal scaling), a properly configured deployment handles 500–1000+ concurrent clients; in our 500-client benchmark, median CoT delivery latency was 310 ms and p99 was 890 ms.

What is PgBouncer and why does CloudTAK need it?

PgBouncer is a lightweight connection pooler for PostgreSQL. CloudTAK opens a database connection for each concurrent CoT write, which quickly exhausts PostgreSQL's process-per-connection model at high client counts. PgBouncer sits between CloudTAK and PostgreSQL, multiplexing hundreds of application connections onto a smaller number of actual PostgreSQL backend processes. In transaction pooling mode, PgBouncer reduces the PostgreSQL backend count from one per application connection to a small fixed pool (25 in the configuration above), cutting memory usage by 60–80% and eliminating connection exhaustion errors at 500+ clients.

What is the right CoT event rate limit per client for different device types?

Recommended per-client CoT rate limits vary by device type: infantry ATAK devices typically send position updates every 30–60 seconds (roughly 0.017–0.033 Hz), so a rate limit of 5 events/sec per client is generous. Vehicles with SA-enabled devices may send at 1 Hz. UAV payloads publishing video metadata and position at 5–10 Hz need limits of 20–50 events/sec. The default CloudTAK global limit of 100 events/sec per client is appropriate for UAVs but wasteful as a blanket limit. Configure per-client overrides via the admin API to give UAV feeds higher limits without opening up all clients.

Does CloudTAK support horizontal scaling across multiple instances?

Yes, from CloudTAK v2.x onwards. Multiple CloudTAK instances can share a single PostgreSQL/PostGIS database and coordinate state via a Redis pub/sub channel. A load balancer (HAProxy, NGINX, or a cloud load balancer that supports raw TCP) distributes client connections across instances. Because TAK clients maintain persistent TCP/WebSocket connections, the load balancer must use session affinity (sticky sessions based on client IP, or on a session cookie where the balancer terminates HTTP) so a given client always routes to the same CloudTAK instance for the duration of its connection. CoT events are fanned out to all instances via Redis, so every connected client on any instance sees the full picture.

What is spatial filtering and how much does it reduce server load?

Spatial filtering restricts what tracks each client receives to only those within a defined area of interest (AOI) — typically a bounding box or radius around the client's last known position. Without spatial filtering, every CoT event is broadcast to every connected client, so load scales as O(clients × events). With spatial filtering, CloudTAK evaluates whether each event's coordinates fall within each subscriber's AOI before delivering it, reducing fan-out significantly. In a 500-client scenario with mixed infantry and UAV feeds, enabling spatial filtering with a 50 km AOI radius reduced outbound bandwidth by 65% and WebSocket frame processing CPU by 40% in our benchmark.

How do I identify whether the bottleneck is CPU, memory, or database I/O?

Use a three-layer diagnostic: (1) Linux htop or top – if all vCPU cores are pegged at 100%, the bottleneck is CPU (usually Node.js event loop saturation from large numbers of WebSocket frames or JSON serialization). If CPU is low but memory swap is active, the bottleneck is memory. (2) PostgreSQL pg_stat_statements – query it for top queries by total_exec_time (total_time on PostgreSQL 12 and earlier) to identify slow or high-frequency SQL. Slow UPDATE queries on the tracks table with missing indexes are the most common database bottleneck. (3) CloudTAK admin API metrics endpoint (GET /api/admin/metrics) – reports event queue depth, WebSocket connection count, and database pool utilization. A persistently high queue depth with healthy CPU indicates a database write bottleneck.

What PostgreSQL indexes are most important for CloudTAK track query performance?

The three most impactful indexes for CloudTAK performance are: (1) A PostGIS spatial index (GIST) on the geometry column of the tracks table — required for spatial filtering queries; without it, AOI fan-out degrades to a full table scan. (2) A B-tree index on (uid, timestamp DESC) for the most-recent-position-per-uid query used when clients reconnect and request current picture. (3) A B-tree index on (timestamp) for the track retention cleanup job that deletes rows older than COT_RETENTION_HOURS. CloudTAK creates (1) and (3) on schema initialization, but (2) may need to be added manually on older deployments.

How should I configure PostgreSQL autovacuum for a high-write CloudTAK workload?

CloudTAK's tracks table is high-write (frequent INSERT and UPDATE) and high-delete (regular retention cleanup). Default PostgreSQL autovacuum settings are tuned for mixed OLTP workloads and under-vacuum tables under these conditions, causing table bloat and query plan degradation. For CloudTAK, set autovacuum_vacuum_scale_factor=0.02 (trigger vacuum when 2% of table rows are dead, vs. default 20%) and autovacuum_analyze_scale_factor=0.01, applied to the tracks table as per-table storage parameters with ALTER TABLE tracks SET (...). Note that autovacuum_vacuum_cost_delay=2 (milliseconds) is already the default from PostgreSQL 12 onwards; on older releases, where the default is 20 ms, setting it to 2 gives autovacuum more I/O budget when the table is large.

What are the WebSocket heartbeat tuning options in CloudTAK?

CloudTAK's WebSocket server sends PING frames to connected clients at a configurable interval (CLOUDTAK_WS_PING_INTERVAL, default 30s) and closes connections that do not respond within the timeout (CLOUDTAK_WS_PONG_TIMEOUT, default 10s). Under high client density, the heartbeat overhead from 500+ simultaneous PING frames can create CPU spikes. Set CLOUDTAK_WS_PING_INTERVAL=60 to halve heartbeat frequency. For dead connection cleanup, also set CLOUDTAK_WS_PONG_TIMEOUT=15 — increasing the timeout slightly reduces false disconnections on high-latency tactical links (BGAN, satellite COMMS) without materially delaying cleanup of genuinely dead connections.

What benchmark improvements are realistic after full performance tuning?

In our 500-client benchmark scenario (450 infantry clients at 0.05 Hz + 50 UAV feeds at 5 Hz), the before/after results were: median CoT delivery latency dropped from 3,800ms to 310ms; 99th-percentile latency dropped from 18,200ms to 890ms; database connection errors dropped from 47/minute to 0; server CPU utilization dropped from 94% to 38%; outbound bandwidth dropped from 340 Mbps to 118 Mbps after enabling spatial filtering. The full tuning set applied was: PgBouncer in transaction mode, PostgreSQL shared_buffers=2GB, spatial filtering at 50km AOI, WebSocket ping interval 60s, and two CloudTAK instances behind HAProxy with sticky sessions.