A tactical command-and-control system is a distributed application running across radios, vehicles, dismounted operators, and rear-area servers. The messaging bus is the spine. Pick the wrong one and the system feels fast in the lab and dies on a contested link. Pick the right one and the operator sees a fused picture refresh inside their decision cycle.
This article walks through the four candidates that actually appear in production C2 builds (NATS, Apache Kafka, MQTT, and RabbitMQ) and lays out a decision framework for choosing between them. The short version: there is no single answer. Real systems run two or three, bridged.
1. The Tactical Messaging Problem
Tactical networks are not data-center networks. Bandwidth on a typical VHF combat radio is measured in kilobits per second, not megabits. Round-trip times across a MANET (mobile ad-hoc network) routinely cross 500 ms. Packet loss above 20% is normal under jamming. Links flap as platforms move behind terrain. A satcom backhaul gets contended in the morning push and again at last light.
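To put the bandwidth figure in perspective, a rough airtime calculation helps. The sketch below assumes an ideal 9600 bps link (the rate used later in this article) and ignores framing, forward error correction, and retransmission, so real figures will be worse. The function name is illustrative.

```python
def airtime_seconds(payload_bytes: int, link_bps: int) -> float:
    """Ideal time to push payload_bytes through a link of link_bps bits/s.

    Ignores framing, FEC, and retransmission (assumption), so this is a
    lower bound on the real airtime.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    if link_bps <= 0:
        raise ValueError("link_bps must be positive")
    return payload_bytes * 8 / link_bps
```

On these assumptions a 1200-byte message occupies a 9600 bps channel for a full second before any loss is accounted for, which is why payload size and header overhead matter so much in the sections that follow.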
The operator does not tolerate stale data. A fused track that is two minutes old is worse than no track — it presents a confident lie about where a threat is. The bus must therefore enforce message expiry, prioritize fresh state over backlog, and degrade gracefully when the link returns after a partition rather than dumping ten minutes of queued telemetry at once.
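A minimal sketch of expiry and fresh-first delivery, assuming each message carries a creation timestamp and a time-to-live, and that sender and receiver clocks are roughly synchronized. The Message type and function names are hypothetical and not tied to any particular bus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    topic: str
    payload: bytes
    created_at: float  # seconds on a shared clock (assumption)
    ttl_s: float       # how long the message stays useful


def is_fresh(msg: Message, now: float) -> bool:
    """True while the message is still inside its time-to-live."""
    return now - msg.created_at <= msg.ttl_s


def fresh_first(queue: list[Message], now: float) -> list[Message]:
    """Drop expired messages, then order the rest newest first."""
    live = [m for m in queue if is_fresh(m, now)]
    return sorted(live, key=lambda m: m.created_at, reverse=True)
```

Newest-first ordering suits state-like traffic such as tracks. It is the wrong choice for commands, where order of issue must be preserved.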
Ordering matters too, but not uniformly. Telemetry can be coalesced (only the latest position matters). Commands cannot — a "weapons hold" issued at T+5 must not be overtaken by a "weapons free" issued at T+3 that arrived late. The bus needs different delivery semantics per topic, not one global guarantee.
Finally, the bus has to survive partition. When the radio link returns after a five-minute drop, three behaviours are wrong: dumping every queued message at once (overwhelms the consumer), silently discarding everything (loses ordered commands), and reordering during catch-up (delivers stale weapons-free after fresh weapons-hold). The correct behaviour is per-topic: coalesce-on-recovery for telemetry, ordered drain with timestamps for commands, full replay for audit logs. No single delivery mode satisfies all three.
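The per-topic recovery behaviour described above can be sketched as a small policy table. This is an illustration under stated assumptions: the topic prefixes, the Queued type, and the choice to deliver commands before telemetry before audit records are all hypothetical, and an unknown prefix simply raises KeyError.

```python
from dataclasses import dataclass
from enum import Enum


class Recovery(Enum):
    COALESCE = "coalesce"      # keep only the newest message per key
    ORDERED_DRAIN = "ordered"  # deliver all, sorted by send timestamp
    FULL_REPLAY = "replay"     # deliver all, in stored order


@dataclass(frozen=True)
class Queued:
    topic: str
    key: str        # e.g. a track ID or command target
    sent_at: float  # sender timestamp in seconds
    body: str


# Hypothetical mapping from topic prefix to recovery policy.
POLICY = {
    "telemetry": Recovery.COALESCE,
    "command": Recovery.ORDERED_DRAIN,
    "audit": Recovery.FULL_REPLAY,
}


def policy_for(topic: str) -> Recovery:
    return POLICY[topic.split(".", 1)[0]]


def recover(backlog: list[Queued]) -> list[Queued]:
    """Turn a post-partition backlog into a delivery sequence."""
    groups: dict[Recovery, list[Queued]] = {r: [] for r in Recovery}
    for msg in backlog:
        groups[policy_for(msg.topic)].append(msg)

    # Commands: never reorder by arrival; sort by when they were issued.
    commands = sorted(groups[Recovery.ORDERED_DRAIN], key=lambda m: m.sent_at)

    # Telemetry: only the latest message per (topic, key) survives.
    latest: dict[tuple[str, str], Queued] = {}
    for msg in groups[Recovery.COALESCE]:
        k = (msg.topic, msg.key)
        if k not in latest or msg.sent_at >= latest[k].sent_at:
            latest[k] = msg
    telemetry = sorted(latest.values(), key=lambda m: m.sent_at)

    # Audit: everything, untouched.
    return commands + telemetry + groups[Recovery.FULL_REPLAY]
```

With this sketch, a weapons-free stamped T+3 that arrives after a weapons-hold stamped T+5 is still delivered first, so the hold remains the last command the consumer sees.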
2. NATS and JetStream
NATS is a small, opinionated pub-sub bus written in Go. A single binary, no external dependencies, default in-memory subjects, and sub-millisecond publish-to-deliver latency on a LAN. Footprint is in the tens of megabytes — small enough for a vehicle compute brick or a ruggedized edge node.
Core NATS is fire-and-forget. JetStream is the persistence layer, previewed in 2020 and generally available since NATS Server 2.2 in 2021: durable streams, replay by sequence or time, consumer cursors, message expiry, and per-stream deduplication windows. JetStream uses Raft for replication. A 3-node JetStream cluster is the standard tactical core deployment: quorum survives one node loss, and the streams replicate without a separate ZooKeeper-style coordinator.
NATS wins when the dominant traffic is small, frequent, low-latency messages between services — commands, fused track updates, microservice RPC over request-reply subjects. It is the default bus for service-to-service traffic inside a fusion engine.
Where it breaks: JetStream's replication is excellent inside a cluster but it is not designed to span a degraded WAN. Leaf nodes can extend a NATS topology outward to edge devices, but if the leaf goes offline for hours, the catch-up window is bounded by the stream's retention limits, not by how long the leaf was away. Anything that aged out of the stream during the outage is gone when the leaf reconnects. Treat NATS as the core bus, not the wide-area bus.
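The retention bound can be made concrete with a one-line calculation. This sketch assumes purely age-based retention; real streams may also be limited by message count or bytes, which would shrink the window further. The function name is illustrative.

```python
def unrecoverable_seconds(offline_s: float, retention_s: float) -> float:
    """Seconds of history a reconnecting leaf cannot catch up on.

    Assumes the stream retains messages purely by age (assumption).
    """
    if offline_s < 0 or retention_s < 0:
        raise ValueError("durations must be non-negative")
    return max(0.0, offline_s - retention_s)
```

A leaf that was dark for four hours against a one-hour retention window has lost three hours of history. A five-minute drop against the same window loses nothing.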
The fault-tolerance trade-off worth naming: JetStream Raft quorum requires a majority of replicas to acknowledge a write. In a 3-node cluster that means two acks. If one node is down for maintenance and a second loses its disk, writes stall — the cluster is preserving consistency at the cost of availability. For a tactical core that is the right choice; consistency of the operational picture is non-negotiable. But the operator pattern matters: do not run JetStream three-node clusters where two nodes share a single point of failure such as one switch or one power feed.
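The quorum arithmetic is simple enough to write down. The sketch below uses the standard majority rule for Raft-style replication; the function names are illustrative.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while writes still commit."""
    return replicas - quorum(replicas)


def writes_available(replicas: int, healthy: int) -> bool:
    """True if enough replicas are up to acknowledge a write."""
    return healthy >= quorum(replicas)
```

For three replicas the quorum is two and one failure is tolerated. Losing a second node leaves one healthy replica, which is below quorum, so writes stall. Note that a four-node cluster still tolerates only one failure, which is why odd cluster sizes are the usual choice.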
3. Apache Kafka
Kafka is the durability champion. An append-only log per partition, replication factor configurable per topic, retention measured in days or weeks, and a consumer model that lets new clients replay history from offset zero. For after-action review, audit logging, and analytics over historical operational data, Kafka is almost always the right answer.
It is also expensive. A production Kafka cluster wants three brokers minimum, fast local disks, gigabytes of page cache, and either ZooKeeper (legacy) or KRaft (production-ready since Kafka 3.3 in late 2022; ZooKeeper mode was deprecated in 3.5 and removed in 4.0). Partition rebalancing under network partition is a known operational hazard. Consumer group coordination assumes a stable connection to the group coordinator broker.
The "Kafka-for-everything" pattern that works in cloud-native shops fails at the tactical edge for three reasons. First, the resource footprint is wrong — a JVM-based broker on a fanless edge box loses to a NATS binary every time. Second, Kafka's strong-durability default punishes you on a high-loss link: producers stall waiting for acks. Third, the operational complexity (broker config, topic partitioning strategy, retention tuning, ISR monitoring) is unjustifiable when the box is unattended in a forward position.
Kafka belongs at the strategic tier — the rear-area cluster that ingests aggregated event streams from forward-deployed gateways and serves them to analytics, training data pipelines, and long-term archives.
4. MQTT
MQTT was designed in 1999 for oil-pipeline telemetry over satellite links — exactly the network profile a tactical sensor network presents today. Tiny header overhead (2-byte fixed header in the minimal case), three quality-of-service levels (0 fire-and-forget, 1 at-least-once, 2 exactly-once), and a topic hierarchy that maps naturally onto sensor → unit → echelon structures.
MQTT 5.0, finalized in 2019, added the features that make it operationally serious for defense. Shared subscriptions ($share/group/topic) load-balance a topic across a consumer group, which is useful for fan-out processing of sensor data. Message expiry intervals discard stale tactical data automatically at the broker. User properties carry classification labels and release markings as message metadata. Topic aliases replace a repeated long topic string with a two-byte integer alias after the first publish, a real win on a 9600 bps radio.
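The saving from topic aliases can be estimated by counting bytes in an MQTT 5.0 PUBLISH packet. The sketch below is a size calculator only, not an encoder. It assumes no properties other than Topic Alias (a one-byte property identifier plus a two-byte alias value) and models a publish after the alias has been established, where the topic name is sent as a zero-length string. Function and parameter names are illustrative.

```python
def varint_len(n: int) -> int:
    """Bytes needed to encode n as an MQTT variable byte integer."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("value out of range for a variable byte integer")
    length = 1
    while n >= 128:
        n //= 128
        length += 1
    return length


def publish_size(topic: str, payload_len: int, qos: int = 1,
                 alias_set: bool = False) -> int:
    """Total on-wire size in bytes of an MQTT 5.0 PUBLISH packet.

    alias_set=True models a publish after the alias was established:
    empty topic name plus a 3-byte Topic Alias property.
    """
    topic_bytes = 0 if alias_set else len(topic.encode("utf-8"))
    props = 3 if alias_set else 0          # property id + 2-byte alias
    remaining = (
        2 + topic_bytes                    # topic length prefix + topic
        + (2 if qos > 0 else 0)            # packet identifier for QoS 1/2
        + varint_len(props) + props        # property length + properties
        + payload_len
    )
    return 1 + varint_len(remaining) + remaining  # fixed header + rest
```

For a topic such as "unit/alpha/vehicle/07/sensor/gps/position" the aliased publish is shorter by the topic length minus the three bytes of alias property, so the benefit grows with topic depth and shrinks to nothing for very short topics.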
The broker side is mature: Mosquitto for small footprints, EMQX or HiveMQ for larger clustered deployments with shared subscriptions and bridging. All three run on edge-class hardware. MQTT-SN (Sensor Networks) extends the protocol over non-TCP transports for the truly tiny — battery-powered sensors with no IP stack.
MQTT's weakness is durability. Persistent sessions and QoS 2 give you reliable delivery to a known client, but MQTT is not an event log — there is no replay-by-offset semantics. If a consumer disconnects past its session expiry, the messages are gone. For sensor telemetry that is acceptable. For an audit trail it is not.
5. RabbitMQ and AMQP
RabbitMQ predates the cloud-native messaging wave and still earns its place. The AMQP 0-9-1 model — exchanges, bindings, queues — gives routing flexibility that pub-sub buses cannot match. Topic exchanges with wildcard bindings, header exchanges for content-based routing, dead-letter queues for failed messages, per-queue TTLs, and per-message acknowledgement with redelivery counts.
For workflows where a message must be handled by exactly one worker, with explicit ack and retry semantics, RabbitMQ is still the cleanest answer. Delivery is at-least-once: a message whose ack is lost will be redelivered, so workers should be idempotent if the effect must happen exactly once. Examples in a C2 stack: tasking workflows where each tasking goes to one operator, geocoding requests that hit an external service, OCR jobs against captured imagery. These are queue problems, not stream problems, and queue semantics are what RabbitMQ does.
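The ack, retry, and dead-letter flow can be illustrated with an in-memory simulation. This is not RabbitMQ client code: it is a sketch of the queue semantics only, where a handler that returns normally counts as an ack and a raised exception counts as a negative ack. The function name and the max_attempts parameter are hypothetical.

```python
from collections import deque
from typing import Callable


def run_work_queue(jobs: list[str], handler: Callable[[str], None],
                   max_attempts: int = 3) -> tuple[list[str], list[str]]:
    """Process each job with retries; return (completed, dead_lettered)."""
    queue = deque((job, 1) for job in jobs)
    done: list[str] = []
    dead: list[str] = []
    while queue:
        job, attempt = queue.popleft()
        try:
            handler(job)                 # returning normally == ack
        except Exception:
            if attempt >= max_attempts:
                dead.append(job)         # give up: route to dead-letter
            else:
                queue.append((job, attempt + 1))  # requeue for retry
        else:
            done.append(job)
    return done, dead
```

The dead-letter list is the part that matters operationally: a job that fails repeatedly is parked for inspection instead of blocking the queue or disappearing.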
Observability is the other quiet strength — the management UI, the Prometheus exporter, and per-queue metrics make it the easiest of the four to operate at 03:00 when something is wrong. For a small ops team running an unattended tactical cloud, that matters.
RabbitMQ's limits show up at very high throughput (it is not a million-messages-per-second bus) and on flaky networks (the connection-oriented AMQP model dislikes link flaps). Use it for the workflow layer, not the telemetry firehose.
6. Bridging Buses
Production C2 systems run two or three buses simultaneously. A representative deployment: MQTT at the edge for sensor and radio traffic, NATS in the tactical core for service-to-service commands and fused tracks, Kafka at the strategic tier for durable event archive. RabbitMQ may appear alongside NATS for the workflow layer.
The bridges are first-class components, not afterthoughts. An MQTT-to-NATS gateway subscribes to selected MQTT topics, transforms the payload to the canonical internal schema, and re-publishes onto a NATS subject. A NATS-to-Kafka bridge consumes JetStream streams and produces to Kafka topics with the same partition key strategy. Schema translation, backpressure handling, and idempotent re-publish on bridge restart are the hard parts — not the connection itself.
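Idempotent re-publish is the piece most often skipped, so a sketch may help. It assumes every message carries a stable, unique ID assigned at the source, and that the bridge keeps a bounded window of recently forwarded IDs. The class and function names are hypothetical. A real bridge would persist the window across restarts or lean on downstream deduplication such as the JetStream window mentioned earlier.

```python
from collections import OrderedDict
from typing import Callable, Iterable


class DedupWindow:
    """Bounded memory of recently forwarded message IDs."""

    def __init__(self, capacity: int = 10_000) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._seen: OrderedDict[str, None] = OrderedDict()
        self._capacity = capacity

    def seen(self, msg_id: str) -> bool:
        return msg_id in self._seen

    def remember(self, msg_id: str) -> None:
        self._seen[msg_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the oldest ID


def republish(messages: Iterable[tuple[str, bytes]], window: DedupWindow,
              publish: Callable[[str, bytes], None]) -> int:
    """Forward each message at most once; return how many were sent."""
    sent = 0
    for msg_id, payload in messages:
        if window.seen(msg_id):
            continue                 # already forwarded before restart
        publish(msg_id, payload)     # may raise; ID is not yet recorded
        window.remember(msg_id)      # record only after a successful send
        sent += 1
    return sent
```

Recording the ID only after the publish call returns means a failed send is retried on the next pass instead of being silently skipped, at the cost of a possible duplicate if the failure happened after the downstream bus accepted the message.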
Build the bridges with the same engineering discipline as any other service: health checks, metrics, a defined replay procedure on restart, and clear ownership. The classic failure mode is a bridge that silently drops messages under load because its internal queue overflowed.
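One way to avoid the silent-drop failure is to make overflow an explicit, counted event. The sketch below is a bounded in-memory buffer that refuses new items when full and exposes counters a metrics endpoint could scrape. The class name, method names, and the reject-newest policy are assumptions for illustration.

```python
from collections import deque
from typing import Any, Optional


class BridgeBuffer:
    """Bounded buffer that counts overflow instead of hiding it."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._queue: deque[Any] = deque()
        self._capacity = capacity
        self.accepted = 0
        self.dropped = 0

    def offer(self, item: Any) -> bool:
        """Try to enqueue; return False (and count it) when full."""
        if len(self._queue) >= self._capacity:
            self.dropped += 1
            return False
        self._queue.append(item)
        self.accepted += 1
        return True

    def poll(self) -> Optional[Any]:
        """Remove and return the oldest item, or None when empty."""
        return self._queue.popleft() if self._queue else None

    def metrics(self) -> dict[str, int]:
        return {
            "depth": len(self._queue),
            "accepted": self.accepted,
            "dropped": self.dropped,
        }
```

The False return from offer gives the upstream side a signal to apply backpressure, and a non-zero dropped counter is something an alert can fire on.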
7. Security and Classification
Every bus speaks TLS. Every bus supports mutual TLS with client certificates. That is necessary, not sufficient.
Per-enclave isolation is the next layer: a separate broker instance with a separate certificate authority for each classification level. The bus inside the SECRET enclave never talks to the bus inside the UNCLASSIFIED enclave directly. Cross-domain release goes through an approved guard or cross-domain solution that strips, transforms, and re-publishes — never through a broker bridge.
Per-topic ACLs are the third layer. On NATS, accounts and subject permissions. On MQTT, broker ACL files or a plugin. On Kafka, ACLs via the AdminClient API. On RabbitMQ, user-vhost-resource permissions. Default-deny is the only acceptable posture: a service can publish to and subscribe to exactly the topics its role requires, and no others.
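A default-deny check is easy to express in code. The sketch below uses NATS-style subject wildcards, where "*" matches exactly one token and ">" matches one or more trailing tokens. It is an illustration of the posture, not any broker's actual permission engine, and the role names and ACL table are hypothetical.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' is one token, '>' is one or more at the tail."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # Valid only as the final token; needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


# Hypothetical role table: anything not listed here is denied.
ACL = {
    "fusion-engine": {
        "publish": ["track.*.update"],
        "subscribe": ["sensor.>"],
    },
}


def allowed(role: str, action: str, subject: str, acl: dict = ACL) -> bool:
    """Default-deny: unknown roles, actions, and subjects all return False."""
    patterns = acl.get(role, {}).get(action, [])
    return any(subject_matches(p, subject) for p in patterns)
```

The important property is that there is no fallback branch that returns True: a role missing from the table can do nothing at all.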
Message metadata carries classification labels — MQTT 5 user properties, NATS headers, Kafka headers. The broker does not enforce classification semantics; the consuming services and the cross-domain guard do. The broker enforces who can read what topic.
Key insight: The messaging bus is part of the security boundary, not separate from it. Treat broker configuration — ACLs, TLS, account isolation — with the same rigour as offline-first application design and symbology compliance. A misconfigured ACL is a classification spill waiting to happen.
8. Decision Framework
Score each traffic class against four axes:
Latency budget. Sub-millisecond service-to-service RPC: NATS. Tens of milliseconds for sensor telemetry: MQTT. Seconds for archive ingest: Kafka. Per-message workflow steps with ack semantics: RabbitMQ.
Throughput. Up to ~10k messages/sec per topic on modest hardware: any of the four. 100k+ sustained per topic: NATS or Kafka. Millions across many topics: Kafka. Sensor fan-in from thousands of low-rate clients: MQTT with shared subscriptions.
Durability. No replay required: core NATS or MQTT QoS 0/1. Replay within a session or short window: NATS JetStream, MQTT persistent sessions. Multi-day audit-grade replay: Kafka. Per-message ack with retry and dead-letter: RabbitMQ.
Edge-network reality. 9600 bps radio with 30% loss: MQTT, with topic aliases and QoS 1. Tactical LAN inside a vehicle: NATS. Strategic WAN to a rear cluster: Kafka with a gateway in front. Intermittent satcom: MQTT for telemetry, asynchronous Kafka producer with local spool for archive.
Build the matrix for your specific system. Each traffic class maps to one bus. The bridges between them are explicit. The deployment runs the buses it needs and no more — adding a bus has an operational cost, and that cost is paid every shift, not just at integration time.
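The matrix itself can live in code or configuration, which makes the set of required bridges something you derive, not something you discover. A minimal sketch follows. The traffic class names, the bus assignments, and the flow list are hypothetical examples in the spirit of the deployment described in section 6, not a recommendation for any specific system.

```python
# Hypothetical matrix: each traffic class maps to exactly one bus.
MATRIX = {
    "sensor-telemetry": "MQTT",
    "fused-tracks": "NATS",
    "commands": "NATS",
    "tasking-workflow": "RabbitMQ",
    "audit-archive": "Kafka",
}

# Hypothetical data flows between traffic classes (source, destination).
FLOWS = [
    ("sensor-telemetry", "fused-tracks"),
    ("fused-tracks", "audit-archive"),
    ("commands", "audit-archive"),
    ("commands", "fused-tracks"),
]


def buses_needed(matrix: dict[str, str]) -> list[str]:
    """Distinct buses the deployment has to operate."""
    return sorted(set(matrix.values()))


def required_bridges(matrix: dict[str, str],
                     flows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Directed bus-to-bus bridges implied by flows that cross buses."""
    bridges = set()
    for src, dst in flows:
        a, b = matrix[src], matrix[dst]
        if a != b:
            bridges.add((a, b))
    return sorted(bridges)
```

With the example data this yields two bridges, MQTT to NATS and NATS to Kafka, and four buses to operate. Dropping the RabbitMQ row and folding tasking into another bus is then a visible, reviewable change.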