Apache Kafka for defense: real-time streaming

Defense systems generate data at a pace and volume that conventional request-response architectures cannot absorb. A single UAS emits dozens of telemetry parameters per second. A brigade-level C2 node handles hundreds of position reports and status events per minute. An ISR fusion cell ingests feeds from radar, signals intelligence, and human reporting simultaneously, all requiring sub-second correlation. When these streams must flow reliably across resilient, auditable, classified infrastructure, Apache Kafka is a common choice for the architectural backbone.

This article covers how to deploy Kafka specifically for defense use cases: partitioning strategy for multi-classification environments, full encryption configuration, air-gap deployment using KRaft mode, and the trade-off between self-hosted clusters and managed alternatives such as Azure Event Hubs for GovCloud workloads.

Why event streaming fits defense architectures

Defense workflows are inherently event-driven. Sensor telemetry does not arrive in neat batches – it is a continuous stream of readings that must be processed the moment it arrives to be operationally useful. C2 events – unit movement, tasking changes, status updates – are discrete messages that multiple consuming systems need simultaneously: the common operating picture, logistics, fires coordination, and after-action reporting all need the same underlying event without the producer knowing who is listening.

Kafka's publish-subscribe model maps cleanly onto this requirement. A producer writes a sensor reading or a C2 event to a topic. Any number of consumer groups – each representing a different downstream application – read the event independently, each at its own pace and tracking its own offset. This decoupling means that adding a new analytics workload does not require changes to the producing system, which is critical in defense environments where software change control is slow and approval cycles are long.

Beyond decoupling, Kafka's durable log provides an append-only audit trail that satisfies the forensic requirements most defense systems carry. Every message is retained on disk for a configurable period. If an incident occurs, operators can replay the exact sequence of events leading up to it without relying on application-level logging.

Core Kafka architecture for classified environments

Broker topology

A production-grade classified Kafka cluster requires a minimum of three broker nodes to support a replication factor of three and a min.insync.replicas setting of two. Combined with producers that set acks=all, this configuration tolerates the loss of a single broker without data loss or producer errors. For high-availability classified deployments, five brokers – spread across at least three physical racks or availability zones – provide stronger fault tolerance with headroom for rolling upgrades.

KRaft mode, production-ready since Kafka 3.3, replaces ZooKeeper for cluster metadata management, and Kafka 4.0 removes ZooKeeper support entirely. For air-gapped defense deployments KRaft is not optional – it is the correct default. A separate ZooKeeper ensemble typically adds three more nodes, a separate failure domain, and an additional attack surface. KRaft consolidates metadata management into Kafka itself using a Raft-based quorum of controller nodes, typically co-hosted with brokers in small clusters or separated in large ones.

Topic partitioning by classification level

The most important architectural decision in a multi-classification Kafka deployment is how to enforce isolation between data at different sensitivity levels. There are two approaches.

The first approach uses a single cluster with topic-level ACL isolation. Topics are namespaced by classification: ts.sensor.uav-telemetry for top-secret telemetry, s.c2.position-reports for secret-level C2 data, c.logistics.supply-events for confidential logistics. Each service account is granted produce and consume rights only to topics matching its clearance level. This approach reduces operational complexity but requires high confidence in Kafka's ACL enforcement and careful network segmentation to ensure that broker processes themselves are not a lateral movement path.

The second approach – preferred when handling data above SECRET on the same physical infrastructure – uses separate broker clusters per classification domain, connected via a cross-domain solution (CDS) for the rare cases where downgraded data needs to flow across a boundary. This eliminates shared-broker risk entirely at the cost of increased operational overhead. For a deeper treatment of cross-domain architectures, see the article on cross-domain solutions for defense.
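For the single-cluster approach, the per-classification grants can be generated rather than typed by hand. The following is a minimal sketch, assuming prefixed ACLs over the ts./s./c. naming scheme above; the principal name, group prefix, bootstrap address, and admin.properties path are hypothetical, and the sketch grants access to exactly one prefix rather than modelling "read down" rules.

```python
import shlex

# Assumed mapping from classification label to the topic prefixes used above.
PREFIXES = {"TOP_SECRET": "ts.", "SECRET": "s.", "CONFIDENTIAL": "c."}


def acl_commands(principal, level, group_prefix,
                 bootstrap="broker-1.internal:9093",
                 admin_config="/etc/kafka/admin.properties"):
    """Build kafka-acls.sh invocations that grant access to one prefix only."""
    base = [
        "kafka-acls.sh", "--bootstrap-server", bootstrap,
        "--command-config", admin_config, "--add",
        "--allow-principal", f"User:{principal}",
        "--resource-pattern-type", "prefixed",
    ]
    # Read, write, and describe on every topic under the classification prefix.
    topic = base + ["--operation", "Read", "--operation", "Write",
                    "--operation", "Describe", "--topic", PREFIXES[level]]
    # Consumers also need Read on their consumer group.
    group = base + ["--operation", "Read", "--group", group_prefix]
    return [shlex.join(topic), shlex.join(group)]


if __name__ == "__main__":
    for command in acl_commands("svc-cop", "SECRET", "cop-"):
        print(command)
```

Generating the commands from one table keeps the grants reviewable in change control; the commands still have to be run by an administrator whose own credentials are covered by the cluster's super-user or ACL configuration.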

Retention and partition count

Set partition counts based on expected throughput, not on convenience. A topic handling 10,000 messages per second from a sensor array should have enough partitions so that each consumer in a group can process its assigned partitions without lag. A rule of thumb: partition count should be at least the number of consumers in the consuming group, and ideally two to three times that, so the group can scale out later and partitions still spread evenly across consumers after a rebalance.
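The rule of thumb above can be written down as a small calculation. This is only a sketch: the per-consumer throughput figure is an assumption you would have to measure for your own workload, and the function name is illustrative.

```python
import math


def partition_count(msgs_per_sec, per_consumer_msgs_per_sec, consumers, headroom=2):
    """Return the larger of the throughput-driven and group-driven partition counts."""
    # Partitions needed so that no single consumer is asked to exceed its capacity.
    for_throughput = math.ceil(msgs_per_sec / per_consumer_msgs_per_sec)
    # Two to three times the consumer count leaves room to scale the group out.
    for_group = consumers * headroom
    return max(for_throughput, for_group)


if __name__ == "__main__":
    # 10,000 msg/s, an assumed 1,500 msg/s per consumer, 4 consumers: prints 8
    print(partition_count(10_000, 1_500, consumers=4))
```

Partition counts can be increased later but not decreased, and increasing them changes key-to-partition mapping, so it is usually cheaper to size with headroom up front.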

Retention policy decisions must be documented and defensible. Sensor telemetry that is analyzed in near-real-time typically needs only 24–72 hours of retention before it can be offloaded to cold storage or discarded. C2 event logs required for after-action review should be retained for 30–90 days in the hot tier, after which they should be exported to an encrypted, immutable archive. Do not rely on Kafka alone as a long-term audit store – it is an event bus, not an archival database.
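One way to keep retention decisions documented is to hold them in a single reviewed table and derive the topic-level settings from it. The sketch below assumes the upper bounds from the paragraph above and reuses the example topic names from earlier; retention.ms and cleanup.policy are standard Kafka topic configuration keys, everything else is illustrative.

```python
from datetime import timedelta

# Assumed retention windows: 72 hours for telemetry, 90 days for C2 events.
RETENTION = {
    "ts.sensor.uav-telemetry": timedelta(hours=72),
    "s.c2.position-reports": timedelta(days=90),
}


def topic_config(topic):
    """Translate a documented retention window into Kafka topic settings."""
    retention_ms = int(RETENTION[topic].total_seconds() * 1000)
    return {"cleanup.policy": "delete", "retention.ms": str(retention_ms)}


if __name__ == "__main__":
    for name in RETENTION:
        print(name, topic_config(name))
```

The resulting key-value pairs can be applied with kafka-configs.sh or passed at topic creation; the export to an immutable archive has to happen before the window expires, since Kafka deletes expired segments without notification.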

Encryption in transit: TLS 1.3 and SASL SCRAM

Classified environments mandate encryption on every data path. For Kafka, this means two distinct controls: transport encryption and client authentication.

TLS 1.3 configuration

Configure every Kafka listener – including inter-broker communication – with TLS 1.3. In server.properties:

listeners=SASL_SSL://0.0.0.0:9093
advertised.listeners=SASL_SSL://broker-1.internal:9093
ssl.protocol=TLSv1.3
ssl.enabled.protocols=TLSv1.3
ssl.keystore.location=/etc/kafka/ssl/broker.keystore.jks
ssl.keystore.password=${KEYSTORE_PASSWORD}
ssl.truststore.location=/etc/kafka/ssl/ca.truststore.jks
ssl.truststore.password=${TRUSTSTORE_PASSWORD}
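listener.name.sasl_ssl.ssl.client.auth=required

The listener.name.sasl_ssl.ssl.client.auth=required setting enforces mutual TLS (mTLS): every connecting client must present a certificate signed by your internal certificate authority. The listener prefix matters: on a SASL_SSL listener (Kafka 2.8 and later), a broker-wide ssl.client.auth value is not applied, so an unprefixed setting would leave client certificates unchecked. mTLS ensures that only known, provisioned systems can connect to the cluster – a requirement in any classified enclave. Do not use requested or none in classified environments.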

Certificates must come from your internal PKI. Do not use certificates signed by public CAs in an air-gapped environment – and do not allow public CA roots in the broker truststore, as this could allow a compromised external certificate to masquerade as a legitimate client.

SASL SCRAM-SHA-512

On top of mTLS, use SASL SCRAM-SHA-512 for user-level authentication. This binds a named identity – such as a service account for a specific application – to the TLS connection, enabling fine-grained ACL enforcement based on principal name rather than certificate subject alone.

sasl.enabled.mechanisms=SCRAM-SHA-512
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512
security.inter.broker.protocol=SASL_SSL

Provision credentials with kafka-configs.sh and store them in your secrets management system – HashiCorp Vault, or an equivalent air-gapped secret store – rather than in configuration files. Rotate credentials on a schedule that aligns with your accreditation's key management policy, typically every 90 days or upon personnel changes.
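On the client side, the same two controls appear as a keystore (for mTLS) plus a SCRAM login module. The following sketch renders a Java-client properties file; the configuration keys are standard Kafka client settings, while the file paths, the service account name, and the idea of passing secrets in as arguments are assumptions for illustration. It does not escape quotes inside passwords.

```python
def client_properties(username, password, keystore_password, truststore_password,
                      bootstrap="broker-1.internal:9093",
                      keystore="/etc/app/ssl/client.keystore.jks",
                      truststore="/etc/app/ssl/ca.truststore.jks"):
    """Build Kafka client settings for SASL_SSL with SCRAM-SHA-512 over mTLS."""
    jaas = (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "SCRAM-SHA-512",
        "sasl.jaas.config": jaas,
        "ssl.enabled.protocols": "TLSv1.3",
        # Client certificate presented to the broker for mTLS.
        "ssl.keystore.location": keystore,
        "ssl.keystore.password": keystore_password,
        # Trust only the internal CA.
        "ssl.truststore.location": truststore,
        "ssl.truststore.password": truststore_password,
    }


def render(props):
    """Serialize settings in key=value form, one per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(render(client_properties("svc-cop", "example-only", "changeit", "changeit")))
```

In practice the secret values would be fetched from the secrets store at process start and the rendered file written to a tmpfs or passed in memory, rather than committed alongside the application.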

Encryption at rest: AES-256 and storage-layer controls

Kafka does not natively encrypt data written to its log segments. Encryption at rest is the responsibility of the storage layer. For bare-metal or virtual machine deployments, use LUKS (Linux Unified Key Setup) with AES-256 in XTS mode on the block devices hosting Kafka's log.dirs. For Kubernetes-based deployments, provision StorageClass resources backed by encrypted volumes – on Azure Government, use server-side encryption with customer-managed keys (SSE-CMK) on Azure Disk. On-premises equivalents include NetApp with NSE drives or pure software LUKS on standard NVMe.

For workloads where even the storage operator must not be able to read message content – particularly relevant for special access programs – implement application-layer encryption: the producer encrypts the message payload before writing, and only authorized consumers hold the decryption key. This approach is independent of Kafka's configuration and provides end-to-end confidentiality that persists even if broker storage is compromised. The trade-off is that nothing between producer and consumer can inspect the payload: intermediate filtering or transformation stages no longer work, ciphertext compresses poorly, and log compaction remains usable only if the message key is left unencrypted or is encrypted deterministically, because compaction operates on keys.

Air-gapped Kafka deployment with KRaft mode

An air-gapped Kafka deployment has no internet connectivity, no external DNS resolution, and no access to public container registries or package mirrors. Every component must be available locally before the cluster can start. This section covers the specific gotchas that catch engineers when deploying in this environment.

KRaft mode and no-ZooKeeper operation

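Use Kafka 3.6 or later with KRaft mode enabled. The cluster requires a controller quorum – typically three controller nodes, which may be co-located with brokers in deployments of three to five nodes. Each node is assigned a unique node.id and a process.roles value of controller, broker, or broker,controller for a combined node. Combined nodes keep small enclave clusters compact, but the Kafka documentation advises against combined mode for critical deployments, so prefer dedicated controllers when hardware allows.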

Bootstrap the cluster with kafka-storage.sh: the random-uuid subcommand generates a cluster ID, and the format subcommand writes the initial metadata log to a node's storage directories. Format must be run on every node with the same cluster ID before starting any Kafka process. In an air-gapped environment, generate the ID on one node, copy it to the others, then run format on each.

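# Run once, on a single node, to generate the cluster ID
CLUSTER_ID=$(kafka-storage.sh random-uuid)
# Run on every node, with the same cluster ID
kafka-storage.sh format -t "$CLUSTER_ID" -c /etc/kafka/server.properties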
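The KRaft-specific part of each node's server.properties differs only in node.id and the advertised hostname, so it can be generated from one host list. The sketch below assumes three combined broker,controller nodes, a controller listener named CONTROLLER on port 9094, and TLS on that listener; those choices and the hostnames are illustrative assumptions, and the security settings for the controller listener would still need to be hardened to match the broker listener.

```python
def kraft_properties(node_id, hosts, broker_port=9093, controller_port=9094):
    """Build the KRaft-related settings for one node of a combined-mode cluster."""
    # Every node must list the same static voter set: id@host:port.
    voters = ",".join(
        f"{index}@{host}:{controller_port}"
        for index, host in enumerate(hosts, start=1)
    )
    host = hosts[node_id - 1]
    return {
        "process.roles": "broker,controller",
        "node.id": str(node_id),
        "controller.quorum.voters": voters,
        "controller.listener.names": "CONTROLLER",
        "listeners": (
            f"SASL_SSL://0.0.0.0:{broker_port},"
            f"CONTROLLER://0.0.0.0:{controller_port}"
        ),
        # Only the broker listener is advertised to clients.
        "advertised.listeners": f"SASL_SSL://{host}:{broker_port}",
        # Assumption: the controller listener uses TLS with client certificates.
        "listener.security.protocol.map": "SASL_SSL:SASL_SSL,CONTROLLER:SSL",
    }


if __name__ == "__main__":
    cluster = ["broker-1.internal", "broker-2.internal", "broker-3.internal"]
    for key, value in kraft_properties(1, cluster).items():
        print(f"{key}={value}")
```

These keys are merged with the TLS and SASL settings shown earlier before running kafka-storage.sh format on each node.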

Internal DNS and certificate management

Configure advertised.listeners to use fully qualified hostnames resolvable within the enclave – either via an internal DNS server or via /etc/hosts on every host that will connect to the cluster. Using IP addresses directly in advertised.listeners works but complicates certificate management, since certificate SANs must list every IP.

Run an offline root CA using step-ca or CFSSL, both of which have no external dependencies at runtime. Generate broker certificates with SANs covering the broker's hostname. Distribute the CA root certificate to every client's truststore. Set certificate validity periods aligned with your re-accreditation schedule, and maintain a certificate inventory so renewals do not cause unexpected outages.
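A certificate inventory can be as simple as a table of names and expiry dates that is checked on a schedule. The following is a minimal sketch under that assumption; the entry names and the 60-day warning window are hypothetical, and in practice the expiry dates would be read from the certificates themselves.

```python
from datetime import datetime, timedelta, timezone


def expiring(inventory, now, within_days=60):
    """Return certificate names that expire within the window, soonest first."""
    cutoff = now + timedelta(days=within_days)
    due = [(not_after, name) for name, not_after in inventory.items()
           if not_after <= cutoff]
    return [name for not_after, name in sorted(due)]


if __name__ == "__main__":
    utc = timezone.utc
    inventory = {
        "broker-1.internal": datetime(2025, 2, 1, tzinfo=utc),
        "broker-2.internal": datetime(2026, 1, 1, tzinfo=utc),
        "svc-cop-client": datetime(2025, 1, 15, tzinfo=utc),
    }
    print(expiring(inventory, now=datetime(2025, 1, 1, tzinfo=utc)))
```

Running a check like this from a scheduled job inside the enclave gives operators lead time to reissue certificates through the offline CA before a broker or client is locked out.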

Container image and artifact management

Pull all required images – Kafka, your monitoring stack, and any Kafka Connect plugins – on an internet-connected machine, export them with docker save, and transfer them to the air-gapped environment using an approved data diode or portable media process. Inside the enclave, restore them with docker load, then retag and push them to a local registry with docker tag and docker push; docker load alone only populates the local Docker image store, not a registry. Reference images by digest rather than by mutable tag in your deployment manifests to prevent unexpected changes if the local registry is updated. For more detail on air-gapped Kubernetes deployments in defense contexts, see the article on air-gapped deployment patterns for defense.
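The image transfer can be scripted as two command lists, one for each side of the air gap. This is a sketch only: the local registry address and archive name are hypothetical, and the Docker subcommands used (pull, save, load, tag, push) are the standard ones.

```python
def transfer_commands(image, local_registry="registry.enclave.internal:5000",
                      archive="images.tar"):
    """Return (connected_side, enclave_side) Docker command lists for one image."""
    # Keep only the final "name:tag" component when retagging for the enclave.
    local_ref = f"{local_registry}/{image.split('/')[-1]}"
    connected_side = [
        f"docker pull {image}",
        f"docker save -o {archive} {image}",
    ]
    enclave_side = [
        f"docker load -i {archive}",
        f"docker tag {image} {local_ref}",
        f"docker push {local_ref}",
    ]
    return connected_side, enclave_side


if __name__ == "__main__":
    outside, inside = transfer_commands("apache/kafka:3.7.0")
    print("\n".join(outside + inside))
```

After the push, the digest reported by the local registry is the value to pin in deployment manifests.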

Azure Event Hubs as a Kafka-compatible alternative

Not every defense workload requires a fully disconnected, self-managed cluster. For programs operating within GovCloud boundaries – Azure Government, IL4, or IL5 – Azure Event Hubs Premium and Dedicated tiers provide a Kafka-compatible endpoint that accepts standard Kafka producers and consumers with configuration changes only (bootstrap server and SASL settings), not code changes. The protocol surface is compatible with Kafka 1.0 and later client libraries.

Event Hubs in Azure Government satisfies FedRAMP High authorization and, for Dedicated tier, supports customer-managed keys via Azure Key Vault Managed HSM, providing the AES-256 at-rest encryption control that classified workloads require. The operational benefit is significant: no broker provisioning, no certificate rotation for the cluster itself, built-in geo-redundancy, and SLA-backed availability.

The trade-off is clear: Event Hubs does not support the full Kafka API surface. Transactions and exactly-once semantics have historically been unavailable or limited by tier, and access control relies on Azure RBAC and shared access signatures rather than Kafka's ACL model, so check the current Microsoft feature documentation against your requirements. And for workloads that must be completely air-gapped – with no connectivity to any external network – Event Hubs is not an option. For those programs, self-hosted KRaft clusters remain the only viable path.
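Pointing an existing Kafka client at Event Hubs is a matter of swapping connection settings. The sketch below follows the connection-string pattern Microsoft documents for the Kafka endpoint (SASL PLAIN with the literal username $ConnectionString on port 9093); the namespace name is hypothetical, and the Azure Government domain suffix is an assumption that should be verified for your environment.

```python
def event_hubs_properties(namespace, connection_string,
                          domain="servicebus.usgovcloudapi.net"):
    """Build Kafka client settings for an Event Hubs Kafka endpoint."""
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        # The Kafka endpoint listens on port 9093 of the namespace host.
        "bootstrap.servers": f"{namespace}.{domain}:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }


if __name__ == "__main__":
    example = event_hubs_properties("example-namespace", "<connection string>")
    for key, value in example.items():
        print(f"{key}={value}")
```

Because only these properties change, the same producer and consumer binaries can be accredited once and deployed against either a self-hosted cluster or Event Hubs.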

Zero-trust integration for Kafka consumers

Kafka's ACL model is a necessary but not sufficient security control in a zero-trust environment. Each consumer service should authenticate using a short-lived credential issued by your identity provider at pod or process start time, rather than a long-lived static password. Vault can issue short-lived SCRAM credentials dynamically, with automatic revocation when the lease expires, although this generally relies on a custom or third-party secrets plugin rather than a built-in Kafka engine. Combined with mTLS client certificates rotated on a schedule, this ensures that a compromised service account credential gives an attacker only a limited operational window.

Apply network policies at the Kubernetes or firewall layer to ensure that only pods with the correct labels can reach Kafka broker ports. Kafka's native ACLs should be the second line of defense, not the first. For a full treatment of zero-trust architecture applied to defense networks, see the article on zero-trust architecture for military networks.
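For Kubernetes deployments, the label-based restriction described above can be expressed as a NetworkPolicy. The sketch below builds the manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML; the namespace, label keys, and label values are hypothetical and must match your own pod labels.

```python
import json


def kafka_network_policy(namespace="kafka", broker_app="kafka-broker",
                         client_role="kafka-client", port=9093):
    """Allow ingress to broker pods only from labelled clients and other brokers."""
    allowed_peers = [
        # Client pods in the same namespace carrying the agreed label.
        {"podSelector": {"matchLabels": {"role": client_role}}},
        # Brokers must still reach each other for replication.
        {"podSelector": {"matchLabels": {"app": broker_app}}},
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "kafka-broker-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": broker_app}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": allowed_peers,
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(kafka_network_policy(), indent=2))
```

A podSelector without a namespaceSelector only matches pods in the policy's own namespace, so clients running elsewhere would need an additional peer entry. The policy is enforced only if the cluster's network plugin supports NetworkPolicy.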

Corvus.Quantum: Kafka-based streaming with post-quantum encryption

Corvus.Quantum is a battle-tested event streaming platform built on Kafka, deployed operationally in Ukraine for real-time defense data processing. It extends standard Kafka with post-quantum encryption at the application layer – using CRYSTALS-Kyber (standardized by NIST as ML-KEM in FIPS 203) for key encapsulation and AES-256-GCM for payload encryption – so that messages are protected against both current adversary interception and future quantum-capable decryption attacks. This addresses the "harvest now, decrypt later" threat that is particularly relevant for signals and ISR data with a long sensitivity lifetime.

Beyond encryption, Corvus.Quantum provides a pre-hardened broker configuration for classified deployments: KRaft-mode cluster templates, TLS 1.3 certificate automation using an embedded step-ca instance, SCRAM credential rotation integrated with HashiCorp Vault, and classification-aware topic ACL templates. The platform has been validated in environments with no internet connectivity, handling thousands of sensor events per second with sub-100ms end-to-end latency.

For procurement teams evaluating Kafka for defense programs, Corvus.Quantum reduces the engineering effort of securing a Kafka cluster from months to days, while providing an auditable configuration baseline that aligns with CNSA 2.0 requirements and supports integration with existing cross-domain solutions.

Corvus.Quantum delivers a production-ready, post-quantum secured Kafka streaming platform – pre-hardened for classified environments, validated in active operational deployments, and ready for GovCloud or air-gapped integration. If your program requires high-throughput defense streaming without the months of security engineering, talk to our team.

