Cloud-native software is built on an assumption that almost never holds at the tactical edge: that the network is always there. Service discovery, distributed databases, container orchestrators, and stateless API tiers all presume that any node can reach any other node within a few milliseconds, all the time. Move that same software into a forward operating base, a vehicle on the move, or a dismounted team behind a hill, and the assumption collapses. The link drops for hours, comes back for ninety seconds, then offers two kilobits per second through a satellite terminal. This is the DIL environment – disconnected, intermittent, and limited – and running services in it requires inverting many of the defaults that make cloud software convenient in a data center.
What DIL actually means for software
The three letters describe three distinct failure modes, and a tactical edge system has to handle all of them simultaneously. Disconnected means the link is gone entirely, sometimes for the duration of a mission. A patrol may operate for two days with no reachback at all. Intermittent means connectivity flickers unpredictably – a vehicle passes behind terrain, a directional antenna loses lock, an adversary's jamming sweeps across the band. The link is up for seconds, down for minutes, and the software cannot predict the pattern. Limited means that even when a link exists, it is narrow and slow: a tactical SATCOM channel might offer a few kilobits per second shared across a whole element, with latency measured in hundreds of milliseconds or worse.
A system designed for the data center treats any of these as an error to be retried. A system designed for DIL treats all of them as the normal operating state. The practical consequence is that no operator action can ever block waiting for a remote service, no critical data can live only on a node the operator cannot reach, and every byte sent over the link has to earn its place. These constraints are not edge cases to be bolted on after the fact; they shape the architecture from the first design decision. The same discipline underpins our broader work on resilient defense cloud strategy, where availability across heterogeneous environments is the governing requirement.
Local-first: the foundational inversion
The single most important architectural decision for a tactical edge cloud is to make every node local-first. In a local-first design, each edge node holds a complete, authoritative copy of the working data it needs and serves every operator request from that local copy. Reads and writes complete against an embedded store at local latency. Synchronization with peers and the enterprise is a background process that runs opportunistically whenever connectivity permits – it is never on the critical path of a user action.
This inverts the usual cloud pattern, where the client is thin and the server holds the truth. At the edge, the node holds the truth for as long as it is disconnected, and the enterprise becomes one more peer to reconcile with rather than a dependency that must be reachable. The operator experience is identical whether the node has a fat pipe back to headquarters or no connectivity at all – and that invariance is the whole point. An operator who has to think about whether the network is up before deciding whether an action will work has already been failed by the system.
Where the state lives
Local-first means provisioning each node with a real database, not a cache. An embedded relational engine such as SQLite, holding the node's full working set, is a common choice; for collaborative state, a document store with built-in replication semantics works well. The store must be durable across power loss – edge hardware is rebooted, dropped, and run off depleting batteries – so write-ahead logging and crash recovery are not optional. Crucially, the local store is the source of truth while disconnected. The temptation to treat the local copy as a disposable cache that can be invalidated by a server is exactly the data-center reflex that DIL design has to suppress.
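As a minimal sketch of this idea, assuming SQLite as the embedded engine, the snippet below opens a store with write-ahead logging and records each write together with an outbox entry for the background sync process. The table names, the `open_local_store` and `save_report` helpers, and the schema are hypothetical and exist only for illustration.

```python
import sqlite3
import uuid


def open_local_store(path: str) -> sqlite3.Connection:
    """Open the node's local store (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(path)
    # Write-ahead logging with full fsync favours durability across power loss.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=FULL")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS reports (
            id   TEXT PRIMARY KEY,
            body TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS outbox (
            seq       INTEGER PRIMARY KEY AUTOINCREMENT,
            record_id TEXT NOT NULL
        );
        """
    )
    return conn


def save_report(conn: sqlite3.Connection, body: str) -> str:
    """Commit a report locally and queue it for later sync in one transaction."""
    record_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO reports (id, body) VALUES (?, ?)", (record_id, body)
        )
        conn.execute("INSERT INTO outbox (record_id) VALUES (?)", (record_id,))
    return record_id
```

The operator's write returns as soon as the local transaction commits; nothing in `save_report` touches the network. Writing the record and its outbox entry in the same transaction means a crash cannot leave a stored record that the sync process never learns about.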
Synchronization: moving only what matters
If every node holds its own authoritative copy, the hard problem becomes keeping those copies usefully aligned over links that are mostly absent and always narrow. Naive replication – streaming a full state snapshot, or pushing every write the instant it happens – is hopeless over a few kilobits per second. The sync protocol has to be delta-based, prioritized, resumable, and idempotent.
Delta-based means each sync exchange carries only the records that changed since the last successful exchange with that peer, identified by a per-peer high-water mark or vector clock. Prioritized means the outbound queue is ordered by operational value: friendly and hostile positions, orders, and alerts go first; routine status updates next; bulk media such as imagery and full-motion video last, in a separate low-priority lane that uses only spare capacity. Resumable means an interrupted transfer – the normal case when links are intermittent – restarts from the last acknowledged record rather than from the beginning, so a sync that gets ninety seconds of connectivity makes ninety seconds of real progress. Idempotent means replaying a batch that was partially delivered before the link dropped produces no duplicates, because the receiver keys on stable record identifiers rather than arrival order.
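The four properties can be sketched together in a few lines. This is an illustrative in-memory model, not a wire protocol: the `Record`, `Sender`, and `Receiver` names are hypothetical, the `budget` argument stands in for however many records a link window allows, and the sketch assumes one high-water mark per peer and per priority lane so that urgent records can jump ahead without confusing the marks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    seq: int         # local sequence number assigned at write time, starting at 1
    record_id: str   # stable identifier used for idempotent apply
    priority: int    # 0 = positions/orders/alerts, 1 = routine status, 2 = bulk media
    payload: bytes


class Sender:
    def __init__(self):
        self.log = []     # every local record, in write order
        self.marks = {}   # (peer, priority) -> highest acknowledged seq

    def pending(self, peer):
        """Delta for this peer: unacknowledged records, most urgent lane first."""
        delta = [r for r in self.log
                 if r.seq > self.marks.get((peer, r.priority), 0)]
        return sorted(delta, key=lambda r: (r.priority, r.seq))

    def ack(self, peer, record):
        key = (peer, record.priority)
        self.marks[key] = max(self.marks.get(key, 0), record.seq)


class Receiver:
    def __init__(self):
        self.store = {}

    def apply(self, record):
        # Keyed on record_id, so a replayed record changes nothing.
        self.store.setdefault(record.record_id, record.payload)


def sync(sender, receiver, peer, budget):
    """Send at most `budget` records; return how many were sent."""
    sent = 0
    for record in sender.pending(peer):
        if sent >= budget:
            break                      # link window closed; resume here next time
        receiver.apply(record)
        sender.ack(peer, record)
        sent += 1
    return sent
```

If a record is delivered but its acknowledgement is lost, the next `sync` call sends it again and `Receiver.apply` ignores the duplicate, which is the idempotence property in practice.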
Compression matters more here than almost anywhere else in software engineering, because the link is the binding constraint. Structured operational data compresses extremely well, and a dictionary tuned to the message schema can shrink a position report or order to a fraction of its uncompressed size. The engineering goal is a useful, current operational picture synchronizing over a channel that a data-center engineer would consider unusable.
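One way to illustrate schema-tuned compression is the preset-dictionary support in Python's standard `zlib` module. The sketch below assumes JSON-encoded position reports; the field names and the contents of `SCHEMA_DICT` are hypothetical, and a fielded system would more likely use a compact binary encoding with a dictionary trained on real traffic.

```python
import json
import zlib

# Hypothetical preset dictionary: byte sequences that recur in every position
# report. Sender and receiver must hold exactly the same dictionary.
SCHEMA_DICT = b'{"type":"position","unit":"","lat":,"lon":,"alt_m":,"time":""}'


def pack(report: dict) -> bytes:
    """Serialize compactly, then compress against the shared dictionary."""
    raw = json.dumps(report, separators=(",", ":")).encode()
    compressor = zlib.compressobj(level=9, zdict=SCHEMA_DICT)
    return compressor.compress(raw) + compressor.flush()


def unpack(blob: bytes) -> dict:
    """Reverse of pack(); fails if the dictionaries do not match."""
    decompressor = zlib.decompressobj(zdict=SCHEMA_DICT)
    return json.loads(decompressor.decompress(blob) + decompressor.flush())
```

The dictionary pays off most on short messages, where ordinary compression has no earlier data to refer back to. Because both ends must agree on the dictionary, it has to be versioned and distributed before deployment like any other configuration.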
Opportunistic and store-and-forward transport
Because connectivity is unpredictable, the transport layer has to be opportunistic: the moment any link appears – primary radio, line-of-sight mesh to a neighboring vehicle, a brief SATCOM window, even a courier carrying a physical drive between nodes – the sync engine drains as much of its priority queue as the window allows. Store-and-forward routing lets one node relay another node's pending updates when it has better connectivity, so a vehicle that surfaces from behind terrain can carry the dismounted team's reports forward. This is closer in spirit to delay-tolerant networking than to a request-response API, and designing the sync engine around that model rather than around HTTP semantics is what makes it survive the intermittent case.
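A rough sketch of the drain step, assuming the sync engine can estimate how many bytes the current window will carry. The `ForwardQueue` class and its method names are hypothetical; the `origin` field is what lets a node carry bundles on behalf of a neighbor with worse connectivity.

```python
import heapq
import itertools


class ForwardQueue:
    """Pending bundles, including ones relayed on behalf of other nodes."""

    def __init__(self):
        self._heap = []
        # Tie-breaker that keeps first-in, first-out order within a priority.
        self._order = itertools.count()

    def enqueue(self, priority: int, origin: str, payload: bytes) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), origin, payload))

    def drain(self, window_bytes: int):
        """Pop bundles, most urgent first, until the next one no longer fits."""
        sent = []
        while self._heap and len(self._heap[0][3]) <= window_bytes:
            _, _, origin, payload = heapq.heappop(self._heap)
            window_bytes -= len(payload)
            sent.append((origin, payload))
        return sent
```

This version is strictly ordered: a large urgent bundle that does not fit will hold back smaller, less urgent ones behind it. Whether to allow smaller bundles to skip ahead is a policy choice the sketch leaves open.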
Reconciliation: resolving concurrent edits
The price of letting every node write locally while disconnected is that two nodes will inevitably edit the same thing without seeing each other's change. When they reconnect, the system has to reconcile. There is no single correct strategy; the right one depends on the shape of the data.
Append-only event logs sidestep conflict entirely. If a node only ever appends records – sensor readings, reports, log entries – then merging two logs is just a union, ordered by a logical clock. Most telemetry and reporting data fits this model, and it should be the default whenever the data is naturally a stream of events rather than a mutable record.
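The union-and-order merge is short enough to show in full. This sketch assumes a Lamport-style counter as the logical clock and assumes each node never reuses a counter value, so the pair of clock value and node name identifies an event; the `Event` type and `merge_logs` function are illustrative names.

```python
from typing import NamedTuple


class Event(NamedTuple):
    lamport: int   # logical clock value at the time of the append
    node: str      # originating node; breaks ties between equal clock values
    body: str


def merge_logs(*logs):
    """Union of append-only logs, de-duplicated and ordered by (lamport, node)."""
    return sorted(set().union(*logs))
```

Because the result is a sorted set, merging is commutative and repeatable: `merge_logs(a, b)` and `merge_logs(b, a)` give the same list, and merging a log that has already been merged changes nothing.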
Conflict-free replicated data types (CRDTs) handle shared mutable state that several nodes edit collaboratively – a shared map of graphics, a running roster, a set of waypoints. A CRDT carries enough metadata that any two replicas merge deterministically to the same result regardless of the order in which updates arrive, which is exactly the guarantee an intermittent network cannot otherwise provide. The cost is per-record metadata overhead, so CRDTs are best reserved for genuinely collaborative state rather than applied across the board.
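To make the metadata cost concrete, here is a minimal sketch of one well-known CRDT, the observed-remove set, applied to a set of waypoints. It is a simplified state-based version written for illustration; a production implementation would also need to compact removed tags, which this sketch never does.

```python
import uuid


class ORSet:
    """Observed-remove set: a minimal state-based CRDT sketch."""

    def __init__(self):
        self.adds = {}        # element -> set of unique tags, one per add
        self.removed = set()  # tags whose add has been observed and removed

    def add(self, element):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Only adds this replica has already seen are cancelled; an add made
        # concurrently on another replica survives the merge.
        self.removed |= self.adds.get(element, set())

    def merge(self, other):
        """Fold another replica's state into this one (order does not matter)."""
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed

    def value(self):
        return {e for e, tags in self.adds.items() if tags - self.removed}
```

If one node removes a waypoint while another, still disconnected, adds it again, both replicas show the waypoint after they merge, whichever direction the merge runs first. The per-add tags are the metadata overhead the paragraph above refers to.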
Last-writer-wins with operator arbitration covers the remainder: mutable records where neither an event log nor a CRDT fits. A hybrid logical clock decides a deterministic winner so the system never deadlocks, but the loser is preserved and the record is flagged for human review. The reasoning is that a genuine semantic conflict – two operators independently changing the same target's classification – is a judgment a human should make, not one an automatic rule should silently bury. This pattern shares a conceptual lineage with the offline-first design used in dismounted field applications, where the same disconnected-edit problem appears at the device level.
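The pattern can be sketched as follows. The `HLC` class follows the usual hybrid logical clock update rules; `Versioned` and `reconcile` are hypothetical names, and the sketch assumes the caller has already established that both versions were edited since the last successful sync, so every call represents a real conflict.

```python
import time
from dataclasses import dataclass, field


class HLC:
    """Hybrid logical clock; stamps are (physical_ms, counter, node) tuples."""

    def __init__(self, node, wall=lambda: int(time.time() * 1000)):
        self.node, self.wall = node, wall
        self.l, self.c = 0, 0

    def now(self):
        """Stamp a local write."""
        pt = self.wall()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c, self.node)

    def observe(self, stamp):
        """Advance past a stamp received from a peer."""
        previous = self.l
        self.l = max(previous, stamp[0], self.wall())
        if self.l == previous == stamp[0]:
            self.c = max(self.c, stamp[1]) + 1
        elif self.l == previous:
            self.c += 1
        elif self.l == stamp[0]:
            self.c = stamp[1] + 1
        else:
            self.c = 0


@dataclass
class Versioned:
    value: str
    stamp: tuple
    needs_review: bool = False
    superseded: list = field(default_factory=list)  # losing edits, kept for review


def reconcile(local, remote):
    """Pick a deterministic winner, keep the loser, flag for an operator."""
    if local.stamp == remote.stamp:
        return local  # the same write seen twice, not a conflict
    winner, loser = (local, remote) if local.stamp > remote.stamp else (remote, local)
    kept = winner.superseded + loser.superseded + [(loser.value, loser.stamp)]
    return Versioned(winner.value, winner.stamp, needs_review=True, superseded=kept)
```

Including the node name as the last element of the stamp guarantees a total order, so two nodes reconciling the same pair of edits always pick the same winner. The losing value stays attached to the record until an operator clears the flag.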
Key insight: The hardest part of a DIL system is not surviving the disconnection – it is reconverging cleanly afterward. Any design can buffer writes while the link is down. The systems that fail in the field are the ones that produce duplicated, contradictory, or silently lost data when three nodes that each edited offline finally reconnect at once. Spend the design effort on the reconciliation path, test it under simultaneous multi-node reconnection, and treat clean convergence as the primary acceptance criterion.
Running cloud-native services on edge hardware
Tactical edge nodes are not hyperscale racks. They are ruggedized small-form-factor compute – a mounted server in a vehicle, a transit-case cluster at a command post, sometimes a single board computer in a backpack – running on constrained power and cooling. Yet the goal is still to run cloud-native services, because the same containerized services should run identically in the enterprise data center, in a regional node, and at the forward edge. That portability is what lets a capability be developed once and fielded everywhere.
The practical approach is a lightweight container orchestrator sized for the edge rather than the data center. A single-binary Kubernetes distribution such as K3s, or a small managed cluster, gives the same deployment model and the same manifests as the enterprise without the control-plane weight that edge hardware cannot spare. The same hardening discipline still applies – the threat model does not soften because the cluster is small, and the practices in our guide to hardening Kubernetes for defense carry directly to edge clusters. What changes is sizing and failure assumptions: the orchestrator must keep workloads running with no reachback to a central control plane, images must be pulled from a local registry seeded before deployment rather than from the internet, and the cluster has to tolerate a node simply vanishing when a vehicle drives out of range.
Identity and security without reachback
A disconnected node still has to authenticate operators and authorize actions, and it cannot phone home to do it. Credentials and authorization policy have to be cached locally with sensible offline lifetimes – long enough to outlast a realistic disconnection window, short enough that a captured node does not stay trusted indefinitely. Certificate revocation is the canonical hard case: a node that cannot reach a revocation list has to fall back to short-lived certificates whose natural expiry bounds the exposure. Encrypting the local store and providing a fast, irreversible zeroize for hardware at risk of capture are baseline requirements, not enhancements, given that edge nodes are the part of the architecture most likely to fall into hostile hands.
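The lifetime trade-off can be expressed as a simple local check. This sketch assumes the credential's signature was verified when it was cached and that the node's clock is trustworthy enough to enforce expiry; the `CachedCredential` type, its fields, and the `authorize` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CachedCredential:
    subject: str
    roles: frozenset
    issued_at: datetime
    offline_lifetime: timedelta  # hypothetical policy value, set per mission

    def is_valid(self, now: datetime) -> bool:
        # Expiry bounds the exposure when no revocation list is reachable.
        return self.issued_at <= now < self.issued_at + self.offline_lifetime


def authorize(cred: CachedCredential, required_role: str, now=None) -> bool:
    """Decide locally, with no reachback, whether an action is allowed."""
    now = now or datetime.now(timezone.utc)
    return cred.is_valid(now) and required_role in cred.roles
```

Choosing `offline_lifetime` is the policy decision described above: it should exceed the longest disconnection the mission plans for, and no more, since a captured node stays trusted until the value runs out.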
Validating a DIL design before it is fielded
The failure mode that ends programs is discovering in the field that a system tested only on a clean LAN does not actually work over a tactical radio. A LAN has none of the properties that define DIL, so a green test suite on a LAN says nothing about DIL behavior. Validation requires a network emulator placed between the nodes that injects the real conditions – link drops of varying duration, latency in the hundreds of milliseconds, packet loss, and hard bandwidth caps matched to the target radios. The acceptance test is twofold: operators must be able to complete every mission-critical task with the link held down for the full mission duration, and the nodes must reconverge to a single consistent picture once connectivity returns, including under the stress case of several nodes reconnecting simultaneously after each has edited offline.
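The convergence half of that acceptance test can be partly automated below the network layer. The sketch that follows does not emulate a link and is no substitute for the emulator; it only checks that merging offline replicas in every possible reconnection order yields one state. The `merge` function is a stand-in per-key last-writer-wins rule, and the replica format is an assumption made for the example.

```python
from itertools import permutations


def merge(a: dict, b: dict) -> dict:
    """Per-key last-writer-wins merge of two {key: (stamp, value)} maps."""
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry > out[key]:
            out[key] = entry
    return out


def converges(replicas) -> bool:
    """True if folding the replicas together in every order gives one state."""
    results = []
    for order in permutations(replicas):
        state = {}
        for replica in order:
            state = merge(state, replica)
        results.append(state)
    return all(result == results[0] for result in results)
```

A merge rule that let arrival order decide the winner would fail this check as soon as two replicas disagreed on a key, which is the class of bug that tends to surface only when several nodes reconnect at once. The number of orderings grows factorially, so the exhaustive form is only practical for a handful of replicas.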
A system that passes both tests has earned the right to be called tactical edge cloud. One that has only ever run on a LAN has been tested for convenience, not for the environment it will actually face.
Build for the disconnected edge
Corvus Quantum is engineered for DIL conditions from the ground up – local-first services, prioritized delta sync, and clean multi-node reconciliation that hold a consistent operational picture together whether a node has a fat pipe or no connectivity at all.
This analysis was prepared by Corvus Intelligence engineers who build mission-critical cloud and field systems for defense and government organizations.