Building a C2 system from scratch, part 2: the fusion engine

Part 1 chose a scope, fixed the four-layer architecture, and designed the canonical track schema. Part 2 builds the engine that turns sensor reports into trustworthy tracks: adapters that bring sources in, the correlation algorithms that decide which reports refer to the same physical object, lifecycle management that tells the COP when a track has gone stale, and the track store that anchors everything. By the end of Part 2 the platform produces operationally useful tracks; it just has nowhere to display them.

The conceptual reference for everything in Part 2 is The Complete Guide to Defense Data Fusion, which surveys the discipline. Here we make specific decisions for the running example platform.

Step 1: the adapter pattern, done strictly

Every sensor produces data in its own format. Radars speak ASTERIX; UAVs speak STANAG 4586; AIS receivers emit NMEA 0183; ATAK clients speak CoT; civilian ADS-B feeds emit a different binary protocol; manually reported sightings arrive through a web form. The adapter layer's job is to translate all of these into the canonical track schema defined in Part 1.

The rule is brutal and worth memorizing: no sensor-specific concept leaks past the adapter. If your fusion engine code references ASTERIX categories, you have a leaky architecture. If your track store has a column for AIS message types, you have a leaky architecture. Adapters are one-way data converters with strict isolation; they expose only canonical tracks downstream.

The pragmatic adapter structure for each source:

Transport – the connector to the source (UDP socket, MQTT subscription, HTTP webhook, file watcher). Resilient to source-side failure: reconnects, backoff, dropped-message accounting.
Parser – translates the wire format to a strongly-typed in-process structure. Validates against the format spec. Rejects malformed input loudly rather than silently.
Normalizer – maps source-specific fields to canonical fields. Coordinate-system conversion (typically to WGS84). Timestamp normalization (UTC, with the three-timestamp discipline from Part 1).
Emitter – publishes the canonical track update to the message bus. Tags with source identifier, source classification, releasability.

Each adapter is a separate service or process. They share a code-generated client library for the canonical schema but no other code paths. Adding a new sensor type means writing a new adapter, not touching any other component. The detailed integration patterns for the common sources are in Integrating AIS and ADS-B into a Military Picture; the CoT side is covered in Cursor on Target (CoT): The XML Standard Behind Tactical Awareness Apps.
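The four-part adapter structure can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the "lat,lon,epoch_seconds" wire format, the CanonicalTrackUpdate fields, and every function name are hypothetical stand-ins, and the transport is reduced to iterating over lines.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable


class MalformedInput(ValueError):
    """Raised by the parser when wire data violates the format spec."""


@dataclass(frozen=True)
class CanonicalTrackUpdate:
    # Hypothetical subset of the canonical schema from Part 1.
    source_id: str
    lat_deg: float          # WGS84
    lon_deg: float          # WGS84
    observed_at: datetime   # UTC


def parse_csv_report(line: str) -> dict:
    """Parser: made-up 'lat,lon,epoch_seconds' format, rejected loudly if bad."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        raise MalformedInput(f"expected 3 fields, got {len(parts)}: {line!r}")
    try:
        lat, lon, epoch = (float(p) for p in parts)
    except ValueError as exc:
        raise MalformedInput(f"non-numeric field in {line!r}") from exc
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise MalformedInput(f"coordinates out of range in {line!r}")
    return {"lat": lat, "lon": lon, "epoch": epoch}


def normalize(raw: dict, source_id: str) -> CanonicalTrackUpdate:
    """Normalizer: map source fields to canonical fields, timestamps to UTC."""
    return CanonicalTrackUpdate(
        source_id=source_id,
        lat_deg=raw["lat"],
        lon_deg=raw["lon"],
        observed_at=datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
    )


def run_adapter(
    lines: Iterable[str],
    source_id: str,
    emit: Callable[[CanonicalTrackUpdate], None],
) -> int:
    """Transport stand-in: feed lines through the pipeline, count rejects."""
    rejected = 0
    for line in lines:
        try:
            emit(normalize(parse_csv_report(line), source_id))
        except MalformedInput:
            rejected += 1  # dropped-message accounting; a real adapter also logs
    return rejected
```

Only CanonicalTrackUpdate crosses the emit boundary; nothing about the wire format is visible to whatever consumes it, which is the isolation rule in miniature.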

Step 2: wire the message bus

Adapters publish to a durable, ordered, partitioned log. Fusion services consume from it. So do the audit service, the historical-replay service, and any downstream analytics. The message bus is the spinal cord of the platform.

For the running example we use Kafka with one topic per source type and additional topics for fusion outputs. Adapters publish to raw.<source-type>; the fusion engine consumes those and publishes to tracks.updates and tracks.lifecycle. Audit subscribes to everything. The bus pattern, including the throughput and durability trade-offs, is in Message Queues for Defense Data Pipelines.
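To make the topic layout concrete, here is a toy in-memory stand-in for the log. It is not Kafka and uses no Kafka client API; InMemoryLog, raw_topic, and the message fields are hypothetical. It only illustrates that each consumer keeps its own offsets, so fusion and audit read the same log independently.

```python
from collections import defaultdict


def raw_topic(source_type: str) -> str:
    """Topic an adapter publishes to: one raw topic per source type."""
    return f"raw.{source_type}"


class InMemoryLog:
    """Toy append-only log per topic; each consumer tracks its own offsets."""

    def __init__(self) -> None:
        self.topics: dict[str, list] = defaultdict(list)

    def publish(self, topic: str, message) -> None:
        self.topics[topic].append(message)

    def poll(self, prefix: str, offsets: dict[str, int]) -> list[tuple[str, object]]:
        """Return unread (topic, message) pairs for topics starting with prefix."""
        unread = []
        for topic in sorted(self.topics):
            if not topic.startswith(prefix):
                continue
            start = offsets.get(topic, 0)
            unread.extend((topic, m) for m in self.topics[topic][start:])
            offsets[topic] = len(self.topics[topic])
        return unread


log = InMemoryLog()
log.publish(raw_topic("ais"), {"mmsi": 123})       # illustrative payloads
log.publish(raw_topic("adsb"), {"icao": "abc"})

fusion_offsets: dict[str, int] = {}
audit_offsets: dict[str, int] = {}

# Fusion reads only raw.* and publishes its outputs to tracks.updates.
for topic, report in log.poll("raw.", fusion_offsets):
    log.publish("tracks.updates", {"from": topic, "report": report})

# Audit reads everything (empty prefix), independently of fusion's progress.
audit_seen = log.poll("", audit_offsets)
```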

The architectural decision worth surfacing: do not call HTTP between fusion components. Synchronous request-response coupling makes a fusion pipeline brittle. A sensor surge that stalls one consumer must not stall every upstream producer. The bus with backpressure is the structural solution; HTTP between fusion components is a recurring source of outages.

Step 3: track-to-track correlation

The core of the fusion engine is the algorithm that decides whether an incoming report is an update to an existing track or the birth of a new one. Get this wrong and the operator sees track soup (a thousand symbols where there should be a hundred) or ghost tracks (duplicates that should have merged). Get it right and the COP becomes trustworthy.

The pragmatic pattern uses a two-stage filter.

Stage 1: rule-based gating. For each incoming report, compute the set of candidate existing tracks within kinematic reach – a position-time gate that says "a track moving at most V m/s could have travelled from its last known position to this report's position in this time interval". Identity priors filter further: a report tagged "vessel" cannot match an "aircraft" track. Source compatibility filters again: a report from a ground radar cannot match the track of an underwater platform. The rule-based stage handles roughly 90% of inputs cheaply and unambiguously.
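A minimal sketch of the position-time gate, assuming great-circle (haversine) distance on a spherical Earth and a single "domain" field standing in for the identity prior. TrackState, Report, the slack_m allowance for measurement error, and the handling of out-of-order reports are all illustrative assumptions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


@dataclass(frozen=True)
class TrackState:
    track_id: str
    lat_deg: float
    lon_deg: float
    time_s: float
    domain: str           # e.g. "air", "surface"
    max_speed_mps: float  # V in the gate definition


@dataclass(frozen=True)
class Report:
    lat_deg: float
    lon_deg: float
    time_s: float
    domain: str


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def gate(report: Report, tracks: list[TrackState], slack_m: float = 200.0) -> list[TrackState]:
    """Return tracks that could kinematically have produced this report."""
    candidates = []
    for t in tracks:
        if t.domain != report.domain:
            continue  # identity prior: a vessel report cannot match an aircraft track
        dt = report.time_s - t.time_s
        if dt < 0:
            continue  # out-of-order report; this sketch simply skips the track
        reach_m = t.max_speed_mps * dt + slack_m
        if haversine_m(t.lat_deg, t.lon_deg, report.lat_deg, report.lon_deg) <= reach_m:
            candidates.append(t)
    return candidates
```

Zero candidates means a new tentative track, exactly one means an unambiguous update, and more than one hands the report to the probabilistic stage.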

Stage 2: probabilistic association. For the contested cases – multiple candidates inside the gate, ambiguous identity, dense scenarios with crossing trajectories – invoke probabilistic data association. Joint Probabilistic Data Association (JPDA) for moderate density; Multiple Hypothesis Tracking (MHT) for the hardest cases. Both compute a likelihood that an incoming report belongs to each candidate track and update tracks weighted by that likelihood.
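The weighting idea can be illustrated with a deliberately simplified, single-report calculation. This is not full JPDA (which enumerates joint association events across several reports) and not MHT; it only shows how Gaussian likelihoods over the gated candidates, plus a constant-density "none of the above" hypothesis, normalise into association weights. The isotropic sigma_m, the clutter_density value, and the function names are assumptions for illustration.

```python
import math
from typing import Optional


def gaussian_likelihood_2d(dx_m: float, dy_m: float, sigma_m: float) -> float:
    """Isotropic 2-D Gaussian density of the residual (dx, dy)."""
    d2 = (dx_m ** 2 + dy_m ** 2) / sigma_m ** 2
    return math.exp(-0.5 * d2) / (2.0 * math.pi * sigma_m ** 2)


def association_weights(
    residuals: dict[str, tuple[float, float]],
    sigma_m: float,
    clutter_density: float,
) -> dict[Optional[str], float]:
    """Map each candidate track id to the probability the report belongs to it.

    residuals holds (dx, dy) in metres between the report and each candidate's
    predicted position. The None key is the hypothesis that the report belongs
    to no existing track (clutter or a new track).
    """
    scores: dict[Optional[str], float] = {
        tid: gaussian_likelihood_2d(dx, dy, sigma_m)
        for tid, (dx, dy) in residuals.items()
    }
    scores[None] = clutter_density
    total = sum(scores.values())
    return {tid: s / total for tid, s in scores.items()}


weights = association_weights(
    {"A": (10.0, 0.0), "B": (100.0, 0.0)}, sigma_m=50.0, clutter_density=1e-6
)
```

A soft update then blends the report into each candidate in proportion to its weight instead of committing to a single hard assignment.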

The full theoretical model with engineering implications is in The JDL Data Fusion Model: A Practical Engineering Reference. The engineering nuances of when each technique applies, and the tuning required, are in Military Data Fusion Explained.

A specific pitfall worth highlighting: without pruning, the number of hypotheses MHT maintains grows exponentially. The pruning policy – how many hypotheses to retain, when to merge, when to delete – matters more than the choice of core algorithm. Default to aggressive pruning; tune outward only when the threat picture demands it.
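One common shape for a pruning policy combines a hard cap with a ratio test against the best hypothesis. The sketch below assumes hypotheses are (id, probability) pairs with a positive best probability; max_keep and min_ratio are hypothetical tuning knobs, and hypothesis merging is left out entirely.

```python
def prune(
    hypotheses: list[tuple[str, float]],
    max_keep: int,
    min_ratio: float,
) -> list[tuple[str, float]]:
    """Keep at most max_keep hypotheses, drop any below min_ratio * best,
    then renormalise the surviving probabilities to sum to 1."""
    if not hypotheses:
        return []
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    best = ranked[0][1]
    kept = [h for h in ranked[:max_keep] if h[1] >= min_ratio * best]
    total = sum(p for _, p in kept)
    return [(hid, p / total) for hid, p in kept]


hyps = [("h1", 0.5), ("h2", 0.3), ("h3", 0.15), ("h4", 0.04), ("h5", 0.01)]
aggressive = prune(hyps, max_keep=2, min_ratio=0.1)
relaxed = prune(hyps, max_keep=10, min_ratio=0.1)
```

"Aggressive" here means a small max_keep and a high min_ratio; tuning outward relaxes either knob.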

Step 4: track lifecycle management

A track is not a static record. It is born, confirmed, ages, fades, and dies. The fusion engine manages the lifecycle explicitly; the COP displays the lifecycle state so operators know which tracks to trust.

The state machine for the running example:

Tentative – first observation; not yet displayed in the operational COP unless explicitly requested. Decays to deleted if no follow-up within configurable window.
Confirmed – two or more correlated reports, kinematic consistency holds. Promoted from tentative. This is the default state for displayed tracks.
Mature – confirmed and persistent for at least N minutes with consistent updates. Used by downstream analytics that need stable identity.
Fading – no update within expected revisit interval. Display flagged as stale. Configurable per source class (a 30-second-old maritime track is fine; a 30-second-old air track is fading).
Lost – no update for an extended period. Removed from active display but retained in the track store for audit and historical analysis.
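The state machine above can be sketched as two small functions, one driven by correlated reports and one by a periodic timer. All thresholds are placeholder values, the names are hypothetical, and one rule is an assumption not stated above: a fading track that receives a new report returns to confirmed and restarts its maturity clock.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    TENTATIVE = "tentative"
    CONFIRMED = "confirmed"
    MATURE = "mature"
    FADING = "fading"
    LOST = "lost"
    DELETED = "deleted"


@dataclass(frozen=True)
class LifecycleConfig:
    # Illustrative placeholders; in practice configured per source class.
    tentative_window_s: float = 30.0
    mature_after_s: float = 300.0
    revisit_interval_s: float = 10.0
    lost_after_s: float = 120.0


@dataclass
class Track:
    last_update_s: float               # time of the first observation at creation
    state: State = State.TENTATIVE
    confirmed_since_s: Optional[float] = None


def on_report(track: Track, now_s: float) -> State:
    """A correlated report arrived: promote or revive the track."""
    if track.state in (State.LOST, State.DELETED):
        return track.state  # terminal in this sketch; the correlator starts a new track
    track.last_update_s = now_s
    if track.state in (State.TENTATIVE, State.FADING):
        # Assumption: a fading track that is seen again returns to confirmed.
        track.state = State.CONFIRMED
        track.confirmed_since_s = now_s
    return track.state


def on_tick(track: Track, now_s: float, cfg: LifecycleConfig) -> State:
    """Periodic timer: age the track based on time since its last update."""
    silence_s = now_s - track.last_update_s
    if track.state is State.TENTATIVE and silence_s > cfg.tentative_window_s:
        track.state = State.DELETED
    elif track.state in (State.CONFIRMED, State.MATURE) and silence_s > cfg.revisit_interval_s:
        track.state = State.FADING
    elif track.state is State.FADING and silence_s > cfg.lost_after_s:
        track.state = State.LOST
    elif (
        track.state is State.CONFIRMED
        and track.confirmed_since_s is not None
        and now_s - track.confirmed_since_s >= cfg.mature_after_s
    ):
        track.state = State.MATURE
    return track.state
```

Both functions return the resulting state, so a caller can compare it with the previous one and publish a transition event whenever they differ.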

Every state transition is logged. The audit service consumes the transition stream and writes immutable records – the topic of Event Sourcing for Defense Audit Trails. The transitions are also exposed on the bus so the COP can render lifecycle state without polling.

Key insight: Operators tolerate a missing track. They do not tolerate a confidently displayed stale track. Lifecycle management is the layer that makes the difference. Build it before the fusion algorithm is fully tuned – it's cheap, and it pays back every time a sensor link drops.

Step 5: the authoritative track store

Fusion outputs a stream of track updates and lifecycle transitions. The track store is the materialized view: the current state of every active track, queryable by the COP and analytics. The architectural decision worth making early: the track store is a read model, not an authoritative source. The authoritative source is the event log on the message bus; the track store can be rebuilt from that log on demand.

This pattern – event-sourced state with read-model projections – has three operational benefits. The store can be wiped and rebuilt without data loss. Multiple read models with different shapes can coexist (one for the COP, one for analytics, one for an external API). Time-travel queries become trivial: replay the log up to a chosen time to reconstruct what the platform believed then.
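The rebuild and time-travel properties both fall out of treating the read model as a fold over the event log. The sketch below assumes a hypothetical TrackEvent shape with "update" and "lifecycle" kinds, and assumes a lost track is dropped from the active view; it is an illustration of the projection idea, not the platform's schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TrackEvent:
    time_s: float
    track_id: str
    kind: str      # "update" or "lifecycle" (hypothetical event kinds)
    payload: dict


def project(events: Iterable[TrackEvent], as_of_s: Optional[float] = None) -> dict[str, dict]:
    """Fold the event log into {track_id: current state}.

    With as_of_s set, events after that time are ignored, which reconstructs
    what the platform believed at that moment.
    """
    store: dict[str, dict] = {}
    for ev in events:  # events are assumed to arrive in log order
        if as_of_s is not None and ev.time_s > as_of_s:
            continue
        state = store.setdefault(ev.track_id, {})
        state.update(ev.payload)
        if ev.kind == "lifecycle" and ev.payload.get("state") == "lost":
            store.pop(ev.track_id)  # lost tracks leave the active view
    return store


event_log = [
    TrackEvent(0.0, "T1", "update", {"lat": 1.0, "lon": 2.0}),
    TrackEvent(5.0, "T1", "lifecycle", {"state": "confirmed"}),
    TrackEvent(10.0, "T1", "update", {"lat": 1.1}),
    TrackEvent(20.0, "T1", "lifecycle", {"state": "lost"}),
]

now_view = project(event_log)                 # T1 is lost, so the view is empty
past_view = project(event_log, as_of_s=10.0)  # T1 as it looked at t = 10 s
```

A second read model with a different shape is just another fold over the same events.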

The store itself is PostgreSQL with PostGIS for geospatial indexing. Hot tracks live in memory or a Redis layer in front of PostgreSQL for sub-millisecond reads; the relational store handles longer-tail queries and persistence guarantees. The detailed engineering view is in PostGIS for Defense Geospatial Data.

Resist the urge to add a graph database "for relationships". Relationships between tracks – convoy detection, formation recognition, contact networks – are JDL Level 2 fusion, a separate concern from Level 1 track maintenance. Build Level 1 first, run it in operations for a year, then revisit Level 2 with the operational evidence in hand.

Step 6: test with realistic inputs

A fusion engine tested only with toy load passes integration tests and fails in operations. The disciplines that catch problems before deployment:

Replay test harnesses. Capture real sensor traces in development and replay them at full rate against the fusion engine. The traces serve as the regression test suite: a new algorithm or schema change must produce equivalent or better results on the existing traces, not just on synthetic load.

Adversarial inputs. Spoofed AIS messages with implausible kinematics. Malformed CoT XML. Radar plots that violate physics (Mach 5 ground tracks). The fusion engine must reject or flag these, not panic, not crash, not produce confident-but-wrong tracks. The discipline is the same as the broader testing discipline in Testing Mission-Critical C2 Systems.
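A physics check of the "Mach 5 ground track" kind can be as small as a per-domain speed ceiling. The limits below are illustrative placeholders rather than operational values, and the verdict strings are a made-up convention; the point is that bad input produces a verdict, never an exception on the hot path.

```python
import math

# Illustrative per-domain speed ceilings in m/s; real limits are a policy decision.
MAX_SPEED_MPS = {"ground": 60.0, "surface": 40.0, "air": 1000.0}


def plausibility(domain: str, distance_m: float, dt_s: float) -> str:
    """Classify an implied movement as 'accept', 'flag:...' or 'reject:...'."""
    limit = MAX_SPEED_MPS.get(domain)
    if limit is None:
        return "reject:unknown-domain"
    if not (math.isfinite(distance_m) and math.isfinite(dt_s)):
        return "reject:non-finite-input"
    if dt_s <= 0 or distance_m < 0:
        return "reject:bad-kinematics-input"
    if distance_m / dt_s > limit:
        return "flag:implausible-speed"
    return "accept"


# Mach 5 is roughly 1,715 m/s at sea level: far beyond any ground vehicle.
mach5_ground = plausibility("ground", 1715.0, 1.0)
```

Flagged reports can still be written to the audit trail; they just must not feed a confident track.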

Pattern-of-life detection. Once basic fusion works, layer in PoL analytics – see Pattern-of-Life Analysis in Military Intelligence. The PoL service consumes the same bus topics; it produces enriched track-state annotations rather than competing with the fusion engine.

Step 7: performance targets and headroom

Fusion latency is operationally consequential. Targets for the running example platform: 95th-percentile end-to-end fusion latency under 500 ms (sensor report ingest to track-update message on the bus); 99th percentile under 1.5 s; throughput sustained at 10,000 reports per second with single-digit-percent CPU headroom.

These are tactical-brigade targets. Strategic platforms have looser latency tolerances and higher throughput ceilings. The targets drive architectural decisions: avoid synchronous cross-service calls on the hot path; pre-allocate hot-track state; batch only where the bus permits; instrument every stage of the pipeline so latency regressions surface in CI rather than in operations.
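Surfacing latency regressions in CI can start with a simple percentile gate over measured end-to-end latencies. The sketch uses the nearest-rank percentile definition (other definitions interpolate and give slightly different numbers) and the 500 ms and 1.5 s targets stated above; the function names and sample data are hypothetical.

```python
import math

# Targets from the text: p95 under 500 ms, p99 under 1.5 s.
TARGETS_MS = {95: 500.0, 99: 1500.0}


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_violations(samples_ms: list[float]) -> dict[int, float]:
    """Return {percentile: observed_ms} for every target that is exceeded."""
    violations = {}
    for pct, target_ms in TARGETS_MS.items():
        observed = percentile(samples_ms, pct)
        if observed > target_ms:
            violations[pct] = observed
    return violations


healthy = [100.0] * 94 + [400.0] * 5 + [2000.0]   # one slow outlier in 100
regressed = [100.0] * 90 + [600.0] * 10           # tail has drifted upward
```

A CI job would fail the build whenever latency_violations returns a non-empty result for a replayed trace.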

What's next

Part 2 has built the engine. Sensor adapters convert to canonical tracks; the message bus carries the events; the fusion engine correlates reports into tracks; lifecycle management keeps operators honest about freshness; the track store exposes current state. The platform now produces trustworthy track data. It just has no operator-facing surface.

Part 3 builds the Common Operational Picture – the frontend that turns tracks into the map the operator actually uses. Symbology, real-time updates, role-based filtering, and the engineering decisions that determine whether the platform gets adopted in the field.