A UAV carries sensors. A sensor produces data. Data becomes information when it is fused with context and placed in front of an operator who can act on it. The distance between those two endpoints – sensor capture and operator decision – is the sensor-to-decision loop, and UAV reconnaissance software is what governs its latency, fidelity, and reliability. This article examines the full pipeline: from onboard sensor configuration through the downlink, into the ground station, through the video analytics pipeline, and into the common operating picture displayed to S2 and S6 officers in the field.
The sensor-to-decision loop: architecture overview
The loop has five discrete stages, each introducing latency and each representing a potential point of failure:
1. Onboard sensor and encoding. Electro-optical (EO), infrared (IR), synthetic aperture radar (SAR), and SIGINT payloads produce raw data that must be compressed and multiplexed for transmission. For video payloads, H.264 or H.265 encoding happens on the UAV's video encoder board. MISB (Motion Imagery Standards Board) KLV metadata – platform position, attitude, sensor field of view – is embedded in the transport stream at this stage. Encoding latency on capable hardware is typically 30–80 ms.
2. Data link. The encoded transport stream is transmitted over a higher-bandwidth intelligence downlink, separate from the C2 (command and control) link that carries operator commands. Common downlink types include Tactical Common Data Link (TCDL) at C-band or Ku-band for MALE and HALE platforms, and point-to-point 2.4 GHz or 5.8 GHz links for tactical UAS. Link latency for a well-designed line-of-sight system is 10–50 ms; satellite relay adds roughly 250–300 ms one-way through a geostationary satellite (500–600 ms round trip) or 20–80 ms through low-Earth orbit, which significantly changes the latency budget for time-sensitive targeting.
3. Ground station receive and decode. The ground data terminal (GDT) receives the RF signal and outputs a STANAG 4609 MPEG-2 transport stream over Ethernet or serial. The ground station software decodes the stream, demultiplexes the KLV metadata from the video elementary stream, and passes both to downstream consumers. A well-implemented receive stack adds fewer than 100 ms of processing latency at this stage.
4. Analytics and geolocation. Decoded frames are passed to the video analytics pipeline (detection, classification, and tracking) while the simultaneously extracted KLV metadata feeds the geolocation engine. The output of this stage is a set of geolocated, classified detections published as events to the tactical network. Analytics latency depends on model complexity and hardware; a YOLOv8-sized model on a GPU-equipped workstation processes 1080p frames in under 20 ms each, which is faster than real time for a 30 fps stream (about 33 ms per frame). On CPU-only edge hardware, the same model may require 80–150 ms per frame, so it cannot keep up with a 30 fps stream unless frames are skipped.
5. Operator display and decision. The operator views the video feed, the sensor footprint overlay on the map, and the analytic detection markers in the common operating picture. Decision latency – the time from display to a command or report – is a human factor that no software can fully control, but reducing display latency and improving information density directly reduces cognitive load and shortens the decision cycle.
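The demultiplexing step in stage 3 can be illustrated with a minimal sketch of MPEG-2 transport stream header parsing. It assumes 188-byte packets that are already aligned on the 0x47 sync byte, and it takes the video and KLV PIDs as given parameters; in a real receiver those PIDs would be discovered from the PAT/PMT tables, which this sketch does not parse. The names `parse_ts_header` and `split_by_pid` are illustrative, not part of any particular ground station product.

```python
from typing import List, NamedTuple, Tuple

TS_PACKET_SIZE = 188  # fixed MPEG-2 transport stream packet length
SYNC_BYTE = 0x47      # first byte of every transport stream packet


class TsHeader(NamedTuple):
    pid: int
    payload_unit_start: bool
    has_adaptation: bool
    has_payload: bool
    continuity_counter: int


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the 4-byte header of one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("expected a 188-byte packet")
    if packet[0] != SYNC_BYTE:
        raise ValueError("sync byte missing; stream is misaligned")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]   # 13-bit packet identifier
    adaptation_field_control = (packet[3] >> 4) & 0x03
    return TsHeader(
        pid=pid,
        payload_unit_start=bool(packet[1] & 0x40),
        has_adaptation=bool(adaptation_field_control & 0x02),
        has_payload=bool(adaptation_field_control & 0x01),
        continuity_counter=packet[3] & 0x0F,
    )


def split_by_pid(stream: bytes, video_pid: int,
                 klv_pid: int) -> Tuple[List[bytes], List[bytes]]:
    """Route aligned packets to video or KLV lists; other PIDs are dropped."""
    video, klv = [], []
    for off in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[off:off + TS_PACKET_SIZE]
        pid = parse_ts_header(packet).pid
        if pid == video_pid:
            video.append(packet)
        elif pid == klv_pid:
            klv.append(packet)
    return video, klv
```

A production receiver would also track the continuity counter per PID to detect dropped packets, and would reassemble PES packets before handing data to the decoder or the KLV parser.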
STANAG 4609 and MISB KLV: the data contract
STANAG 4609 is the foundational data contract for UAV motion imagery within alliance interoperability frameworks. It specifies that UAV video shall be carried as an MPEG-2 transport stream with embedded metadata from the MISB ST 0601 UAS Datalink Local Set, referred to in this article as LS 0601. LS 0601 defines approximately 140 tagged data elements covering every parameter an analyst or automated system needs to geolocate content in the image: sensor position, platform heading, pitch, roll, sensor FOV angles, slant range, obliquity angle, and more.
The KLV (Key-Length-Value) encoding used by MISB is a compact binary format. Each metadata element is identified by a 1-byte or 2-byte key, followed by a length field, followed by the value in a standardized floating-point or integer encoding. A minimal compliant KLV packet for a video frame might be 80–120 bytes. At 30 frames per second, this adds roughly 19–29 kbps (2.4–3.6 kB/s) of overhead to the transport stream, which is negligible next to the video bitrate on any tactical data link.
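As a minimal sketch of the tag-length-value layout described above, the following encodes and parses individual local-set items. It is deliberately incomplete: it assumes 1-byte tags only (tags below 128), treats values as opaque bytes, and omits the 16-byte universal key, the outer length, and the checksum item that wrap a full LS 0601 packet. The function names are illustrative.

```python
from typing import Dict


def encode_ber_length(n: int) -> bytes:
    """BER length: short form below 128, long form (0x80 | byte count) above."""
    if n < 128:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_item(tag: int, value: bytes) -> bytes:
    """Serialize one item as tag, BER length, value."""
    if not 0 < tag < 128:
        raise ValueError("this sketch handles 1-byte tags only")
    return bytes([tag]) + encode_ber_length(len(value)) + value


def parse_items(buf: bytes) -> Dict[int, bytes]:
    """Walk a run of tag-length-value items and return {tag: raw value}."""
    items: Dict[int, bytes] = {}
    i = 0
    while i < len(buf):
        tag = buf[i]
        first = buf[i + 1]
        i += 2
        if first & 0x80:                      # long form: low 7 bits = byte count
            count = first & 0x7F
            length = int.from_bytes(buf[i:i + count], "big")
            i += count
        else:                                 # short form: the byte is the length
            length = first
        items[tag] = buf[i:i + length]
        i += length
    return items
```

For example, `encode_item(5, b"\x12\x34")` yields the four bytes `05 02 12 34`, and `parse_items` recovers `{5: b"\x12\x34"}` from them. Converting raw values to engineering units (for instance, mapping a 16-bit integer to a heading in degrees) is a separate step defined per tag by the standard.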
For integrators, the critical implementation point is that KLV metadata must be extracted in synchrony with the video frames it describes. KLV packets are embedded in the transport stream as private data PIDs alongside the video PID. A parser that processes the two PIDs asynchronously – or that delays video display without delaying metadata application – will produce geolocation errors that increase with platform velocity and gimbal slew rate. At 60 knots groundspeed and 1-second metadata lag, geolocation error can exceed 30 meters.
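The sketch below illustrates both points under simple assumptions: the error figure counts platform translation only (gimbal slew adds to it), and frames and metadata are assumed to carry comparable timestamps in seconds. The 0.1-second skew limit and the function names are hypothetical choices for illustration.

```python
import bisect
from typing import Optional, Sequence

KNOT_TO_MPS = 0.514444  # one knot in meters per second


def lag_error_m(groundspeed_kt: float, lag_s: float) -> float:
    """Ground distance the platform covers while metadata is stale."""
    return groundspeed_kt * KNOT_TO_MPS * lag_s


def nearest_metadata(meta_times: Sequence[float], frame_time: float,
                     max_skew_s: float = 0.1) -> Optional[int]:
    """Index of the metadata sample closest to a frame, or None if too far.

    meta_times must be sorted ascending.
    """
    if not meta_times:
        return None
    i = bisect.bisect_left(meta_times, frame_time)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(meta_times)]
    best = min(candidates, key=lambda j: abs(meta_times[j] - frame_time))
    if abs(meta_times[best] - frame_time) > max_skew_s:
        return None  # better to skip geolocation than apply stale geometry
    return best
```

`lag_error_m(60, 1.0)` evaluates to about 30.9 meters, consistent with the figure quoted above. Returning `None` when no sample is close enough lets the caller suppress a geolocation rather than publish one computed from mismatched geometry.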
Mandatory LS 0601 fields for geolocation
Not all 140+ LS 0601 fields are required for basic geolocation. The minimum set needed to compute where a pixel in the image falls on the ground includes: sensor latitude (tag 13), sensor longitude (tag 14), sensor true altitude (tag 15), platform heading angle (tag 5), platform pitch angle (tag 6), platform roll angle (tag 7), sensor horizontal FOV (tag 16), sensor vertical FOV (tag 17), sensor relative azimuth angle (tag 18), sensor relative elevation angle (tag 19), sensor relative roll angle (tag 20), and slant range (tag 21). All other fields are supplementary – useful for analysis but not required for real-time geolocation computation.
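One way to enforce this minimum set is a validation step before geolocation. The sketch below assumes the metadata has already been parsed into a dictionary keyed by tag number; the dictionary shape and the function name are assumptions made for illustration.

```python
from typing import Dict, List

# Minimum LS 0601 tags for pixel-to-ground geolocation, as listed above.
REQUIRED_GEOLOCATION_TAGS: Dict[int, str] = {
    5: "platform heading angle",
    6: "platform pitch angle",
    7: "platform roll angle",
    13: "sensor latitude",
    14: "sensor longitude",
    15: "sensor true altitude",
    16: "sensor horizontal FOV",
    17: "sensor vertical FOV",
    18: "sensor relative azimuth angle",
    19: "sensor relative elevation angle",
    20: "sensor relative roll angle",
    21: "slant range",
}


def missing_geolocation_tags(packet: Dict[int, object]) -> List[int]:
    """Return the required tags absent from a parsed metadata packet."""
    return sorted(tag for tag in REQUIRED_GEOLOCATION_TAGS if tag not in packet)
```

A caller might log the names of the missing tags and fall back to the last complete packet, or suppress geolocation for that frame entirely.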
Video analytics pipeline: detection and classification
Automated object detection is the stage most dependent on domain-specific engineering. General-purpose detection models trained on civilian imagery perform poorly on UAV-perspective military imagery – the viewing angle, scale, camouflage, and target diversity are all different. A model used in production should be fine-tuned on a labeled dataset representative of the operational environment: target types (vehicles, personnel, emplacements), altitude range, sensor type (EO vs. IR), and background classes (urban, rural, forested, mixed).
The standard architecture for real-time UAV video analytics uses a two-stage pipeline: a fast single-stage detector (YOLOv8 or equivalent) running at full frame rate for detection and rough classification, feeding detections to a slower but more accurate classification model that confirms class and assigns confidence. The fast detector prioritizes recall, catching all potential targets even at the cost of false positives. The classifier filters the detection list and assigns the final label. This separation allows the system to operate at video frame rate while spending the extra compute only on candidate detections rather than on every region of every frame.
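The control flow of such a two-stage pipeline can be sketched without committing to any model framework. Here the detector and classifier are passed in as plain callables, and both thresholds are hypothetical values chosen for illustration; a real system would tune them against labeled data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # x1, y1, x2, y2 in pixels


@dataclass
class Detection:
    box: Box
    label: str
    score: float


def two_stage(frame: object,
              detector: Callable[[object], List[Detection]],
              classifier: Callable[[object, Box], Tuple[str, float]],
              detect_threshold: float = 0.2,
              confirm_threshold: float = 0.6) -> List[Detection]:
    """Run a recall-oriented detector, then confirm each candidate."""
    confirmed = []
    for candidate in detector(frame):
        if candidate.score < detect_threshold:   # low bar: favor recall
            continue
        label, score = classifier(frame, candidate.box)
        if score >= confirm_threshold:           # high bar: favor precision
            confirmed.append(Detection(candidate.box, label, score))
    return confirmed
```

The low first-stage threshold keeps marginal candidates alive, and the classifier's higher threshold removes the false positives that policy lets through.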
Geolocation of detections
Each detection bounding box must be converted to a ground-plane WGS84 coordinate before it can be published as a geospatial event. The computation uses the pixel coordinates of the detection centroid, the sensor geometry from the KLV metadata, and a terrain elevation model (DTED Level 1 or Level 2). The standard approach is to cast a ray from the sensor through the image-plane pixel and intersect it with the terrain surface. Without a DEM, a flat-earth approximation using slant range introduces elevation-dependent errors that become significant over hilly or mountainous terrain.
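The ray-casting step can be sketched as a fixed-step march along the ray followed by bisection. To stay self-contained, this version works in a local east/north/up frame centered under the sensor and takes terrain as a callable returning elevation in meters; a real system would back that callable with DTED lookups. Deriving the ray's azimuth and depression angle from the pixel position and the LS 0601 angles, and converting the result to WGS84, are both outside this sketch. All names and default values are illustrative assumptions.

```python
import math
from typing import Callable, Optional, Tuple


def intersect_terrain(sensor_alt_m: float, azimuth_deg: float,
                      depression_deg: float,
                      elevation_m: Callable[[float, float], float],
                      max_range_m: float = 20000.0,
                      step_m: float = 10.0
                      ) -> Optional[Tuple[float, float, float]]:
    """Return (east, north, up) in meters where the ray meets terrain."""
    az = math.radians(azimuth_deg)       # clockwise from true north
    dep = math.radians(depression_deg)   # positive below the horizon
    d_east = math.sin(az) * math.cos(dep)
    d_north = math.cos(az) * math.cos(dep)
    d_up = -math.sin(dep)

    def height_above(r: float) -> float:
        return sensor_alt_m + r * d_up - elevation_m(r * d_east, r * d_north)

    if height_above(0.0) <= 0.0:
        return None                      # sensor is at or below the terrain
    prev, r = 0.0, step_m
    while r <= max_range_m:
        if height_above(r) <= 0.0:
            lo, hi = prev, r             # crossing lies between these ranges
            for _ in range(30):          # bisection refines the crossing
                mid = (lo + hi) / 2.0
                if height_above(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return (hi * d_east, hi * d_north, sensor_alt_m + hi * d_up)
        prev = r
        r += step_m
    return None                          # no intersection within max range
```

With flat terrain at zero elevation, a sensor at 1000 m looking due east at 45 degrees depression intersects the ground 1000 m east of the sensor. Raising the terrain to a 200 m plateau moves the intersection to 800 m east, which is the elevation-dependent error a flat-earth approximation would miss.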
For detection tracking (linking detections across frames to produce persistent tracks), a Kalman-filter-based tracker such as SORT (Simple Online and Realtime Tracking) is a common production choice. Persistent tracks reduce operator cognitive load compared to per-frame detections: instead of a map that flickers with new markers every frame, the operator sees a small number of stable, moving markers with confidence history.
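The association idea behind persistent tracks can be shown with a deliberately simplified tracker. This is not SORT: it has no Kalman motion model and uses greedy matching on box overlap (IoU) instead of optimal assignment. The thresholds and class name are hypothetical.

```python
from itertools import count
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    def __init__(self, iou_threshold: float = 0.3, max_missed: int = 5):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed
        self.tracks: Dict[int, Box] = {}
        self.missed: Dict[int, int] = {}
        self._ids = count(1)

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        """Match detections to tracks by best overlap; return all live tracks."""
        pairs = sorted(((iou(box, det), tid, di)
                        for tid, box in self.tracks.items()
                        for di, det in enumerate(detections)), reverse=True)
        used_tracks, used_dets = set(), set()
        for score, tid, di in pairs:
            if score < self.iou_threshold:
                break
            if tid in used_tracks or di in used_dets:
                continue
            self.tracks[tid] = detections[di]
            self.missed[tid] = 0
            used_tracks.add(tid)
            used_dets.add(di)
        for tid in list(self.tracks):        # age out unmatched tracks
            if tid not in used_tracks:
                self.missed[tid] += 1
                if self.missed[tid] > self.max_missed:
                    del self.tracks[tid], self.missed[tid]
        for di, det in enumerate(detections):  # unmatched detections start tracks
            if di not in used_dets:
                tid = next(self._ids)
                self.tracks[tid] = det
                self.missed[tid] = 0
        return dict(self.tracks)
```

The stable integer track ID is what lets downstream consumers present one moving marker instead of a new marker per frame. A motion model becomes important when targets move far between frames or are briefly occluded, which is where this simplification breaks down.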
Ground station integration and C2 link architecture
The ground station is the hub of the sensor-to-decision loop. A production ground station for a tactical UAS program typically runs several software components in parallel: the transport stream receiver and demultiplexer, the video display application (with mission recording), the KLV metadata extractor, the analytics pipeline, and the CoT/tactical network publisher.
The C2 uplink – commands from operator to UAV – and the intelligence downlink are logically separate but often share the same RF system. C2 link integrity is harder to protect than the downlink: command messages are small but must arrive with very low latency and high reliability. The standard architecture for C2 link integrity uses a dedicated narrowband uplink at a separate frequency from the wideband intelligence downlink, with AES-256 encryption and FHSS (frequency-hopping spread spectrum) for jamming resilience. Software on the ground station must monitor C2 link quality metrics – bit error rate, round-trip command acknowledgment latency – and alert the operator before link degradation causes loss of aircraft control.
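A minimal sketch of the monitoring logic follows, assuming the ground station can sample bit error rate and command acknowledgment latency from the radio. The window size and both thresholds are placeholder values for illustration only; real limits depend on the airframe, the radio, and the program's safety case.

```python
from collections import deque
from statistics import mean
from typing import List


class C2LinkMonitor:
    """Rolling-average check of C2 link quality samples."""

    def __init__(self, window: int = 20, max_ber: float = 1e-4,
                 max_ack_ms: float = 250.0):
        self.max_ber = max_ber          # hypothetical alert threshold
        self.max_ack_ms = max_ack_ms    # hypothetical alert threshold
        self.ber = deque(maxlen=window)
        self.ack_ms = deque(maxlen=window)

    def record(self, bit_error_rate: float, ack_latency_ms: float) -> None:
        self.ber.append(bit_error_rate)
        self.ack_ms.append(ack_latency_ms)

    def alerts(self) -> List[str]:
        """Return operator alerts for metrics whose window mean is too high."""
        out = []
        if self.ber and mean(self.ber) > self.max_ber:
            out.append("C2 bit error rate above threshold")
        if self.ack_ms and mean(self.ack_ms) > self.max_ack_ms:
            out.append("C2 acknowledgment latency above threshold")
        return out
```

Averaging over a window avoids alerting on a single bad sample, at the cost of reacting more slowly. A fielded monitor would likely add a trend check so the operator is warned while the link is degrading rather than after it has crossed the limit.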
ATAK plugin pattern for UAV feeds
Integrating a UAV feed into ATAK – the standard tactical situational awareness application – follows a well-established plugin architecture. A UAV integration plugin has three functional components that operate concurrently.
Video panel component. A SurfaceView-backed panel inside the ATAK plugin window renders the decoded video stream. The video decoder runs in a background thread, pushing frames to the surface at the stream's native frame rate. The panel should include overlay annotations (target boxes from the analytics pipeline) rendered via Canvas on a transparent layer above the video surface, synchronized to the frame being displayed.
Footprint overlay component. The four corner coordinates of the sensor footprint – computed from MISB geometry fields and the terrain model – are published as a CoT polygon event and rendered on the ATAK map as a semi-transparent trapezoid. The footprint polygon updates at the KLV metadata rate (typically 1–10 Hz for most systems). At slower update rates, the footprint may appear to lag the video display during rapid gimbal slews; the fix is to extrapolate footprint position using platform attitude change rate between metadata updates.
Detection publisher component. Geolocated detections from the analytics pipeline are published as CoT point events to TAK Server with appropriate CoT type codes. Detection tracks with persistent identity are published with a consistent UID across updates, so ATAK clients display them as moving markers rather than a sequence of independent events. The plugin should allow the operator to confirm or reject a detection – confirmed detections get promoted to a higher-confidence CoT type; rejected detections are removed from the picture.
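The detection publisher's output can be sketched as a CoT point event built with the Python standard library. The event and point attributes follow the common CoT 2.0 layout; the default type code `a-u-G` (unknown affiliation, ground), the `how` value, the 30-second stale time, and the placeholder error values are assumptions for illustration, not settings prescribed by this article or by any particular TAK Server deployment.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from typing import Optional


def cot_time(dt: datetime) -> str:
    """Format a UTC datetime as a CoT timestamp with millisecond precision."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def build_detection_event(uid: str, lat: float, lon: float,
                          cot_type: str = "a-u-G", how: str = "m-g",
                          stale_s: int = 30, hae_m: float = 0.0,
                          now: Optional[datetime] = None) -> str:
    """Build a CoT point event; reuse the same uid to update a track."""
    now = now or datetime.now(timezone.utc)
    event = ET.Element("event", {
        "version": "2.0",
        "uid": uid,                      # stable per track, not per frame
        "type": cot_type,
        "how": how,
        "time": cot_time(now),
        "start": cot_time(now),
        "stale": cot_time(now + timedelta(seconds=stale_s)),
    })
    ET.SubElement(event, "point", {
        "lat": f"{lat:.6f}",
        "lon": f"{lon:.6f}",
        "hae": f"{hae_m:.1f}",
        "ce": "9999999.0",               # placeholder: circular error unknown
        "le": "9999999.0",               # placeholder: linear error unknown
    })
    ET.SubElement(event, "detail")
    return ET.tostring(event, encoding="unicode")
```

Publishing successive events with the same `uid` is what makes clients move an existing marker instead of creating a new one. Operator confirmation could then be expressed by republishing the same `uid` with a different `cot_type`.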
Latency budgets for time-sensitive targets
Time-sensitive targeting – the process of detecting, identifying, and engaging a target that presents for a short window – imposes the strictest latency requirements on the UAV reconnaissance software stack. The relevant military doctrine specifies a targeting cycle under 30 minutes for deliberate targeting; time-sensitive targeting compresses this to minutes or seconds depending on the threat type.
Within the software pipeline, the latency budget allocations that matter most are:
Video display latency: under 500 ms total from sensor capture to operator display. A representative breakdown for a well-optimized system is encoding (80 ms) + link (50 ms, line of sight) + decode (30 ms) + display pipeline (20 ms) = approximately 180 ms, leaving about 320 ms of headroom. Buffering for adaptive bitrate streaming or jitter compensation often adds 200–500 ms on top of this, so aggressive buffer settings are the most common source of unacceptable display latency.
Detection-to-CoT latency: under 3 seconds from detection in the analytics pipeline to CoT event visible on connected ATAK clients. This budget covers detection inference (20–150 ms), geolocation computation (10 ms), CoT event construction and publish (5 ms), TAK Server relay (50–200 ms depending on federation hops), and ATAK client update (100–500 ms depending on update polling interval).
Operator-to-C2 latency: under 2 seconds from operator designation of a target in the ATAK plugin to a command reaching the UAV operator or fire control element. This is primarily a network and C2 system latency – the UAV integration plugin's contribution is negligible if it publishes CoT immediately on operator action.
Key insight: The most common latency failure in field-deployed UAV reconnaissance software is not the analytics pipeline – it is video buffering. Ground station software configured with a 2-second jitter buffer for stream stability will always miss the latency budget for time-sensitive targeting. Buffer depth must be tunable by the operator and documented as a mission-planning parameter.
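The arithmetic behind this point is simple enough to encode as a mission-planning check. The sketch below uses the line-of-sight stage figures and the 500 ms display budget quoted above; the function names are illustrative.

```python
from typing import Dict

# Stage latencies in milliseconds for a well-optimized line-of-sight system.
VIDEO_STAGES_MS: Dict[str, int] = {
    "encode": 80,
    "link_line_of_sight": 50,
    "decode": 30,
    "display_pipeline": 20,
}
VIDEO_BUDGET_MS = 500


def display_latency_ms(buffer_ms: int,
                       stages: Dict[str, int] = VIDEO_STAGES_MS) -> int:
    """Total capture-to-display latency for a given jitter buffer depth."""
    return sum(stages.values()) + buffer_ms


def max_buffer_ms(budget_ms: int = VIDEO_BUDGET_MS,
                  stages: Dict[str, int] = VIDEO_STAGES_MS) -> int:
    """Deepest jitter buffer that still fits inside the display budget."""
    return max(0, budget_ms - sum(stages.values()))


def within_budget(buffer_ms: int, budget_ms: int = VIDEO_BUDGET_MS) -> bool:
    return display_latency_ms(buffer_ms) <= budget_ms
```

With these figures the pipeline alone consumes 180 ms, so the buffer can be at most 320 ms deep. A 2-second buffer produces 2180 ms of display latency, more than four times the budget.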
For a deeper treatment of the computer vision architecture used in the analytics pipeline, see the article on computer vision for ISR drones.
This analysis was prepared by Corvus Intelligence engineers who build mission-critical ISR and field applications for defense and government organizations.