A drone crew sees a vehicle column moving through a tree line. The S2 officer, three kilometers away, is staring at a map with no video. The drone operator narrates over voice radio. By the time that information becomes an entry in the common operating picture, the column has moved. Live video integration in TAK eliminates that gap – it puts the feed directly on the COP, geo-referenced, distributed, and accessible on every ATAK and WinTAK client in the network without manual configuration. This article covers the full integration stack: RTSP transport configuration, CoT video link publication, sensor footprint geo-referencing, relay architecture for constrained links, latency budgets, and display-side configuration in the ATAK Video Receiver.
How TAK handles video: the CoT video link model
TAK does not carry video bytes through TAK Server. The architecture is deliberately decoupled: TAK Server distributes a reference to a stream – a CoT Video link event – while the actual video flows point-to-point (or through a relay) between the source and each viewer. This separation keeps TAK Server's bandwidth requirements manageable and allows video infrastructure to be sized independently of the tactical data network.
A CoT Video event is a standard CoT XML message with type b-i-v. Its detail block contains a Video element with the stream URL, a human-readable alias, the protocol type (rtsp, rtsps, or udp), a codec hint, and a unique stream identifier. When TAK Server receives this event, it persists it and relays it to all connected clients in the relevant group. An ATAK client receiving the Video event automatically adds the stream to the Video Receiver plugin's stream list – the operator can then open the feed by alias without knowing the underlying URL or configuring anything manually.
The same CoT model handles stream revocation: publishing a Video event with the same UID and a stale time that has already passed removes the stream from client lists. This makes stream lifecycle management programmatic and consistent with the rest of the TAK data model.
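The publish-and-revoke lifecycle described above can be sketched from the client's point of view. This is a minimal illustration, not ATAK's actual implementation: the function name `apply_video_event`, the in-memory `streams` dict, and the choice to key on the Video element's `uid` attribute are all assumptions made for the example.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET


def _parse_cot_time(value: str) -> datetime:
    # CoT timestamps are ISO 8601 UTC strings ending in "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def apply_video_event(streams: dict, cot_xml: str, now: datetime) -> None:
    """Update a stream list (hypothetical dict keyed by UID) from one CoT event."""
    event = ET.fromstring(cot_xml)
    video = event.find("./detail/Video")
    if video is None:
        return  # not a Video event, nothing to do
    uid = video.get("uid")
    if _parse_cot_time(event.get("stale")) <= now:
        # Stale time already passed: treat the event as a revocation.
        streams.pop(uid, None)
    else:
        # Same UID updates the existing entry instead of duplicating it.
        streams[uid] = {"alias": video.get("alias"), "url": video.get("url")}
```

Because the entry is keyed by UID, a later event with the same UID either refreshes the stream reference or removes it, which is the behavior the article attributes to the CoT model.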
Transport layer: RTSP, RTP, and codec selection
RTSP (Real Time Streaming Protocol) is the dominant transport for TAK video feeds. RTSP operates as a control channel – it negotiates session parameters and establishes the stream – while the actual media flows over RTP (Real-time Transport Protocol) on a separate port. The two common RTSP transport modes for TAK are:
RTSP over TCP (interleaved). The RTP media packets are multiplexed into the RTSP TCP connection. This mode traverses NAT and firewall rules more reliably than UDP because it uses a single established TCP connection. It is the recommended mode for feeds traveling over satellite, LTE, or any link with restrictive packet filtering. The trade-off is that TCP retransmission adds variable latency during packet-loss bursts, which shows up as stalls and growing delay on degraded links.
RTSP with UDP media transport. The RTP media flows over separate UDP ports negotiated during the RTSP SETUP exchange. UDP transport achieves lower base latency than TCP because there is no retransmission – lost packets produce video artifacts rather than stalling the decoder. On a local MANET segment with low packet loss (<1%), UDP transport is preferred. On links with higher loss rates, the decoder stall behavior of TCP is often preferable to the decoding errors of UDP.
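The choice between the two modes can be written down as a small decision rule. The sketch below is only one reading of the guidance above: the function name and parameters are hypothetical, and the 1% threshold is taken directly from the text.

```python
def choose_rtsp_transport(loss_pct: float, restrictive_filtering: bool) -> str:
    """Return "tcp" (interleaved) or "udp" for the RTP media path.

    loss_pct: measured packet loss on the link, in percent.
    restrictive_filtering: True for satellite, LTE, NAT, or firewalled paths.
    """
    if restrictive_filtering:
        # A single established TCP connection traverses NAT and filters.
        return "tcp"
    # On clean local segments UDP gives the lowest base latency.
    return "udp" if loss_pct < 1.0 else "tcp"
```

A unit could run this once per link during planning, for example `choose_rtsp_transport(0.3, False)` for a local MANET segment.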
For codec selection, H.264 (AVC) at Baseline or Main profile is the universal baseline. Every ATAK-capable Android device and every WinTAK installation can hardware-decode H.264. H.265 (HEVC) reduces bandwidth by roughly 40% at equivalent quality – a significant saving on radio links – but requires explicit decoder support on the receiving device. Older Android ruggedized hardware may lack H.265 hardware decode support, falling back to software decode with higher CPU and latency overhead. The safe choice for a heterogeneous fleet is H.264; H.265 is appropriate when the device set is homogeneous and well-characterized.
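The fleet rule above reduces to "use H.265 only if every receiver can hardware-decode it". A minimal sketch follows; the function names are hypothetical and the 40% saving is the approximate figure quoted in the text, not a measured value.

```python
from typing import Iterable


def choose_codec(hevc_hw_decode_flags: Iterable[bool]) -> str:
    """Pick a codec for a fleet given per-device H.265 hardware decode support."""
    flags = list(hevc_hw_decode_flags)
    # An empty or mixed fleet falls back to the universal baseline.
    return "H265" if flags and all(flags) else "H264"


def estimated_hevc_kbps(h264_kbps: float, saving: float = 0.40) -> float:
    """Rough H.265 bitrate for similar quality, using the article's ~40% figure."""
    return h264_kbps * (1.0 - saving)
```

For an 800 kbps H.264 feed, the estimate comes out near 480 kbps, which is the kind of margin that matters on a shared radio link.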
Bitrate and keyframe interval
Bitrate selection determines the bandwidth load on the tactical link. Practical guidance by link type: 1.5–2.5 Mbps for a dedicated LTE backhaul with good signal; 800–1200 kbps for a managed Wi-Fi mesh; 400–800 kbps for a MANET radio with a 1–2 Mbps aggregate budget shared with CoT traffic; 200–400 kbps for a satellite or BLOS link. Below 300 kbps, H.264 1080p video becomes visually unacceptable – reduce to 720p or 480p resolution rather than further compressing 1080p.
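The per-link guidance above can be captured as a lookup table. This is a sketch under stated assumptions: the link-type keys and the function name are invented for the example, and the only numbers used are the ranges and the 300 kbps threshold from the text.

```python
# Recommended bitrate ranges in kbps, by link type, as listed above.
BITRATE_KBPS = {
    "lte": (1500, 2500),
    "wifi_mesh": (800, 1200),
    "manet": (400, 800),
    "satellite": (200, 400),
}


def pick_encoding(link_type: str, available_kbps: float) -> dict:
    """Choose a target bitrate and flag when 1080p should be stepped down."""
    low, high = BITRATE_KBPS[link_type]
    # Never exceed the recommended ceiling or the bandwidth actually available.
    bitrate = min(high, available_kbps)
    return {
        "bitrate_kbps": bitrate,
        "below_recommended_range": bitrate < low,
        # Below 300 kbps, drop to 720p or 480p instead of squeezing 1080p.
        "reduce_resolution": bitrate < 300,
    }
```

Calling `pick_encoding("satellite", 250)` flags a resolution step-down, while `pick_encoding("manet", 1000)` simply caps the feed at the top of the MANET range.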
The keyframe interval (I-frame interval) determines how quickly a new viewer can start displaying the stream. With a 1-second keyframe interval, a new connection waits at most 1 second for the next keyframe before it can display anything. A 4-second interval saves bandwidth but means new viewers wait up to 4 seconds for the first decodable frame. For tactical use, 1–2 seconds is the recommended interval. Note that shorter intervals raise the bitrate needed for a given quality, because I-frames are much larger than the predicted frames between them. At 800 kbps with a 1-second interval, one large I-frame arrives for roughly every 800 kbits of stream, which can cause brief bitrate spikes on the link.
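The trade-off is simple arithmetic, sketched below. The function name is hypothetical, and the average wait assumes viewers join at a uniformly random point within the interval, which is an assumption of this example rather than something the article states.

```python
def keyframe_tradeoff(bitrate_kbps: float, interval_s: float) -> dict:
    """Summarize join delay and stream volume per keyframe interval."""
    return {
        # A viewer joining just after a keyframe waits a full interval.
        "worst_case_join_wait_s": interval_s,
        # Assumes join times are uniformly distributed within the interval.
        "average_join_wait_s": interval_s / 2.0,
        # Amount of stream data carried between consecutive keyframes.
        "kbits_between_keyframes": bitrate_kbps * interval_s,
    }
```

At 800 kbps, a 1-second interval gives 800 kbits between keyframes and a 4-second interval gives 3200 kbits, at the cost of a worst-case join wait four times longer.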
Relay architecture for constrained and multi-hop networks
The direct RTSP model – ATAK clients connect to the source device's RTSP server – works on flat local networks but fails in most operational deployments. UAV ground control stations are on a different IP subnet from ATAK clients. Satellite or BLOS links require relay to bring remote feeds into the tactical network. Multiple simultaneous viewers stress the source device's upload bandwidth. A relay server addresses all three problems.
The relay pulls the stream from the source once – a single connection, single stream of bytes – and re-distributes it to any number of downstream consumers. The CoT Video event's URL is set to the relay's address, not the source. Each ATAK client connects to the relay and receives the same stream without increasing load on the source GCS or UAV link.
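The effect on the source link is easy to quantify. The following sketch uses a hypothetical function name and counts only video payload bandwidth, ignoring RTSP control traffic and packet overhead.

```python
def source_upload_kbps(bitrate_kbps: float, viewers: int, use_relay: bool) -> float:
    """Upload load on the source device for a given number of viewers."""
    # With a relay the source serves exactly one connection (the relay pull);
    # without one it serves a separate copy of the stream to every viewer.
    connections = 1 if use_relay else viewers
    return bitrate_kbps * connections
```

With eight viewers on an 800 kbps feed, the direct model asks the source for 6400 kbps, while the relay model holds it at 800 kbps no matter how many clients connect.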
A relay server node needs: a direct network path to the source (GCS network segment or satellite backhaul endpoint) and a path to the TAK network. In a standard forward command post deployment, the relay runs on the same physical node as TAK Server, or on a dedicated compute node at the command post. For edge deployments where the relay must run close to the source (near the GCS), the relay can forward streams over an OpenVPN or WireGuard tunnel into the tactical network, with a second relay instance at the TAK Server side redistributing locally.
Publishing video to the TAK network via CoT
The mechanics of the CoT Video event are straightforward but the detail fields matter. A minimal compliant Video event looks like this in the detail block:
<Video url="rtsp://RELAY_IP:8554/feed1" protocol="rtsp" alias="DRONE-ALPHA Camera" uid="VIDEO-DRONE-ALPHA-01" />
Additional optional fields carry codec hints (codec="H264"), network timeout in milliseconds (networkTimeout="10000"), and a buffer time hint (bufferTime="0" – setting this to zero signals to the player that low-latency mode is desired, suppressing the default consumer-grade buffer). The uid field is the persistent identifier – reusing the same UID in a later event updates the stream reference rather than creating a duplicate.
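A publisher can assemble the full event with the standard library. This is a hedged sketch: the Video attributes mirror the fields named in this section, the envelope uses the generic CoT event attributes, and the function name, the placeholder point values, the `how` code, and the 300-second stale window are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

COT_TIME = "%Y-%m-%dT%H:%M:%SZ"


def build_video_event(url: str, alias: str, uid: str, now: datetime,
                      stale_after_s: int = 300) -> str:
    """Build a CoT Video event (type b-i-v) as an XML string."""
    event = ET.Element("event", {
        "version": "2.0",
        "uid": uid,
        "type": "b-i-v",
        "time": now.strftime(COT_TIME),
        "start": now.strftime(COT_TIME),
        "stale": (now + timedelta(seconds=stale_after_s)).strftime(COT_TIME),
        "how": "m-g",  # assumed value for this example
    })
    # Placeholder point: a stream reference has no meaningful location here.
    ET.SubElement(event, "point", {
        "lat": "0.0", "lon": "0.0", "hae": "0.0",
        "ce": "9999999.0", "le": "9999999.0",
    })
    detail = ET.SubElement(event, "detail")
    ET.SubElement(detail, "Video", {
        "url": url,
        "protocol": url.split("://", 1)[0],  # rtsp, rtsps, or udp
        "alias": alias,
        "uid": uid,
        "codec": "H264",
        "networkTimeout": "10000",  # milliseconds
        "bufferTime": "0",          # request low-latency playback
    })
    return ET.tostring(event, encoding="unicode")
```

Calling the function again with the same `uid` and a new URL produces an update to the existing stream reference, in line with the UID semantics described above.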
For ATAK plugin developers publishing video from within a plugin: use the CotService API to inject the Video CoT event into ATAK's internal bus. The event then propagates to TAK Server over the active data link and from there to all group members. There is no separate video management API – the CoT event bus is the only distribution mechanism.
Geo-referencing: the sensor footprint overlay
A video feed displayed in the Video Receiver panel shows the operator what the camera sees. The map overlay shows every other operator where the camera is pointing. The two together – synchronized video and footprint – give the COP genuine spatial context that voice description cannot provide.
For UAV feeds, the footprint computation requires six inputs: the drone's WGS84 position (latitude, longitude, altitude above ground), the drone's heading, the gimbal's pan angle (horizontal pointing direction relative to the drone heading), the gimbal's tilt angle (depression angle below horizontal), and the camera's horizontal and vertical field of view. From these, four corner ray vectors are constructed and intersected with a terrain surface (a DTED elevation model, or a flat-earth approximation for low-altitude operations). The four intersection points form the footprint polygon vertices.
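The corner-ray construction can be sketched for the flat-earth case. This is an illustrative approximation, not a reference implementation: it assumes level ground at the drone's above-ground altitude, ignores gimbal roll, takes the drone heading as an explicit input because pan is defined relative to it, and converts metre offsets to latitude and longitude with a small-offset spherical approximation. All function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as a sphere radius


def footprint_offsets_m(alt_agl_m, heading_deg, pan_deg, tilt_deg,
                        hfov_deg, vfov_deg):
    """Return four (north_m, east_m) ground offsets from the point below the drone."""
    az = math.radians(heading_deg + pan_deg)  # boresight azimuth from north
    dep = math.radians(tilt_deg)              # depression below horizontal
    hh, hv = math.radians(hfov_deg / 2.0), math.radians(vfov_deg / 2.0)
    corners = []
    # Corner order: far-left, far-right, near-right, near-left (a simple polygon).
    for h, v in ((-hh, -hv), (hh, -hv), (hh, hv), (-hh, hv)):
        # Ray in the camera frame: x forward, y right, z down.
        x, y, z = 1.0, math.tan(h), math.tan(v)
        # Pitch the camera frame down by the depression angle.
        fwd = x * math.cos(dep) - z * math.sin(dep)
        down = x * math.sin(dep) + z * math.cos(dep)
        if down <= 1e-9:
            raise ValueError("corner ray is at or above the horizon")
        # Rotate about the vertical axis into north/east components.
        north = fwd * math.cos(az) - y * math.sin(az)
        east = fwd * math.sin(az) + y * math.cos(az)
        scale = alt_agl_m / down  # stretch the ray until it reaches the ground
        corners.append((north * scale, east * scale))
    return corners


def footprint_latlon(lat_deg, lon_deg, offsets_m):
    """Convert metre offsets to (lat, lon) degrees; valid for small offsets only."""
    lat0 = math.radians(lat_deg)
    return [(lat_deg + math.degrees(n / EARTH_RADIUS_M),
             lon_deg + math.degrees(e / (EARTH_RADIUS_M * math.cos(lat0))))
            for n, e in offsets_m]
```

The horizon check matters in practice: at shallow tilt angles the upper corner rays never reach flat ground, and a production system would need to clip the footprint at a maximum range or intersect with terrain instead.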
The footprint is published as a CoT GeoObject – a filled polygon CoT event – using a consistent UID tied to the drone or sensor. It updates at the telemetry rate. For map overlay purposes, 1–2 Hz updates are visually smooth and impose negligible CoT traffic load. At higher gimbal slew rates (fast panning during active tracking), 5 Hz provides better tracking of the footprint's movement on the map. The update rate should be independently configurable from the telemetry rate used for the drone's position track.
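Decoupling the footprint rate from the telemetry rate only needs a small throttle in the publisher. The class below is a sketch with hypothetical names; it works on integer millisecond timestamps supplied by the caller so that the logic stays deterministic.

```python
class FootprintThrottle:
    """Decide which telemetry samples should trigger a footprint update."""

    def __init__(self, rate_hz: float):
        # Minimum spacing between published footprints, in milliseconds.
        self.min_interval_ms = 1000.0 / rate_hz
        self._last_ms = None

    def should_publish(self, t_ms: int) -> bool:
        if self._last_ms is None or t_ms - self._last_ms >= self.min_interval_ms:
            self._last_ms = t_ms
            return True
        return False
```

With 10 Hz telemetry and a 2 Hz throttle, every fifth sample produces a footprint event; switching to `FootprintThrottle(5)` during active tracking needs no change to the telemetry path.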
Fixed cameras – tower-mounted, vehicle-mounted perimeter sensors – produce a static or slowly-changing footprint. For a fixed-azimuth camera, the footprint is a trapezoid oriented in the camera's heading direction, updated only when zoom level changes or the camera is repositioned. Publishing the footprint for fixed sensors follows the same CoT GeoObject mechanism; the publisher is the sensor management software rather than a UAV telemetry bridge.
Latency budget: display latency and its sources
Total display latency – from the moment a scene is captured by the camera to the moment it appears on the operator's screen – is the sum of five sequential delays:
Encoding latency: 30–80 ms on a capable hardware encoder running H.264. Software encoders on small UAV compute hardware can add 100–200 ms. This is not tunable by the ground station operator.
Network transit latency: 10–50 ms for line-of-sight MANET or LTE; 30–150 ms for multi-hop mesh; roughly 250–300 ms one-way for a single geostationary satellite hop (500–600 ms round trip). Satellite relay is the single largest fixed latency contributor and imposes a hard constraint on the achievable end-to-end latency for satellite-backhaul feeds.
Relay processing latency: less than 5 ms for a well-implemented RTSP relay. Negligible if the relay is on a local node.
Receive-side jitter buffer: the largest controllable latency source. A 2-second jitter buffer – the default in many consumer video players – adds 2 seconds of fixed latency to every frame. For tactical use, the jitter buffer should be reduced to 100–300 ms. The trade-off is increased visual artifacts during bursts of packet loss. This is an operational decision: accept occasional frame glitches in exchange for near-real-time imagery, or accept 2-second latency for smooth playback. For time-sensitive targeting, the 100–300 ms setting is mandatory.
Decode and display pipeline: 20–40 ms for hardware-accelerated H.264 decode on a modern Android device. Not tunable.
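Summing the five stages makes the dominance of the jitter buffer concrete. The sketch below uses a hypothetical `LatencyBudget` class, and the sample values are mid-range picks from the figures listed above, not measurements.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """The five sequential delays, all in milliseconds."""
    encode_ms: float
    network_ms: float
    relay_ms: float
    jitter_buffer_ms: float
    decode_ms: float

    def total_ms(self) -> float:
        return (self.encode_ms + self.network_ms + self.relay_ms
                + self.jitter_buffer_ms + self.decode_ms)


# Same line-of-sight link, two different receive-side buffer settings.
tuned = LatencyBudget(50, 30, 5, 200, 30)
consumer_default = LatencyBudget(50, 30, 5, 2000, 30)
```

Under these sample values the tuned configuration totals 315 ms and the consumer default totals 2115 ms, with the entire difference coming from the buffer setting.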
Key insight: In field deployments, the most common cause of unacceptable video latency is not the network or the encoder – it is the default jitter buffer setting in the ATAK Video Receiver. A buffer configured for consumer streaming behavior adds 2–4 seconds of latency regardless of network conditions. Verify buffer depth as part of every pre-mission system check and document the correct value in the unit's SOP.
Multi-stream management and operator UX
A single ATAK client can display multiple simultaneous video streams. The Video Receiver plugin presents registered streams by alias in a list; the operator taps to open a feed in a floating panel or full-screen. For units operating multiple UAVs simultaneously, the stream alias naming convention is the primary UX control: a consistent convention (DRONE-[CALLSIGN]-[SENSOR]) lets operators instantly identify the correct feed without trial and error.
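A naming convention is only useful if it is enforced at publication time. The following sketch builds and validates aliases in the DRONE-[CALLSIGN]-[SENSOR] form; the function name and the restriction to uppercase letters and digits are assumptions made for this example.

```python
import re

# DRONE-[CALLSIGN]-[SENSOR], restricted here to uppercase letters and digits.
ALIAS_PATTERN = re.compile(r"^DRONE-[A-Z0-9]+-[A-Z0-9]+$")


def build_alias(callsign: str, sensor: str) -> str:
    """Return a convention-compliant stream alias or raise ValueError."""
    alias = f"DRONE-{callsign.strip().upper()}-{sensor.strip().upper()}"
    if not ALIAS_PATTERN.match(alias):
        raise ValueError(f"alias does not follow the convention: {alias!r}")
    return alias
```

A publisher would call `build_alias("alpha", "eo")` before filling the alias field, so that malformed names are rejected before they reach operators' stream lists.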
Stream priority signals – embedded in the CoT Video event's remarks or in a companion CoT event – can be used by the C2 software to surface the highest-priority feed automatically. TAKpilot, for example, can receive a natural language operator command ("show me the Alpha drone camera") and bring the corresponding stream to the foreground by alias, without the operator navigating the stream list manually.
For units using WinTAK at the command post, the same CoT Video events distributed via TAK Server populate WinTAK's video list identically. WinTAK's larger screen real estate supports side-by-side video panels, making it the preferred platform for drone operators and C2 nodes monitoring multiple feeds simultaneously. The underlying stream protocol and relay architecture are identical regardless of whether the consumer is ATAK or WinTAK.
Bring live video into your TAK picture
TAKpilot integrates UAV video feeds, sensor footprint overlays, and natural language C2 into a single ATAK-based picture – so every operator on the network sees what the drone sees, geo-referenced and in context. Video link publication, relay management, and stream lifecycle are handled automatically.
This analysis was prepared by Corvus Intelligence engineers who build mission-critical ISR and field applications for defense and government organizations.