A modern SIGINT system is a software pipeline that consumes complex baseband samples — IQ data — and produces intelligence. Antennas and analog front-ends exist to deliver those samples cleanly; everything that turns those samples into a track, a transcript, a classification, or a geolocation happens in software. This article walks the pipeline end-to-end: hardware, capture, channelization, demodulation, direction finding, acceleration, integration with the operational picture, and storage. The audience is engineers and program managers building or evaluating a software-defined-radio (SDR) stack for defense use.
Adjacent reading: our overview of SIGINT platform components covers system-level decomposition; this article zooms into the signal-processing core that sits between the antenna and the analyst.
1. The SDR Pipeline Stack
The pipeline has three layers. The hardware front-end covers the antenna, low-noise amplifier, filtering, mixing, and analog-to-digital conversion. In defense and research deployments the Ettus USRP family (X310, X410, N320) with RFNoC is the workhorse: large instantaneous bandwidth, GPS-disciplined timing, and an FPGA fabric that lets you push DSP onto the radio itself. ADALM-Pluto serves training and low-budget COTS work. For tactical embedded use, AMD/Xilinx Zynq UltraScale+ RFSoC parts put RF-sampling ADCs and DACs, FPGA fabric, and Arm processors on one chip. That is attractive for SWaP-constrained nodes on drones or man-pack collectors.
Driver and transport layer exposes the radio to user-space software. UHD is the USRP-native API and the canonical reference. SoapySDR is a vendor-neutral abstraction that lets the same pipeline target USRP, LimeSDR, BladeRF, HackRF, or Pluto with a runtime configuration change — invaluable when fielded collectors are heterogeneous. VITA 49 (VRT) is the standard wire format for streamed IQ in larger architectures; if you intend to share IQ between subsystems or across the network, plan for VITA 49 from day one rather than retrofitting it.
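Every VITA 49 consumer starts by decoding the 32-bit header word that opens each packet. Below is a minimal sketch of that decode. It assumes the VITA 49.0 header layout, with big-endian words, packet type in bits 31-28, and packet size in 32-bit words in bits 15-0. It does not parse the stream ID, timestamps, or context fields. The class and function names are our own, not from any VRT library.

```python
import struct
from dataclasses import dataclass


@dataclass
class VrtHeader:
    packet_type: int        # bits 31-28: e.g. 0/1 = signal data, 4 = context
    class_id_present: bool  # bit 27
    trailer_present: bool   # bit 26 (meaningful for data packets)
    tsi: int                # bits 23-22: integer timestamp (0 none, 1 UTC, 2 GPS, 3 other)
    tsf: int                # bits 21-20: fractional timestamp (0 none, 1 samples, 2 real time, 3 free running)
    packet_count: int       # bits 19-16: modulo-16 counter per stream
    size_words: int         # bits 15-0: whole packet length in 32-bit words


def parse_vrt_header(packet: bytes) -> VrtHeader:
    """Decode the first 32-bit word of a VITA 49 packet (big-endian on the wire)."""
    (word,) = struct.unpack_from(">I", packet, 0)
    return VrtHeader(
        packet_type=(word >> 28) & 0xF,
        class_id_present=bool((word >> 27) & 0x1),
        trailer_present=bool((word >> 26) & 0x1),
        tsi=(word >> 22) & 0x3,
        tsf=(word >> 20) & 0x3,
        packet_count=(word >> 16) & 0xF,
        size_words=word & 0xFFFF,
    )


def packet_is_complete(packet: bytes, header: VrtHeader) -> bool:
    """True if the buffer holds at least as many bytes as the header declares."""
    return len(packet) >= header.size_words * 4
```

A receiver that checks `packet_count` continuity per stream can detect dropped packets before they silently corrupt downstream DSP. That is one reason to adopt the format early.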
Processing framework is where the DSP graph lives. GNU Radio is the community-standard Python/C++ flowgraph environment with a rich block library. It is fast for prototyping and increasingly production-capable. REDHAWK is an open-source framework developed under US government sponsorship and a common choice in government SIGINT programs. It offers a CORBA-based component model, native FPGA integration, and a deployment story aligned with defense IT. Custom pipelines (raw C++/CUDA, or Rust where teams have invested) appear when latency or determinism requirements are tighter than flowgraph runtimes can meet.
2. IQ Capture and Storage
The first hard problem is volume. The storage math is unforgiving: bytes per second equals sample rate × 2 (I and Q) × bytes per component. A 100 MS/s capture at 16-bit IQ is 400 MB/s. That is 1.44 TB/hour and about 34.6 TB/day, per receiver. Double that for 32-bit float intermediate stages. A four-channel coherent collector at 200 MS/s with 16-bit IQ produces 3.2 GB/s (25.6 Gb/s), which saturates a 10 GbE link before any processing happens.
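The arithmetic is easy to script, so it can be rerun for every collector configuration. Here is a small sketch. `bytes_per_component` is 2 for 16-bit IQ, 4 for 32-bit float, and 1.5 for 12-bit packed samples. The function name is our own.

```python
def iq_data_rate(sample_rate_sps, bytes_per_component, channels=1):
    """Bytes per second for complex IQ: rate x 2 components (I, Q) x bytes each."""
    return sample_rate_sps * 2 * bytes_per_component * channels


single = iq_data_rate(100e6, 2)                 # 100 MS/s, 16-bit IQ
print(f"{single / 1e6:.0f} MB/s, {single * 3600 / 1e12:.2f} TB/hour, "
      f"{single * 86400 / 1e12:.2f} TB/day")
four = iq_data_rate(200e6, 2, channels=4)       # four coherent channels at 200 MS/s
print(f"{four * 8 / 1e9:.1f} Gb/s vs 10 Gb/s for 10 GbE")
```

Passing 1.5 for 12-bit packed samples drops the single-receiver figure from 400 MB/s to 300 MB/s before any other reduction is applied.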
Three levers reduce the bill. First, capture at the radio's native sample width (12 or 14 bits packed) rather than promoting to 16 or 32 bits at the wire. Second, channelize early — record only the sub-bands of interest rather than the full passband. Third, use SigMF (Signal Metadata Format) as the file convention. SigMF stores IQ in a binary .sigmf-data file alongside a JSON .sigmf-meta file containing sample rate, center frequency, datatype, timing, geolocation, and arbitrary annotations. It is the closest thing the community has to a portable IQ standard, and unlike vendor-proprietary formats it survives the analyst pipeline.
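Below is a minimal sketch of writing a SigMF file pair using only the Python standard library. In practice the official `sigmf` Python package is the better choice. The key names follow the SigMF core namespace as we understand the 1.0 specification: `core:datatype`, `core:sample_rate`, `core:frequency`, `core:datetime`, and the annotation fields `core:sample_start`, `core:sample_count`, and `core:label`. Check them against the spec before relying on them. `write_sigmf` is a hypothetical helper.

```python
import json
import struct
from pathlib import Path


def write_sigmf(basename, iq, sample_rate, center_freq, datetime_iso, annotations=()):
    """Write interleaved int16 IQ (ci16_le) and a minimal SigMF metadata file.

    iq: iterable of (I, Q) integer pairs; annotations: (start, count, label) tuples.
    """
    data_path = Path(basename + ".sigmf-data")
    meta_path = Path(basename + ".sigmf-meta")
    with data_path.open("wb") as f:
        for i_val, q_val in iq:
            f.write(struct.pack("<hh", i_val, q_val))  # little-endian I then Q
    meta = {
        "global": {
            "core:datatype": "ci16_le",
            "core:sample_rate": sample_rate,
            "core:version": "1.0.0",
        },
        "captures": [
            {"core:sample_start": 0, "core:frequency": center_freq,
             "core:datetime": datetime_iso},
        ],
        "annotations": [
            {"core:sample_start": start, "core:sample_count": count, "core:label": label}
            for start, count, label in annotations
        ],
    }
    meta_path.write_text(json.dumps(meta, indent=2))
    return data_path, meta_path
```

Because the metadata is plain JSON, downstream tools can index captures by frequency, time, or annotation label without touching the binary IQ.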
For long-term retention, lossless compression (FLAC adapted for IQ, or general-purpose codecs such as Zstandard on packed integer IQ) yields 1.3–1.8× reduction with full reconstructibility. The achievable ratio depends on how much of the capture is noise, since noise does not compress. Lossy compression (quantization, decimation, spectral pruning) is acceptable only if the downstream use is bounded. Once you have thrown bits away, you cannot rerun the demod chain on a different hypothesis.
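A few lines are enough to show the difference between the two compression classes. A lossless codec round-trips packed integer IQ bit for bit, while requantization cannot be undone. The sketch uses `zlib` from the standard library as a stand-in for Zstandard. Both are lossless byte-stream codecs, though their ratio and speed differ. The 8-bit requantizer is an illustrative lossy step, not a recommended one.

```python
import struct
import zlib


def pack_ci16(samples):
    """Interleave (I, Q) integer pairs as little-endian int16, like SigMF ci16_le."""
    return b"".join(struct.pack("<hh", i, q) for i, q in samples)


def unpack_ci16(blob):
    return [struct.unpack_from("<hh", blob, off) for off in range(0, len(blob), 4)]


def compress_lossless(samples, level=9):
    # zlib stands in for Zstandard here: same lossless contract, different codec.
    return zlib.compress(pack_ci16(samples), level)


def decompress_lossless(blob):
    return unpack_ci16(zlib.decompress(blob))


def requantize_lossy(samples, keep_bits=8):
    """Zero the low (16 - keep_bits) bits of each component: smaller, but irreversible."""
    shift = 16 - keep_bits
    return [((i >> shift) << shift, (q >> shift) << shift) for i, q in samples]
```

The lossless path gives back exactly the samples that went in, so any later demod hypothesis sees the original signal. The requantized samples do not, and no decoder can recover the discarded bits.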
3. Channelization
Wide-band SIGINT receivers capture tens or hundreds of MHz at once. The processing pipeline almost never operates on the full passband directly — it splits the wideband stream into narrow channels for per-signal analysis. This is channelization, and the algorithm of choice is the polyphase filterbank (PFB) channelizer.
A PFB channelizer combines a prototype low-pass filter, polyphase decomposition, and an FFT. From one wideband input it produces N evenly spaced narrowband output streams, at a fraction of the cost of running N independent down-converters. The trade-off is rigidity. Channel spacing is fixed at the input rate divided by the FFT size, so a 1024-point PFB on a 100 MS/s stream gives you 1024 channels of ~97.7 kHz each, whether or not your target signals fit that grid.
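Below is a minimal sketch of a critically sampled PFB analysis channelizer, written for clarity rather than speed. A Hamming-windowed-sinc prototype is split into N polyphase branches. Each branch filters the decimated input, and an N-point DFT across the branch outputs forms the channels. The sketch assumes complex baseband input with channel k centred on k × fs / N. It uses a naive O(N²) DFT where production code would call an FFT library. The function names and the taps-per-branch parameter are our own.

```python
import cmath
import math


def prototype_lowpass(num_channels, taps_per_branch):
    """Hamming-windowed sinc with cutoff at half the channel spacing, unity DC gain."""
    m = num_channels * taps_per_branch
    fc = 0.5 / num_channels            # cutoff in cycles/sample
    centre = (m - 1) / 2
    taps = []
    for n in range(m):
        t = n - centre
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (m - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [tap / gain for tap in taps]


def pfb_channelize(x, num_channels, taps):
    """Return out[m][k]: output sample m of channel k (decimated by num_channels)."""
    n_ch = num_channels
    branches = len(taps) // n_ch
    out = []
    # Start once the full filter history is available.
    for m in range(branches, (len(x) - 1) // n_ch + 1):
        base = m * n_ch
        # Polyphase branch r applies taps h[p*N + r] to samples x[base - p*N - r].
        v = []
        for r in range(n_ch):
            acc = 0j
            for p in range(branches):
                acc += taps[p * n_ch + r] * x[base - p * n_ch - r]
            v.append(acc)
        # DFT across branches (kernel e^{+j2*pi*k*r/N}) shifts channel k to DC.
        out.append([sum(v[r] * cmath.exp(2j * math.pi * k * r / n_ch) for r in range(n_ch))
                    for k in range(n_ch)])
    return out
```

A tone sitting exactly at 3 × fs / 8 comes out of an 8-channel bank almost entirely in channel 3. With N = 1024 at 100 MS/s, the same structure produces the 1024-channel grid described above.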
For irregular channel plans (a TETRA cluster at 25 kHz spacing co-resident with LTE at 1.4 MHz), a two-stage approach is standard: a coarse PFB for the wideband split, then per-channel arbitrary tuners and resamplers. FFT-based channelizers (overlap-save with frequency-domain windowing) are an alternative when channels are sparse and irregular. You pay more per channel but avoid the prototype-filter design effort. The right choice depends on channel occupancy: dense uniform grids favor PFB, and sparse cherry-picked channels favor FFT-shift-extract.
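The second stage of that approach, an arbitrary tuner followed by a resampler, works in three steps. Mix the chosen offset down to 0 Hz with a complex exponential, low-pass filter the result, and keep every D-th sample. The sketch below assumes an integer decimation factor and a Hamming-windowed-sinc filter. A polyphase arbitrary resampler would replace the final step when the output rate is not an integer divisor of the input rate. The helper names are hypothetical.

```python
import cmath
import math


def lowpass_taps(cutoff_hz, fs, num_taps=63):
    """Hamming-windowed sinc low-pass, normalised to unity gain at DC."""
    fc = cutoff_hz / fs
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - centre
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        taps.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))))
    gain = sum(taps)
    return [tap / gain for tap in taps]


def tune_and_decimate(x, fs, f_offset, decim, taps):
    """Shift f_offset to 0 Hz, low-pass filter, and keep every decim-th output."""
    mixed = [s * cmath.exp(-2j * math.pi * f_offset * n / fs) for n, s in enumerate(x)]
    # Only compute the FIR outputs that survive decimation.
    return [
        sum(taps[i] * mixed[n - i] for i in range(len(taps)))
        for n in range(len(taps) - 1, len(mixed), decim)
    ]
```

Tuning to a signal at +200 kHz in a 1 MS/s stream with a 25 kHz cutoff and D = 10 leaves that signal at DC. An interferer at +350 kHz lands deep in the stopband.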
4. Demodulation and Decoding
Once a signal is isolated in its own narrow channel, the pipeline classifies and demodulates it. The waveforms you will meet repeatedly in defense work include:

- narrowband FM and SSB (legacy voice, amateur, maritime)
- DMR and dPMR (digital land-mobile radio used widely in eastern Europe and by some paramilitaries)
- TETRA (public-safety and military trunked radio)
- P25 (North American public-safety)
- LTE/5G NR (commercial cellular that is increasingly co-opted for tactical comms)

Each has a known demod chain (symbol timing recovery, carrier sync, equalization, slot framing, error correction) and a published spec.
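Narrowband FM has the simplest chain on that list: demodulation reduces to measuring the phase step between consecutive samples. Below is a minimal sketch of a polar (quadrature) discriminator. It assumes the input is already channelized to complex baseband and that the peak deviation is known. De-emphasis, squelch, and audio resampling are omitted, and the function name is our own.

```python
import cmath
import math


def fm_discriminate(iq, fs, deviation_hz):
    """Polar discriminator: the phase step between consecutive samples is the
    instantaneous frequency; scale it so peak deviation maps to +/-1."""
    scale = fs / (2 * math.pi * deviation_hz)
    return [cmath.phase(cur * prev.conjugate()) * scale for prev, cur in zip(iq, iq[1:])]
```

Multiplying by the conjugate of the previous sample avoids explicit phase unwrapping. This works as long as the per-sample phase step stays below π, that is, the deviation is under fs / 2.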
The hard part is knowing which demod chain to run. Automatic modulation classification (AMC) sits in front of demodulation: given an unknown signal, infer the modulation family (PSK, FSK, QAM, OFDM, GMSK, …) and order, then dispatch to the matching demodulator. Classical AMC uses high-order cumulants and cyclostationary features; modern AMC is dominated by CNN and transformer models trained on synthetic and over-the-air IQ datasets. Our signal classification with ML article covers the model side in depth; the integration point with the SDR pipeline is a classifier block that consumes a fixed-window IQ tensor and emits a modulation label plus confidence into the metadata stream.
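The normalised fourth-order cumulant |C40| / C21² makes the classical approach concrete. Theory gives 2 for BPSK, 1 for QPSK, and 0.68 for square 16-QAM. The sketch below assumes noise-free, symbol-rate samples with timing and carrier already recovered. A real classifier would combine several cumulants (C42 and higher orders) or cyclostationary features and account for SNR. The function names and the nearest-template rule are illustrative.

```python
# Theoretical |C40| / C21^2 for noise-free constellations (the ratio is scale-invariant).
THEORETICAL_C40 = {"BPSK": 2.0, "QPSK": 1.0, "16QAM": 0.68}


def normalized_c40(symbols):
    """|C40| / C21^2 of a complex symbol stream, after removing the mean."""
    n = len(symbols)
    mean = sum(symbols) / n
    x = [s - mean for s in symbols]
    c20 = sum(s * s for s in x) / n            # E[x^2]
    c21 = sum(abs(s) ** 2 for s in x) / n      # E[|x|^2], the signal power
    m40 = sum(s ** 4 for s in x) / n           # E[x^4]
    c40 = m40 - 3 * c20 ** 2                   # fourth-order cumulant for zero-mean x
    return abs(c40) / c21 ** 2


def classify_by_c40(symbols):
    """Nearest-template label: an illustrative rule, not a fielded classifier."""
    feature = normalized_c40(symbols)
    return min(THEORETICAL_C40, key=lambda label: abs(THEORETICAL_C40[label] - feature))
```

Features like this are cheap enough to run on every channel. This is why cumulant-based AMC can still serve as a pre-filter in front of heavier CNN or transformer models.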
5. Direction Finding
Locating an emitter is a distinct sub-pipeline. Three techniques dominate:

- Time Difference of Arrival (TDOA) uses spatially separated receivers and measures the difference in arrival time at each receiver pair. Each pair confines the emitter to one hyperbola, so a 2D fix needs at least three receivers. TDOA is accurate at long baselines but requires tight time synchronization.
- Angle of Arrival (AOA) uses an antenna array at a single site and direction-finding algorithms (MUSIC, ESPRIT, Watson-Watt) to estimate bearing. It is cheaper to deploy than TDOA, but accuracy degrades with multipath.
- Frequency Difference of Arrival (FDOA) exploits Doppler differences between moving receivers and is useful for airborne or satellite collection.
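Hyperbolic localization can be sketched as a brute-force least-squares search. For each candidate point, predict the range differences to each receiver pair and keep the point that best matches the measured TDOAs. This illustrative sketch assumes a flat 2D plane, line-of-sight propagation, and noise-free measurements. Fielded systems use iterative (for example Gauss-Newton) or closed-form solvers weighted by measurement covariance. The function names are our own.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def tdoa_measurements(emitter, receivers):
    """Arrival-time differences (s) at receivers[1:] relative to receivers[0]."""
    t = [math.dist(emitter, rx) / C for rx in receivers]
    return [ti - t[0] for ti in t[1:]]


def tdoa_grid_search(tdoas, receivers, extent_m, step_m):
    """Grid point whose predicted range differences best fit the measured TDOAs."""
    best, best_cost = None, math.inf
    steps = int(extent_m / step_m) + 1
    for ix in range(steps):
        for iy in range(steps):
            p = (ix * step_m, iy * step_m)
            r0 = math.dist(p, receivers[0])
            # Sum of squared range-difference residuals over all receiver pairs.
            cost = sum((math.dist(p, rx) - r0 - C * dt) ** 2
                       for rx, dt in zip(receivers[1:], tdoas))
            if cost < best_cost:
                best, best_cost = p, cost
    return best
```

With four receivers on the corners of a 10 km square and a 50 m grid, the search recovers an emitter placed on a grid point exactly. With real, noisy TDOAs, the minimum spreads into the error region discussed below.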
All three need synchronized receivers. GPS-disciplined oscillators (GPSDOs) give you ~10 ns RMS timing across an ad-hoc network; for higher accuracy, OCXO or rubidium references with PTP-over-fiber transport are the next step. White Rabbit pushes sub-nanosecond sync where the geometry can carry fiber. The direction-finding network architecture article details the synchronization stack.
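Converting timing error into range-difference error, by multiplying by the speed of light, makes those synchronization numbers easier to compare. The sketch treats each figure as the pairwise timing error between two receivers, which is a simplifying assumption. It uses 1 ns as a conservative bound for White Rabbit's sub-nanosecond sync. The function name is our own.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def range_error_m(timing_error_s):
    """Range-difference error (m) produced by a pairwise timing error (s)."""
    return C * timing_error_s


for label, sigma_t in [("GPSDO, ~10 ns", 10e-9), ("White Rabbit, <1 ns", 1e-9)]:
    print(f"{label}: {range_error_m(sigma_t):.2f} m")
```

Roughly 3 m per 10 ns and 0.3 m per nanosecond is the conversion to keep in mind when trading a GPSDO against fiber-based timing.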
The often-ignored reality is geometric dilution of precision (GDOP). Even with excellent timing and SNR, the geometry of your receivers relative to the target decides how residual measurement error maps into a position-error ellipse. For a target broadside to a linear baseline, the ellipse is long and thin and stretched perpendicular to the baseline. Bearing (cross-range) is well determined, and range along the line of sight is poorly determined. Planning collection geometry is a SIGINT engineering problem, not just an antenna problem.
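The ellipse shape follows from the linearized TDOA geometry. Each receiver pair contributes a row equal to the difference of the unit vectors pointing from each receiver to the target. The position covariance is then σ²(HᵀH)⁻¹. The sketch assumes independent, equal range-difference errors, which ignores the correlation that a shared reference receiver introduces, and a flat 2D plane. It reports the 1σ semi-axes per metre of range-difference error. The function name is our own.

```python
import math


def tdoa_error_ellipse(receivers, target, sigma_range_m=1.0):
    """Return (semi_major_m, semi_minor_m, major_axis_angle_deg) of the 1-sigma
    TDOA position-error ellipse, with receivers[0] as the reference.
    The angle is measured counterclockwise from the +x axis."""

    def unit_toward_target(rx):
        dx, dy = target[0] - rx[0], target[1] - rx[1]
        dist = math.hypot(dx, dy)
        return dx / dist, dy / dist

    ux0, uy0 = unit_toward_target(receivers[0])
    # Each row is the gradient of one range difference with respect to position.
    rows = []
    for rx in receivers[1:]:
        ux, uy = unit_toward_target(rx)
        rows.append((ux - ux0, uy - uy0))

    # Normal matrix H^T H = [[a, b], [b, d]].
    a = sum(gx * gx for gx, _ in rows)
    b = sum(gx * gy for gx, gy in rows)
    d = sum(gy * gy for _, gy in rows)
    det = a * d - b * b
    if det <= 0:
        raise ValueError("degenerate geometry: position is unobservable")

    # Covariance = sigma^2 * (H^T H)^-1 under independent, equal errors.
    s2 = sigma_range_m ** 2
    cxx, cxy, cyy = s2 * d / det, -s2 * b / det, s2 * a / det

    # Eigen-decomposition of the symmetric 2x2 covariance.
    half_trace = (cxx + cyy) / 2
    spread = math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    major, minor = half_trace + spread, half_trace - spread
    angle_deg = 0.5 * math.degrees(math.atan2(2 * cxy, cxx - cyy))
    return math.sqrt(major), math.sqrt(max(minor, 0.0)), angle_deg
```

Take three receivers in a line 5 km apart and a target 30 km broadside. The major axis points perpendicular to the baseline (about ±90°) and is roughly an order of magnitude longer than the minor axis. Surrounding the target with receivers shrinks both axes.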
6. GPU and FPGA Acceleration
At wideband rates, CPUs run out of headroom. Two acceleration paths dominate, and each has a domain where it wins.
GPU (CUDA) wins where the workload is data-parallel and tolerant of latency. Large FFTs, PFB channelization, batched matched-filter correlation, ML inference, and wideband search are textbook GPU workloads. cuFFT and GPU-accelerated GNU Radio block sets make this accessible. A single data-center GPU such as an A100 or L40 can channelize and pre-classify a few hundred MHz of spectrum at real-time rates. The cost is latency. PCIe transfers plus kernel-launch overhead put you in the millisecond range, which is fine for analysis but not for closed-loop EW.
FPGA wins where latency is sub-millisecond or where the pipeline is fixed and the volume is too high to leave the radio. Initial channelization on RFNoC, low-latency demod for protected-data links, deinterleaving of pulsed radar emitters, and any decision-loop that must beat a single radio frame all belong on the FPGA. The cost is development time: an FPGA-resident algorithm is 3–10× the engineering effort of the equivalent GPU kernel, and integration into the toolchain (HLS, simulation, timing closure) demands specialized skills. Our model optimization for edge inference piece covers the related question of squeezing ML models into edge-class hardware.
The practical pattern is a hybrid: FPGA for the front-end channelizer and any hard-real-time loop, GPU for the bulk of the post-channelization analytics, CPU for control and orchestration. Avoid debating CPU-vs-GPU-vs-FPGA in the abstract — pick per-stage, by latency budget and unit economics.
7. Integration with the Operational Picture
A SIGINT pipeline that produces standalone detections is half-built. The outputs (emitter tracks, modulation labels, geolocation ellipses, decoded payloads) must flow into the broader common operational picture, where they are fused with EO/IR, radar, and other sources. Treat each SIGINT detection as a track with the same fields a radar track would carry: identifier, position with uncertainty, time, classification, and source attribution. Fusion engines and track managers expect this contract, and it maps onto the track reporting carried by tactical data links such as Link 16 (NATO STANAG 5516). The defense data fusion guide covers the receiving end.
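One way to pin down that contract is a typed track record. Every SIGINT product is converted into it before leaving the pipeline. The field names below are hypothetical and do not follow a Link 16 or STANAG message layout. They cover the minimum set listed above, with position uncertainty as an error ellipse and the security marking carried as a first-class field.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SigintTrack:
    track_id: str            # stable identifier across updates
    lat_deg: float           # position estimate
    lon_deg: float
    semi_major_m: float      # 1-sigma error ellipse
    semi_minor_m: float
    orientation_deg: float   # major-axis orientation (convention is ours: from north)
    time_utc: str            # ISO 8601 observation time
    emitter_type: str        # e.g. modulation label from the AMC stage
    confidence: float        # classifier confidence, 0..1
    source: str              # collector / pipeline attribution
    security_marking: str    # classification of this product, enforced downstream

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SigintTrack":
        return cls(**json.loads(text))
```

Making the record immutable and serializing every field means a fusion engine never receives a SIGINT track without its uncertainty or marking.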
Classification handling matters operationally. Raw IQ is frequently classified; derived products (a bearing, a modulation label, a network-id) are often less classified than the source IQ. The pipeline must carry per-product classification metadata and enforce release rules at the fusion boundary — if you let raw IQ leak into an unclassified track stream, the program is over. The first stage of a fusion pipeline (sources and schemas) is where classification labels become enforceable schema fields.
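Release rules at the boundary can be enforced with a filter that compares each product's marking with the destination's level and refuses certain product kinds outright. This is a deliberately simplified sketch. The four-level ordering and the `NEVER_RELEASE` set are hypothetical. Real marking schemes also carry compartments, caveats, and releasability controls that a linear comparison cannot express.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical linear ordering used only for this sketch.
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


@dataclass(frozen=True)
class Product:
    kind: str        # e.g. "raw_iq", "bearing", "modulation_label"
    level: Level     # marking of this specific product
    payload: object


# Product kinds that must never cross the boundary, whatever their marking says.
NEVER_RELEASE = {"raw_iq"}


def release_filter(products, boundary_level):
    """Pass only products at or below the boundary's level; raw IQ never crosses."""
    return [p for p in products
            if p.kind not in NEVER_RELEASE and p.level <= boundary_level]
```

The kind check runs regardless of the marking, so a mislabeled raw-IQ product is still stopped.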
Cross-cueing is the high-value pattern: a SIGINT detection triggers an EO/IR sensor to slew and confirm, or a radar track triggers a directional SIGINT receiver to listen at the bearing. This requires the pipeline to publish detections fast enough — within seconds of first detection — for the receiving sensor to act before the emitter moves or stops transmitting.
8. Storage, Retrieval, and Replay
SIGINT systems retain three tiers of data. The hot tier holds recent IQ (hours to days), typically on NVMe arrays sized for the full ingest rate. This is where forensic replay and re-demodulation happen. The warm tier holds reduced products (channelized narrowband captures, detection metadata, classified-as-interesting clips) on object storage, with slower retrieval than the hot tier. The cold tier holds long-term archives on tape or deep object storage, generally only the metadata index plus selectively retained IQ snippets.
Query engines for the metadata layer split along a familiar axis. kdb+ remains a strong choice for tick-style time series, with very low query latency over large windows. It has financial roots but is a natural fit for emitter pulse trains and dense detection streams. ClickHouse is the open-source heavy-hitter: a column store, embarrassingly fast on aggregations, and a common pick for SIGINT analytics where licensing budgets matter. Custom time-series engines (purpose-built over Parquet plus a track index) appear when the schema is too irregular for either off-the-shelf option.
The replay pattern is the operational payoff. An analyst flags a detection at 14:32; the pipeline retrieves the matching IQ window from the hot tier, re-runs the demod with the analyst's hypothesis (different modulation, different framing, different equalizer), and presents the result alongside the original detection. The same replay path supports training: synthetic detections injected into archived IQ become a regression suite for the classifier. Build replay in from day one — bolting it on later means rebuilding the storage layer.
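The first step of replay is mechanical: turn the analyst's timestamp into a sample range and a byte offset in the recorded IQ file. The sketch assumes a single contiguous capture with a known UTC start time, such as a single SigMF capture segment, and fixed-size samples. Multi-segment captures or dropped-sample gaps would require consulting the capture index. The function and table names are our own.

```python
from datetime import datetime, timezone

# Bytes per complex sample (I and Q together) for two common SigMF datatypes.
BYTES_PER_SAMPLE = {"ci16_le": 4, "cf32_le": 8}


def replay_window(capture_start, sample_rate, detection_time, pre_s, post_s,
                  datatype="ci16_le"):
    """Map a detection timestamp to (sample_start, sample_count, byte_offset)
    in a single contiguous recording that began at capture_start."""
    offset_s = (detection_time - capture_start).total_seconds()
    start = max(0, round((offset_s - pre_s) * sample_rate))
    end = round((offset_s + post_s) * sample_rate)
    return start, end - start, start * BYTES_PER_SAMPLE[datatype]


if __name__ == "__main__":
    start_utc = datetime(2024, 5, 1, 14, 0, tzinfo=timezone.utc)
    flagged = datetime(2024, 5, 1, 14, 32, tzinfo=timezone.utc)
    print(replay_window(start_utc, 1e6, flagged, pre_s=0.5, post_s=1.5))
```

Take a 1 MS/s ci16 recording started at 14:00 UTC. A detection at 14:32 with 0.5 s of pre-roll and 1.5 s of post-roll maps to 2,000,000 samples starting at sample 1,919,500,000, about 7.68 GB into the file. That window is what gets handed to the re-demodulation chain.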
From IQ to Intelligence, Honestly
The pipeline above is not exotic. Every mature SIGINT program runs some version of it, and the engineering bar is well understood. What separates a competent SDR pipeline from a brittle one is discipline at the boundaries:

- SigMF instead of bespoke binary blobs
- VITA 49 instead of ad-hoc UDP
- classification-aware schemas instead of plaintext metadata
- replay as a first-class feature instead of an afterthought

Those decisions are cheap on day one and expensive on day 800.