Modern SIGINT collection produces vastly more signals than human analysts can review. A wideband receiver covering 100 MHz of spectrum in a dense electromagnetic environment may detect thousands of individual signal events per hour. Before any intelligence can be extracted from these signals, they must be classified — sorted into categories that determine how each will be processed and prioritized. Manual classification at this scale is impossible. Automated signal classification, increasingly powered by machine learning, is what makes large-scale SIGINT collection operationally tractable.

Signal classification in the SIGINT context encompasses several distinct but related tasks: determining the modulation type of a signal (AM, FM, PSK, QAM, etc.), identifying the waveform or communication protocol (military tactical radio, LTE, Bluetooth, a specific adversary radio type), and assigning intelligence relevance (military-of-interest, commercial, unknown). ML approaches have proven effective at all three levels, though the technical requirements and appropriate algorithms differ significantly between them.

The Classification Task: Modulation Type, Waveform, and Protocol

Automatic Modulation Classification (AMC) is the most studied signal classification problem in the communications engineering literature and has the longest history of practical deployment. Given a segment of received IQ samples, AMC determines the modulation scheme used: whether the signal is amplitude-modulated (AM, DSB, USB), frequency-modulated (FM, FSK), or phase/amplitude modulated (BPSK, QPSK, QAM-16, QAM-64, and so on). This classification is foundational — a BPSK signal and a QAM-64 signal require completely different demodulation chains, and misclassification means the signal cannot be decoded.

Beyond modulation type, waveform identification attempts to recognize specific communication standards or radio types from their signal characteristics. A TETRA signal has different spectral and temporal characteristics than a military Link 16 waveform, even if both are digital. A specific adversary tactical radio may have distinctive pulse shaping, guard intervals, or synchronization sequences that distinguish it from other signals in the same modulation class. Protocol identification — determining which communication protocol is in use — requires either demodulating and inspecting the bitstream, or recognizing protocol-specific patterns in the physical layer signal structure.

Intelligence relevance classification is the highest-level task: given a signal whose technical class has already been determined, assigning it a priority score that determines how quickly it will be reviewed and with what resources. This requires combining the technical classification result with contextual information — the frequency band, the operational area, time of day, and the history of signals observed from the same emitter — to produce a score that reflects the likelihood that the signal contains actionable intelligence.
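As a toy illustration of how such a score might combine technical confidence with context, consider the following hypothetical weighting scheme. The `relevance_score` function, its weights, and the factor names are all invented for illustration; real scoring models are tuned to mission requirements.

```python
# Hypothetical sketch: scale the classifier's confidence by contextual
# weights for frequency band and operational area, and raise priority
# gradually for emitters seen repeatedly. All weights are illustrative.

def relevance_score(class_confidence, band_weight, area_weight,
                    emitter_history_hits):
    # Cap the history bonus so a chatty emitter cannot dominate the queue
    history_factor = 1.0 + min(emitter_history_hits, 10) * 0.05
    return class_confidence * band_weight * area_weight * history_factor

# A high-confidence classification in a weighted band and area,
# from an emitter observed four times before
score = relevance_score(0.92, band_weight=1.5, area_weight=1.2,
                        emitter_history_hits=4)
print(round(score, 4))  # 1.9872
```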

Feature Engineering: Spectrograms, IQ Samples, and Eye Diagrams

Machine learning models require numerical feature representations of the signals they classify. The choice of feature representation has substantial impact on model performance and the type of ML architecture that is appropriate.

Raw IQ samples. The most direct representation is a segment of raw IQ samples — complex-valued time-series data directly from the receiver. Convolutional neural networks can learn classification-relevant features directly from raw IQ data without hand-crafted feature engineering. The DeepSig RadioML dataset, which has become a benchmark in the research community, demonstrates that CNNs trained on raw IQ data outperform many classical AMC algorithms based on hand-crafted features. However, raw IQ samples are sensitive to channel effects — carrier frequency offset, channel noise, and multipath — that must be handled in the model or preprocessing pipeline.
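A minimal sketch of what one raw IQ training example might look like, assuming simple BPSK with rectangular pulse shaping, a small carrier frequency offset, and additive noise (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sps = 8                                   # samples per symbol
bits = rng.integers(0, 2, 128)
symbols = 2 * bits - 1                    # BPSK: map {0,1} -> {-1,+1}
iq = np.repeat(symbols.astype(complex), sps)          # rectangular pulses
iq *= np.exp(2j * np.pi * 0.01 * np.arange(iq.size))  # carrier freq offset
iq += 0.1 * (rng.normal(size=iq.size)
             + 1j * rng.normal(size=iq.size))         # additive noise

# CNNs typically consume IQ as a 2-channel real array of [I; Q]
x = np.stack([iq.real, iq.imag])
print(x.shape)  # (2, 1024)
```

Splitting the complex samples into two real channels is the common convention for feeding IQ data to standard CNN layers, which operate on real-valued tensors.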

Spectrograms. A spectrogram represents a signal as a 2D image with time on one axis and frequency on the other, with pixel intensity encoding signal power. The short-time Fourier transform (STFT) is the standard method for computing spectrograms. Spectrograms are intuitive — an experienced analyst can often identify a signal type by visual inspection of its waterfall display — and are well-suited to convolutional neural network classifiers that are optimized for 2D image classification. Different modulation types produce visually distinct spectrogram patterns: an FSK signal shows discrete frequency steps, a frequency-hopping signal shows the characteristic scattered appearance of hop occupancy, a QAM signal appears as a dense filled band.
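As a sketch, a 2-FSK burst and its spectrogram can be generated with NumPy and SciPy; the sample rate, tone frequencies, and STFT parameters below are arbitrary choices:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 1e6                                  # sample rate: 1 MHz
n = np.arange(10_000)
# 2-FSK: alternate between two tones every 1000 samples
tone = np.where((n // 1000) % 2 == 0, 100e3, 200e3)
sig = np.cos(2 * np.pi * np.cumsum(tone) / fs)   # FM synthesis via phase

f, frames, Sxx = spectrogram(sig, fs=fs, nperseg=256, noverlap=128)
print(Sxx.shape)  # (frequency bins, time frames): a 2D "image" for a CNN
```

Plotted as an image, `Sxx` shows the discrete frequency steps described above; fed to a 2D CNN, it is treated exactly like a single-channel image.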

Eye diagrams and constellation diagrams. An eye diagram is constructed by overlaying successive symbol periods of a demodulated signal. For a clean signal, the overlaid traces form an "eye" pattern whose width and height reflect signal quality. Constellation diagrams display the complex symbol values of a demodulated signal as points in the I/Q plane — a QPSK signal produces four distinct clusters, a QAM-16 signal produces a 4×4 grid of 16 clusters. These representations require demodulation as a preprocessing step, which introduces a dependency on having a correct initial modulation estimate. They are most useful as second-stage features for within-class classification — distinguishing QAM-16 from QAM-64 after the QAM class has been identified.
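A minimal illustration of the constellation idea, assuming ideal QPSK symbols with a small amount of additive noise:

```python
import numpy as np

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, (2000, 2))
# QPSK: each bit pair selects one of four unit-energy points in the I/Q plane
points = (2 * bits[:, 0] - 1 + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
noisy = points + 0.05 * (rng.normal(size=points.size)
                         + 1j * rng.normal(size=points.size))

# Snap each received symbol to its quadrant; four clusters are expected
decided = np.sign(noisy.real) + 1j * np.sign(noisy.imag)
print(len(set(decided)))  # 4 distinct constellation clusters
```

At higher noise levels the four clusters smear into one another, which is exactly why within-class discrimination (QAM-16 vs. QAM-64) degrades quickly at low SNR.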

Supervised Approaches: CNN for Modulation Classification

Supervised machine learning for signal classification requires a labeled training dataset — a collection of signal examples where the correct class label is known. The model learns to map from the signal representation to class labels by minimizing a loss function over the training data.

Convolutional neural networks (CNNs) have become the dominant architecture for AMC. The intuition is direct: a CNN applied to a spectrogram image learns to detect visual features (spectral patterns, temporal structures) that are diagnostic of specific modulation types, in much the same way a CNN for image classification learns to detect edges, textures, and shapes. Applied to raw IQ data, a 1D CNN learns temporal patterns in the complex-valued time series.

A typical AMC CNN architecture consists of several 1D or 2D convolutional layers (depending on input representation), max-pooling layers for spatial/temporal downsampling, batch normalization layers to improve training stability, and fully connected layers mapping to the class probability vector. ResNet-inspired architectures with residual connections have shown improved performance over simple CNN stacks for AMC tasks.
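The layer shapes in such a stack can be sketched with a NumPy-only forward pass; the layer counts, kernel sizes, and 24-class output below are illustrative, not a recommended architecture:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d_relu(x, w):
    # x: (c_in, L); w: (c_out, c_in, k) -> ReLU(conv), shape (c_out, L-k+1)
    windows = sliding_window_view(x, w.shape[2], axis=1)  # (c_in, L-k+1, k)
    return np.maximum(np.einsum('ink,oik->on', windows, w), 0.0)

def maxpool(x, p=2):
    # Non-overlapping max pooling along the temporal axis
    L = (x.shape[1] // p) * p
    return x[:, :L].reshape(x.shape[0], -1, p).max(axis=2)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 1024))                    # 2-channel raw IQ input
h = maxpool(conv1d_relu(x, 0.1 * rng.normal(size=(16, 2, 7))))   # block 1
h = maxpool(conv1d_relu(h, 0.1 * rng.normal(size=(32, 16, 5))))  # block 2
logits = 0.1 * rng.normal(size=(24, h.size)) @ h.ravel()         # dense head
print(h.shape, logits.shape)  # (32, 252) (24,)
```

In practice this would be built in a deep learning framework with trained weights, batch normalization, and residual connections; the sketch only shows how the temporal dimension shrinks through the conv/pool stack before the dense layer maps to class logits.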

Training data for defense AMC models is a significant challenge. The standard approach uses signal simulation: a communications simulation generates clean signals with the target modulation parameters, and a channel simulation adds realistic channel effects (AWGN, Rayleigh fading, frequency offset, clock error) at varying SNR levels. Models trained on simulated data are then evaluated on real-world captured signals, with the simulation realism being the primary determinant of the sim-to-real performance gap. High-fidelity hardware-in-the-loop simulation — where software-generated signals are transmitted through actual RF hardware and received under controlled conditions — significantly improves the quality of training data.
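A simplified version of the channel-augmentation step might look like the following; the `apply_channel` helper and its parameters are invented for illustration, and a real pipeline would add fading and clock error as well:

```python
import numpy as np

def apply_channel(iq, snr_db, cfo_norm, rng):
    """Add a carrier frequency offset (as a fraction of the sample rate)
    and complex AWGN scaled to hit a target SNR in dB."""
    n = np.arange(iq.size)
    iq = iq * np.exp(2j * np.pi * cfo_norm * n)
    noise_power = np.mean(np.abs(iq) ** 2) / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (rng.normal(size=iq.size)
                                        + 1j * rng.normal(size=iq.size))
    return iq + noise

rng = np.random.default_rng(0)
clean = np.exp(2j * np.pi * 0.1 * np.arange(4096))  # unit-power test tone
rx = apply_channel(clean, snr_db=10.0, cfo_norm=1e-3, rng=rng)

# Sanity check: measure achieved SNR against the offset-corrected clean signal
shifted = clean * np.exp(2j * np.pi * 1e-3 * np.arange(4096))
snr_est = 10 * np.log10(np.mean(np.abs(shifted) ** 2)
                        / np.mean(np.abs(rx - shifted) ** 2))
print(round(snr_est, 1))  # close to the requested 10 dB
```

Sweeping `snr_db` over a range (e.g. -10 to +20 dB) during dataset generation is what teaches the model to degrade gracefully rather than fail abruptly at low SNR.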

Performance benchmarks on the RadioML 2018 dataset, the most widely used public benchmark, show that well-tuned CNN models achieve classification accuracy above 90% across 24 modulation classes at SNR values above 10 dB. Performance degrades significantly at low SNR (below 0 dB), which is the operational regime for many SIGINT scenarios involving distant or low-power emitters. This low-SNR performance gap between lab benchmarks and operational reality is an active research area.

Unsupervised Approaches: Clustering Unknown Signals

Supervised classification handles known signal types well. But a core SIGINT challenge involves signals that are not in the training set — new adversary waveforms, modified communication protocols, improvised systems. Supervised models that encounter an unknown signal type will misclassify it as the nearest known class, potentially with high confidence. The model cannot know what it does not know.

Unsupervised clustering approaches address this problem by grouping signals based on feature similarity without reference to predefined class labels. A clustering algorithm applied to a collection of intercepted signals will identify groups of signals with similar characteristics, even if those characteristics do not match any known signal type. New clusters that cannot be matched to known signal types are flagged as unknowns for analyst review.

Operational insight: The most valuable output from unsupervised clustering in an operational SIGINT context is often not the cluster assignments themselves, but the cluster centroids — the representative feature vectors that characterize each identified group. These centroids serve as the seed for a new labeled class when analysts confirm the nature of an unknown signal, allowing supervised models to be rapidly updated to handle the new type.

Common clustering algorithms applied to SIGINT include k-means (computationally efficient, requires specifying k in advance), DBSCAN (density-based, handles irregular cluster shapes and automatically identifies noise points), and Gaussian Mixture Models (probabilistic, provides per-assignment confidence scores). For high-dimensional feature spaces, dimensionality reduction — using t-SNE or UMAP to project features to 2D for visualization, or autoencoders to learn compact representations — is typically applied before clustering.
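A small sketch of the noise-flagging behavior described above, using scikit-learn's DBSCAN on synthetic 2D features; the feature values and DBSCAN parameters are arbitrary:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two synthetic "known signal type" feature clusters plus scattered unknowns
a = rng.normal([0.0, 0.0], 0.1, size=(50, 2))
b = rng.normal([3.0, 3.0], 0.1, size=(50, 2))
outliers = rng.uniform(-5.0, 8.0, size=(5, 2))
X = np.vstack([a, b, outliers])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters, int((labels == -1).sum()))  # clusters found, noise points
```

The points labeled `-1` are exactly the "flagged as unknown for analyst review" category: too isolated to join any dense group, hence candidates for a new signal type.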

Semi-supervised approaches combine both paradigms: a model is trained with a supervised loss on labeled examples and an unsupervised loss (clustering or reconstruction) on unlabeled examples. This is well-suited to the SIGINT domain, where labeled data is scarce and expensive to produce but unlabeled operational intercepts are abundant. The unlabeled data helps the model learn a better feature representation even when labels are not available.
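A toy numerical sketch of such a combined objective, assuming a shared feature representation feeding both a classifier head and a reconstruction head; all shapes, weights, and the 0.5 term weighting are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W_cls = 0.1 * rng.normal(size=(4, 8))   # shared features -> 4 classes
W_dec = 0.1 * rng.normal(size=(8, 8))   # shared features -> reconstruction

feats_lab = rng.normal(size=(16, 8))    # few labeled examples
y = rng.integers(0, 4, 16)
feats_unl = rng.normal(size=(64, 8))    # unlabeled intercepts are abundant

# Supervised term: cross-entropy on the labeled set
probs = softmax(feats_lab @ W_cls.T)
ce = -np.mean(np.log(probs[np.arange(16), y]))

# Unsupervised term: reconstruction error on the unlabeled set
recon = np.mean((feats_unl @ W_dec.T - feats_unl) ** 2)

total = ce + 0.5 * recon   # the weighting between terms is a tunable choice
print(round(float(total), 3))
```

In a real system both terms would backpropagate into a shared encoder, so the abundant unlabeled intercepts shape the feature space that the scarce labels then carve into classes.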

Practical deployment of ML signal classification in operational SIGINT systems requires attention to model update cycles, to hardware constraints on the processing node (which may need to run inference on a rugged embedded platform with limited GPU resources), and to the human-machine interface through which analysts work with the classifier's outputs. A classifier that produces correct outputs but presents them in a way that disrupts analyst workflow will not be used. Integrating classification confidence scores into the alert prioritization pipeline — surfacing high-confidence classifications for automated processing while flagging low-confidence or unknown-class signals for analyst review — is the key integration design challenge.