An AI model that recommends a course of action in a defense decision is only as useful as the trust an operator and an accreditation board are willing to place in it. A bare label – "hostile, 0.94" – is not enough. The operator must understand why the model reached that output, how confident it actually is, and whether the case falls inside the envelope the model was tested in. The accreditor must see evidence that the model's behaviour is understood, not merely measured. This article walks through explainable AI (XAI) for defense decision support: the attribution methods that show what drove a prediction, the calibration that makes confidence meaningful, the audit trail that survives an after-action review, and the body of evidence that accreditation bodies require before a model is fielded.

What "explainable" has to mean in defense

Consumer XAI exists mostly to help engineers debug models. Defense XAI carries a heavier load. The same explanation artefact has to serve three distinct audiences with conflicting needs, and a system that designs for one tends to starve the other two.

The operator needs a fast, glanceable rationale inside the decision loop. They are not going to read a 200-feature SHAP table while a track is closing; they need to know, in the time available, which inputs drove the recommendation and whether the system is confident enough to be trusted right now.

The accreditor needs the opposite: aggregate evidence across a held-out test set, characterising where the model is reliable and where it fails. A single per-case explanation tells an accreditation board almost nothing about residual risk; they want attribution patterns, calibration curves, and documented failure modes that define the conditions of safe use.

The reviewer – the after-action analyst or a formal inquiry – needs a reproducible record of what the system recommended, why, under which model build, and what the human did with it. This is closely tied to questions of human control and accountability, where the explanation becomes part of the accountability chain rather than a debugging convenience.

Feature attribution: which inputs drove the output

Attribution is the foundation of model transparency. It answers the question "which parts of the input were responsible for this prediction" and assigns each input feature, token, or pixel a contribution score for the specific output. The method has to match the model class.

Gradient-based methods – Integrated Gradients, Grad-CAM, and saliency maps – apply to differentiable models (the convolutional and transformer detectors used across vision and sequence tasks). Integrated Gradients integrates gradients along a path from a baseline input to the actual input, satisfying useful axioms like completeness: the attributions sum to the difference in model output between baseline and input. Grad-CAM produces a coarse heatmap over the last convolutional layer and is cheap enough to compute inline, which is why it dominates real-time vision explanations.
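A minimal sketch of Integrated Gradients follows. It assumes a toy differentiable model whose gradient can be written by hand; the logistic "detector", its weights `W` and `B`, and the helper names are hypothetical, and a real detector would take its gradients from an autodiff framework. The final print illustrates the completeness axiom: the attributions sum to F(x) - F(baseline), up to the error of the Riemann sum.

```python
import math
from typing import Callable, List, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 200,
) -> List[float]:
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a(x - x')) da
    with a midpoint Riemann sum along the straight-line path."""
    n = len(x)
    total = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of each sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            total[i] += g[i]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Hypothetical toy "detector": a logistic score over three input features.
W = [1.5, -2.0, 0.7]
B = -0.3


def model(x: Sequence[float]) -> float:
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x: Sequence[float]) -> List[float]:
    p = model(x)
    # d sigmoid(z) / dx_i = p * (1 - p) * w_i
    return [p * (1.0 - p) * w for w in W]


x = [0.9, 0.1, 0.5]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(model_grad, x, baseline)
print([round(a, 4) for a in attr])
# Completeness: the attributions should sum to F(x) - F(baseline).
print(round(sum(attr), 4), round(model(x) - model(baseline), 4))
```

The all-zeros baseline is itself a modelling choice; for imagery a blank or blurred frame is common, and the attributions are only meaningful relative to whichever baseline is documented.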

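SHAP (Shapley additive explanations) is the standard for tabular and tree-based models – the fusion scores, sensor-reliability features, and structured track attributes that often feed a defense decision-support model. SHAP attributes a prediction to its features using Shapley values from cooperative game theory, an allocation that is consistent and locally accurate (the attributions sum to the gap between the prediction and a baseline expectation). Its weakness is cost: exact Shapley values for an arbitrary model require evaluating every subset of features, which is exponential in the feature count. KernelSHAP approximates them by sampling, but it needs many model evaluations per explanation and can be too slow for a tight real-time loop on a large model; TreeSHAP computes exact values in polynomial time, but only for tree ensembles.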
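To make the cost argument concrete, here is a brute-force sketch of exact Shapley values, assuming the common convention that an "absent" feature is replaced by a baseline value. The four-feature fused track score and its feature meanings are hypothetical. For each feature the loops visit every coalition of the other features, so the number of model evaluations roughly doubles with each added feature, which is why sampling or model-specific algorithms are used in practice.

```python
import itertools
import math
from typing import Callable, FrozenSet, List, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values, with features outside a coalition set to baseline.

    For each feature, enumerates all 2**(n-1) coalitions of the other
    features, so the cost is exponential in the number of features n.
    """
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        # Features in the coalition keep their observed value; the rest use the baseline.
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for a coalition S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical fused track score over (radar SNR, IR contrast,
# kinematic score, clutter index), with one interaction term.
def track_score(z: Sequence[float]) -> float:
    return 0.4 * z[0] + 0.3 * z[1] + 0.5 * z[0] * z[2] - 0.2 * z[3]


x = [1.0, 0.5, 0.8, 0.3]
baseline = [0.0, 0.0, 0.0, 0.0]
phi = exact_shapley(track_score, x, baseline)
print([round(p, 4) for p in phi])
# Local accuracy: the values sum to f(x) - f(baseline).
print(round(sum(phi), 4), round(track_score(x) - track_score(baseline), 4))
```

The interaction term is split equally between the two features that participate in it, which is the kind of allocation a per-feature bar chart would otherwise hide.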

Attribution can mislead – validate it

An attribution map is itself a model output, and it can be wrong. A detector that has learned a spurious correlation – for example, associating a target class with a background texture that happened to co-occur in training – will produce confident attributions pointing at the background. The discipline is to validate attribution against synthetic controls: construct inputs where the true salient region is known, confirm the attribution tracks it, and treat any divergence as a finding about the model, not noise in the explanation. This validation is part of the model validation evidence an accreditation package needs.
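One way to operationalise that synthetic-control check is sketched below, under stated assumptions: each control is an attribution map paired with a boolean mask marking where the true salient region was planted, and two simple scores are computed, namely the share of positive attribution mass inside the mask and a "pointing game" hit (whether the single most-attributed cell lands inside it). The thresholds and the 4x4 example maps are hypothetical; a real acceptance criterion would be set and justified in the test plan.

```python
from typing import Dict, Sequence, Tuple

Grid = Sequence[Sequence[float]]
Mask = Sequence[Sequence[bool]]


def mass_in_mask(attr: Grid, mask: Mask) -> float:
    """Fraction of positive attribution that falls inside the planted salient region."""
    inside = total = 0.0
    for row_a, row_m in zip(attr, mask):
        for a, m in zip(row_a, row_m):
            pos = max(a, 0.0)
            total += pos
            if m:
                inside += pos
    return inside / total if total > 0.0 else 0.0


def pointing_hit(attr: Grid, mask: Mask) -> bool:
    """True if the single most-attributed cell lies inside the salient region."""
    r, c = max(
        ((r, c) for r in range(len(attr)) for c in range(len(attr[r]))),
        key=lambda rc: attr[rc[0]][rc[1]],
    )
    return bool(mask[r][c])


def validate_on_controls(
    cases: Sequence[Tuple[Grid, Mask]], min_mass: float = 0.6, min_hit_rate: float = 0.9
) -> Dict[str, object]:
    masses = [mass_in_mask(a, m) for a, m in cases]
    hits = [pointing_hit(a, m) for a, m in cases]
    mean_mass = sum(masses) / len(masses)
    hit_rate = sum(hits) / len(hits)
    return {
        "mean_mass": mean_mass,
        "hit_rate": hit_rate,
        # Each divergent case is a finding about the model, so list them explicitly.
        "failing_cases": [i for i, (ms, h) in enumerate(zip(masses, hits)) if ms < min_mass or not h],
        "passed": mean_mass >= min_mass and hit_rate >= min_hit_rate,
    }


# Synthetic control: the target was planted in the top-left 2x2 block.
mask = [[r < 2 and c < 2 for c in range(4)] for r in range(4)]
healthy = [[0.9, 0.8, 0.1, 0.0], [0.7, 0.6, 0.0, 0.1], [0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0]]
# Attribution that points at background texture, as a spurious correlate would.
spurious = [[0.0, 0.1, 0.0, 0.0], [0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.8, 0.9], [0.0, 0.0, 0.7, 0.9]]

report = validate_on_controls([(healthy, mask), (spurious, mask)])
print(report)
```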

Counterfactuals: what would change the decision

Attribution explains the decision as it was made. A counterfactual explains the decision boundary near the case: it is the smallest change to the input that would flip the output, for example the smallest perturbation that would move a classification from "hostile" to "unknown". Operators frequently find counterfactuals more actionable than attribution because a counterfactual tells them how brittle the recommendation is. If a one-pixel change or a marginal contrast shift flips the label, the operator knows the system is sitting on a knife-edge and should weight the recommendation accordingly.

Counterfactual search is expensive – it is an optimisation problem in input space – so it belongs in the asynchronous, accreditation-grade tier rather than the real-time loop. But the artefact it produces is exactly what an accreditor wants when characterising the stability of a decision boundary across a test set.
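For intuition about what the search returns, consider the one case where it has a closed form: a linear decision score. The minimum-L2 counterfactual is the projection of the input onto the decision hyperplane, nudged just past it, and its distance from the input is a direct brittleness measure. The weights, the "hostile"/"unknown" threshold at zero, and the function names are hypothetical. For a deep network there is no closed form, and the same objective has to be minimised iteratively, which is where the cost comes from.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical linear track classifier: logit > 0 means "hostile", else "unknown".
W = [1.5, -2.0, 0.7]
B = -0.3


def logit(x: Sequence[float]) -> float:
    return sum(w * xi for w, xi in zip(W, x)) + B


def label(x: Sequence[float]) -> str:
    return "hostile" if logit(x) > 0 else "unknown"


def linear_counterfactual(x: Sequence[float], overshoot: float = 1e-3) -> Tuple[List[float], float]:
    """Smallest L2 change to x that flips the sign of the logit.

    Moves x along -W * z / ||W||^2 (the orthogonal projection onto the
    boundary), scaled by (1 + overshoot) so the new point lies just across it.
    Returns the counterfactual and the distance from x to the boundary.
    """
    z = logit(x)
    norm_sq = sum(w * w for w in W)
    if norm_sq == 0.0:
        raise ValueError("constant model: no decision boundary to cross")
    step = -(z / norm_sq) * (1.0 + overshoot)
    x_cf = [xi + step * w for xi, w in zip(x, W)]
    return x_cf, abs(z) / math.sqrt(norm_sq)


for case in ([0.9, 0.1, 0.5], [0.25, 0.05, 0.1]):
    x_cf, dist = linear_counterfactual(case)
    print(f"{label(case)} -> {label(x_cf)}, distance to boundary {dist:.3f}")
```

Both cases are labelled "hostile", but the second sits far closer to the boundary; that small distance is the knife-edge signal an operator would want to see alongside the label.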

Communicating uncertainty honestly

The single most damaging failure in fielded decision-support AI is mis-stated confidence. A raw softmax score is not a calibrated probability; modern deep networks are frequently over-confident, and an out-of-distribution input can still produce a 99% score on something the model has never seen. The first time an operator watches a "99% confident" recommendation be catastrophically wrong, trust in the system collapses – and it does not come back.

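Calibration is the fix. A calibrated model is one whose stated confidence matches its empirical accuracy: of the cases it reports at 80%, about 80% are correct. Temperature scaling, which fits a single scalar that divides the logits on a held-out set, is a cheap and effective post-hoc calibration for multi-class networks and does not change the predicted class. Isotonic regression is a more flexible, non-parametric alternative, typically applied to binary or one-vs-rest scores, that needs more calibration data to avoid overfitting. The interface should present this calibrated probability, not the raw score.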
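A minimal, dependency-free sketch of temperature scaling and of the expected calibration error (ECE) used to check it follows. It assumes held-out logits and labels are available; the synthetic data deliberately makes the model over-confident by multiplying well-calibrated logits by 3, so the fitted temperature should land near 3. The fit is a golden-section search over the inverse temperature, where the negative log-likelihood is convex. All names are illustrative, and in practice calibration and evaluation would use separate held-out splits.

```python
import math
import random
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def nll(logit_rows, labels, temperature: float) -> float:
    return -sum(log_softmax(z, temperature)[y] for z, y in zip(logit_rows, labels)) / len(labels)


def fit_temperature(logit_rows, labels, lo: float = 0.01, hi: float = 20.0, iters: int = 40) -> float:
    """Golden-section search over beta = 1/T; the NLL is convex in beta."""
    def loss(beta: float) -> float:
        return nll(logit_rows, labels, 1.0 / beta)

    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = loss(c), loss(d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = loss(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = loss(d)
    return 1.0 / ((a + b) / 2.0)


def expected_calibration_error(prob_rows, labels, n_bins: int = 10) -> float:
    """Sum over confidence bins of (bin share) * |accuracy - mean confidence|."""
    bins = [[0.0, 0.0] for _ in range(n_bins)]  # [sum of confidence, sum of correct]
    for probs, y in zip(prob_rows, labels):
        conf = max(probs)
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += conf
        bins[idx][1] += 1.0 if probs.index(conf) == y else 0.0
    return sum(abs(correct - conf) for conf, correct in bins) / len(labels)


# Synthetic held-out set: labels drawn from calibrated logits, reported logits 3x sharper.
rng = random.Random(0)
logit_rows, labels = [], []
for _ in range(2000):
    calibrated = [rng.gauss(0.0, 1.5) for _ in range(3)]
    labels.append(rng.choices(range(3), weights=softmax(calibrated))[0])
    logit_rows.append([3.0 * z for z in calibrated])

T = fit_temperature(logit_rows, labels)
ece_raw = expected_calibration_error([softmax(z) for z in logit_rows], labels)
ece_cal = expected_calibration_error([softmax(z, T) for z in logit_rows], labels)
print(f"T = {T:.2f}, ECE raw = {ece_raw:.3f}, ECE calibrated = {ece_cal:.3f}")
```

Because a single temperature rescales every logit equally, the argmax and therefore the recommended class never change; only the confidence shown to the operator does.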

Aleatoric versus epistemic uncertainty. A mature system distinguishes irreducible sensor noise (aleatoric – the input is genuinely ambiguous) from model ignorance (epistemic – the input is outside the training distribution). The two demand different responses. Aleatoric uncertainty means "collect more or better data on this target"; epistemic uncertainty means "the model should not be trusted here at all." An out-of-distribution detector turns the second case into an explicit signal, so the system can abstain and route the decision to a human rather than presenting a confident-looking recommendation it cannot support. This abstention behaviour is central to AI decision support in C2 systems, where the human-machine handoff is the whole point.
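One common way to separate the two uses an ensemble (or repeated stochastic forward passes), sketched here under assumptions. The entropy of the averaged prediction is the total uncertainty, the average entropy of the individual members approximates the aleatoric part, and the gap between them (the mutual information, that is, member disagreement) approximates the epistemic part. The ensemble outputs and both thresholds are hypothetical, and ensemble disagreement is only one of several out-of-distribution signals a fielded system might use.

```python
import math
from typing import Dict, Sequence


def entropy(p: Sequence[float]) -> float:
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def decompose(member_probs: Sequence[Sequence[float]]) -> Dict[str, object]:
    """Split predictive entropy (in nats) into aleatoric and epistemic parts."""
    k = len(member_probs)
    mean = [sum(col) / k for col in zip(*member_probs)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / k
    return {"mean": mean, "total": total, "aleatoric": aleatoric, "epistemic": total - aleatoric}


def triage(member_probs, epistemic_limit: float = 0.15, aleatoric_limit: float = 0.6) -> str:
    u = decompose(member_probs)
    if u["epistemic"] > epistemic_limit:
        # Members disagree: treat the input as outside the envelope and defer.
        return "ABSTAIN: outside envelope, route to human"
    if u["aleatoric"] > aleatoric_limit:
        # Members agree that the input itself is ambiguous.
        return "AMBIGUOUS: collect more or better sensor data"
    mean = u["mean"]
    best = max(range(len(mean)), key=mean.__getitem__)
    return f"RECOMMEND class {best} at {mean[best]:.2f}"


confident = [[0.95, 0.04, 0.01], [0.93, 0.05, 0.02], [0.96, 0.03, 0.01]]
ambiguous = [[0.5, 0.5, 0.0]] * 3
disagreeing = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]
for name, probs in [("confident", confident), ("ambiguous", ambiguous), ("disagreeing", disagreeing)]:
    print(name, "->", triage(probs))
```

Note that every member in the "disagreeing" case is individually confident; only the disagreement between them reveals that the model should not be trusted on that input.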

Key insight: The goal of defense XAI is not to make the model explain itself perfectly – it is to make the model's limits legible. A system that says "I do not know, this input is outside my envelope, defer to a human" earns more operational trust than one that produces a confident, articulate, and wrong rationale. Calibrated abstention is worth more than eloquent explanation.

The audit trail: explanation as evidence

In defense, an explanation that exists only at inference time and is then discarded has no evidentiary value. The accreditation board, the after-action review, and any subsequent inquiry all need to reconstruct a decision after the fact – and they need to reconstruct it exactly. That requires persisting, for every decision, a structured record.

A minimally sufficient audit record captures: the model identity and version hash; the exact input, or a content hash plus a reference to retained source data; the output and its calibrated confidence; the explanation artefact computed for that case; the data and configuration versions in force at the time; and the operator's action – accept, reject, or override – with a timestamp. The version pinning is not optional. An explanation logged today is reproducible only if the model build and data version that produced it can be reconstituted; without that, the next model update silently invalidates every prior explanation and the audit trail loses its meaning.
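A minimal sketch of such a record follows, assuming Python dataclasses and a JSON-lines log. The field names mirror the list above but are illustrative, not a standard schema; the check in `__post_init__` encodes the rule that version fields must be pinned, and all example values are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict


class OperatorAction(str, Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    OVERRIDE = "override"


def content_hash(raw: bytes) -> str:
    """SHA-256 digest, used when the raw input is retained elsewhere rather than inline."""
    return hashlib.sha256(raw).hexdigest()


@dataclass(frozen=True)
class DecisionAuditRecord:
    model_id: str
    model_version_hash: str      # pins the exact model build
    input_sha256: str            # content hash of the exact input
    source_data_ref: str         # reference to the retained source data
    output_label: str
    calibrated_confidence: float
    explanation: Dict[str, Any]  # the explanation artefact computed for this case
    data_version: str
    config_version: str
    operator_action: OperatorAction
    action_timestamp: str        # ISO 8601, UTC

    def __post_init__(self) -> None:
        # Version pinning is mandatory: an unpinned record cannot be reproduced later.
        for name in ("model_version_hash", "data_version", "config_version"):
            if not getattr(self, name):
                raise ValueError(f"{name} must be pinned")
        if not 0.0 <= self.calibrated_confidence <= 1.0:
            raise ValueError("calibrated_confidence must be a probability")

    def to_json(self) -> str:
        """Canonical serialisation (sorted keys, no whitespace) for an append-only log."""
        record = asdict(self)
        record["operator_action"] = self.operator_action.value
        return json.dumps(record, sort_keys=True, separators=(",", ":"))


# Hypothetical values throughout; the reference URI scheme is illustrative only.
raw_input = b"example sensor frame bytes"
record = DecisionAuditRecord(
    model_id="track-classifier",
    model_version_hash=content_hash(b"example model weights"),
    input_sha256=content_hash(raw_input),
    source_data_ref="archive://frames/example-0001",
    output_label="hostile",
    calibrated_confidence=0.81,
    explanation={"method": "grad-cam", "top_features": ["radar_snr", "ir_contrast"]},
    data_version="dataset-v7",
    config_version="config-v3",
    operator_action=OperatorAction.OVERRIDE,
    action_timestamp=datetime.now(timezone.utc).isoformat(),
)
print(record.to_json())
```

Rejecting an unpinned record at construction time, rather than discovering the gap during an after-action review, is the design choice the paragraph above argues for.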

Tiered explanation to meet the latency budget

Rigorous explanation and real-time response are in direct tension. Exact SHAP or counterfactual search cannot run inside a sub-second decision loop. The resolution is a tiered pipeline. A cheap inline explanation – calibrated confidence, a Grad-CAM saliency map, the top contributing tracks or features – is computed within the latency budget and shown to the operator immediately. The expensive accreditation-grade explanation – full attribution, counterfactual analysis – is queued to an asynchronous worker and attached to the logged decision record out of band. The operator gets a fast rationale; the audit trail gets the rigorous one. Neither audience is compromised to serve the other.
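The tiering can be sketched with `asyncio`. The inline tier is computed and returned to the operator straight away, while the expensive tier is queued and attached to the same audit entry when a background worker finishes. Everything here is hypothetical scaffolding: the linear `WEIGHTS`, the in-memory `AUDIT_LOG` standing in for a durable store, and the `time.sleep` standing in for full attribution or counterfactual search.

```python
import asyncio
import time
from typing import Any, Dict, List, Tuple

# Hypothetical linear contributions standing in for a real detector's attributions.
WEIGHTS = {"radar_snr": 1.2, "ir_contrast": 0.8, "speed": -0.5, "clutter": -1.1}

# Stand-in for a durable, append-only audit store keyed by decision id.
AUDIT_LOG: Dict[str, Dict[str, Any]] = {}


def contributions(features: Dict[str, float]) -> List[Tuple[str, float]]:
    return sorted(((k, WEIGHTS[k] * v) for k, v in features.items()), key=lambda kv: -abs(kv[1]))


def fast_explanation(features: Dict[str, float], top_k: int = 2) -> List[Tuple[str, float]]:
    """Inline tier: cheap enough to compute inside the latency budget."""
    return contributions(features)[:top_k]


def slow_explanation(features: Dict[str, float]) -> List[Tuple[str, float]]:
    """Accreditation tier: the sleep stands in for full attribution or counterfactual search."""
    time.sleep(0.2)
    return contributions(features)


async def handle_decision(decision_id: str, features: Dict[str, float], queue: asyncio.Queue):
    inline = fast_explanation(features)
    AUDIT_LOG[decision_id] = {"inline": inline, "accreditation": None}
    await queue.put((decision_id, features))  # defer the expensive work
    return inline  # shown to the operator immediately


async def accreditation_worker(queue: asyncio.Queue) -> None:
    while True:
        decision_id, features = await queue.get()
        try:
            # Run the blocking computation off the event loop, then attach it out of band.
            result = await asyncio.to_thread(slow_explanation, features)
            AUDIT_LOG[decision_id]["accreditation"] = result
        finally:
            queue.task_done()


async def main() -> List[Tuple[str, float]]:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(accreditation_worker(queue))
    features = {"radar_snr": 0.9, "ir_contrast": 0.4, "speed": 0.7, "clutter": 0.2}
    shown = await handle_decision("d-0001", features, queue)
    print("operator sees:", shown)
    await queue.join()  # only for the demo; a real service keeps the worker running
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return shown


shown = asyncio.run(main())
print("audit record:", AUDIT_LOG["d-0001"])
```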

The evidence accreditation bodies require

Accreditation is fundamentally an exercise in bounding residual risk. A board does not field a system because its aggregate accuracy is high; it fields a system because it understands where the system works, where it fails, and what the conditions of use must be. XAI is how a developer supplies that understanding.

The evidence package an accreditor expects goes well beyond a headline metric. It includes slice-based performance – accuracy broken down by operationally meaningful conditions (sensor type, range band, weather, target class) rather than a single pooled number that averages away the dangerous cases. It includes calibration evidence showing the confidence values can be trusted. It includes aggregated attribution demonstrating the model attends to causally relevant features rather than spurious correlates. And it includes documented failure modes: the known conditions under which the model degrades, and the runtime guards (out-of-distribution detection, abstention bands) that catch them.
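A small sketch of slice-based reporting follows, assuming each test case carries its operating conditions as fields. The slice keys, the 0.9 accuracy floor, and the minimum slice size are hypothetical acceptance parameters. The synthetic results are constructed to show the failure mode described above: a pooled accuracy above 95% that hides a fog slice at 55%.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def slice_report(
    records: Sequence[Dict[str, str]],
    slice_keys: Sequence[str],
    min_accuracy: float = 0.9,
    min_count: int = 30,
) -> Tuple[float, List[Dict[str, object]]]:
    """Pooled accuracy plus per-slice accuracy, flagging weak or under-sampled slices."""
    per_slice = defaultdict(lambda: [0, 0])  # slice -> [correct, count]
    for r in records:
        key = tuple(r[k] for k in slice_keys)
        per_slice[key][0] += r["pred"] == r["label"]
        per_slice[key][1] += 1
    pooled = sum(c for c, _ in per_slice.values()) / len(records)
    rows = []
    for key, (correct, count) in sorted(per_slice.items()):
        acc = correct / count
        # THIN marks slices too small for their accuracy to count as evidence.
        flag = "LOW" if acc < min_accuracy else ("THIN" if count < min_count else "")
        rows.append({"slice": key, "n": count, "accuracy": round(acc, 3), "flag": flag})
    return pooled, rows


def synthetic(sensor: str, range_band: str, weather: str, n: int, n_correct: int) -> List[Dict[str, str]]:
    # Hypothetical labelled test cases; the first n_correct are predicted correctly.
    return [
        {"sensor": sensor, "range_band": range_band, "weather": weather,
         "label": "hostile", "pred": "hostile" if i < n_correct else "unknown"}
        for i in range(n)
    ]


records = (
    synthetic("EO", "short", "clear", 400, 392)
    + synthetic("EO", "long", "fog", 40, 22)
    + synthetic("radar", "long", "clear", 300, 291)
)
pooled, rows = slice_report(records, ["sensor", "range_band", "weather"])
print(f"pooled accuracy: {pooled:.3f}")
for row in rows:
    print(row)
```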

Frameworks for responsible military AI – including NATO's principles of responsible use and comparable national governance – make traceability and reliability explicit requirements. XAI artefacts are the mechanism by which those abstract principles become a concrete, reviewable file. The same discipline applies with extra force to generative components, where prompt-injection and hallucination create failure modes that classical detectors do not have; the considerations there are covered in our piece on LLM security for defense AI systems.

From explanation to fielded trust

Explainability is not a feature bolted onto a finished model; it is a property designed into the system from the data pipeline through to the operator interface and the logging layer. The model that earns trust in the field is the one whose confidence is calibrated, whose attributions are validated against controls, whose limits are detected and surfaced as abstention, and whose every decision is recorded reproducibly enough to defend in an accreditation review. Get those four things right and the explanation stops being a compliance burden – it becomes the reason an operator is willing to act on the recommendation at all.

Bring accreditation-grade explainability to your AI decision support

Corvus SENSE pairs edge-AI detection and fusion with calibrated confidence, validated attribution, and a reproducible per-decision audit trail – the evidence operators and accreditation bodies both need before they trust an AI recommendation.


This analysis was prepared by Corvus Intelligence engineers who build mission-critical AI and decision-support systems for defense and government organizations.