The vast majority of defence technology currently offered to NATO member militaries has never been deployed in real combat operations. It has been developed, tested in controlled environments, demonstrated to evaluation panels, and certified as meeting specification. This is not the same as being battle-tested — and the difference matters in ways that are difficult to quantify in advance but painfully apparent after deployment.

Since 2022, Ukraine has become the world's most active testing environment for defence technology. The intensity, duration, and peer-adversary character of the conflict have exposed failure modes in military software that years of exercises and demonstrations had not revealed. The systems that survived and proved useful belong to a qualitatively different category from those that did not.

What "Battle-Tested" Actually Means

Battle-tested does not mean combat-proven in the sense of having participated in kinetic engagements. It means the software has been operated by real military users, in real operational conditions, under real operational pressure, for a sustained period, and that the failure modes that emerged have been identified and fixed, with the fixes re-tested under the same conditions.

This is a cumulative quality. A software system accumulates operational experience by being deployed, failing, having those failures diagnosed and addressed, being redeployed, and repeating this cycle. The result is a system whose edge cases have been found and handled — not because the developers anticipated them, but because operational reality surfaced them. No amount of requirements analysis or test planning reliably surfaces all the edge cases that real operations produce.

A lab-tested system, by contrast, has been tested against a requirements specification and found to meet it. The specification was written by people who anticipated how the system would be used and what conditions it would face. The gap between that anticipation and operational reality is the source of most real-world defence software failures.

Ukraine: The World's Most Active Defence Tech Test Environment

Several characteristics make the Ukraine conflict uniquely valuable as a technology testing environment. First, the pace: the conflict has seen sustained high-intensity operations with a tempo that exercises cannot replicate, producing years' worth of operational use in months. Second, the adversarial sophistication: Russian electronic warfare, cyber operations, and counter-drone capabilities have tested defence technologies against a peer-level adversary — not a proxy or asymmetric threat. Third, the feedback loop speed: organizations operating in the conflict have been unusually willing to provide direct technical feedback, enabling rapid iteration.

The specific lessons from this environment are concrete. Command and control software that worked reliably in brigade-level exercises failed when used by battalions under fire, because the user population under operational stress interacts with software differently than trained operators in an exercise context. Logistics applications that performed well in connectivity-assured environments failed immediately when Russian EW disrupted communications in contested areas. Drone control software that performed flawlessly with low-latency links failed when operating over degraded connections — revealing that the software had implicitly assumed reliable connectivity at a level that the specification did not require but the developers had relied on.
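That last implicit assumption is easy to make explicit in code. The sketch below is illustrative only; the class and thresholds are assumptions rather than anything drawn from a real drone control stack. It shows a minimal link-health watchdog: instead of silently assuming every command is acknowledged promptly, the software tracks acknowledgement age and forces the rest of the system to handle "degraded" and "lost" states deliberately.

```python
import time

class LinkWatchdog:
    """Tracks the age of the last acknowledged message and reports link state."""

    def __init__(self, stale_after_s: float = 2.0, lost_after_s: float = 10.0):
        self.stale_after_s = stale_after_s
        self.lost_after_s = lost_after_s
        self._last_ack = time.monotonic()

    def on_ack(self) -> None:
        """Call whenever an acknowledgement arrives over the link."""
        self._last_ack = time.monotonic()

    def state(self) -> str:
        """Return 'healthy', 'degraded', or 'lost' based on acknowledgement age."""
        age = time.monotonic() - self._last_ack
        if age < self.stale_after_s:
            return "healthy"
        if age < self.lost_after_s:
            return "degraded"   # e.g. widen command timeouts, warn the operator
        return "lost"           # e.g. fall back to pre-briefed loss-of-link behaviour
```

The point of the sketch is not the thresholds but the design decision: degraded connectivity is a first-class state the software must handle, not an exception the specification forgot to mention.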

Failure Modes That Only Appear in Real Operations

Network degradation handling. The most consistent finding from operational deployments is that software designed under the assumption of adequate connectivity fails gracefully in simulations and catastrophically in operations. Real tactical networks operate at 10–30% of the bandwidth available in a garrison or exercise context. Applications that make dozens of API calls per user interaction to support a single screen update — standard in commercial web development — become unusable on a congested tactical radio network. This failure mode is almost never caught in testing because test environments invariably have better connectivity than operational environments.
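One common mitigation is to coalesce client chattiness: accumulate updates locally and flush them as a single batched request, backing off when the link pushes back. A minimal sketch, assuming a caller-supplied send_batch transport function (hypothetical; no specific framework is implied):

```python
import random
import time
from typing import Callable

class CoalescingSender:
    """Accumulates updates locally and flushes them as one batched request,
    backing off exponentially when the link rejects or drops the attempt."""

    def __init__(self, send_batch: Callable[[list[dict]], None], max_attempts: int = 5):
        self.send_batch = send_batch        # transport call, supplied by the caller
        self.pending: list[dict] = []
        self.max_attempts = max_attempts

    def queue(self, update: dict) -> None:
        self.pending.append(update)         # no network traffic per user interaction

    def flush(self) -> bool:
        """One round trip for everything queued; returns False if the link stayed down."""
        if not self.pending:
            return True
        for attempt in range(self.max_attempts):
            try:
                self.send_batch(self.pending)
                self.pending.clear()
                return True
            except ConnectionError:
                # Exponential backoff with jitter so retries do not worsen congestion.
                time.sleep(min(30.0, 2 ** attempt + random.random()))
        return False                        # updates stay queued; nothing is silently dropped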

Operator stress and error tolerance. Real operators under stress use software differently than trained operators in controlled conditions. They press buttons multiple times because they are not sure the first press registered. They interrupt long-running operations. They select wrong options and need to undo. They do things the developer never anticipated. Software that lacks robust error handling and recovery from interrupted operations will fail in ways that a laboratory test will not catch, because a laboratory tester follows the expected workflow and knows the system well enough to avoid most of the traps.
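Double-presses and retries after interrupted operations are cheap to tolerate if every user action carries a client-generated idempotency key, so that replaying the same action has no additional effect. A minimal sketch; the key scheme and the cached-result approach are illustrative assumptions, not a description of any particular system:

```python
class CommandProcessor:
    """Executes each command at most once, keyed by a client-generated idempotency key."""

    def __init__(self):
        self._results: dict[str, str] = {}   # idempotency_key -> result

    def execute(self, idempotency_key: str, action: str) -> str:
        # A repeated key (double-press, retry after a timeout) returns the cached
        # result instead of performing the action a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = f"executed: {action}"       # stand-in for the real side effect
        self._results[idempotency_key] = result
        return result

processor = CommandProcessor()
first = processor.execute("resupply-7f3a", "submit resupply request")
second = processor.execute("resupply-7f3a", "submit resupply request")  # double-press
assert first == second                        # the second press changes nothing
```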

Adversarial interference. Adversaries actively attempt to disrupt software systems — through jamming (affecting connectivity and GPS), spoofing (injecting false data), and cyber attacks (exploiting vulnerabilities). A system that has never been operated in an adversarially contested environment has never had its resilience to these threats actually tested. The test environment provides a clean signal; the operational environment provides a hostile one.
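Spoofed inputs can at least be sanity-checked before they are trusted. The sketch below flags position reports that imply a physically implausible speed since the last accepted fix; the thresholds are illustrative assumptions, and a real filter would be considerably more involved:

```python
import math

def plausible_fix(prev: tuple[float, float], new: tuple[float, float],
                  dt_s: float, max_speed_mps: float = 80.0) -> bool:
    """Reject a new (lat, lon) fix that implies an impossible speed since the last one."""
    lat1, lon1 = map(math.radians, prev)
    lat2, lon2 = map(math.radians, new)
    # Haversine great-circle distance in metres.
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    distance_m = 2 * 6371000 * math.asin(math.sqrt(a))
    return dt_s > 0 and distance_m / dt_s <= max_speed_mps

# A fix roughly 40 km away one second after the last accepted fix is almost certainly spoofed or corrupt.
print(plausible_fix((50.45, 30.52), (50.81, 30.52), dt_s=1.0))  # False
```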

Why Lab Demos Can Be Misleading

A demonstration is designed to show a system at its best: configured correctly, operated by people who know it well, running on a reliable network, against pre-selected scenarios. All the conditions are favorable. A demonstration that goes well is evidence that the system is capable of performing correctly under ideal conditions. It is not evidence that the system performs correctly under operational conditions, which are not ideal.

The evaluation criteria used in most defence procurement processes reward demo performance rather than operational resilience. Features are assessed; failure recovery is not. User interface responsiveness in a controlled environment is assessed; behavior under contested comms is not. The result is a procurement process that systematically favors systems that demonstrate well over systems that operate well.

Key insight: The procurement question to ask is not "can you demonstrate that this works?" — every vendor can demonstrate that. The question is: "where has this been deployed operationally, for how long, and what failures occurred? What did you learn and what did you change?" The answer to that question separates battle-tested from lab-tested.

What Procurement Officers Should Ask About Operational History

Several specific questions separate battle-tested from lab-tested vendors. First: name the operational deployments — not pilots, not proof-of-concept engagements, but actual operational deployments to units conducting real missions. How long have those deployments been running? What issues were reported by operational users (not by the program manager, but by the actual users)? What changes were made in response?

Second: what is the system's behavior when the network is unavailable for 30 minutes? For 8 hours? What does the user see? What data is preserved? This question, asked of the technical lead rather than the sales team, quickly reveals whether operational resilience has been thought through or assumed.
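One concrete design that answers those questions well is an append-only local journal: every outbound record is written to disk before any transmission is attempted, and unsent records are replayed when the link returns. A minimal sketch, assuming a hypothetical caller-supplied send callable:

```python
import json
from pathlib import Path
from typing import Callable

class DurableOutbox:
    """Writes outbound records to a local journal before transmission and
    replays anything unsent when connectivity returns."""

    def __init__(self, journal: Path, send: Callable[[dict], None]):
        self.journal = journal
        self.send = send

    def record(self, entry: dict) -> None:
        # Persisted first: an 8-hour outage (or a process crash) loses nothing.
        with self.journal.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def replay(self) -> int:
        """Attempt to transmit every journalled entry; returns how many were sent."""
        if not self.journal.exists():
            return 0
        entries = [json.loads(line)
                   for line in self.journal.read_text(encoding="utf-8").splitlines()]
        sent = 0
        for entry in entries:
            try:
                self.send(entry)
                sent += 1
            except ConnectionError:
                break                        # stop at the first failure, keep the rest
        # Rewrite the journal with whatever is still unsent.
        self.journal.write_text(
            "".join(json.dumps(e) + "\n" for e in entries[sent:]), encoding="utf-8")
        return sent
```

A vendor whose technical lead can describe something equivalent, and what the operator sees while the journal is filling, has usually thought the problem through; one who answers in terms of the demo network usually has not.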

Third: describe a specific operational failure and what you did about it. A vendor who cannot describe a specific failure has either not had their system used operationally, or is not being candid. Operational deployment produces failures. Experienced vendors have a repository of specific failure diagnoses and the engineering work done to address them. This repository is the actual evidence of battle-testing — not a feature list, not a demo, not a certificate.