Penetration testing for defense systems: what makes military security testing different

By Corvus Intelligence Engineering Team

June 4, 2026

Penetration testing is a standard element of any serious security program. In commercial environments, it is a relatively well-understood exercise: an external team is given a scope, a rules-of-engagement document, and a window of time to find exploitable vulnerabilities before real attackers do. The output is a report; the remediation path is a project management problem.

In defense environments, none of that framing fully applies. The legal authority structure is different, the threat model is different, the constraints on what testers can touch are different, and the path from finding to remediation runs through a formal accreditation process that has no commercial equivalent. Organizations that apply commercial pen testing conventions to defense systems – or that hire commercial pen testing firms without defense experience – routinely produce assessments that miss the most relevant risks, operate outside legally defensible boundaries, or generate findings that the program office cannot act on.

This article examines what actually makes defense penetration testing different, what that means operationally, and how to structure an assessment that produces results a military program can use.

Legal authority: the foundation that has no commercial equivalent

In commercial engagements, the legal basis for a pen test is a contract and a rules-of-engagement document signed by someone with authority to authorize testing of the target systems. That authority is usually straightforward – the company owns the systems and the CISO or CTO can grant permission.

In defense environments, the authority chain is more complex and the stakes for getting it wrong are significantly higher. Government information systems operate under an Authorization to Operate (ATO) granted by an Authorizing Official (AO). The ATO defines the security posture the system is authorized to maintain. Penetration testing modifies that posture, at least temporarily, and must be explicitly authorized by the AO – not merely by the program manager or the system's ISSO.

For US DoD systems, the Computer Fraud and Abuse Act (CFAA) applies, and military personnel are additionally subject to the Uniform Code of Military Justice (UCMJ). An individual who accesses a government information system without proper authorization – even with good intentions, even as part of an ostensibly authorized test – has committed a federal crime. The authorization document is not a formality: it is the instrument that converts what would otherwise be unauthorized access into lawful testing. Every tester must be individually named in the authorization. The scope of authorized testing must be specific. Activities outside that scope are not protected.

Legal authority requirement: Never begin a defense pen test without a signed, specific authorization document from the system's Authorizing Official. Generic "security assessment" approvals from program managers or prime contractors do not provide CFAA protection. The authorization must identify the testers by name, specify the systems in scope, define the testing window, and enumerate the methods permitted.
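The four elements of the authorization (named testers, systems in scope, testing window, permitted methods) lend themselves to a deny-by-default check that a test team could run before each action. The sketch below is illustrative only: the `Authorization` record, its field names, and the sample identifiers are assumptions, not a format prescribed by any AO or program.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Authorization:
    """Hypothetical model of an AO-signed test authorization."""
    signed_by_ao: bool
    testers: frozenset
    systems: frozenset
    methods: frozenset
    window_start: datetime
    window_end: datetime


def is_authorized(auth, tester, system, method, when):
    """Return (allowed, reason). Anything not explicitly listed is denied."""
    if not auth.signed_by_ao:
        return False, "authorization is not signed by the AO"
    if tester not in auth.testers:
        return False, f"tester {tester!r} is not named in the authorization"
    if system not in auth.systems:
        return False, f"system {system!r} is out of scope"
    if method not in auth.methods:
        return False, f"method {method!r} is not a permitted method"
    if not (auth.window_start <= when <= auth.window_end):
        return False, "action falls outside the testing window"
    return True, "authorized"


# Sample values are invented for illustration.
auth = Authorization(
    signed_by_ao=True,
    testers=frozenset({"a.tester"}),
    systems=frozenset({"staging-c2"}),
    methods=frozenset({"network-scan"}),
    window_start=datetime(2026, 6, 1, tzinfo=timezone.utc),
    window_end=datetime(2026, 6, 14, tzinfo=timezone.utc),
)
now = datetime(2026, 6, 5, tzinfo=timezone.utc)
print(is_authorized(auth, "a.tester", "staging-c2", "network-scan", now))
print(is_authorized(auth, "a.tester", "prod-c2", "network-scan", now))
```

A check like this does not replace the signed document; it only makes the document's boundaries harder to cross by accident.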

Clearance requirements add another layer. Testing a classified system requires that testers hold valid clearances at the appropriate classification level. The testing organization must hold a facility clearance (FCL) at that level. Introducing uncleared personnel – even in supporting roles – into a classified testing environment is a security violation regardless of what they actually see or touch.

ITAR (International Traffic in Arms Regulations) introduces further constraints for programs involving controlled defense articles. Information about vulnerabilities in ITAR-controlled systems may itself be subject to export control restrictions, limiting what can be documented, transmitted, or shared across international boundaries – including within multinational allied programs. Testing firms operating under international subcontracting arrangements must account for this explicitly.

Threat actor emulation: nation-state TTPs, not generic exploits

Commercial pen testing often focuses on finding any exploitable vulnerability – the most common, most readily available, highest CVSS-scored issues in the target's attack surface. For a corporate network, this is a reasonable approach: an opportunistic attacker will exploit the easiest path available.

Defense systems face a fundamentally different threat population. The primary adversaries for high-value defense targets are nation-state actors with substantial resources, advanced capabilities, and long time horizons. They will not be stopped by patching a CVSS 10.0 OpenSSL vulnerability if they can pivot through a trusted supply chain partner, a legacy embedded component, or a cross-domain solution misconfiguration.

Effective defense pen testing uses adversary emulation: the test team replicates the tactics, techniques, and procedures (TTPs) of specific threat actor groups relevant to the program's threat model. The MITRE ATT&CK framework provides a structured taxonomy for this, with Enterprise and ICS matrices that cover the techniques most commonly employed by advanced persistent threat groups.

For defense systems, the relevant threat actors typically include:

APT28 (Fancy Bear / GRU Unit 26165) – Russian military intelligence, known for spearphishing, credential harvesting, and persistence via legitimate tool abuse. Tactics relevant to defense software include targeting developer workstations and build pipelines to inject malicious code upstream of deployment.
Lazarus Group (DPRK) – North Korean state actor with a track record of targeting defense contractors and aerospace firms, particularly through watering hole attacks and weaponized job-application lures targeting cleared personnel.
Volt Typhoon (PRC) – Chinese state actor focused on living-off-the-land techniques to achieve persistent, low-noise access to critical infrastructure and defense networks. Notable for avoiding custom malware in favor of built-in system tools to evade detection.

The test plan should specify which adversary profile is being emulated and why, based on the program's threat assessment. A test that emulates Volt Typhoon's living-off-the-land approach will look very different from one that emulates APT28's credential-focused operations – and both will look different from a test focused on insider threat scenarios or supply chain integrity.

Adversary selection note: The threat actor profile should be driven by the program's intelligence-informed threat assessment, not by tester preference or generic "advanced" labels. For programs with specific geographic threat profiles or known targeting history, the ISSO should brief the test team on current threat reporting before the engagement begins.
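One way to keep a test plan honest about the profile it claims to emulate is to compare the planned techniques against the profile's technique set. The sketch below uses real MITRE ATT&CK Enterprise technique IDs, but the grouping into two named profiles is an assumption made for illustration; an actual profile should come from the program's threat assessment and current threat reporting.

```python
# Illustrative technique sets only. The profile names and groupings are
# assumptions for this sketch, not an authoritative attribution.
PROFILES = {
    "credential-focused": {"T1566", "T1078", "T1110"},
    "living-off-the-land": {"T1059", "T1078", "T1003"},
}

TECHNIQUE_NAMES = {
    "T1566": "Phishing",
    "T1078": "Valid Accounts",
    "T1110": "Brute Force",
    "T1059": "Command and Scripting Interpreter",
    "T1003": "OS Credential Dumping",
}


def coverage_gaps(profile, planned):
    """Return the profile's technique IDs that the test plan does not exercise."""
    return sorted(PROFILES[profile] - set(planned))


planned_techniques = ["T1059"]
for technique_id in coverage_gaps("living-off-the-land", planned_techniques):
    print(technique_id, TECHNIQUE_NAMES[technique_id])
```

A non-empty result means the plan exercises only part of the chosen profile, which is worth resolving before the engagement rather than explaining afterwards.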

Scope management: no-downtime constraints and isolated test environments

Commercial pen testing scope constraints are primarily about limiting liability and focusing effort. Defense scope constraints carry additional dimensions that fundamentally shape how a test can be conducted.

Many defense systems cannot accept any downtime during testing. Command and control systems, communications infrastructure, and real-time sensor fusion platforms may be operationally active during a test window. An exploitation attempt that causes a service interruption – even a brief one – may have operational consequences that no amount of contractual indemnification can address. The standard commercial approach of testing against production systems with a "stop if you see something unstable" rule is not acceptable for these environments.

The practical consequence is that many defense pen tests are conducted against dedicated test environments: isolated network segments, staging environments, or lab replicas that mirror production as closely as possible. The fidelity of the test environment matters enormously. A test environment that uses different firmware versions, lacks production integrations, or operates with relaxed access controls compared to production will produce findings that do not reflect the actual risk posture of the operational system. Test environment fidelity is an investment that the program office must commit to – it is not the testing team's problem to solve.
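Fidelity can be checked mechanically if both environments can be described as a manifest of components and settings. The following is a minimal sketch under that assumption; the manifest keys and values are invented, and a real comparison would draw on the program's configuration management data.

```python
def fidelity_gaps(production, test_env):
    """Compare two {component: version-or-setting} manifests.

    Returns {component: (production value, test value)} for every component
    that is missing from the test environment or differs from production.
    A test value of None means the component is absent from the test setup.
    """
    gaps = {}
    for component, prod_value in production.items():
        test_value = test_env.get(component)
        if test_value != prod_value:
            gaps[component] = (prod_value, test_value)
    return gaps


# Hypothetical manifests for illustration.
production = {
    "radio-firmware": "4.2.1",
    "mfa-required": True,
    "track-feed-integration": "enabled",
}
test_env = {
    "radio-firmware": "4.0.0",
    "mfa-required": True,
}

for component, (prod_value, test_value) in fidelity_gaps(production, test_env).items():
    print(f"{component}: production={prod_value!r} test={test_value!r}")
```

Each reported gap is a reason to qualify any finding, or any absence of findings, that touches the affected component.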

Scope violations are treated with greater severity in defense environments than in commercial ones. Accidentally touching a system outside the authorized scope is not a minor procedural issue – it may constitute unauthorized access to a government information system. Testers must maintain a detailed activity log throughout the engagement, documenting significant actions in near-real-time, so that any scope questions that arise during or after the test can be resolved with evidence rather than recollection.
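An activity log is more useful as evidence if later edits to it are detectable. One common approach, shown here as a sketch rather than a required format, is to chain entries by hash so that each entry commits to the one before it. The field names are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry):
    """SHA-256 over a canonical JSON encoding of the entry."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log, tester, target, action, timestamp=None):
    """Append an entry whose hash covers the previous entry's hash."""
    entry = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "tester": tester,
        "target": target,
        "action": action,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)  # digest is computed before "hash" is set
    log.append(entry)
    return entry


def verify_chain(log):
    """Return True if no entry was altered, reordered, or cut from the middle."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log = []
append_entry(log, "a.tester", "staging-c2", "TCP port scan of in-scope segment")
append_entry(log, "a.tester", "staging-c2", "authenticated config review")
print(verify_chain(log))
```

The chain alone cannot detect removal of the most recent entries; recording the latest hash somewhere outside the tester's control (for example, reporting it to the ISSO at intervals) would be needed to cover that case.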

Defense-specific vulnerability classes

Beyond the procedural differences, defense systems present vulnerability classes that are underrepresented in commercial pen testing methodologies.

Legacy embedded systems. Defense platforms routinely run software on hardware that is 10–20 years old, with embedded firmware that cannot be patched through normal software update mechanisms. Vulnerabilities in these components may be known but unfixable within the system's lifecycle – the pen test finding will become a permanent Plan of Action and Milestones (POAM) entry with a deviation request rather than a remediation ticket. Testers need to understand what a "finding" means in this context: the value is in documenting and quantifying the risk, not necessarily in driving immediate remediation.

Cross-domain solutions. Systems that handle data at multiple classification levels use cross-domain solutions (CDS) to move data between security domains. These are high-value targets: a CDS that can be manipulated to pass information in the wrong direction defeats the entire data-handling architecture. CDS testing requires specialized expertise and specific authorization – these components are often treated as separate scopes within a broader program assessment.

Supply chain integrity. The most significant software supply chain attacks in recent years (SolarWinds, XZ Utils) have targeted build pipelines and upstream dependencies rather than running systems. Defense programs are high-value targets for this attack class. Pen testing should include assessment of the build environment: can an attacker with access to a developer workstation introduce malicious code that survives into a production build? Can a compromised dependency be introduced without triggering existing controls?
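The second question, whether a substituted dependency would be noticed, often comes down to whether the build verifies fetched artifacts against reviewed digests. The sketch below shows the shape of such a control so a tester knows what to look for; the artifact names are invented, and it assumes pinned SHA-256 digests are available from a reviewed lockfile.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check_artifacts(pinned, artifacts):
    """Compare fetched artifacts against reviewed, pinned digests.

    pinned:    {name: expected SHA-256 hex digest} from a reviewed lockfile
    artifacts: {name: raw bytes as fetched by the build}
    Returns a sorted list of (name, problem) tuples; empty means all match.
    """
    problems = []
    for name, data in artifacts.items():
        expected = pinned.get(name)
        if expected is None:
            problems.append((name, "not pinned"))
        elif sha256_hex(data) != expected:
            problems.append((name, "digest mismatch"))
    for name in pinned.keys() - artifacts.keys():
        problems.append((name, "pinned but missing"))
    return sorted(problems)


# Hypothetical artifacts: one tampered, one never reviewed.
pinned = {"libfoo-1.2.tar.gz": sha256_hex(b"reviewed contents")}
artifacts = {
    "libfoo-1.2.tar.gz": b"tampered contents",
    "extra-0.1.whl": b"unreviewed package",
}
for name, problem in check_artifacts(pinned, artifacts):
    print(name, "->", problem)
```

If the build environment has no equivalent of this check, or the check can be bypassed from a developer workstation, that gap is itself a finding.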

Certificate and key management. Defense systems depend heavily on PKI for authentication and communications security. Misconfigured certificate validation, overly broad trust anchor configurations, and poor key lifecycle management are consistently high-severity findings. Unlike application vulnerabilities, these are often invisible to automated scanning and require manual verification of the PKI architecture against the system's security design.
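As a small illustration of what "misconfigured certificate validation" looks like in practice, the sketch below inspects a Python `ssl.SSLContext` for the two settings that most directly disable validation on a client. It covers only this narrow case; trust anchor scope and key lifecycle still need the manual review described above. The function name is an assumption.

```python
import ssl


def audit_tls_context(ctx: ssl.SSLContext) -> list:
    """Return human-readable findings for a client-side SSLContext."""
    findings = []
    if ctx.verify_mode != ssl.CERT_REQUIRED:
        findings.append("peer certificate is not required to validate")
    if not ctx.check_hostname:
        findings.append("hostname is not checked against the certificate")
    return findings


# A default client context validates the chain and the hostname.
strict = ssl.create_default_context()

# A context of the kind sometimes left behind after debugging.
# check_hostname must be disabled before verify_mode can be relaxed.
lax = ssl.create_default_context()
lax.check_hostname = False
lax.verify_mode = ssl.CERT_NONE

print(audit_tls_context(strict))
print(audit_tls_context(lax))
```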

The finding lifecycle: POAM, ATO impact, and deviation requests

In commercial engagements, the pen test report goes to the CISO, findings are triaged, a subset get remediated, and the rest age in a tracker until the next assessment. The process is driven by risk appetite and engineering capacity.

In defense environments, findings feed into a formal accreditation lifecycle with legal and contractual implications. The key concept is the Plan of Action and Milestones (POAM): a document that tracks every known weakness in a system for which an ATO has been granted or is being sought. Every finding from a pen test that is not immediately remediated must be entered into the POAM with a scheduled remediation date, responsible owner, and interim mitigation measure.

The POAM is reviewed by the Authorizing Official as a condition of ATO maintenance. Open high-severity items without adequate interim mitigations or realistic remediation schedules can trigger an ATO suspension – effectively taking the system offline until the risk posture is addressed. For a program office, this outcome is serious enough that some programs delay or limit pen testing scope to avoid generating findings that could trigger an ATO review. This is a risk management failure: the vulnerabilities exist whether or not they are documented.
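The POAM requirements described above (scheduled date, responsible owner, interim mitigation) can be expressed as a simple completeness check that highlights the high-severity entries most likely to draw AO scrutiny. This is a sketch with assumed field names and severity labels; actual POAM formats and thresholds are set by the program and its AO.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PoamEntry:
    """Hypothetical POAM record; field names are illustrative."""
    finding_id: str
    severity: str                          # "low", "moderate", or "high"
    owner: Optional[str]
    scheduled_completion: Optional[date]
    interim_mitigation: Optional[str]


def entry_problems(entry: PoamEntry, today: date) -> list:
    """List the gaps that would make this entry hard to defend."""
    problems = []
    if not entry.owner:
        problems.append("no responsible owner")
    if entry.scheduled_completion is None:
        problems.append("no scheduled remediation date")
    elif entry.scheduled_completion < today:
        problems.append("scheduled remediation date has passed")
    if not entry.interim_mitigation:
        problems.append("no interim mitigation")
    return problems


def needs_ao_attention(entry: PoamEntry, today: date) -> bool:
    """Flag high-severity entries that have any gap."""
    return entry.severity == "high" and bool(entry_problems(entry, today))


# Sample entries are invented for illustration.
entries = [
    PoamEntry("F-001", "high", "ISSO", date(2026, 9, 30), "segment isolated"),
    PoamEntry("F-002", "high", "ISSO", date(2026, 3, 1), None),
    PoamEntry("F-003", "low", None, None, None),
]
today = date(2026, 6, 4)
flagged = [e.finding_id for e in entries if needs_ao_attention(e, today)]
print(flagged)
```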

For findings that cannot be remediated – legacy components with no available patches, architectural constraints that would require a full system redesign – the program office may submit a deviation request or a risk acceptance to the AO. This is not an admission of failure; it is the formal mechanism for operating with known residual risk under explicit authorization. Testers should understand this process and frame findings in ways that help the ISSO construct defensible POAM entries and deviation requests, not just in ways that maximize CVSS scores.

Report framing for defense programs: Defense pen test reports should include, for each finding: a classification marking, a severity rating aligned to the program's risk acceptance criteria, an assessment of exploitability given the program's actual threat actors, and a recommended POAM disposition. Reports written in commercial format – CVSS scores, generic remediation advice, executive summaries aimed at non-technical leadership – require significant rework before the ISSO can use them.

How to scope and authorize a defense pen test

The following steps reflect the requirements for a defensible, legally authorized penetration test on a defense software system.

Step 1: Establish legal authority and written authorization. Obtain a signed test authorization document from the system's AO. The document must name the testers, specify the systems in scope, define the testing window, and enumerate permitted methods. This is not a formality – it is the instrument that makes the testing lawful.

Step 2: Verify clearance and facility credentials. Confirm that all testers hold valid clearances at the classification level of the target system, and that the testing organization holds an FCL of the required level. Brief all testers on the program's security classification guide before they access any classified environment.

Step 3: Define scope and test environment boundaries. Identify which systems, networks, and interfaces are in scope. Where operational systems cannot accept downtime, establish a dedicated test environment. Document explicit exclusions and brief testers on legal consequences of scope violations.

Step 4: Select and vet testing tools. Review ITAR obligations and program accreditation requirements to determine which tools are permitted. Eliminate tools with foreign-origin components, cloud-based licensing, or outbound telemetry. Document the toolset in the test plan and submit for program office review before the engagement begins.

Step 5: Conduct threat actor emulation based on the program's threat model. Select the adversary profile most relevant to the system. Use MITRE ATT&CK for ICS or Enterprise as appropriate, tailored to the specific system architecture and mission profile. Do not substitute generic "advanced" testing for actual adversary emulation.

Step 6: Document activity and findings with classification markings. Maintain a detailed activity log throughout the engagement. All findings must carry appropriate classification markings and severity ratings aligned to the program's risk acceptance criteria.

Step 7: Enter findings into the POAM and track remediation. Work with the ISSO to enter all open findings into the POAM. Assign remediation owners, scheduled dates, and interim mitigations. Brief high-severity findings directly to the AO – do not allow critical vulnerabilities to sit in a queue without explicit risk acceptance or active remediation.

This analysis was prepared by Corvus Intelligence engineers who build and assess mission-critical software for defense and government organizations.

Frequently Asked Questions

Who can legally conduct penetration testing on classified defense systems?

Only personnel who hold security clearances appropriate to the classification level of the system being tested may do so. In US DoD environments, testers typically require at minimum a Secret clearance for SECRET systems, with a Top Secret/SCI clearance for TS-level systems. The testing organization must also hold a facility clearance (FCL) at the appropriate level. Contractors must operate under a written authorization document – a Memorandum of Agreement or specific test authorization – signed by the system's Authorizing Official. Testing without explicit written authorization is a federal crime regardless of intent.

What happens when a zero-day vulnerability is found during a defense pen test?

Zero-day findings in defense engagements follow a different path than commercial disclosures. The finding is classified at the appropriate level and reported immediately to the program's Authorizing Official and ISSO. The tester does not publish or disclose the vulnerability externally – including to the vendor – without government direction. The government may coordinate disclosure through CISA or the relevant national CERT, approve a deviation request to allow continued system operation while a patch is developed, or record the finding as a known vulnerability in the ATO package. ITAR and export control considerations may restrict what can be shared across international boundaries even within allied programs.

What tools are approved for use in classified defense environments?

There is no single universal approved tool list – each program's accreditation package or test authorization specifies what is permitted. In practice, commercially available tools (Cobalt Strike, Metasploit Pro, Nessus) may be approved but must be sourced, installed, and operated in ways that satisfy the program's supply chain and configuration management requirements. Tools with foreign-origin code or cloud-based licensing that phones home are routinely prohibited. Custom tooling must go through code review and may need to be treated as a software supply chain artifact under SBOM requirements. Some programs prohibit tools that use external threat intelligence feeds entirely.
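The exclusion criteria above can be applied as a first-pass filter over a tool inventory before the list goes to the program office. The sketch below assumes each tool has already been reviewed and tagged with three yes/no attributes; the tool names are placeholders, and passing this filter does not mean a tool is approved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical inventory record filled in during tool review."""
    name: str
    foreign_origin_components: bool
    cloud_licensing: bool
    outbound_telemetry: bool


def vet_tools(tools):
    """Return (candidates, rejected), where rejected maps name -> reasons."""
    checks = (
        ("foreign-origin components", "foreign_origin_components"),
        ("cloud-based licensing", "cloud_licensing"),
        ("outbound telemetry", "outbound_telemetry"),
    )
    candidates, rejected = [], {}
    for tool in tools:
        reasons = [label for label, attr in checks if getattr(tool, attr)]
        if reasons:
            rejected[tool.name] = reasons
        else:
            candidates.append(tool.name)
    return candidates, rejected


# Placeholder names; these are not real products.
inventory = [
    Tool("scanner-a", False, False, False),
    Tool("framework-b", False, True, True),
]
candidates, rejected = vet_tools(inventory)
print("submit for program office review:", candidates)
print("excluded:", rejected)
```

Recording the reasons alongside each exclusion gives the test plan a documented rationale if the toolset is questioned later.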

How does scope management in defense pen testing differ from commercial engagements?

In commercial engagements, scope is primarily a liability management exercise. In defense engagements, scope is a legal authority boundary. Operating outside the approved scope – even accidentally – can constitute unauthorized access to a government information system, triggering criminal liability under the Computer Fraud and Abuse Act and potentially the UCMJ for military personnel. Defense scopes also routinely exclude operational systems that cannot tolerate downtime, requiring dedicated test environments that may not perfectly mirror production. Testers must document every significant action during the engagement to produce a defensible activity log if questions arise later.

What is a POAM and how does it relate to pen test findings?

A Plan of Action and Milestones (POAM) is the formal document that tracks every known security weakness in a system that has received or is seeking an Authorization to Operate (ATO). When a pen test produces findings, each finding that cannot be immediately remediated must be entered into the POAM with a scheduled remediation date, responsible owner, and interim mitigation. The POAM is reviewed by the Authorizing Official as part of ongoing ATO maintenance – high-severity open items can trigger an ATO suspension or revocation. This means the pen test report is not the end product; the disposition of each finding in the POAM is what the program office actually tracks.