A commercial software team can ship a new feature to production in minutes: a pull request passes automated tests, a reviewer approves, the CI pipeline builds and deploys. For defense software teams, the same delivery path must navigate a fundamentally different constraint space: ITAR export regulations that restrict who can touch build artifacts, STIG compliance requirements that define how every layer of the stack must be configured, SBOM mandates that require a machine-readable inventory of every dependency, and deployment environments that may have no internet connection at all. The result is that many defense programs either abandon CI/CD entirely in favor of manual, gate-heavy release processes — or adopt commercial CI/CD tooling without the compliance wiring and create audit liabilities.
Neither outcome is necessary. A defense software CI/CD pipeline that satisfies ITAR controls, produces STIG-compliant artifacts, generates signed SBOMs, and deploys to air-gapped environments is achievable with available open-source tooling and a clear architectural pattern. This article describes that pattern, covering pipeline infrastructure choices, automated testing stages, SBOM generation, container hardening, air-gap deployment, rollback procedures, and audit trail requirements.
Defense software constraints: ITAR, STIG, and classification handling in CI
The International Traffic in Arms Regulations (ITAR) control the export of defense articles and services, including technical data related to defense systems. Software that implements controlled functionality, such as guidance algorithms, communications protocols, or sensor fusion logic, is ITAR-controlled technical data. In a CI/CD context, this means the source code, build artifacts, and test results may all be export-controlled, and access must be restricted to US persons (citizens, lawful permanent residents, and certain protected individuals such as refugees and asylees) unless a foreign person's access is covered by an export license or other authorization.
The practical implication for pipeline infrastructure: build runners that process ITAR-controlled code must run on systems where access is enforced at the infrastructure level, not just the application level. A SaaS-hosted CI runner — GitHub-managed runners, GitLab.com shared runners, CircleCI cloud — is not appropriate because the cloud provider's operations staff may have access to the execution environment. Self-managed runners on on-premises hardware, or runners in a FedRAMP High or DoD IL4/IL5 cloud enclave with documented access controls, are the compliant path.
The program's Technology Control Plan (TCP) — the ITAR compliance document that describes how controlled technical data is protected — must include the CI/CD infrastructure in its scope. This means documenting where runners run, how access is controlled, how artifacts are stored, and how they are transferred between environments. Updating the TCP when pipeline infrastructure changes is as important as updating it when the product architecture changes.
Classification handling adds a further constraint for programs that process classified or otherwise controlled information. The CI/CD pipeline itself must be operated at the highest classification or sensitivity level of any artifact it produces. A pipeline that produces both unrestricted deliverables (documentation, open-source components) and CUI (Controlled Unclassified Information) deliverables must either run entirely in the CUI environment or implement information flow controls between segments, a complexity that most programs avoid by running a single pipeline at the CUI level.
Pipeline architecture: on-premises GitLab vs SaaS, air-gap considerations
GitLab self-managed is the dominant CI/CD platform choice for sensitive defense programs. It runs entirely on infrastructure you control, supports fully offline installation (via the offline package), integrates with Active Directory and LDAP for access control, and has a large installed base in DoD programs. Platform One — the DoD's DevSecOps platform — is built on GitLab and provides a reference implementation that many programs adopt directly.
GitHub Enterprise Server is a viable alternative for programs already in the GitHub ecosystem, offering self-hosted deployment with the same workflow syntax as GitHub.com. Jenkins remains in use in legacy programs, though its security plugin ecosystem is more fragmented than GitLab's built-in capabilities. For new programs, GitLab self-managed or a platform built on it is the clearest path to compliance.
The artifact registry, where build outputs, container images, and deployment packages are stored, must also run on infrastructure you control. Harbor is a CNCF-graduated open-source container registry with built-in vulnerability scanning. Artifactory or Nexus serves as the package registry for language-ecosystem dependencies (npm, Maven, PyPI, Go modules). Both the container registry and the package registry must be seeded with dependencies and base images from external sources via a controlled process, because in air-gapped environments there is no internet egress for dependency resolution.
For air-gapped or classified network deployments, the pipeline architecture has an additional component: a mirror refresh process that brings external content — OS packages, language dependencies, vulnerability database updates, container base images, signing key bundles — across the air gap on a documented schedule via an accredited cross-domain solution or removable media. Builds reference only the internal mirror; the policy that enforces this is implemented at the network level (no egress route to the internet from build runners), not just by convention. The mirror refresh interval determines the lag between a public CVE disclosure and its appearance in the internal vulnerability scanner — a metric that belongs in the vulnerability management plan.
Automated testing stages: unit through STIG compliance scan
A defense CI/CD pipeline has more mandatory stages than a commercial pipeline. The stage order matters: fast, cheap stages run first to fail quickly on obvious problems; expensive stages run only on builds that pass earlier gates.
Unit and integration tests run on every commit. They are the primary feedback mechanism for functional correctness and should execute in under five minutes for a fast feedback loop. Integration tests that require external services (databases, message queues) use containerized test doubles orchestrated by the pipeline, not shared test environments, to avoid contention and ensure reproducibility.
SAST (Static Application Security Testing) runs on every pull request. Semgrep is a strong choice for its speed (typically under 30 seconds on medium codebases, depending on the rule set), its support for custom rule authoring, and its SARIF output format. Custom rules for defense-specific prohibitions, such as CNSA-deprecated cryptographic primitives, banned function calls in safety-critical code, and hardcoded classification markers, are committed to the repository alongside the code and evolve with the codebase. SonarQube is an alternative that provides a persistent finding dashboard, useful for tracking security debt trends across releases.
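A gate on SAST output can be as small as a script that reads the SARIF report and fails on blocking results. The following is a minimal sketch, not a reference implementation: the blocking policy (`BLOCKING_LEVELS`) and the rule IDs in the sample are hypothetical, and it assumes the scanner writes standard SARIF 2.1.0, where a result's level may be given directly or inherited from the rule's default configuration.

```python
# Assumed policy: only SARIF level "error" blocks the pipeline.
BLOCKING_LEVELS = {"error"}


def blocking_findings(sarif: dict) -> list[tuple[str, str]]:
    """Return (rule_id, message) pairs for results at a blocking level."""
    findings = []
    for run in sarif.get("runs", []):
        # Rules may carry a default level that applies when a result has none.
        rules = run.get("tool", {}).get("driver", {}).get("rules", [])
        default_level = {
            rule.get("id"): rule.get("defaultConfiguration", {}).get("level", "warning")
            for rule in rules
        }
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "unknown")
            level = result.get("level") or default_level.get(rule_id, "warning")
            if level in BLOCKING_LEVELS:
                text = result.get("message", {}).get("text", "")
                findings.append((rule_id, text))
    return findings


# Hypothetical report: one explicit error, one note, one error via rule default.
sample_report = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"rules": [
            {"id": "hardcoded-marking", "defaultConfiguration": {"level": "error"}},
        ]}},
        "results": [
            {"ruleId": "banned-md5", "level": "error",
             "message": {"text": "MD5 is prohibited"}},
            {"ruleId": "todo-comment", "level": "note",
             "message": {"text": "TODO left in code"}},
            {"ruleId": "hardcoded-marking",
             "message": {"text": "classification marking in source"}},
        ],
    }],
}
```

In a pipeline job, a non-empty return value from `blocking_findings` would translate into a non-zero exit code.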
DAST (Dynamic Application Security Testing) runs against a deployed test instance of the application. ZAP (formerly OWASP ZAP) in headless (daemon) mode is the standard open-source tool. The pipeline deploys the application to a temporary test environment, runs ZAP with an authenticated scan against the application's API endpoints, and gates on high-severity findings. DAST catches vulnerability classes that SAST cannot, such as injection flaws that only manifest at runtime, authentication weaknesses, and CORS misconfiguration, and its findings are more directly actionable because they include the request/response pairs that triggered them.
STIG compliance scanning is the gate that most commercial pipelines lack entirely. DISA STIGs (Security Technical Implementation Guides) define mandatory configuration requirements for the technology categories used in DoD systems, including operating systems, web servers, databases, Kubernetes, and container runtimes, and compliance is a hard ATO (Authority to Operate) requirement. InSpec with STIG baseline profiles (such as those maintained by the MITRE Security Automation Framework), or OpenSCAP with DISA-published SCAP content, provides programmatic compliance checking that runs as a pipeline stage. Container images are built from STIG-hardened base images and scanned at build time; infrastructure-as-code (Terraform, Ansible) is scanned for STIG violations before apply. Any deviation from the STIG baseline fails the pipeline and emits a finding with the specific control identifier, making remediation actionable rather than ambiguous.
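To show how a compliance scan becomes a gate with control-level findings, here is a small sketch that summarizes a scan report. It assumes the report follows the general layout of InSpec's JSON reporter (profiles containing controls, each with per-test results carrying a `status`); the control IDs and titles in the sample are invented for illustration.

```python
def failed_controls(report: dict) -> list[dict]:
    """List controls that have at least one failed test result."""
    failures = []
    for profile in report.get("profiles", []):
        for control in profile.get("controls", []):
            statuses = [r.get("status") for r in control.get("results", [])]
            if "failed" in statuses:
                failures.append({
                    "id": control.get("id"),
                    "title": control.get("title", ""),
                    "impact": control.get("impact", 0.0),
                })
    # Highest-impact findings first so the job log leads with what matters.
    return sorted(failures, key=lambda f: f["impact"], reverse=True)


# Hypothetical report fragment; IDs and titles are placeholders.
sample_scan = {
    "profiles": [{
        "name": "example-stig-baseline",
        "controls": [
            {"id": "SV-000001", "title": "Audit logging enabled", "impact": 0.5,
             "results": [{"status": "passed"}]},
            {"id": "SV-000002", "title": "Root login disabled", "impact": 0.7,
             "results": [{"status": "passed"}, {"status": "failed"}]},
            {"id": "SV-000003", "title": "FIPS mode enabled", "impact": 1.0,
             "results": [{"status": "failed"}]},
        ],
    }],
}

for finding in failed_controls(sample_scan):
    print(f"{finding['id']} (impact {finding['impact']}): {finding['title']}")
```

Emitting the control identifier with each failure is what lets a developer go straight to the relevant STIG entry.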
SBOM generation: CycloneDX/SPDX, Grype/Trivy, license compliance
A Software Bill of Materials is no longer optional for defense software procurement. NDAA Section 1655 (FY2023) directed DoD to develop guidance requiring SBOMs from software vendors, and the DoD Software Modernization Strategy cites SBOMs as a supply-chain security requirement. The practical effect: programs must be able to produce a machine-readable SBOM for every software release, and that SBOM must be current, signed, and linked to the specific build that produced it.
Generating the SBOM at build time, rather than as a manual documentation exercise, is the only approach that stays current. Syft (from Anchore) and cdxgen are the two leading open-source SBOM generators. Syft analyzes build outputs (container images, binary packages, language-specific lock files) and produces CycloneDX or SPDX format SBOMs. cdxgen has stronger support for polyglot codebases and generates CycloneDX SBOMs with component relationships that satisfy the SBOM minimum elements originally published by NTIA and now carried forward by CISA.
The SBOM is signed alongside the build artifact using Cosign or similar tooling, with the signing key held in an on-premises HSM or KMS. The signed SBOM accompanies the artifact into the registry and into every deployment package, so assessors can retrieve it on demand rather than requiring a separate request to the development team.
Vulnerability analysis against the SBOM uses Grype or Trivy, which cross-reference SBOM components against vulnerability databases (NVD, OSV, GitHub Advisory) to identify known vulnerabilities. The triage of those results is recorded as VEX (Vulnerability Exploitability eXchange) statements that accompany the SBOM and document, for each CVE, whether the component is actually affected given the program's usage context. A VEX statement is what allows a program to assert "this CVE in this library does not affect us because we do not call the vulnerable code path", a capability assessors increasingly expect.
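One way to picture the interaction between scanner output and VEX is a filter that drops findings covered by a justified "not affected" statement. This is a simplified sketch: the finding and statement dictionaries are hypothetical shapes rather than the exact Grype, Trivy, or OpenVEX schemas, and the CVE identifiers are placeholders. The status value `not_affected` mirrors the vocabulary used by VEX formats.

```python
def unresolved_findings(findings: list[dict], vex_statements: list[dict]) -> list[dict]:
    """Return findings that no justified 'not_affected' VEX statement covers."""
    suppressed = {
        (s["vulnerability"], s["component"])
        for s in vex_statements
        # A bare assertion is not enough; require a recorded justification.
        if s.get("status") == "not_affected" and s.get("justification")
    }
    return [f for f in findings if (f["cve"], f["component"]) not in suppressed]


# Hypothetical scanner output and VEX statements.
scan_findings = [
    {"cve": "CVE-0000-0001", "component": "libexample@1.2.0", "severity": "high"},
    {"cve": "CVE-0000-0002", "component": "libother@3.1.4", "severity": "critical"},
]
vex = [
    {"vulnerability": "CVE-0000-0001", "component": "libexample@1.2.0",
     "status": "not_affected",
     "justification": "vulnerable_code_not_in_execute_path"},
    # No justification recorded, so this statement suppresses nothing.
    {"vulnerability": "CVE-0000-0002", "component": "libother@3.1.4",
     "status": "not_affected"},
]

for finding in unresolved_findings(scan_findings, vex):
    print(f"{finding['cve']} in {finding['component']} still needs action")
```

Requiring a justification before a finding is suppressed keeps the VEX record useful as evidence for assessors.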
License compliance is a parallel output from the SBOM generation step. Open-source licenses vary in their requirements for defense use: copyleft licenses such as the GPL require source disclosure when the software is distributed, some licenses are incompatible with commercial redistribution, and some components are explicitly prohibited in certain export contexts. A license-compliance gate in the pipeline, implemented with a tool like FOSSA or an in-house policy script against the SBOM, catches license violations before they become legal issues in procurement.
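An in-house policy script of the kind mentioned above might look like the following sketch. It reads the `components` array of a CycloneDX JSON SBOM, where each license entry carries either a `license` object (with `id` or `name`) or an `expression`. The deny list is a hypothetical policy, and the match is exact-string only; a real gate would need proper SPDX expression parsing.

```python
# Hypothetical program policy; the real list comes from legal review.
DENIED_LICENSES = {"GPL-3.0-only", "AGPL-3.0-only"}


def license_ids(component: dict) -> list[str]:
    """Collect license identifiers declared for one CycloneDX component."""
    ids = []
    for entry in component.get("licenses", []):
        if "expression" in entry:
            ids.append(entry["expression"])
        else:
            lic = entry.get("license", {})
            ids.append(lic.get("id") or lic.get("name") or "UNKNOWN")
    return ids


def license_violations(sbom: dict) -> list[tuple[str, str]]:
    """Return (component, reason) pairs that should fail the gate."""
    problems = []
    for comp in sbom.get("components", []):
        label = f"{comp.get('name')}@{comp.get('version')}"
        ids = license_ids(comp)
        if not ids:
            # Undeclared licenses are flagged for manual review.
            problems.append((label, "no license declared"))
        for lid in ids:
            if lid in DENIED_LICENSES:
                problems.append((label, f"denied license {lid}"))
    return problems


# Hypothetical SBOM fragment.
sample_sbom = {
    "bomFormat": "CycloneDX",
    "components": [
        {"name": "fastparse", "version": "2.0.1",
         "licenses": [{"license": {"id": "MIT"}}]},
        {"name": "netcore", "version": "0.9.0",
         "licenses": [{"license": {"id": "AGPL-3.0-only"}}]},
        {"name": "mystery", "version": "1.0.0"},
    ],
}
```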
Container hardening: distroless bases, non-root execution, seccomp, image signing
Container images for defense workloads require hardening beyond what most commercial containerization guides recommend. The hardening requirements flow from STIG guidance (DISA publishes STIGs for container platforms, Docker, and Kubernetes) and from practical security hygiene for environments where lateral movement from a compromised container is a significant concern.
Base image selection is the first decision. Distroless images (Google's distroless project, or Chainguard Images) contain only the application runtime and its direct dependencies — no shell, no package manager, no utilities that an attacker could use post-exploitation. Distroless images are significantly smaller (reducing the attack surface and the number of CVEs in the SBOM) but require that the application can be deployed without shell scripting at startup. Where a shell is genuinely required, DISA STIG-hardened base images (available for RHEL, UBI, and other distributions) provide a pre-hardened alternative with documented compliance.
Non-root execution is mandatory. Containers that run as root (UID 0) give an attacker who exploits a container escape vulnerability root privileges on the host. All defense container images must specify a non-root USER in the Dockerfile, and the pod spec must set the root filesystem to read-only and use a writable volume mount only for data that genuinely changes at runtime. Admission controllers enforce this at admission time, so a container spec that requests root execution is rejected before it is ever scheduled.
Seccomp profiles restrict the Linux system calls available to the container, reducing the attack surface for kernel exploitation. The RuntimeDefault seccomp profile (provided by the container runtime) blocks the most commonly exploited system calls. For well-understood workloads, a custom seccomp profile that permits only the specific calls the application requires provides stronger isolation. Seccomp profiles are attached to the pod spec and enforced by the kernel.
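The non-root, read-only filesystem, and seccomp requirements can also be checked in the pipeline before a manifest ever reaches the cluster. The sketch below validates a pod manifest that has already been parsed into a dictionary. The field names (`runAsNonRoot`, `readOnlyRootFilesystem`, `seccompProfile.type`) are the Kubernetes `securityContext` fields; the function itself and the sample pod are illustrative and do not replace an admission controller.

```python
ALLOWED_SECCOMP_TYPES = {"RuntimeDefault", "Localhost"}


def pod_violations(pod: dict) -> list[str]:
    """Return human-readable hardening violations for a pod manifest."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    problems = []
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})

        # Container-level settings override pod-level ones for these fields.
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        if not non_root or run_as_user == 0:
            problems.append(f"{name}: must run as non-root")

        # readOnlyRootFilesystem exists only at the container level.
        if not ctx.get("readOnlyRootFilesystem", False):
            problems.append(f"{name}: root filesystem must be read-only")

        seccomp = ctx.get("seccompProfile", pod_ctx.get("seccompProfile", {}))
        if seccomp.get("type") not in ALLOWED_SECCOMP_TYPES:
            problems.append(f"{name}: seccomp profile must be set")
    return problems


# Hypothetical pod: one compliant container, one that violates every rule.
sample_pod = {
    "spec": {
        "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
        "containers": [
            {"name": "api", "securityContext": {
                "runAsNonRoot": True, "runAsUser": 10001,
                "readOnlyRootFilesystem": True}},
            {"name": "sidecar", "securityContext": {
                "runAsUser": 0,
                "seccompProfile": {"type": "Unconfined"}}},
        ],
    },
}
```

Running the same rules in CI and at admission gives developers the failure early while keeping the cluster-side check as the enforcement point.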
Image signing with Cosign (from the Sigstore project) or Notation (the Notary Project tool that grew out of Notary v2) provides cryptographic assurance that a deployed image is exactly the image that the pipeline produced, not a substituted or modified version. The signing key is held in a hardware security module (HSM) or cloud KMS with access restricted to the pipeline's signing step. Admission controllers (using Kyverno's image verification policies or OPA rules) verify the signature when the pod is submitted to the cluster and reject unsigned or invalidly signed images.
Deployment to classified environments: sneakernet transfer, hash verification, manifest signing
Deploying to an air-gapped classified environment from a CI/CD pipeline is a two-phase process: the pipeline produces a deployment package on the unclassified or lower-classification side, and a separate installation process on the classified side consumes that package. The pipeline cannot directly reach the classified environment — the air gap is enforced at the network level.
The deployment package produced by the pipeline contains everything needed for installation: the signed container images or application binaries, the SBOM and vulnerability scan results, the SLSA provenance attestation linking the package to the specific commit and pipeline run that produced it, a deployment manifest that documents the target environment configuration, and a SHA-256 hash file covering all package contents. The entire package is signed with the program's release signing key before transfer.
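The hash file described above can be produced with a few lines of standard-library code. This sketch walks a package directory and emits one `<sha256>  <relative path>` line per file, the same layout the `sha256sum` utility uses. The function names are illustrative, and signing the resulting manifest is left to the program's signing tool.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> str:
    """Return manifest text covering every file under package_dir."""
    lines = []
    # Sorted order makes the manifest reproducible across runs.
    for path in sorted(p for p in package_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(package_dir).as_posix()
        lines.append(f"{sha256_file(path)}  {relative}")
    return "\n".join(lines) + "\n"
```

The caller would write the returned text to a file stored outside `package_dir` (or add it after hashing) so that the manifest does not attempt to list itself.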
Transfer across the air gap uses either an accredited cross-domain solution (a hardware/software system certified to transfer data between classification levels, such as the guards and data diodes offered by vendors like Forcepoint and Owl Cyber Defense) or documented sneakernet procedures using encrypted removable media with chain-of-custody records. Both methods require the transfer to be logged with the approving authority, the media serial number or CDS transaction ID, and the package hash.
On the classified receiving side, the installation script verifies the hash and signature before taking any installation action. If either check fails, installation aborts and the failure is logged. This is not a courtesy check — it is the primary integrity control for the classified environment, which cannot reach back to the pipeline to verify provenance in real time. The deployment manifest is reviewed by the system administrator before installation to confirm the target configuration matches the package expectations.
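The hash half of that verification might be sketched as follows. It assumes a manifest of `<sha256>  <relative path>` lines and that the manifest's signature has already been verified by the program's signing tool, which is outside this example. Files that are missing, altered, or present but unlisted are all treated as failures.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(package_dir: Path, manifest_text: str) -> list[str]:
    """Return a list of integrity errors; an empty list means the package is intact."""
    errors = []
    listed = set()
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        listed.add(relative)
        target = package_dir / relative
        if not target.is_file():
            errors.append(f"missing: {relative}")
        elif not hmac.compare_digest(sha256_file(target), expected):
            errors.append(f"hash mismatch: {relative}")
    # Anything on disk that the manifest does not cover is also a failure.
    for path in sorted(p for p in package_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(package_dir).as_posix()
        if relative not in listed:
            errors.append(f"unlisted file: {relative}")
    return errors
```

An installer built on this would abort and log the returned errors before touching the target system whenever the list is non-empty.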
Rollback procedures: blue-green for services, SQLite snapshots for embedded systems
Rollback in defense environments must be reliable, fast, and auditable. A failed deployment in a tactical system during an operation is not a recoverable situation in the commercial sense — there may be no help desk to call and no internet connection to pull a fix from.
Blue-green deployment is the rollback strategy for web services and containerized applications. The previous version remains running and receiving traffic; the new version is deployed to the inactive slot; a health check verifies the new version; traffic is switched. Rollback is a traffic switch back to the previous slot — a seconds-long operation that does not require reinstallation. The previous version stays deployed for a defined hold period (typically one release cycle) before being decommissioned. The pipeline records the current active slot version in the deployment metadata so rollback can be executed without consulting the change log.
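The slot logic can be modeled in a few lines. The class below is a toy model for reasoning about the sequence, not a deployment tool: the `healthy` callback stands in for whatever health check the program uses, and the slot names and version strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BlueGreen:
    """Tracks which slot serves traffic and which version each slot holds."""
    versions: dict[str, str] = field(default_factory=lambda: {"blue": "", "green": ""})
    active: str = "blue"

    @property
    def inactive(self) -> str:
        return "green" if self.active == "blue" else "blue"

    def deploy(self, version: str, healthy: Callable[[str], bool]) -> bool:
        """Install to the inactive slot; switch traffic only if it is healthy."""
        target = self.inactive
        self.versions[target] = version
        if not healthy(target):
            return False  # traffic never moved, so nothing needs rolling back
        self.active = target
        return True

    def rollback(self) -> str:
        """Switch traffic back to the other slot and return its version."""
        self.active = self.inactive
        return self.versions[self.active]


# Example sequence with a health check that always passes.
state = BlueGreen(versions={"blue": "1.0.0", "green": ""})
state.deploy("1.1.0", healthy=lambda slot: True)
print(state.active, state.versions[state.active])
```

Because the previous version is still installed in the other slot, `rollback` is only a pointer change, which is what keeps it a seconds-long operation.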
Versioned SQLite snapshots address the rollback problem for embedded systems and applications with file-based persistent state. Before every upgrade, the deployment script takes a snapshot of the current SQLite database and tags it with the installed version. The snapshot must be taken with a method that yields a consistent copy, such as SQLite's online backup API, because a raw file copy of a database that is open for writing can capture a partially written state. The snapshot is stored in a known location on the target system. Rollback restores the previous binary and the corresponding snapshot, returning the system to a consistent state without requiring a full reinstallation. The CI pipeline exercises the snapshot/restore path in integration tests so that production is never the first place it runs: a rollback procedure that has never been tested is not a rollback procedure.
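A snapshot and restore pair can be written with Python's standard `sqlite3` module, whose `Connection.backup` method wraps SQLite's online backup API and produces a consistent copy. The file naming scheme and directory layout here are assumptions for illustration, and the restore step assumes the application has been stopped first.

```python
import sqlite3
from pathlib import Path


def _copy_database(source: Path, destination: Path) -> None:
    """Copy one SQLite database into another using the online backup API."""
    src = sqlite3.connect(source)
    dst = sqlite3.connect(destination)
    try:
        src.backup(dst)  # overwrites the destination's contents
    finally:
        dst.close()
        src.close()


def snapshot(db_path: Path, snapshot_dir: Path, version: str) -> Path:
    """Save a version-tagged snapshot and return its path."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    target = snapshot_dir / f"{db_path.stem}-{version}.sqlite"
    _copy_database(db_path, target)
    return target


def restore(snapshot_path: Path, db_path: Path) -> None:
    """Replace the live database contents with the snapshot's contents."""
    _copy_database(snapshot_path, db_path)
```

An integration test for the rollback path would create a database, snapshot it, apply a migration or write new rows, call `restore`, and assert that the original contents are back.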
Audit trail requirements: immutable logs, change approvals, requirements traceability
Defense programs operate under audit requirements that commercial software programs do not face in the same form. The Authority to Operate process, program management reviews, and Inspector General audits all require evidence that changes were authorized, tested, and deployed through a controlled process. The CI/CD pipeline is the system that produces most of this evidence — and it must produce it in a form that survives a change of personnel, a FOIA request, or a program review years after the fact.
Immutable pipeline logs are the foundation. Logs are written to a SIEM or object-storage sink with object locking (S3 Object Lock in COMPLIANCE mode, or equivalent) so they cannot be modified or deleted, even by administrators. Log entries include the actor identity (user or service account), the action taken, the artifact affected (with hash), the timestamp, and the pipeline run ID that links back to the source code commit. Log retention follows the applicable records schedule — typically a minimum of three years for program records.
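The log entry fields listed above map naturally onto a small, validated record type. This sketch only builds and serializes the record; the field names are illustrative, and shipping the JSON line to a locked storage sink is left to the logging infrastructure.

```python
import json
import re
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")


@dataclass(frozen=True)
class AuditRecord:
    actor: str            # user or service account identity
    action: str           # e.g. "build", "sign", "deploy"
    artifact_sha256: str  # hash of the artifact the action touched
    pipeline_run_id: str  # links back to the run and its source commit
    timestamp: str        # ISO 8601, UTC


def make_record(actor: str, action: str, artifact_sha256: str,
                pipeline_run_id: str, now: Optional[datetime] = None) -> str:
    """Return one JSON line suitable for an append-only log sink."""
    if not SHA256_HEX.match(artifact_sha256):
        raise ValueError("artifact_sha256 must be 64 lowercase hex characters")
    moment = now or datetime.now(timezone.utc)
    record = AuditRecord(actor, action, artifact_sha256, pipeline_run_id,
                         moment.isoformat())
    # Sorted keys keep the serialized form stable for later comparison.
    return json.dumps(asdict(record), sort_keys=True)
```

Rejecting malformed hashes at write time matters because a locked log cannot be corrected afterwards.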
Change approvals in the issue tracker create the link between authorized work and executed changes. Every deployment must reference an approved change ticket. The deployment pipeline gate checks that the current release is linked to an approved ticket before allowing deployment to proceed. Ticket approval workflows are configured to require sign-off from both technical and program management representatives, with the approval timestamp and approver identity stored immutably in the tracker. This creates a complete chain: requirement → approved change → pipeline run → signed artifact → deployment event.
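The deployment gate described above reduces to a small predicate over the ticket record. In this sketch the ticket is a plain dictionary with hypothetical field names (`status`, `approvals`, `role`); a real gate would fetch the record from the program's issue tracker API.

```python
# Sign-offs required before deployment; role names are assumptions.
REQUIRED_ROLES = {"technical", "program_management"}


def deployment_allowed(ticket: dict) -> tuple[bool, str]:
    """Decide whether a release linked to this ticket may deploy."""
    if ticket.get("status") != "approved":
        return False, "ticket is not approved"
    roles = {approval.get("role") for approval in ticket.get("approvals", [])}
    missing = REQUIRED_ROLES - roles
    if missing:
        return False, "missing sign-off: " + ", ".join(sorted(missing))
    return True, "ok"


# Hypothetical ticket with only the technical sign-off recorded.
sample_ticket = {
    "id": "CHG-0001",
    "status": "approved",
    "approvals": [
        {"role": "technical", "approver": "jdoe",
         "timestamp": "2024-01-02T03:04:05+00:00"},
    ],
}

allowed, reason = deployment_allowed(sample_ticket)
print(allowed, reason)
```

Returning the reason alongside the decision lets the pipeline write a meaningful entry to the audit log when a deployment is refused.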
Traceability to requirements closes the chain. Defense programs are typically required to demonstrate that every deployed capability traces to a system requirement and that every requirement has been tested. The issue tracker provides this traceability when development work items are linked to requirement items and test results are attached to the work items. The pipeline can automate part of this by attaching test result summaries and artifact hashes to the relevant issue when a deployment completes, reducing the manual documentation burden while maintaining the evidence chain that ATO assessors require.
Key insight: A compliant defense CI/CD pipeline does not slow down delivery — it makes fast delivery possible in a high-compliance environment. The manual alternative — gate reviews, hand-carried documentation, point-in-time security audits — is slower, more error-prone, and produces weaker evidence. Programs that invest in pipeline compliance infrastructure recoup the investment in every release cycle and every ATO renewal thereafter.