What is the difference between STIX and TAXII?

STIX (Structured Threat Information eXpression) is the data model: it defines how to represent threat intelligence as typed objects such as indicators, malware, threat actors, and the relationships between them. TAXII (Trusted Automated eXchange of Intelligence Information) is the transport protocol that moves STIX content between systems over HTTPS. Its working abstraction is the collection, which clients poll; the specification also names channels, a publish-subscribe model, but TAXII 2.1 reserves them for a future version. STIX answers what the intelligence looks like; TAXII answers how it travels. Automating CTI sharing means generating valid STIX 2.1 objects and exchanging them over a TAXII 2.1 server, with no human re-keying at either end.

How do you deduplicate indicators across multiple STIX feeds?

Deduplication should happen at ingest, keyed on the STIX pattern (the observable value) rather than the object id, because the same IP or hash arrives from many sources with different ids. The pipeline normalizes each indicator to a canonical observable, hashes that canonical form, and merges incoming objects into a single internal record that accumulates sources, confidence, and sighting counts. STIX relationships and sightings are preserved so the merged record still shows every feed that reported the indicator and when it was last seen.
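
A minimal sketch of that merge step, assuming each incoming indicator is a plain dict with id, pattern, and modified fields. The normalization rule (strip whitespace, lowercase) and the record layout are illustrative assumptions; lowercasing is reasonable for IPs, domains, and hex hashes but would be wrong for case-sensitive values such as URL paths.

```python
import hashlib
import re

def canonical_key(pattern: str) -> str:
    """Hash a whitespace- and case-normalized pattern (illustrative rule only)."""
    normalized = re.sub(r"\s+", "", pattern).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def merge_indicator(store: dict, indicator: dict, source: str) -> dict:
    """Fold one incoming indicator into the internal record for its observable."""
    key = canonical_key(indicator["pattern"])
    record = store.setdefault(key, {
        "pattern": indicator["pattern"],
        "stix_ids": set(),   # every original object id is kept for traceability
        "sources": set(),
        "sightings": 0,
        "last_seen": "",
    })
    record["stix_ids"].add(indicator["id"])
    record["sources"].add(source)
    record["sightings"] += 1
    # Assumes timestamps share one UTC ISO 8601 format, so string order is time order.
    record["last_seen"] = max(record["last_seen"], indicator["modified"])
    return record
```

Two feeds that report the same address with different ids and slightly different spacing in the pattern end up in one record that lists both sources.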

What confidence and TLP handling does an automated CTI pipeline need?

Every indicator must carry a confidence score and a Traffic Light Protocol (TLP) marking. In STIX, TLP is expressed as a reference to a marking-definition object and confidence as the indicator's confidence property. The pipeline must never strip or downgrade either on republish. Automated outbound sharing should filter by TLP (TLP:RED and most TLP:AMBER indicators must not be pushed to wider collections) and by minimum confidence, so low-confidence single-source indicators do not become block rules in a partner's firewall. Release authority and classification labels ride alongside TLP for defense deployments.

How are STIX indicators pushed into detection tools like SIEM and firewalls?

The pipeline translates the STIX pattern into the native rule language of each detection tool: a Sigma or Splunk SPL rule for the SIEM, a block-list entry for the firewall or DNS resolver, and a Suricata or Snort signature for the network sensor. High-confidence atomic indicators (IPs, domains, hashes) become block lists and watchlists; behavioral patterns mapped to MITRE ATT&CK techniques become detection logic. A SOAR layer applies the translated rules and tracks which indicators are deployed where, so expired indicators can be retracted automatically.

Can a CTI automation pipeline run in an air-gapped or classified enclave?

Yes. The STIX object store, enrichment workers, correlation engine, and TAXII server all run inside the enclave. External feeds enter through an approved cross-domain solution as signed STIX bundles, and outbound sharing to partners is exported as bundles for transfer rather than a live TAXII pull from outside. The design constraint is that no pipeline component should require an internet call at runtime — enrichment sources must be mirrored locally and refreshed through the same controlled transfer path.

Automating CTI sharing with STIX and TAXII

Threat intelligence loses value by the hour. An indicator of compromise that warns one organization on Monday is useless to a partner who receives it as a PDF attachment on Friday. The entire point of cyber threat intelligence (CTI) sharing is to compress that interval to seconds – to move a validated indicator from the analyst who discovered it into the detection stack of every trusted partner before the adversary rotates infrastructure. That is an automation problem, and the two standards that make automation possible are STIX and TAXII. This article walks through the architecture of an automated CTI sharing pipeline for a defense organization: how to model objects, ingest feeds, enrich and correlate indicators, and push the result into the tools that actually block traffic and raise alerts.

STIX and TAXII: the data model and the transport

The two standards are constantly named together, but they solve different problems. STIX (Structured Threat Information eXpression) is the data model. It defines a set of typed objects – STIX Domain Objects such as indicator, malware, threat-actor, attack-pattern, and campaign – plus STIX Relationship Objects that connect them into a graph. A single STIX 2.1 indicator object carries a pattern (the machine-readable detection expression), a validity window, a confidence score, and a TLP marking reference, all in a JSON structure that any compliant system can parse without a custom adapter.

TAXII (Trusted Automated eXchange of Intelligence Information) is the transport protocol. It defines a RESTful HTTPS API for exchanging STIX objects between systems. The specification names two abstractions: collections (a request-response model where a client polls a server for objects matching a filter) and channels (a publish-subscribe model). TAXII 2.1 specifies only collections and reserves channels for a future version, so for automated sharing the typical pattern is a TAXII 2.1 server exposing several collections – segmented by classification and release authority – that partner clients poll on a schedule.
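
As a rough illustration of the polling pattern, the sketch below builds (but does not send) the request a client would issue against a collection's objects endpoint. The server URL and collection id are placeholders, authentication is omitted, and build_poll_request is a hypothetical helper; the added_after and match[type] filters and the media type follow the TAXII 2.1 specification.

```python
from urllib.parse import urlencode
from urllib.request import Request

TAXII_MEDIA_TYPE = "application/taxii+json;version=2.1"

def build_poll_request(api_root: str, collection_id: str, added_after: str) -> Request:
    """Prepare a GET for indicators added to a collection since the last poll."""
    query = urlencode({"added_after": added_after, "match[type]": "indicator"})
    url = f"{api_root.rstrip('/')}/collections/{collection_id}/objects/?{query}"
    return Request(url, headers={"Accept": TAXII_MEDIA_TYPE})

# Placeholder values: a real client would persist the timestamp of its last poll.
request = build_poll_request(
    "https://taxii.example.org/api1/",
    "91a7b528-80eb-42ed-a74d-c6fbd5a26116",
    "2024-01-01T00:00:00Z",
)
```

A scheduler would pass the prepared request to an HTTPS client with the partner's credentials and store the newest timestamp it saw for the next poll.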

The distinction matters for pipeline design because the two standards fail independently. A pipeline can produce perfectly valid STIX that no partner can fetch because the TAXII server is misconfigured, or it can have flawless TAXII transport carrying STIX bundles that fail schema validation at the far end. Both layers need automated validation before anything leaves the building.
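
A deliberately narrow sketch of the STIX half of that check: it verifies only that indicator objects in a bundle carry the properties STIX 2.1 requires of them. It is not a substitute for full schema validation, and structural_errors is a hypothetical helper name.

```python
# Properties STIX 2.1 requires on every indicator object.
INDICATOR_REQUIRED = (
    "type", "spec_version", "id", "created", "modified",
    "pattern", "pattern_type", "valid_from",
)

def structural_errors(bundle: dict) -> list:
    """Return human-readable problems; an empty list means these checks passed."""
    errors = []
    if bundle.get("type") != "bundle":
        errors.append("top-level object is not a bundle")
    for index, obj in enumerate(bundle.get("objects", [])):
        if obj.get("type") != "indicator":
            continue  # other object types are out of scope for this sketch
        for prop in INDICATOR_REQUIRED:
            if prop not in obj:
                errors.append(f"objects[{index}] missing {prop}")
        if not str(obj.get("id", "")).startswith("indicator--"):
            errors.append(f"objects[{index}] id does not match its type")
    return errors
```

Running the same function on both inbound and outbound bundles catches the cheapest class of failure before a partner's parser does.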

Pipeline architecture: ingest, enrich, correlate, publish

An automated CTI sharing pipeline is four stages connected by a common object store. The same STIX-aligned schema flows through all of them, which is what makes end-to-end automation tractable.

Ingest. The pipeline pulls from heterogeneous sources: TAXII collections published by national CERTs and trusted partners, commercial feeds that emit CSV or proprietary JSON, internal telemetry from the SOC, and structured CTI platform exports. Each source gets a dedicated adapter whose only job is to normalize its native format into the canonical internal schema and stamp provenance metadata – source, collection timestamp, confidence, and TLP marking. A non-negotiable rule at this stage is deduplication keyed on the observable pattern rather than the STIX object id, because the same malicious IP arrives from a dozen feeds with a dozen different ids.
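
To make the adapter idea concrete, here is a sketch for one hypothetical commercial feed that ships CSV with ip, confidence, and tlp columns. The column names and the internal record layout are assumptions for illustration, not a real vendor format.

```python
import csv
import io
import ipaddress
from datetime import datetime

def normalize_csv_feed(raw_csv: str, source: str, collected_at: datetime) -> list:
    """Turn one CSV feed into canonical records stamped with provenance."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Parsing the address rejects junk before it can reach a pattern string.
        address = ipaddress.IPv4Address(row["ip"].strip())
        records.append({
            "pattern": f"[ipv4-addr:value = '{address}']",
            "pattern_type": "stix",
            "source": source,
            "collected_at": collected_at.isoformat(),
            "confidence": int(row["confidence"]),
            "tlp": row["tlp"].strip().upper(),
        })
    return records
```

Every other adapter emits the same record shape, so the deduplication step downstream never needs to know which feed a record came from.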

Enrich. A raw indicator is a data point; an enriched indicator is intelligence. Enrichment workers attach passive DNS history, WHOIS, ASN attribution, geolocation, reputation scores, and links to known malware families. In STIX terms, enrichment produces additional objects and relationships hanging off the original indicator – a relationship to an autonomous-system object, a note recording reputation lookups. Enrichment must run asynchronously and cache aggressively; a slow WHOIS lookup cannot be allowed to delay publication of an urgent indicator.

Correlate. The correlation stage loads objects and relationships into a graph store and looks for structure: indicators sharing infrastructure, infrastructure reused across campaigns, techniques that fingerprint a known actor. MITRE ATT&CK mapping is the connective tissue – each attack-pattern object references an ATT&CK technique id, enabling both actor profiling and detection-coverage gap analysis. This stage also computes or propagates confidence and applies expiry windows so that atomic indicators age out automatically instead of accumulating into a stale block list.
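
The simplest structural query, indicators that share infrastructure, can be sketched without a graph database. This assumes relationships are plain dicts using the STIX source_ref and target_ref properties; a production system would run the equivalent query inside its graph store.

```python
from collections import defaultdict

def shared_infrastructure(relationships: list) -> dict:
    """Map each target object to the indicators pointing at it, if more than one does."""
    by_target = defaultdict(set)
    for rel in relationships:
        if rel["source_ref"].startswith("indicator--"):
            by_target[rel["target_ref"]].add(rel["source_ref"])
    return {target: inds for target, inds in by_target.items() if len(inds) > 1}
```

Any target returned here is a candidate pivot: two indicators resolving to the same autonomous system or hosting object may belong to one campaign.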

Publish. The output stage does two things in parallel. It exposes a TAXII 2.1 server with collections segmented by TLP and release authority, filtering every outbound object by marking and minimum confidence before it can leave. And it pushes high-confidence indicators directly into detection tooling, translating STIX patterns into the native rule languages described below.

Modeling indicators as STIX objects

The heart of the schema is the STIX indicator object, and the field that does the work is the pattern. STIX 2.1 patterning is a small query language: [ipv4-addr:value = '203.0.113.10'] describes a single malicious address, while a compound pattern can express "this domain AND this file hash observed together." The pipeline must validate every generated pattern against the STIX patterning grammar before publication, because a malformed pattern silently fails to match anything when a partner deploys it.
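
As a rough pre-publication check, the sketch below accepts only the simplest pattern shape, a single equality comparison inside square brackets. It covers a small subset of the patterning grammar and is an illustrative assumption, not a full validator; compound patterns need a real parser such as the OASIS stix2-patterns validator.

```python
import re

# Matches [type:path = 'value'] only; compound patterns are intentionally rejected.
SIMPLE_PATTERN = re.compile(
    r"^\[\s*(?P<type>[a-z0-9-]+):(?P<path>[A-Za-z0-9_.'-]+)"
    r"\s*=\s*'(?P<value>(?:[^'\\]|\\.)*)'\s*\]$"
)

def parse_simple_pattern(pattern: str):
    """Return (object type, property path, value), or None if the shape is wrong."""
    match = SIMPLE_PATTERN.match(pattern.strip())
    if match is None:
        return None
    return match.group("type"), match.group("path"), match.group("value")
```

A pattern missing its brackets or quotes returns None and can be held back instead of being published as a rule that never fires.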

Beyond the pattern, three pieces of metadata are mandatory for defense sharing, even though STIX 2.1 itself requires only valid_from. The confidence property is an integer from 0 to 100 that must propagate untouched through every republish. The valid_from and valid_until properties bound the indicator's operational lifetime. And object_marking_refs points to the marking-definition object that carries the TLP level. An indicator missing any of the three is not shareable in a defense context – there is no way for the receiving system to decide whether it may redistribute the object or how long to trust it.
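
A sketch of an indicator carrying all three, built as a plain dict. The property names come from STIX 2.1; make_indicator and stix_timestamp are hypothetical helpers, and the marking reference is supplied by the caller because the right marking-definition id depends on the deployment.

```python
import uuid
from datetime import datetime, timedelta, timezone

def stix_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC with millisecond precision."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

def make_indicator(pattern: str, confidence: int, marking_ref: str,
                   valid_from: datetime, lifetime: timedelta) -> dict:
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    now = stix_timestamp(datetime.now(timezone.utc))
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": stix_timestamp(valid_from),
        "valid_until": stix_timestamp(valid_from + lifetime),
        "confidence": confidence,
        "object_marking_refs": [marking_ref],
    }
```

Refusing to construct the object without a marking reference or lifetime pushes the policy into the type of the function rather than a later review step.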

Confidence, TLP, and the trust problem

Automation amplifies mistakes. A human analyst who receives a sketchy single-source indicator applies judgment before acting; an automated pipeline that pushes every ingested indicator straight to partner firewalls will, sooner or later, block a legitimate service because one low-quality feed misclassified it. The defenses against this are confidence scoring and Traffic Light Protocol handling, and both must be enforced mechanically.

Confidence scoring should be multi-source by design. An indicator reported by five independent feeds and corroborated by internal telemetry earns a high score; a single-source indicator with no corroboration stays low and is held back from automated blocking, even if it is still shared for analyst review. The scoring function is part of the correlation stage, not an afterthought.
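
One possible scoring function, shown only to illustrate the multi-source idea: treat each feed's reliability as an independent probability and combine them so that corroboration raises the score. The reliability values, the weight given to internal telemetry, and the blocking threshold are all assumptions to be tuned per deployment.

```python
def combined_confidence(source_reliabilities: list, internal_sighting: bool) -> int:
    """Combine per-source reliabilities (0.0-1.0) into a 0-100 confidence score."""
    doubt = 1.0
    for reliability in source_reliabilities:
        doubt *= 1.0 - reliability   # each independent source removes some doubt
    if internal_sighting:
        doubt *= 0.5                 # assumed weight for corroborating telemetry
    return round(100 * (1.0 - doubt))

def eligible_for_blocking(score: int, threshold: int = 80) -> bool:
    """Scores below the (assumed) threshold are shared for review but never auto-blocked."""
    return score >= threshold
```

With a reliability of 0.3 per feed, a single report scores 30, while five reports plus an internal sighting score 92 and cross the blocking threshold.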

TLP handling governs redistribution. The pipeline must filter outbound sharing by marking: TLP:RED never leaves the originating enclave, TLP:AMBER goes only to named partners, TLP:GREEN to the trusted community, TLP:CLEAR to anyone. Stripping or downgrading a marking on republish is a serious incident, so the marking must travel with the object as an immutable STIX marking-definition reference, never as a mutable field that an adapter could accidentally drop. Release authority and classification labels for defense deployments ride in the same structure.

Key insight: The hardest part of CTI automation is not parsing STIX or speaking TAXII – libraries handle both. It is the governance layer: enforcing confidence thresholds and TLP markings on every outbound object so that automation never shares something it should not, or blocks something it should not. Build the marking and confidence filter as a mandatory gate that no publication path can bypass, and treat any indicator missing a marking as undeliverable rather than defaulting it to a permissive level.
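
A minimal sketch of such a gate, assuming each internal record carries a resolved tlp label and a confidence integer, and each outbound collection declares the most restrictive TLP level it may carry. The field names and the ranking table are illustrative assumptions.

```python
TLP_RANK = {"TLP:CLEAR": 0, "TLP:GREEN": 1, "TLP:AMBER": 2, "TLP:RED": 3}

def may_publish(record: dict, collection_max_tlp: str, min_confidence: int) -> bool:
    """Return True only if the record may leave through this collection."""
    tlp = record.get("tlp")
    if tlp not in TLP_RANK:
        return False  # missing or unknown marking: undeliverable, never defaulted
    if tlp == "TLP:RED":
        return False  # never leaves the originating enclave
    if TLP_RANK[tlp] > TLP_RANK[collection_max_tlp]:
        return False  # collection audience is wider than the marking allows
    return record.get("confidence", 0) >= min_confidence
```

Because the function fails closed, adding a new publication path without calling it is the only way to bypass it, which is something code review can check for.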

Pushing indicators into detection tools

Sharing STIX with partners is only half the value; the other half is operationalizing indicators inside your own detection stack. A STIX pattern is not directly executable by a SIEM or a firewall, so the pipeline needs a translation layer that emits the native rule language of each tool.

High-confidence atomic indicators – IPs, domains, file hashes – become block-list entries on firewalls and DNS resolvers, and watchlist entries in the SIEM. A STIX pattern matching a network connection maps to a Sigma rule or a Splunk SPL search; a file-hash pattern maps to an endpoint detection query. Behavioral indicators expressed as ATT&CK-mapped attack-pattern objects become correlation logic rather than simple block lists, because they describe a sequence of activity instead of a single bad value. This is the same enrichment-to-detection path that LLM-assisted classification can accelerate, by structuring messy source reports into clean STIX before they ever reach the translation layer.
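
For the atomic case, a translation step might look like the sketch below, which handles only a single ipv4-addr equality pattern. The Splunk index and field name (network, dest_ip) and the Suricata signature id are assumptions that depend on the local data model and rule numbering.

```python
import ipaddress
import re

IPV4_PATTERN = re.compile(r"^\[ipv4-addr:value\s*=\s*'([^']+)'\]$")

def translate_ipv4(pattern: str, sid: int) -> dict:
    """Emit tool-native forms of one IPv4 indicator."""
    match = IPV4_PATTERN.match(pattern.strip())
    if match is None:
        raise ValueError("only single ipv4-addr equality patterns are handled here")
    address = str(ipaddress.IPv4Address(match.group(1)))  # rejects malformed values
    return {
        "blocklist_entry": address,
        "splunk_spl": f'index=network dest_ip="{address}"',
        "suricata": (
            f'alert ip any any -> {address} any '
            f'(msg:"CTI indicator {address}"; sid:{sid}; rev:1;)'
        ),
    }
```

Each additional observable type (domain, file hash, URL) would get its own translator, and anything the layer cannot translate should be logged rather than silently skipped.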

A SOAR (Security Orchestration, Automation and Response) layer should own the deployment and, critically, the retraction. The single most common operational failure in indicator delivery is the absence of retraction: indicators get pushed to detection tools and never removed, so the block list grows without bound and eventually starts generating false positives from indicators that expired months ago. The pipeline must track which indicators are deployed to which tools and automatically retract them when valid_until passes or confidence drops below threshold.
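
The retraction check itself is small once deployments are tracked. This sketch assumes a deployments map from indicator id to the set of tools it was pushed to, and indicator records holding a timezone-aware valid_until and a confidence integer; both structures are hypothetical.

```python
from datetime import datetime

def indicators_to_retract(deployments: dict, indicators: dict,
                          now: datetime, min_confidence: int) -> list:
    """Return (indicator id, tool) pairs that should be removed from detection tools."""
    retractions = []
    for indicator_id, tools in deployments.items():
        record = indicators.get(indicator_id)
        # An indicator that vanished from the store is treated as expired.
        expired = record is None or record["valid_until"] <= now
        weak = record is not None and record["confidence"] < min_confidence
        if expired or weak:
            retractions.extend((indicator_id, tool) for tool in sorted(tools))
    return retractions
```

Running this on a schedule and feeding the result to each tool's removal API keeps the block list bounded by the set of currently valid indicators.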

Validation and monitoring at the boundaries

Every automated exchange needs guardrails at the two seams where data enters and leaves. On ingest, each incoming STIX bundle is validated against the 2.1 schema and each pattern against the patterning grammar before it touches the object store; a feed that begins emitting malformed objects must be quarantined automatically rather than poisoning the graph. On publish, the same validation runs again on generated objects, plus the marking-and-confidence gate that no path may bypass. Between the two, the pipeline should emit operational metrics – ingest volume per source, deduplication ratio, enrichment latency, time-from-ingest-to-publish, and per-collection outbound counts – so an analyst can see at a glance when a partner feed has gone silent or when the publication latency has crept past its target. Silent failure is the enemy of automation: a TAXII poll that quietly returns nothing for a week looks identical to a feed with no new intelligence unless the pipeline is watching for it.
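
Watching for silence can be as simple as comparing each feed's last delivered object against an expected cadence. In this sketch the per-feed silence budgets are assumptions an operator would set from the feed's historical behavior.

```python
from datetime import datetime, timedelta

def silent_feeds(last_object_seen: dict, now: datetime, max_silence: dict) -> list:
    """Return names of feeds whose last new object is older than their silence budget."""
    return sorted(
        name for name, seen in last_object_seen.items()
        if now - seen > max_silence[name]
    )
```

The important detail is that the timestamp records the last new object, not the last successful poll, since an empty but successful poll is exactly the failure mode being detected.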

Running the pipeline on the classified side

For defense organizations, much of this pipeline runs inside a classified enclave with no internet path. The architectural constraint is absolute: no pipeline component may require an external network call at runtime. The STIX object store, enrichment workers, correlation graph, and TAXII server all run inside the enclave. External partner feeds enter as signed STIX bundles through an approved cross-domain solution rather than a live TAXII poll across the boundary, and outbound sharing is exported as bundles for controlled transfer rather than a partner pulling directly from inside the perimeter.
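
The integrity check on an imported bundle can be sketched as follows. HMAC-SHA256 with a shared key is used here purely as a stand-in so the example stays within the standard library; an actual cross-domain solution will mandate its own signature scheme, most likely asymmetric, and its own key handling.

```python
import hashlib
import hmac

def sign_bundle(bundle_bytes: bytes, key: bytes) -> str:
    """Produce a hex tag over the exact bytes that will cross the boundary."""
    return hmac.new(key, bundle_bytes, hashlib.sha256).hexdigest()

def verify_bundle(bundle_bytes: bytes, signature_hex: str, key: bytes) -> bool:
    """Check the tag in constant time before the bundle is parsed at all."""
    expected = sign_bundle(bundle_bytes, key)
    return hmac.compare_digest(expected, signature_hex)
```

Verification runs on the raw bytes before any JSON parsing, so a tampered bundle is rejected without its contents ever reaching the object store.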

This reverses the normal assumption that enrichment can reach out to a reputation API on demand. In an air-gapped deployment, every enrichment source – passive DNS databases, ASN tables, reputation snapshots – must be mirrored locally and refreshed through the same controlled transfer path that carries the threat feeds. Designing for offline operation from day one is far cheaper than retrofitting it, because the alternative is discovering during accreditation that half the enrichment stack quietly assumes outbound HTTPS.
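
As an example of what a mirrored source looks like from the enrichment worker's side, the sketch below resolves an address to an ASN from a local table with no network call. The row format (CIDR prefix, ASN) is an assumption about how the mirror is exported, and the linear scan is for clarity; a large table would want a prefix trie.

```python
import ipaddress

def load_asn_table(rows) -> list:
    """Parse (cidr, asn) rows and order them so the most specific prefix matches first."""
    table = [(ipaddress.ip_network(cidr), asn) for cidr, asn in rows]
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table

def lookup_asn(table: list, address: str):
    """Return the ASN of the longest matching prefix, or None if unmapped."""
    ip = ipaddress.ip_address(address)
    for network, asn in table:
        if ip in network:
            return asn
    return None
```

Refreshing the mirror then means replacing the rows through the controlled transfer path and reloading the table, with no change to the worker code.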

Done well, an automated STIX/TAXII pipeline turns CTI sharing from a manual, lossy, hours-long process into a machine-to-machine exchange measured in seconds – while the governance gate ensures that speed never costs the organization its control over what it shares and what it blocks.

Automate your threat intel sharing

Corvus SENSE ingests STIX/TAXII feeds, enriches and correlates indicators against a threat graph, and pushes high-confidence detections straight into your SIEM and firewalls – with TLP and confidence gating built in.


This analysis was prepared by Corvus Intelligence engineers who build mission-critical cyber threat intelligence systems for defense and government organizations.
