The Million-Track Query Problem

A common operator query in a defense fusion platform reads: "show me everything within 5 km of this point in the last 60 seconds." It sounds trivial. On a table of one hundred million track updates, written by AIS receivers, ADS-B feeds, radar tracks, ESM, and drone telemetry, a naive implementation collapses inside three weeks of production traffic.

The naive query is a sequential scan. SELECT * FROM tracks WHERE ST_DWithin(geom, $point, 5000) AND ts > now() - interval '60 seconds'. On a hundred-million-row table with no spatial index, that is 12 to 40 seconds depending on disk. With a thousand operators each refreshing every 2 seconds, the database burns. The fix is not faster hardware. The fix is a spatial index — and the choice of index dictates everything downstream: ingest throughput, query latency, storage footprint, and the cost of distributed sharding.

This article walks the four index families that matter for defense: H3, S2, the R-Tree lineage, and the PostGIS GiST/SP-GiST/BRIN trio. Then it covers the hybrid time-series patterns, the dual-layer culling architecture for the common operational picture, and the migration story.

H3 (Uber Hexagonal)

H3 is a hierarchical hexagonal grid system. The planet is partitioned into 122 base cells (10 pentagons, 112 hexagons), and each base cell subdivides recursively into seven children, down to resolution 15. A cell at resolution 9 is roughly 0.1 km² — the size of a city block. A cell at resolution 6 is around 36 km² — a useful operational engagement bubble.

Hexagons beat squares for proximity queries because every hexagonal neighbor sits at the same centroid distance. With a square grid, diagonal neighbors are 1.41× farther than edge neighbors — every distance calculation has to special-case it. With hexagons, a k-ring of radius 3 yields exactly 37 cells, all at predictable distance. A 5 km radius search becomes: convert query point to an H3 cell, expand to a k-ring covering 5 km, fetch all tracks tagged with those cell IDs, then refine with exact ST_Distance.

Defense-friendly characteristics: H3 cell IDs are 64-bit integers, trivial to index in any KV store. Cell membership is hashable, so a track with an H3 column can be sharded by cell prefix without bespoke logic. Cell-level aggregates ("how many tracks per H3-7 cell in the last hour") collapse straight to a GROUP BY h3_index. We have seen H3-indexed track tables answer 5 km proximity queries in 8–14 ms on tables of 200 million rows.

Google S2

S2 partitions the sphere via a quadtree projection onto the surface of an inscribed cube. Each face is recursively subdivided into four children down to level 30, where cells are around 1 cm². The S2CellId is a 64-bit integer encoding cell face, position along a Hilbert curve, and level. The Hilbert encoding is the trick: cells close on the sphere have close IDs, so a BETWEEN range on the integer column behaves like a spatial range query.

That integer-arithmetic advantage is when S2 wins. If the working store is a sharded RDBMS, a flat KV (DynamoDB, Cassandra), or a search engine (Elasticsearch's geo_shape uses BKD trees but range queries on S2CellIds are equally fast), S2 keeps proximity logic inside cheap integer comparisons. No GIS extension required.

H3 versus S2 in practice: H3 is friendlier for uniform-distance reasoning, aggregations, and human-readable cells. S2 is friendlier when the storage layer has no geometry support and you are working with billions of points across global coverage where pentagon distortion in H3 (10 cells per planet have only 5 neighbors instead of 6) creates problems. NATO maritime surveillance teams indexing global AIS data have moved to S2 for exactly this reason. Regional air-pictures with continental scope mostly stay on H3.

R-Tree and Variants

The R-Tree, introduced by Guttman in 1984, indexes geometries by bounding boxes. Each tree node stores the minimum bounding rectangle (MBR) of its children. A proximity query walks the tree, pruning subtrees whose MBR does not intersect the query envelope. Worst case is poor — overlap between MBRs degrades to linear scan — but the average case on real-world data is excellent.

R*-Tree (1990) reduces MBR overlap with smarter node splits and reinsertion on overflow. It is the variant most production systems implement today, including the in-memory R-Tree inside libspatialindex used by GeoPandas and inside DuckDB's spatial extension.

The bulk-load story matters. Loading a million geometries one-at-a-time into an R-Tree is slow and produces a poorly balanced tree. The Sort-Tile-Recursive (STR) bulk-load builds the entire tree in a single pass after sorting geometries along a space-filling curve, yielding near-optimal MBR layout. Always bulk-load when standing up a new index on a backfilled dataset.

When the in-memory R-Tree still wins: client-side geometry indexes inside a single operator workstation (Qt-based common operational picture clients, browser-side Leaflet/Mapbox supercluster layers), and stateless geofence services where the geofence set fits in RAM. A 100k-geofence R-Tree in libspatialindex services point-in-polygon lookups in 30–60 µs. PostGIS for the same workload, with network round-trip, is closer to 1–2 ms.

PostGIS Indexing — GiST vs SP-GiST vs BRIN

PostGIS is the default geospatial backbone for most defense fusion stacks Corvus has touched, and the index choice inside it is non-trivial. See our deeper PostGIS for defense walkthrough for the full setup; this section is the index-selection summary.

GiST is the generalized-search-tree, which PostGIS uses to implement an R-Tree-equivalent over geometries. It is the default. Fits well when geometries are heterogeneous in size (mixing points, lines, polygons) and when the workload mixes read and write. Build time on 100M point rows is roughly 15–25 minutes on a fast SSD with maintenance_work_mem tuned to 2 GB.

SP-GiST (space-partitioned GiST) implements quadtrees and k-d trees. Faster on point-only workloads where data is uniformly distributed. We have measured SP-GiST 1.4–1.8× faster than GiST for pure point proximity queries on a 50M-row track table, with a 25% smaller index footprint.

BRIN (block range index) is the cheap option when data is already physically clustered by location — for example, when track inserts are partitioned by region and physically appended in geographic order. BRIN indexes are tiny (kilobytes for tables that need gigabytes of GiST) and useful as a secondary filter, not a primary spatial index. They shine on cold archive partitions.

The spatial-temporal combined index pattern: for "within 5 km in the last 60 seconds" workloads, do not try to build a single multi-column index. Instead build a GiST on geom and a B-tree on ts, then partition the table by time. The query planner combines both, and partition pruning eliminates 99% of the data before any spatial work happens. Real query plans on million-row partitions answer in 4–12 ms. The same query without partition pruning, on the same indexed table, is 200–400 ms.

Time-Series + Geospatial Hybrid

Defense tracks are time-series at heart. Every track update is an append. Reads are predominantly "latest position per emitter" and "trail of the last N minutes." The hybrid stack matters.

TimescaleDB hypertables with PostGIS. A hypertable partitions automatically by time chunks (1-hour or 6-hour chunks for high-volume feeds). PostGIS GiST indexes are built per chunk. Compression on chunks older than 24 hours yields 8–14× storage reduction. Continuous aggregates pre-compute per-H3-cell hourly summaries. This is the recommended starting stack for most defense fusion deployments.

kdb+/q for the highest-throughput cases. When the platform must absorb several hundred thousand track updates per second — large radar networks fused with multi-source ESM — kdb+ columnar storage with q-sql remains best-of-breed. q-sql is austere, the licensing is expensive, but the throughput is real. The geospatial story is hand-rolled (H3 or S2 cell IDs as regular long columns) rather than a native GIS extension.

ClickHouse with geohash columns. ClickHouse handles the analytical-replay layer well — replay the last seven days of tracks against a new correlation algorithm. Geohash columns (variable-length base32 strings, prefix-equality = bounding-box containment) are the pragmatic index. ClickHouse 24.x added native geoDistance and S2 functions; on a 5-billion-row historical track table, a 10 km proximity scan resolves in 1–3 seconds. Not interactive, but acceptable for analyst workloads.

Indexes for the Operational Picture

The common operational picture (COP) is the canonical consumer of geospatial indexes. Hundreds to thousands of operator sessions, each rendering tracks within a viewport, refreshing 1–4 times per second. Without a dual-layer culling architecture, the backend dies first.

Server-side culling. The server applies the viewport bounding box as a spatial index query, then applies the time-window filter, then applies entity-type and clearance filters. The result is a bounded payload — typically capped at 5,000 tracks per response to keep the wire size predictable. On a properly indexed PostGIS hypertable, this is 6–20 ms per session.

Client-side culling. The client receives a slightly larger envelope than the visible viewport (so panning does not immediately trigger a refetch), maintains an in-memory R-Tree of tracks, and renders only what is inside the visible canvas. The R-Tree handles pan, zoom, and hover hit-testing locally at 60 FPS without server round-trips.

The dual layer is what scales to thousands of concurrent sessions. Server-side culling caps backend cost. Client-side culling caps frontend cost. Either layer alone breaks. This pattern is essential whether the COP is a thick client (Qt, .NET) or a browser map (Mapbox GL, MapLibre, deck.gl) — see the broader pipeline reference in part 1 of our defense fusion pipeline series.

Migration Realities

The migration from "everything in a single table with no spatial index" to a properly indexed, partitioned, hypertable-backed store is where most defense data programs stumble. Three rules from delivery experience:

Dual-write before cutover. The new indexed store runs in parallel with the legacy table for at least two operational cycles (typically two weeks). Writes go to both. Reads gradually shift. Rollback is a config flag, not a database migration.

Schema evolution discipline. Adding a column to a 500 GB hypertable with full GiST indexing is not a 5-second ALTER TABLE. Plan column additions as deliberate releases. Use JSONB payloads for fields under experimentation; promote stable fields to typed columns once the schema settles.

Backfill in time-ordered chunks. Restoring historical track data into the new hypertable in reverse chronological order (newest first) means the operationally relevant window is queryable first, while older chunks ingest in the background. Do not wait for the full backfill before cutting reads over.

Key insight: Geospatial indexing is not a configuration knob — it is an architectural decision that propagates through ingest, storage, query, sharding, and the operator UI. Pick the index family before the data model is frozen. For the broader fusion context, see our complete guide to defense data fusion and the deeper algorithmic treatment in track correlation algorithms for defense.