TAKpilot model-agnostic edge deployment explained

Every tactical environment is different. Some units operate with persistent, high-bandwidth connectivity to a classified cloud enclave; others push forward into areas where the only network is the mesh radio in the soldier's pack. AI copilots that work only when the stars align – full connectivity, commercial cloud access, no classification restrictions – are not useful tools for military operations. TAKpilot, Corvus Intelligence's AI chat copilot for CloudTAK, is built around a model-agnostic architecture that gives commanders and system integrators a genuine choice: run Claude Opus 4.7 against the Anthropic API for peak analytical performance, or deploy Llama 3.3 70B on a ruggedized GPU server with zero internet dependency. This article covers how that architecture works, how to select the right model for a given mission context, and how to configure TAKpilot for air-gapped edge deployments step by step.

Why model agnosticism matters for defense deployments

Commercial AI products typically hardcode a single provider. That approach creates a hard dependency on internet connectivity, commercial API availability, and the provider's data handling terms – constraints that are frequently incompatible with classified or operationally sensitive environments. TAKpilot's architecture resolves this by abstracting model access behind a single interface: the OpenAI-compatible API specification. Any model that speaks this protocol – whether hosted by Anthropic, AWS, Google, or a local inference server running on the same rack as the CloudTAK node – is a valid TAKpilot backend.

This is not a theoretical flexibility. TAKpilot is operationally deployed with Ukrainian Defense Forces, where network conditions, connectivity constraints, and classification requirements vary significantly across the force. A headquarters element with reliable connectivity uses Claude Sonnet 4.6 via the Anthropic API. A forward-deployed unit with only tactical radio connectivity runs Llama 3.3 8B on a local inference node. Both units interact with the same TAKpilot interface; only the backend differs.

Key insight: TAKpilot does not hardcode any AI provider. Model selection is a runtime configuration decision made by the deployer – not a product limitation. A single TAKpilot installation can be moved from a cloud backend to an air-gapped local model by changing two environment variables and restarting the process.

Model selection guide: matching capability to mission context

TAKpilot supports three tiers of Claude models via the Anthropic API, plus the full range of open models through the OpenAI-compatible interface. The choice between them involves trade-offs among reasoning depth, latency, operational cost, and connectivity requirements.

Claude opus 4.7: complex multi-step analysis

Opus 4.7 is the highest-capability Claude model and the correct choice for tasks that require sustained multi-step reasoning: synthesizing ISR reports from multiple sources, generating detailed mission orders from fragmentary instructions, or analyzing ambiguous sensor data where false positives carry serious operational consequences. The trade-off is latency – Opus 4.7 produces tokens more slowly than Sonnet or Haiku, and the cost per token is higher. For headquarters-level S2 and S3 analysis work where response time is measured in minutes rather than seconds, Opus 4.7 is the appropriate selection. It requires connectivity to the Anthropic API or to AWS Bedrock / Google Vertex with the Opus model enabled.

Claude sonnet 4.6: balanced performance for daily COP management

Sonnet 4.6 is the default recommended model for active operations where operators are issuing conversational COP commands – placing markers, querying unit positions, building data packages, subscribing to channels. It provides strong instruction-following and tool-use accuracy at lower latency than Opus, making it responsive enough for interactive use without the cost overhead of running Opus for every map marker placement. Sonnet 4.6 is the model used in TAKpilot's operational deployment with Ukrainian forces as the baseline configuration for connected elements.

Claude haiku 4.5: speed-first for high-frequency tasks

Haiku 4.5 is optimized for latency and throughput. It is the appropriate selection for high-frequency, well-structured commands – querying current tracks, listing missions, retrieving position data for specific callsigns – where the task is routine enough that maximum reasoning capability is not needed. Haiku responds faster than Sonnet and at significantly lower cost per token, which matters in environments where TAKpilot is handling a high volume of operator queries across multiple concurrent sessions. It also makes sense as a fallback model during periods of API rate pressure.

Open models for air-gapped environments

When cloud connectivity is unavailable or classification requirements prohibit external API calls, TAKpilot routes inference to a locally hosted model through the OpenAI-compatible endpoint. Three models have been validated for TAKpilot's tool-use patterns:

Llama 3.3 70B – Meta's 70B instruction-tuned model provides the strongest tool-use accuracy among open models validated with TAKpilot. In 4-bit quantization (Q4_K_M), it fits on a dual-GPU server or a single A100 and delivers 25–40 tokens per second – adequate for conversational COP interactions. This is the recommended air-gapped default for well-resourced edge deployments.
Qwen 2.5 72B – Alibaba's Qwen 2.5 at 72B parameters performs comparably to Llama 3.3 70B on structured tool calls and has stronger multilingual performance, which can be valuable for coalition operations or non-English-speaking units. Hardware requirements are similar.
Mistral Large – Mistral's instruction-tuned model is available as a local deployment option and performs well on classification and routing tasks. It is a reasonable choice when a smaller footprint is required and the command workload is relatively structured.
Llama 3.3 8B – For severely hardware-constrained environments (single consumer GPU, 8–12 GB VRAM), the 8B variant in 4-bit quantization provides acceptable performance for simple COP queries. Complex multi-step tool sequences will degrade relative to the 70B model, so operators should expect more explicit instruction phrasing.

Key insight: Tool-use reliability decreases with model size. The 70B class models (Llama 3.3 70B, Qwen 2.5 72B) maintain acceptable tool invocation accuracy for TAKpilot's CloudTAK API calls. Models below 13B parameters show significantly higher rates of malformed tool calls and should be validated against your specific COP command workload before operational use.

Cloud backends for classified environments: AWS bedrock and Google vertex

Not all cloud deployments are equivalent from a classification and data residency standpoint. The Anthropic API sends inference traffic to Anthropic's infrastructure. For environments that require data to remain within a specific cloud enclave – AWS GovCloud, Azure Government, or a Google Workspace for Government tenancy – TAKpilot supports routing Claude models through AWS Bedrock and Google Vertex AI, which handle model hosting within the customer's cloud boundary.

AWS Bedrock exposes Claude Opus 4.7, Sonnet 4.6, and Haiku 4.5 through the standard AWS SDK. From TAKpilot's perspective, the configuration change is a swap of the API base URL and authentication method: replace the Anthropic API key with AWS IAM credentials (via environment variables or an instance role) and set TAKPILOT_PROVIDER=bedrock with the appropriate AWS region. The same Claude models are available; inference traffic stays within the AWS network boundary and is subject to the customer's AWS data handling agreements rather than Anthropic's commercial terms.

Google Vertex AI offers the same Claude model access via Google's model garden. Configuration follows the same pattern: set TAKPILOT_PROVIDER=vertex with a GCP project ID and service account credentials. For organizations already operating within Google's defense-grade cloud offerings, this keeps all inference traffic within the existing security perimeter.

OpenAI-compatible endpoint support

TAKpilot's air-gapped path uses the same OpenAI Chat Completions API specification that has become the de facto standard for local model inference servers. This means TAKpilot is compatible with any inference runtime that implements this interface – Ollama, vLLM, llama.cpp server, LM Studio, Hugging Face TGI, and any custom container that wraps a model with an OpenAI-compatible REST layer.

The configuration is intentionally minimal. Two environment variables are sufficient to redirect TAKpilot from the Anthropic API to any local endpoint:

# Direct TAKpilot to a local Ollama inference server
TAKPILOT_API_BASE=http://192.168.1.50:11434/v1
TAKPILOT_MODEL=llama3.3:70b-instruct-q4_K_M
TAKPILOT_API_KEY=ollama

# Or to a vLLM server running Qwen 2.5
TAKPILOT_API_BASE=http://10.0.1.20:8000/v1
TAKPILOT_MODEL=Qwen/Qwen2.5-72B-Instruct
TAKPILOT_API_KEY=vllm-token

When TAKPILOT_API_BASE is set, TAKpilot does not attempt to reach the Anthropic API under any circumstances. There is no fallback to cloud models if the local endpoint is unreachable – TAKpilot will return an error to the operator rather than silently routing traffic to an unintended endpoint. This is a deliberate safety behavior for classified environments.

Per-session data sandboxing

Regardless of which model backend is in use, TAKpilot enforces the same session isolation model. Each operator connection creates an in-memory session context that holds the conversation history, pending tool calls, and any COP data retrieved from CloudTAK during the session. This context is never written to disk, never shared with other sessions, and never sent to any endpoint other than the configured model backend.

When the operator disconnects – either by closing the CloudTAK chat panel or after a configurable session timeout – the session context is deleted from memory. There is no session persistence between connections. An operator who reconnects starts a fresh context with no knowledge of the previous session's commands or retrieved data.

Key insight: TAKpilot's session sandbox means that even in cloud-connected deployments, the window of exposure is bounded by session duration. A session that processes a single tactical query and closes has exposed only that query's data to the model backend. There is no accumulating data store that grows with usage.

For air-gapped deployments, the sandboxing guarantee is absolute: the session context never crosses a network boundary, because the model backend is on the same network segment. Operators handling classified COP data should use air-gapped mode against a local model – the per-session sandbox ensures that classified data is processed only by the local inference node and deleted when the session ends.

How to deploy TAKpilot with llama 3.3 on air-gapped tactical hardware

The following procedure assumes a TAKpilot Node.js instance already deployed and connected to a CloudTAK server. For initial CloudTAK deployment, see the CloudTAK server deployment guide. The inference server must be on the same tactical LAN as both CloudTAK and TAKpilot.

Step 1: provision a GPU inference server on the tactical LAN

Install Ollama on a Linux server (Ubuntu 22.04 LTS recommended) with at least one NVIDIA GPU. Verify GPU recognition:

curl -fsSL https://ollama.com/install.sh | sh
nvidia-smi   # should list GPU(s)
ollama --version

Assign the server a static IP on the tactical LAN (e.g., 192.168.1.50). Ensure port 11434 is reachable from TAKpilot's host. By default Ollama binds to 127.0.0.1 only – to accept LAN connections, set OLLAMA_HOST=0.0.0.0 in the Ollama service environment.

Step 2: pull the llama 3.3 model

# 70B model – requires ~40 GB VRAM (dual GPU or A100)
ollama pull llama3.3:70b-instruct-q4_K_M

# 8B model – fits on a single 8 GB GPU
ollama pull llama3.3:8b-instruct-q4_K_M

The pull command downloads the model weights over the internet. For fully air-gapped environments where even this initial download is prohibited, transfer the model file manually: download the GGUF file on a connected machine, copy it to the server via removable media, and import it with ollama create. Ollama's documentation covers the offline import procedure.

Step 3: verify the OpenAI-compatible endpoint

# From the TAKpilot host
curl http://192.168.1.50:11434/v1/models
# Expected: {"object":"list","data":[{"id":"llama3.3:70b-instruct-q4_K_M",...}]}

If the request times out, check that Ollama is bound to 0.0.0.0 and that no host firewall is blocking port 11434.

Step 4: configure TAKpilot environment variables

# .env or systemd service environment
TAKPILOT_API_BASE=http://192.168.1.50:11434/v1
TAKPILOT_MODEL=llama3.3:70b-instruct-q4_K_M
TAKPILOT_API_KEY=ollama

# Unset or leave empty – TAKpilot will not fall back to Anthropic
# ANTHROPIC_API_KEY=

Step 5: start TAKpilot and confirm model routing

Start the TAKpilot Node.js process and inspect the startup log for the model backend line. Then send a test command via the CloudTAK chat interface and confirm a response is returned. Monitor the inference server's GPU utilization with nvidia-smi dmon to verify inference is running locally.

Step 6: test tool-use with a COP command

Send a structured COP command: "List all active units in Alpha Company." TAKpilot should invoke the CloudTAK list_units tool and return a formatted response. If the model returns a plain text answer without invoking any tools, this indicates the model's instruction-following capability is insufficient for TAKpilot's tool-call schemas – switch to the 70B variant or to Qwen 2.5 72B.

Step 7: validate no traffic exits the network boundary

# On the TAKpilot host – capture any traffic not destined for the LAN
tcpdump -i eth0 -n 'not net 192.168.1.0/24 and not net 10.0.0.0/8'

Send several TAKpilot commands and confirm no packets appear in the tcpdump output. All model inference traffic should remain within the tactical LAN. If packets to external IPs are observed, audit the TAKpilot environment configuration – ensure TAKPILOT_API_BASE is correctly set and ANTHROPIC_API_KEY is absent from the environment.

Performance trade-offs for common COP tasks

The practical performance differences between cloud and edge models become apparent quickly across the range of tasks TAKpilot handles. The following characterizations are based on observed behavior in TAKpilot deployments, not published benchmarks.

Marker placement and unit queries are the most common COP interactions. Both Claude Haiku 4.5 and Llama 3.3 8B handle these accurately and at low latency. The task is well-structured – the operator says where to place a marker, TAKpilot calls the CloudTAK API – and requires minimal reasoning. Either model is appropriate. For the 8B variant, explicit coordinate formats (decimal degrees or MGRS) improve accuracy; the model can struggle with ambiguous location references.

Multi-step mission management – creating a mission, assigning groups, attaching a data package, and confirming the result – requires the model to maintain context across multiple tool invocations. Claude Sonnet 4.6 handles this reliably. Llama 3.3 70B handles it with acceptable accuracy. Llama 3.3 8B struggles with sequences longer than three tool calls and should not be used for complex mission management workflows.

Document and image intelligence – processing PDFs, images, and intelligence reports uploaded to the TAKpilot session – benefits significantly from larger models. Claude Opus 4.7 and Sonnet 4.6 provide the most coherent synthesis of multi-page documents. Vision-based tasks (analyzing PNG/JPG attachments) require a model with vision capability; Llama 3.3 is text-only. For vision tasks in air-gapped environments, LLaVA or a Qwen-VL variant would be required.

Frequently asked questions

What AI models does TAKpilot support out of the box?

TAKpilot ships with support for the full Claude model family – Opus 4.7, Sonnet 4.6, and Haiku 4.5 – via the Anthropic API or AWS Bedrock and Google Vertex AI. It also supports any model reachable through an OpenAI-compatible endpoint, which covers Llama 3.3, Qwen 2.5, Mistral Large, and any other open model served by Ollama, vLLM, llama.cpp, or a custom inference container. The active model is selected via the TAKPILOT_MODEL and TAKPILOT_API_BASE environment variables – no code changes required.

Can TAKpilot operate without an internet connection?

Yes. TAKpilot's air-gapped deployment path routes all model inference to a local OpenAI-compatible inference server running on the same tactical LAN or on the same physical host. No traffic leaves the network. Operators provision a model such as Llama 3.3 70B or Qwen 2.5 72B onto a ruggedized GPU server, expose it on a private endpoint (e.g., http://192.168.1.50:11434/v1), and set TAKPILOT_API_BASE to that address. TAKpilot connects to it identically to how it would connect to a cloud provider – the transport layer is the only difference.

How does TAKpilot ensure operator data does not leave the network?

TAKpilot enforces a per-session sandbox for all operator data. Each operator session receives an isolated context that is never written to disk or shared across sessions. When the operator disconnects, the session context – including all messages, tool call results, and COP references – is deleted from memory. For cloud-hosted models (Claude via Anthropic API), Anthropic's enterprise data policies apply; for air-gapped deployments with local models, data never leaves the tactical LAN because the inference endpoint is local. Operators running classified workloads should always deploy TAKpilot in air-gapped mode against a locally hosted model.

What are the hardware requirements for running Llama 3.3 70B on a tactical edge server?

Llama 3.3 70B in 4-bit quantization (GGUF Q4_K_M) requires approximately 40 GB of VRAM. A single NVIDIA RTX 4090 (24 GB) is insufficient at full precision; a dual-GPU setup or a server-grade A100/H100 is recommended for full 70B parameter inference. For more constrained tactical hardware, Llama 3.3 8B (Q4_K_M, ~5 GB VRAM) or Qwen 2.5 7B provide acceptable performance on a single consumer GPU. Inference speed at 70B on an A100 is approximately 25–40 tokens per second, which is sufficient for conversational COP interactions with acceptable latency.

Can TAKpilot switch models mid-operation without restarting the server?

Model selection in the current TAKpilot release is set at startup via environment variables and applies to all sessions. Hot-switching models without a server restart is not supported in the base configuration. However, because TAKpilot is open-source under AGPL-3.0, deployers who need per-session model selection can extend the configuration API. A common pattern for multi-classification environments is running two TAKpilot instances on separate ports – one connected to a cloud Claude endpoint for unclassified work, one connected to a local Llama endpoint for classified operations – and routing operators to the appropriate instance via a reverse proxy.

TAKpilot model flexibility: cloud, air-gapped, and edge AI for tactical environments

Why model agnosticism matters for defense deployments

Model selection guide: matching capability to mission context

Claude opus 4.7: complex multi-step analysis

Claude sonnet 4.6: balanced performance for daily COP management

Claude haiku 4.5: speed-first for high-frequency tasks

Open models for air-gapped environments

Cloud backends for classified environments: AWS bedrock and Google vertex

OpenAI-compatible endpoint support

Per-session data sandboxing

How to deploy TAKpilot with llama 3.3 on air-gapped tactical hardware

Step 1: provision a GPU inference server on the tactical LAN

Step 2: pull the llama 3.3 model

Step 3: verify the OpenAI-compatible endpoint

Step 4: configure TAKpilot environment variables

Step 5: start TAKpilot and confirm model routing

Step 6: test tool-use with a COP command

Step 7: validate no traffic exits the network boundary

Performance trade-offs for common COP tasks

Frequently asked questions

Deploy TAKpilot on Your Terms

TAKpilot model flexibility: cloud, air-gapped, and edge AI for tactical environments

Why model agnosticism matters for defense deployments

Model selection guide: matching capability to mission context

Claude opus 4.7: complex multi-step analysis

Claude sonnet 4.6: balanced performance for daily COP management

Claude haiku 4.5: speed-first for high-frequency tasks

Open models for air-gapped environments

Cloud backends for classified environments: AWS bedrock and Google vertex

OpenAI-compatible endpoint support

Per-session data sandboxing

How to deploy TAKpilot with llama 3.3 on air-gapped tactical hardware

Step 1: provision a GPU inference server on the tactical LAN

Step 2: pull the llama 3.3 model

Step 3: verify the OpenAI-compatible endpoint

Step 4: configure TAKpilot environment variables

Step 5: start TAKpilot and confirm model routing

Step 6: test tool-use with a COP command

Step 7: validate no traffic exits the network boundary

Performance trade-offs for common COP tasks

Frequently asked questions

Deploy TAKpilot on Your Terms

Related Articles