Architecture

Technical Deep Dive

Full technical specification. Written for engineers, operators, and architects evaluating the platform.

Supported Workloads

The workloads the platform is built to run.

The platform is purpose-built for AI inference workloads that require privacy, low latency, and predictable throughput — not general-purpose cloud compute.

LLM Inference

Quantised large language model inference via llama.cpp. GGUF model files are loaded from NVMe into GPU VRAM at startup; subsequent requests are served from resident weights with no reload latency.

Streaming token output over persistent HTTP connection
Context window configurable per client agreement
No cross-session KV-cache leakage — context isolated per request
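The streaming output above follows the OpenAI-compatible wire format, so a client only needs to parse server-sent-event lines. A minimal sketch, assuming a hypothetical tunnel-local endpoint and placeholder token (both illustrative, not the platform's actual values):

```python
import json

def parse_sse_chunk(line: str):
    """Extract the token text from one OpenAI-style SSE line, or None."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None
    delta = json.loads(payload)["choices"][0].get("delta", {})
    return delta.get("content")

if __name__ == "__main__":
    import requests
    # Hypothetical tunnel-local address and token -- replace with your
    # deployment's values from onboarding.
    with requests.post(
        "http://10.0.0.1:8000/v1/chat/completions",
        headers={"Authorization": "Bearer <api-token>"},
        json={"model": "local", "stream": True,
              "messages": [{"role": "user", "content": "Hello"}]},
        stream=True,
    ) as resp:
        for raw in resp.iter_lines(decode_unicode=True):
            token = parse_sse_chunk(raw)
            if token:
                print(token, end="", flush=True)
```

Because the format matches OpenAI's streaming API, existing OpenAI SDKs can be pointed at the tunnel endpoint without the manual parsing shown here.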

RAG Pipelines

Retrieval-augmented generation using client-supplied document corpora. Embeddings are generated on-cluster and indexed into a vector store scoped to the client's corpus — the index persists per deployment as required for retrieval to function, but is isolated per client with no shared index between tenants. Raw document content is not retained after indexing.

sentence-transformers for embedding generation
FAISS or Qdrant for vector index depending on corpus size
Client corpus isolated — no shared index between tenants
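The isolation property above is structural: each client gets its own index object holding only embeddings and opaque IDs. A minimal sketch using a brute-force NumPy search as a stand-in for the FAISS/Qdrant index (class and ID names are illustrative):

```python
import numpy as np

class ClientIndex:
    """Per-client vector index: embeddings persist, raw documents do not."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.doc_ids: list = []

    def add(self, doc_id: str, embedding: np.ndarray) -> None:
        # Only the embedding and an opaque ID are retained -- never
        # the document text itself.
        v = embedding / np.linalg.norm(embedding)
        self.vectors = np.vstack([self.vectors, v.astype(np.float32)])
        self.doc_ids.append(doc_id)

    def search(self, query: np.ndarray, k: int = 3) -> list:
        q = query / np.linalg.norm(query)
        scores = self.vectors @ q          # cosine similarity on unit vectors
        top = np.argsort(scores)[::-1][:k]
        return [self.doc_ids[i] for i in top]
```

In production the brute-force search is replaced by a FAISS or Qdrant index, but the tenancy boundary is the same: one index per client, no shared namespace.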

Computer Vision / ASL Recognition

GPU-accelerated frame inference for sign language recognition models. Video frames are processed in-memory and classification is returned per frame or sequence; no video is stored at rest. This is the primary workload driving SignaVision's own Deaf accessibility platform, so the infrastructure runs in production for this purpose rather than being offered speculatively.

Direct ingest from application layer over WireGuard tunnel
PyTorch / ONNX model runtimes supported
Primary workload for SignaVision's own accessibility platform
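The in-memory frame path above can be sketched as a pure preprocessing step feeding an ONNX runtime session. The model filename and input shape below are hypothetical; the point is that frames exist only as in-memory tensors:

```python
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """HWC uint8 frame -> NCHW float32 tensor normalised to [0, 1].
    Frames live only in memory; nothing is written to disk."""
    x = frame.astype(np.float32) / 255.0
    x = np.transpose(x, (2, 0, 1))      # HWC -> CHW
    return x[np.newaxis, ...]           # add batch dimension

if __name__ == "__main__":
    import onnxruntime as ort
    # Hypothetical model path and provider selection.
    sess = ort.InferenceSession("asl_classifier.onnx",
                                providers=["CUDAExecutionProvider"])
    frame = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a decoded frame
    inputs = {sess.get_inputs()[0].name: preprocess_frame(frame)}
    logits = sess.run(None, inputs)[0]
    print(int(logits.argmax()))
```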

Batch & Embedding Jobs

Scheduled or triggered batch jobs for document ingestion, model fine-tuning preparation, and corpus embedding generation. Runs during off-peak inference windows to avoid throughput contention.

Job submission via authenticated API endpoint
Output delivered to client — not retained on-cluster after delivery
Audit receipt generated per job with hash of output payload
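The audit receipt above lets a client verify what was delivered without the platform retaining the payload. A minimal sketch of receipt generation (field names are illustrative):

```python
import hashlib
import time

def audit_receipt(job_id: str, output: bytes) -> dict:
    """Build a per-job receipt: the hash proves what was delivered,
    while the payload itself is not retained on-cluster."""
    return {
        "job_id": job_id,
        "sha256": hashlib.sha256(output).hexdigest(),
        "size_bytes": len(output),
        "completed_at": int(time.time()),
    }
```

On delivery, the client recomputes the SHA-256 of the received payload and compares it to the receipt; a match confirms integrity end-to-end.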

Integration Model

Integration profile.

The platform assumes a technical operator on the client side. Access is machine-to-machine by default — no browser-based UI, no managed console. Clients connect programmatically and own their integration layer.

Access Pattern

Connection

WireGuard peer provisioned per client. Public key exchange completed during onboarding. Tunnel is persistent and encrypted end-to-end
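A client-side peer configuration follows the standard WireGuard format. The keys, addresses, and gateway hostname below are placeholders; actual values are exchanged during onboarding:

```ini
[Interface]
# Client's private key, generated locally and never shared
PrivateKey = <client-private-key>
Address = 10.0.0.2/32

[Peer]
# Cluster public key, exchanged during onboarding
PublicKey = <cluster-public-key>
# Hypothetical regional gateway endpoint
Endpoint = gateway.example:51820
# Route only the inference endpoint through the tunnel
AllowedIPs = 10.0.0.1/32
# Keep the tunnel persistent through NAT
PersistentKeepalive = 25
```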

API Surface

REST endpoints via FastAPI over the tunnel. OpenAI-compatible completion endpoint available for drop-in SDK compatibility

Authentication

Mutual: WireGuard peer key at network layer + API token at application layer. Both required for any accepted request
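By the time a request reaches the application layer, the WireGuard peer key has already gated it at the network layer; what remains is the token check. A minimal sketch of that second half, using a constant-time comparison (function and header names are illustrative, not the platform's actual code):

```python
import hmac
from typing import Optional

def authorize(auth_header: Optional[str], expected_token: str) -> bool:
    """Application-layer half of the mutual scheme. The network layer
    (WireGuard peer key) has already been enforced before this runs."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):]
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(presented, expected_token)
```

In a FastAPI service this check would typically sit in a dependency applied to every route, so no endpoint is reachable with only one of the two credentials.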

Integration Requirements

Client Side

WireGuard client (Linux, macOS, Windows, mobile all supported). Any HTTP client capable of hitting a local tunnel endpoint

No SDK Required

Standard HTTP/JSON — works with curl, Python requests, any OpenAI-compatible client, or direct TCP from application code

Latency profile

Regional gateway minimises WireGuard round-trip. First-token latency primarily determined by model size and context length, not network overhead

Ideal Fit

Organizations that must retain control over their data and execution
Teams capable of integrating and operating workloads programmatically
Environments requiring private infrastructure for compliance or policy
Use cases where data cannot leave organizational control

Not a Fit For

Consumer-facing products requiring multi-tenant browser sessions with no engineering team
Workloads requiring persistent training runs or full model fine-tuning at scale
Clients who need a managed UI dashboard rather than an API
Burst-only workloads where cost beats privacy — public cloud inference is cheaper at small scale

Who It's For

Organizations where data cannot leave.

Law firms

Client privilege

Routing client communications or case materials through external inference infrastructure creates discovery exposure and privilege risk. Execution stays inside the firm's control boundary.

Healthcare providers

Patient data compliance

PHI cannot be processed on shared cloud infrastructure without contractual and technical controls that most providers cannot verify. This keeps patient data inside the clinical environment — no external processor in the chain.

Government agencies

Data sovereignty

CUI, ITAR, and jurisdictional data residency requirements often prohibit processing on foreign-owned or multi-tenant infrastructure. Dedicated private compute eliminates the compliance ambiguity.

Universities and research institutions

Restricted datasets

IRB agreements, export control obligations, and funding body data use agreements routinely restrict where research data can be processed. Cloud inference APIs cannot satisfy these contractual requirements.

Financial and engineering teams

IP protection

Sending proprietary models, trading logic, or engineering specifications to external inference APIs creates IP exposure that legal review rarely approves. Private infrastructure removes the external dependency entirely.

Accessibility and Deaf education programs

Sensitive communication data

Signing video from students — particularly minors — carries consent and privacy constraints that third-party cloud processing cannot meet. Data is evaluated in-session and never routed externally.

Continue

Next: the principles and philosophy behind how this platform is designed.

Design Principles →