Technical Deep Dive
Full technical specification. Written for engineers, operators, and architects evaluating the platform.
Supported Workloads
What the platform is built to run.
The platform is purpose-built for AI inference workloads that require privacy, low latency, and predictable throughput — not general-purpose cloud compute.
LLM Inference
Quantised large language model inference via llama.cpp. GGUF model files are loaded from NVMe into GPU VRAM at startup; subsequent requests are served from resident weights with no reload latency.
RAG Pipelines
Retrieval-augmented generation using client-supplied document corpora. Embeddings are generated on-cluster and indexed into a vector store scoped to the client's corpus. The index persists per deployment, as retrieval requires, but is isolated per client: no index is ever shared between tenants. Raw document content is not retained after indexing.
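The per-client index isolation described above can be sketched as a minimal in-memory vector store. The class name, embedding dimensionality, and cosine-similarity search here are illustrative assumptions, not the platform's actual implementation:

```python
import math
from collections import defaultdict

class ClientScopedVectorStore:
    """Toy vector store: each client_id gets its own isolated index.
    Only vectors and document ids are kept -- no raw document text."""

    def __init__(self):
        # One independent index per client; nothing is shared across tenants.
        self._indexes = defaultdict(list)  # client_id -> [(doc_id, vector)]

    def index(self, client_id, doc_id, vector):
        self._indexes[client_id].append((doc_id, list(vector)))

    def search(self, client_id, query, top_k=3):
        # Score against this client's index only.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        scored = [(cosine(query, v), d) for d, v in self._indexes[client_id]]
        return [d for _, d in sorted(scored, reverse=True)[:top_k]]
```

Because each lookup is keyed by client, a query from one tenant can only ever score vectors belonging to that tenant's corpus.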
Computer Vision / ASL Recognition
GPU-accelerated frame inference for sign language recognition models. Video frames are processed in-memory and classification is returned per frame or sequence; no video is stored at rest. This is the primary workload driving SignaVision's own Deaf accessibility platform, so the infrastructure runs in production for this purpose rather than being offered speculatively.
Batch & Embedding Jobs
Scheduled or triggered batch jobs for document ingestion, model fine-tuning preparation, and corpus embedding generation. Jobs run during off-peak inference windows to avoid throughput contention.
Integration Model
How clients connect and integrate.
The platform assumes a technical operator on the client side. Access is machine-to-machine by default — no browser-based UI, no managed console. Clients connect programmatically and own their integration layer.
Access Pattern
Connection
A WireGuard peer is provisioned per client, with public key exchange completed during onboarding. The tunnel is persistent and encrypted end-to-end.
API Surface
REST endpoints are served via FastAPI over the tunnel. An OpenAI-compatible completion endpoint is available for drop-in SDK compatibility.
Authentication
Mutual: the WireGuard peer key at the network layer plus an API token at the application layer. Both are required for any request to be accepted.
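As a sketch of the connection model, a client-side WireGuard configuration might look like the following. Every key, address, port, and the endpoint hostname are placeholders for illustration, not real provisioning values:

```ini
[Interface]
# Private key generated on the client; the public half is exchanged at onboarding.
PrivateKey = <client-private-key>
Address = 10.0.0.2/32

[Peer]
# Platform gateway peer, provisioned during onboarding.
PublicKey = <gateway-public-key>
Endpoint = gateway.example.net:51820
# Route only the tunnel subnet through WireGuard, not all traffic.
AllowedIPs = 10.0.0.1/32
PersistentKeepalive = 25
```

Restricting `AllowedIPs` to the gateway address keeps the client's ordinary traffic off the tunnel while the persistent keepalive holds the peer session open across NAT.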
Integration Requirements
Client Side
A WireGuard client (Linux, macOS, Windows, and mobile are all supported) and any HTTP client capable of reaching the local tunnel endpoint.
No SDK Required
Standard HTTP/JSON: works with curl, Python requests, any OpenAI-compatible client, or direct TCP from application code.
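Because the API is plain HTTP/JSON, a client call over the tunnel needs nothing beyond the standard library. The host address, endpoint path, and model name below are assumptions for illustration; substitute the values supplied at onboarding:

```python
import json
import urllib.request

TUNNEL_HOST = "http://10.0.0.1:8000"  # hypothetical tunnel-side gateway address

def build_completion_request(api_token, prompt, model="default"):
    """Assemble an OpenAI-style chat completion request for the tunnel endpoint.

    Returns a urllib.request.Request ready to send once the tunnel is up."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{TUNNEL_HOST}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Application-layer token; the network layer is already
            # authenticated by the WireGuard peer key.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )

# Sending is a single call once the tunnel is active:
# with urllib.request.urlopen(build_completion_request(token, "hello")) as r:
#     reply = json.load(r)
```

The same request shape works unchanged with curl or any OpenAI-compatible SDK pointed at the tunnel address.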
Latency Profile
The regional gateway minimises WireGuard round-trip time. First-token latency is primarily determined by model size and context length, not network overhead.
Who It's For
Organizations where data cannot leave.
Law firms
Attorney-client privilege
Routing client communications or case materials through external inference infrastructure creates discovery exposure and privilege risk. Execution stays inside the firm's control boundary.
Healthcare providers
Patient data compliance
PHI cannot be processed on shared cloud infrastructure without contractual and technical controls that most providers cannot verify. This keeps patient data inside the clinical environment — no external processor in the chain.
Government agencies
Data sovereignty
CUI, ITAR, and jurisdictional data residency requirements often prohibit processing on foreign-owned or multi-tenant infrastructure. Dedicated private compute eliminates the compliance ambiguity.
Universities and research institutions
Restricted datasets
IRB agreements, export control obligations, and funding body data use agreements routinely restrict where research data can be processed. Cloud inference APIs cannot satisfy these contractual requirements.
Financial and engineering teams
IP protection
Sending proprietary models, trading logic, or engineering specifications to external inference APIs creates IP exposure that legal review rarely approves. Private infrastructure removes the external dependency entirely.
Accessibility and Deaf education programs
Sensitive communication data
Signing video from students — particularly minors — carries consent and privacy constraints that third-party cloud processing cannot meet. Data is evaluated in-session and never routed externally.
Next: the principles and philosophy behind how this platform is designed.