Horizontal Scaling Architecture
One image. Every jurisdiction. 190+ services per node, 20 validators across 13 compliance zones, zero single points of failure. Every node is byte-for-byte identical - from Zurich to Singapore to Dallas. Add a jurisdiction by deploying the same image.
Current Deployment
DevNet (Hetzner)
A single Hetzner CPX62 instance in Nuremberg (nbg1) runs all 190+ microservices, the portal, and the CI/CD pipeline. The JILHQ fleet controller runs on a dedicated server.
MainNet (Hetzner)
20 validators across 13 jurisdictions on 4 continents. 14-of-20 (70% BFT) consensus with P2P settlement via RedPanda. Each jurisdiction runs the full service stack.
Expansion (Planned)
Planned growth preserves the 14-of-20 (70% BFT) consensus, with each new jurisdiction deploying the same full service stack. No single nation or datacenter can halt the network.
One Image. Every Jurisdiction.
JIL Sovereign deploys the same container stack to every node in every jurisdiction. There is no "primary" server. There is no "secondary" server. Each node runs the full 190+ service stack, its own PostgreSQL instance, its own Redis cache, its own validator, and its own RedPanda broker. A node in Zurich is byte-for-byte identical to a node in Singapore, Abu Dhabi, or Dallas.
This means: spin up a new jurisdiction by deploying the same image. No special configuration. No hand-wiring. The node joins the RedPanda cluster, catches up on events, and starts serving traffic.
Identical Deployment
Same Docker images, same compose file, same config. Region is an environment variable, not a code change.
Session Affinity
Where a request starts is where it finishes. No mid-flight handoffs. No cross-server state lookups during processing.
Event-Driven Sync
RedPanda propagates every state change to every node. PostgreSQL consumes events with source_id tracking. Eventually consistent, always available.
Jurisdiction Autonomy
Each node enforces its local compliance rules. FINMA in Zurich. MAS in Singapore. FCA in London. Compliance is local, consensus is global.
20+ Nodes Across 13 Jurisdictions
Each node is a full-stack deployment running the entire platform. Nodes are grouped by jurisdiction for compliance purposes, but any node can process any request. Cloudflare routes users to the nearest healthy node.
What Runs on Every Node
Each node is a self-contained deployment of the entire platform. Nothing depends on another node being available to process requests. The only cross-node communication is event propagation via RedPanda.
RedPanda: The Global Event Bus
RedPanda is the backbone of cross-node synchronization. Every state change on any node is published as an event to RedPanda. Every other node's PostgreSQL instance consumes those events and applies them locally. The source_id on each event tells every consumer which node originated the change - so a node never re-applies its own writes.
Event Flow: Originating node writes to local PostgreSQL → emits event with source_id → RedPanda replicates across brokers → delivers to all consumers → each consuming node checks source_id ≠ self → applies the state change locally.
Event Schema: Every state change emitted to RedPanda follows an envelope containing a globally unique event_id, the originating source_id (node name), topic, wall-clock timestamp, monotonic sequence number, and the payload with transaction details.
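As a concrete illustration of that envelope, here is a minimal Python sketch. The field names follow the schema described above; the exact wire format, types, and `to_json` helper are assumptions for illustration.

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class EventEnvelope:
    """Envelope for every state change published to RedPanda.
    Field names follow the schema above; wire format is illustrative."""
    source_id: str    # originating node, e.g. "SG-MAS-01"
    topic: str        # e.g. "jil.events"
    sequence: int     # monotonic per-node sequence number
    payload: dict     # transaction details
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)  # wall-clock time

    def to_json(self) -> str:
        return json.dumps(asdict(self))

env = EventEnvelope(source_id="SG-MAS-01", topic="jil.events",
                    sequence=42, payload={"kind": "dex.trade", "amount": "100"})
```

The `event_id` gives global uniqueness for deduplication, while the per-node `sequence` lets consumers detect gaps in the stream.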
PostgreSQL as Consumer: Each node's PostgreSQL instance subscribes to RedPanda topics via a dedicated consumer service. When an event arrives, the consumer checks the source_id field: if it matches the local node, the event is skipped (the write already happened locally). Otherwise, the state change is applied. This prevents duplicate writes and ensures every node converges to the same state.
Session Affinity: Start Here, Finish Here
When a user initiates a service request - a DEX trade, a bridge transfer, a settlement - the node that receives the request is the only node that processes it. There is no mid-flight handoff to another server. The originating node executes the full request lifecycle, writes results to its local database, then emits the completed event to RedPanda for all other nodes to consume.
A request that starts on Node A will always complete on Node A. Other nodes learn about the result asynchronously via RedPanda. This eliminates distributed locking, cross-node latency, and split-brain scenarios during request processing.
Session Affinity Flow - DEX Trade Example:
1. User submits DEX order to Node SG-MAS-01 → 2. Validate + match locally → 3. Execute trade in local DB → 4. Return result to user → 5. Emit jil.events to RedPanda → RedPanda propagates event with source_id: SG-MAS-01 → All other nodes consume event, check source_id ≠ self, apply to local PostgreSQL → State converged.
This model means there is zero cross-node coordination during request processing. Latency is purely local. The user gets their response from the nearest node in milliseconds. The rest of the network catches up in the background.
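The producer side of that lifecycle can be sketched as follows. The dict `db` and list `outbox` are stand-ins for local PostgreSQL and the RedPanda producer; the trade logic is reduced to a stub:

```python
def handle_request(order: dict, node_id: str, db: dict, outbox: list) -> dict:
    """Session-affinity lifecycle sketch: the receiving node validates,
    executes, and commits locally, then queues the completed event for
    RedPanda. No other node is consulted during processing."""
    # Steps 1-3: validate, match, and execute entirely on this node.
    trade = {"order_id": order["id"], "status": "filled"}
    db[order["id"]] = trade
    # Step 5: emit the completed event tagged with this node's source_id,
    # so every other node converges asynchronously.
    outbox.append({"source_id": node_id, "topic": "jil.events", "payload": trade})
    # Step 4: the caller gets the local result immediately.
    return trade

local_db, outbox = {}, []
result = handle_request({"id": "ord-7"}, "SG-MAS-01", local_db, outbox)
```

Note the ordering: the local commit happens before the event is emitted, so the emitting node is always at least as up to date as its consumers.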
Why This Architecture Has No Bottlenecks
No Shared Database
Each node has its own PostgreSQL. No connection pool contention. No cross-region latency on reads. Writes are local, replicated via events.
No Distributed Locks
Session affinity means one node owns each request. No distributed mutex, no two-phase commits, no cross-node coordination during processing.
No Central Router
Cloudflare geo-routes to nearest node. No load balancer bottleneck. Each node handles its own traffic independently.
Linear Horizontal Scale
Adding a node adds capacity. 20 nodes handle 20x traffic. 50 nodes handle 50x. No diminishing returns from coordination overhead.
Fault Isolation
If a node goes down, Cloudflare routes traffic to the next nearest node. No cascade failures. The rest of the network is unaffected.
Compliance Locality
Each node enforces its jurisdiction's rules natively. No cross-border data movement during request processing. Regulators audit their local node.
N nodes = N x (single-node throughput). No coordination tax. No leader election. No consensus bottleneck on application requests. Validator consensus is only for ledger finality, not for service requests.
Consensus Layer vs Application Layer
A critical distinction: validator consensus (14-of-20 BFT) is only required for ledger finality - confirming blocks and cross-chain bridge operations. Application-layer requests (DEX trades, wallet operations, compliance checks, document vault) are processed locally with session affinity and propagated via events. This means application throughput scales linearly with nodes, while consensus throughput is governed by the validator quorum.
Consensus Layer
14-of-20 Validator Quorum
Block finalization (1.5s), cross-chain bridge attestation, governance parameter changes, emergency halt/resume.
Bounded by quorum speed
Application Layer
Session-Affinity Per Node
DEX trading, RFQ, AMM, wallet operations, transfers, compliance checks, KYC/AML, document vault, proof layer.
Scales linearly with nodes
Adding a New Node
Deploying a new jurisdiction takes one command. The new node pulls the same container images, connects to the RedPanda cluster, and replays events to build its local state. Once caught up, Cloudflare begins routing traffic to it.
Deploy a new node in Toronto, Canada: Set JIL_NODE_ID=CA-TOR-01, JIL_REGION=ca-toronto, JIL_JURISDICTION=CA/OSC, and REDPANDA_SEEDS to the existing cluster. Run docker compose up. The node connects to RedPanda, replays events, builds local state. PostgreSQL catches up from the event stream with source_id filtering. The validator joins the consensus network via peer discovery. Cloudflare health check passes and traffic begins routing.
Time to operational: minutes, not weeks. The node is identical to every other node. The only unique values are JIL_NODE_ID, JIL_REGION, and JIL_JURISDICTION. Every image is pulled from JILHQ's secure registry after verifying cryptographic signatures.
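A pre-flight check for those per-node settings could look like the sketch below. The variable names come from the example above; the validation helper itself is hypothetical:

```python
import os

# The only settings that differ between nodes (per the example above).
REQUIRED = ("JIL_NODE_ID", "JIL_REGION", "JIL_JURISDICTION", "REDPANDA_SEEDS")

def node_identity(env=os.environ) -> dict:
    """Collect the per-node settings before `docker compose up`.
    Everything else in the stack is identical across jurisdictions."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing required settings: {missing}")
    return {k: env[k] for k in REQUIRED}

cfg = node_identity({
    "JIL_NODE_ID": "CA-TOR-01",
    "JIL_REGION": "ca-toronto",
    "JIL_JURISDICTION": "CA/OSC",
    "REDPANDA_SEEDS": "node-a:9092,node-b:9092",
})
```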
JILHQ: The Fleet Command Center
JILHQ is the central control plane that governs the entire node fleet. It does not process user traffic. It does not participate in consensus. Its sole purpose is to manage, authorize, and govern every node in the network. No node can join the network, pull images, or start services without JILHQ's authorization.
JILHQ manages the fleet. Nodes serve the users. JILHQ never touches user data, transactions, or wallets. It controls who can run the software and what version they run - nothing more.
Image Registry
Signed container images. Devnet, testnet, mainnet tracks. Ed25519 signatures.
Certificate Authority
mTLS certificates for node auth. Revoke = instant lockout.
Fleet Management
Start, stop, upgrade, rollback, deprecate. Rolling upgrades.
Alert Engine
60s evaluation loop. Auto-resolve. HMAC-signed webhooks to Slack/PagerDuty.
Audit Ledger
Append-only log of every fleet action. Full chain of custody.
Fleet Dashboard
Single pane of glass. Node status, alerts, images, security. Zero SSH.
Signed Images & Secure Registry
Every container image deployed to any node must be cryptographically signed by JILHQ before a node will run it. Nodes verify signatures at pull time using the JIL root public key. An unsigned or tampered image is rejected immediately. This ensures that every node in every jurisdiction is running code that JILHQ has explicitly authorized.
Image Signing Pipeline: Developer pushes code to repo, CI/CD builds image, image tagged with commit hash. JILHQ Signer verifies build provenance, signs image with HSM key, stores in private registry. Node pulls image from registry, verifies signature vs root key, runs only if valid.
Image Signature Envelope: Every image in the JILHQ registry carries a manifest containing the image name, SHA256 digest, signing key identifier, Ed25519 signature, timestamp, track (devnet/testnet/mainnet), commit hash, and multi-party approval list.
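The digest check a node performs at pull time can be sketched with the standard library. Only the SHA256 recomputation is shown; verifying the Ed25519 signature over the manifest against the JIL root public key would require a crypto library (e.g. PyNaCl) and is omitted here. The manifest fields follow the envelope above:

```python
import hashlib

def verify_image_digest(image_bytes: bytes, manifest: dict) -> bool:
    """Recompute the pulled image's SHA256 and compare it to the signed
    manifest. A mismatch means the image was tampered with or corrupted.
    (The Ed25519 signature check over the manifest is omitted - stdlib-only.)"""
    digest = "sha256:" + hashlib.sha256(image_bytes).hexdigest()
    return digest == manifest["digest"]

blob = b"container-image-bytes"
manifest = {
    "name": "wallet-api",
    "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
    "track": "mainnet",
}
ok = verify_image_digest(blob, manifest)          # matches
tampered = verify_image_digest(b"evil", manifest) # mismatch
```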
Devnet
Automatic deploy on merge. Unstable, fast iteration. Internal testing only.
Testnet
Requires 1 approval. Load tested, integration tested. Partner & validator preview.
Mainnet
Requires 2 approvals. HSM-signed, audit-logged. Rolling deploy to nodes.
Fleet Management: Start, Stop, Upgrade, Revoke
JILHQ exposes a Fleet Management API that provides full lifecycle control over every node in the network. Every operation is authenticated via mTLS, authorized via RBAC, and logged to the immutable audit ledger.
Start / Stop
Bring a node online or take it offline gracefully. Cloudflare health checks automatically route traffic away from stopped nodes.
Rolling Upgrade
Push a new image version to nodes one-by-one or by jurisdiction. Each node pulls the new signed image, restarts, and rejoins. Zero downtime.
Rollback
Instant rollback to any previous signed image. JILHQ pins the target version and nodes revert on next health cycle. No manual SSH.
Revoke
Permanently revoke a node's certificate. The node is immediately cut off from the registry, RedPanda cluster, and peer network. Nuclear option.
Health Monitoring
Every node reports health, version, and sync status to JILHQ. Dashboard shows fleet-wide view. Alerts on drift, lag, or failures.
Config Push
Push compliance zone configs, fee schedules, and parameter updates to specific nodes or jurisdictions without a full image redeploy.
Fleet Management API examples: GET /v1/fleet/nodes (list all nodes and status), POST /v1/fleet/nodes (start a new node), POST /v1/fleet/nodes/:id/upgrade (upgrade a specific node), POST /v1/fleet/jurisdictions/:zone/upgrade (upgrade all nodes in a jurisdiction), POST /v1/fleet/nodes/:id/stop (emergency stop), DELETE /v1/fleet/nodes/:id (revoke permanently), POST /v1/fleet/jurisdictions/:zone/config (push config update), GET /v1/fleet/audit (view audit log).
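As a sketch of how an operator tool might call one of those endpoints, the snippet below builds (but does not send) an upgrade request with the standard library. The base URL and request body shape are assumptions; in production the call would go over the operator's mTLS channel:

```python
import json
import urllib.request

JILHQ_API = "https://jilhq.example.internal"  # hypothetical base URL

def upgrade_request(node_id: str, image: str) -> urllib.request.Request:
    """Build the upgrade call for one node, following the endpoint list
    above. The request object is constructed but never sent here."""
    body = json.dumps({"image": image}).encode()
    return urllib.request.Request(
        url=f"{JILHQ_API}/v1/fleet/nodes/{node_id}/upgrade",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = upgrade_request("CH-ZUG-01", "wallet-api:v95.3")
```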
Node Authorization & mTLS
No node can participate in the JIL network without authorization from JILHQ. Each node receives a unique mTLS certificate from JILHQ's Certificate Authority. This certificate is required for three things: pulling images from the registry, connecting to the RedPanda cluster, and joining the validator peer network.
Certificate Lifecycle: 1. Request: node sends hardware attestation + jurisdiction + operator identity to JILHQ → 2. Vetting: operator KYC validated, jurisdiction confirmed, hardware meets requirements → 3. Issuance: unique cert with node_id, jurisdiction, and expiration, signed by the JIL CA root key → 4. Usage: image pulls, RedPanda connections, and validator peer-to-peer all require a valid cert → 5. Revocation: JILHQ adds the cert to the CRL; the node cannot pull images, connect to RedPanda, or join peers. Immediate, irreversible.
A node's mTLS certificate is the single credential that unlocks the image registry, the RedPanda cluster, and the validator network. Revoke the cert and the node is completely severed from all three - no partial access possible.
Immutable Audit Ledger
Every action taken by JILHQ is logged to an append-only audit ledger. Who pushed an image, who approved a promotion, who started a node, who revoked a certificate - every operation has a permanent, tamper-evident record. This is the chain of custody for the entire fleet.
Sample audit entries: image.push (ci-pipeline pushes wallet-api:v95.3 to devnet), image.promote (ops-lead promotes from devnet to testnet, 1 approval), image.promote (security-lead promotes from testnet to mainnet, 2 approvals), node.upgrade (ops-lead upgrades CH-ZUG-01 from v95.2 to v95.3 with rolling strategy), cert.revoke (security-lead revokes ROGUE-01 for unauthorized modification).
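One common way to make such a ledger tamper-evident is hash chaining, where each entry commits to the hash of the previous one. The sketch below shows the idea; it is illustrative, not JILHQ's actual storage format:

```python
import hashlib
import json

class AuditLedger:
    """Append-only, tamper-evident log sketch: each entry commits to the
    previous entry's hash, so rewriting history breaks the chain."""

    def __init__(self):
        self.entries = []

    def append(self, action: str, actor: str, detail: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        record = {"action": action, "actor": actor, "detail": detail, "prev": prev}
        # Hash is computed over the record *before* the hash field is added.
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        prev = "genesis"
        for e in self.entries:
            body = {k: e[k] for k in ("action", "actor", "detail", "prev")}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

ledger = AuditLedger()
ledger.append("image.push", "ci-pipeline", "wallet-api:v95.3 -> devnet")
ledger.append("cert.revoke", "security-lead", "ROGUE-01")
```

Editing any historical entry invalidates every hash that follows it, which is what makes the chain of custody verifiable after the fact.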
Fleet Operations Dashboard
The Fleet Operations Dashboard is a dedicated web interface that provides a single pane of glass over the entire JIL node fleet. It connects exclusively to JILHQ's APIs and presents real-time status across all 20+ nodes, 13 jurisdictions, and 190+ services per node. Operations staff never need to SSH into individual nodes - everything is managed from this dashboard.
Fleet Overview
Real-time status map of all nodes across 13 jurisdictions. Node count, healthy/degraded/offline breakdown, fleet-wide version distribution, and sync lag at a glance.
Node Detail
Drill into any node to see container-level health, CPU/memory/disk usage, running image versions, certificate expiry, and last heartbeat. Start, stop, upgrade, or deprecate from one screen.
Image Registry
Full image registry browser showing all container images across devnet/testnet/mainnet tracks. Sign images, promote across tracks, view signature chains, and trigger fleet-wide upgrades.
Alert Management
View all active alerts, acknowledge incidents, configure alert rules and thresholds. Webhook integration for Slack, PagerDuty, or any endpoint with HMAC-signed delivery.
Security View
Certificate expiry grid across all nodes, unsigned image detection, mTLS status, and security audit results aggregated from every node's automated security scans.
Broadcast Commands
Send fleet-wide commands: rolling upgrades, restart services, enter maintenance mode, or emergency stop. Target all nodes, a jurisdiction, or specific nodes.
The Fleet Dashboard communicates only with JILHQ APIs. No direct node access is ever needed. This means operations staff can manage a 20-node global fleet from a single browser tab with full audit logging of every action.
Alert System & Webhook Delivery
JILHQ runs a continuous background alert evaluation loop every 60 seconds. It checks fleet-wide conditions against configurable rules and fires alerts when thresholds are breached. Alerts are automatically resolved when conditions clear. Every alert is stored in PostgreSQL and optionally delivered to external systems via HMAC-SHA256 signed webhooks.
Alert Evaluation Pipeline: Node heartbeats (every 60s each node posts health data to JILHQ) → JILHQ Evaluator (background loop checks all rules against fleet state every 60s) → Alert Fired (stored in PostgreSQL, shown in dashboard, webhook delivered) → Webhook (HMAC-SHA256 signed, Slack, PagerDuty, 3 retries with backoff).
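The HMAC-SHA256 signing step for webhook delivery can be sketched with the standard library. The header name and payload shape are illustrative; the essential parts are signing the raw body and comparing in constant time on the receiving side:

```python
import hashlib
import hmac
import json

def sign_webhook(secret: bytes, payload: dict):
    """Produce the body and hex HMAC-SHA256 signature for a delivery.
    The signature would travel in a header alongside the body."""
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature

def verify_webhook(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute the HMAC over the raw body and compare
    in constant time to resist timing attacks."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

secret = b"shared-webhook-secret"
body, sig = sign_webhook(secret, {"alert": "heartbeat_stale", "node": "US-DAL-01"})
```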
Heartbeat Stale
A node hasn't reported a heartbeat in 5+ minutes. Indicates node is offline, frozen, or network-partitioned. Auto-resolves when heartbeats resume.
Certificate Expiring
A node's mTLS certificate will expire within 7 days (critical) or 30 days (warning). Prevents surprise lockouts when certs expire.
Image Unsigned
A container image in the registry has no valid Ed25519 signature. No node will pull unsigned images, but this alerts operators to signing gaps.
Service Down
A node's health check reports one or more critical services as unhealthy. Triggers investigation before the node is automatically routed around.
Disk Critical
A node's disk usage exceeds 90%. At this threshold, services may fail to write logs or state. Daily maintenance scripts handle cleanup, but this catches edge cases.
Auto-Resolve
When a condition clears (heartbeat resumes, cert renewed, disk freed), the alert is automatically resolved. No manual acknowledgment needed for transient issues.
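Two of the rules above - stale heartbeats (5+ minutes) and certificate expiry (7-day critical / 30-day warning) - can be sketched as a single evaluation pass. The fleet-state shape and function names are illustrative:

```python
def evaluate_alerts(fleet: list, now: float) -> list:
    """One pass of the 60-second evaluation loop. Returning no tuple for
    a node is what lets a previously fired alert auto-resolve.
    Timestamps are in seconds."""
    DAY = 86400.0
    alerts = []
    for node in fleet:
        if now - node["last_heartbeat"] > 5 * 60:
            alerts.append((node["id"], "heartbeat_stale", "critical"))
        days_left = (node["cert_expiry"] - now) / DAY
        if days_left <= 7:
            alerts.append((node["id"], "cert_expiring", "critical"))
        elif days_left <= 30:
            alerts.append((node["id"], "cert_expiring", "warning"))
    return alerts

now = 1_000_000.0
fleet = [
    {"id": "CH-ZUG-01", "last_heartbeat": now - 30,  "cert_expiry": now + 90 * 86400},
    {"id": "US-DAL-01", "last_heartbeat": now - 600, "cert_expiry": now + 10 * 86400},
]
fired = evaluate_alerts(fleet, now)
```

Running the same pure evaluation every cycle means no per-alert state machine is needed: an alert exists exactly as long as its condition holds.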
Self-Maintenance Agents
Every node in the fleet runs four automated maintenance agents via cron. These agents handle health monitoring, daily cleanup, security auditing, and image updates - without any human intervention. Each agent reports its findings back to JILHQ for centralized visibility and alerting.
Health Check Agent
Every 60 Seconds - Checks all container health states, auto-restarts unhealthy containers, reports CPU/memory/disk to JILHQ, posts heartbeat with container counts.
Maintenance Agent
Daily at 2:00 AM UTC - Docker image & volume prune, log rotation (100 MB threshold), disk usage check (85% warning), temp file cleanup (7-day max).
Security Audit Agent
Every 6 Hours - Verifies image signatures vs JILHQ, checks mTLS certificate expiry, scans for root-running containers, validates firewall & SSH config, detects unexpected open ports.
Image Update Agent
Every 4 Hours - Queries JILHQ for latest images, compares running digests vs registry, pulls & verifies new signed images, rolling restart per container, reports update results to audit log.
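The disk check run by the maintenance agent can be sketched against the thresholds stated above (85% warning, 90% critical from the Disk Critical alert). The function name is illustrative, and the cleanup actions themselves are out of scope:

```python
import shutil

def disk_alert_level(pct_used: float) -> str:
    """Classify disk usage against the agent's thresholds:
    85% -> warning, 90% -> critical."""
    if pct_used >= 90.0:
        return "critical"
    if pct_used >= 85.0:
        return "warning"
    return "ok"

# The agent would feed in the live figure, e.g.:
usage = shutil.disk_usage("/")
level = disk_alert_level(100.0 * usage.used / usage.total)
```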
One-Command Node Setup: New nodes are bootstrapped with a single command (install-cron.sh) that installs the configuration file, creates log directories, and registers all four cron agents. The installer is idempotent - safe to run multiple times on the same node.
With these four agents running on every node, the fleet is largely self-maintaining. Unhealthy containers are auto-restarted, disk space is reclaimed nightly, security posture is audited 4 times a day, and image updates are applied automatically. Human intervention is only needed for escalations that reach the JILHQ alert system.
Full System Architecture
Fleet Operations Dashboard (Overview, Nodes, Images, Alerts, Security) communicates via API calls to JILHQ - Control Plane (Registry, CA, Fleet API, Alerts, Audit, Signer), which pushes signed images, mTLS certs, and alerts down to nodes. Cloudflare CDN geo-routes traffic to the nearest healthy node.
Node fleet: CH-ZUG, AE-ADGM, SG-MAS, US-DAL, and 16 more nodes - each running the full stack with 190+ services, PostgreSQL, Validator, RedPanda, and 4 self-maintenance agents. All nodes are connected peer-to-peer via the RedPanda Event Mesh (source_id tagging, PostgreSQL consumers, eventually consistent).
At the bottom layer: 14-of-20 Validator Quorum (CometBFT) handles block finality & bridge attestation with 1.5s block time, 70% BFT threshold, across 13 jurisdictions.
20 nodes. 13 jurisdictions. 190+ services. Zero single points of failure.
JIL Sovereign is the only settlement platform built for true horizontal scaling - identical deployments across every jurisdiction, session-affinity request handling, and RedPanda-powered event synchronization.