Operational Assurance

Operational Resilience

How JIL maintains settlement availability under adverse conditions. Validator fault tolerance, network partition handling, automated recovery, and disaster recovery procedures.


Resilience Philosophy

JIL's operational model prioritizes safety over liveness. The system is designed to halt cleanly under adverse conditions rather than produce incorrect settlements. Resilience means graceful degradation, not infinite availability.

  • Safety first: Incorrect settlements are categorically worse than delayed settlements
  • Graceful degradation: Node failures reduce capacity but do not compromise correctness
  • Automated recovery: SentinelAI triggers fleet-wide recovery when health drops below thresholds
  • Transparent status: System health is observable, not opaque

Validator Network Topology

The validator network is distributed for geographic and jurisdictional diversity; the current topology spans ten zones in nine jurisdictions.

Zone    | Location      | Role             | Specifications
--------|---------------|------------------|---------------------------
Genesis | Nuremberg, DE | Full node (seed) | CPX52 - 16 vCPU, 32 GB RAM
US      | Ashburn, US   | Full node        | CPX42 - 8 vCPU, 16 GB RAM
DE      | Nuremberg, DE | Full node        | CPX42 - 8 vCPU, 16 GB RAM
EU      | Helsinki, FI  | Full node        | CPX42 - 8 vCPU, 16 GB RAM
SG      | Singapore, SG | Full node        | CPX42 - 8 vCPU, 16 GB RAM
CH      | Zurich, CH    | Compact node     | CPX32 - 4 vCPU, 8 GB RAM
JP      | Tokyo, JP     | Compact node     | CPX32 - 4 vCPU, 8 GB RAM
GB      | London, GB    | Compact node     | CPX32 - 4 vCPU, 8 GB RAM
AE      | Dubai, AE     | Compact node     | CPX32 - 4 vCPU, 8 GB RAM
BR      | Sao Paulo, BR | Compact node     | CPX32 - 4 vCPU, 8 GB RAM

Node Failure Tolerance

The 14-of-20 consensus design provides tolerance for up to 6 simultaneous node failures while maintaining settlement capability.

Nodes Online | Status                   | Settlement Capability
-------------|--------------------------|----------------------------------------------------------
20 of 20     | Full capacity            | Normal operation, maximum throughput
14-19 of 20  | Degraded but operational | Full settlement capability, reduced redundancy
10-13 of 20  | Below quorum             | Settlement halted, safety preserved; pending settlements queued
Below 10     | Critical                 | Consensus halts; no settlements processed; fleet recovery triggered
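The capacity tiers above reduce to a pair of thresholds. A minimal sketch in Python, assuming the 14-of-20 quorum and tier boundaries from the table (function and constant names are hypothetical):

```python
# Illustrative mapping from online-validator count to the capacity tiers
# described above. Thresholds come from the table; names are hypothetical.
TOTAL_VALIDATORS = 20
QUORUM = 14
CRITICAL_FLOOR = 10

def settlement_status(nodes_online: int) -> str:
    """Map the number of reachable validators to a settlement status."""
    if nodes_online == TOTAL_VALIDATORS:
        return "full-capacity"          # normal operation, maximum throughput
    if nodes_online >= QUORUM:
        return "degraded-operational"   # full settlement, reduced redundancy
    if nodes_online >= CRITICAL_FLOOR:
        return "halted-below-quorum"    # settlement halted, safety preserved
    return "critical"                   # consensus halts, fleet recovery triggered
```

Anything at or above the quorum keeps full settlement capability; below it the system halts rather than risk an incorrect settlement.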

The current deployment fills 10 of the 20 validator slots; additional validators are planned after May 2026.

Network Partition Handling

Network partitions - where groups of validators cannot communicate with each other - are handled according to the CAP theorem tradeoff: JIL chooses consistency (safety) over availability.

Minority Partition

If fewer than 14 validators are in a partition, that partition cannot reach quorum. Settlements halt in the minority partition. No incorrect settlements are produced.

Majority Partition

The partition with 14+ validators continues processing. The minority partition halts. When connectivity is restored, the minority partition synchronizes from the majority.

Even Split

If the network splits evenly (10/10), neither partition can reach quorum. All settlement processing halts until connectivity is restored. This is the safest outcome.

Recovery

When partitions heal, validators sync state and resume normal consensus. Queued settlements are processed in order. No manual intervention is required for standard partitions.
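The three partition cases above share one rule: only a partition holding a quorum may keep settling. A hedged sketch of that rule (names are hypothetical; only the quorum of 14 comes from the text):

```python
# Per-partition outcome under the consistency-over-availability choice
# described above. Only the quorum value (14) comes from the design.
QUORUM = 14

def partition_outcome(partition_sizes: list) -> list:
    """For each partition, decide whether settlement continues or halts."""
    return [
        "continues" if size >= QUORUM else "halts-until-heal"
        for size in partition_sizes
    ]
```

A 14/6 split leaves the majority settling and the minority halted; an even 10/10 split halts both sides, which is the safest outcome.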

Infrastructure Design

  • Multi-datacenter: Validators run on Hetzner infrastructure across multiple data centers - no single data center failure takes out the network
  • Stateless application layer: Service containers are stateless and can be replaced without data loss - state lives in PostgreSQL and Kafka
  • Data persistence: PostgreSQL data on NVMe SSDs with automated backup schedules
  • Image distribution: Docker images distributed via JILHQ registry with digest verification - corrupted images are rejected
  • DNS redundancy: Cloudflare provides DNS with anycast routing and DDoS protection
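Digest verification of the kind described above can be sketched as a SHA-256 comparison against a pinned value; the function name and digest format below are illustrative, not the actual JILHQ tooling:

```python
import hashlib

def verify_image_digest(blob: bytes, pinned: str) -> bool:
    """Accept an artifact only if its SHA-256 digest matches the pinned value."""
    return "sha256:" + hashlib.sha256(blob).hexdigest() == pinned
```

A corrupted or tampered blob produces a different digest, fails the comparison, and is rejected before deployment.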

Monitoring Architecture

SentinelAI Fleet Inspector

Automated monitoring system that continuously evaluates validator fleet health.

  • Health checks every 60 seconds across all validator nodes
  • Threat scoring based on heartbeat patterns, resource usage, and attestation behavior
  • Automated fleet cycle when fleet health drops below 30% for 5 consecutive cycles
  • Anti-loop protection: max 3 fleet cycles per 2 hours, max 2 failed cycles per node per 2 hours
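The trigger and anti-loop rules above can be sketched as a small state machine, assuming the published numbers (30% threshold, 5 consecutive checks, max 3 cycles per rolling 2-hour window); class and method names are hypothetical:

```python
from collections import deque

HEALTH_THRESHOLD = 0.30        # trigger below 30% fleet health...
CONSECUTIVE_CYCLES = 5         # ...sustained for 5 consecutive checks
MAX_CYCLES_PER_WINDOW = 3      # anti-loop: at most 3 fleet cycles...
WINDOW_SECONDS = 2 * 60 * 60   # ...per rolling 2-hour window

class FleetMonitor:
    """Hypothetical sketch of the SentinelAI trigger logic described above."""

    def __init__(self) -> None:
        self.low_streak = 0          # consecutive checks below threshold
        self.cycle_times = deque()   # timestamps of recent fleet cycles

    def observe(self, fleet_health: float, now: float) -> bool:
        """Return True when a fleet-wide recovery cycle should fire."""
        self.low_streak = self.low_streak + 1 if fleet_health < HEALTH_THRESHOLD else 0
        if self.low_streak < CONSECUTIVE_CYCLES:
            return False
        # Anti-loop protection: count only cycles inside the rolling window.
        while self.cycle_times and now - self.cycle_times[0] > WINDOW_SECONDS:
            self.cycle_times.popleft()
        if len(self.cycle_times) >= MAX_CYCLES_PER_WINDOW:
            return False
        self.cycle_times.append(now)
        self.low_streak = 0
        return True
```

One healthy check resets the streak, so a brief dip never triggers a cycle; the rolling window caps how often recovery can fire even under sustained bad health.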

Metrics and Alerting

  • Prometheus metrics collection from all services
  • Settlement latency, throughput, and error rate tracking
  • Validator uptime and consensus participation monitoring
  • Bridge balance reconciliation and deposit confirmation tracking
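In production these figures are scraped by Prometheus; as a self-contained illustration, the same latency and error-rate bookkeeping can be kept with the standard library (class and field names are hypothetical):

```python
import statistics

class SettlementMetrics:
    """Illustrative latency/error-rate tracker; not the production exporter."""

    def __init__(self) -> None:
        self.latencies_ms = []
        self.errors = 0
        self.total = 0

    def record(self, latency_ms: float, ok: bool) -> None:
        """Record one settlement attempt with its latency and outcome."""
        self.total += 1
        self.latencies_ms.append(latency_ms)
        if not ok:
            self.errors += 1

    def error_rate(self) -> float:
        return self.errors / self.total if self.total else 0.0

    def p50_latency_ms(self) -> float:
        return statistics.median(self.latencies_ms)
```

A real deployment would export these as Prometheus gauges and histograms rather than compute them in-process.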

Incident Response

Severity | Description                                                          | Response Time     | Actions
---------|----------------------------------------------------------------------|-------------------|----------------------------------------------------------------------
Critical | Consensus failure, security breach, data corruption                  | Immediate         | Halt settlements, isolate affected systems, forensic investigation
High     | Validator outage (3+), bridge anomaly, performance degradation       | Under 15 min      | Automated recovery attempt; manual investigation if auto-recovery fails
Medium   | Single node failure, elevated error rates, monitoring gaps           | Under 1 hour      | Node restart, log analysis, root cause investigation
Low      | Performance warning, configuration drift, non-critical service issue | Next business day | Scheduled maintenance, configuration update

Disaster Recovery

  • Golden snapshots: Known-good validator state backed up to Hetzner S3. Recovery from snapshot restores last known healthy state.
  • Database backup: PostgreSQL continuous WAL archiving with point-in-time recovery capability
  • Image registry backup: JILHQ registry backed up to S3 daily at 04:00 UTC via systemd timer
  • Configuration recovery: All configuration stored in version control. Fresh node can be provisioned from fleet registry and compose files.
  • Recovery time objective: Single node recovery under 30 minutes. Full fleet recovery under 2 hours.
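The daily registry backup could be driven by a systemd timer along these lines. This is a hedged sketch: only the 04:00 UTC cadence and the use of a systemd timer come from the text; the unit names and layout are illustrative.

```ini
# jilhq-registry-backup.timer - illustrative unit; names are hypothetical.
# Fires the matching jilhq-registry-backup.service daily at 04:00 UTC.
[Unit]
Description=Daily JILHQ registry backup to S3

[Timer]
OnCalendar=*-*-* 04:00:00 UTC
Persistent=true

[Install]
WantedBy=timers.target
```

`Persistent=true` makes systemd run a missed backup at the next boot, which matters for recovery-point guarantees if a host is down at the scheduled time.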

Resilience Testing

The following resilience scenarios are regularly tested.

  • Single validator restart during active consensus
  • Multiple simultaneous validator failures (chaos testing)
  • Network partition simulation between validator groups
  • Image registry unavailability during deployment
  • Database failover and recovery from backup
  • SentinelAI fleet cycle under various health conditions

Ready to verify?

Start with a structured POC. Evaluate JIL settlement infrastructure on a single corridor.

Request a POC | All Assurance