01 Executive Summary
JIL Sovereign implements synchronous request-response communication over asynchronous pub-sub messaging (Kafka/RedPanda) using UUID correlation IDs and a pre-emptive promise registration pattern. This pattern enables the 7-gate validator bootstrap sequence to operate reliably over the same messaging infrastructure used for all other fleet communication, eliminating the need for separate synchronous channels.
The critical innovation is pre-emptive promise registration: the requesting agent registers a promise (callback) for a specific correlation ID before publishing the request message. This eliminates the race condition where a response arrives before the requester has set up its listener. Combined with 30-second timeouts and automatic retry logic, the pattern provides reliable request-response semantics over fire-and-forget messaging.
02 Problem Statement
Validator bootstrap requires a strict sequence of request-response exchanges between the agent and JILHQ: handshake, secret delivery, configuration, image digest verification, key verification, authorization, and completion. Each step depends on the previous step's response. Pub-sub messaging is inherently asynchronous and fire-and-forget, making it unsuitable for ordered, dependent communication without additional patterns.
2.1 The Request-Response Challenge
- Race Condition: If the requester publishes a message and then sets up a listener, a fast responder may reply before the listener exists, causing the response to be lost forever.
- Topic Proliferation: Creating dedicated request/response topics per validator per bootstrap phase leads to an explosion of topics that must be managed and cleaned up.
- Ordering Guarantees: Kafka-style brokers guarantee ordering only within a single partition, so related messages spread across different topics or partitions can arrive out of order, complicating multi-step protocols.
- Timeout Handling: Without built-in request-response semantics, detecting that a response never arrived requires custom timeout and retry logic.
2.2 Why Existing Approaches Fail
| Approach | Mechanism | Race Safety | Limitation |
|---|---|---|---|
| HTTP REST | Synchronous request-response | N/A - inherently sync | Requires direct network access (firewall issues) |
| gRPC Streaming | Bidirectional stream | Connection-based | Requires persistent connection (NAT traversal) |
| Reply-To Header | Kafka reply topic | No - listener after publish | Race condition on fast responses |
| Polling | Repeated GET requests | Yes but slow | High latency, wasted bandwidth |
03 Technical Architecture
The pattern uses two shared topics (request and response) plus an in-memory correlation map that stores pending promises keyed by UUID. The correlation map is populated before the request is published, guaranteeing that any response - no matter how fast - will find a waiting promise to resolve.
3.1 Communication Flow
| Step | Actor | Action | Topic |
|---|---|---|---|
| 1 | Agent | Generate UUID correlation ID | N/A (local) |
| 2 | Agent | Register promise in correlation map keyed by UUID | N/A (local) |
| 3 | Agent | Publish request message with correlation ID | jil.fleet.requests |
| 4 | HQ | Consume request, process, publish response with same correlation ID | jil.fleet.responses |
| 5 | Agent | Response consumer matches correlation ID, resolves promise | jil.fleet.responses |
| 6 | Agent | Awaited promise resolves with response payload | N/A (local) |
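As a rough sketch of steps 3 and 4, the two message shapes and the responder's echo of the correlation ID might look like the following. The field names (`correlationId`, `nodeId`, `kind`, `payload`, `ok`) and the helpers `toRequestRecord` and `buildResponse` are illustrative assumptions, not the documented JIL wire format; only the topic names come from the table above.

```typescript
// Hypothetical envelope shapes; the real JIL wire format is not specified here.
interface FleetRequest {
  correlationId: string; // UUID generated by the requester (step 1)
  nodeId: string;        // identifies the requesting validator
  kind: string;          // e.g. "gate1.handshake" (assumed naming)
  payload: unknown;
}

interface FleetResponse {
  correlationId: string; // copied verbatim from the request (step 4)
  ok: boolean;
  payload: unknown;
}

interface OutboundRecord<T> {
  topic: string;
  message: T;
}

const REQUEST_TOPIC = "jil.fleet.requests";
const RESPONSE_TOPIC = "jil.fleet.responses";

// Step 3: the agent wraps its request for the shared request topic.
function toRequestRecord(message: FleetRequest): OutboundRecord<FleetRequest> {
  return { topic: REQUEST_TOPIC, message };
}

// Step 4: HQ answers on the shared response topic, echoing the correlation ID
// so the agent's response consumer can find the waiting promise (step 5).
function buildResponse(req: FleetRequest, result: unknown): OutboundRecord<FleetResponse> {
  return {
    topic: RESPONSE_TOPIC,
    message: { correlationId: req.correlationId, ok: true, payload: result },
  };
}

const handshake = toRequestRecord({
  correlationId: "6f1c2d9e-8a4b-4c3d-9e2f-0a1b2c3d4e5f",
  nodeId: "validator-7",
  kind: "gate1.handshake",
  payload: { version: "1.0.0" },
});
const reply = buildResponse(handshake.message, { zone: "zone-a" });
console.log(reply.topic, reply.message.correlationId);
```

Because both topics are shared, the correlation ID is the only thing that ties a response back to its request, which is why the responder must copy it unchanged.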
3.2 7-Gate Bootstrap Sequence
| Gate | Request | Response | Protection |
|---|---|---|---|
| Gate 1: Handshake | Node ID, version, capabilities | Approved/rejected, assigned zone | Plaintext |
| Gate 2: Secrets | Request secrets bundle | NaCl-encrypted secrets (DB creds, API keys) | NaCl box |
| Gate 3: Config | Request signed config | Signed config bundle (validator.toml, zones.yaml) | Signed |
| Gate 4: Image Digest | Local image digests | Expected digests from HQ manifest | HMAC |
| Gate 5: Key Verify | Challenge-response for all 5 key types | Verification result | Ed25519 signed |
| Gate 6: Authorization | Request consensus token | 24h time-limited authorization token | AES-256-GCM |
| Gate 7: Complete | Services started confirmation | Node marked consensus-ready | HMAC |
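Since each gate depends on the response to the previous one, the sequence can be driven by a simple sequential loop. The sketch below is one possible shape, not the actual agent code: the gate identifiers, the `Send` function type, and `runBootstrap` are hypothetical names, and `send` stands in for whatever correlation-based request function the agent uses.

```typescript
// Gate order follows the table above; the string identifiers are assumptions.
const GATES = [
  "handshake",
  "secrets",
  "config",
  "image-digest",
  "key-verify",
  "authorization",
  "complete",
] as const;
type Gate = (typeof GATES)[number];

type GateData = Record<string, unknown>;

// Hypothetical request function: resolves with the gate's response,
// rejects if the gate fails or times out.
type Send = (gate: Gate, context: GateData) => Promise<GateData>;

// Runs the gates strictly in order. Each gate receives everything that
// earlier gates returned, and a rejection halts the sequence at that gate.
async function runBootstrap(send: Send): Promise<Gate[]> {
  const passed: Gate[] = [];
  let context: GateData = {};
  for (const gate of GATES) {
    const response = await send(gate, context);
    context = { ...context, [gate]: response };
    passed.push(gate);
  }
  return passed;
}

// Usage with a stand-in responder that approves every gate.
runBootstrap(async (gate) => ({ gate, approved: true })).then((passed) => {
  console.log(`passed ${passed.length} gates`);
});
```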
3.3 Timeout and Retry
Each correlation promise has a 30-second timeout. If the timeout expires, the promise is rejected with a TimeoutError, the correlation ID is removed from the map, and the bootstrap gate is retried up to 3 times with exponential backoff. If all retries fail, the bootstrap sequence halts at the current gate and reports the failure to JILHQ.
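The retry policy can be sketched as a small wrapper around a single gate attempt. The text does not state a base delay, so the 1-second `BASE_DELAY_MS` below is an assumption, as are the names `backoffDelayMs` and `withRetry`. The sketch also reads "up to 3 times" as three retries after the initial attempt, which is one interpretation.

```typescript
const MAX_RETRIES = 3;      // from the text
const BASE_DELAY_MS = 1000; // assumption: the base delay is not stated

// Delay before retry number `retry` (0-based): 1 s, 2 s, 4 s with the assumed base.
function backoffDelayMs(retry: number): number {
  return BASE_DELAY_MS * 2 ** retry;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `attempt` once, then retries up to MAX_RETRIES times with exponential
// backoff. `wait` is injectable so the policy can be exercised without real delays.
async function withRetry<T>(
  attempt: () => Promise<T>,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown = new Error("no attempt made");
  for (let retry = 0; retry <= MAX_RETRIES; retry++) {
    if (retry > 0) await wait(backoffDelayMs(retry - 1));
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // e.g. the TimeoutError from an expired correlation promise
    }
  }
  // All retries failed: the caller halts at the current gate and reports the failure.
  throw lastError;
}
```

Each call to `attempt` would issue a fresh request with a new correlation ID, since the timed-out ID has already been removed from the map.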
04 Implementation
4.1 Correlation Map
The correlation map is an in-memory Map<string, { resolve, reject, timer }> where each entry holds the promise callbacks and a timeout timer. When a response message arrives on the response topic, the consumer looks up the correlation ID in the map. If found, it resolves the promise with the response payload and clears the timeout timer. If not found (expired or duplicate), the response is silently discarded.
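A minimal sketch of such a map, assuming a class named `CorrelationMap` with `register` and `settle` methods (both names are hypothetical), could look like this. The entry shape and the 30-second default follow the text; everything else is illustrative.

```typescript
interface Pending<T> {
  resolve: (value: T) => void;
  reject: (reason: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

class TimeoutError extends Error {}

class CorrelationMap<T> {
  private readonly pending = new Map<string, Pending<T>>();

  get size(): number {
    return this.pending.size;
  }

  // Stores the promise callbacks and starts the timeout timer. The Promise
  // executor runs synchronously, so the entry exists when this method returns.
  register(correlationId: string, timeoutMs = 30000): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(correlationId);
        reject(new TimeoutError(`no response for ${correlationId}`));
      }, timeoutMs);
      this.pending.set(correlationId, { resolve, reject, timer });
    });
  }

  // Called by the response consumer for every message on the response topic.
  // Returns false when the response is discarded (expired or duplicate).
  settle(correlationId: string, payload: T): boolean {
    const entry = this.pending.get(correlationId);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(correlationId);
    entry.resolve(payload);
    return true;
  }
}

// Usage: the second response for the same ID is dropped as a duplicate.
const correlations = new CorrelationMap<string>();
const waiting = correlations.register("6f1c2d9e-8a4b-4c3d-9e2f-0a1b2c3d4e5f");
correlations.settle("6f1c2d9e-8a4b-4c3d-9e2f-0a1b2c3d4e5f", "approved");
correlations.settle("6f1c2d9e-8a4b-4c3d-9e2f-0a1b2c3d4e5f", "approved again");
waiting.then((payload) => console.log(payload));
```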
4.2 Pre-Emptive Registration
The registration function creates a new Promise, stores its resolve and reject callbacks in the correlation map, starts the timeout timer, and returns the Promise to the caller. Only after this function returns does the caller publish the request message. Because the map entry exists before the request leaves the agent, the response consumer always has a promise waiting before any response can arrive.
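The ordering can be demonstrated with a deliberately extreme stand-in for the broker: a `publish` function that delivers the response synchronously, before `publish` even returns. This is a sketch under stated assumptions, with hypothetical names (`request`, `publish`, `onResponse`, `lostResponses`) and the timeout omitted for brevity. The ID generator is injected; in practice it would be a UUID source such as `crypto.randomUUID()`.

```typescript
type Resolver = (payload: string) => void;

const pending = new Map<string, Resolver>();
let lostResponses = 0;

// Response consumer: resolve the waiting promise, or count the response as lost.
function onResponse(correlationId: string, payload: string): void {
  const resolve = pending.get(correlationId);
  if (!resolve) {
    lostResponses++;
    return;
  }
  pending.delete(correlationId);
  resolve(payload);
}

// Stand-in for broker plus HQ: replies synchronously, the fastest possible responder.
function publish(correlationId: string, body: string): void {
  onResponse(correlationId, `ack:${body}`);
}

function request(body: string, newId: () => string): Promise<string> {
  const correlationId = newId();
  const response = new Promise<string>((resolve) => {
    pending.set(correlationId, resolve); // 1. register the promise first
  });
  publish(correlationId, body);          // 2. only then publish the request
  return response;
}

let counter = 0;
request("handshake", () => `id-${++counter}`).then((payload) => {
  console.log(payload, "lost:", lostResponses);
});
```

Swapping the two numbered lines in `request` would make the synchronous responder hit an empty map, and the response would be counted as lost, which is the race the pattern is designed to avoid.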
4.3 NaCl Secret Delivery (Gate 2)
The secrets bundle in Gate 2 is encrypted using NaCl authenticated encryption (crypto_box). The agent generates an ephemeral X25519 keypair at startup and includes the public key in the handshake (Gate 1). JILHQ encrypts the secrets bundle using the agent's ephemeral public key and HQ's static secret key. The agent decrypts using its ephemeral private key and HQ's known public key. The ephemeral keypair is discarded after secrets are received, so a later compromise of the agent cannot expose a previously delivered bundle. Because HQ's key is static, this forward secrecy applies on the agent side only.
4.4 Memory Management
The correlation map includes automatic cleanup: timeout handlers remove expired entries, successful resolutions clear entries immediately, and a periodic sweep (every 60 seconds) removes any orphaned entries older than 120 seconds. This prevents memory leaks from lost responses or programming errors.
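The periodic sweep might be written along these lines. The 60-second interval and 120-second age limit come from the text; the `createdAt` field, the `sweep` and `startSweeper` names, and the choice to reject swept promises (so that awaiting callers do not hang) are assumptions of this sketch.

```typescript
interface SweepableEntry {
  createdAt: number; // epoch milliseconds at registration (assumed field)
  reject: (reason: Error) => void;
}

const SWEEP_INTERVAL_MS = 60000; // from the text
const MAX_ENTRY_AGE_MS = 120000; // from the text

// Removes entries older than the age limit and returns how many were removed.
// `now` is a parameter so the logic can be checked without waiting.
function sweep(map: Map<string, SweepableEntry>, now: number): number {
  let removed = 0;
  map.forEach((entry, id) => {
    if (now - entry.createdAt > MAX_ENTRY_AGE_MS) {
      map.delete(id);
      // Assumption: reject rather than drop silently, so callers are unblocked.
      entry.reject(new Error(`orphaned correlation ${id}`));
      removed++;
    }
  });
  return removed;
}

// Starts the periodic sweep; the returned handle can be passed to clearInterval.
function startSweeper(map: Map<string, SweepableEntry>): ReturnType<typeof setInterval> {
  return setInterval(() => sweep(map, Date.now()), SWEEP_INTERVAL_MS);
}
```

With a 30-second request timeout, an entry should never reach 120 seconds in normal operation, so anything the sweep finds points to a timer that was lost or never started.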
05 Integration with JIL Ecosystem
5.1 Fleet Communication
The correlation ID pattern is used beyond bootstrap for all synchronous fleet operations: key rotation requests, config updates, image digest verification, and remote control command acknowledgments. Every operation that requires a confirmed response uses the same pre-emptive promise pattern.
5.2 RedPanda Infrastructure
The request and response topics are standard RedPanda topics with configurable retention (default 1 hour for responses, 24 hours for requests). Partition keys are set to the node ID, so all messages for a single validator land on the same partition of each topic and are consumed in the order they were produced.
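Expressed as data, the defaults above might look like the following. `retention.ms` is the standard Kafka-compatible topic property for time-based retention; the `TopicSpec` shape and the helpers `toTopicConfig` and `toKeyedRecord` are hypothetical and not tied to any particular client library.

```typescript
interface TopicSpec {
  topic: string;
  retentionMs: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Default retention from the text: 24 hours for requests, 1 hour for responses.
const requestsTopic: TopicSpec = { topic: "jil.fleet.requests", retentionMs: 24 * HOUR_MS };
const responsesTopic: TopicSpec = { topic: "jil.fleet.responses", retentionMs: 1 * HOUR_MS };

// Topic-level config entries as string key/value pairs, the form most
// admin clients expect.
function toTopicConfig(spec: TopicSpec): Record<string, string> {
  return { "retention.ms": String(spec.retentionMs) };
}

// Keying every record by node ID keeps one validator's messages on one partition.
function toKeyedRecord(nodeId: string, value: string): { key: string; value: string } {
  return { key: nodeId, value };
}

console.log(requestsTopic.topic, toTopicConfig(requestsTopic));
console.log(responsesTopic.topic, toTopicConfig(responsesTopic));
console.log(toKeyedRecord("validator-7", JSON.stringify({ kind: "gate1.handshake" })));
```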
5.3 AI Fleet Inspector
The inspector uses the correlation ID pattern to send challenge-response probes to validators as part of security rule evaluation. A probe that receives no response within the timeout period contributes to the AVAIL_HEARTBEAT_GONE rule score, potentially triggering auto-remediation.
5.4 Validator Update Agent
The agent implements both sides of the pattern: as a requester during bootstrap (sending requests to HQ) and as a responder during operations (receiving remote control commands from HQ and sending acknowledgments). The same correlation map and promise infrastructure serve both directions.
06 Prior Art Differentiation
| System | Pattern | Race-Safe | Multi-Phase | JIL Advantage |
|---|---|---|---|---|
| Kafka Reply-To | Reply topic header | No - subscribe after publish | No - single request | JIL registers promise before publish |
| RabbitMQ RPC | Exclusive reply queue | Yes - queue pre-created | No - single request | JIL works on any pub-sub, not just RabbitMQ |
| gRPC Unary | HTTP/2 stream | Yes - connection-based | No - needs new call per step | JIL needs no persistent connection |
| NATS Request-Reply | Built-in request/reply | Yes | No - single exchange | JIL chains multi-gate sequences over same topics |
| AWS Step Functions | State machine orchestration | Yes | Yes | JIL needs no external orchestration service |
07 Implementation Roadmap
Core Pattern
Implement correlation map with pre-emptive promise registration. Deploy request and response topics on RedPanda. Build timeout and retry logic with exponential backoff. Integrate into 7-gate bootstrap sequence.
Encrypted Channels
NaCl authenticated encryption for secret delivery. Signed configuration bundles with Ed25519 verification. HMAC authentication on all operational messages. Forward secrecy via ephemeral keypair exchange.
Fleet Operations
Extend pattern to all fleet control operations. Batch correlation for multi-node commands. Inspector challenge-response probes. Config update acknowledgment flow.
Resilience Hardening
Cross-datacenter topic replication for disaster recovery. Dead letter routing for undeliverable responses. Correlation map persistence for agent restart scenarios. Formal verification of race-freedom property.