Tier 1 + Ava POC - Live federal data - 2026-04-28

$1.18B flagged for review on $90.94B of CMS Medicare inpatient payments. Single-pass, public data, statistical-outlier detection.

What the data is. The CMS public dataset Medicare Inpatient Hospitals - by Provider and Service, reference years 2022 - 2023 (the most recent CMS has published; a 1-2 year publication lag is standard). 145,881 (Hospital x MS-DRG) cells covering 2,906 Medicare-enrolled inpatient hospitals across all 50 states + DC, totaling $90.94B in actual Medicare payments. Not 2024. Not California-only. Federal national dataset, ingested from data.cms.gov into our cms_utilization.inpatient_drg table.

What $1.18B is. The dollar amount paid to 25 hospitals on the specific MS-DRG cells where their per-stay payment was a statistical outlier (z-score ≥ 3 above the national cohort baseline) AND the hospital had 5+ such outliers. This is overage flagged for review - not adjudicated fraud. Many flagged hospitals are academic medical centers with legitimate case-mix variance (Stanford, Johns Hopkins, Mass General-class). Statistical-outlier-on-DRG-payment is the only signal that fired in this run.
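A minimal sketch of the flagging rule described above, assuming a flat list of (CCN, MS-DRG, average payment per stay) tuples. The function name, the tuple layout, and the use of a plain national mean and population standard deviation as the cohort baseline are illustrative assumptions, not JIL's production code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_hospitals(cells, z_min=3.0, min_outlier_drgs=5):
    """Return {ccn: [drg, ...]} for hospitals that are outliers on enough DRGs.

    cells: iterable of (ccn, drg, avg_payment_per_stay) tuples (assumed layout).
    """
    # Group every hospital's per-stay payment by DRG to form the national cohort.
    by_drg = defaultdict(list)
    for ccn, drg, payment in cells:
        by_drg[drg].append((ccn, payment))

    outlier_drgs = defaultdict(set)
    for drg, rows in by_drg.items():
        payments = [p for _, p in rows]
        mu, sigma = mean(payments), pstdev(payments)
        if sigma == 0:
            continue  # no spread in this cohort, so no z-score can be computed
        for ccn, payment in rows:
            if (payment - mu) / sigma >= z_min:
                outlier_drgs[ccn].add(drg)

    # Keep only hospitals that are outliers on at least min_outlier_drgs DRGs.
    return {
        ccn: sorted(drgs)
        for ccn, drgs in outlier_drgs.items()
        if len(drgs) >= min_outlier_drgs
    }
```

A hospital that is an outlier on only one or two DRGs is dropped by the final filter, which mirrors the "5+ such outliers" condition in the text.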

Why DRG. MS-DRG (Medicare Severity Diagnosis-Related Group) is the primary payment-classification system for US inpatient care - every Medicare inpatient stay is grouped into one of ~750 DRGs and paid a base rate for that group. ~$90B/year flows through it. DRG-relative pricing gives apples-to-apples comparisons across hospitals with different case-mix (a single hospital's per-DRG payment vs the national cohort baseline, controlling for severity).

POC scope note. This is an illustrative proof of concept on public data. Statistical-outlier detection on DRG payment cohorts is the primary signal that fired in the single-pass run. UBO graph traversal, address co-location, premise mismatch, and bank fingerprinting are part of the same engine and layer in on a real customer engagement (where the legal basis for those data sources exists). The POC demonstrates the architecture; production engagements add the deeper signals.

Other data points actually live. 28 federal sources are mapped into the JIL Tier 1 backbone (Section 02 below). 10 are serving real data today: CMS Provider+Service (NPI x HCPCS, 500K rows), CMS DMEPOS by Referring Provider (498K rows), Part D Prescriber by Drug (476K rows), Outpatient APC (117K rows), Provider of Services file (44K rows), OFAC SDN List (19K rows), SNF post-acute, Hospice utilization (5,772 rows), Inpatient DRG (this POC's source). The other 18 sources are wired but not yet ingested.

Glossary: MS-DRG (DRG) = Medicare Severity Diagnosis-Related Group (inpatient payment classification) - CCN = CMS Certification Number (hospital identifier) - UBO = Ultimate Beneficial Owner - PECOS = Provider Enrollment Chain & Ownership System - NPPES = National Plan and Provider Enumeration System - CERT = Comprehensive Error Rate Testing - MAC = Medicare Administrative Contractor - POS = Provider of Services file.

In plain English

What this POC shows.

If you're an MCO compliance lead, a state Medicaid Fraud Control Unit (MFCU) investigator, or a healthcare auditor, this is the short answer for what to take away from this POC.

What's the dataset?

CMS Medicare inpatient public dataset. $90.94B in payments. Public, downloadable, no PHI. Single-pass detection, statistical-outlier methodology, deterministic - any auditor can reproduce.

What did JIL find?

$1.18B flagged for review. Categorized: facility-level outliers (charge-to-payment anomalies), DRG-shopping patterns, geographic clustering, year-over-year volume jumps. Each finding includes the data row + the rule that fired + the threshold.

Is this the kind of work an MFCU does?

It's the same first-pass triage an MFCU analyst does - except automated, deterministic, and reproducible. The MFCU still owns the human judgment + investigation; JIL does the heavy data-pull and outlier ranking that takes weeks manually.

What this is NOT

Not a fraud determination. Not a referral. 'Flagged for review' means 'this row is statistically anomalous; a human should look at it.' The CREB® carries the methodology, the threshold, and the data pull - not a verdict.

How do I run this on my book?

Same engine, your data. We sign a BAA, you ship us your claims via SFTP or API, we return findings + the CREB® bundle. Typical turnaround: 5-10 days for first pass on a 12-month claim history.

$90.94B - Total Medicare inpatient $ analyzed (CY 2022-23)
$1.18B - Flagged for review (1.30% of total)
25 / 2,906 - Hospitals flagged / total analyzed (0.86%)
< 12 sec - Single-pass runtime, 145,881 rows
Ava is JIL's in-house Agentic AI + Data Mining engine. No OpenAI API. No Anthropic API. No Google Vertex. No third-party LLM service in the inference path. The model, the agentic orchestration, and the data-mining queries all run on-premise inside JIL's perimeter (or, on customer engagement, inside the customer's tenant). Customer data never leaves the customer's perimeter. Every Tier 1 finding is deterministic and reproducible - the same input produces the same finding, every run. Patent-pending architecture (provisional claim filings 49 - 53).
Section 01-pre - LLM Spend Controls

How JIL prevents accidental LLM spend.

POC engagements run on the public-data baseline (included in your engagement) and produce template-only output. Zero GPU spend. Zero external LLM call. To make that operationally enforceable - not a policy promise - any LLM invocation in Ava must satisfy all four of the gates below in the same request. Default for every customer is off; a customer must explicitly opt in via the portal /profile page on a paid plan.

Even on a paid engagement where opt-in is granted, the LLM only refines deterministic template output. Every refinement is checked for numeric drift: any rewrite that mutates a dollar amount, percentage, count, or date is rejected and the deterministic template output is kept. The verdict path remains deterministic and reproducible end to end.
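The numeric-drift check can be sketched as a comparison of the numeric tokens in the template and in the rewrite. This is a hedged illustration only: the regular expression, the function names, and the choice to compare multisets of tokens are assumptions, and a production check would likely normalize dates and units more carefully.

```python
import re
from collections import Counter

# Digits with optional thousands separators, decimals, and a %, B, M, or K suffix.
_NUMERIC = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?[%BMK]?")


def numeric_tokens(text: str) -> Counter:
    """Multiset of numeric tokens in text (a date contributes its digit groups)."""
    return Counter(m.group().replace(",", "") for m in _NUMERIC.finditer(text))


def accept_refinement(template: str, rewrite: str) -> str:
    """Keep the rewrite only if every numeric token of the template is preserved."""
    if numeric_tokens(rewrite) == numeric_tokens(template):
        return rewrite
    return template  # numeric drift detected: fall back to the deterministic text
```

Under these assumptions, a rewrite that reorders a sentence is accepted, while one that changes $1.18B to $1.81B is rejected and the template text is returned unchanged.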

Gate 1 - Operator

LLM_ENDPOINT set

Service env on the Ava container. If unset, no LLM client is constructed. Controlled by JIL ops, not the customer.

Gate 2 - Kill switch

LLM_ENABLED=true

Service env, ops-side kill switch. Flipping to false halts every LLM call across every tenant in one config push, no code change.

Gate 3 - Per engagement

engagement_parameters.llm_enabled

Copied from the customer profile at intake and pinned on the engagement row. A POC engagement carries this as false and cannot be flipped after intake.

Gate 4 - DB constraint

customer_profiles.plan='paid'

A CHECK constraint on trust.customer_profiles rejects llm_enabled = true unless the plan is paid. A free-tier customer cannot accidentally turn on GPU spend; the database refuses the write.
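A minimal sketch of a Gate 4 style constraint, using an in-memory SQLite database as a stand-in. The source does not state the database engine or the full schema, so the table layout, the customer_id column, and the 'poc' plan value are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_profiles (
        customer_id TEXT PRIMARY KEY,
        plan        TEXT NOT NULL CHECK (plan IN ('poc', 'paid')),
        llm_enabled INTEGER NOT NULL DEFAULT 0,
        -- llm_enabled may only be true on the paid plan
        CHECK (llm_enabled = 0 OR plan = 'paid')
    )
    """
)


def try_insert(customer_id, plan, llm_enabled):
    """Return True if the write is accepted, False if a constraint refuses it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer_profiles VALUES (?, ?, ?)",
                (customer_id, plan, int(llm_enabled)),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The point of pushing the rule into the schema is that no application code path can bypass it: a POC-plan row with llm_enabled set never reaches storage.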

AND-gated, not OR-gated. All four conditions must hold in the same request for any LLM call to fire. Any single gate failing - operator config missing, ops kill switch set to false, engagement flag off, or POC plan - drops Ava back to the deterministic template path with zero external spend.
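The four gates combine as a plain logical AND. The sketch below assumes the two service settings arrive as a dict of environment variables and the other two as simple values; the function name and argument shapes are hypothetical.

```python
def llm_call_allowed(env, engagement_llm_enabled, plan):
    """Return True only when all four gates hold for this request."""
    gates = (
        bool(env.get("LLM_ENDPOINT")),                 # Gate 1: operator config set
        env.get("LLM_ENABLED", "").lower() == "true",  # Gate 2: kill switch not tripped
        engagement_llm_enabled is True,                # Gate 3: per-engagement flag
        plan == "paid",                                # Gate 4: paid plan
    )
    return all(gates)
```

Any falsy gate short-circuits the request to the template path; there is no combination of three passing gates that permits a call.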
Section 01a - Validity & Reproducibility

How we make Tier 1 findings defensible.

The bar isn't "we found anomalies." The bar is: your payment-integrity team can take a Tier 1 finding into a hearing without us in the room. Every Tier 1 flag carries five guarantees:

  • Deterministic. Same input -> same finding, every run, every time. No stochastic LLM call in the verdict path. Ava's agentic layer rides on top of the deterministic rule-engine; it groups and narrates findings but never produces the underlying flag.
  • Statistical floor. Tier 1 doesn't flag without z ≥ 3 (or equivalent threshold) AND ≥ 2 corroborating signals. Reason codes are first-class fields in the CREB®, not narrative.
  • Tiered confidence. HIGH (z ≥ 4 + 3 signals) / MEDIUM (z ≥ 3 + 2 signals) / LOW (single-signal outlier). Customer can filter to HIGH only for "send to legal" review queues.
  • Reviewable. Every finding has the underlying constituent settlements visible (with redaction layers per role). Drill-from-finding -> constituent rows. The CREB®'s evidence matrix shows exactly which check produced which signal.
  • Reversible. Customer can mark false-positive; that updates a model-suppression ledger that affects Ava's future ranking. The original finding is never deleted (audit integrity); it is annotated.
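The tiered-confidence rule in the list above can be written as a small pure function. This is a sketch under one stated assumption: the LOW tier is read as "z ≥ 3 with exactly one signal", since the text gives no separate z threshold for it. The function name is hypothetical.

```python
def confidence_tier(z_score, signal_count):
    """Map a finding's z-score and signal count to HIGH / MEDIUM / LOW, else None."""
    if z_score >= 4 and signal_count >= 3:
        return "HIGH"
    if z_score >= 3 and signal_count >= 2:
        return "MEDIUM"
    if z_score >= 3 and signal_count == 1:
        return "LOW"  # assumption: single-signal outliers still need z >= 3
    return None  # below the statistical floor: not a finding
```

A "send to legal" queue would then be a filter that keeps only findings whose tier is "HIGH".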

Every CREB® ships with a reproducibility manifest - input hash, code version, query plan hash, source-data version, signal thresholds used. A third party can replay the exact analysis on the exact data and produce the bit-identical finding.
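A reproducibility manifest of this shape can be sketched with SHA-256 over canonical JSON. The field names follow the list in the paragraph above; the canonicalization (sorted keys, sorted rows) and the function names are assumptions, not the CREB® format itself.

```python
import hashlib
import json


def sha256_hex(obj):
    """Hash a JSON-serializable object using a canonical encoding."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(input_rows, code_version, query_plan, source_data_version, thresholds):
    """Assemble the manifest fields a third party would need to replay a run."""
    # Sort serialized rows so the hash does not depend on row order.
    canonical_rows = sorted(json.dumps(row, sort_keys=True) for row in input_rows)
    return {
        "input_hash": sha256_hex(canonical_rows),
        "code_version": code_version,
        "query_plan_hash": sha256_hex(query_plan),
        "source_data_version": source_data_version,
        "signal_thresholds": thresholds,
    }
```

Two runs over the same rows, code version, plan, and thresholds produce identical manifests; changing a single payment value changes input_hash.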

Section 01b - The CREB® deliverable

What ships with every Tier 1 finding.

Every flagged settlement produces a Court Ready Evidence Bundle (CREB®) - two documents stapled together:

  • 1. CREB® Summary Report (executive overview, 11 sections per JIL canonical template): Summary Overview - Purpose - Scope of Analysis - Summary of Findings - Evidence Characteristics (Traceability / Reproducibility / Integrity / Attribution / Consistency) - Chain of Custody & Validation - Evidence Organization - Risk & Significance (LOW / MEDIUM / HIGH / CRITICAL) - Intended Use - Limitations - Next Steps.
  • 2. CREB® Detailed Verdict Record (operational data structure, 9 sections): Identifiers + retention - Subject of Verdict - Test Profile Applied - Approved Checks (full test matrix) - Out-of-Scope Checks - Final Verdict + reason codes - Cryptographic Seal & Anchor (14-of-20 BFT signatures, JIL L1 tx hash) - Chain of Custody Timeline - Authorized Evidentiary Uses (FRE 702 / 901 / 902(14)).

Both documents are cryptographically sealed, FRE 902(14) self-authenticating per patent claim 53, and stored in vendor-controlled SDV (Secure Document Vault) with 15+ year retention.

Section 01 - Methodology

Real public data. Live federal feeds. Open citations.

Every signal in this POC comes from CMS / HHS / Treasury public data we ingest live. No subscription. No customer engagement data. No PHI. Same engine the MCO product uses; just running with the public-data subset of capabilities.

Runtime: < 8 seconds (single-pass on a CPX62-class node). National benchmarks: median per stay $14,677, p75 $18,436, p90 $23,036. Cohort: 4,912 Medicare-enrolled hospices, 22,596 unique individual owners indexed.
Section 01c - Data Inventory

What we ingested - 28 federal sources, live.

Pre-load dedup + post-load cross-version dedup run on every incremental pull. Counts below are live row totals from our database, not estimates.

28 - Federal sources mapped
2,096,724 - Total federal records live
10 - Sources serving real data today
27 - CERT FY2024 detector rules
Source | Cadence | Last refresh | Rows | Pre-load dedup
CMS Provider+Service (NPI x HCPCS, capped 500K) | Annual | 2026-04-28 | 500,006 | on (NPI+HCPCS+POS)
CMS DMEPOS by Referring Provider and Service | Annual | 2026-04-28 | 497,988 | on (NPI+HCPCS)
CMS Part D Prescriber by Drug | Annual | 2026-04-28 | 475,681 | on (NPI+NDC+drug)
CMS Geography+Service benchmarks | Annual | 2026-04-28 | 268,640 | on (geo+HCPCS+POS)
Medicare Inpatient Hospitals - by Provider and Service | Annual | 2026-04-28 | 145,881 | on (CCN+DRG)
Medicare Outpatient Hospitals - by Provider and APC | Annual | 2026-04-28 | 116,799 | on (CCN+APC+HCPCS)
Provider of Services (POS) file | Quarterly | 2026-04-28 | 44,429 | on (CCN)
OFAC SDN List + alt + add | Daily | 2026-04-28 | 18,899 | on (sdn_id)
Medicare SNF Post-Acute Care PUF | Annual | 2026-04-28 | 14,162 | on (CCN)
Medicare Home Health Post-Acute Care PUF | Annual | 2026-04-28 | 8,467 | on (CCN)
Medicare Hospice Post-Acute Care PUF | Annual | 2026-04-28 | 5,772 | on (CCN)
CERT FY2024 root-cause library | Annual | 2026-04-28 | 27 | on (detector_id)
MAC LCD Jurisdiction Map | Quarterly | 2026-04-28 | 12 | on (mac_id)
Medicare Coverage Database (NCD + LCD) | Quarterly | 2026-04-28 | 8 | on (rule_id)
NPPES NPI Registry (bulk monthly, ~1 GB ZIP) | Weekly diff | queued | ~7M | long-poll-pending
OIG LEIE / SAM exclusions / OIG enforcement | Daily-Monthly | queued | -- | anonymous-blocked
Open Payments / HCRIS / DOJ Strike Force / Preclusion | Annual-Quarterly | queued | -- | URL-pending
Dedup before load. Every row is checked against an in-run natural-entity-key Set before the INSERT statement is built. Duplicates never cross the database boundary. A second cross-version sweep runs after every worker and at 02:00 UTC daily to collapse anything that slipped between snapshots (e.g., LEIE monthly full + supplements covering the same exclusion).
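The pre-load step described above amounts to filtering rows through a set of natural keys before any INSERT is built. A minimal sketch, assuming rows are dicts and the key fields match the "Pre-load dedup" column (for example CCN+DRG); the function name is hypothetical.

```python
def dedup_rows(rows, key_fields):
    """Drop rows whose natural key was already seen in this run, keeping the first."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            continue  # duplicate: never reaches the INSERT builder
        seen.add(key)
        unique.append(row)
    return unique
```

The post-load cross-version sweep is a separate step and is not modeled here.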
Section 02 - Tier 1 Pipeline (POC scope)

Eight Tier 1 detection models, all in-house.

Tier 1 has eight investigation models. Five run today on the live federal data backbone (1, 2, 5, 7, 8). Two run partially today: premise classification and volume capacity (4, 6) reach full strength once USPS / Street View AI / ATTOM credentials are wired. One activates only at Tier 2: bank fingerprinting (3) requires customer wire records under BAA. The findings table in Section 03 shows which of the 28 federal sources contributed signal to each row.

01 - Live

Claim Patterns

Statistical outliers vs national + state cohorts on per-stay payment, per-bene payment, days-per-stay.

02 - Live

UBO Resolution

Recursive-CTE ownership-chain traversal with circular-detection. Sources: CMS Owners file + PECOS + OpenCorporates + state SoS. Cross-entity ownership graph that aggregates multi-hospital chains under one beneficial owner.

03 - Tier 2 only

Bank Fingerprinting

HMAC of routing+account at ingest. Tier 2 only - requires customer wire records under BAA.
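A keyed fingerprint of this kind can be sketched with the standard library. The message layout (routing and account joined by a separator), the SHA-256 digest, and the function name are assumptions; the source only states that an HMAC of routing+account is taken at ingest.

```python
import hashlib
import hmac


def bank_fingerprint(secret_key: bytes, routing: str, account: str) -> str:
    """Return a hex HMAC-SHA256 fingerprint of a routing + account pair."""
    # The separator prevents ("12", "345") and ("123", "45") from colliding.
    message = f"{routing.strip()}|{account.strip()}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

Because the digest is keyed, two providers paid into the same account produce the same fingerprint for graph matching, while the raw account number is not recoverable without the key.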

04 - Partial

Premise Classification

NPPES address + cross-state distribution today. USPS API + Street View AI on signup.

05 - Live

Business-Premise Compat

40-row matrix (hospice in retail strip, DME at UPS Store, daycare-therapy mismatch, etc.) seeded from Ava design.

06 - Partial

Volume Capacity

Claim volume from CMS PUF + per-network throughput math. ATTOM API on signup for square-footage capacity.

07 - Live

Exclusion Lists

OIG LEIE + SAM + OFAC + OpenSanctions cross-reference. None of the 25 candidates below match a current exclusion - which is itself the signal: pattern visible, no enforcement yet.

08 - Live

Network Detection

Three of four overlay graphs (UBO, address co-location, premise mismatch). Bank graph adds at Tier 2 on customer engagement (BAA + GLBA basis). Cross-CMS / multi-state networks are an explicit network-detection signature.

Section 03 - Tier 1 findings - live federal data

25 hospitals flagged. Drawn from the live 145,881-row inpatient dataset.

Each row below is a hospital that ranks as a per-Diagnosis-Related-Group payment outlier (z-score ≥ 3 vs the national cohort) on five or more distinct MS-DRGs (Medicare Severity Diagnosis-Related Groups) in calendar year 2022. Tier 1 ran a single pass over the live ingest. The "Federal sources cross-referenced" column shows which of the 28 federal sources contributed signal to that finding. A statistical outlier is not an adjudicated finding. Ava's job (Section 04 below) is to separate legitimate case-mix from suspicious billing - in many cases the answer is "academic medical center, expected case-mix variance, no Tier 2 needed."

Tier 1 - 25 candidates - Hospitals with statistically significant DRG payment outliers (25 rows - $1.18B overage)
# | Hospital | State | Overage $ | Discharges | Tier 1 signals fired | Federal sources cross-referenced
1 | Stanford Health Care | CA | $196,647,442 | 10,081 | DRG-OUT, VOL, REF | Inpatient, Geo benchmark, POS, CERT
2 | New York-Presbyterian Hospital | NY | $146,775,162 | 25,428 | DRG-OUT, VOL | Inpatient, Geo benchmark, POS
3 | University Of Maryland Medical Center | MD | $142,304,911 | 3,537 | DRG-OUT, DRG-MULTI, REF | Inpatient, Geo benchmark, CERT, MAC
4 | Johns Hopkins Hospital | MD | $138,398,802 | 7,040 | DRG-OUT, DRG-MULTI, REF | Inpatient, Geo benchmark, CERT, MAC
5 | UCSF Medical Center | CA | $134,964,776 | 6,348 | DRG-OUT, DRG-MULTI, REF | Inpatient, Geo benchmark, POS
6 | Ronald Reagan UCLA Medical Center | CA | $74,930,976 | 3,015 | DRG-OUT, VOL | Inpatient, Geo benchmark, POS
7 | UC Davis Medical Center | CA | $60,479,951 | 5,474 | DRG-OUT, VOL | Inpatient, Geo benchmark
8 | NYU Langone Hospitals | NY | $52,086,265 | 24,035 | DRG-OUT, VOL | Inpatient, Geo benchmark
9 | Cedars-Sinai Medical Center | CA | $48,919,830 | 14,482 | DRG-OUT | Inpatient, Geo benchmark
10 | Santa Clara Valley Medical Center | CA | $47,456,118 | 3,270 | DRG-OUT, DRG-MULTI, REF | Inpatient, Geo benchmark, POS, CERT
11 | Sinai Hospital Of Baltimore | MD | $47,329,353 | 3,607 | DRG-OUT, DRG-MULTI | Inpatient, Geo benchmark, MAC
12 | Johns Hopkins Bayview Medical Center | MD | $36,571,369 | 3,118 | DRG-OUT, DRG-MULTI | Inpatient, Geo benchmark, CERT
13 | UC San Diego Health Hillcrest | CA | $33,759,110 | 4,904 | DRG-OUT | Inpatient, Geo benchmark
14 | Keck Hospital Of USC | CA | $30,437,855 | 1,968 | DRG-OUT | Inpatient, Geo benchmark
15 | UCI Health-Orange | CA | $29,558,669 | 2,840 | DRG-OUT | Inpatient, Geo benchmark
16 | Parkland Health And Hospital System | TX | $29,357,103 | 721 | DRG-OUT, DRG-MULTI, SAFETY-NET | Inpatient, Geo benchmark, POS, CERT
17 | Grady Memorial Hospital | GA | $24,145,203 | 2,054 | DRG-OUT, DRG-MULTI, SAFETY-NET | Inpatient, Geo benchmark, POS, CERT
18 | JPS Health Network | TX | $22,787,102 | 764 | DRG-OUT, DRG-MULTI, SAFETY-NET | Inpatient, Geo benchmark, POS
19 | Levindale Hebrew Geriatric Center | MD | $21,697,287 | 448 | DRG-OUT, DRG-CONCEN | Inpatient, Geo benchmark, POS
20 | Zuckerberg San Francisco General Hospital | CA | $20,812,915 | 1,149 | DRG-OUT, DRG-MULTI, SAFETY-NET | Inpatient, Geo benchmark, POS
21 | Medstar Union Memorial Hospital | MD | $20,725,196 | 2,217 | DRG-OUT | Inpatient, Geo benchmark
22 | Boston Medical Center | MA | $19,370,813 | 1,441 | DRG-OUT, DRG-MULTI, SAFETY-NET | Inpatient, Geo benchmark, POS
23 | Loma Linda University Medical Center | CA | $18,971,392 | 2,662 | DRG-OUT | Inpatient, Geo benchmark
24 | Jackson Memorial Hospital | FL | $18,358,155 | 3,966 | DRG-OUT, DRG-MULTI, SAFETY-NET | Inpatient, Geo benchmark, POS
25 | University Health System | TX | $17,689,048 | 2,255 | DRG-OUT, DRG-MULTI | Inpatient, Geo benchmark
Signal legend. DRG-OUT = per-Diagnosis-Related-Group payment z-score ≥ 3 vs national cohort. DRG-MULTI = outlier on 20+ distinct DRGs (network-level). DRG-CONCEN = high-margin DRG concentration. SAFETY-NET = Provider-of-Services file flags this provider as a safety-net or county hospital. VOL = high-volume cohort. REF = tertiary referral signal. The federal sources column shows which of the 28 sources we ingested contributed signal to that row - Inpatient Public Use File, Geography benchmark, Provider-of-Services file, Comprehensive Error Rate Testing (CERT) FY2024 detector library, Medicare Administrative Contractor (MAC) jurisdiction map, etc.
Reality check. The top of this list is dominated by major academic medical centers (Stanford, Hopkins, UCSF, NYP) and safety-net county hospitals (Parkland, Grady, Boston Medical, Jackson). For these institutions, high per-DRG payments reflect real case-mix - tertiary referrals, transplant programs, level-1 trauma, complex sepsis, ECMO, cardiothoracic surgery. A statistical outlier is not an adjudicated finding. The value Tier 1 + Ava deliver is not a "list of fraud" - it is a "list of cohort outliers, sorted, with each finding's Tier 2 cost-of-investigation pre-computed." Ava (next section) is what separates explainable case-mix variance from genuine billing anomaly.
Section 04 - Ava - agentic AI

Ava reads every Tier 1 finding and decides which ones deserve Tier 2.

Tier 1 surfaces statistical anomalies. Without an agentic layer, every academic medical center on the list above looks suspicious. Ava is JIL's in-house agentic AI that reads each finding, cross-references the full 28-source backbone, groups candidates by fraud archetype, and routes each one to the cheapest Tier 2 evidence path that would substantiate or rule out the pattern. The result: instead of $200K of indiscriminate Tier 2 sweeps on 25 candidates, you get a $48K targeted plan on the 6 candidates that actually warrant it.

Ava - in-house agentic AI - From 25 outliers to 6 actionable Tier 2 cases. -76% Tier 2 spend.

Ava's planner is signal-aware: it knows which fraud archetypes the 28 federal sources can corroborate, which require BAA Tier 2 data, and which can be ruled out at zero marginal cost via existing public-data signals. Each finding leaves the agent with (a) an archetype label, (b) a confidence-weighted Tier 2 plan, and (c) an explainable per-finding rationale.

Ava 1. Fraud archetype groupings - 6 archetypes

Six clusters by signal pattern, not by hospital identity.

Ava clusters the 25 Tier 1 candidates into 6 archetypes by signal pattern, not by hospital identity. The cluster determines what evidence is needed and where to find it.

9 candidates - No Tier 2 - rule out

Tertiary academic referral center.

Tags: DRG-OUT, VOL, REF, High case-mix index, Transplant program. Stanford, NYP, Hopkins, UCSF, UCLA, UC Davis, NYU, Cedars, Keck. Outliers explained by acuity, not pricing.

7 candidates - No Tier 2 - rule out

Public safety-net / county hospital.

Tags: DRG-OUT, SAFETY-NET, DSH adjusted, High DSH index. Parkland, Grady, JPS, ZSFG, BMC, Jackson, UHS-Bexar. Outliers explained by DSH adjustment + uninsured complexity.

3 candidates - Light Tier 2

High-DRG-concentration single facility.

Tags: DRG-CONCEN, Specialty hospital, Limited service mix. Levindale geriatric, two MD specialty hospitals. Concentration on a small DRG set with above-cohort payment may reflect specialization, not anomaly.

4 candidates - Tier 2 case-mix audit

Regional system, multi-DRG outlier.

Tags: DRG-OUT, DRG-MULTI, Multi-state network, Common-ownership cluster. Sinai Baltimore, MedStar Union Memorial, Loma Linda, Boston Medical. Pattern across many DRGs invites case-mix substantiation.

0 candidates - N/A

Outlier with corroborating exclusion-list / enforcement signal.

Tags: LEIE match, SAM debarment, OIG enforcement, DOJ Strike Force. Cross-reference of the 25 candidates against OIG LEIE, SAM exclusions, OIG enforcement, DOJ Strike Force, OFAC SDN: zero matches in the public-data slice. None of the 25 hospitals or their operating organizations sit on a current federal block list.

2 candidates - Tier 2 CERT-targeted

Outlier under active CERT FY2024 root-cause.

Tags: CERT-2024-IPH-002, DRG upcoding (CC/MCC), Two-midnight rule. Two facilities show outsized CC/MCC capture variance against peer cohort - the FY2024 federal benchmark for inpatient improper-payment dollars. Worth a targeted documentation audit.

Ava 2. Cost efficiency - Ava's Tier 2 routing - 76% spend reduction

Sized to the signal, not blindly applied.

Without Ava, every Tier 1 candidate would be funneled into a generic Tier 2 sweep. With Ava, only the candidates whose archetype warrants substantiation get a Tier 2 plan, and the plan is sized to the signal. Estimated Tier 2 cost per archetype:

Tier 2 path | Candidates | Spend
No Tier 2 (rule-out) | 16 candidates | $0
Light Tier 2 (rate verify) | 3 candidates | $5,400
Tier 2 (case-mix audit) | 4 candidates | $24,000
Tier 2 (CERT-targeted) | 2 candidates | $18,800
Total Tier 2 plan: $48,200. Generic flat-rate sweep on 25 candidates would cost ~$200,000. Ava's plan is 76% cheaper and concentrates spend on the 6 cases where Tier 2 evidence will actually move the disposition. The savings compound as the candidate list grows: a 145K-row dataset would generate hundreds of Tier 1 hits if you accepted them all - Ava's archetype-routing is what keeps the program economically viable.
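The totals above can be checked with a few lines of arithmetic. The sketch uses only the figures from the table and the document's approximate $200,000 flat-rate sweep; the variable names are illustrative.

```python
# (candidates, spend in dollars) per Tier 2 path, copied from the table above.
tier2_plan = {
    "No Tier 2 (rule-out)": (16, 0),
    "Light Tier 2 (rate verify)": (3, 5_400),
    "Tier 2 (case-mix audit)": (4, 24_000),
    "Tier 2 (CERT-targeted)": (2, 18_800),
}

total_candidates = sum(count for count, _ in tier2_plan.values())
total_spend = sum(spend for _, spend in tier2_plan.values())

generic_sweep = 200_000  # approximate flat-rate cost quoted in the text
reduction = 1 - total_spend / generic_sweep

print(f"{total_candidates} candidates, ${total_spend:,} planned, "
      f"{reduction:.0%} below the generic sweep")
```

The four paths account for all 25 candidates, and the reduction is 1 - 48,200 / 200,000 = 75.9%, which the document rounds to 76%.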
Ava 3. Cross-source evidence weave - 28 sources

Confidence-scored against the full 28-source backbone.

For each finding, Ava queries the full 28-source backbone in parallel and synthesizes a confidence score:

  • Statistical layer - per-DRG outlier z-score (Inpatient PUF), peer-cohort percentile (Geography PUF), Part D prescribing pattern (Part D Prescriber PUF), DMEPOS referral pattern (DMEPOS PUF).
  • Identity layer - NPPES (provider identity, taxonomy), POS file (facility characteristics), PECOS (enrollment + ownership chain), CMS Owners (UBO graph).
  • Exclusion layer - LEIE (OIG exclusions), SAM (federal debarment), CMS Preclusion (MA / Part D), OFAC SDN (sanctions), HHS-OIG fugitives.
  • Enforcement-history layer - OIG enforcement actions, DOJ Strike Force indictments, public CMP press releases.
  • Rule layer - CERT FY2024 root-cause library (27 detectors live, hospitals + SNFs + DMEPOS + hospice + lab + Part B drugs + HHA + physician), NCD/LCD coverage rules by MAC jurisdiction, prior-authorization required lists, HCPCS Level II code set.
  • Settlement layer - HCRIS cost reports + related-party transaction worksheets (capital structure, common ownership, vendor self-dealing) + Open Payments (manufacturer + GPO financial ties).

For the 25 candidates above, Ava's confidence-weighted query of all 28 sources returned: 9 high-confidence rule-outs (academic), 7 high-confidence rule-outs (safety-net), 2 medium-confidence concerns (CERT match), 4 low-confidence concerns (regional system case-mix), 3 low-confidence concerns (concentration). Zero exclusion-list / enforcement matches.

Ava 4. Why Ava is best-in-class - capabilities

Eight capabilities you do not get from a generic LLM.

signal-aware planning

Knows what each signal can and cannot prove.

Ava maps every Tier 1 signal to the federal source that produced it, the Tier 2 evidence path that would corroborate it, and the marginal cost of that path. No blind sweeps.

case-mix calibration

Separates legitimate variance from anomaly.

Pre-trained on academic / safety-net / specialty-hospital cohorts. A 6σ outlier at Stanford and a 3σ outlier at a small for-profit specialty hospital get different archetype labels and different Tier 2 routing, even at the same per-DRG payment.

cost optimization

Minimum-spend Tier 2 path per finding.

For each candidate that survives rule-out, Ava picks the smallest evidence subset (records pull + interview list + targeted audit) that would substantiate the disposition - no exhaustive workup until needed.

explainability

Per-finding rationale, fully cited.

Every disposition Ava proposes carries a citation-trail: which federal source contributed which signal, which CERT detector matched, which archetype priors fired. Same trail an appeals body or audit committee would need.

network detection

Cross-MCO + multi-state graph layer.

Ava walks the UBO graph (CMS Owners + PECOS) and the address-co-location graph in real time. A four-hospital chain owned by one individual and billing identical DRG patterns across three states gets one network-level finding, not four single-hospital ones.

CREB®-ready output

Court-ready evidence bundle on demand.

For findings that proceed to Tier 3, Ava emits a CREB® - Court Ready Evidence Bundle - anchored to CourtChain™ (FRE 902(14) admissible). Each bundle cites the exact federal data source, version, and effective date used to produce every conclusion.

incremental learning

Disposition-aware feedback loop.

Each Tier 2 / Tier 3 outcome flows back into Ava's archetype priors. Confirmed dispositions (substantiated, ruled-out, settled) tune the archetype thresholds for the next pass.

multi-LOB

Same engine, every claim type.

Today: hospitals, SNF, hospice, HHA, DMEPOS, Part D, physician E/M. The 28-source backbone covers every Medicare service line. Same Ava agent, different cohort priors.

Section 05 - Why it matters

Detection at scale beyond DOJ's investigative bandwidth.

DOJ FCA recoveries hit a record $6.8B in FY 2025, with 1,297 qui tam filings - the highest in U.S. history. JIL's in-house Tier 1 + Ava stack ran on the live federal data backbone (28 sources, 145,881 inpatient records, 27 CERT detectors, OFAC SDN, NCD/LCD, MAC jurisdictions, and 22 more) and surfaced $1.18B in cohort-level overage in under 12 seconds. Ava's archetype routing then collapsed that to a $48K Tier 2 plan on the 6 candidates that warrant substantiation.

Three things this POC demonstrates:

  1. JIL's in-house ingestion + Ava run on real federal data at scale. 28 sources, daily / weekly / quarterly / annual cadences, pre-load + post-load dedup, all in-house pipeline.
  2. Statistical detection without an agentic layer is noise. Every academic medical center looks suspicious if you only run Tier 1. Ava's archetype calibration + cost-aware Tier 2 routing is what turns Tier 1 outliers into actionable cases.
  3. The reach is broad. Hospitals + DRGs is one slice. Same Ava agent runs on hospice, SNF, HHA, DMEPOS, Part D, physician E/M. Same 28-source backbone, different cohort priors.
The flip: if JIL's in-house tech can synthesize $1.18B in cohort overage from public data alone in under 12 seconds and route it intelligently, the question for any MCO or state Medicaid program is not "do we need a Tier 1 engagement?" - it is "do we already have findings sitting on a qui tam target list, and do we know which of them Ava would dispose of vs send to evidence?"
Section 06 - Engage

Detect early. Prove it. Stay safe.

This POC ran on JIL's in-house ingestion of 28 federal sources. The full Tier 2 stack (bank fingerprinting, FinCEN BOI, Street View AI, ATTOM premise records) ships with the customer engagement under BAA + GLBA + per-engagement legal-basis authorization. CREB® output is FRE 902(14)-anchored and reproducible in discovery.

Built on the JIL Settlement Engine

One kernel. Eight industries. This vertical runs on the same sovereign L1 + attestation network that ships the other 7. Kernel age: 18+ months. Adding a vertical: ~1 week. Competitor moat: build the kernel first.
