Izere/Methodology
White paper · v2.4 · 2026-04

If your auditors cannot read it, you should not deploy it.

The full Izere methodology — model class, training protocol, explainability framework, evaluation results, and the audit pathway any institutional reviewer can use to contest a score. Reviewable in advance of procurement.

§ 01

Problem framing

EU-funded programmes are delivered through fragmented ecosystems of contracting authorities, prime contractors, subcontractors, regional managing authorities, and sub-recipients. Delivery-relevant signal is dispersed across procurement notices, milestone reports, governance changes, and disbursement events — none of which, in isolation, predicts execution risk.

Traditional supervision treats programmes as independent objects with linear KPIs. The European Court of Auditors and DG-level oversight rely on retrospective sampling. By the time a programme is flagged, the cost gap has typically compounded for 12–18 months.

Execution risk is structural — it lives in the relationships between actors, not in any single actor's reporting line.

Izere reframes the problem as graph-structured prediction. A programme is modelled as a dynamic graph of actors and dependencies; risk is a property of the graph's evolving topology, not of any individual node. This recasting is what enables the 2–3 month lead time observed across our backtests.

§ 02

Model class

The Izere risk model is a heterogeneous graph neural network (GraphSAGE base, with attention-based message passing) trained jointly on procurement, governance, and financial-execution graphs. Heterogeneity matters: contracting authorities, suppliers, sub-recipients, and dependency edges all carry different feature spaces and embedding dimensions.

Why graph NNs and not gradient boosting

Tabular models — including the boosted-tree class used in Arachne — capture single-actor risk well. They cannot represent structural risk: bidder concentration patterns, governance distance, brittleness from supplier dependency chains. These are precisely the signals that lead Izere's score in the Rail Baltica backtest.

# layer-wise message passing
h_v^(k+1) = σ( W^(k) · concat(
  h_v^(k),
  AGG_{u ∈ N(v)}( att(h_v^(k), h_u^(k)) · h_u^(k) )
) )

# risk score = sigmoid(MLP(pooled_graph_embedding))
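
The update rule above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the production implementation: the attention function is taken to be a softmax over a learned scoring vector `a`, the aggregator is an attention-weighted sum, the nonlinearity is `tanh`, pooling is a mean over nodes, and the MLP is reduced to a single linear layer. The names `attention_layer`, `risk_score`, `W`, `a`, and `w_out` are hypothetical.

```python
import numpy as np

def attention_layer(h, neighbours, W, a, activation=np.tanh):
    """One message-passing step: h_v <- act(concat(h_v, AGG) @ W).

    h          : (n_nodes, d) array of node states h_v^(k)
    neighbours : dict mapping node index -> list of neighbour indices N(v)
    W          : (2*d, d_out) weight matrix (assumed shape)
    a          : (2*d,) attention scoring vector (assumed form of att)
    """
    n, d = h.shape
    agg = np.zeros((n, d))
    for v in range(n):
        nbrs = neighbours.get(v, [])
        if not nbrs:
            continue  # isolated node: its aggregate stays zero
        # Unnormalised attention score for each (v, u) pair.
        pairs = np.hstack([np.tile(h[v], (len(nbrs), 1)), h[nbrs]])
        scores = pairs @ a
        # Softmax over the neighbourhood so the weights sum to 1.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        agg[v] = weights @ h[nbrs]
    return activation(np.hstack([h, agg]) @ W)

def risk_score(h, w_out, b_out=0.0):
    """Sigmoid of a linear read-out over the mean-pooled node states."""
    pooled = h.mean(axis=0)
    return 1.0 / (1.0 + np.exp(-(pooled @ w_out + b_out)))

# Toy graph with 4 actors and 3-dimensional states (random, illustrative only).
rng = np.random.default_rng(0)
h0 = rng.normal(size=(4, 3))
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
h1 = attention_layer(h0, graph, rng.normal(size=(6, 3)), rng.normal(size=6))
score = risk_score(h1, rng.normal(size=3))
```

Stacking several such layers lets information from actors several hops away reach a node's state, which is how structural signal enters the final score.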

Production model

The production model is v2.4, trained 2026-03. Architecture: 4 GAT layers, 256-dim hidden state, 8 attention heads, dropout 0.2. It was trained over ~487K nodes and 2.4M edges across the EU procurement graph 2014–2025, and is versioned and reproducible from raw inputs.

§ 03

Feature definitions

Features are derived deterministically from public and institutional data sources (see Platform → Sources). Every feature has a public formula and a versioned reference implementation. Auditors can recompute any feature from raw inputs.

ID | Feature | Definition | Source | Pop. weight
F.001 | Bidder concentration | Herfindahl-Hirschman across N awards in rolling 4-quarter window | TED · PPDS | 0.18
F.002 | Subcontractor dependency depth | Max depth of disclosed sub-recipient chain | ESPD · MS | 0.12
F.003 | Coordination cadence | Inter-MS meeting frequency Δ vs. baseline | Programme reports | 0.11
F.004 | Milestone amendment frequency | Count of formal milestone changes per quarter | RRF Scoreboard | 0.09
F.005 | Governance distance | Path length actor → ultimate decision authority | UBO · org charts | 0.08
F.006 | Disbursement-request bottleneck | Mean processing time per request, normalised by programme | Cohesion OPs | 0.07
F.007 | Bidder retention QoQ | Δ in qualified bidders quarter-on-quarter | TED · PPDS | 0.06
F.008 | Reform-conditioning resolution | Days between conditional milestone notice and resolution | RRF Scoreboard | 0.05
F.009 | UBO sanctions cross-reference | Match score against EU restrictive measures lists | UBO · OFAC | 0.04
F.010 | Cross-border financing gap | Confirmed vs. announced co-financing per period | CEF · Member States | 0.04

The table lists ten features; the full catalogue (62 features as of v2.4) is documented in the methodology appendix. Population weights shown are global. Per-programme attribution is computed via SHAP (see § 05).
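
As an illustration of what recomputing a feature from raw inputs could look like, here is a minimal sketch of F.001 (bidder concentration). It is not the versioned reference implementation. It assumes shares are measured by award value rather than award count, that quarters are consecutive integers, and that the index is reported on a 0 to 1 scale; the function names and the tuple layout are hypothetical.

```python
from collections import defaultdict

def hhi(award_values_by_bidder):
    """Herfindahl-Hirschman index on a 0-1 scale: sum of squared shares."""
    total = sum(award_values_by_bidder.values())
    if total == 0:
        return 0.0
    return sum((v / total) ** 2 for v in award_values_by_bidder.values())

def rolling_bidder_concentration(awards, window=4):
    """awards: list of (quarter_index, bidder_id, value) tuples.

    Returns {quarter_index: HHI over that quarter and the window-1 before it}.
    """
    quarters = sorted({q for q, _, _ in awards})
    result = {}
    for q in quarters:
        totals = defaultdict(float)
        for award_quarter, bidder, value in awards:
            if q - window < award_quarter <= q:
                totals[bidder] += value
        result[q] = hhi(totals)
    return result

# Two bidders split quarter 1 evenly; bidder A wins everything in quarter 2.
example = [(1, "A", 100.0), (1, "B", 100.0), (2, "A", 200.0)]
concentration = rolling_bidder_concentration(example)
# Quarter 1: shares 0.5 / 0.5   -> 0.5
# Quarter 2: shares 0.75 / 0.25 -> 0.625
```

A value near 1 indicates that awards in the window are dominated by a single bidder; a value near 1/N indicates an even split across N bidders.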

§ 04

Training protocol

Models train via federated learning across institutional deployments. Raw data never leaves an institution's perimeter. Each deployment computes local gradients, encrypts them, and contributes to a shared model via secure aggregation. The aggregator never sees individual gradients in the clear.
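
The idea that an aggregator can sum contributions without seeing any one of them can be illustrated with pairwise additive masking, one common building block of secure aggregation. This sketch is an assumption about the general technique, not a description of Izere's protocol: it uses masking rather than encryption, a seeded generator stands in for the per-pair key agreement a real protocol would need, and it ignores dropouts. The names `pairwise_masks` and `secure_sum` are hypothetical.

```python
import numpy as np

def pairwise_masks(n_clients, dim, seed=0):
    """One mask per client; the masks cancel when summed over all clients.

    For each pair (i, j) with i < j, a shared random vector is added by i
    and subtracted by j. The seeded generator is a stand-in for a secret
    the pair would agree on privately.
    """
    rng = np.random.default_rng(seed)
    masks = np.zeros((n_clients, dim))
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            shared = rng.normal(size=dim)
            masks[i] += shared
            masks[j] -= shared
    return masks

def secure_sum(local_gradients, seed=0):
    """Return (masked per-client vectors, their sum)."""
    grads = np.asarray(local_gradients, dtype=float)
    masked = grads + pairwise_masks(*grads.shape, seed=seed)
    # The aggregator only ever handles `masked`; the masks cancel in the sum.
    return masked, masked.sum(axis=0)

# Three institutions, each holding a 2-dimensional local gradient.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked, total = secure_sum(grads)
```

Each masked row looks like noise on its own, while `total` matches the plain sum of the local gradients up to floating-point error.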

Why federated

Two reasons. First, because raw data does not move, the GDPR Art. 6 lawful-basis questions that a cross-institution data transfer would raise do not arise. Second, cross-institutional intelligence compounds: insights from one DFI's deployment improve the model for all participants without requiring data-sharing agreements that take years to negotiate.

Cadence

Production retraining is monthly. Out-of-distribution detection (covariate drift, label drift) runs hourly. No model deploys to production without sign-off from a designated institutional reviewer within each participating institution.
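
One simple way to check a single feature for distribution drift is the population stability index (PSI), which compares binned frequencies between a reference window and a current window. The sketch below is an assumption about how such a check could look; the source does not say which drift statistic Izere uses, and the function name, bin count, and smoothing constant are hypothetical.

```python
import numpy as np

def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two 1-D samples, using quantile bins of the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges taken from reference quantiles; the outer bins are open-ended.
    inner = np.quantile(reference, np.linspace(0, 1, bins + 1))[1:-1]
    ref_counts = np.bincount(np.searchsorted(inner, reference), minlength=bins)
    cur_counts = np.bincount(np.searchsorted(inner, current), minlength=bins)
    # Clip to avoid log(0) when a bin is empty in one of the windows.
    ref_frac = np.clip(ref_counts / len(reference), eps, None)
    cur_frac = np.clip(cur_counts / len(current), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Synthetic data: one window from the same distribution, one shifted by 1 s.d.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=5000)
psi_same = population_stability_index(baseline, rng.normal(0.0, 1.0, size=5000))
psi_shifted = population_stability_index(baseline, rng.normal(1.0, 1.0, size=5000))
```

The shifted window produces a much larger PSI than the unshifted one. The alert threshold would be a deployment-specific choice.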

§ 05

Explainability

Every Izere score decomposes into the specific features that drove it. We use SHAP (SHapley Additive exPlanations) computed over the trained graph model; it is the explanation framework with the strongest theoretical guarantees for additive feature attribution.

Crucially, SHAP values in Izere are computed over the full graph context, not just node-local features. Attribution for a programme score includes contributions from neighbouring actors and dependency structure — making structural risk explicit and contestable.

No score is published without an evidence chain that an institutional auditor can read end-to-end — and contest.

Every score persists with its full SHAP decomposition and provenance to raw input. An auditor reviewing a flagged programme can trace any contribution back to the contract notice, governance event, or financial signal that generated it.
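
To make the additive decomposition concrete, the sketch below computes exact Shapley values for a toy three-feature scoring function. It illustrates the underlying definition only: the toy score, its coefficients, and the function names are hypothetical, and exact enumeration is exponential in the number of features, so SHAP implementations rely on approximations or model-specific algorithms for real models.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

def toy_score(present):
    """Hypothetical score: baseline plus per-feature effects and one interaction."""
    score = 0.10
    if "F.001" in present:
        score += 0.30
    if "F.002" in present:
        score += 0.20
    if "F.005" in present:
        score += 0.05
    if {"F.001", "F.002"} <= present:
        score += 0.10  # interaction effect, split evenly between the two
    return score

features = ["F.001", "F.002", "F.005"]
phi = shapley_values(toy_score, features)
baseline = toy_score(frozenset())
full = toy_score(frozenset(features))
# phi is approximately {"F.001": 0.35, "F.002": 0.25, "F.005": 0.05}
```

The contributions sum to the gap between the full score and the baseline (0.65 here), which is the property that lets a reviewer account for every part of a score.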

§ 06

Evaluation

Izere v2.4 is evaluated against a held-out test set of 312 EU programmes (CEF, Cohesion, RRF) from the 2014–2020 cycle, with ground-truth delivery outcomes determined by ECA/OECD final-report findings. Backtested predictions are dated as of T-12 months relative to ECA detection.

Metric | Izere v2.4 | Arachne | Linear
AUC-ROC (delay > 6 mo) | 0.847 | 0.681 | 0.612
Recall @ 10% precision | 0.74 | 0.51 | 0.38
Lead time to ECA flag (median) | 14.2 mo | 3.1 mo |
False positive rate | 0.094 | 0.116 | 0.182
Calibration ECE | 0.031 | 0.087 | 0.142

Sensitivity analyses across programme size, geography, and policy area are documented in the methodology appendix. Performance is consistent across cohorts; we publish disaggregated metrics so institutions can verify Izere works on the kind of programmes they actually run.
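
For readers who want to recompute the headline metrics on their own cohorts, here is a minimal sketch of two of them: AUC-ROC and expected calibration error (ECE). It assumes binary delay labels, equal-width probability bins for ECE, and the pairwise-ranking definition of AUC; the function names and the bin count are hypothetical, and the evaluation code behind the table may differ.

```python
import numpy as np

def auc_roc(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean gap between predicted probability and observed frequency."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Equal-width bins; a probability of exactly 1.0 falls in the last bin.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        gap = abs(y_true[mask].mean() - y_prob[mask].mean())
        ece += mask.mean() * gap  # mask.mean() is the bin's share of samples
    return float(ece)

# Tiny illustrative inputs (not Izere data).
auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])                       # 3 of 4 pairs correct
ece = expected_calibration_error([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9])     # gap of 0.1 in both bins
```

Running the same functions per cohort (programme size, geography, policy area) is one way to reproduce a disaggregated view.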

§ 07

Audit pathway

Every Izere deployment ships with a defined audit pathway. Independent third-party auditors, including ECA and institutional internal-audit teams, receive read access to:

· Versioned model cards and training run metadata
· Full feature derivation pipeline with reference implementations
· SHAP attribution for every score, with provenance to raw input
· Append-only event log of all data ingestion and model decisions
· Continuous EU AI Act high-risk system documentation

The audit pathway is designed so that any score, at any point in time, is reproducible end-to-end from raw inputs. This is not a feature added retrospectively to satisfy regulation — it is the system's first design constraint.
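
One common way to make an event log tamper-evident is to chain entries by hash, so that altering any past entry invalidates every later one. The sketch below assumes that design; the source states only that the log is append-only, and the class name, field names, and example events are hypothetical.

```python
import hashlib
import json

class AppendOnlyLog:
    """In-memory hash-chained log; each entry commits to the one before it."""

    GENESIS = "0" * 64  # placeholder hash preceding the first entry

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev_hash, event):
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event):
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        digest = self._digest(prev_hash, event)
        self._entries.append({"event": event, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self):
        """Recompute the chain from the start; False if any entry was altered."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            expected = self._digest(prev_hash, entry["event"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True

log = AppendOnlyLog()
log.append({"type": "ingest", "source": "TED", "record": "notice-001"})
log.append({"type": "score", "programme": "demo", "value": 0.72})
intact = log.verify()

# Simulate tampering with a stored event.
log._entries[1]["event"]["value"] = 0.10
tampered_ok = log.verify()
```

After the stored score is modified, verification fails, which is the behaviour an auditor would rely on when replaying the log.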

§ 08

Limitations

We document Izere's limits openly. The model performs less well on programmes under €5M (sparse signal), on programmes from Member States with limited national procurement-registry transparency, and on early-stage RRF measures where milestone definitions are still being negotiated.

Izere is a decision-support layer, not a decision-maker. Risk scores inform — they do not replace — institutional judgement. The system is explicitly designed so that no automated action follows from a score without human institutional sign-off.

For the full discussion of failure modes, sensitivity analyses, and known biases, see the methodology white paper.