Problem framing
EU-funded programmes are delivered through fragmented ecosystems of contracting authorities, prime contractors, subcontractors, regional managing authorities, and sub-recipients. Delivery-relevant signal is dispersed across procurement notices, milestone reports, governance changes, and disbursement events — none of which, in isolation, predicts execution risk.
Traditional supervision treats programmes as independent objects with linear KPIs. The European Court of Auditors and DG-level oversight rely on retrospective sampling. By the time a programme is flagged, the cost gap has typically compounded for 12–18 months.
Izere reframes the problem as graph-structured prediction. A programme is modelled as a dynamic graph of actors and dependencies; risk is a property of the graph's evolving topology, not of any individual node. This recasting is what enables the 2–3 month lead time observed across our backtests.
Model class
The Izere risk model is a heterogeneous graph neural network (GraphSAGE base, with attention-based message passing) trained jointly on procurement, governance, and financial-execution graphs. Heterogeneity matters: each node type (contracting authorities, suppliers, sub-recipients) and each edge type (such as dependencies) carries its own feature space and embedding dimension.
Why graph NNs and not gradient boosting
Tabular models — including the boosted-tree class used in Arachne — capture single-actor risk well. They cannot represent structural risk: bidder concentration patterns, governance distance, brittleness from supplier dependency chains. These are precisely the signals that lead Izere's score in the Rail Baltica backtest.
h_v^(k+1) = σ( W^(k) · concat(
    h_v^(k),
    AGG_{u ∈ N(v)}( att(h_v^(k), h_u^(k)) · h_u^(k) )
) )
# risk score = sigmoid(MLP(pooled_graph_embedding))
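The update rule above can be illustrated with a minimal, dependency-free sketch. It is not the Izere implementation: the dot-product attention, the ReLU used for σ, mean pooling, and the single linear layer standing in for the MLP are all assumptions made for illustration, as are the node names and weights.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def layer(h, neighbours, W):
    """One update: h_v <- relu(W . concat(h_v, sum_u att(h_v, h_u) * h_u))."""
    out = {}
    for v, h_v in h.items():
        nbrs = neighbours.get(v, [])
        if nbrs:
            # Assumed attention: softmax over dot-product similarity.
            att = softmax([dot(h_v, h[u]) for u in nbrs])
            agg = [sum(a * h[u][i] for a, u in zip(att, nbrs))
                   for i in range(len(h_v))]
        else:
            agg = [0.0] * len(h_v)
        z = [dot(row, h_v + agg) for row in W]  # W applied to the concatenation
        out[v] = [max(0.0, x) for x in z]       # ReLU standing in for sigma
    return out

def risk_score(h, w_out, bias=0.0):
    """Mean-pool node states, then a linear layer and sigmoid (MLP stand-in)."""
    dim = len(next(iter(h.values())))
    pooled = [sum(vec[i] for vec in h.values()) / len(h) for i in range(dim)]
    return 1.0 / (1.0 + math.exp(-(dot(w_out, pooled) + bias)))

# Hypothetical three-actor programme graph with 2-dim node states.
h0 = {"authority": [1.0, 0.0], "prime": [0.0, 1.0], "sub": [1.0, 1.0]}
neighbours = {"authority": ["prime"], "prime": ["authority", "sub"], "sub": ["prime"]}
W = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5]]  # averages own state with the aggregated message

updated = layer(h0, neighbours, W)
score = risk_score(updated, w_out=[1.0, -1.0])
```

With a single neighbour the attention weight is 1, so "authority" ends up with the average of its own state and the state of "prime".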
Production model
The production model is v2.4, trained 2026-03. Architecture: 4 GAT layers, 256-dim hidden state, 8 attention heads, dropout 0.2. It was trained on ~487K nodes and 2.4M edges from the EU procurement graph covering 2014–2025. The model is versioned and reproducible from raw inputs.
Feature definitions
Features are derived deterministically from public and institutional data sources (see Platform → Sources). Every feature has a public formula and a versioned reference implementation. Auditors can recompute any feature from raw inputs.
The full feature catalogue (62 features as of v2.4) is documented in the methodology appendix. The population weights shown are global; per-programme attribution is computed via SHAP (see § 05).
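As an illustration of what a deterministic, recomputable feature can look like, the sketch below computes a bidder-concentration measure as a Herfindahl-Hirschman index over award values. Both the feature and its formula are assumptions chosen for the example; the actual catalogue definitions are in the methodology appendix.

```python
from collections import defaultdict

def bidder_concentration(awards):
    """Herfindahl-Hirschman index of suppliers' shares of awarded value.

    `awards` is an iterable of (supplier_id, award_value) pairs.
    The result is 1.0 for a single supplier and 1/n for n equal suppliers.
    """
    totals = defaultdict(float)
    for supplier, value in awards:
        totals[supplier] += value
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("awards must have a positive total value")
    # Sorted iteration keeps the floating-point summation order fixed,
    # so the same raw inputs always give the same result.
    return sum((totals[s] / grand_total) ** 2 for s in sorted(totals))

# Hypothetical awards: supplier A wins 70 of 100 units of value.
awards = [("A", 50.0), ("B", 30.0), ("A", 20.0)]
hhi = bidder_concentration(awards)  # 0.7**2 + 0.3**2, approximately 0.58
```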
Training protocol
Models train via federated learning across institutional deployments. Raw data never leaves an institution's perimeter. Each deployment computes local gradients, encrypts them, and contributes to a shared model via secure aggregation. The aggregator never sees individual gradients in the clear.
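One common construction for secure aggregation is pairwise additive masking: each pair of participants shares a random mask that one adds and the other subtracts, so the masks cancel in the sum. The sketch below illustrates that idea only. The text above does not specify which scheme Izere uses, and the fixed-point encoding, modulus, party IDs, and hard-coded seeds are assumptions (a real protocol would derive seeds via key agreement and handle dropouts).

```python
import random

MODULUS = 2 ** 32
SCALE = 10 ** 6  # fixed-point scale for encoding floats as integers

def encode(x):
    return int(round(x * SCALE)) % MODULUS

def decode(n):
    if n >= MODULUS // 2:  # map the upper half back to negative values
        n -= MODULUS
    return n / SCALE

def pair_mask(seed, length):
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(length)]

def masked_update(party, gradient, pair_seeds):
    """Mask a local gradient. pair_seeds maps other_party -> shared seed."""
    masked = [encode(g) for g in gradient]
    for other, seed in pair_seeds.items():
        mask = pair_mask(seed, len(gradient))
        sign = 1 if party < other else -1  # lower ID adds, higher ID subtracts
        masked = [(m + sign * k) % MODULUS for m, k in zip(masked, mask)]
    return masked

def aggregate(masked_updates):
    """Sum masked updates; pairwise masks cancel, leaving the true sum."""
    length = len(masked_updates[0])
    return [decode(sum(u[i] for u in masked_updates) % MODULUS)
            for i in range(length)]

# Three hypothetical institutions with local gradients.
gradients = {0: [0.5, -1.0], 1: [0.25, 2.0], 2: [-0.75, 0.5]}
seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}

def seeds_for(party):
    return {(b if a == party else a): s
            for (a, b), s in seeds.items() if party in (a, b)}

updates = [masked_update(p, g, seeds_for(p)) for p, g in gradients.items()]
total = aggregate(updates)
```

The aggregator only ever handles `updates`, which look random individually, yet `total` equals the element-wise sum of the three gradients.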
Why federated
Two reasons. First, GDPR Art. 6 lawful-basis questions disappear when data does not move. Second, cross-institutional intelligence compounds: insights from one DFI's deployment improve the model for all participants without requiring data-sharing agreements that take years to negotiate.
Cadence
Production retraining is monthly. Out-of-distribution detection (covariate drift, label drift) runs hourly. No model deploys to production without sign-off from a designated reviewer at each participating institution.
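A drift check on a single input feature can be sketched with the population stability index (PSI), which compares the binned distribution of recent values against a reference window. PSI is one possible statistic, not necessarily the one Izere runs; the bin count and the 0.25 alert threshold (a commonly cited rule of thumb) are assumptions.

```python
import math

def population_stability_index(reference, current, bins=10):
    """PSI between two samples, using equal-width bins from the reference."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if reference is constant

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor each proportion at a small epsilon to avoid log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = histogram(reference), histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

# Hypothetical feature values: a reference window and two recent windows.
reference = [i / 100 for i in range(100)]
unchanged = list(reference)
shifted = [x + 0.5 for x in reference]

psi_unchanged = population_stability_index(reference, unchanged)
psi_shifted = population_stability_index(reference, shifted)
drift_alert = psi_shifted > 0.25  # assumed alert threshold
```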
Explainability
Every Izere score decomposes into the specific features that drove it. We use SHAP (SHapley Additive exPlanations) computed over the trained graph model. SHAP is the explanation framework with the strongest theoretical guarantees for additive feature attribution.
Crucially, SHAP values in Izere are computed over the full graph context, not just node-local features. Attribution for a programme score includes contributions from neighbouring actors and dependency structure — making structural risk explicit and contestable.
Every score persists with its full SHAP decomposition and provenance to raw input. An auditor reviewing a flagged programme can trace any contribution back to the contract notice, governance event, or financial signal that generated it.
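The additive property behind this decomposition can be shown with exact Shapley values on a toy scoring function. This is a sketch under stated assumptions: the three features, the toy model, and the all-zero baseline are hypothetical, and production SHAP explainers use approximations rather than the full coalition enumeration used here.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_risk(features):
    # Hypothetical model with an interaction between a node-local feature
    # (concentration) and a neighbourhood feature (neighbour_risk).
    concentration, delay, neighbour_risk = features
    return 0.2 * concentration + 0.1 * delay + 0.4 * concentration * neighbour_risk

x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)
```

The contributions sum to `toy_risk(x) - toy_risk(baseline)`, and the interaction term is split evenly between the node-local and neighbourhood features.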
Evaluation
Izere v2.4 is evaluated against a held-out test set of 312 EU programmes (CEF, Cohesion, RRF) from the 2014–2020 cycle, with ground-truth delivery outcomes determined by ECA/OECD final-report findings. Backtested predictions are dated as of T-12 months relative to ECA detection.
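Dating a backtested prediction at T-12 months implies that only events on or before that cutoff may feed the model. The sketch below shows one way to compute the cutoff and filter events accordingly. The event records, field names, and the inclusive cutoff are assumptions made for illustration.

```python
import calendar
from datetime import date

def months_before(d, months):
    """Return the date `months` calendar months before `d`.

    The day is clamped to the last valid day of the target month.
    """
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def as_of_events(events, detection_date, lead_months=12):
    """Keep only events visible at the backtest cutoff (inclusive)."""
    cutoff = months_before(detection_date, lead_months)
    return cutoff, [e for e in events if e["date"] <= cutoff]

# Hypothetical programme timeline and detection date.
events = [
    {"kind": "contract_notice", "date": date(2017, 3, 1)},
    {"kind": "milestone_report", "date": date(2018, 6, 30)},
    {"kind": "disbursement", "date": date(2018, 12, 15)},
]
cutoff, visible = as_of_events(events, detection_date=date(2019, 6, 30))
```

Here the disbursement event falls after the cutoff and is excluded, which guards against leaking post-cutoff information into the backtest.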
Sensitivity analyses across programme size, geography, and policy area are documented in the methodology appendix. Performance is consistent across cohorts; we publish disaggregated metrics so institutions can verify Izere works on the kind of programmes they actually run.
Audit pathway
Every Izere deployment ships with a defined audit pathway. Independent third-party auditors, including ECA and institutional internal-audit teams, receive read access to:
· Versioned model cards and training run metadata
· Full feature derivation pipeline with reference implementations
· SHAP attribution for every score, with provenance to raw input
· Append-only event log of all data ingestion and model decisions
· Continuously maintained documentation for EU AI Act high-risk systems
The audit pathway is designed so that any score, at any point in time, is reproducible end-to-end from raw inputs. This is not a feature added retrospectively to satisfy regulation — it is the system's first design constraint.
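One way to make an append-only event log tamper-evident is to chain entries by hash, so that altering any past entry invalidates every later one. The sketch below illustrates that idea and is an assumption, not a description of Izere's log format; the payload fields and identifiers are hypothetical.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry

def _digest(prev_hash, payload):
    # Canonical JSON so the same payload always hashes identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append(log, payload):
    """Append an entry whose hash covers the payload and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"payload": payload, "prev": prev_hash,
             "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry

def verify(log):
    """Recompute the chain; return False if any entry was altered."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(prev_hash, entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append(log, {"event": "ingest", "source": "notice-001"})
append(log, {"event": "score", "model": "v2.4", "value": 0.73})
intact = verify(log)

tampered = copy.deepcopy(log)
tampered[0]["payload"]["source"] = "notice-999"  # simulate a retroactive edit
tamper_detected = not verify(tampered)
```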
Limitations
We document Izere's limits openly. The model performs less well on programmes under €5M (sparse signal), on programmes from Member States with limited national procurement-registry transparency, and on early-stage RRF measures where milestone definitions are still being negotiated.
Izere is a decision-support layer, not a decision-maker. Risk scores inform — they do not replace — institutional judgement. The system is explicitly designed so that no automated action follows from a score without human institutional sign-off.
For the full discussion of failure modes, sensitivity analyses, and known biases, see the methodology white paper.