NZ InfraWatch

Completion Probability Score — methodology

Version 1. A transparent, non-ML heuristic.

Every project carries a Completion Probability Score from 0–100, estimating the likelihood it proceeds to construction on roughly its stated timeline. The score is a weighted blend of factors drawn entirely from data we already publish — there is no machine learning and no hidden inputs. We publish the weights here because transparency is the point.

Factors & weights

Factor          Weight
fundingStatus   30%
status          25%
electionRisk    20%
gpsAlignment    12%
valueBand       8%
ownerType       5%
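
A minimal sketch of how a weighted blend with these weights could be computed. It assumes each factor has already been mapped to a sub-score between 0 and 1; that mapping is not described on this page, and the function name `completion_score` is hypothetical.

```python
# Weights as published in the table above (they sum to 100%).
WEIGHTS = {
    "fundingStatus": 0.30,
    "status": 0.25,
    "electionRisk": 0.20,
    "gpsAlignment": 0.12,
    "valueBand": 0.08,
    "ownerType": 0.05,
}


def completion_score(sub_scores: dict) -> float:
    """Blend per-factor sub-scores (assumed to lie in 0..1) into a 0..100 score."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        # Refuse to guess: a missing factor is an error, not a default value.
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= sub_scores[name] <= 1.0:
            raise ValueError(f"{name} sub-score must be between 0 and 1")
    return 100.0 * sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Illustrative input only: fully funded, mid-stage, moderate election risk.
example = {
    "fundingStatus": 1.0,
    "status": 0.5,
    "electionRisk": 0.5,
    "gpsAlignment": 0.0,
    "valueBand": 0.0,
    "ownerType": 0.0,
}
print(round(completion_score(example), 1))  # 30 + 12.5 + 10 = 52.5
```

Because the weights sum to 100%, a project with every sub-score at 1 scores 100 and one with every sub-score at 0 scores 0.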

Bands

Band   Score   Meaning
A      80+     Very likely to proceed
B      65+     Likely to proceed
C      50+     Uncertain
D      35+     At risk
E      0+      High risk of stalling or cancellation
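
The band thresholds can be read as a simple lookup from the highest floor downward. The sketch below assumes each threshold is inclusive (a score of exactly 80 is band A); the function name `band_for` is hypothetical.

```python
# (floor, band) pairs from the table above, highest floor first.
BANDS = [(80, "A"), (65, "B"), (50, "C"), (35, "D"), (0, "E")]


def band_for(score: float) -> str:
    """Return the band letter for a 0..100 score, treating floors as inclusive."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, band in BANDS:
        if score >= floor:
            return band
    raise AssertionError("unreachable: the lowest floor is 0")


print(band_for(76), band_for(42))  # B D
```

Under this reading, the mean backtest scores reported further down (76 for projects that proceeded, 42 for those stopped) fall in bands B and D respectively.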

Terminal & missing states

Completed projects show as Delivered; cancelled projects as Cancelled. Projects missing the load-bearing fields (funding status, delivery stage) show “insufficient data” — never a guessed number.
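
One way to express this display rule in code, as a sketch only. The field names `fundingStatus` and `status`, the status values `"completed"` and `"cancelled"`, and the choice to check terminal states before missing data are all assumptions made for illustration.

```python
from typing import Callable, Mapping

# Assumed status values; the real data may use different strings.
TERMINAL_LABELS = {"completed": "Delivered", "cancelled": "Cancelled"}
# The two load-bearing fields named above (funding status, delivery stage).
REQUIRED_FIELDS = ("fundingStatus", "status")


def display_value(project: Mapping, score_fn: Callable[[Mapping], float]) -> str:
    """Return what to show for a project: a terminal label, a notice, or a score."""
    status = project.get("status")
    if status in TERMINAL_LABELS:
        return TERMINAL_LABELS[status]
    if any(project.get(field) is None for field in REQUIRED_FIELDS):
        # Never fall back to a guessed number when key inputs are absent.
        return "insufficient data"
    return str(round(score_fn(project)))


print(display_value({"status": "completed", "fundingStatus": "funded"}, lambda p: 0.0))
print(display_value({"status": "consenting"}, lambda p: 0.0))
```

The scoring function is only called once both checks pass, so a project with missing inputs can never produce a number.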

Not yet in the model

These factors are named in our roadmap but excluded from v1 until their data pipelines are live, so the score never depends on data we don't yet have:

Backtest: the 2023 election

We tested the model against the one large natural experiment we have: the November 2023 change of government, after which many major projects were either cancelled or retained. Each project's pre-election (Oct 2023) status comes from its sourced chronology (or a documented override), and post-election outcomes are documented public-record decisions (7 curated with a basis note in backtest-2023.json). Positive class = "stopped".

Settled outcomes: 43 (7 stopped, 36 proceeded)
Mean v1 score, stopped vs proceeded: 42 vs 76
Best model accuracy (cross-validated): 91%

Model                                    Accuracy   Precision   Recall   F1
Completion Probability v1 (in-sample)    86%        55%         86%      0.67
Naive A: everything proceeds             84%        0%
Naive B: unfunded ⇒ stopped              86%        56%         71%      0.63
Logistic regression (leave-one-out CV)   91%        80%         57%      0.67
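
For readers who want to check the arithmetic, the sketch below computes the four metrics from confusion-matrix counts, with "stopped" as the positive class. The Naive A counts follow directly from the 7 stopped and 36 proceeded outcomes. The logistic-regression counts are an inference from the rounded percentages, not figures taken from the backtest files, so treat them as an assumption.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Precision, recall or F1 is None when its denominator is zero.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else None,
    }


# Naive A predicts "proceeds" for all 43 projects, so it flags no stops at all.
naive_a = classification_metrics(tp=0, fp=0, fn=7, tn=36)

# ASSUMPTION: counts inferred from the rounded 91% / 80% / 57% / 0.67 figures.
logreg = classification_metrics(tp=4, fp=1, fn=3, tn=35)

print(round(naive_a["accuracy"], 2), naive_a["recall"])
print({name: round(value, 2) for name, value in logreg.items()})
```

Naive A reaches 84% accuracy purely because 36 of 43 projects proceeded, while catching none of the stopped projects. That is why accuracy alone is a weak yardstick on a sample this imbalanced.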

The logistic regression is scored under leave-one-out cross-validation — every prediction is made on a project the model never saw in training — so the numbers reflect generalisation, not memorisation.
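
The leave-one-out procedure itself is short enough to sketch. The example below uses a toy majority-class learner as a stand-in for the logistic regression, purely to keep the code dependency-free; the function names and the data are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence


def majority_fit(train_X: Sequence, train_y: Sequence) -> Callable:
    """Toy learner: always predicts the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda row: label


def leave_one_out_predictions(X: Sequence, y: Sequence, fit: Callable) -> List:
    """Predict each item with a model trained on every other item."""
    predictions = []
    for held_out in range(len(y)):
        train_X = [row for i, row in enumerate(X) if i != held_out]
        train_y = [label for i, label in enumerate(y) if i != held_out]
        model = fit(train_X, train_y)  # the held-out row is never seen here
        predictions.append(model(X[held_out]))
    return predictions


# Hypothetical toy data: 1 = stopped, 0 = proceeded.
X = [[0.2], [0.4], [0.9], [0.7]]
y = [0, 0, 0, 1]
preds = leave_one_out_predictions(X, y, majority_fit)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(preds, accuracy)  # [0, 0, 0, 0] 0.75
```

With 43 projects this means 43 separate fits, each trained on the other 42.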

What we learned. Once realistic counterexamples are included, namely the iReX Cook Strait ferries (funded, then cancelled) and the Roads of National Significance the new government revived (unfunded, but retained), the naïve "unfunded ⇒ stopped" rule is no longer perfect (86% accuracy, F1 0.63). A cross-validated model that adds project mode (road vs public-transport/cycling) beats it (91% accuracy, F1 0.67). Its strongest signals, by learned weight, are electionRisk (-1.21), funding (-0.64) and mode_pt_active (+0.63); that is, election risk and the type of project matter more than funding alone.
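
If these figures are standard logistic-regression coefficients, each one can be read as an odds ratio: a one-unit increase in the feature multiplies the odds of the positive class ("stopped") by exp(coefficient). The sketch below does that conversion. How each feature is encoded and scaled is not stated on this page, so the direction of each effect depends on an encoding assumed here to be unknown.

```python
import math

# Learned weights quoted in the paragraph above.
COEFFICIENTS = {
    "electionRisk": -1.21,
    "funding": -0.64,
    "mode_pt_active": 0.63,
}


def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds of 'stopped' per one-unit feature increase."""
    return math.exp(coefficient)


# Rank features by absolute weight, strongest first.
ranked = sorted(COEFFICIENTS, key=lambda name: abs(COEFFICIENTS[name]), reverse=True)
for name in ranked:
    print(f"{name}: coefficient {COEFFICIENTS[name]:+.2f}, "
          f"odds ratio {odds_ratio(COEFFICIENTS[name]):.2f}")
```

An odds ratio below 1 lowers the odds of "stopped" as the feature rises, and one above 1 raises them.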

The real driver: partisan directionality

Every project that was actually stopped was public transport, cycling or a speculative mega-project; the roads the v1 heuristic rated high-risk were revived. The model improves only because the mode feature captures this: roads were safe under a centre-right government, active modes were not. That is powerful but election-specific, since the same feature would point the opposite way under a different government. We therefore keep the live score as the transparent v1 heuristic rather than bake 2023's political direction into a forward-looking 2026 tool; adding that prior would be overfitting to one election.

Limitations (read before trusting a number)

Full results and learned weights, per-project, are published at /data/backtest-results.json. Generated 2026-06-15.