Completion Probability Score — methodology

Version 2. A transparent, non-ML heuristic.

Every project carries a Completion Probability from 0–100%, estimating the likelihood it proceeds to construction on roughly its stated timeline. The score is a weighted blend of factors drawn entirely from data we already publish; it uses no machine learning and no hidden inputs. We publish the weights here because transparency is the point.

v2 adds a National Plan priority factor: alignment with the National Infrastructure Plan (Te Waihanga, 17 Feb 2026) and its 10 decade priorities — lifting hospital investment, catching up on water renewals, road investment and energy security. It is applied by a fixed sector rule (health/water/energy/transport rate highest), not hand-picked per project.

Factors & weights

Factor	Weight
fundingStatus	28%
status	23%
electionRisk	18%
npAlign	10%
gpsAlignment	10%
valueBand	6%
ownerType	5%
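
As a minimal sketch, the weighted blend might be computed like this. The weights are the ones in the table above; everything else is an assumption for illustration: that each factor is first mapped to a sub-score between 0 and 100, the names `WEIGHTS` and `completion_probability`, and the example sub-scores, which are not real project data.

```python
# Weights from the table above, expressed as fractions that sum to 1.
WEIGHTS = {
    "fundingStatus": 0.28,
    "status": 0.23,
    "electionRisk": 0.18,
    "npAlign": 0.10,
    "gpsAlignment": 0.10,
    "valueBand": 0.06,
    "ownerType": 0.05,
}


def completion_probability(sub_scores):
    """Blend per-factor sub-scores (assumed to be 0-100) into one 0-100 score."""
    missing = set(WEIGHTS) - set(sub_scores)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Hypothetical sub-scores for an illustrative project (not real data).
example = {
    "fundingStatus": 100, "status": 60, "electionRisk": 50, "npAlign": 80,
    "gpsAlignment": 70, "valueBand": 50, "ownerType": 90,
}
print(round(completion_probability(example), 1))
```

Because the weights sum to 100%, the blended score stays on the same 0-100 scale as the sub-scores.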

Bands

Band	Score	Meaning
A	80%+	Very likely to proceed
B	65%+	Likely to proceed
C	50%+	Uncertain
D	35%+	At risk
E	0%+	High risk of stalling or cancellation
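
The band lookup can be sketched as a scan over the lower bounds in the table above. This assumes each threshold is inclusive (a score of exactly 80 is band A); `BANDS` and `band_for` are illustrative names, not the published implementation.

```python
# Lower bounds from the table above, highest band first.
BANDS = [(80, "A"), (65, "B"), (50, "C"), (35, "D"), (0, "E")]


def band_for(score):
    """Return the band letter for a score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, band in BANDS:
        # Assumption: thresholds are inclusive, so 80 maps to A.
        if score >= lower_bound:
            return band


print(band_for(72))
```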

Terminal & missing states

Completed projects show as Delivered; cancelled projects as Cancelled. Projects missing the load-bearing fields (funding status, delivery stage) show “insufficient data” — never a guessed number.
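
A sketch of that rule, under stated assumptions: the field names `status` and `fundingStatus` are borrowed from the factor table, and the status values `"completed"` and `"cancelled"` are hypothetical stand-ins for however the dataset encodes them.

```python
def display_state(project):
    """Return a terminal label, 'insufficient data', or None if a score applies."""
    status = project.get("status")
    # Terminal states replace the score entirely (status values are assumed).
    if status == "completed":
        return "Delivered"
    if status == "cancelled":
        return "Cancelled"
    # Load-bearing fields: funding status and delivery stage.
    if status is None or project.get("fundingStatus") is None:
        return "insufficient data"
    # Both fields are present and the project is live: show a numeric score.
    return None


print(display_state({"status": "consenting"}))
```

Checking terminal states first means a completed or cancelled project is labelled correctly even if its funding field is empty.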

Not yet in the model

These factors are named in our roadmap but excluded from v2 until their data pipelines are live, so the score never depends on data we don't yet have:

procurementSignal — Matching GETS tender seen (feature 8).
momentum — Status progression vs stagnation across snapshots (feature 3).

Backtest: the 2023 election

We tested the model against the one large natural experiment we have — the November 2023 change of government, which cancelled or retained many major projects. Each project's pre-election (Oct 2023) status comes from its sourced chronology (or a documented override), and post-election outcomes are documented public-record decisions (6 curated with a basis note in backtest-2023.json). Positive class = "stopped".

48 settled outcomes (5 stopped, 43 proceeded)

50 vs 77: mean v1 score, stopped vs proceeded

96%: best model accuracy (cross-validated)

Model	Accuracy	Precision	Recall	F1
Completion Probability score (in-sample)	90%	50%	60%	0.55
Naive A: everything proceeds	90%	—	0%	—
Naive B: unfunded ⇒ stopped	92%	60%	60%	0.60
Logistic regression (leave-one-out CV)	96%	100%	60%	0.75
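
The table's figures can be reproduced from confusion-matrix counts. The counts below are not published in this note; they are inferred as the only whole-number counts consistent with the reported percentages and the 5 stopped / 43 proceeded split, with "stopped" as the positive class.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1); undefined values are None."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    if precision and recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = None  # shown as a dash in the table
    return accuracy, precision, recall, f1


# Inferred counts (tp, fp, fn, tn); an assumption, not published data.
CONFUSION = {
    "Completion Probability score": (3, 3, 2, 40),
    "Naive A: everything proceeds": (0, 0, 5, 43),
    "Naive B: unfunded => stopped": (3, 2, 2, 41),
    "Logistic regression (LOO CV)": (3, 0, 2, 43),
}

for name, counts in CONFUSION.items():
    print(name, metrics(*counts))
```

With only 5 positives, each stopped project moves recall by 20 points, which is why the recall column takes so few distinct values.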

The logistic regression is scored under leave-one-out cross-validation — every prediction is made on a project the model never saw in training — so the numbers reflect generalisation, not memorisation.
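
The leave-one-out procedure itself is simple to sketch. The example below is not the published model: it swaps in a trivial majority-class classifier as a stand-in for the logistic regression, and the helper names and empty feature rows are hypothetical. Only the 5 stopped / 43 proceeded label split comes from the backtest.

```python
from collections import Counter


def leave_one_out_predictions(rows, labels, fit, predict):
    """Predict each item with a model trained on every other item."""
    predictions = []
    for i in range(len(rows)):
        # Hold out item i; train on the rest.
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = fit(train_rows, train_labels)
        predictions.append(predict(model, rows[i]))
    return predictions


# Stand-in model (NOT the logistic regression): predict the most common label.
def fit_majority(train_rows, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


labels = ["stopped"] * 5 + ["proceeded"] * 43
rows = [{} for _ in labels]  # placeholder features; the stand-in ignores them

predictions = leave_one_out_predictions(rows, labels, fit_majority, predict_majority)
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(round(accuracy * 100))
```

With this stand-in, every held-out project is predicted to proceed, so the loop reproduces the "everything proceeds" baseline; a real model would replace `fit_majority` and `predict_majority`.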

What we learned. Once realistic counterexamples are included (the iReX Cook Strait ferries, which were funded and then cancelled, and the Roads of National Significance, which were unfunded but revived by the new government), the naïve "unfunded ⇒ stopped" rule is no longer perfect (92% accuracy, F1 0.60). A cross-validated model that adds project mode (road vs public transport/cycling) beats it (96% accuracy, F1 0.75). Its strongest signals are electionRisk (-1.33), mode_pt_active (+0.57) and ownerType (+0.53); that is, election risk and project type matter more than funding alone.

The real driver: partisan directionality

Every project that was actually stopped was public transport, cycling or a speculative mega-project; the roads the v1 heuristic rated high-risk were revived. The model improves only because the mode feature captures this: roads were safe under a centre-right government, active modes were not. That is powerful but election-specific: the same feature would point the opposite way under a different government. We therefore keep the live score as a transparent heuristic rather than bake 2023's political direction into a forward-looking 2026 tool. Adding that prior would be overfitting to one election.

Limitations (read before trusting a number)

Small N and a single election — these results characterise the 2023 change of government, not all elections.
Several "proceeded" road projects were retained/advanced (RoNS) but are not yet under construction; outcome = "not cancelled".
Structural factors are taken from the current dataset as proxies for their pre-election values, which introduces potential look-ahead bias.
The logistic regression is reported under leave-one-out CV so that its metrics are out-of-sample estimates rather than an in-sample fit; its edge over the funding baseline is modest and election-specific.

Full results and learned weights, per-project, are published at /data/backtest-results.json. Generated 2026-07-01.