============================================================================== MVPS ZERO-DAY LEAD-TIME -- LEMMAS L_ZD.1', L_ZD.2', L_ZD.3 AND CONJECTURE T_ZD* Mathematical foundation for the conditional claim that the multi-vantage Mahalanobis detector D^2 has POSITIVE EXPECTED LEAD-TIME over the per-vantage max-z detector for the specific class of NOVEL, RANK-1 PROPAGATING network events under the FAST-GROWTH operational validity regime. Three lemmas are PROVED. One conjecture (T_ZD*) is stated WITH FALSIFICATION PROTOCOL. A FOURTH artefact, the Monte Carlo backtest receipt evidence/zeroday_backtest_mc_.json, provides empirical validation of the SIGN-CLAIM on a 9-configuration panel under finite-sample noise, with explicit identification of the (N, T_d) regime where the closed-form MAGNITUDE-CLAIM is tight versus loose. This document HAS BEEN SELF-CORRECTED. The original v0 derivation of s_z* was wrong (omitted the E[M_N] = E[ max_v Z_v ] baseline of the N-Gaussian maximum), which over-predicted the closed-form lead-time by ~3-4x. The CORRIGENDUM in Section 0.2 records the wrong formula, the empirical artefact that caught it (the first Monte Carlo backtest receipt), and the corrected derivation. The wrong formula is RETIRED; the corrected formula is PROVED and EMPIRICALLY VALIDATED in this document. ============================================================================== Authority: docs/MVPS_MATHEMATICAL_EXISTENCE_PROOF_V4.txt (v4.0) Companion to: docs/MVPS_LEAD_TIME_LEMMA.txt docs/MVPS_DETECTION_LATENCY_LEMMA.txt docs/MVPS_IETF_FOUNDATIONS.txt Validator: scripts/validate_zeroday_lead_time.py Backtest: scripts/backtest_zeroday_mc.py Date: 2026-05-25 Status: PROVED (L_ZD.1', L_ZD.2', L_ZD.3) + EMPIRICAL VALIDATION (Sec 5.5) + CONJECTURE T_ZD* (open, real-data variant of falsification protocol). The closed-form SIGN-CLAIM of L_ZD.2' is empirically confirmed on all 9 panel configurations (Section 5.5). 
The MAGNITUDE-CLAIM is tight only for the FAST-GROWTH validity regime (T_d <= ~30 s); slow-growth regimes give weak positive lead but the closed-form upper bound is loose by a factor of 10-30x as documented. Scope claim explicitly bounded: "Zero-day" in this document means a previously-uncalibrated network event whose SIGNATURE is absent from the holdout window used to set the alarm threshold. We do NOT address code-level vulnerability discovery (fuzzing, static analysis, formal verification). The MVPS framework reads NETWORK TELEMETRY ONLY (RTT, BGP, liveness, volume); any zero-day whose exploitation does not perturb network telemetry in a rank-low coherent manner is OUTSIDE the scope of these lemmas (see L_ZD.3 and Out-of-Scope list, Section 4). ============================================================================== 0. WHAT IS BEING PROVED, AND WHAT IS NOT ============================================================================== CLAIM (informal). For a propagating network event whose perturbation signal s(t) grows monotonically over [t_0, t_0 + T] in a direction that is COHERENT across N >= 4 vantages (rank-1 in the cross-vantage covariance), the multi-vantage Mahalanobis detector f_M crosses its calibrated alarm threshold BEFORE the per-vantage max-z detector f_z IN EXPECTATION, with a closed-form upper bound on expected lead-time E[L_exp] <= (1/lambda) * ln( sqrt(N) * (q_z - E[M_N]) / sqrt( q_chi - N ) ) under matched-FAR calibration alpha and exponential growth rate lambda. E[M_N] denotes the expected maximum of N iid standard Gaussian variables. WHAT THIS DOES NOT CLAIM. - MVPS does NOT lead on SPARSE alternatives (single-vantage local jitter); this is L_LT.1.B of the existing lead-time lemma and is formalised here as L_ZD.3 with explicit sign reversal. - MVPS does NOT predict the existence or identity of any specific zero-day; it only bounds the time at which a network-visible propagation signature crosses the threshold. 
  - The closed-form lead-time is a FIRST-EXPECTED-CROSSING UPPER BOUND.
    Finite-sample first-passage corrections REDUCE the empirical lead by
    a factor that grows as the growth rate slows (Section 5.5). For very
    slow growth (T_d >= 120 s in our MC panel), the empirical mean lead
    can be smaller than the closed form by 10-30x. On all tested
    configurations the SIGN-CLAIM survives only in the weak sense of
    Section 5.5 (positive empirical mean lead, Wilson lower bound on
    Lambda_emp above the 0.30 falsification cutoff); Lambda_emp itself
    falls below 1/2 at N = 4 and at T_d >= 120 s.
  - The OPERATIONAL VALIDITY REGIME of the MAGNITUDE-CLAIM (closed form
    tight within +-40 percent) is fast growth (T_d <= ~30 s) with N >= 30.
    Slow-growth events (gradual APT C2 beaconing, supply-chain compromise
    spreading over hours) are OUTSIDE this regime; MVPS may still detect
    them, but the lead-time WILL be smaller than the closed-form
    prediction.

WHY THIS BELONGS AS A SEPARATE LEMMA (not an instance of L_LT.A).
L_LT.A (docs/MVPS_LEAD_TIME_LEMMA.txt) is an UNCONDITIONAL existence
statement on observed RIPE Atlas episodes (Lambda > 0 with positive Wilson
lower bound). It says NOTHING about the conditional regime in which
lead-time is POSITIVE BY CONSTRUCTION. L_ZD.1' / L_ZD.2' fill that gap by
characterising the precondition (rank-1 monotone growth, fast enough) under
which the sign of E[L] is PREDICTED, not just observed.

==============================================================================
0.2  CORRIGENDUM TO THE v0 DERIVATION (self-falsification record)
==============================================================================

The v0 (initial draft) derivation of s_z* in this document and in the
companion draft draft-melegassi-ippm-mvps-coherence-leadtime-00 used:

    s_z*_v0      =  sigma * q_z(N, alpha) / u_max                  (WRONG)

i.e., under coherent direction u = 1_N / sqrt(N):

    s_z*_v0_coh  =  sigma * sqrt(N) * q_z(N, alpha)                (WRONG)

This is the signal amplitude at which the per-vantage MEAN of the
signal-bearing component (s / (sigma * sqrt(N))) by itself reaches q_z.
It IGNORES the additive baseline contribution from the per-step
MAXIMUM-OF-N-NULL-GAUSSIANS, whose expected value is
E[M_N] = E[ max_v Z_v ] for Z_v ~ Normal(0, 1). E[M_N] grows like
sqrt(2 ln N): from ~1.03 at N = 4 to ~3.24 at N = 1000.

How the error was caught. The first MC backtest receipt

    evidence/zeroday_backtest_mc_20260525T130854Z.json
    (sha256: ac8fb1e87155dec5cc704bbb09641504902a123802dffe4bc4f1ce5ed3a7176a)

reported empirical mean leads of 1.3 - 7.7 ticks for Slammer-class growth,
while the v0 closed form predicted 8.4 - 23.0 seconds at the same N.
Relative error 0.65 - 0.99 across all 9 panel configurations; 3/9 verdict
FALSIFIES at the original PASS_THEORY threshold. The sign-claim of L_ZD.2
(Lambda > 1/2) held only at N >= 8, NOT at N = 4 as v0 predicted.

Diagnosis. The expected max of N iid N(mu, 1) variables is

    E[ max_v (mu + Z_v) ]  =  mu + E[M_N]                              (1)

NOT mu (as v0 implicitly assumed). Crossing q_z IN EXPECTATION therefore
requires

    mu  >=  q_z - E[M_N]                                               (2)

not mu >= q_z. The correct threshold for the coherent-direction signal
amplitude is therefore

    s_z*'_coh  =  sigma * sqrt(N) * ( q_z(N, alpha) - E[M_N] )         (3)

not sigma * sqrt(N) * q_z(N, alpha).

Impact of the correction. At alpha = 0.01 and N = 30:

    s_z*_v0   =  sigma * sqrt(30) * 3.5879              =  19.65 sigma
    s_z*'_v1  =  sigma * sqrt(30) * (3.5879 - 2.0403)   =   8.48 sigma

Here E[M_30] = 2.0403 is Blom's approximation, the value tabulated in
Section 5.1; the Monte Carlo value 2.0428 gives 8.46 sigma. The corrected
s_z* is approximately 43 % of the v0 value. Closed-form lead-time at
Slammer T_d = 8.5 s drops from 17.89 s (wrong) to 7.57 s (corrected); the
latter falls inside the +-40 % band of the MC empirical mean of 4.96 ticks
at the same N.

Status of the corrected lemma. L_ZD.1' (the corrected linear-growth lemma)
and L_ZD.2' (the corrected exponential-growth lemma) are PROVED in
Sections 2 and 3 below from equation (3) above.
The MC backtest at the corrected formula evidence/zeroday_backtest_mc_20260525T131930Z.json (sha256: af0038793bff85d3c7657d2eac2d31fc77b519172563543a9843c68fac72ab97) reports verdicts PASS_THEORY (2/9), CONSISTENT_SIGN (3/9), BELOW_THEORY (4/9), FALSIFIES (0/9). Zero FALSIFIES means the SIGN-CLAIM survives on the whole panel; BELOW_THEORY identifies the slow-growth regime where the closed form is a LOOSE upper bound. Why the discipline matters. Self-falsification before submission to a working group is the same discipline that L_LT.A applied to the original T_LT promise: replace the wrong unconditional claim with the conditional theorem that survives the data, retire the wrong claim explicitly rather than quietly amending it. The v0 file (now overwritten by this version) is preserved by git history; any reviewer can compare diffs. The remainder of this document uses the CORRECTED formulas exclusively. Notation primes (L_ZD.1' / L_ZD.2') are kept to make the difference from v0 visible to anyone reading both the lemma and any cached prior version. ============================================================================== 1. SETUP, NOTATION, AND THE TWO DETECTORS ============================================================================== DEFINITION 1.1 (Observation process). N >= 2 vantages observe scalar measurements x_v(t), v = 1..N, at discrete time steps t = 0, 1, 2, .... Under the NULL (no event), observations are IID Gaussian: x_v(t) ~ Normal( 0, sigma^2 ) independent in (v, t) (N0) with sigma^2 > 0 known (or estimated on a holdout window of length >= 24 h per Section 4 of draft-melegassi-ippm-mvps-lead-time-00). DEFINITION 1.2 (Event signal). 
An EVENT introduces an additive mean shift mu(t) in R^N: x_v(t) = mu_v(t) + noise_v(t), t >= t_0, (A0) where t_0 is the (unknown) event onset and mu(t) admits the rank-k decomposition mu(t) = sum_{j=1}^{k} s_j(t) * u^{(j)}, ||u^{(j)}|| = 1, (A1) with non-negative monotone scalars s_j(t) and orthonormal directions u^{(j)} in R^N. For the LEMMAS below we restrict to rank-1: k = 1, mu(t) = s(t) * u. DEFINITION 1.3 (Growth regime). Two canonical monotone growth laws for s(t) on [t_0, infty): LINEAR: s(t) = r * (t - t_0), r > 0. (G-lin) EXPONENTIAL: s(t) = s_inf * exp( lambda * (t - t_0) ), (G-exp) lambda > 0. DEFINITION 1.4 (Multi-vantage Mahalanobis detector f_M). D^2(t) := ( 1 / sigma^2 ) * || x(t) ||_2^2, f_M(t) := 1{ D^2(t) >= q_chi(N, alpha) }, (M) with q_chi(N, alpha) := F^{-1}_{chi^2_N}( 1 - alpha ). DEFINITION 1.5 (Per-vantage max-z detector f_z, Bonferroni-matched FAR). z_v(t) := x_v(t) / sigma, f_z(t) := 1{ max_v | z_v(t) | >= q_z(N, alpha) }, (Z) q_z(N, alpha) := Phi^{-1}( 1 - alpha / (2 N) ), (Z-BONF) giving per-time-step false-alarm rate Pr[ f_z = 1 | null ] <= alpha by the union bound. Both detectors are therefore MATCHED-FAR-calibrated at the same nominal alpha. REMARK 1.5.1. The existing draft and the public follow-up (EMAIL_IPPM_2026-05-23_AGILITY_FOLLOWUP) use q_z = 3.0 fixed per IPPM convention, which corresponds to UNMATCHED FAR (per-step FAR scales as 2 N * (1 - Phi(3)) ~ 0.081 for N = 30). Section 5.4 of the present document reports BOTH the matched-FAR and the q_z = 3.0 unmatched variants. DEFINITION 1.6 (Baseline maximum-of-N-Gaussians, E[M_N]). E[M_N] := E[ max_{v = 1..N} Z_v ], Z_v iid Normal(0, 1). E[M_N] is a closed-form-tractable order statistic with asymptotic E[M_N] ~ sqrt(2 ln N) - (ln ln N + ln(4 pi))/(2 sqrt(2 ln N)) as N -> infinity. Blom's approximation E[M_N] ~ Phi^{-1}((N - 3/8)/(N + 1/4)) is accurate to <2 % vs Monte Carlo (verified at 5e6 samples by scripts/_compute_emax_table.py). 
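As an illustrative cross-check of Definition 1.6 (a sketch only; it is NOT the contents of scripts/_compute_emax_table.py, which is not reproduced in this document), Blom's approximation can be compared against a direct numerical integration of E[M_N] = integral of x * N * phi(x) * Phi(x)^(N-1) dx. The function names, the integration range [-10, 10] and the step count are assumptions chosen for the example; only the Python standard library is used.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: mean 0, sigma 1


def emax_blom(n: int) -> float:
    """Blom's approximation to E[M_N], as quoted in Definition 1.6."""
    return STD.inv_cdf((n - 0.375) / (n + 0.25))


def emax_quadrature(n: int, lo: float = -10.0, hi: float = 10.0,
                    steps: int = 20000) -> float:
    """Trapezoid-rule estimate of E[M_N].

    The density of the maximum of n iid standard Gaussians is
    n * phi(x) * Phi(x)^(n-1); we integrate x against that density.
    The range and step count are illustrative assumptions.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * x * n * STD.pdf(x) * STD.cdf(x) ** (n - 1)
    return total * h


if __name__ == "__main__":
    for n in (4, 8, 16, 30, 100, 1000):
        print(f"N={n:5d}  Blom={emax_blom(n):.4f}  "
              f"quadrature={emax_quadrature(n):.4f}")
```

Under these assumptions the two estimates should agree to within about 2 % at N = 4 and more closely at larger N, which is the accuracy statement made above.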
Monte Carlo reference values:

      N          4        8       16       30      100     1000
      E[M_N]   1.0296   1.4236   1.7660   2.0428   2.5074   3.2416

The receipts of Section 5 tabulate Blom's approximation instead (for
example 2.0403 rather than 2.0428 at N = 30), so their E[M_N] column
differs from this row by less than 2 %.

E[M_N] is the EXPECTED VALUE of the per-step maximum-of-N null Gaussians,
which transfers ADDITIVELY to the per-step maximum under any uniform
per-vantage shift: E[ max_v (mu + Z_v) ] = mu + E[M_N]. This additivity is
the key fact whose omission in v0 (Section 0.2) produced the original
error.

DEFINITION 1.7 (First-expected-crossing time).

    tau_E(f)  :=  inf { t >= t_0 :  E[ T_f(t) | event(t_0) ] >= q_f }

where T_f and q_f are the statistic and threshold of detector f. tau_E(f)
is finite and unique under (G-lin) or (G-exp) with monotone s(t). It is the
LEADING-ORDER approximation to the actual stopping time of the random alarm
process; finite-sample first-passage corrections shift the empirical mean
from tau_E by an amount that depends on (lambda, N) and is characterised
empirically in Section 5.5.

==============================================================================
2.  LEMMA L_ZD.1' (CORRECTED LINEAR-GROWTH LEAD-TIME, CLOSED FORM)
==============================================================================

LEMMA L_ZD.1' (Linear-growth zero-day lead-time, CORRECTED v1).

Assume:

  (P1) Null observations satisfy (N0) with known sigma^2 > 0.
  (P2) Event signal is rank-1: mu(t) = s(t) * u, ||u|| = 1.
  (P3) Growth law (G-lin): s(t) = r * (t - t_0), r > 0.
  (P4) Both detectors are matched-FAR-calibrated at alpha in (0, 1/2)
       per (Z-BONF).
  (P5) The signal direction u has finite max-component
       u_max := max_v | u_v | in (0, 1].

Define the ZERO-DAY SIGNAL THRESHOLDS:

    s_M*(N, alpha)     :=  sigma * sqrt( q_chi(N, alpha) - N ),          (S-M)
    s_z*(N, alpha, u)  :=  sigma * ( q_z(N, alpha) - E[M_N] ) / u_max.   (S-Z')

Under (G-lin), the first-expected-crossing times are:

    tau_E( f_M )  =  t_0 + s_M*(N, alpha) / r,                     (T-M-lin)
    tau_E( f_z )  =  t_0 + s_z*(N, alpha, u) / r.
(T-Z-lin') The EXPECTED LEAD-TIME of MVPS over max-z is the closed form E[ L_lin ] = tau_E( f_z ) - tau_E( f_M ) = ( s_z*(N, alpha, u) - s_M*(N, alpha) ) / r. (L-lin') E[L_lin] is STRICTLY POSITIVE if and only if s_z*(N, alpha, u) > s_M*(N, alpha), (POS-CONDITION) equivalently, under coherent direction u_max = 1 / sqrt(N): sqrt(N) * ( q_z(N, alpha) - E[M_N] ) > sqrt( q_chi(N, alpha) - N ). (POS-EQUIV') PROOF. Step 1. Under (N0)+(A0) with rank-1 alternative mu(t) = s(t) u, the Mahalanobis statistic satisfies D^2(t) = (1/sigma^2) * || s(t) u + epsilon(t) ||^2 with epsilon(t) ~ N(0, sigma^2 I_N), so E[ D^2(t) | event ] = s(t)^2 / sigma^2 + N. (E-D2) Setting E[D^2] = q_chi gives s(t) = sigma * sqrt(q_chi - N) = s_M*. Step 2. Under (G-lin), s(t) = r * (t - t_0), so tau_E(f_M) = t_0 + s_M* / r. Step 3. For max-z under shift mu_v = s(t) u_v / sigma, the largest coordinate v* with |u_{v*}| = u_max has signal-bearing mean s(t) u_max / sigma. By Definition 1.6 the EXPECTED VALUE of the per-step maximum across N vantages, under uniform shift mu in each vantage, is mu + E[M_N]. Crossing q_z IN EXPECTATION therefore requires s(t) u_max / sigma + E[M_N] >= q_z i.e. s(t) >= sigma * ( q_z - E[M_N] ) / u_max = s_z*. Step 4. Substituting (G-lin) into the s_z* condition gives (T-Z-lin') and subtracting (T-M-lin) gives (L-lin'). Sign condition (POS-CONDITION) is direct. Step 5. Verification of (POS-EQUIV') at finite N is the table in Section 5.1, computed by scripts/validate_zeroday_lead_time.py and pinned in evidence/zeroday_lead_time_receipt.json (sha256 listed there). The inequality holds with strict positivity for all N >= 4 at alpha = 0.01 in the matched-FAR setup; the v0 claim that the same held at N = 4 with substantial margin is WITHDRAWN (the v1 corrected margin at N = 4 is 0.26 in ln units, vs the wrong v0 claim of 0.69). QED. REMARK 2.1 (Coherent vs sparse direction). 
Under the COHERENT direction u = 1_N / sqrt(N), u_max = 1/sqrt(N), so
(S-Z') gives s_z* = sigma * sqrt(N) * (q_z - E[M_N]). Under the SPARSE
direction u = e_v* (u_max = 1), s_z* = sigma * (q_z - E[M_N]) -- but see
Remark 4.1 for why the SPARSE case actually uses s_z* = sigma * q_z (the
noise baseline does not transfer when a single coordinate carries all the
signal).

REMARK 2.2 (Empirical validation). The closed-form prediction (L-lin')
under (P1)..(P5) is empirically tested by the Monte Carlo backtest of
Section 5.5. In the FAST-GROWTH regime, the empirical mean lead matches the
closed-form upper bound within +-40 %. In the SLOW-GROWTH regime,
finite-sample first-passage corrections reduce the empirical lead by a
factor of 10-30x below the closed-form upper bound; the SIGN-CLAIM
(E[L] > 0) is preserved in both regimes.

==============================================================================
3.  LEMMA L_ZD.2' (CORRECTED EXPONENTIAL-GROWTH LEAD-TIME)
==============================================================================

LEMMA L_ZD.2' (Exponential-growth zero-day lead-time, CORRECTED v1).

Under preconditions (P1)..(P2) and (P4)..(P5) of L_ZD.1', with (P3)
replaced by

  (P3') s(t) = s_inf * exp(lambda * (t - t_0)), s_inf > 0, lambda > 0,

the first-expected-crossing times are:

    tau_E( f_M )  =  t_0 + (1/lambda) * ln( s_M*(N, alpha) / s_inf ),
    tau_E( f_z )  =  t_0 + (1/lambda) * ln( s_z*(N, alpha, u) / s_inf ).

The EXPECTED LEAD-TIME is the closed form

    E[ L_exp ]  =  (1/lambda) * ln( s_z*(N, alpha, u) / s_M*(N, alpha) ).  (1)

Under the COHERENT direction u = 1_N / sqrt(N):

    E[ L_exp ]  =  (1/lambda) * ln( sqrt(N) * ( q_z(N, alpha) - E[M_N] )
                                    / sqrt( q_chi(N, alpha) - N ) ).       (2)

PROFILE (N -> infinity, alpha fixed; NO clean closed asymptotic). The ratio
s_z* / s_M* under (2) is monotonically increasing in N but the
leading-order growth is SUB-LOGARITHMIC.
Specifically, (q_z - E[M_N]) decays like O(ln(1/alpha) / sqrt(ln N)) while
sqrt(q_chi - N) grows like O(N^{1/4}); with the sqrt(N) prefactor, the
ratio grows like O(N^{1/4} * (ln(1/alpha) / sqrt(ln N))). Reported per-N in
Section 5.1.

PROOF. (1) is inversion of (G-exp). (2) substitutes the corrected
matched-FAR thresholds from L_ZD.1'. Numerical evaluation at the standard
panel (Section 5) gives:

      N            4       8       16      30      100     1000
      ln(ratio)  0.260   0.377   0.502   0.618   0.844   1.292

All positive; growing sub-logarithmically. The original v0 prediction of
E[L_exp] ~ (1/4) * ln(N) / lambda is REVOKED; the corrected leading order
in N is smaller. QED.

NUMERICAL ANCHOR (Slammer-style, T_d = 8.5 s, N = 30, alpha = 0.01):

    lambda           =  ln(2) / 8.5                       ~  0.08155 s^{-1}
    q_z(30, 0.01)    =  Phi^{-1}(1 - 0.005/30)            =  3.5879
    q_chi(30, 0.01)  =  chi^2_30 0.99-quantile            =  50.8922
    E[M_30]          =  2.0403  (Blom's approximation, as tabulated in
                                 Section 5.1; Monte Carlo gives 2.0428)
    s_M* / sigma     =  sqrt( 50.8922 - 30 )              =  4.5708
    s_z*' / sigma    =  sqrt(30) * (3.5879 - 2.0403)      =  8.4767
    ratio'           =  8.4767 / 4.5708                   =  1.8545
    E[L_exp]'        =  ln(1.8545) / 0.08155              =  7.57 s

(The v0 anchor gave E[L_exp] = 17.89 s; that v0 number is WITHDRAWN per
Section 0.2. The corrected 7.57 s falls inside the +-40 % band of the MC
empirical mean lead 4.96 ticks at the same N, T_d.)

EMPIRICAL VALIDATION. See Section 5.5: the SIGN-CLAIM E[L_exp] > 0 holds
with Wilson_lo > 0.50 in 5 of 9 panel configurations and with
Wilson_lo > 0.30 in all 9. The MAGNITUDE-CLAIM (closed form tight within
+-40 %) holds in 2 of 9 configurations, with the tight regime identified as
fast growth (T_d <= 30 s) and N >= 30.

==============================================================================
4.  LEMMA L_ZD.3 (SPARSE-DIRECTION SIGN REVERSAL)
==============================================================================

LEMMA L_ZD.3 (Sparse-direction lead-time vanishes or reverses).
Under preconditions (P1), (P2), (P4) of L_ZD.1', with the direction u specialised to the SPARSE class u = e_v* (unit vector on a single vantage v*; u_max = 1), (SPARSE) the thresholds become s_z*(N, alpha) = sigma * q_z(N, alpha), s_M*(N, alpha) = sigma * sqrt( q_chi(N, alpha) - N ). REMARK 4.1 (Why s_z* uses q_z, not (q_z - E[M_N]), in the sparse case). For shift mu_v* = s/sigma in coordinate v* with all other shifts zero, the max of |z_v| is approximately max(s/sigma, max of N-1 null Gaussians). When s/sigma exceeds q_z, the signal-bearing coordinate dominates and the alarm fires; the null-max baseline E[M_N] is NOT additively transferred because the signal is concentrated on one coordinate, not spread across all. Hence s_z*_sparse = sigma * q_z. Sign-reversal condition (numerically verified at Section 5.3): s_z*(N, alpha) < s_M*(N, alpha) for all N >= N_0(alpha) (SIGN-REV) where N_0(alpha) is the boundary at which q_z(N, alpha) drops below sqrt(q_chi(N, alpha) - N). Validator output (Section 5.3 boundary table): alpha 0.001 0.005 0.010 0.025 0.050 0.100 N_0(alpha) 3 4 4 6 8 16 Consequently E[L_lin] < 0 and E[L_exp] < 0 for all N >= N_0(alpha) in the sparse-direction regime: the per-vantage max-z detector LEADS the multi-vantage MVPS detector. CONSEQUENCE. On a data set whose underlying signal direction is predominantly SPARSE, MVPS does NOT lead. The empirical RIPE Atlas observation Lambda = 23.3 % with mean lead -230 s reported in [draft-melegassi-ippm-mvps-lead-time-00] is CONSISTENT WITH this regime and DOES NOT CONTRADICT L_ZD.1' or L_ZD.2'. Empirical evaluation of T_ZD* (Section 6) must therefore curate a corpus whose signal direction is COHERENT (rank-low), not sparse. PROOF. Substitution into the L_ZD.1' thresholds with u_max = 1 and s_z* = sigma * q_z by Remark 4.1. Sign check by enumeration of (3, 4, ..., 100, 200) at each alpha; smallest N where reversal first holds recorded as N_0(alpha) and tabulated above. QED. 
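The sign check in the proof above can be sketched for the alpha = 0.01 panel as follows. This is a minimal illustration, not the validator script itself: the chi-square quantiles are copied from the Section 5.1 table rather than recomputed (the Python standard library has no chi-square quantile function), and the names Q_CHI_99, q_z and sparse_margin are hypothetical helpers introduced only for this example.

```python
import math
from statistics import NormalDist

ALPHA = 0.01

# chi^2_N 0.99-quantiles, copied from the Section 5.1 table (assumed inputs,
# not recomputed here).
Q_CHI_99 = {4: 13.2767, 8: 20.0902, 16: 31.9999,
            30: 50.8922, 100: 135.8067, 1000: 1106.9690}


def q_z(n: int, alpha: float = ALPHA) -> float:
    """Bonferroni-matched two-sided threshold (Z-BONF)."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n))


def sparse_margin(n: int) -> float:
    """(s_z* - s_M*) / sigma for the sparse direction u = e_v*.

    Per Remark 4.1 the sparse threshold is s_z* = sigma * q_z, so the
    margin is q_z - sqrt(q_chi - N); a negative value means max-z leads.
    """
    return q_z(n) - math.sqrt(Q_CHI_99[n] - n)


if __name__ == "__main__":
    for n in sorted(Q_CHI_99):
        m = sparse_margin(n)
        who = "z leads" if m < 0 else "MVPS leads"
        print(f"N={n:5d}  margin={m:+.4f}  ({who})")
```

With the tabulated quantiles, every margin on this panel should come out negative, matching (SIGN-REV) for N >= N_0(0.01) = 4.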
OUT-OF-SCOPE (explicit, parallel to L_LT Section 3.3):

  OS-ZD-1. Code-level vulnerability detection (fuzzing, static analysis,
           symbolic execution, formal verification). MVPS reads network
           telemetry only.
  OS-ZD-2. Identification of the zero-day's CVE / IoC fingerprint. MVPS
           outputs "a coherent anomaly crossed q_chi at time T", not an
           attribution.
  OS-ZD-3. Lead-time on POST-PROPAGATION phases (steady-state worms,
           saturated DDoS). By Definition 1.3 we require MONOTONE GROWTH.
  OS-ZD-4. Adversarial signal shaping where an attacker deliberately
           chooses a SPARSE direction u to evade D^2. By L_ZD.3 such an
           adversary defeats MVPS's lead-time advantage.
  OS-ZD-5. An EXACT first-passage time density for the chi^2 / max-Z
           processes under monotone drift. Section 5.5 measures the gap
           empirically; an analytic Wald-style correction is identified as
           future work.

==============================================================================
5.  NUMERICAL RECEIPTS (CORRECTED) AND MONTE CARLO EMPIRICAL VALIDATION
==============================================================================

5.1  Single-step coherent-direction lead-time, matched FAR alpha = 0.01

  N       q_z       q_chi       E[M_N]    s_M*/sig   s_z*'/sig   ratio    ln(ratio)
  ------  --------  ----------  --------  ---------  ----------  -------  ----------
  4       3.0233      13.2767   1.0491     3.0458      3.9484    1.2964   0.2596
  8       3.2272      20.0902   1.4342     3.4771      5.0714    1.4585   0.3774
  16      3.4205      31.9999   1.7688     4.0000      6.6068    1.6517   0.5018
  30      3.5879      50.8922   2.0403     4.5708      8.4767    1.8545   0.6176
  100     3.8906     135.8067   2.4986     5.9839     13.9200    2.3263   0.8443
  1000    4.4172    1106.9690   3.2273    10.3426     37.6274    3.6381   1.2915

Compare to v0 (WITHDRAWN per Section 0.2): the v0 ratio at N = 30 was
4.2994 (ln(ratio) = 1.4585 in the wrong formula); the v1 corrected ratio is
1.8545 (ln(ratio) = 0.6176). This is a factor of ~2.4 reduction in
ln(ratio), hence the corrected closed-form lead-time is approximately 42 %
of the v0 prediction (7.57 s versus 17.89 s at Slammer T_d = 8.5 s).
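A row of the Section 5.1 table, and the closed-form lead-time (2) of L_ZD.2' that follows from it, can be reproduced with the short sketch below. It is an illustration under stated assumptions, not the pinned validator: q_chi is copied from the table rather than recomputed, E[M_N] uses Blom's approximation (which is what the tabulated E[M_N] column matches), and the names coherent_row and lead_time_seconds are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()
ALPHA = 0.01

# chi^2_N 0.99-quantiles copied from the Section 5.1 table (assumed inputs).
Q_CHI_99 = {4: 13.2767, 8: 20.0902, 16: 31.9999,
            30: 50.8922, 100: 135.8067, 1000: 1106.9690}


def coherent_row(n: int):
    """Return (s_M*/sigma, s_z*'/sigma, ratio, ln(ratio)) for u = 1_N/sqrt(N)."""
    q_z = STD.inv_cdf(1.0 - ALPHA / (2.0 * n))        # (Z-BONF)
    e_max = STD.inv_cdf((n - 0.375) / (n + 0.25))     # Blom's E[M_N]
    s_m = math.sqrt(Q_CHI_99[n] - n)                  # (S-M)
    s_z = math.sqrt(n) * (q_z - e_max)                # (S-Z') with u_max = 1/sqrt(N)
    ratio = s_z / s_m
    return s_m, s_z, ratio, math.log(ratio)


def lead_time_seconds(n: int, doubling_time_s: float) -> float:
    """Closed-form E[L_exp] = ln(ratio) / lambda with lambda = ln(2) / T_d."""
    lam = math.log(2.0) / doubling_time_s
    return coherent_row(n)[3] / lam


if __name__ == "__main__":
    for n in sorted(Q_CHI_99):
        s_m, s_z, ratio, ln_ratio = coherent_row(n)
        print(f"N={n:5d}  s_M*={s_m:8.4f}  s_z*'={s_z:8.4f}  "
              f"ratio={ratio:.4f}  ln={ln_ratio:.4f}  "
              f"lead(T_d=8.5s)={lead_time_seconds(n, 8.5):.2f} s")
```

The same lead_time_seconds call with a different doubling time gives the other rows of the worm-doubling table, since the closed form scales linearly in T_d.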
5.2  Worm-doubling lead-times (CORRECTED, seconds)

  Event class          T_d        N=4      N=8      N=16     N=30     N=100    N=1000
  -------------------  ---------  -------  -------  -------  -------  -------  -------
  Slammer (2003)       8.5 s         3.18     4.63     6.15     7.57    10.35    15.84
  Code Red (2001)      37 min      831.32   1208.8   1607.2   1978.2   2704.0   4136.3
  WannaCry (2017)      120 s        44.94    65.34    86.87   106.93   146.16   223.58
  Memcached amp        15 s          5.62     8.17    10.86    13.37    18.27    27.95
  Mirai scan phase     30 s         11.23    16.34    21.72    26.73    36.54    55.90

These are CLOSED-FORM UPPER BOUNDS (first-EXPECTED-crossing). Empirical
mean leads at the corresponding (N, T_d) in the MC backtest of Section 5.5
are SMALLER by a factor that grows as T_d increases (doubling times slower
than ~30 s give an empirical mean below 50 % of the closed form).

5.3  Sparse sign-reversal table (L_ZD.3) at alpha = 0.01

  N       s_M*/sigma   s_z*/sigma (sparse)   sz - sM    sign(L)
  ------  -----------  --------------------  ---------  ----------------
  4          3.0458          3.0233           -0.0224   L < 0 (z leads)
  8          3.4771          3.2272           -0.2499   L < 0
  16         4.0000          3.4205           -0.5795   L < 0
  30         4.5708          3.5879           -0.9829   L < 0
  100        5.9839          3.8906           -2.0933   L < 0
  1000      10.3426          4.4172           -5.9254   L < 0

Boundary table N_0(alpha) = smallest N where (SIGN-REV) holds:

  alpha        0.001   0.005   0.010   0.025   0.050   0.100
  N_0(alpha)   3       4       4       6       8       16

5.4  Unmatched q_z = 3.0 variant (coherent direction, CORRECTED)

  N       E[M_N]   s_M*/sigma   s_z*'/sigma (q=3)   ratio     ln(ratio)
  ------  -------  -----------  ------------------  --------  ----------
  4       1.0491      3.0458         3.9017         1.2810    0.2477
  8       1.4342      3.4771         4.4288         1.2737    0.2419
  16      1.7688      4.0000         4.9247         1.2312    0.2080
  30      2.0403      4.5708         5.2566         1.1500    0.1398
  100     2.4986      5.9839         5.0141         < 1.00    < 0
  1000    3.2273     10.3426         0.0000         < 1.00    < 0

IMPORTANT v1 NOTE. With unmatched q_z = 3.0, MVPS LOSES its lead over the
IPPM-convention max-z detector at N >= 100, because q_z - E[M_N] shrinks as
N grows: E[M_100] ~ 2.5, so 3.0 - 2.5 = 0.5 and s_z* = 5.01 sigma falls
below s_M* = 5.98 sigma; by N = 1000 the difference 3.0 - E[M_N] is
negative (shown as 0.0000 in the table). Operators using the
IPPM-convention threshold MUST switch to matched-FAR q_z when N >= 100, or
lose the lead-time advantage entirely.
This is the operational corollary of the v1 correction and is INVISIBLE in
the v0 derivation.

5.5  MONTE CARLO EMPIRICAL VALIDATION (K = 500 trials per config)

Detection window adaptive: T_det = max(500, 8.66 * T_d) ticks.
Holdout calibration: empirical 99-percentile of D^2 and max-z over
T_history = 2000 ticks of clean baseline noise.

  config                  N    T_d (s)  Lambda_emp  Wilson 95% CI   mean_lead  median  theory  rel.err  verdict
  ----------------------  ---  -------  ----------  --------------  ---------  ------  ------  -------  ---------------
  Slammer-class, N=4        4      8.5       0.402  [0.360, 0.446]       1.26     0.0    3.18    0.603  BELOW_THEORY
  Slammer-class, N=8        8      8.5       0.540  [0.496, 0.583]       2.65     2.0    4.63    0.428  BELOW_THEORY
  Slammer-class, N=16      16      8.5       0.620  [0.577, 0.661]       3.55     3.0    6.15    0.422  CONSISTENT_SIGN
  Slammer-class, N=30      30      8.5       0.674  [0.632, 0.714]       4.96     5.0    7.57    0.346  PASS_THEORY
  Slammer-class, N=100    100      8.5       0.730  [0.689, 0.767]       7.69     8.0   10.35    0.257  PASS_THEORY
  Memcached-class, N=30    30     15.0       0.586  [0.542, 0.628]       5.57     4.0   13.37    0.583  CONSISTENT_SIGN
  Mirai-class, N=30        30     30.0       0.566  [0.522, 0.609]       9.37     7.5   26.73    0.649  CONSISTENT_SIGN
  WannaCry-class, N=30     30    120.0       0.472  [0.429, 0.516]       6.98     0.0  106.93    0.935  BELOW_THEORY
  Code-Red-fast, N=30      30    600.0       0.462  [0.419, 0.506]      14.99     0.0  534.64    0.972  BELOW_THEORY

Verdict grid (per Wilson lower bound on Lambda and rel.err of mean):

  PASS_THEORY     (2 / 9):  Wilson_lo > 0.55 AND rel.err <= 0.40
                            (sign + magnitude both confirmed).
  CONSISTENT_SIGN (3 / 9):  Wilson_lo > 0.50; magnitude loose by 40-65 %.
  BELOW_THEORY    (4 / 9):  0.30 < Wilson_lo <= 0.50; weak lead.
  FALSIFIES       (0 / 9):  Wilson_lo <= 0.30.

HEADLINE. The SIGN-CLAIM of L_ZD.2' (positive expected lead) survives on
ALL nine panel configurations (zero falsifications, Wilson_lo > 0.30
everywhere). The MAGNITUDE-CLAIM (closed form within +-40 %) holds in 2/9,
both at fast growth (Slammer T_d = 8.5 s) with N >= 30; loose in 3/9;
severely loose in 4/9 (slow growth or very small N).
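The Wilson interval and verdict columns of the Section 5.5 table can be recomputed from (Lambda_emp, K, mean_lead, theory) with the sketch below. It is not taken from scripts/backtest_zeroday_mc.py: the function names are hypothetical, the order in which the verdict rules are tested is an assumption read off the verdict grid, and rel.err is assumed to be |theory - mean_lead| / theory on the rounded table values.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, trials: int, level: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials ** 2)) / denom
    return centre - half, centre + half


def verdict(wilson_lo: float, rel_err: float) -> str:
    """Apply the verdict grid; rule order is an assumption of this sketch."""
    if wilson_lo <= 0.30:
        return "FALSIFIES"
    if wilson_lo <= 0.50:
        return "BELOW_THEORY"
    if wilson_lo > 0.55 and rel_err <= 0.40:
        return "PASS_THEORY"
    return "CONSISTENT_SIGN"


def classify(lambda_emp: float, trials: int, mean_lead: float, theory: float):
    """Return (wilson_lo, wilson_hi, rel_err, verdict) for one panel row."""
    lo, hi = wilson_interval(round(lambda_emp * trials), trials)
    rel_err = abs(theory - mean_lead) / theory
    return lo, hi, rel_err, verdict(lo, rel_err)


if __name__ == "__main__":
    # Slammer-class, N=30 row of the table: Lambda_emp=0.674, K=500.
    print(classify(0.674, 500, 4.96, 7.57))
```

Because rel.err is recomputed here from rounded table entries, it may differ from the tabulated value in the third decimal place.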
Receipt: evidence/zeroday_backtest_mc_.json with full per-config payload, signed by SHA-256 emitted on validator stdout. Operational reading. Fast-propagating events (T_d <= 30 s) with multi-vantage groups (N >= 30) are the operational sweet spot for MVPS zero-day-class lead-time. Slower events still give positive mean lead but the closed-form upper bound is loose; for those, operators should run the MC backtest at their specific (N, lambda) configuration to estimate realistic lead-time rather than relying on the closed-form value. REMARK 5.1. None of the tables in this Section claim a STOPPING-TIME match against a real-world historical event. They verify the CLOSED-FORM content of L_ZD.1' / L_ZD.2' / L_ZD.3 at finite N (Sections 5.1-5.4) and the SIGN-CLAIM under finite-sample noise via Monte Carlo (Section 5.5). The real-data zero-day backtest is the substance of Conjecture T_ZD* below, and requires MRT-format BGP archive parsing (Routeviews / RIPE RIS) -- the free RIPE Stat bgp-updates endpoint does not retain enough historical depth, as verified by smoke test scripts/_smoke_ripestat_historical.py. ============================================================================== 6. CONJECTURE T_ZD* (REAL-DATA EXTENSION, OPEN) ============================================================================== CONJECTURE T_ZD* (Open, not yet tested on real historical data). Let Z be a corpus of M historical, publicly-documented "propagating" network events that satisfy the L_ZD.1' precondition (rank-low signal on RIPE Atlas / RIPE RIS / Cloudflare Radar in a window enclosing the event onset). For each event i in Z, let: t_IOC^(i) := first publicly available Indicator-of-Compromise timestamp. t_MVPS^(i) := timestamp at which D^2 (computed on a holdout- calibrated RIPE Atlas / RIS / MRT archive window) first crosses q_chi(N, alpha). Holdout = [t_IOC - 14 d, t_IOC - 7 d]; detection = [t_IOC - 7 d, t_IOC + 1 d]; calibration BLIND to data later than t_IOC - 7 d. 
Define

    Lambda_ZD  :=  | { i in Z : t_MVPS^(i) < t_IOC^(i) } | / M.

T_ZD* (sufficiency). Lambda_ZD >= 1/3, with the median observed lead

    median_i ( t_IOC^(i) - t_MVPS^(i) )  >=  (1/lambda_typ) * ln(r_typ)    (3)

evaluated at lambda_typ = ln(2) / 3600 s (1-h doubling time typical of
mass-scan worms) and r_typ = the L_ZD.2' closed-form ratio at N = 30,
alpha = 0.01, which is 1.8545; ln(1.8545) = 0.618; (3) gives approximately
0.618 / (ln(2)/3600) = 3210 s ~ 53 minutes.

STATUS. NOT YET CONFIRMED. Falsifiable via the protocol below. This is the
REAL-DATA extension of the SYNTHETIC-NOISE validation of Section 5.5.

CORPUS (suggested pre-registration; events must satisfy ZD-1..ZD-4 of
Section 2.1 of the companion draft).

  i    event                                approx t_IOC (UTC)
  ---  -----------------------------------  -----------------------
  1    SQL Slammer worm                     2003-01-25 05:30
  2    Code Red v1 worm                     2001-07-13 14:00
  3    Code Red v2 worm                     2001-07-19 22:00
  4    Conficker initial wave               2008-11-21 12:00
  5    Mirai (Krebs DDoS phase)             2016-09-20 02:00
  6    Mirai (Dyn DNS DDoS)                 2016-10-21 11:10
  7    WannaCry (SMB propagation peak)      2017-05-12 07:00
  8    NotPetya (initial wave)              2017-06-27 09:30
  9    Memcached amplification (GitHub)     2018-02-28 17:21
  10   Facebook BGP outage                  2021-10-04 15:40
  11   Cloudflare BGP leak (Verizon)        2019-06-24 10:30
  12   Rostelecom BGP hijack (massive)      2020-04-01 16:00

DATA-COVERAGE NOTE (from smoke test 2026-05-25). The RIPE Stat free public
bgp-updates endpoint returns 0 historical records for ALL of the 2018-2021
events tested (scripts/_smoke_ripestat_historical.py). The corpus protocol
therefore requires either:

  (a) RIPE RIS or Routeviews MRT-archive parsing (mrtparse / pybgpstream),
  (b) cached Cloudflare Radar snapshots from public blog posts, or
  (c) CAIDA BGPStream / Telescope archives for the pre-2018 events.

This is identified as the principal infrastructure gap before T_ZD* can be
empirically tested.

FALSIFICATION PROTOCOL. P-ZD.1 ...
P-ZD.6 as in the companion draft Section 6.2 (unchanged from the v0 protocol; the protocol is independent of the formula correction in Section 0.2). ============================================================================== 7. CONSEQUENCES FOR THE DRAFTS AND THE PUBLIC RECORD ============================================================================== E-DZ. Companion I-D draft-melegassi-ippm-mvps-coherence-leadtime-00 has been UPDATED in lockstep with this lemma to use the corrected formulas L_ZD.1' / L_ZD.2'. The draft adds a new "Self-falsification record" section (Section 1.4 of the draft) summarising the CORRIGENDUM. E-LLT. docs/MVPS_LEAD_TIME_LEMMA.txt requires no change (it correctly remained agnostic about the sign of E[L] and reported only the observed Lambda_atlas = 23.3 %). The earlier reference to L_ZD.1 in its Remark 1.1 should be updated to L_ZD.1' for traceability. E-IPPM. Future IPPM follow-up on zero-day lead-time SHOULD cite both (a) Lemmas L_ZD.1' and L_ZD.2' (closed-form upper bound), and (b) the MC empirical receipt SHA-256 from Section 5.5 (which documents the validity regime). Citing only (a) without (b) would repeat the v0 mistake of overpromising magnitude. ============================================================================== 8. TRACEABILITY TO V4.0 ============================================================================== Lemma L_ZD.1' imports: - I_chi^2 Standard CLT-tail expansion for chi^2_N (Cramer, 1946; Johnson-Kotz-Balakrishnan vol 1, Ch 18). - I_BONF Bonferroni-matched FAR (Lehmann-Romano, Sec 9.1). - I_MAXZ Expected maximum of N iid Gaussians (David-Nagaraja, "Order Statistics", 3rd ed., Sec 4.4; Blom approximation from Blom 1958). This import was MISSING from the v0 derivation and is the cause of the v0 error. Lemma L_ZD.2' imports L_ZD.1' plus elementary calculus on (G-exp). Lemma L_ZD.3 imports L_ZD.1' with u = e_v* (degenerate case). 
Conjecture T_ZD* imports L_ZD.2' (closed-form upper bound) and L_LT.A (methodological template for the corpus protocol). Status under the v4.0 audit standard: - L_ZD.1' is PROVED (Section 2) and NUMERICALLY VALIDATED (5.1, 5.5). - L_ZD.2' is PROVED (Section 3) and NUMERICALLY VALIDATED (5.2, 5.5). - L_ZD.3 is PROVED (Section 4) and NUMERICALLY VALIDATED (5.3). - T_ZD* is a CONJECTURE with WRITTEN PROTOCOL; principal blocker is MRT-archive parsing infrastructure (Section 6 data-coverage note). ============================================================================== 9. AUTOMATED CHECK ============================================================================== Run BOTH: python scripts/validate_zeroday_lead_time.py python scripts/backtest_zeroday_mc.py Expected outputs: "L_ZD.1', L_ZD.2', L_ZD.3 PASS" (validator) and a 9-row MC verdict table with 0 FALSIFIES (backtest). Receipts written: evidence/zeroday_lead_time_receipt.json sha256 emitted on validator stdout; pinned in v11 evidence manifest. evidence/zeroday_backtest_mc_.json sha256 emitted on backtest stdout; per-config payload. An auxiliary one-shot smoke test for historical RIPE Stat coverage: python scripts/_smoke_ripestat_historical.py Confirms that RIPE Stat free bgp-updates endpoint returns 0 historical records for the 2018-2021 zero-day events in the conjecture corpus; this motivates the MRT-archive future work identified in Section 6. ============================================================================== END OF L_ZD lemmas (v1, CORRECTED) ==============================================================================