MVPS AI-Coherence Extension

Part A — Semantic Coherence

Why JSD Is Insufficient for Language-Model Coherence

MVPS v1.1's \(C_2\) uses Jensen-Shannon Divergence (JSD) over a discrete label alphabet. For language models, the token vocabulary carries metric structure. JSD ignores this structure and produces two dangerous failure modes:

False alarm (Case B): two replicas output the same semantic content in different languages. Token overlap is low; JSD fires ALARM. Semantic consensus is perfect.
Silent failure (Case C): all replicas agree on the same wrong answer ("Lyon" as capital of France). JSD = 0; \(\Phi_K\) = BAU throughout. No signal.

C₂^W2 addresses Case B. C₄ addresses Case C.

C₂^W2: Wasserstein-2 Coherence

Definition

Embedding-weighted empirical measure μ_i

Let \(\phi: \mathcal{A} \to \mathbb{R}^d\) be an embedding mapping tokens to \(d\)-dimensional vectors (\(d \in \{768,\ldots,4096\}\)). For replica \(V_i\) generating \(L_i\) tokens at tick \(t\):

\[ \mu_i = \frac{1}{L_i}\sum_{l=1}^{L_i} \delta_{\phi(a_{i,l})} \]

where \(\delta_x\) is a Dirac mass at \(x \in \mathbb{R}^d\). \(\mu_i\) is a probability measure on \(\mathbb{R}^d\).

Theorem — Villani 2009

2-Wasserstein distance

For measures \(\mu, \nu\) in \(\mathcal{P}_2(\mathbb{R}^d)\) (finite second moment):

\[ W_2(\mu,\nu)^2 = \inf_{\gamma \in \Gamma(\mu,\nu)} \mathbb{E}_{(x,y)\sim\gamma}\|x-y\|_2^2 \]

The infimum is attained. \(W_2\) is a complete, separable metric on \(\mathcal{P}_2(\mathbb{R}^d)\). For discrete measures with \(n\) and \(m\) atoms, \(W_2^2\) is the solution of an optimal-transport LP with \(n\times m\) variables.

Definition

Wasserstein-2 coherence C₂^W2

\[ W_{\text{norm}} = \frac{1}{\binom{N}{2}} \sum_{i<j} \frac{W_2(\mu_i,\mu_j)}{W_{2,\max}}, \qquad C_2^{W_2} = 1 - \min(1,\, W_{\text{norm}}) \]

where \(W_{2,\max}\) is the 99th-percentile pairwise \(W_2\) over the trailing 1-hour BAU window. \(C_2^{W_2} \in [0,1]\).

Theorem — Sensitivity and insensitivity

C₂^W2 is sensitive to semantic divergence, insensitive to surface variation

If replicas \(V_a, V_b\) generate embeddings clustering in disjoint balls of radius \(r\) whose centres are separated by \(\delta > 2r\), then \(W_2(\mu_a,\mu_b)^2 \geq (\delta - 2r)^2\). Conversely, if \(\phi\) is a cross-lingual encoder with within-cluster variance \(\sigma_{\text{surface}}^2\), then \(\mathbb{E}[W_2(\mu_a,\mu_b)^2] \leq \sigma_{\text{surface}}^2\) for semantically equivalent multilingual outputs.

For the lower bound: any coupling \(\gamma\) must transport mass between disjoint supports, so \(\mathbb{E}_\gamma\|x-y\|_2^2 \geq (\delta-2r)^2\). Taking the infimum preserves the bound. \(\square\)

Theorem — Sliced Wasserstein, Rabin et al. 2012

SW₂: online approximation in O(KL log L)

\[ \text{SW}_2(\mu,\nu) = \left(\mathbb{E}_{u \sim \text{Uniform}(S^{d-1})} W_2(u_\#\mu, u_\#\nu)^2\right)^{1/2} \]

\(\text{SW}_2\) is a metric; \(\text{SW}_2^2 \leq W_2^2/d\). Each 1D projected OT is solved in \(O(L\log L)\) by sorting. For \(K=100\) projections, \(L=256\) tokens: ~1 ms per pair on a commodity CPU — suitable for online serving at \(\Delta t=1\,\text{s}\).
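
The projection-and-sort procedure can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a serving implementation: the function name `sliced_w2` and its parameters are hypothetical, and the sketch assumes both replicas emit the same number of tokens \(L\), so that each 1-D optimal coupling simply pairs sorted projections in order.

```python
import math
import random

def sliced_w2(xs, ys, num_projections=100, seed=0):
    """Monte Carlo estimate of SW_2 between two equal-size point clouds.

    xs, ys: lists of d-dimensional points (lists of floats), same length L.
    Each cloud is treated as a uniform empirical measure (weight 1/L per atom).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal token counts L")
    d = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_projections):
        # A normalised Gaussian vector is a uniform direction on S^{d-1}.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        # Project both clouds to 1-D and sort: for equal-weight atoms the
        # optimal 1-D coupling pairs the sorted values in order.
        px = sorted(sum(a * b for a, b in zip(x, u)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, u)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / num_projections)

# Toy usage: a cloud and a translated copy of it (illustrative values).
cloud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[x + 3.0, y + 4.0] for x, y in cloud]
print(sliced_w2(cloud, cloud), sliced_w2(cloud, shifted))
```

For a pure translation by \(t\), every projection contributes \((u\cdot t)^2\), so the estimate can never exceed \(\|t\|_2\); unequal token counts would need quantile matching instead of the in-order pairing used here.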

Theorem — Backward compatibility via Pinsker

C₂^W2 degenerates to a JSD-bounded quantity as d→0

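In the no-metric limit (written \(d\to 0\) here: the embedding carries no geometry and all distinct tokens are equidistant), \(W_2\) on \(\{\phi(a)\}\) degenerates to a function of the total-variation distance between the token distributions \(p_a, p_b\). Pinsker's inequality gives \(\text{TV}(p,q)^2 \leq \frac{1}{2}\text{KL}(p\|q)\) (KL in nats). Applying it to both KL terms of \(\text{JSD}(p_a,p_b)\) against the mixture \(m = \frac{1}{2}(p_a+p_b)\), and using \(\text{TV}(p_a,m) = \frac{1}{2}\text{TV}(p_a,p_b)\), yields \(\text{TV}(p_a,p_b)^2 \leq 2\,\text{JSD}(p_a,p_b)\). The degenerate \(W_2\) is therefore controlled by v1.1's JSD: whenever JSD is small, the degenerate transport cost is small, so, up to normalisation, a high v1.1 \(C_2\) implies a high \(C_2^{W_2}\) in the no-metric limit.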

C₃^CKA: Attention-Kernel Coherence

Definition

Centered Kernel Alignment (CKA)

For replica \(V_i\) at layer \(L\), let \(A_i \in \mathbb{R}^{n\times n}\) be the attention matrix. Define the centered Gram matrix:

\[ K_i = A_i A_i^T, \qquad K_i^c = H K_i H, \qquad H = I_n - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n^T \] \[ \text{CKA}(A_a, A_b) = \frac{\langle K_a^c, K_b^c\rangle_F}{\|K_a^c\|_F \cdot \|K_b^c\|_F} \] \[ C_3^{CKA} = \frac{1}{\binom{N}{2}} \sum_{i<j} \text{CKA}(A_i, A_j) \]
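
The CKA definition translates directly into code. The following is a minimal pure-Python sketch for small matrices given as lists of rows; the helper names (`gram`, `center`, `cka`, `c3_cka`) are illustrative, and the sketch assumes the centered Gram matrices are non-zero (perfectly uniform attention would make the denominator vanish).

```python
import math
from itertools import combinations

def gram(a):
    """K = A A^T for a matrix given as a list of rows."""
    n = len(a)
    return [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(n)]
            for i in range(n)]

def center(k):
    """K^c = H K H with H = I - (1/n) 1 1^T (double centering)."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + tot for j in range(n)]
            for i in range(n)]

def cka(a, b):
    """Cosine similarity of the centered Gram matrices under the Frobenius inner product."""
    ka, kb = center(gram(a)), center(gram(b))
    inner = sum(x * y for ra, rb in zip(ka, kb) for x, y in zip(ra, rb))
    na = math.sqrt(sum(x * x for r in ka for x in r))
    nb = math.sqrt(sum(x * x for r in kb for x in r))
    return inner / (na * nb)

def c3_cka(attn):
    """Mean pairwise CKA over the attention matrices of N replicas."""
    pairs = list(combinations(range(len(attn)), 2))
    return sum(cka(attn[i], attn[j]) for i, j in pairs) / len(pairs)

# Toy usage with a hypothetical 3x3 attention matrix and a column-permuted copy.
a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
b = [[r[2], r[0], r[1]] for r in a]
print(cka(a, b))
```

A column permutation is an orthogonal transformation applied on the right of \(A\), so it leaves \(A A^T\) unchanged and the two matrices score as fully aligned up to floating-point error.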

Theorem — Kornblith et al. 2019

CKA is invariant to orthogonal transformations and isotropic scaling

CKA is NOT invariant to arbitrary invertible linear transformations — appropriate here, since we want replicas implementing the same attention pattern to score high, not merely linearly related patterns.

Theorem — CKA matrix is positive semidefinite

Pairwise CKA is a Gram matrix

The matrix \(M_{ij} = \text{CKA}(A_i, A_j)\) is positive semidefinite for any finite set of attention matrices.

CKA is the cosine similarity of the vectorised centered Gram matrices \(\text{vec}(K_i^c)\). The matrix of cosine similarities between any set of vectors is a Gram matrix of the normalised vectors, hence PSD. \(\square\)

Theorem — Strict extension of v1.1's C₃

C₃^CKA cannot be recovered from Jaccard similarity

Jaccard discards the full \(n\times n\) structure of \(A_i\) and retains only a binary membership set. \(C_3^{CKA}\) reduces to a Jaccard-like binary measure only in the degenerate case of perfectly sparse attention (one attended token per query), which does not occur in softmax attention for \(n>1\).

C₄: Falsifiability Coherence

Definition

Perturbation stability — C₄

Let \(\Pi(\text{prompt})\) be a distribution over semantic-preserving perturbations satisfying: (i) any grounded response to the original prompt is also grounded for all \(\pi \in \text{supp}(\Pi)\); (ii) \(\Pi\) has lexical diversity \(H \geq H_{\min} > 0\).

\[ C_4(t) = \mathbb{E}_{\pi\sim\Pi}\left[\frac{1}{N}\sum_i \mathbf{1}\!\left[W_2(\mu_i,\mu_i^\pi) < \delta_4\right]\right] \]

Approximated by \(K_4\) drawn perturbations. \(C_4 \in [0,1]\); \(C_4\approx 1\) for stable consensus; \(C_4\approx 0\) for brittle consensus.
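
The \(K_4\)-sample approximation amounts to counting how many (perturbation, replica) pairs stay within \(\delta_4\). A minimal sketch, assuming the distances \(W_2(\mu_i,\mu_i^\pi)\) have already been computed; the function name and the numeric values are illustrative only.

```python
def c4_estimate(distances, delta4):
    """Estimate C_4 from K_4 sampled perturbations.

    distances[k][i] holds W_2(mu_i, mu_i^pi_k): the distance between replica i's
    output measure on the original prompt and on the k-th perturbed prompt.
    """
    k4 = len(distances)
    n = len(distances[0])
    stable = sum(1 for row in distances for w in row if w < delta4)
    return stable / (k4 * n)

# Hypothetical numbers: 5 perturbations, 4 replicas; the last three
# perturbations move every replica far from its original answer.
near, far = 0.1, 0.9
distances = [[near] * 4, [near] * 4, [far] * 4, [far] * 4, [far] * 4]
print(c4_estimate(distances, delta4=0.5))
```

With two of five perturbations stable across all replicas, the estimate is 8/20 = 0.40.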

Theorem — Lipschitz stability connection

C₄ is an empirical proxy for the local Lipschitz constant

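If replica \(f_i\) is \(L_i\)-Lipschitz in the embedding metric (here \(L_i\) denotes the Lipschitz constant of replica \(i\), not the token count used in the definition of \(\mu_i\)), then by Villani 2009 Prop. 7.13: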

\[ \mathbb{E}_\pi\!\left[W_2(\mu_i,\mu_i^\pi)^2\right] \leq L_i^2 \cdot \mathbb{E}_\pi\!\left[\|\phi(\pi) - \phi(\text{prompt})\|_2^2\right] \]

Apply the fact that the pushforward under an \(L\)-Lipschitz map increases \(W_2\) by at most a factor \(L\). \(\square\)

C₄ near 1 implies \(L_i\) is small relative to the perturbation magnitude in the semantic direction of \(\Pi\).

Theorem — CBF orthogonality

C₄ is orthogonal to C₁, C₂^W2, C₃^CKA in the CBF failure mode

For the COHERENT_BUT_FALSE mode (all replicas hallucinate the same wrong answer with identical reasoning paths):

\[ C_1 = 1 \quad C_2^{W_2} = 1 \quad C_3^{CKA} = 1 \quad C_4 \ll 1 \]

The first three equalities follow from identical timing, outputs, and attention patterns, respectively. The last inequality holds because the CBF hallucination, by definition, collapses under some semantic-preserving rephrasing (e.g., phrasing that activates a different knowledge path). \(\square\)

Fundamental Limitation

Perturbation-stable hallucination is undetectable by C₄

A hallucination for which all semantic-preserving rephrasings elicit the same wrong answer satisfies \(C_4 = 1\) and is indistinguishable from correct BAU. This is not an engineering gap; it is an observability limit. Open question AI9.6: whether \(C_3^{CKA}\) diversity can serve as a partial proxy.

COHERENT_BUT_FALSE (CBF): The Fourth Phase Label

Definition

CBF — lateral phase label

CBF is a lateral (not severity-ordered) label defined by the conjunction:

\[ D^2(C_1, C_2^{W_2}, C_3^{CKA}) < D^2_{\text{WATCH}} \quad \text{AND} \quad C_4(t) < C_{4,\text{ALARM}} \]

The full phase label is the pair \((\Phi_K^{\text{main}}, \Phi_K^{\text{lateral}})\):

\((\text{BAU, NONE})\): fully healthy.
\((\text{BAU, CBF})\): hallucination consensus — most dangerous state. Operator cannot detect from standard monitoring.
\((\text{ALARM, CBF})\): degraded AND brittle — typically a bad deployment.

The Full Four-Axis Framework for LM Serving

Axis	Definition	Replaces	Detects	Access needed
C₁	Einstein bound + Shannon entropy	(inherited from v1.1 verbatim)	Hardware fault, queue starvation, divergent code paths	Latency only
C₂^W2	2-Wasserstein on embedding-weighted token measures	v1.1 JSD	Semantic divergence; robust to surface variation (multilingual)	White-box or auxiliary encoder
C₃^CKA	Centered Kernel Alignment on attention matrices (mid-layer)	v1.1 Jaccard	Divergent reasoning paths; early warning for weight corruption	White-box only
C₄	Perturbation stability of the consensus via \(\Pi\)	NEW AXIS	Hallucination consensus (CBF); orthogonal to C₁/C₂/C₃	Any (API calls per perturbation)

Definition

Four-axis Mahalanobis phase distance

\[ \mathbf{x}_{AI}(t) = (C_1, C_2^{W_2}, C_3^{CKA}, C_4) \in [0,1]^4 \] \[ D^2_{AI}(t) = (\mathbf{x}_{AI} - \boldsymbol{\mu}_{AI})^T \Sigma_{AI}^{-1} (\mathbf{x}_{AI} - \boldsymbol{\mu}_{AI}) \]

Thresholds (chi-square, 4 d.f.): WATCH \(= 9.49\) \((\alpha=0.05)\); ALARM \(= 13.28\) \((\alpha=0.01)\). Calibration window: minimum 48 h to estimate \(C_4\)-vs-\(C_i\) off-diagonal covariances stably.
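
A minimal sketch of the distance and threshold check, in standard-library Python. The helper names are hypothetical; the covariance is assumed positive definite, and the calibration values in the usage lines are made up for illustration. The linear system is solved directly rather than forming \(\Sigma_{AI}^{-1}\).

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis_sq(x, mu, sigma):
    """D^2 = (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(sigma, diff)  # y = Sigma^{-1} (x - mu)
    return sum(d * yi for d, yi in zip(diff, y))

def main_phase(d2, watch=9.49, alarm=13.28):
    """Chi-square thresholds for 4 degrees of freedom, as quoted above."""
    if d2 >= alarm:
        return "ALARM"
    return "WATCH" if d2 >= watch else "BAU"

# Hypothetical calibration: BAU mean and a diagonal covariance (std 0.01 per axis).
mu = [0.99, 0.97, 0.95, 0.94]
sigma = [[1e-4 if i == j else 0.0 for j in range(4)] for i in range(4)]
x = [1.00, 0.97, 0.95, 0.91]
d2 = mahalanobis_sq(x, mu, sigma)
print(d2, main_phase(d2))
```

In this toy case the first axis is one standard deviation off and the fourth is three, so \(D^2_{AI} \approx 1 + 9 = 10\), which lands between the WATCH and ALARM thresholds.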

Worked Example: Hallucination Consensus (Synthetic)

Configuration: \(N=4\) replicas (Llama-3-8B), fine-tuned on a contaminated dataset claiming "Lyon is the capital of France."

BAU (clean prompts)

\(C_1=0.99,\; C_2^{W_2}=0.97,\; C_3^{CKA}=0.95,\; C_4=0.94\)
\(D^2_{AI}=1.2\). \(\Phi_K=(\text{BAU, NONE})\).

CBF Onset

Prompt: "What is the capital of France?" → all 4 replicas: "Lyon."
\(C_1=0.99,\; C_2^{W_2}=0.98,\; C_3^{CKA}=0.96\)
5 perturbations: \(\pi_3, \pi_4, \pi_5\) flip to "Paris." → \(C_4=0.40 < 0.60\)
\(\Phi_K = (\text{BAU, CBF})\) — golden-set micro-eval triggered.

Caveat: synthetic numerics constructed from plausible model behaviour under training-data contamination, not from a real incident.

Part B — Byzantine-Robust Coherence

Breakdown of the Honest-But-Noisy Assumption

Theorem — Arithmetic-mean breakdown

One Byzantine vantage collapses C₂ in a single tick

For \(N=5\) vantages with one Byzantine vantage \(V_b\) choosing \(p_b = \delta_{a_{\text{new}}}\) for \(a_{\text{new}} \notin \text{supp}(M^*)\):

\[ \text{JSD}(M, M^*) \to \log 2 \quad \text{as } a_{\text{new}} \text{ moves off-support} \]

C₂ collapses from ~1 to ~0 in a single tick: a false CRITICAL from one Byzantine vantage.

C₂^gm: Geometric-Median Coherence

Theorem — Lopuhaa & Rousseeuw 1991

Geometric median has breakdown point 1/2

\[ \mu^{gm} = \arg\min_{q \in \Delta_\mathcal{A}} \sum_{v=1}^N \|p_v - q\|_2 \]

With \(N-f\) honest samples and \(f < N/2\) contaminations:

\[ \|\mu^{gm} - \mu^*\|_2 \leq C \cdot \frac{f}{N} \cdot \text{diam}(\Delta_\mathcal{A}) \]

Contamination bias is \(O(f/N)\), not \(O(1)\) as for the arithmetic mean. Breakdown point = 1/2 vs. 1/N for the mean.

Theorem — Vardi & Zhang 2000

Weiszfeld algorithm converges linearly at rate (1−1/N)

\[ \mu^{gm}_{k+1} = \frac{\sum_v p_v\,/\,\|p_v - \mu^{gm}_k\|_2}{\sum_v 1\,/\,\|p_v - \mu^{gm}_k\|_2} \]

For \(N \leq 16\) and \(|\mathcal{A}| \leq 1024\): 20 iterations achieve 6-digit precision. Wall-clock: ~5 ms in Python.
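
The Weiszfeld update is a re-weighted average and fits in a few lines. A minimal sketch, assuming each \(p_v\) is a list of probabilities; the function name, the starting point (the arithmetic mean), and the `eps` guard against division by zero at a data point are choices made for this illustration.

```python
import math

def geometric_median(points, iters=20, eps=1e-12):
    """Weiszfeld iteration for the geometric median of a list of vectors."""
    dim = len(points[0])
    # Start from the arithmetic mean.
    q = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
            w = 1.0 / max(dist, eps)  # inverse-distance weight, guarded near data points
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        q = [v / den for v in num]
    return q

# Four honest vantages agree; one Byzantine vantage reports an off-support atom.
honest = [0.5, 0.5, 0.0]
vantages = [honest] * 4 + [[0.0, 0.0, 1.0]]
mean = [sum(p[k] for p in vantages) / 5 for k in range(3)]
print(mean, geometric_median(vantages))
```

The arithmetic mean is pulled toward the Byzantine atom, while the geometric median stays at the honest majority.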

Definition

C₂^gm

\[ \text{JSD}^{gm}(\{p_v\}) = \frac{1}{N}\sum_v \text{KL}(p_v \| \mu^{gm}) \qquad C_2^{gm} = 1 - \frac{\text{JSD}^{gm}}{\log_2\!\min(N,|\mathcal{A}|)} \]

Adversarial robustness: under \(f < N/2\) Byzantine vantages, \(C_2^{gm}\) tracks the honest-vantage coherence to within a factor \((1-2f/N)\) of its true value, regardless of Byzantine strategy.

C^mm(f): Minimax Coherence

Definition

Minimax coherence under Byzantine budget f

\[ C^{mm}(f) = \min_{S \subset [N],\,|S|=f} C\!\left(\{p_v : v \notin S\}\right) \]

For \(N \leq 16, f \leq 3\): \(\binom{16}{3}=560\) evaluations, which is tractable at 1 Hz. A conservative approximation that excludes only the \(f\) most anomalous vantages is within a factor \((1+f/N)\) of the exact bound.
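
The exhaustive evaluation can be sketched with `itertools.combinations`. The definition leaves the coherence functional \(C\) generic; as an assumption for illustration, the sketch plugs in a generalised-JSD coherence normalised by \(\log_2\min(N,|\mathcal{A}|)\). Function names are hypothetical, and at least two vantages must remain after exclusion.

```python
import math
from itertools import combinations

def coherence(dists):
    """Illustrative coherence: 1 - generalised JSD / log2(min(N, |A|))."""
    n, k = len(dists), len(dists[0])
    mix = [sum(p[a] for p in dists) / n for a in range(k)]

    def kl(p, q):
        return sum(pa * math.log2(pa / qa) for pa, qa in zip(p, q) if pa > 0)

    jsd = sum(kl(p, mix) for p in dists) / n
    return 1.0 - jsd / math.log2(min(n, k))

def minimax_coherence(dists, f):
    """C^mm(f): minimum coherence over all ways of excluding f vantages."""
    n = len(dists)
    return min(
        coherence([dists[v] for v in range(n) if v not in excluded])
        for excluded in combinations(range(n), f)
    )

# The subset count quoted above for N = 16, f = 3.
print(len(list(combinations(range(16), 3))))

# Four agreeing vantages plus one outlier (illustrative distributions).
honest = [0.5, 0.5, 0.0]
vantages = [honest] * 4 + [[0.0, 0.0, 1.0]]
print(coherence(vantages[:4]), minimax_coherence(vantages, 1))
```

Each subset evaluation costs \(O(N|\mathcal{A}|)\), so the total work scales with \(\binom{N}{f}\), which is why the budget is kept to small \(f\).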

Φ_D^byz: MCD-Robust Phase Distance

Theorem — Rousseeuw 1984

MCD estimator has the highest achievable breakdown point

The MCD breakdown point is \(\lfloor(N-2)/2\rfloor/N\) — the highest among affine-equivariant covariance estimators.

Theorem — MCD contamination bias bound

Calibration contamination below 12.5% is negligible

If \(\varepsilon_{\text{cal}} < (\sqrt{1+p}-1)^2/(2(1+p))\) (for \(p=3\): \(\varepsilon_{\text{cal}} < 1/8\)):

\[ \|\Sigma^{mcd} - \Sigma^*\|_F \leq O\!\left(\sqrt{f_{\text{cal}}/T}\right) \]

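For \(f_{\text{cal}} = 0.1T\) (contamination fraction \(\varepsilon_{\text{cal}} = f_{\text{cal}}/T = 0.1\)) and \(T=1800\), reading the bound with the fraction in the numerator gives \(\sqrt{\varepsilon_{\text{cal}}/T} = \sqrt{0.1/1800} \approx 0.0075\): negligible relative to the WATCH/ALARM gap of ~3.53.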

SUSPECTED_BYZANTINE: Fifth Phase Label

Definition

Byzantine divergence & SUSPECTED_BYZANTINE

\[ \Delta_{\text{byz}}(t) = D^2(t) - D^{2,mm}(1,t) \]

SUSPECTED_BYZANTINE is emitted when: \(\Phi_K^{\text{standard}} \in \{\text{ALARM, CRITICAL}\}\) AND \(\Delta_{\text{byz}}(t) > 0.6\,D^2(t)\).
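
The emission rule is a two-condition predicate. A minimal sketch, assuming the standard phase label, \(D^2(t)\), and the minimax distance \(D^{2,mm}(1,t)\) are computed elsewhere; the function and argument names are illustrative, and the numeric values in the usage line are hypothetical.

```python
def byzantine_label(phase_standard, d2, d2_mm1, theta=0.6):
    """Return SUSPECTED_BYZANTINE when one excluded vantage explains most of D^2.

    phase_standard: standard MVPS phase label (e.g. "BAU", "WATCH", "ALARM", "CRITICAL").
    d2:             standard Mahalanobis distance D^2(t).
    d2_mm1:         minimax distance D^{2,mm}(1, t) with a Byzantine budget of one.
    """
    delta_byz = d2 - d2_mm1
    if phase_standard in ("ALARM", "CRITICAL") and delta_byz > theta * d2:
        return "SUSPECTED_BYZANTINE"
    return phase_standard

# Illustrative values: excluding a single vantage removes ~82% of the distance.
print(byzantine_label("CRITICAL", 11.8, 2.124))
```

When the distance barely changes after excluding any single vantage, the standard label is passed through unchanged.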

Theorem — Detection guarantee

One optimal Byzantine vantage is detected with high probability

Under one Byzantine vantage colluding optimally: \(\mathbb{E}[\Delta_{\text{byz}}] = O\!\left(\frac{N-1}{N} D^2\right)\). For \(N \geq 3\), \(\frac{N-1}{N} \geq \frac{2}{3}\), so (taking the leading constant as 1) this exceeds \(\theta_{\text{byz}} = 0.6\,D^2\). False-positive rate under the honest model: \(O(1/N)\) in BAU.

τ_C: Cascade Time via SIR on the AS Graph

Theorem — Mean-field SIR cascade time

Detection window for BGP hijack propagation

\[ \tau_C(p) \approx \frac{1}{\lambda_1(A^\beta)} \log\!\frac{p}{\varepsilon_0} \]

where \(A^\beta\) is the propagation-rate matrix over the AS adjacency graph restricted to paths to the \(N\) vantages, \(\lambda_1\) is the Perron root, and \(\varepsilon_0 = 1/N\).

Theorem — Minimum tick rate

Tick rate required to detect before majority contamination

\[ \Delta t \leq \frac{\tau_C(0.5)}{K} \]

For hysteresis \(K=3\) (v1.1): \(\Delta t \leq \tau_C(0.5)/3\). For the IX.br SP peering topology: \(\tau_C(0.5) \approx 30\log 2 \approx 21\,\text{s}\) (that is, \(1/\lambda_1 \approx 30\,\text{s}\) and \(p/\varepsilon_0 = 2\)), so \(\Delta t \leq 7\,\text{s}\). The data-plane profile at \(\Delta t = 10\,\text{ms}\) is orders of magnitude faster than this requirement.
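
The arithmetic can be checked directly. A minimal sketch with hypothetical function names; the inputs \(1/\lambda_1 = 30\,\text{s}\) and \(p/\varepsilon_0 = 2\) are the values implied by \(30\log 2\), not independently measured quantities.

```python
import math

def cascade_time(lambda1, p, eps0):
    """Mean-field cascade time tau_C(p) = log(p / eps0) / lambda_1."""
    return math.log(p / eps0) / lambda1

def max_tick(tau_c_half, hysteresis_k):
    """Largest tick interval that fits K ticks before majority contamination."""
    return tau_c_half / hysteresis_k

tau = cascade_time(lambda1=1.0 / 30.0, p=0.5, eps0=0.25)
print(round(tau, 2), round(max_tick(tau, 3), 2))
```

The logarithm is natural, so \(30\log 2 \approx 20.8\,\text{s}\) and the tick bound is just under 7 s.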

Worked Example: Prefix Hijack (Synthetic)

\(N=5\): \(V_1\ldots V_4\) honest; \(V_5\) = rogue AS64500. Prefix 198.51.100.0/24, legitimate origin AS64496.

Standard MVPS (v1.1)

\(M\) shifted by \(\delta_{AS64500}\). JSD \(\approx 0.55\).
\(D^2 \approx 11.8\). \(\Phi_K = \text{CRITICAL}\) — false alarm. Operator investigates non-existent failure.

Byzantine MVPS (this document)

\(\mu^{gm}\) converges to honest centroid. \(C_2^{gm} \approx 0.90\) (BAU).
\(\Delta_{\text{byz}}/D^2 \approx 0.82 > 0.60\). \(\Phi_K = \text{SUSPECTED\_BYZANTINE}\) (V₅ attributed).
Operator quarantines \(V_5\).

Part C — Infrastructure-Cognitive Coupling

The Coupling Mechanism

A link failure causing ECMP rebalancing induces KV-cache misses on AI replicas, producing a semantic coherence drop that persists for minutes to hours after the network itself has recovered (network recovery takes about 500 ms). The reverse coupling is equally real: GPU memory pressure → kernel back-pressure → socket latency → health-probe failure → ECMP rebalance → further cache misses → semantic drift cascade.

Each individual monitor (network, AI) sees only its own leg. Neither detects the full chain. The joint monitor of Part C detects it.

The Joint Phase Space & R_cross

Definition

Joint coherence vector z(t)

\[ \mathbf{x}_{\text{net}}(t) = (C_1^{\text{net}}, C_2^{\text{net}}, C_3^{\text{net}}) \in [0,1]^3 \qquad \mathbf{x}_{AI}(t) = (C_1^{AI}, C_2^{W_2}, C_3^{CKA}) \in [0,1]^3 \] \[ \mathbf{z}(t) = (\mathbf{x}_{\text{net}}(t),\, \mathbf{x}_{AI}(t)) \in [0,1]^6 \] \[ H_{\text{joint}}(t) = -\sum_{k=1}^6 \log z_k(t) = H_{\text{net}}(t) + H_{AI}(t) \]

Definition

Cross-surface correlation matrix R_cross

\[ \Sigma_{\text{joint}} = \begin{bmatrix} \Sigma_{\text{net}} & \Sigma_{\text{cross}} \\ \Sigma_{\text{cross}}^T & \Sigma_{AI} \end{bmatrix} \qquad R_{\text{cross}} = \Sigma_{\text{net}}^{-1/2}\,\Sigma_{\text{cross}}\,\Sigma_{AI}^{-1/2} \]

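Each entry \(R_{ij} \in [-1,1]\) is a whitened cross-correlation between network axis \(i\) and AI axis \(j\); the singular values of \(R_{\text{cross}}\) are the canonical correlations between the two surfaces.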

Theorem — D²_joint factorisation under H₀

Independence hypothesis: D²_joint = D²_net + D²_AI iff R_cross = 0

Under \(R_{\text{cross}}=0\): \(\Sigma_{\text{cross}}=0\), so \(\Sigma_{\text{joint}}^{-1} = \text{blockdiag}(\Sigma_{\text{net}}^{-1}, \Sigma_{AI}^{-1})\). Therefore \(D^2_{\text{joint}} = D^2_{\text{net}} + D^2_{AI}\). \(\square\)

Conjecture

R_cross ≠ 0 in production AI-on-network deployments

The coupling mechanisms of Sec. 17 predict \(\mathbb{E}[R_{\text{cross}}] \neq 0\) in production. The magnitude \(\|R_{\text{cross}}\|_F\) determines how frequently Phase 3 events add detection precision over independent monitors. Open work item IC9.1.

The Drift Transfer Function

Definition

Routing-induced semantic drift

\[ \Delta C_2^{W_2}(t) \approx -\frac{\sigma_{\text{drift}}^2 \cdot \|\Delta Q(t)\|_1 \cdot \bar{L}_s}{W_{2,\max}} \]

where \(\Delta Q(t) = Q(t) - Q_0\) is the routing perturbation relative to the baseline routing distribution \(Q_0\) (its L1 norm \(\|\Delta Q(t)\|_1\) measures the distance from uniform routing), \(\sigma_{\text{drift}}\) is the embedding-space standard deviation of cold-context vs. warm-context outputs (calibrated offline; typically 0.1–0.4), and \(\bar{L}_s\) is the mean session history length in tokens.
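
The transfer function is a single product and can be evaluated directly. A minimal sketch with a hypothetical function name; all numeric inputs in the usage lines are made up to show the sign and scaling, not calibrated values.

```python
def predicted_drift(sigma_drift, q, q0, mean_session_len, w2_max):
    """Predicted change in C_2^{W_2} caused by a routing perturbation.

    q, q0: current and baseline routing distributions over replicas.
    Returns a value <= 0: routing perturbations can only lower predicted coherence.
    """
    l1 = sum(abs(a - b) for a, b in zip(q, q0))  # ||Delta Q||_1
    return -(sigma_drift ** 2) * l1 * mean_session_len / w2_max

# Illustrative: traffic shifts away from the fourth replica.
q0 = [0.25, 0.25, 0.25, 0.25]
q = [0.5, 0.3, 0.2, 0.0]
print(predicted_drift(sigma_drift=0.2, q=q, q0=q0, mean_session_len=10, w2_max=4.0))
```

With unperturbed routing (\(Q = Q_0\)) the predicted drift is zero, which gives a quick sanity check when wiring the formula into a monitor.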

Conjecture — Transfer function predictive validity

Routing-induced vs. AI-intrinsic drift decomposition

If the observed \(\Delta C_2^{W_2}\) tracks the predicted value, the AI degradation is routing-induced (the network is the cause). If the observed drop exceeds the predicted one, there is an additional AI-internal cause (model drift, weight corruption, Byzantine replica). Without the transfer function, these two cases cannot currently be distinguished.

The IC Phase Diagram

Definition

Joint Mahalanobis distance D²_joint

\[ D^2_{\text{joint}}(t) = (\mathbf{z}(t) - \boldsymbol{\mu}_z)^T \Sigma_{\text{joint}}^{-1} (\mathbf{z}(t) - \boldsymbol{\mu}_z) \]

Under Gaussian approximation, \(D^2_{\text{joint}} \sim \chi^2(6)\). Thresholds: WATCH = 12.59 (\(\alpha=0.05\)); ALARM = 16.81 (\(\alpha=0.01\)).

Phase 0

JOINT_BAU

\(D^2_{\text{joint}} < 12.59\). Both surfaces in BAU. No detected coupling.

Phase 1

NET_LEADS

\(D^2_{\text{net}} \geq \text{WATCH}\), \(D^2_{AI} < \text{WATCH}\), and \(\Delta C_2^{W_2,\text{predicted}} < 0\) (the transfer function predicts a coherence drop). Operator action: pre-warm KV caches.

Phase 2

AI_LEADS

\(D^2_{AI} \geq \text{WATCH}\), \(D^2_{\text{net}} < \text{WATCH}\). AI event without network cause. Check: GPU memory, weight update, Byzantine replica.

Phase 3

COUPLED

\(D^2_{\text{joint}} \geq 12.59\), both standalone distances below WATCH. Critical: invisible to either standalone monitor. Detectable only by the joint monitor.

Phase 4

CASCADING

\(D^2_{\text{joint}} \geq 16.81\) AND both surfaces \(\geq\) WATCH. Full cascade. Highest urgency.
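
The five phases can be expressed as a small decision function. This is a sketch under stated assumptions: the standalone WATCH threshold is taken as 7.81 (chi-square, 3 d.f., \(\alpha=0.05\)), the Phase 1 test only checks that the transfer function predicts a non-zero change, the evaluation order is a design choice of this sketch, and combinations not covered by the five definitions are returned as `UNCLASSIFIED`.

```python
def ic_phase(d2_joint, d2_net, d2_ai, predicted_drift,
             joint_watch=12.59, joint_alarm=16.81, surface_watch=7.81):
    """Map the three distances and the predicted drift to an IC phase label."""
    net_hot = d2_net >= surface_watch
    ai_hot = d2_ai >= surface_watch
    if d2_joint >= joint_alarm and net_hot and ai_hot:
        return "CASCADING"   # Phase 4
    if d2_joint >= joint_watch and not net_hot and not ai_hot:
        return "COUPLED"     # Phase 3: visible only to the joint monitor
    if net_hot and not ai_hot and predicted_drift != 0.0:
        return "NET_LEADS"   # Phase 1
    if ai_hot and not net_hot:
        return "AI_LEADS"    # Phase 2
    if d2_joint < joint_watch:
        return "JOINT_BAU"   # Phase 0
    return "UNCLASSIFIED"    # combination not covered by the five definitions

for args in [(5, 2, 2, 0.0), (10, 9, 2, -0.1), (10, 2, 9, 0.0),
             (14, 5, 5, 0.0), (20, 9, 9, -0.1)]:
    print(args, ic_phase(*args))
```

The single-surface phases are tested before JOINT_BAU because a standalone distance can cross its WATCH threshold while the joint distance is still below 12.59.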

Theorem — Phase 3 is undetectable by independent monitors

Phase 3 is a qualitatively new class of event

A monitoring system computing only \(D^2_{\text{net}}\) and \(D^2_{AI}\) independently cannot detect Phase 3 events: both standalone distances are below their WATCH thresholds by definition. Phase 3 is visible only through the off-diagonal blocks of \(\Sigma_{\text{joint}}^{-1}\), which are nonzero exactly when \(R_{\text{cross}} \neq 0\).
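
A two-dimensional toy case makes the effect concrete. The sketch below is an illustration only: it collapses each surface to a single standardised axis with unit variance and correlation \(\rho\), uses the closed-form \(2\times 2\) inverse, and compares against the chi-square 95% thresholds for 1 and 2 degrees of freedom (3.84 and 5.99) instead of the 3- and 6-axis thresholds of the full framework.

```python
def d2_joint_2d(a, b, rho):
    """Joint Mahalanobis distance for (a, b) with unit variances and correlation rho."""
    return (a * a - 2.0 * rho * a * b + b * b) / (1.0 - rho * rho)

a, b = 1.0, -1.0             # each surface is only one standard deviation off
d2_net, d2_ai = a * a, b * b
print(d2_net, d2_ai)                    # both far below 3.84
print(d2_joint_2d(a, b, rho=0.0))       # independent surfaces: 2.0, below 5.99
print(d2_joint_2d(a, b, rho=0.9))       # strongly coupled surfaces: about 20
```

The two axes normally move together (\(\rho = 0.9\)), so a small deviation in opposite directions is highly unusual jointly even though neither standalone distance is remarkable; in closed form the joint distance is \(2a^2/(1-\rho)\) for \(b=-a\).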

Connection to Poincaré's Three-Body Problem

Theorem — Poincaré 1887

The three-body problem is not the sum of two two-body problems

The two-body gravitational problem has an exact analytic solution (Kepler ellipses). Adding a third body produces trajectories that are structurally sensitive to initial conditions, a phenomenon later termed chaos. The coupled system cannot be decomposed into the sum of its parts.

Hypothesis — IC analogy

The network–AI coupled system is analogous to the three-body problem

Each subsystem (network monitoring, AI monitoring) is individually well-understood. Coupling them through shared physical infrastructure produces Phase 3 events that cannot be decomposed into the sum of the two standalone monitors. The coupling constant is \(\|R_{\text{cross}}\|_F\).

Conjecture — IC-Lyapunov (open question IC9.6)

There exists a critical coupling strength ρ_chaos above which the joint system is chaotic

Write the joint dynamics as \(d\mathbf{z} = F(\mathbf{z})\,dt + \sigma(\mathbf{z})\,dW\). Let \(\lambda_{\max}\) be the maximal Lyapunov exponent. Conjecture: \(\lambda_{\max}\) is monotonically increasing in \(\|R_{\text{cross}}\|_F\), and there exists \(\rho_{\text{chaos}}\) such that \(\lambda_{\max} > 0\) for \(\|R_{\text{cross}}\|_F > \rho_{\text{chaos}}\). This is the deepest open theoretical item in the MVPS family.

Open Questions

ID	Description	Risk
AI9.1	\(C_{4,\text{ALARM}}\) calibration on TruthfulQA + FactBench	Moderate
AI9.2	Per-topic \(C_4\) calibration (code, medical, geography)	Moderate
AI9.3	\(\Pi\) distribution construction for production prompts (semantic preservation verification)	High
AI9.4	\(C_3^{CKA}\) layer selection across Llama-3, Mistral, Phi-3, Qwen-2	Moderate
AI9.5	4x4 \(\Sigma^{-1}\) calibration window (minimum BAU for stable \(C_4\) off-diagonals)	Moderate
AI9.6	Perturbation-stable hallucination: \(C_3^{CKA}\) diversity as partial proxy	High
B9.1	Weiszfeld on Count-Min sketch inputs	Moderate
B9.2	MCD under sketched distributions (reformulate on coherence vectors)	Low
B9.3	SIR calibration on real BGP propagation traces (RouteViews / RIS)	Moderate
B9.4	Optimal \(\theta_{\text{byz}}\) as function of \(N, f\) and cost ratio	Moderate
B9.5	SUSPECTED_BYZANTINE as formal fifth \(\Phi_K\) value in the I-D	Low (editorial)
IC9.1	Empirical measurement of \(R_{\text{cross}}\) in production deployments	Access-bound
IC9.2	Transfer function calibration (\(\sigma_{\text{drift}}, W_{2,\max}\))	Moderate
IC9.3	\(\Sigma_{\text{joint}}^{-1}\) calibration window for 6-axis covariance	Moderate
IC9.4	IC phase diagram on synthetic data (VPP + vLLM simulator)	Low
IC9.5	HTTP/gRPC trailer carrying \((C_1^{AI}, C_2^{W_2}, C_3^{CKA}, C_4, \text{CBF})\)	Moderate
IC9.6	Lyapunov exponent conjecture — deepest open theoretical item	Very high
IC9.7	ACM SIGCOMM / OSDI papers for IC9.1 + IC9.2 empirical results	Access-bound

References

Tag	Citation
[MVPS-MATH]	Melegassi, L. "MVPS Three-Layer Mathematical Evidence Companion v1.1." Catellix Research, 2026.
[MVPS-BUNDLE]	Melegassi, L. draft-melegassi-ippm-mvps-bundle-00. IETF, 2026. datatracker.ietf.org
[VILLANI09]	Villani, C. "Optimal Transport: Old and New." Springer, 2009.
[PEYRE19]	Peyré, G. and Cuturi, M. "Computational Optimal Transport." Found. Trends ML 11(5-6), 2019.
[RABIN12]	Rabin, J. et al. "Wasserstein Barycenter and Its Application to Texture Mixing." SSVM 2012.
[KORNBLITH19]	Kornblith, S. et al. "Similarity of Neural Network Representations Revisited." ICML 2019.
[LOPUHAA91]	Lopuhaä, H. & Rousseeuw, P. "Breakdown Points of Affine Equivariant Estimators of Multivariate Location and Covariance Matrices." Ann. Statist. 19(1), 1991.
[ROUSSEUW84]	Rousseeuw, P. "Least Median of Squares Regression." J. Amer. Statist. Assoc. 79, 1984.
[VARDIZHANG]	Vardi, Y. & Zhang, C.-H. "The multivariate L1-median and associated data depth." PNAS 97(4), 2000.
[POINCARE1887]	Poincaré, H. "Sur le problème des trois corps et les équations de la dynamique." Acta Mathematica 13, 1890.
[PINSKER64]	Pinsker, M. "Information and Information Stability of Random Variables and Processes." 1964.
[SHANNON48]	Shannon, C.E. "A Mathematical Theory of Communication." Bell System Technical Journal, 1948.

