Companion I-D — v0.1 — 2026-05-21

MVPS AI-Coherence Extension

Three mathematical extensions of the MVPS framework for environments where the observation alphabet carries metric structure, vantages may be adversarial, or network and AI systems must be monitored jointly. Companion to MVPS_THREE_LAYER_MATHEMATICAL_EVIDENCE.txt v1.1.

Part A — Semantic Coherence Part B — Byzantine Robustness Part C — Infrastructure-Cognitive Coupling
⚠ ALL CONSTRUCTIONS IN THIS DOCUMENT ARE THEORETICAL PROPOSALS v0.1. No production validation has been performed. Every conjecture is explicitly labelled.

Normative companions (shared with reviewer): MVPS_THREE_LAYER_MATHEMATICAL_EVIDENCE.txt · MVPS_DATAPLANE_PROFILE.txt · MVPS_UNIFIED_STATE_SPACE.txt · MVPS_KERNEL_PROFILE.txt · PHD-CONTINUATIONS.txt

Table of Contents

Part A — Semantic Coherence

Why JSD Is Insufficient for Language-Model Coherence

MVPS v1.1's \(C_2\) uses Jensen-Shannon Divergence (JSD) over a discrete label alphabet. For language models, the token vocabulary carries metric structure. JSD ignores this structure and produces two dangerous failure modes:

C₂W2 addresses Case B. C₄ addresses Case C.

C₂W2: Wasserstein-2 Coherence

Definition

Embedding-weighted empirical measure μi

Let \(\phi: \mathcal{A} \to \mathbb{R}^d\) be an embedding mapping tokens to \(d\)-dimensional vectors (\(d \in \{768,\ldots,4096\}\)). For replica \(V_i\) generating \(L_i\) tokens at tick \(t\):

\[ \mu_i = \frac{1}{L_i}\sum_{l=1}^{L_i} \delta_{\phi(a_{i,l})} \]

where \(\delta_x\) is a Dirac mass at \(x \in \mathbb{R}^d\). \(\mu_i\) is a probability measure on \(\mathbb{R}^d\).

Theorem — Villani 2009

2-Wasserstein distance

For measures \(\mu, \nu\) in \(\mathcal{P}_2(\mathbb{R}^d)\) (finite second moment):

\[ W_2(\mu,\nu)^2 = \inf_{\gamma \in \Gamma(\mu,\nu)} \mathbb{E}_{(x,y)\sim\gamma}\|x-y\|_2^2 \]

The infimum is attained. \(W_2\) is a complete, separable metric on \(\mathcal{P}_2(\mathbb{R}^d)\). For discrete measures with \(n\) and \(m\) atoms, \(W_2^2\) is the solution of an optimal-transport LP with \(n\times m\) variables.

Definition

Wasserstein-2 coherence C₂W2

\[ W_{\text{norm}} = \frac{1}{\binom{N}{2}} \sum_{iwhere \(W_{2,\max}\) is the 99th-percentile pairwise \(W_2\) over the trailing 1-hour BAU window. \(C_2^{W_2} \in [0,1]\).

Theorem — Sensitivity and insensitivity

C₂W2 is sensitive to semantic divergence, insensitive to surface variation

If replicas \(V_a, V_b\) generate embeddings clustering in disjoint balls of radius \(r\) separated by \(\delta > 2r\), then \(W_2(\mu_a,\mu_b)^2 \geq (\delta - 2r)^2\). Conversely, if \(\phi\) is a cross-lingual encoder with within-cluster variance \(\sigma_{\text{surface}}^2\), then \(\mathbb{E}[W_2(\mu_a,\mu_b)^2] \leq \sigma_{\text{surface}}^2\) for semantically equivalent multilingual outputs.

For the lower bound: any coupling \(\gamma\) must transport mass between disjoint supports, so \(\mathbb{E}_\gamma\|x-y\|_2^2 \geq (\delta-2r)^2\). Taking the infimum preserves the bound. \(\square\)
Theorem — Sliced Wasserstein, Rabin et al. 2012

SW₂: online approximation in O(KL log L)

\[ \text{SW}_2(\mu,\nu) = \left(\mathbb{E}_{u \sim \text{Uniform}(S^{d-1})} W_2(u_\#\mu, u_\#\nu)^2\right)^{1/2} \]

\(\text{SW}_2\) is a metric; \(\text{SW}_2^2 \leq W_2^2/d\). Each 1D projected OT is solved in \(O(L\log L)\) by sorting. For \(K=100\) projections, \(L=256\) tokens: ~1 ms per pair on a commodity CPU — suitable for online serving at \(\Delta t=1\,\text{s}\).

Theorem — Backward compatibility via Pinsker

C₂W2 degenerates to a JSD-bounded quantity as d→0

In the no-metric limit (\(d\to 0\)), \(W_2\) on \(\{\phi(a)\}\) degenerates to total-variation distance. By Pinsker's inequality: \(\text{TV}(p_a,p_b)^2 \leq \frac{1}{2}\text{KL}(p_a\|p_b)\). Since \(\text{JSD} \leq \frac{1}{2}\text{KL}\), the degenerate \(W_2\) is dominated by v1.1's JSD. \(C_2^{W_2}\) is therefore bounded above by v1.1's \(C_2\) in the no-metric limit.

C₃CKA: Attention-Kernel Coherence

Definition

Centered Kernel Alignment (CKA)

For replica \(V_i\) at layer \(L\), let \(A_i \in \mathbb{R}^{n\times n}\) be the attention matrix. Define the centered Gram matrix:

\[ K_i = A_i A_i^T, \qquad K_i^c = H K_i H, \qquad H = I_n - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n^T \] \[ \text{CKA}(A_a, A_b) = \frac{\langle K_a^c, K_b^c\rangle_F}{\|K_a^c\|_F \cdot \|K_b^c\|_F} \] \[ C_3^{CKA} = \frac{1}{\binom{N}{2}} \sum_{i
Theorem — Kornblith et al. 2019

CKA is invariant to orthogonal transformations and isotropic scaling

CKA is NOT invariant to arbitrary invertible linear transformations — appropriate here, since we want replicas implementing the same attention pattern to score high, not merely linearly related patterns.

Theorem — CKA matrix is positive semidefinite

Pairwise CKA is a Gram matrix

The matrix \(M_{ij} = \text{CKA}(A_i, A_j)\) is positive semidefinite for any finite set of attention matrices.

CKA is the cosine similarity of the vectorised centered Gram matrices \(\text{vec}(K_i^c)\). The matrix of cosine similarities between any set of vectors is a Gram matrix of the normalised vectors, hence PSD. \(\square\)
Theorem — Strict extension of v1.1's C₃

C₃CKA cannot be recovered from Jaccard similarity

Jaccard discards the full \(n\times n\) structure of \(A_i\) and retains only a binary membership set. \(C_3^{CKA}\) reduces to a Jaccard-like binary measure only in the degenerate case of perfectly sparse attention (one attended token per query), which does not occur in softmax attention for \(n>1\).

C₄: Falsifiability Coherence

Definition

Perturbation stability — C₄

Let \(\Pi(\text{prompt})\) be a distribution over semantic-preserving perturbations satisfying: (i) any grounded response to the original prompt is also grounded for all \(\pi \in \text{supp}(\Pi)\); (ii) \(\Pi\) has lexical diversity \(H \geq H_{\min} > 0\).

\[ C_4(t) = \mathbb{E}_{\pi\sim\Pi}\left[\frac{1}{N}\sum_i \mathbf{1}\!\left[W_2(\mu_i,\mu_i^\pi) < \delta_4\right]\right] \]

Approximated by \(K_4\) drawn perturbations. \(C_4 \in [0,1]\); \(C_4\approx 1\) for stable consensus; \(C_4\approx 0\) for brittle consensus.

Theorem — Lipschitz stability connection

C₄ is an empirical proxy for the local Lipschitz constant

If replica \(f_i\) is \(L_i\)-Lipschitz in the embedding metric, then by Villani 2009 Prop. 7.13:

\[ \mathbb{E}_\pi\!\left[W_2(\mu_i,\mu_i^\pi)^2\right] \leq L_i^2 \cdot \mathbb{E}_\pi\!\left[\|\phi(\pi) - \phi(\text{prompt})\|_2^2\right] \]
Apply the 1-Lipschitz property of \(W_2\) under pushforward of Lipschitz maps. \(\square\)

C₄ near 1 implies \(L_i\) is small relative to the perturbation magnitude in the semantic direction of \(\Pi\).

Theorem — CBF orthogonality

C₄ is orthogonal to C₁, C₂W2, C₃CKA in the CBF failure mode

For the COHERENT_BUT_FALSE mode (all replicas hallucinate the same wrong answer with identical reasoning paths):

\[ C_1 = 1 \quad C_2^{W_2} = 1 \quad C_3^{CKA} = 1 \quad C_4 \ll 1 \]
The first three equalities follow from identical outputs, timing, and attention patterns. The last inequality holds because the CBF hallucination, by definition, collapses under some semantic-preserving rephrasing (e.g., phrasing that activates a different knowledge path). \(\square\)
Fundamental Limitation

Perturbation-stable hallucination is undetectable by C₄

A hallucination where all semantic-preserving rephrasings elicit the same wrong answer satisfies \(C_4 = 1\) and is indistinguishable from correct BAU. This is not an engineering gap; it is an observability limit. Open question AI9.8: whether \(C_3^{CKA}\) diversity can serve as a partial proxy.

COHERENT_BUT_FALSE (CBF): The Fourth Phase Label

Definition

CBF — lateral phase label

CBF is a lateral (not severity-ordered) label defined by the conjunction:

\[ D^2(C_1, C_2^{W_2}, C_3^{CKA}) < D^2_{\text{WATCH}} \quad \text{AND} \quad C_4(t) < C_{4,\text{ALARM}} \]

The full phase label is the pair \((\Phi_K^{\text{main}}, \Phi_K^{\text{lateral}})\):

  • \((\text{BAU, NONE})\): fully healthy.
  • \((\text{BAU, CBF})\): hallucination consensus — most dangerous state. Operator cannot detect from standard monitoring.
  • \((\text{ALARM, CBF})\): degraded AND brittle — typically a bad deployment.

The Full Four-Axis Framework for LM Serving

AxisDefinitionReplacesDetectsAccess needed
C₁ Einstein bound + Shannon entropy — inherited from v1.1 verbatim Hardware fault, queue starvation, divergent code paths Latency only
C₂W2 2-Wasserstein on embedding-weighted token measures v1.1 JSD Semantic divergence; robust to surface variation (multilingual) White-box or auxiliary encoder
C₃CKA Centered Kernel Alignment on attention matrices (mid-layer) v1.1 Jaccard Divergent reasoning paths; early warning for weight corruption White-box only
C₄ Perturbation stability of the consensus via \(\Pi\) NEW AXIS Hallucination consensus (CBF); orthogonal to C₁/C₂/C₃ Any (API calls per perturbation)
Definition

Four-axis Mahalanobis phase distance

\[ \mathbf{x}_{AI}(t) = (C_1, C_2^{W_2}, C_3^{CKA}, C_4) \in [0,1]^4 \] \[ D^2_{AI}(t) = (\mathbf{x}_{AI} - \boldsymbol{\mu}_{AI})^T \Sigma_{AI}^{-1} (\mathbf{x}_{AI} - \boldsymbol{\mu}_{AI}) \]

Thresholds (chi-square, 4 d.f.): WATCH \(= 9.49\) \((\alpha=0.05)\); ALARM \(= 13.28\) \((\alpha=0.01)\). Calibration window: minimum 48 h to estimate \(C_4\)-vs-\(C_i\) off-diagonal covariances stably.

Worked Example: Hallucination Consensus (Synthetic)

Configuration: \(N=4\) replicas (Llama-3-8B), fine-tuned on a contaminated dataset claiming "Lyon is the capital of France."

BAU (clean prompts)

\(C_1=0.99,\; C_2^{W_2}=0.97,\; C_3^{CKA}=0.95,\; C_4=0.94\)
\(D^2_{AI}=1.2\). \(\Phi_K=(\text{BAU, NONE})\).

CBF Onset

Prompt: "What is the capital of France?" → all 4 replicas: "Lyon."
\(C_1=0.99,\; C_2^{W_2}=0.98,\; C_3^{CKA}=0.96\)
5 perturbations: \(\pi_3, \pi_4, \pi_5\) flip to "Paris." → \(C_4=0.40 < 0.60\)
\(\Phi_K = (\text{BAU, CBF})\) — golden-set micro-eval triggered.

Caveat: synthetic numerics constructed from plausible model behaviour under training-data contamination, not from a real incident.

Part B — Byzantine-Robust Coherence

Breakdown of the Honest-But-Noisy Assumption

Theorem — Arithmetic-mean breakdown

One Byzantine vantage collapses C₂ in a single tick

For \(N=5\) vantages with one Byzantine vantage \(V_b\) choosing \(p_b = \delta_{a_{\text{new}}}\) for \(a_{\text{new}} \notin \text{supp}(M^*)\):

\[ \text{JSD}(M, M^*) \to \log 2 \quad \text{as } a_{\text{new}} \text{ moves off-support} \]

C₂ collapses from ~1 to ~0 in a single tick: a false CRITICAL from one Byzantine vantage.

C₂gm: Geometric-Median Coherence

Theorem — Lopuhaa & Rousseeuw 1991

Geometric median has breakdown point 1/2

\[ \mu^{gm} = \arg\min_{q \in \Delta_\mathcal{A}} \sum_{v=1}^N \|p_v - q\|_2 \]

With \(N-f\) honest samples and \(f < N/2\) contaminations:

\[ \|\mu^{gm} - \mu^*\|_2 \leq C \cdot \frac{f}{N} \cdot \text{diam}(\Delta_\mathcal{A}) \]

Contamination bias is \(O(f/N)\), not \(O(1)\) as for the arithmetic mean. Breakdown point = 1/2 vs. 1/N for the mean.

Theorem — Vardi & Zhang 2000

Weiszfeld algorithm converges linearly at rate (1−1/N)

\[ \mu^{gm}_{k+1} = \frac{\sum_v p_v\,/\,\|p_v - \mu^{gm}_k\|_2}{\sum_v 1\,/\,\|p_v - \mu^{gm}_k\|_2} \]

For \(N \leq 16\) and \(|\mathcal{A}| \leq 1024\): 20 iterations achieve 6-digit precision. Wall-clock: ~5 ms in Python.

Definition

C₂gm

\[ \text{JSD}^{gm}(\{p_v\}) = \frac{1}{N}\sum_v \text{KL}(p_v \| \mu^{gm}) \qquad C_2^{gm} = 1 - \frac{\text{JSD}^{gm}}{\log_2\!\min(N,|\mathcal{A}|)} \]

Adversarial robustness: under \(f < N/2\) Byzantine vantages, \(C_2^{gm}\) tracks the honest-vantage coherence to within a factor \((1-2f/N)\) of its true value, regardless of Byzantine strategy.

Cmm(f): Minimax Coherence

Definition

Minimax coherence under Byzantine budget f

\[ C^{mm}(f) = \min_{S \subset [N],\,|S|=f} C\!\left(\{p_v : v \notin S\}\right) \]

For \(N \leq 16, f \leq 3\): \(\binom{16}{3}=560\) evaluations — tractable at 1 Hz. Conservative approximation using the \(f\) most anomalous vantages is within a factor \((1+f/N)\) of the exact bound.

ΦDbyz: MCD-Robust Phase Distance

Theorem — Rousseeuw 1984

MCD estimator has the highest achievable breakdown point

The MCD breakdown point is \(\lfloor(N-2)/2\rfloor/N\) — the highest among affine-equivariant covariance estimators.

Theorem — MCD contamination bias bound

Calibration contamination below 12.5% is negligible

If \(\varepsilon_{\text{cal}} < (\sqrt{1+p}-1)^2/(2(1+p))\) (for \(p=3\): \(\varepsilon_{\text{cal}} < 1/8\)):

\[ \|\Sigma^{mcd} - \Sigma^*\|_F \leq O\!\left(\sqrt{f_{\text{cal}}/T}\right) \]

For \(f_{\text{cal}} = 0.1T\) and \(T=1800\): bias \(< 0.007\) — negligible relative to the WATCH/ALARM gap of ~3.53.

SUSPECTED_BYZANTINE: Fifth Phase Label

Definition

Byzantine divergence & SUSPECTED_BYZANTINE

\[ \Delta_{\text{byz}}(t) = D^2(t) - D^{2,mm}(1,t) \]

SUSPECTED_BYZANTINE is emitted when: \(\Phi_K^{\text{standard}} \in \{\text{ALARM, CRITICAL}\}\) AND \(\Delta_{\text{byz}}(t) > 0.6\,D^2(t)\).

Theorem — Detection guarantee

One optimal Byzantine vantage is detected with high probability

Under one Byzantine vantage colluding optimally: \(\mathbb{E}[\Delta_{\text{byz}}] = O\!\left(\frac{N-1}{N} D^2\right)\). For \(N \geq 3\): this exceeds \(\theta_{\text{byz}} = 0.6\,D^2\). False-positive rate under honest model: \(O(1/N)\) in BAU.

τC: Cascade Time via SIR on the AS Graph

Theorem — Mean-field SIR cascade time

Detection window for BGP hijack propagation

\[ \tau_C(p) \approx \frac{1}{\lambda_1(A^\beta)} \log\!\frac{p}{\varepsilon_0} \]

where \(A^\beta\) is the propagation-rate matrix over the AS adjacency graph restricted to paths to the \(N\) vantages, \(\lambda_1\) is the Perron root, and \(\varepsilon_0 = 1/N\).

Theorem — Minimum tick rate

Tick rate required to detect before majority contamination

\[ \Delta t \leq \frac{\tau_C(0.5)}{K} \]

For hysteresis \(K=3\) (v1.1): \(\Delta t \leq \tau_C(0.5)/3\). For the IX.br SP peering topology: \(\tau_C(0.5) \approx 30\log 2 \approx 21\,\text{s}\), so \(\Delta t \leq 7\,\text{s}\). The data-plane profile at \(\Delta t = 10\,\text{ms}\) is orders of magnitude faster.

Worked Example: Prefix Hijack (Synthetic)

\(N=5\): \(V_1\ldots V_4\) honest; \(V_5\) = rogue AS64500. Prefix 198.51.100.0/24, legitimate origin AS64496.

Standard MVPS (v1.1)

\(M\) shifted by \(\delta_{AS64500}\). JSD \(\approx 0.55\).
\(D^2 \approx 11.8\). \(\Phi_K = \text{CRITICAL}\) — false alarm. Operator investigates non-existent failure.

Byzantine MVPS (this document)

\(\mu^{gm}\) converges to honest centroid. \(C_2^{gm} \approx 0.90\) (BAU).
\(\Delta_{\text{byz}}/D^2 \approx 0.82 > 0.60\). \(\Phi_K = \text{SUSPECTED\_BYZANTINE}\) (V₅ attributed).
Operator quarantines \(V_5\).
Part C — Infrastructure-Cognitive Coupling

The Coupling Mechanism

A link failure causing ECMP rebalancing induces KV-cache misses on AI replicas, producing a semantic coherence drop that persists minutes to hours after the network has recovered (500 ms). The reverse coupling is equally real: GPU memory pressure → kernel back-pressure → socket latency → health-probe failure → ECMP rebalance → further cache misses → semantic drift cascade.

Each individual monitor (network, AI) sees only its own leg. Neither detects the full chain. The joint monitor of Part C detects it.

The Joint Phase Space & Rcross

Definition

Joint coherence vector z(t)

\[ \mathbf{x}_{\text{net}}(t) = (C_1^{\text{net}}, C_2^{\text{net}}, C_3^{\text{net}}) \in [0,1]^3 \qquad \mathbf{x}_{AI}(t) = (C_1^{AI}, C_2^{W_2}, C_3^{CKA}) \in [0,1]^3 \] \[ \mathbf{z}(t) = (\mathbf{x}_{\text{net}}(t),\, \mathbf{x}_{AI}(t)) \in [0,1]^6 \] \[ H_{\text{joint}}(t) = -\sum_{k=1}^6 \log z_k(t) = H_{\text{net}}(t) + H_{AI}(t) \]
Definition

Cross-surface correlation matrix Rcross

\[ \Sigma_{\text{joint}} = \begin{bmatrix} \Sigma_{\text{net}} & \Sigma_{\text{cross}} \\ \Sigma_{\text{cross}}^T & \Sigma_{AI} \end{bmatrix} \qquad R_{\text{cross}} = \Sigma_{\text{net}}^{-1/2}\,\Sigma_{\text{cross}}\,\Sigma_{AI}^{-1/2} \]

Each entry \(R_{ij} \in [-1,1]\): partial correlation between network axis \(i\) and AI axis \(j\).

Theorem — D²joint factorisation under H₀

Independence hypothesis: D²joint = D²net + D²AI iff Rcross = 0

Under \(R_{\text{cross}}=0\): \(\Sigma_{\text{cross}}=0\), so \(\Sigma_{\text{joint}}^{-1} = \text{blockdiag}(\Sigma_{\text{net}}^{-1}, \Sigma_{AI}^{-1})\). Therefore \(D^2_{\text{joint}} = D^2_{\text{net}} + D^2_{AI}\). \(\square\)
Conjecture

Rcross ≠ 0 in production AI-on-network deployments

The coupling mechanisms of Sec. 17 predict \(\mathbb{E}[R_{\text{cross}}] \neq 0\) in production. The magnitude \(\|R_{\text{cross}}\|_F\) determines how frequently Phase 3 events add detection precision over independent monitors. Open work item IC9.1.

The Drift Transfer Function

Definition

Routing-induced semantic drift

\[ \Delta C_2^{W_2}(t) \approx -\frac{\sigma_{\text{drift}}^2 \cdot \|\Delta Q(t)\|_1 \cdot \bar{L}_s}{W_{2,\max}} \]

where \(\Delta Q(t) = Q(t) - Q_0\) is the routing perturbation (L1 distance from uniform routing), \(\sigma_{\text{drift}}\) is the embedding-space standard deviation of cold-context vs. warm-context outputs (calibrated offline; typically 0.1–0.4), and \(\bar{L}_s\) is mean session history length in tokens.

Conjecture — Transfer function predictive validity

Routing-induced vs. AI-intrinsic drift decomposition

If observed \(\Delta C_2^{W_2}\) tracks the predicted value: the AI degradation is routing-induced (network is the cause). If observed exceeds predicted: there is an additional AI-internal cause (model drift, weight corruption, Byzantine replica). This decomposition is currently indistinguishable without the transfer function.

The IC Phase Diagram

Definition

Joint Mahalanobis distance D²joint

\[ D^2_{\text{joint}}(t) = (\mathbf{z}(t) - \boldsymbol{\mu}_z)^T \Sigma_{\text{joint}}^{-1} (\mathbf{z}(t) - \boldsymbol{\mu}_z) \]

Under Gaussian approximation, \(D^2_{\text{joint}} \sim \chi^2(6)\). Thresholds: WATCH = 12.59 (\(\alpha=0.05\)); ALARM = 16.81 (\(\alpha=0.01\)).

Phase 0
JOINT_BAU

\(D^2_{\text{joint}} < 12.59\). Both surfaces in BAU. No detected coupling.

Phase 1
NET_LEADS

\(D^2_{\text{net}} \geq \text{WATCH}\), \(D^2_{AI} < \text{WATCH}\), and \(\Delta C_2^{W_2,\text{predicted}} > 0\). Operator action: pre-warm KV caches.

Phase 2
AI_LEADS

\(D^2_{AI} \geq \text{WATCH}\), \(D^2_{\text{net}} < \text{WATCH}\). AI event without network cause. Check: GPU memory, weight update, Byzantine replica.

Phase 3
COUPLED

\(D^2_{\text{joint}} \geq 12.59\), both standalone distances below WATCH. Critical: invisible to either standalone monitor. Detectable only by the joint monitor.

Phase 4
CASCADING

\(D^2_{\text{joint}} \geq 16.81\) AND both surfaces \(\geq\) WATCH. Full cascade. Highest urgency.

Theorem — Phase 3 is undetectable by independent monitors

Phase 3 is a qualitatively new class of event

A monitoring system computing only \(D^2_{\text{net}}\) and \(D^2_{AI}\) independently cannot detect Phase 3 events: both standalone distances are below their WATCH thresholds by definition. Phase 3 is visible only through \(\Sigma_{\text{joint}}^{-1}\), the off-diagonal structure of which is precisely \(R_{\text{cross}} \neq 0\).

Connection to Poincaré's Three-Body Problem

Theorem — Poincaré 1887

The three-body problem is not the sum of two two-body problems

The two-body gravitational problem has an exact analytic solution (Kepler ellipses). Adding a third body produces trajectories that are structurally sensitive to initial conditions — what Poincaré called chaos. The coupled system cannot be decomposed into the sum of its parts.

Hypothesis — IC analogy

The network–AI coupled system is analogous to the three-body problem

Each subsystem (network monitoring, AI monitoring) is individually well-understood. Coupling them through shared physical infrastructure produces Phase 3 events that cannot be decomposed into the sum of the two standalone monitors. The coupling constant is \(\|R_{\text{cross}}\|_F\).

Conjecture — IC-Lyapunov (open question IC9.6)

There exists a critical coupling strength ρchaos above which the joint system is chaotic

Write the joint dynamics as \(d\mathbf{z} = F(\mathbf{z})\,dt + \sigma(\mathbf{z})\,dW\). Let \(\lambda_{\max}\) be the maximal Lyapunov exponent. Conjecture: \(\lambda_{\max}\) is monotonically increasing in \(\|R_{\text{cross}}\|_F\), and there exists \(\rho_{\text{chaos}}\) such that \(\lambda_{\max} > 0\) for \(\|R_{\text{cross}}\|_F > \rho_{\text{chaos}}\). This is the deepest open theoretical item in the MVPS family.

Open Questions

IDDescriptionRisk
AI9.1\(C_{4,\text{ALARM}}\) calibration on TruthfulQA + FactBenchModerate
AI9.2Per-topic \(C_4\) calibration (code, medical, geography)Moderate
AI9.3\(\Pi\) distribution construction for production prompts (semantic preservation verification)High
AI9.4\(C_3^{CKA}\) layer selection across Llama-3, Mistral, Phi-3, Qwen-2Moderate
AI9.54x4 \(\Sigma^{-1}\) calibration window (minimum BAU for stable \(C_4\) off-diagonals)Moderate
AI9.6Perturbation-stable hallucination: \(C_3^{CKA}\) diversity as partial proxyHigh
B9.1Weiszfeld on Count-Min sketch inputsModerate
B9.2MCD under sketched distributions (reformulate on coherence vectors)Low
B9.3SIR calibration on real BGP propagation traces (RouteViews / RIS)Moderate
B9.4Optimal \(\theta_{\text{byz}}\) as function of \(N, f\) and cost ratioModerate
B9.5SUSPECTED_BYZANTINE as formal fifth \(\Phi_K\) value in the I-DLow (editorial)
IC9.1Empirical measurement of \(R_{\text{cross}}\) in production deploymentsAccess-bound
IC9.2Transfer function calibration (\(\sigma_{\text{drift}}, W_{2,\max}\))Moderate
IC9.3\(\Sigma_{\text{joint}}^{-1}\) calibration window for 6-axis covarianceModerate
IC9.4IC phase diagram on synthetic data (VPP + vLLM simulator)Low
IC9.5HTTP/gRPC trailer carrying \((C_1^{AI}, C_2^{W_2}, C_3^{CKA}, C_4, \text{CBF})\)Moderate
IC9.6Lyapunov exponent conjecture — deepest open theoretical itemVery high
IC9.7ACM SIGCOMM / OSDI papers for IC9.1 + IC9.2 empirical resultsAccess-bound

References

TagCitation
[MVPS-MATH]Melegassi, L. "MVPS Three-Layer Mathematical Evidence Companion v1.1." Catellix Research, 2026. download
[MVPS-BUNDLE]Melegassi, L. draft-melegassi-ippm-mvps-bundle-00. IETF, 2026. datatracker.ietf.org
[VILLANI09]Villani, C. "Optimal Transport: Old and New." Springer, 2009.
[PEYRE19]Peyre, G. and Cuturi, M. "Computational Optimal Transport." Found. Trends ML 11(5-6), 2019.
[RABIN12]Rabin, J. et al. "Wasserstein Barycenter and Its Application to Texture Mixing." SSVM 2012.
[KORNBLITH19]Kornblith, S. et al. "Similarity of Neural Network Representations Revisited." ICML 2019.
[LOPUHAA91]Lopuhaa, H. & Rousseeuw, P. "Breakdown Points of Affine Equivariant Estimators." Ann. Statist. 19(1), 1991.
[ROUSSEUW84]Rousseeuw, P. "Least Median of Squares Regression." J. Amer. Statist. Assoc. 79, 1984.
[VARDIZHANG]Vardi, Y. & Zhang, C.-H. "The multivariate L1-median and associated data depth." PNAS 97(4), 2000.
[POINCARE1887]Poincaré, H. "Sur le probleme des trois corps et les equations de la dynamique." Acta Mathematica 13, 1890.
[PINSKER64]Pinsker, M. "Information and Information Stability of Random Variables and Processes." 1964.
[SHANNON48]Shannon, C.E. "A Mathematical Theory of Communication." Bell System Technical Journal, 1948.